Monitoring Extreme-Scale Apache Kafka Using eBPF at New Relic

New Relic runs one of the larger Apache Kafka® installations in the world, ingesting circa 125 petabytes a month, or approximately three billion data points per minute. Anton Rodriguez is the architect of the system, responsible for hundreds of clusters and thousands of clients, some of them implemented in non-standard technologies. Beyond the sheer number of servers, he works with many teams, all of whom must coordinate when issues arise.

Monitoring New Relic's large Kafka installation is critical and, of course, challenging, even for a company that itself specializes in monitoring. Specific obstacles include detecting when consumer group rebalances are happening, identifying particularly old consumers, measuring consumer lag, and finding a way to observe all producing and consuming applications.

One way New Relic has improved the monitoring of its architecture is by consuming metrics directly from the Linux kernel using eBPF, a relatively new kernel technology that lets sandboxed programs run inside the kernel without changing kernel source code or loading additional kernel modules. (The open-source tool Pixie provides access to eBPF data in a Kafka context.) Because eBPF has very low overhead, it doesn't noticeably affect the services being observed, and it lets New Relic see what's happening at the network level and take action as necessary.

EPISODE LINKS
Monitoring Kafka Without Instrumentation Using eBPF
What Is eBPF and Why Does It Matter for Observability?
Kafka Monitoring
Kafka Summit: Monitoring Kafka Without Instrumentation Using eBPF
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

About the Podcast

Streaming Audio features all things Apache Kafka®, Confluent, real-time data, and the cloud. We cover frequently asked questions, best practices, and use cases from the Kafka community, from Kafka connectors and distributed systems to data integration, modern data architectures, data mesh built with Confluent, and cloud Kafka as a service. Join our hosts as they stream through a series of interviews, stories, and use cases with guests from the data streaming industry. Apache®, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.