Martin Mao on Observability, focusing on Alerting, Triage, & RCA

Observability is a crucial aspect of operating microservices at scale. Today on the InfoQ podcast, Wes Reisz speaks with Chronosphere’s CEO Martin Mao about how he thinks about observability. Specifically, the two discuss Chronosphere’s strategy for implementing a successful observability program. Starting with alerting, Martin discusses how metrics (usually things like RED metrics or Google’s Four Golden Signals) are tools to aggregate counts and let operators know when things are moving towards an incident. In stage two of this approach, operators begin to isolate and triage what’s happening in an effort to restore the system quickly. Finally, Martin talks about root cause analysis (RCA) as the final stage, a way of preventing what happened from happening again. Martin uses this three-stage approach (and the questions that should be asked in each stage) as a way of focusing on what’s important, such as reducing Mean Time to Recovery, in a modern cloud native architecture. Observability is the ability to understand the state of a system by observing its outputs. On today’s podcast, we talk about a strategy for implementing a meaningful observability program.

Read a transcript of this interview: https://bit.ly/3AZYpkD

Subscribe to our newsletters:
- The InfoQ weekly newsletter: bit.ly/24x3IVq
- The Software Architects’ Newsletter [monthly]: www.infoq.com/software-architects-newsletter/

Upcoming Virtual Events: events.infoq.com/
InfoQ Live: live.infoq.com/
- July 20, 2021
- August 17, 2021

Follow InfoQ:
- Twitter: twitter.com/InfoQ
- LinkedIn: www.linkedin.com/company/infoq
- Facebook: bit.ly/2jmlyG8
- Instagram: @infoqdotcom
- Youtube: www.youtube.com/infoq

About the Podcast

Software engineers, architects, and team leads have found inspiration to drive change and innovation in their teams by listening to the weekly InfoQ Podcast. They have received essential information that helped them validate their software development map. We have achieved that by interviewing some of the top CTOs, engineers, and technology directors from companies like Uber, Netflix, and more. Over 1,200,000 downloads in the last 3 years.