Major AWS Outage Highlights Dependencies within Cloud Providers (Week of Nov. 23-30) | Outage Deep Dive

If you’re an AWS customer or rely on services that use AWS, you might have noticed the major, hours-long outage last week. On November 25th, at approximately 5:15 am PST, users of Kinesis, AWS’s service for processing streaming data in real time, began to experience service interruptions. The issue was not network-related. AWS later published a detailed post-mortem that identified the cause: an existing operating system configuration issue, triggered by a maintenance event that added server capacity. Amazon attempted several mitigation measures over the course of the day, but the outage was not completely resolved until approximately 10:23 pm PST. What was notable about this outage was its blast radius, which extended far beyond AWS’s direct customers. Several AWS services that use Kinesis, including Cognito and CloudWatch, were affected, as were users of applications that consume those services (e.g., Ring, iRobot, Adobe). This is a good reminder of the risk of hidden service dependencies, as well as the need for visibility so that you can understand what has gone wrong and communicate it to your customers.

About the Podcast

This is The Internet Report, a podcast uncovering what’s working and what’s breaking on the Internet, and why. Tune in to hear ThousandEyes’ Internet experts dig into some of the most interesting outage events from the past couple of weeks, discussing what went awry: was it the Internet, or an application issue? Plus, learn about the latest trends in ISP outages, cloud network outages, collaboration network outages, and more.