Six Ways to Prevent Network Outages in 2023

By: Dritan Suljoti

We are living in an era in which Internet resilience is critical. The last eighteen months have shown us that outages are a fact of life on today’s Internet. They range from the micro-outage that hit the checkout page of Amazon, the mecca of eCommerce sites, with two days of intermittent failures in the lead-up to Christmas 2022, to the mega outage that took down Facebook, Messenger, WhatsApp and Instagram on October 4, 2021, causing significant revenue losses and untold damage to reputation and brand. That’s why we see companies looking to detect issues faster and at a much earlier stage, ideally before end users are impacted.

At Catchpoint, we think of Internet resilience as the combination of four factors: availability, reachability, performance, and reliability. As part of this, we proactively monitor our customers’ sites and services for outages 24/7/365, and we create industry benchmarks for public use by publishing our findings and analysis of significant outages.

Drawing on years of experience supporting our customers (who have some of the world’s busiest websites) and instituting our own incident management policies, we’ve put together some key lessons learned from recent failures. What follows are six ways to prevent outages in 2023.

Assume no service is immune to failure

IT must operate by assuming that no service is too big to fail. There is a fallacy that just because a service is very big and widely used, it won’t go down. Over the last eighteen months, however, we have seen failures happen to many Internet giants: Facebook, Salesforce, Amazon, Google Cloud, Spotify, Ticketmaster… the list goes on. Nobody is immune.

At the start of this year, on January 11th, 2023, we saw the FAA’s NOTAM system (needed for pilot safety measures) go down, making headline news and massively disrupting air travel. Although the outage was dealt with in around 90 minutes, chaos ensued, with around 7,000 flights delayed and 1,100 canceled. It’s hard to calculate the economic cost, but it’s safe to assume that it would have been in the hundreds of millions. That may appear high, but Gartner analysis from 2014 (still widely cited) put the average cost of an outage at $5,600 per minute (that’s $6,700 in today’s terms). Gartner also notes that large enterprises will see costs closer to $9,000 per minute (that’s $11,000 today). These numbers don’t factor in the long-tail impact of lost productivity or damage to reputation. Moreover, according to Dun & Bradstreet, 59 percent of Fortune 500 companies suffer a minimum of 1.6 hours of downtime each week. At today’s per-minute rates, those 96 minutes translate to a cost of roughly $643,200 to $1,056,000 every week, and you can see why it’s so important to be proactive about preparing for when, not if, the next outage occurs.
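For readers who want to check the weekly figures or plug in their own numbers, the arithmetic can be sketched in a few lines of Python. The per-minute rates and the 1.6 hours of weekly downtime come from the figures cited above; the function and variable names are our own and purely illustrative.

```python
# Weekly downtime cost, using the figures cited in the text.
# Function and constant names are illustrative, not from any library.

def weekly_downtime_cost(minutes_down: int, cost_per_minute: int) -> int:
    """Return the cost in dollars of `minutes_down` minutes of downtime."""
    return minutes_down * cost_per_minute

# 1.6 hours of downtime per week (Dun & Bradstreet), expressed in minutes.
MINUTES_DOWN_PER_WEEK = round(1.6 * 60)  # 96 minutes

# Gartner per-minute costs, adjusted to today's terms as quoted above.
AVERAGE_COST_PER_MINUTE = 6_700
LARGE_ENTERPRISE_COST_PER_MINUTE = 11_000

average_weekly = weekly_downtime_cost(MINUTES_DOWN_PER_WEEK, AVERAGE_COST_PER_MINUTE)
large_weekly = weekly_downtime_cost(MINUTES_DOWN_PER_WEEK, LARGE_ENTERPRISE_COST_PER_MINUTE)

print(f"Average company:  ${average_weekly:,} per week")   # $643,200 per week
print(f"Large enterprise: ${large_weekly:,} per week")     # $1,056,000 per week
```

Swapping in your own per-minute cost and observed downtime gives a rough, first-order estimate only. As noted above, it leaves out lost productivity and reputational damage.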

Rethink what “you can’t control”

There are certain things within your IT team’s control: containers, VMs, hardware, code, service configurations, and so on. Typically, we have systems in place to monitor these individual components of the system stack, alongside other processes that allow us to pay significant attention to these areas, as we should.

When an outage happens, however, this type of monitoring (usually infrastructure monitoring, tracing, or logging) is not sufficient to get in front of the situation. Issues will happen in areas beyond your control. You can’t ignore this. You must plan and solve for it. Moreover, the wide-scale move to cloud-based applications over the last several years has made it increasingly challenging to determine where issues lie, which can leave companies at the mercy of third-party providers and networks. Without access to independent monitoring sources that can see across the Internet landscape, IT teams are left with poor to no visibility into components that are a critical part of their infrastructure but sit outside their production environments. What may feel outside a company’s control, however, can be managed with proper planning, best-practice monitoring and observability techniques, and robust relationship-building with third parties.
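To make the idea of watching a dependency from the outside concrete, here is a minimal sketch of an external HTTP probe, using only the Python standard library. It is an illustration under our own assumptions, not a description of any particular monitoring product: the `ProbeResult` type and `probe` function are hypothetical names, and a single probe from one location is no substitute for independent monitoring from many vantage points across the Internet.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    ok: bool                    # True if the endpoint answered with a 2xx/3xx status
    status: Optional[int]       # HTTP status code, or None if no response arrived
    elapsed_seconds: float      # Time taken by the attempt
    error: Optional[str] = None


def probe(url: str, timeout: float = 5.0,
          opener: Callable = urllib.request.urlopen) -> ProbeResult:
    """Request `url` once from the outside and record what happened.

    `opener` is injectable so the function can be exercised without a network.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
        return ProbeResult(url, 200 <= status < 400, status,
                           time.monotonic() - start)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return ProbeResult(url, False, exc.code,
                           time.monotonic() - start, str(exc))
    except OSError as exc:
        # DNS failure, refused connection, timeout, and similar network errors.
        return ProbeResult(url, False, None,
                           time.monotonic() - start, str(exc))
```

Calling `probe("https://example.com")` on a schedule, for each third-party endpoint your service depends on, gives a basic record of whether that dependency was reachable and how long it took to respond. The distinction between an error status and no response at all is a first hint as to whether the problem lies with the provider's application or with the network path to it.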

Institute Internet Performance Monitoring

Instituting Internet Performance Monitoring (IPM) will allow you to monitor every component within the Internet stack that impacts your business yet might traditionally be perceived as beyond your control, or not worth considering because we think the

