The Google Outage and Dreaded “404: Not Found” Error
When trying to access a website that is not loading, you may ask yourself, “Is this issue only happening on my device?” The other day, you were not alone.
Many prominent websites that rely on the Google Cloud Platform experienced a brief outage in which their webpages returned “404: Not Found” errors. This raises two important questions. How did this disruption happen, and what are the implications for online businesses and customers?
The Google Cloud Platform Outage
The Google Cloud Platform, with infrastructure available in over 200 countries and territories, is trusted by many of the world’s leading retail, financial services, healthcare, and media companies. It delivers over 90 information technology services, such as computing, networking, storage, and databases.
What Happened?
According to a statement from Google, the culprit was cloud networking interference. At a high level, cloud networking provides connectivity to and between applications and on-premises, edge, and cloud-based services. After a preliminary root cause analysis, Google determined that a latent bug in a network configuration service was triggered during routine system operations.
Let’s Explore in More Detail
Many popular websites and services, such as Snapchat, Etsy, and Home Depot, were down for approximately 1 hour and 49 minutes on Tuesday, November 16th due to a Google cloud networking issue. According to the statement from Google, Google Cloud Networking experienced issues with Google Cloud Load Balancing (GCLB).
Load balancing is the methodical distribution of network or application traffic (incoming requests from client devices) across multiple backend servers, based on which servers are best able to fulfill those requests.
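To make the idea concrete, here is a minimal sketch of one common load balancing strategy, often called "least connections." It is a generic illustration only: Google has not described GCLB's internal algorithm in the statement referenced here, and the `Backend` class and the `pick_backend` and `route` functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """A hypothetical backend server tracked by the load balancer."""
    name: str
    healthy: bool = True
    active_requests: int = 0


def pick_backend(backends):
    """Return the healthy backend with the fewest in-flight requests."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        # With no healthy backend, the request cannot be served at all.
        raise RuntimeError("no healthy backends available")
    # On a tie, min() returns the first candidate in list order.
    return min(candidates, key=lambda b: b.active_requests)


def route(backends, request_id):
    """Assign one incoming request to a backend and return its name."""
    backend = pick_backend(backends)
    backend.active_requests += 1
    return backend.name
```

For example, given backends `a` (2 active requests), `b` (0 active requests), and an unhealthy `c`, a call to `route` would send the next request to `b`. If the routing layer itself misbehaves, as it did during this incident, requests may never reach a healthy backend even when one exists.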
With over 80 load balancing locations worldwide, GCLB supports more than 1 million queries per second, with the intent of high performance and low latency. However, on Tuesday, there was a disruption in this flow.
How Were Online Businesses and Customers Impacted?
The GCLB service interruption impacted several downstream Google Cloud services. As a result, affected web pages returned 404 errors, indicating that the requested page was not available. In addition to experiencing this error, many website owners were unable to make changes to their website load balancing and observed a decrease in site traffic.
Blue Triangle observed that many sites only experienced a sporadic end user impact. For example, performance and revenue indicators during the incident were irregular and atypical of the normal sales process.
One-Off Incident or Frequent Trend?
In recent years there has been global growth in cloud services and greater cloud adoption worldwide. There has also been no shortage of headlines on cloud outages and issues, and how they affect online businesses and their customers.
For example, in December 2020, the Google Cloud Platform also suffered a major outage, affecting the company’s technical support and ability to connect with customers externally. Just one month prior, Amazon Web Services (AWS), one of the most popular cloud computing services in the world, experienced a prolonged, large-scale outage. Like Google Cloud, AWS is the backbone of many websites and applications. The ripple effect was widely felt across the internet. In 2020, Microsoft users were also impacted by a series of problems that crashed Azure, the company’s cloud computing service for application management.
Vulnerability
Relying on other providers can help website owners offer content at a massive scale. However, if there is a problem, it may have far-reaching implications. This was also the case during the six-hour outage that occurred last month that severely impacted Facebook, now known as Meta Platforms. Read How Facebook Broke the Internet to learn more about protecting your site’s revenue from social media platform outages.
The Bottom Line
Companies that solely rely on cloud services for their infrastructure may be putting their online business at risk for loss of revenue, security vulnerabilities, and poor site performance and customer experience in the event that service availability is compromised.
Through proactive anomaly detection and SLA thresholds, you may be better positioned to manage outages and mitigate risk. Be alerted to the problem before the customer journey is negatively impacted and customers raise awareness of an unpleasant buying experience.
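As a rough illustration of how anomaly detection and an SLA threshold might work together, the sketch below checks the share of 404 responses in a monitoring window against both a fixed limit and a recent baseline. It is a simplified example under stated assumptions, not Blue Triangle's implementation: the 5% SLA limit, the z-score cutoff of 3, and the function names `error_rate` and `check_window` are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical values chosen for illustration only.
SLA_MAX_ERROR_RATE = 0.05  # alert if more than 5% of responses are 404s
Z_THRESHOLD = 3.0          # alert if the rate is 3+ standard deviations above baseline


def error_rate(status_codes):
    """Fraction of responses in a window that were 404 errors."""
    if not status_codes:
        return 0.0
    return sum(1 for code in status_codes if code == 404) / len(status_codes)


def check_window(baseline_rates, current_rate):
    """Return a list of alert names triggered by the current window."""
    alerts = []
    # Fixed SLA threshold: independent of past behavior.
    if current_rate > SLA_MAX_ERROR_RATE:
        alerts.append("sla_breach")
    # Simple anomaly check: compare against the recent baseline.
    if len(baseline_rates) >= 2:
        mu = mean(baseline_rates)
        sigma = stdev(baseline_rates)
        if sigma > 0 and (current_rate - mu) / sigma > Z_THRESHOLD:
            alerts.append("anomaly")
    return alerts
```

With a baseline of roughly 1% errors per window, a window in which 40% of responses are 404s would trigger both alerts, while a window at 1.1% would trigger neither. The two checks are complementary: the SLA threshold catches outright breaches, and the baseline comparison can flag an unusual rise before the fixed limit is crossed.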
As a best practice, it is important to monitor your domains in real time and deploy a tag governance approach that defers tags until later in the page load cycle to avoid site-blocking issues. It is also crucial to add first- and third-party tags to your web pages in a way that will not cause the site to completely shut down.
Learn More
Learn more about the Blue Triangle Content Security Policy Manager to help proactively protect your site’s performance and revenue during an outage.
Adam Wood
As the content marketing manager and strategist for Blue Triangle, Adam's day is filled with content marketing activities, writing, and coffee. He graduated with an English and Business degree, which was more than just writing APA or MLA style papers. Adam has nearly a decade of experience in non-profit and corporate sectors, including healthcare, education, and eCommerce. Most recently, he worked in the payments and FinTech space before joining Blue Triangle, which empowers performance-driven organizations to deliver frictionless digital experiences.