
DoubleClick Outage: Another Lesson in Third-Party Optimization

Google's DoubleClick ad service experienced a serious outage last Tuesday. Check out the timeline and what ad publishers should take away.

TL;DR: Lessons learned from a DoubleClick outage

Hundreds of websites and their users were affected yesterday when Google's DoubleClick suffered a major outage that lasted for hours. Catchpoint first reported the outage at 10:00 EDT on Tuesday, March 13th, in Europe and at 15:00 EDT in the US. As soon as the issue was identified, we had a team investigating and analyzing the impact and extent of the outage. You can read our initial analysis of the incident here.

The data Catchpoint aggregated from our synthetic monitoring confirmed a drastic drop in performance for all ad requests served from two domains: doubleclick.net and googleadservices.com. Our customers were alerted within minutes, and we suggested temporarily removing these requests to limit the impact of the outage on their websites and prevent a negative user experience.

Outage Timeline

The day unfolded with Catchpoint picking up a sudden performance degradation on the websites of some of our customers. We continued to monitor major websites globally and, drilling into the data, identified the requests causing the delay.

The Google Ads status page reported the issue at around 13:00 EDT. A bug in DoubleClick was identified as the cause of the performance degradation. The bug was fixed, and the issue was resolved at 19:30 EDT.

Catchpoint was able to "catch" the issue approximately 3 hours before the Google Ads status page posted its first update.

Customers who had zone-based alerts configured in their accounts received notifications of a third-party issue impacting user experience.

These alerts helped us detect the outage as soon as it happened.
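As a rough illustration of what a zone-based alert does, the sketch below groups request timings by third-party "zone" and flags any zone whose average response time exceeds a threshold. It is not Catchpoint's implementation; the type names, the `evaluateZones` function, the thresholds, and the sample numbers are all assumptions made up for this example.

```typescript
// Hypothetical zone definition: a named group of third-party hosts
// with a maximum acceptable average response time.
type Zone = { name: string; hostSuffixes: string[]; maxAvgResponseMs: number };

// One measured request from a test run (field names are illustrative).
type RequestSample = { host: string; responseMs: number };

type ZoneAlert = { zone: string; avgResponseMs: number; thresholdMs: number };

// True when `host` equals one of the zone's suffixes or is a subdomain of it.
function hostInZone(host: string, zone: Zone): boolean {
  return zone.hostSuffixes.some((suffix) => {
    const dotted = "." + suffix;
    return host === suffix || host.slice(-dotted.length) === dotted;
  });
}

// Average each zone's response time and report the zones over their threshold.
function evaluateZones(zones: Zone[], samples: RequestSample[]): ZoneAlert[] {
  const alerts: ZoneAlert[] = [];
  for (const zone of zones) {
    const times = samples
      .filter((s) => hostInZone(s.host, zone))
      .map((s) => s.responseMs);
    if (times.length === 0) continue; // no requests for this zone in this run
    const avg = times.reduce((sum, t) => sum + t, 0) / times.length;
    if (avg > zone.maxAvgResponseMs) {
      alerts.push({
        zone: zone.name,
        avgResponseMs: avg,
        thresholdMs: zone.maxAvgResponseMs,
      });
    }
  }
  return alerts;
}

// Made-up configuration and measurements, for illustration only.
const zones: Zone[] = [
  { name: "ads", hostSuffixes: ["doubleclick.net"], maxAvgResponseMs: 500 },
];
const samples: RequestSample[] = [
  { host: "ad.doubleclick.net", responseMs: 4000 },
  { host: "ad.doubleclick.net", responseMs: 6000 },
  { host: "www.example.com", responseMs: 120 },
];

const alerts = evaluateZones(zones, samples);
console.log(alerts); // one alert for the "ads" zone, average 5000 ms
```

Grouping by zone rather than by individual URL means a single rule covers every request a provider serves, which is what lets an alert point at the third party instead of at the page as a whole.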

We can see the disruption caused by the third-party requests on several websites in the table below. In the US, multiple retail sites using DoubleClick to serve ads slowed down by almost 400%:





The impact echoed across different verticals as seen from the data below:





Catchpoint also collected data using Real User Monitoring (RUM). There was a noticeable difference in the number of page views.

The chart above shows the impact on one of our customers' websites. Page views dropped by approximately 30% when performance degraded by more than 100%.
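The percentages quoted in this post are relative changes against a baseline. The short sketch below shows the arithmetic; the `percentChange` helper and the input figures are hypothetical and are not taken from the incident data.

```typescript
// Relative change of `current` against `baseline`, in percent.
// Positive means an increase, negative means a decrease.
function percentChange(baseline: number, current: number): number {
  if (baseline === 0) {
    throw new Error("baseline must be non-zero");
  }
  return ((current - baseline) * 100) / baseline;
}

// Hypothetical figures, chosen only to illustrate the calculation.
const loadTimeChange = percentChange(2, 10); // page load time: 2 s -> 10 s
const pageViewChange = percentChange(1000, 700); // page views: 1000 -> 700

console.log(loadTimeChange); // 400
console.log(pageViewChange); // -30
```

Read this way, a 100% increase in load time means the page takes twice as long to load, and a 400% increase means it takes five times as long.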

If we extrapolate the data to include the 500+ major websites that were affected by the performance degradation, then the impact on user experience would have been staggering and translated into a major loss of revenue.

Meanwhile, in Europe, major websites saw a similar drop in performance.

The Aftermath

This incident was a classic example of how third-party services can disrupt performance and bring down major websites. We can see the difference in performance before and after the issue began in the chart below.

The host doubleclick.net was experiencing high latency; TCP connections took longer to establish, and this generated HTTP 503 errors. The response times of other page requests were affected, eventually delaying the onload event. This resulted in higher document complete and webpage response times.
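To see where a slow request spends its time, it can help to split it into phases. The sketch below does this for a single request using field names that mirror the browser's `PerformanceResourceTiming` entries; the `ResourceTimingLike` and `PhaseBreakdown` types, the `breakDown` function, and the sample URL and timings are assumptions for illustration, not measurements from the outage.

```typescript
// Subset of the browser's PerformanceResourceTiming fields used here.
// Timestamps are in milliseconds, relative to the page's time origin.
interface ResourceTimingLike {
  name: string; // resource URL
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
}

interface PhaseBreakdown {
  host: string;
  connectMs: number; // time to establish the connection (including TLS, if any)
  waitMs: number; // request sent until the first response byte arrives
  downloadMs: number; // first response byte until the last
}

// Split one request into connect, wait, and download phases.
function breakDown(entry: ResourceTimingLike): PhaseBreakdown {
  return {
    host: new URL(entry.name).hostname,
    connectMs: entry.connectEnd - entry.connectStart,
    waitMs: entry.responseStart - entry.requestStart,
    downloadMs: entry.responseEnd - entry.responseStart,
  };
}

// Hypothetical entry for a slow ad request; URL and timings are made up.
const slowAdRequest: ResourceTimingLike = {
  name: "https://ad.doubleclick.net/example.js",
  connectStart: 100,
  connectEnd: 3100,
  requestStart: 3100,
  responseStart: 3400,
  responseEnd: 3450,
};

const phases = breakDown(slowAdRequest);
console.log(phases);
// { host: "ad.doubleclick.net", connectMs: 3000, waitMs: 300, downloadMs: 50 }
```

In a browser, `performance.getEntriesByType("resource")` returns entries with these fields. Note that for cross-origin resources served without a `Timing-Allow-Origin` header, the detailed timestamps are reported as zero, so a breakdown like this may only be available from synthetic tests.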

Lessons Learned

Catchpoint has always stressed the importance of optimizing and monitoring third-party performance. We have published several blog posts detailing such incidents and how they contribute to a negative user experience.

There are a few key points to remember when integrating third-party services on your website:

  • Always configure third-party tags to load later in the page load process, so they don't impact document complete (that is, when the user is able to interact with the page).
  • Set scripts to load asynchronously; this will minimize bottlenecks caused by unresponsive scripts.
  • Avoid cluttering the page with multiple third-party scripts and implement them only where necessary.
  • Keep third-party scripts up to date; third-party services tend to release new versions of their code.
  • If you are using different third-party services on the same page, check for conflicting scripts in the code during implementation to avoid bugs and errors during code execution.
  • Always monitor third-party services proactively to ensure there is no performance degradation.
  • Finally, use a tag manager to bring all the third-party services together in one place. This makes it easier to manage scripts. It also allows you to configure when you want the scripts to load (as illustrated below):







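The points above about loading tags late and being able to pull a misbehaving provider can be sketched as a small tag registry. This is only an illustration of the idea, not how any particular tag manager works; the `ThirdPartyTag` shape, the `tagsToLoad` function, and the example hosts are hypothetical.

```typescript
// When a tag may load. "after-load" tags wait for the page's load event.
type LoadPhase = "critical" | "after-load";

// Hypothetical registry entry; a real tag manager keeps this in its own UI.
interface ThirdPartyTag {
  id: string;
  src: string;
  phase: LoadPhase;
}

// Return the tags to inject in `phase`, skipping any whose host is blocked.
function tagsToLoad(
  tags: ThirdPartyTag[],
  phase: LoadPhase,
  blockedHosts: string[]
): ThirdPartyTag[] {
  return tags.filter((tag) => {
    const host = new URL(tag.src).hostname;
    return tag.phase === phase && blockedHosts.indexOf(host) === -1;
  });
}

// Made-up tags for illustration.
const tags: ThirdPartyTag[] = [
  { id: "analytics", src: "https://analytics.example.com/a.js", phase: "after-load" },
  { id: "ads", src: "https://ads.example.net/ads.js", phase: "after-load" },
];

// Kill switch: during an outage, list the affected host here.
const blockedDuringOutage = ["ads.example.net"];

const toInject = tagsToLoad(tags, "after-load", blockedDuringOutage);
console.log(toInject.map((t) => t.id)); // ["analytics"]
```

In the browser, each returned tag would then be added to the page as a script element with its `async` property set, once the load event has fired. Keeping the blocked-host list as configuration means a failing provider can be removed without redeploying the page.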

The outage was significant and impacted user experience across a large number of sites, so it's a given that Google would have had all hands on deck as soon as the alarm bells rang. Kudos to the team at DoubleClick that handled the incident, from the status updates to issue resolution; they managed to fix the bug within a few hours and prevent a potential catastrophe for ad publishers.

Topics:
performance, doubleclick, monitoring and performance, outage, google, Ad publishing

Published at DZone with permission of
