
Identifying Outages with Real User Monitoring

The team at Catchpoint explains how collecting data from actual users can not only provide insight into the user experience but also help identify outages.

Understanding the user experience is important for anybody with an online presence. Many analytics solutions are available to help organizations understand which pages yield greater user engagement, what paths a user takes through a site, which pieces of content are most popular, and how fast pages load. Real user monitoring (RUM) has grown in popularity in recent years with specifications such as Navigation Timing, Resource Timing, and User Timing. These specifications help organizations collect data from actual users, providing insight into the user experience.
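To make the timing specifications concrete, here is a minimal sketch of how a collector might turn one RUM beacon into page-load metrics. The field names follow the browser's Navigation Timing entry (`startTime`, `domainLookupStart`, `responseStart`, `loadEventEnd`, and so on); the function name, the beacon shape, and the sample values are assumptions made for illustration, not any vendor's format.

```python
from typing import Dict


def summarize_navigation_timing(entry: Dict[str, float]) -> Dict[str, float]:
    """Derive common page-load metrics (in milliseconds) from one beacon.

    `entry` is assumed to hold fields copied from a browser's Navigation
    Timing entry, each measured in milliseconds from the start of navigation.
    """
    start = entry["startTime"]
    return {
        # Time spent resolving the hostname.
        "dns_ms": entry["domainLookupEnd"] - entry["domainLookupStart"],
        # Time spent establishing the connection.
        "connect_ms": entry["connectEnd"] - entry["connectStart"],
        # Time until the first byte of the response arrived.
        "ttfb_ms": entry["responseStart"] - start,
        # Time until DOMContentLoaded handlers finished.
        "dom_content_loaded_ms": entry["domContentLoadedEventEnd"] - start,
        # Time until the load event finished.
        "page_load_ms": entry["loadEventEnd"] - start,
    }


# Hypothetical beacon values, for illustration only.
sample_beacon = {
    "startTime": 0.0,
    "domainLookupStart": 5.0,
    "domainLookupEnd": 30.0,
    "connectStart": 30.0,
    "connectEnd": 80.0,
    "responseStart": 210.0,
    "domContentLoadedEventEnd": 900.0,
    "loadEventEnd": 1500.0,
}

metrics = summarize_navigation_timing(sample_beacon)
print(metrics)
```

Aggregating these per-visit metrics by ISP, browser, or device is what gives RUM its view into conditions that synthetic agents may not cover.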

RUM can provide information that isn’t always available with synthetic monitoring; you cannot have synthetic agents in every city, on every ISP, on every browser version, and on every OS. What ISP is being used? What browser? What device? RUM provides a view into the user’s world, and the data obtained can help organizations improve the digital experience.

That being said, RUM should not be the only solution used to measure the digital experience, as there are some things RUM can’t provide. RUM can only provide data for sites with active traffic. If your site has not launched yet, or if there is an outage, no information can be gathered. We believe that synthetic monitoring and RUM should be used together to get a complete picture.

With RUM, there can be a lot of noise. Have you ever been to a large sporting event or a concert and tried to access a mobile application or web page, only to have it take forever or fail to load completely? Most of the time, these failures are not due to the application but to congestion on the network: too many people are trying to access data across the same route. Issues like these can skew RUM data and generate noise when you are trying to understand performance. Nobody wants to receive an alert only to determine it was caused by noise.
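One common way to dampen this kind of noise is to summarize load times with a median rather than a mean, since a handful of congested visits can drag the mean far from what most users saw. The sketch below illustrates the effect; the sample values, the `should_alert` helper, and the threshold are hypothetical.

```python
from statistics import mean, median
from typing import List

# Hypothetical page-load times in milliseconds. The last two samples stand in
# for visitors on a congested network at a crowded venue.
load_times_ms = [1200, 1350, 1100, 1280, 1420, 1190, 1310, 28000, 31000]

mean_ms = mean(load_times_ms)
median_ms = median(load_times_ms)


def should_alert(samples: List[float], threshold_ms: float) -> bool:
    """Alert only when the median load time exceeds the threshold.

    Using the median means a few extreme samples cannot trigger an alert
    on their own.
    """
    return median(samples) > threshold_ms


print(f"mean:   {mean_ms:.0f} ms")
print(f"median: {median_ms} ms")
print("alert: ", should_alert(load_times_ms, threshold_ms=3000))
```

Here the two congested visits pull the mean above seven seconds, while the median stays close to the experience of the other seven visitors, so a three-second threshold on the median does not fire.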

When there’s an outage, you want to know as soon as possible. Every second that your site is unavailable means money lost, damage to your brand, and unhappy customers. The sooner you know something is going wrong, the better. Synthetic monitoring has traditionally been used to alert teams to outages and application issues to expedite troubleshooting.

RUM can only provide information when users are on the site. If users can’t access the site, then no information can be captured and analyzed. What can be analyzed, however, are historical trends and patterns. Applications are accessed in predictable patterns, with peak traffic on certain days of the week or at certain times of the day. Deviations from these patterns can indicate a regional problem in an area where synthetic tests may not be running.
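The idea of flagging deviations from a historical pattern can be sketched with a simple z-score check: compare the current page-view count for a region with counts from the same hour in previous weeks. This is only an illustration under stated assumptions, not the model used by any particular product; the region names, counts, `traffic_status` function, and threshold are all hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def traffic_status(history: List[int], current: int, z_threshold: float = 3.0) -> str:
    """Classify current traffic against the same hour in previous weeks.

    Returns "dropped" when the current count falls more than `z_threshold`
    standard deviations below the historical mean, otherwise "expected".
    `history` needs at least two values so a standard deviation exists.
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any decrease counts as a drop.
        return "dropped" if current < baseline else "expected"
    z_score = (current - baseline) / spread
    return "dropped" if z_score < -z_threshold else "expected"


# Hypothetical page views for the same hour of the week over five prior weeks.
history_by_region: Dict[str, List[int]] = {
    "Boston": [980, 1010, 1045, 990, 1020],
    "Chicago": [760, 800, 785, 770, 790],
}
# Hypothetical page views observed in the current hour.
current_by_region: Dict[str, int] = {"Boston": 1005, "Chicago": 120}

statuses = {
    region: traffic_status(history_by_region[region], current_by_region[region])
    for region in history_by_region
}
print(statuses)
```

Comparing like with like (the same hour of the same weekday) keeps ordinary daily and weekly peaks from being mistaken for outages.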

Catchpoint’s Outage Analyzer uses predictive models based on statistical analysis of historical data to identify regional outages. A color-coded map quickly reveals whether traffic levels are as expected or have dropped compared to historical trends, and how widespread an outage is. For example, last month when Dyn was under a DDoS attack, there would have been a noticeable drop in site visits because users weren’t able to resolve DNS. When DNS resolution fails, the site is unreachable and no data is collected via RUM.


This information can help organizations analyze the impact of an outage and determine whether action needs to be taken. Regional outages may be caused by Mother Nature, human error, or infrastructure issues. Sometimes there is nothing you can do to prevent a failure, but before you decide an outage was unavoidable, you need to know that there actually was an outage.

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
