RUM vs. Synthetic Data Analysis
When performing statistical analysis on aggregated data, it is important to understand the subsets that make up the aggregate. This is especially true when interpreting RUM data aggregated across multiple regions, since an end user’s distance from the datacenter hosting the website affects how the page performs in that user’s browser. To illustrate this point, consider the chart below showing three days of data from a US-based website.
The wave patterns in the Document Complete and # Page Views measurements are inversely related, suggesting at first glance that performance improves when traffic volume increases. Breaking the performance down by geographic region, however, makes it clear that the higher traffic volume comes from the US during peak business hours.
Average performance in the United States is better than international performance, and it lacks the variance in Document Complete seen in the first chart. International traffic makes up the great majority of end users during US off-peak hours and causes the jump in the averages across all users. Medians and percentiles show the same behavior, since more than 80% of the page views during US off-peak hours come from outside the US.
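The effect of a shifting traffic mix can be sketched with a few lines of Python. The load times and page-view counts below are hypothetical values chosen for illustration, not figures read from the charts, and the helper name blended_mean is likewise an assumption. Each region's Document Complete is held constant, so any movement in the combined numbers comes purely from the change in who is visiting.

```python
from statistics import median

# Hypothetical figures for illustration only; not taken from the charts.
US_DOC_COMPLETE_S = 2.0     # assumed constant US Document Complete (seconds)
INTL_DOC_COMPLETE_S = 6.0   # assumed constant international Document Complete


def blended_mean(us_views, intl_views):
    """Page-view-weighted mean Document Complete across both regions."""
    total = us_views + intl_views
    return (us_views * US_DOC_COMPLETE_S + intl_views * INTL_DOC_COMPLETE_S) / total


def blended_median(us_views, intl_views):
    """Median Document Complete over every individual page view."""
    samples = [US_DOC_COMPLETE_S] * us_views + [INTL_DOC_COMPLETE_S] * intl_views
    return median(samples)


# US peak hours: high volume, mostly US visitors.
peak_mean = blended_mean(us_views=9000, intl_views=1000)
peak_median = blended_median(us_views=9000, intl_views=1000)

# US off-peak hours: low volume, 80% of page views are international.
off_peak_mean = blended_mean(us_views=200, intl_views=800)
off_peak_median = blended_median(us_views=200, intl_views=800)

print(f"peak:     mean {peak_mean:.2f}s, median {peak_median:.2f}s")
print(f"off-peak: mean {off_peak_mean:.2f}s, median {off_peak_median:.2f}s")
```

Neither region got slower or faster in this sketch, yet the combined mean moves from 2.40s to 5.20s and the combined median from 2.00s to 6.00s, simply because the share of international page views changed.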
The inverted pattern is not seen in Synthetic data, since tests run at the same frequency throughout the day from all testing locations, whether in the United States (local) or international. Following a similar process with synthetic data and dividing the results into regions makes the performance disparity clear.
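A minimal sketch of the synthetic case, again using hypothetical values: every testing location contributes the same number of measurements each hour, so the hourly aggregate stays flat, and the regional difference only appears once the results are grouped by region. The tuple layout and the mean_by helper are assumptions made for this example.

```python
from statistics import mean

# Hypothetical synthetic results: (hour, region, document_complete_seconds).
# Each location runs one test per hour, so the regional mix never shifts.
results = [
    (hour, region, seconds)
    for hour in range(24)
    for region, seconds in (("US", 2.0), ("International", 6.0))
]


def mean_by(rows, key_index):
    """Group rows by the field at key_index and average Document Complete."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row[2])
    return {key: mean(values) for key, values in groups.items()}


hourly = mean_by(results, 0)    # aggregate across regions, per hour
regional = mean_by(results, 1)  # aggregate across hours, per region

print("distinct hourly means:", sorted(set(hourly.values())))
print("regional means:", regional)
```

With a constant test frequency the hourly view shows a single flat value, while the regional view separates the two populations.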
It is best to avoid drawing conclusions from high-level aggregations of data with different characteristics without also investigating the underlying subsets of measurements and the user segments that are driving the data.