Any self-respecting database needs to be able to provide a whole host of metrics for the user.
Let's talk about something simple, like requests/second metrics. This seems like a pretty easy metric to have, right? Every second, you have N requests, and you just show that.
However, it turns out that just showing the latest req/sec number isn’t very useful, primarily because a lot of traffic actually has valleys and peaks. So, you want to have the req/sec not for a specific second but over a recent window of time (like the req/sec over the last minute and the last 15 minutes).
One way to do that is to use an exponentially weighted moving average. You can read about their use in Unix in these articles. The idea is that as we add samples, we’ll put more weight on the recent samples while still taking historical data into account.
That has a nice property in that it reacts quickly to changes in behavior, but it smooths them out so that you see a gradual change over time. The bad thing about it is that it is not accurate (in the sense that this isn’t very easy for us to correlate to exact numbers) and it smooths out changes.
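To make the weighting concrete, here is a minimal sketch of an exponentially weighted moving average, assuming one sample per second and a decay factor chosen in the spirit of the Unix load average. The `Ewma` class and its `window_seconds` and `tick` names are made up for illustration and are not taken from any particular database.

```python
import math


class Ewma:
    """Exponentially weighted moving average of a per-second request count.

    Hypothetical helper: assumes tick() is called exactly once per second.
    """

    def __init__(self, window_seconds):
        # Weight given to the newest sample. A larger window means a
        # smaller alpha, so the average changes more slowly.
        self.alpha = 1.0 - math.exp(-1.0 / window_seconds)
        self.rate = None

    def tick(self, requests_this_second):
        if self.rate is None:
            # Seed with the first sample so we do not ramp up from zero.
            self.rate = float(requests_this_second)
        else:
            # Move a fraction alpha of the way toward the new sample.
            self.rate += self.alpha * (requests_this_second - self.rate)
        return self.rate


one_minute = Ewma(window_seconds=60)
for _ in range(10):
    one_minute.tick(100)      # steady 100 req/sec
for _ in range(60):
    one_minute.tick(0)        # traffic stops for a full minute
print(round(one_minute.rate, 2))
```

With this choice of alpha, a full minute of silence after steady traffic leaves the one-minute average at about 37% of its old value (a factor of 1/e), which is exactly the "reacts, but gradually" behavior described above. It is also why the number is hard to line up with an exact request count.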
On the other hand, you can take exact metrics. Going back to the req/sec number, we can allocate an array of 900 longs (so, enough for 15 minutes with one measurement per second) and just use this cyclic buffer to store the details. The good thing about this is that it is very accurate and we can easily correlate results to external numbers (such as the results of a benchmark).
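As a rough sketch of that cyclic buffer, assuming one completed sample is pushed per second: the `PerSecondHistory` class and its method names are hypothetical, and a Python list of ints stands in for the array of 900 longs.

```python
class PerSecondHistory:
    """Fixed-size cyclic buffer holding one request count per second."""

    def __init__(self, capacity=900):      # 900 slots = 15 minutes
        self.samples = [0] * capacity
        self.next = 0                      # slot the next sample overwrites
        self.count = 0                     # number of slots holding real data

    def push(self, requests):
        """Record the request count for the second that just ended."""
        self.samples[self.next] = requests
        self.next = (self.next + 1) % len(self.samples)
        self.count = min(self.count + 1, len(self.samples))

    def last(self, seconds):
        """Return up to `seconds` most recent samples, newest first."""
        n = min(seconds, self.count)
        size = len(self.samples)
        return [self.samples[(self.next - 1 - i) % size] for i in range(n)]

    def average(self, seconds):
        """Exact average req/sec over the samples actually available."""
        window = self.last(seconds)
        return sum(window) / len(window) if window else 0.0


history = PerSecondHistory()
for requests in (120, 80, 100):
    history.push(requests)
print(history.average(60))    # only 3 samples exist so far, so this prints 100.0
```

Because every per-second value is kept, the same buffer can answer both the one-minute and the 15-minute question, and the raw samples can be compared directly against an external benchmark log.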
With the exact metrics, we get the benefit of being able to get the per-second data and look at peaks and valleys and measure them. With exponentially weighted moving averages, we get a more immediate response to changes, but the reported number is never actually exact.
Keeping exact metrics is a bit more work, but it makes for much more understandable code. On the other hand, it can result in strangeness. If you have a burst of traffic, let’s say 1,000 requests per second for three seconds, then the average req/sec over the last minute will stay fixed at 50 req/sec (3,000 requests / 60 seconds) for nearly a whole minute, which is utterly correct and completely misleading.
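The burst effect is easy to reproduce. The sketch below reads the burst as 1,000 requests per second sustained for three seconds (3,000 requests in total), which is the reading that yields 50 req/sec over a 60-second window. The `rolling_average` helper is hypothetical, and it treats the time before the trace starts as zero traffic.

```python
from collections import deque


def rolling_average(per_second_counts, window=60):
    """Yield the exact average req/sec over the trailing `window` seconds."""
    recent = deque(maxlen=window)   # old samples fall off automatically
    for count in per_second_counts:
        recent.append(count)
        # Dividing by the full window treats missing history as zero traffic.
        yield sum(recent) / window


# Three seconds of 1,000 req/sec, then 87 seconds of silence.
traffic = [1000] * 3 + [0] * 87
averages = list(rolling_average(traffic))

plateau = [second for second, avg in enumerate(averages) if avg == 50.0]
print(plateau[0], plateau[-1])      # first and last second reporting 50 req/sec
print(averages[62])                 # once the burst has left the window
```

Under these assumptions the reported average climbs to 50 within three seconds, sits there for 58 consecutive seconds while the server is completely idle, and then falls to zero over the next three seconds.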
I’m not sure how to handle this specific scenario in a way that is both accurate and expected by the user.