Beyond Sessions: Centering Users in Mobile App Observability
User sessions are crucial for understanding the intersection of users, app behavior, and the business impact of app performance. But are they enough?
Observability providers often group periods of activity into sessions as the primary way to model a user’s experience within a mobile app. Each session represents a contiguous chunk of time during which telemetry about the app is gathered, and it usually coincides with a user actively using the app. Therefore, sessions and their associated telemetry are a good way to represent user experience in discrete blocks.
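To make that concrete, a session boils down to little more than a user-scoped window of time with an ordered list of telemetry events attached. The sketch below is a minimal, hypothetical model – the shapes and field names are my own, not any vendor’s actual schema.

```kotlin
import java.time.Instant

// Hypothetical, minimal model of a session: a contiguous window of telemetry
// recorded while a user is actively using the app. Names are illustrative,
// not any vendor's actual schema.
data class TelemetryEvent(
    val timestamp: Instant,
    val name: String,                       // e.g., "screen_view", "network_change", "crash"
    val attributes: Map<String, String> = emptyMap()
)

data class Session(
    val sessionId: String,
    val userId: String,                     // ties the session back to a user
    val startedAt: Instant,
    val events: MutableList<TelemetryEvent> = mutableListOf()
) {
    // Append an event; events accumulate in chronological order.
    fun record(name: String, attributes: Map<String, String> = emptyMap()) {
        events += TelemetryEvent(Instant.now(), name, attributes)
    }
}
```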
But is this really enough? Is there a better way to understand the intersection of users, app behavior, and the business impact of app performance?
To answer those questions, I’d like to share my thoughts on the current state of mobile observability, how we got here, and why we should move beyond sessions and focus on users as the main signal to measure app health in the long term.
What Are You Observing?
When you add instrumentation to make your mobile app observable, what exactly are you observing? Traditionally, there are two ways of answering this question, and both are missing a piece of the larger picture.
Observing “The App”
First, it could be answering the question of what object is being observed, in which case the answer would unsurprisingly be “the app.” But what is implied, not stated, is that traditionally you are observing the entire deployment of the app in aggregate – an app that is potentially running on millions of different mobile devices. The data typically scrutinized in observability tooling are aggregates of telemetry from those devices: the total number of crashes, the P99 of cold app startup time, and so on.
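For a concrete, if toy, picture of the aggregate view, the sketch below computes a percentile over cold start durations pooled across devices – conceptually how a dashboard number like P99 is derived. The data and the naive percentile method are illustrative; real pipelines compute this with streaming approximations over far larger volumes.

```kotlin
import kotlin.math.ceil

// Toy percentile over cold-start durations pooled from many devices.
// Real pipelines use streaming sketches; this is the naive in-memory version.
fun percentile(durationsMs: List<Long>, p: Double): Long {
    require(durationsMs.isNotEmpty() && p in 0.0..100.0)
    val sorted = durationsMs.sorted()
    val index = (ceil(p / 100.0 * sorted.size).toInt() - 1).coerceIn(0, sorted.size - 1)
    return sorted[index]
}

fun main() {
    // Illustrative data: one user clearly had a terrible cold start.
    val coldStartsMs = listOf(820L, 910L, 980L, 1050L, 1200L, 4200L)
    // A single number summarizes the fleet; which user sat through the
    // 4.2-second start, and what they did afterwards, is invisible here.
    println("P99 cold start: ${percentile(coldStartsMs, 99.0)} ms")
}
```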
By looking at aggregates, you are observing the big picture of how your app is running in production – an abstraction that provides a high-level overview of app quality. What you don’t get from aggregates is how individual users experience the app, or the sequence of events that leads to specific anomalies that are hard to reproduce in-house.
Observing “The Users”
A second reading of the question is more nuanced: You are observing the users of your app, specifically what is happening in the app while people are using it. This is where sessions come in, to provide telemetry collected on one device for a period of time, laid out so you can see what is happening in the app in sequence. This is how you can find correlations between events in an ad hoc fashion, which is tremendously helpful for debugging difficult-to-reproduce problems.
Mobile observability providers use sessions as a key selling point of their RUM products. Sessions combine mobile telemetry with runtime context, including event sequencing, and viewed together they can help explain performance anomalies. Seeing the events that preceded an app crash, along with the details of the crash itself, can really speed up the debugging of hard-to-reproduce issues. Providing better telemetry and more useful context in sessions has traditionally been one way in which mobile observability providers differentiate themselves.
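As a rough sketch of that workflow, the snippet below pulls out the window of events that immediately preceded a crash in a session’s chronologically ordered timeline. The event names and the trimmed-down event shape are hypothetical, chosen to keep the example self-contained.

```kotlin
import java.time.Instant

// Trimmed-down event shape for this sketch; names are illustrative.
data class Event(val timestamp: Instant, val name: String)

// Return the last `window` events that led up to the first crash in a session.
fun eventsBeforeCrash(sessionEvents: List<Event>, window: Int = 5): List<Event> {
    val crashIndex = sessionEvents.indexOfFirst { it.name == "crash" }
    if (crashIndex <= 0) return emptyList() // no crash, or nothing preceded it
    return sessionEvents.subList(maxOf(0, crashIndex - window), crashIndex)
}

fun main() {
    val t = Instant.parse("2024-01-01T10:00:00Z")
    val session = listOf(
        Event(t, "app_start"),
        Event(t.plusSeconds(5), "screen_view:checkout"),
        Event(t.plusSeconds(9), "network_change:offline"), // the clue a timeline reveals
        Event(t.plusSeconds(10), "crash")
    )
    eventsBeforeCrash(session).forEach { println("${it.timestamp}  ${it.name}") }
}
```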
Why Not Both?
Combining insights gleaned from both readings of the question can lead to powerful results. Not only can you drill into outliers and debug the cause of hard-to-reproduce problems, but you can also use the aggregates to tell you how many people are impacted by each problem. In addition, you can examine commonalities among affected users to uncover further clues for finding the root cause.
Based on this telemetry, powerful datasets and visualizations can be built that reveal key details of mobile performance problems, as well as quantify their pervasiveness – not only for the problems you know about, but also for the ones you may not have anticipated. In other words, they can surface the unknown unknowns, which is the hallmark of good observability tooling. To varying degrees, most of the mobile observability platforms out there today can provide this level of insight.
Is This It?
So far, the status quo sounds great. If you can get all this from the current generation of mobile observability tooling, what more can you ask for? Before I answer this very obviously leading question, I want to go back to the original question: What is being observed? And instead of simply asking that, I want to zoom out even further: Why do you want to observe what you are trying to observe?
Why Are You Observing?
Asking about the what of mobile observability clarifies the types of questions you want the tooling to answer, but it doesn’t get to the core of why you want those questions answered. That is: what are you going to do once you have those answers, and are the answers complete enough to let you act?
Traditionally, mobile observability tooling is used to monitor crashes, ANRs, and other performance problems so that they can be fixed in future releases. Mobile developers and other users of the tooling not only want to know how frequently these problems occur, but they also want enough information to help them find the root causes. Knowing is only half the battle: If the tooling doesn't provide enough debugging information, it is next to useless.
In other words, performance problems are the what while finding the cause and ultimately fixing the issues are the why.
The Limitations of Aggregates
Traditional backend observability data is usually first looked at in aggregate, and the same is true for mobile: how many times a particular crash has occurred, what the P99 app startup time is, and so on. Existing issues are ranked according to their perceived severity, and the order in which they are worked on – and whether they are worked on at all – is largely based on that ranking. The higher the severity, the higher the priority.
And how is the severity of a performance problem determined? It usually comes down to a combination of how frequently the problem occurs and “how bad” it is when it does. Aggregates like frequency and regression rates provide the baseline data for this assessment, but those numbers are filtered through the lens of the people doing the prioritizing – their experience and understanding of the app – before a severity is assigned.
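As a toy model of that prioritization step: severity is effectively frequency multiplied by a judgment-based weight for how bad the failure mode is. The weights and numbers below are invented purely for illustration; in practice they live in the heads of the people doing triage.

```kotlin
// Toy severity score: frequency times a judgment call on how bad the
// failure mode is. Weights are invented for illustration only.
enum class FailureMode(val badness: Double) {
    CRASH(1.0),        // app dies outright
    ANR(0.8),          // app freezes; user may force-quit
    SLOW_STARTUP(0.4)  // annoying, but recoverable
}

data class Issue(val id: String, val mode: FailureMode, val occurrencesPerDay: Int)

fun prioritize(issues: List<Issue>): List<Issue> =
    issues.sortedByDescending { it.occurrencesPerDay * it.mode.badness }

fun main() {
    val ranked = prioritize(listOf(
        Issue("ISSUE-1", FailureMode.SLOW_STARTUP, 50_000),
        Issue("ISSUE-2", FailureMode.CRASH, 12_000),
        Issue("ISSUE-3", FailureMode.ANR, 9_000)
    ))
    ranked.forEach { println("${it.id}: score=${it.occurrencesPerDay * it.mode.badness}") }
}
```

Note what is absent from this score: nothing in it captures what the affected users did next.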
Using aggregates alone as the data points to determine severity is difficult, even for knowledgeable people, because it’s missing one key puzzle piece: how users are individually impacted when they encounter a particular performance problem. Knowing that the P99 app startup time is 30% slower won’t tell you the increased level of frustration experienced by those who were impacted by the extra delay.
That is because individual users are nowhere to be found when you look at aggregates like P99.
Aggregates treat an app as a single system, not as the millions of individual systems that it actually is, each running on a different device with an individual user behind it who is experiencing the app and its performance problems in their own unique way.
While you know the increase in the absolute time it took for the app to start, how can you properly and objectively assess the impact of that regression if you can’t quantify how it affected the way those users use your app? For some, it may just mean waiting a little longer for the loading screen to disappear; for others, it may be the straw that breaks the camel’s back, and they never open your app again. Determining how, and whether, a performance issue affects future app usage is the key to understanding impact, and aggregates aren’t designed to give you that kind of insight.
The Limitations of Sessions
In the traditional backend observability space, users are represented in telemetry as a high-cardinality attribute, if they are represented at all. This is because the utility of knowing the specific users making requests is limited for backend performance tracking. There are often other factors that are more directly relevant, and high-cardinality attributes are not generally useful for aggregation.
The main use case for tracking users explicitly in backend data is the potential to link them to your mobile data. This linkage provides additional attributes that can be associated with the requests behind slow backend traces. For example, you can add context that would be too expensive to track directly in the backend – like the specific payload blobs for a request – but that is easy to collect on the client.
For mobile observability, tracking users explicitly is of paramount importance. In this space, platforms and vendors recognize that modeling a user’s experience is essential, because knowing the totality and sequencing of the activities around the time a user experiences performance problems is key for debugging. By grouping temporally related events for a user and presenting them in chronological order, they have created what has become de rigueur in mobile observability: the user session.
Presenting telemetry this way allows mobile developers to spot patterns and provide explanations as to why performance problems occur. This is especially useful for difficult-to-reproduce problems that may not be apparent if you simply looked at aggregates. Sometimes, it’s not obvious that a particular crash happens right after the device loses network connectivity – not unless you look at a user’s telemetry laid out in sequential order. This is the power of user sessions, and why they have become table stakes for mobile observability.
But there is still a gap: a user session is but a slice of time in the journey a user takes with a mobile app. An implicit assumption when looking at a session is that things happening within it will only impact other things within that same session. If you zoom out a bit to consider multiple sequential sessions for the same user, you can start getting more context – e.g., a crash on a particular screen in one session consistently being followed by a really slow app startup in the next (see the sketch below). But the utility of this technique starts to fray as you consider more and more sessions per user: the farther apart two events are, the harder it becomes to establish a direct causal link between them.
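Here is a minimal sketch of that kind of cross-session analysis: group session summaries by user, pair consecutive sessions, and measure how often a crash is followed by a slow cold start. The shapes, thresholds, and data are all illustrative.

```kotlin
import java.time.Instant

// Illustrative per-session summary; not any vendor's actual schema.
data class SessionSummary(
    val userId: String,
    val startedAt: Instant,
    val endedInCrash: Boolean,
    val coldStartMs: Long
)

// For each user, pair consecutive sessions and compute the fraction of
// post-crash sessions whose cold start exceeded `slowMs`.
fun crashThenSlowStartRate(sessions: List<SessionSummary>, slowMs: Long = 3000): Double {
    val pairs = sessions.groupBy { it.userId }.values.flatMap { userSessions ->
        userSessions.sortedBy { it.startedAt }.zipWithNext()
    }
    val afterCrash = pairs.filter { (prev, _) -> prev.endedInCrash }
    if (afterCrash.isEmpty()) return 0.0
    return afterCrash.count { (_, next) -> next.coldStartMs >= slowMs }.toDouble() / afterCrash.size
}

fun main() {
    val t = Instant.parse("2024-01-01T00:00:00Z")
    val sessions = listOf(
        SessionSummary("u1", t, endedInCrash = true, coldStartMs = 900),
        SessionSummary("u1", t.plusSeconds(3600), endedInCrash = false, coldStartMs = 4800),
        SessionSummary("u2", t, endedInCrash = false, coldStartMs = 850),
        SessionSummary("u2", t.plusSeconds(3600), endedInCrash = false, coldStartMs = 900)
    )
    println("P(slow start | previous session crashed) = ${crashThenSlowStartRate(sessions)}")
}
```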
While looking at session timelines is useful for debugging specific performance problems from the perspective of a representative user, it is difficult to predict any long-term impact those problems might have on the user and how they use your app. Perhaps even more difficult is drawing any conclusions about the broader impacts of performance problems on your app and the company’s key metrics like revenue and DAU.
In other words, sessions are useful for debugging performance problems, not for assessing their long-term impact.
Putting the “User” Ahead of “Sessions”
If sessions alone are not sufficient to assess the long-term impact of performance problems on key company metrics, what is still missing? In short, a fundamental change: centering your observability practice on understanding user behavior over the long run, particularly when users’ perception of the app’s performance changes. This involves aggregating data in ways not often seen in mobile observability.
To do this, you must first track the behavior of users throughout their lifetime with the app. Specifically, you need to look at how users behave after they encounter a performance problem and compare that to how they behaved before. You can also group users into cohorts – similar users who were impacted vs. similar users who weren’t – and observe the differences in their subsequent behavior, where you may begin to see correlations between performance issues and negative user trends. If you’re lucky, some of those correlations may turn out to be causal, which would let you determine impact more directly through further analysis and experimentation. In other words, rather than simply looking at the impact on counts of crashes that may or may not matter, you can look at how churn and conversion rates for your app are affected – which definitely matter.
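A minimal sketch of that kind of cohort comparison, assuming your telemetry already carries a stable user ID: take sessions per week as a crude engagement proxy and compare impacted users before and after their first exposure against an unimpacted control cohort. All names and numbers are invented for illustration.

```kotlin
// Toy cohort comparison: sessions per week before vs. after a user's first
// encounter with a performance problem. Data shapes are illustrative.
data class UserWeek(val userId: String, val week: Int, val sessions: Int, val hitIssue: Boolean)

fun avgSessions(rows: List<UserWeek>): Double =
    if (rows.isEmpty()) 0.0 else rows.sumOf { it.sessions }.toDouble() / rows.size

fun main() {
    // One row per user per week; `hitIssue` marks weeks at/after first exposure.
    val rows = listOf(
        UserWeek("u1", 1, 7, false), UserWeek("u1", 2, 3, true),  // impacted: usage drops
        UserWeek("u2", 1, 6, false), UserWeek("u2", 2, 2, true),
        UserWeek("u3", 1, 7, false), UserWeek("u3", 2, 7, false), // unimpacted: steady
        UserWeek("u4", 1, 5, false), UserWeek("u4", 2, 6, false)
    )
    val impactedUsers = rows.filter { it.hitIssue }.map { it.userId }.toSet()
    val (impacted, control) = rows.partition { it.userId in impactedUsers }
    println("impacted, after exposure:  ${avgSessions(impacted.filter { it.hitIssue })} sessions/week")
    println("impacted, before exposure: ${avgSessions(impacted.filter { !it.hitIssue })} sessions/week")
    println("control, overall:          ${avgSessions(control)} sessions/week")
}
```

A drop that shows up in the impacted cohort but not in the control cohort is the correlation signal you would then dig into with proper statistics and experimentation.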
The full version of that analysis is the subject of an entirely different post, but suffice it to say, you can’t even begin this type of analysis until you aggregate mobile telemetry the right way: through the lens of the aggregate behavior of different user cohorts. And before you can do that, you need to collect and annotate telemetry in a way that allows that level of aggregation – provided your tooling supports it.
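As one example of what that annotation could look like at collection time, the sketch below stamps a span with a stable user identifier and a cohort-friendly dimension using the OpenTelemetry API. It assumes the opentelemetry-api artifact is on the classpath, and the attribute keys are my own choices rather than an official convention.

```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.trace.Span

// Sketch: annotate telemetry at record time so it can later be grouped
// by user cohort. Attribute keys are illustrative, not a standard.
fun recordColdStart(userId: String, appVersion: String, durationMs: Long) {
    val tracer = GlobalOpenTelemetry.getTracer("app.startup")
    val span: Span = tracer.spanBuilder("cold_start")
        .setAttribute("user.id", userId)          // stable, pseudonymous user ID
        .setAttribute("app.version", appVersion)  // a typical cohort dimension
        .setAttribute("cold_start.duration_ms", durationMs) // kept as an attribute for simplicity
        .startSpan()
    span.end() // in real instrumentation, end() would follow the work being measured
}
```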
This is all to say, the question to ask yourself is: Is your mobile observability telemetry conducive to being broken down by user cohorts, linked with other datasets to give a full-stack view of app performance from the user’s perspective, and analyzed to show the long-run engagement of those users? If the answer is yes, then you have all the ingredients you need to fully leverage mobile observability beyond just looking at sessions for crash debugging.