Native App Network Performance Analysis
Native App Performance validation using app instrumentation from real users and collation of network har files from real devices outside the data center.
With mobile accounting for 54 percent of internet traffic, how your app performs relative to the competition is far from a trivial concern.
Once app development is done, everything may look good on simulators and internal network devices. But out in the wild — with bandwidth restrictions, TCP congestion, cache hits and misses, and varied device configurations — your users may not experience what you intend to deliver. And not every unhappy customer leaves feedback; most just stop coming back.
What Options Do You Have?
You could use real-device grid providers like BrowserStack, Perfecto (Perforce), or HeadSpin to validate user rendering and network performance of your pre-release app version, or add app instrumentation to run beta validation of what your users are actually experiencing.
This introduces a second set of concerns: how good is the app instrumentation, how do you validate what the majority of users are experiencing, and what bottlenecks are present in the app?
For instance, this could be what your users are experiencing as far as the latency of the application is concerned.
Worse, an error in the instrumentation could mislead the team into chasing an apparent latency improvement in a critical service that generates no revenue.
Measure It Right and Validate
Across different app releases, changing instrumentation, product features, backend networks, and HTTP protocols, one constant factor was the user — how users perceive the app when using it at their convenience. The scalable model was to build systems that replicate the eyeballing of users, independent of the app code or page layout, to derive user-perceived metrics for validation. Our current performance validation life cycle compares the app instrumentation against these user-perceived metrics to catch instrumentation errors much earlier in the product life cycle.
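One way to catch such instrumentation errors is to compare the latency the app reports against the latency derived from a HAR capture of the same request. The sketch below is illustrative — the function names and the 180 ms instrumented figure are my own assumptions, not part of any specific tool:

```python
# Sketch: flag instrumentation drift by comparing app-reported latency
# against the latency derived from a HAR capture of the same request.
# Function names and sample values are illustrative assumptions.

def har_latency_ms(entry):
    """Total wall time of a HAR entry in ms (HAR uses -1 for N/A phases)."""
    return sum(v for v in entry["timings"].values() if v >= 0)

def drift_pct(instrumented_ms, har_ms):
    """Relative error of the instrumented figure vs. the HAR-derived time."""
    return abs(instrumented_ms - har_ms) / har_ms * 100

# Example: instrumentation reports 180 ms, HAR timings sum to 200 ms
entry = {"timings": {"dns": 10, "connect": 30, "ssl": -1, "send": 5,
                     "wait": 140, "receive": 15}}
print(round(drift_pct(180, har_latency_ms(entry)), 1))  # 10.0
```

A persistent drift above some tolerance would be the signal to audit the instrumentation before trusting its trend lines.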
Even before we look at how to efficiently analyze network performance in a native application, let's see what can change for the user with regard to network performance:
Typical search view performance:
When the latency of the critical services degraded:
When there is an issue with resource download:
When the same is served from cache:
When the instrumentation is done wrong (the user will not see a difference, but you will misread your customer):
What's Happening at the Network Level
Normally, you would expect to see a network waterfall like this: the service call to fetch search results, followed by the associated resource calls to render the images on the device based on the result set:
Latency increase in the critical service:
Latency increase due to a change in client code, i.e., a blocking call:
Latency increase due to retry and timeouts:
Latency variations due to change in priority:
When there are too many network waterfall graphs to deep dive into, ascertaining what is wrong with the app becomes tedious. Typically, the five samples closest to the median/degraded value are considered, and an attempt is made to identify the services blocking other services; the number of requests made and the durations incurred are validated. However, a waterfall representation is not the best fit for this analysis. Instead, let's address it more intuitively — using what I call the Blue Dot representation.
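Picking the samples closest to the median is straightforward to script. A minimal sketch (the sample durations are made up for illustration):

```python
# Sketch: pick the k samples nearest the median total duration as the
# representative waterfalls to inspect. Sample values are illustrative.
import statistics

def representative_samples(durations, k=5):
    med = statistics.median(durations)
    return sorted(durations, key=lambda d: abs(d - med))[:k]

durations = [210, 230, 250, 400, 260, 240, 900, 255]
print(representative_samples(durations))  # [250, 255, 260, 240, 230]
```

Note how the 900 ms outlier is excluded: the goal is to inspect waterfalls that represent the typical user, not the tail.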
What you see in the x-y plane is the same waterfall data set, plotted by start time and duration taken, with a blue dot indicating the trigger point of each call.
For instance, based on the above blue dot representation, it is evident that the resource call Img-call-1 (t4) is blocked by, or dependent on, the completion of service call t3, since its blue dot falls beyond t3's whole duration. Also, the closer the dots sit to the y-axis, the more asynchronous the services/app.
As you can see, the blue dot representation makes it easy to analyze the network waterfall: it exposes the blocking calls, the order of calls, and the durations taken. The factor to focus on, release after release, is keeping the area under the blue dots times the average duration for critical services as low as possible.
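The blocking relationship behind the blue dots can be made concrete: a call looks blocked on another when its start falls beyond that call's whole duration. A minimal sketch, with call names and timings assumed for illustration:

```python
# Sketch: infer the likely blocker of each call from (name, start, duration)
# tuples — the latest earlier call that finishes before this one starts.
# Call names and timings are illustrative assumptions.

def blocked_on(calls):
    """Map each call name to its apparent blocker, or None."""
    result = {}
    for name, start, dur in calls:
        blockers = [(n, s + d) for n, s, d in calls
                    if n != name and s + d <= start]
        result[name] = max(blockers, key=lambda b: b[1])[0] if blockers else None
    return result

calls = [("t3", 0, 120), ("img-call-1", 130, 80), ("t5", 10, 40)]
print(blocked_on(calls))  # {'t3': None, 'img-call-1': 't3', 't5': None}
```

Here img-call-1 starts at 130 ms, after t3's full 120 ms duration, so it is flagged as blocked on t3 — the same dependency the blue dot view shows visually.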
Example: Real data set — Blue Dot representation:
The actual service details are masked for privacy reasons; however, the dot representation is color-coded blue or orange to distinguish critical from non-critical services. As you can see above, the first three calls are non-critical and have pushed the invocation of the critical services out by about 591 ms. A quick skim helps to identify:
- Critical and non-critical service order
- Service #11 blocking the invocation of service #12
- Service #10 taking the highest duration
- Duplicate retry calls
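Duplicate retry calls, the last item above, can also be flagged programmatically — for example, repeated calls to the same URL whose start times fall within a short window. This is a rough heuristic sketch; the URLs and the 2-second window are my own assumptions:

```python
# Sketch: flag likely retry duplicates — repeated calls to the same URL
# starting within a short window. URLs and window size are assumptions.
from collections import defaultdict

def find_retries(calls, window_ms=2000):
    """calls: (url, start_ms) pairs; returns URLs called >1 time in window."""
    by_url = defaultdict(list)
    for url, start in calls:
        by_url[url].append(start)
    return [u for u, starts in by_url.items()
            if len(starts) > 1 and max(starts) - min(starts) <= window_ms]

calls = [("/search", 0), ("/img/1", 300), ("/search", 1500), ("/img/2", 400)]
print(find_retries(calls))  # ['/search']
```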
An enhanced representation of the same data color-codes each duration by phase — DNS, connect, SSL, send, wait, and receive times — to understand the impact of DNS caching, TLS issues, and response payload buffering.
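These phases map directly onto the standard HAR `timings` object, so the phase-level split can be computed from the same capture. A minimal sketch, with sample timings assumed for illustration:

```python
# Sketch: split a HAR entry's total duration into the standard timing
# phases, as percentage shares. Sample timing values are assumptions.
HAR_PHASES = ("dns", "connect", "ssl", "send", "wait", "receive")

def phase_breakdown(entry):
    """Each phase's share of the entry total (HAR uses -1 for N/A)."""
    timings = {p: max(entry["timings"].get(p, -1), 0) for p in HAR_PHASES}
    total = sum(timings.values())
    return {p: round(ms / total * 100, 1) for p, ms in timings.items()}

entry = {"timings": {"dns": 20, "connect": 40, "ssl": 40, "send": 5,
                     "wait": 80, "receive": 15}}
print(phase_breakdown(entry))
```

In this made-up example, wait dominates at 40 percent — pointing at server-side latency — while the combined connect and SSL share of 40 percent would suggest connection reuse or TLS session resumption as a follow-up.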
Opinions expressed by DZone contributors are their own.