As developers, we would rather be writing code all day than doing anything else, especially sitting in meetings or fighting production problems. Unfortunately, both are part of the job. Every developer needs to understand the basics of web performance monitoring. It won’t get you out of meetings, but it will help you prevent production fires and put them out faster. Then again, it might also help you avoid meetings about production problems.
What Is Web Performance Monitoring?
Web performance monitoring combines several web application monitoring techniques to ensure that your application is online, loading quickly, and working correctly. As you know, your web application can be online and loading quickly but still return an error on every request. That is why it is important to monitor your application in several different ways to ensure that your entire application stack is working correctly.
As some say, “Performance is a feature.”
If I try to order pizza online and the website is really slow or throwing errors, I will just order pizza from somewhere else. Next time, I might remember that bad experience and not even go back to the first pizza restaurant.
Google found that slowing down Google searches by just 400 milliseconds would result in 8,000,000 fewer searches every day.
Monitoring the performance of your web application is critical to ensure that all of your customers are happy.
Monitoring Guide Topics
In this guide, we are going to cover a few things that every developer needs to know about web performance monitoring.
- Application Availability
- Application Errors
- Important Web Requests or Key Transactions
- Application Dependencies (SQL, MongoDB, Redis, etc.)
- Web Application Metrics
- Monitoring Other “Stuff”
How to Monitor Application Availability
The easiest part of web performance monitoring is ensuring your web application is online and available for your users. The best way to do application availability monitoring is with a simple HTTP ping monitor that runs every minute.
For example, at Stackify we use this approach to monitor our various web applications and marketing websites. We monitor the response time and ensure that each site responds with an HTTP status code of 200.
WEBSITE AVAILABILITY MONITORING WITH RETRACE
Most monitoring tools also allow you to check for specific text in the response. This is useful for ensuring that the page not only returns a 200 HTTP status code but also returns the proper content. In the previous example, we are looking for “Sign In” on our login page.
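A check like this can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not how Retrace implements its monitors: `http_check`, the 2-second response-time limit, and the 10-second timeout are illustrative assumptions. It treats a check as passing only when the server returns a 200, responds within the limit, and (optionally) includes the expected text.

```python
import http.client
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    status: Optional[int]  # HTTP status code, or None if no response arrived
    elapsed_ms: float      # time to receive the full response
    text_found: bool       # whether the expected text appeared in the body
    ok: bool               # overall pass/fail, used for alerting


def http_check(url: str, expected_text: str = "", max_ms: float = 2000.0,
               timeout_s: float = 10.0) -> CheckResult:
    """One availability check: expect a 200, a fast response, and (optionally) some text."""
    start = time.perf_counter()
    status: Optional[int] = None
    body = ""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx status
    except (OSError, http.client.HTTPException):
        pass  # DNS failure, refused connection, timeout, or broken response: status stays None
    elapsed_ms = (time.perf_counter() - start) * 1000
    text_found = expected_text in body  # "" is always found, so the text match is optional
    ok = status == 200 and elapsed_ms <= max_ms and text_found
    return CheckResult(status, elapsed_ms, text_found, ok)
```

A scheduler such as cron, or a loop that sleeps 60 seconds between runs, would call something like `http_check("https://example.com/login", expected_text="Sign In")` once a minute and alert whenever `ok` is false. The URL is a placeholder.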
As another example, we also use HTTP checks to monitor our Elasticsearch cluster. We can look for specific text like "number_of_nodes":11 to confirm that the expected number of Elasticsearch nodes is up and running. These types of HTTP checks are useful for a wide array of things!
ELASTICSEARCH MONITORING WITH RETRACE
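The same idea can be sketched against Elasticsearch’s `_cluster/health` endpoint, whose JSON response includes a `number_of_nodes` field. Matching the raw text `"number_of_nodes":11` works as described above; this hedged variant parses the JSON instead, so extra whitespace (for example from `?pretty`) does not break the check. It assumes the cluster is reachable over plain HTTP without authentication, and the function names and base URL are hypothetical.

```python
import json
import urllib.request


def node_count_from_health(body: str) -> int:
    """Extract number_of_nodes from a _cluster/health JSON response."""
    return int(json.loads(body)["number_of_nodes"])


def elasticsearch_nodes_ok(base_url: str, expected_nodes: int, timeout_s: float = 10.0) -> bool:
    """True if the cluster reports exactly `expected_nodes` nodes."""
    url = base_url.rstrip("/") + "/_cluster/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return node_count_from_health(resp.read().decode("utf-8")) == expected_nodes
    except (OSError, ValueError, KeyError):
        # Unreachable, HTTP error, non-JSON body, or missing field: count it as a failed check.
        return False
```

Usage would look like `elasticsearch_nodes_ok("http://es-node-1:9200", 11)`, where the host name is a placeholder.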
How to Monitor Application Errors
If your web application is not working correctly, your error logs are your first line of defense. You should immediately check your error logs for potential problems.
For example, a couple of days ago our system was still working but was running really slowly. We received alerts from Retrace about high page load times and high error rates. We checked our error monitoring dashboard, and this is what we saw:
ERROR RATES VIA RETRACE
We could instantly see a big spike in exceptions around 12:45. The dashboard also showed that two specific exceptions were occurring very frequently.
TOP ERRORS VIA RETRACE
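The two views described above, errors per minute and the most frequent exception types, can be approximated from raw error events. This is a minimal sketch, not Retrace’s implementation: the `(timestamp, exception type)` event format, the baseline rate, and the 5× spike factor are assumptions, and the exception names in the test data are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# One error event: (when it happened, exception type name)
ErrorEvent = Tuple[datetime, str]


def errors_per_minute(events: Iterable[ErrorEvent]) -> Counter:
    """Bucket error events into one-minute intervals."""
    return Counter(ts.replace(second=0, microsecond=0) for ts, _ in events)


def spike_minutes(per_minute: Dict[datetime, int], baseline: float,
                  factor: float = 5.0) -> List[datetime]:
    """Minutes whose error count exceeds `factor` times the normal per-minute baseline."""
    return sorted(minute for minute, count in per_minute.items() if count > factor * baseline)


def top_errors(events: Iterable[ErrorEvent], n: int = 2) -> List[Tuple[str, int]]:
    """The `n` most frequent exception types."""
    return Counter(exc_type for _, exc_type in events).most_common(n)
```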
Thanks to these errors, we knew exactly where to dig deeper. They were the smoke from the production fire that we needed. Our exception logs pointed us directly to Redis as the problem in this instance. Since our applications rely heavily on Redis for caching, any issue with it definitely causes problems.
That day, the issue ended up resolving on its own: Azure’s Redis hosting was having a strange problem for a few minutes.
Error monitoring is a critical component of web application performance monitoring. You should send all of your application errors to an error monitoring and reporting service.
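In Python, one common way to do this is a custom `logging.Handler` that forwards error records, including tracebacks, to the reporting service. This is a hedged sketch: `ErrorReportingHandler`, the JSON payload shape, and the endpoint URL are hypothetical, not any vendor’s real API, and in practice you would use your service’s official client library. The transport is injectable so the handler can be exercised without a network.

```python
import json
import logging
import traceback
import urllib.request
from typing import Callable, Optional


class ErrorReportingHandler(logging.Handler):
    """Forward ERROR-level log records (with tracebacks) to an error-reporting service."""

    def __init__(self, endpoint: str = "", send: Optional[Callable[[dict], None]] = None):
        super().__init__(level=logging.ERROR)  # ignore anything below ERROR
        self.endpoint = endpoint                # hypothetical service URL
        self.send = send or self._post_json

    def _post_json(self, payload: dict) -> None:
        req = urllib.request.Request(
            self.endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req, timeout=5).close()

    def emit(self, record: logging.LogRecord) -> None:
        try:
            exc_type = record.exc_info[0] if record.exc_info else None
            payload = {
                "logger": record.name,
                "level": record.levelname,
                "message": record.getMessage(),
                "exception_type": exc_type.__name__ if exc_type else None,
                "stack": "".join(traceback.format_exception(*record.exc_info)) if exc_type else None,
                "timestamp": record.created,
            }
            self.send(payload)
        except Exception:
            self.handleError(record)  # a reporting failure must never crash the application
```

Attaching it once, for example `logging.getLogger().addHandler(ErrorReportingHandler("https://errors.example.com/report"))` with a placeholder URL, sends every logged exception in the process to the service.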
How to Monitor Important Web Requests or Key Transactions
Every application has important web requests or “key transactions” that should be closely monitored. Depending on your type of web application, these could be a wide variety of things. What you need to monitor for an e-commerce site is very different from what you need to monitor for a REST API.
Here are some of the common things to consider monitoring:
- High volume web requests.
- Problematic web requests that tend to slow down.
- Critical web requests, like a shopping cart page.
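One way to shortlist key transactions is to rank endpoints from your request logs by volume and by tail latency, then add the pages you already know are critical. The sketch below assumes request records of `(endpoint, duration_ms)`; the top-N cutoff and the 1-second 95th-percentile threshold are arbitrary assumptions to tune for your application.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def p95(durations: List[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(durations)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


def key_transactions(
    requests: Iterable[Tuple[str, float]],
    top_n: int = 5,
    slow_p95_ms: float = 1000.0,
    critical: Iterable[str] = (),
) -> Set[str]:
    """Endpoints worth watching: the busiest, the slowest at p95, and known-critical ones."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for endpoint, ms in requests:
        durations[endpoint].append(ms)
    busiest = sorted(durations, key=lambda e: len(durations[e]), reverse=True)[:top_n]
    slow = [e for e, values in durations.items() if p95(values) > slow_p95_ms]
    return set(busiest) | set(slow) | set(critical)
```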
Let me provide some more details with a couple of good examples.
Example #1: Customer Page in a CRM Application
A long time ago I worked on a CRM application. It had one specific page that did a ton of SQL queries to load a lot of details about a customer. We had to load things about what they had purchased, contact history, notes, etc.
This particular page was one of the most used pages in our CRM software. Any little hiccup in performance due to slow SQL queries quickly caused this page to load slowly.
By monitoring the performance of this particular page, we could help ensure our software was running well and our users would be happy. It was the perfect way for us to measure the “pulse” of the performance of our entire application.
Example #2: Super High-Volume API Request
My next example is essentially a microservice: a simple web service application that receives a ton of traffic on one specific web request. The application may be simple, but the performance of this particular web request is absolutely mission critical for our business.
The API request handles all of the log data uploaded to our log management system. This single transaction gets called hundreds of times per second. By setting up a key transaction monitor within Retrace, we can closely monitor the performance of this specific API transaction as well as the entire application.
We can monitor the average response time, error rate, requests per minute, or the calculated satisfaction score.
KEY TRANSACTION MONITORING VIA RETRACE
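These four numbers are straightforward to compute from a window of request records. The sketch below is an illustration built on assumptions: the `Request` record, the 500 ms target, and the use of the standard Apdex formula, (satisfied + tolerating / 2) / total, as the satisfaction score are choices made here; this guide does not say whether Retrace calculates its score exactly this way. Failed requests are counted as frustrated, a common Apdex convention.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    duration_ms: float
    failed: bool


def summarize(requests: List[Request], window_minutes: float,
              target_ms: float = 500.0) -> Dict[str, Optional[float]]:
    """Average response time, error rate, requests per minute, and an Apdex score."""
    total = len(requests)
    if total == 0:
        return {"avg_ms": None, "error_rate": None, "rpm": 0.0, "apdex": None}
    ok = [r for r in requests if not r.failed]  # failed requests count as frustrated
    satisfied = sum(1 for r in ok if r.duration_ms <= target_ms)
    tolerating = sum(1 for r in ok if target_ms < r.duration_ms <= 4 * target_ms)
    return {
        "avg_ms": sum(r.duration_ms for r in requests) / total,
        "error_rate": sum(1 for r in requests if r.failed) / total,
        "rpm": total / window_minutes,
        "apdex": (satisfied + tolerating / 2) / total,
    }
```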
How to Monitor Application Dependencies (SQL, MongoDB, Redis, etc.)
Today’s applications rely on numerous application dependencies. These include SQL databases, NoSQL databases, caches, and usually multiple external web services. If you want comprehensive web performance monitoring, you need to monitor your entire application stack.
To monitor how application dependencies affect the performance of your application, you need a tool, like Retrace, that can track the performance of your application down to the code level. Retrace automatically supports dozens of frameworks and dependencies, like SQL Server, Oracle, MongoDB, Elasticsearch, Redis, and many others.
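APM tools like Retrace gather this data automatically by instrumenting frameworks and client libraries. Purely to illustrate the idea, here is a hand-rolled sketch that times each dependency call within a request and groups it by category. `RequestTrace` and the category names are hypothetical; real instrumentation hooks into the libraries rather than requiring manual wrappers.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class RequestTrace:
    """Time and call counts per dependency category for a single web request."""

    def __init__(self) -> None:
        self.time_ms: Dict[str, float] = defaultdict(float)
        self.calls: Dict[str, int] = defaultdict(int)

    @contextmanager
    def track(self, category: str) -> Iterator[None]:
        """Time the enclosed block and charge it to `category`, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.time_ms[category] += (time.perf_counter() - start) * 1000
            self.calls[category] += 1

    def breakdown(self, total_ms: float) -> Dict[str, float]:
        """Share of the request's total time spent in each category."""
        return {category: ms / total_ms for category, ms in self.time_ms.items()}
```

In a request handler, `with trace.track("SQL"): rows = cursor.fetchall()` would attribute that query’s time to SQL; aggregating these traces across requests gives a per-dependency view like the one this section describes.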
Example #1: Redis Problems
For example, in the screenshot below you can see how each application dependency impacts the overall performance of the application. Based on the big yellow spikes in the graph, you can quickly see that Redis is causing some performance problems for some reason.
APPLICATION DEPENDENCY PERFORMANCE VIA RETRACE
Example #2: Weird Usage of External Web Service
Here is another example from an application that is not having any major issues. Requests typically load in under 200 milliseconds. However, you can also see that a good part of the time is spent on “Web External,” which represents external HTTP web service calls. Upon further research, we found that the application was calling an external web service too often. I would never have noticed this any other way!
The chart below also shows how Retrace tracks Azure Storage, Database, Elasticsearch, Redis, Azure Service Bus, and external HTTP services. If any part of our application stack has a spike in response time, this chart will help us instantly identify it.
APPLICATION DEPENDENCY PERFORMANCE VIA RETRACE
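A pattern like “calling an external web service too often” can also be flagged automatically once per-request call counts are recorded. This sketch assumes each request’s dependency calls are available as a dictionary of counts by category; `chatty_dependencies` and the threshold of two calls per request are arbitrary assumptions.

```python
from statistics import mean
from typing import Dict, List


def chatty_dependencies(per_request_calls: List[Dict[str, int]],
                        max_avg_calls: float = 2.0) -> Dict[str, float]:
    """Dependency categories whose average calls per request exceed `max_avg_calls`."""
    categories = {c for calls in per_request_calls for c in calls}
    averages = {
        c: mean(calls.get(c, 0) for calls in per_request_calls)  # a missing key means 0 calls
        for c in categories
    }
    return {c: avg for c, avg in averages.items() if avg > max_avg_calls}
```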
Understanding the performance of your application dependencies is really important. These types of insights are invaluable for quickly identifying application performance problems or opportunities for optimization.
How to Monitor Web Application Metrics
Monitoring key metrics about your application and its frameworks is critical to web performance monitoring. For example, your web server can provide metrics around how many requests per second you are receiving. The .NET Framework or JVM can provide key metrics around garbage collection statistics and many other things. These types of metrics are important to track over time and monitor.
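As a Python analog of those web server and runtime counters (this is not the .NET or JVM API), the sketch below computes requests per second over a sliding window and reads the interpreter’s garbage-collection counters through the standard `gc.get_stats()`. `RateMeter` and the 60-second window are illustrative assumptions; the injectable clock exists only to make the meter testable.

```python
import gc
import time
from collections import deque
from typing import Callable, Deque, List


class RateMeter:
    """Requests per second over a sliding time window."""

    def __init__(self, window_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.events: Deque[float] = deque()

    def mark(self) -> None:
        """Record one request at the current time."""
        self.events.append(self.clock())

    def rate(self) -> float:
        """Drop events older than the window, then average over the window length."""
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()
        return len(self.events) / self.window_s


def gc_collections_per_generation() -> List[int]:
    """How many collections the interpreter has run for each GC generation so far."""
    return [gen["collections"] for gen in gc.get_stats()]
```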
It is also important to monitor basic metrics about your servers, such as CPU, memory usage, and disk performance. Maxing out the CPU on your server will always cause web performance problems. But if that isn’t the problem, you will need application-specific metrics to dig deeper.
Odds are, you will also want to track some custom metrics that are unique to your application.
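Here is a rough, standard-library-only sketch of collecting a couple of these server metrics. Disk usage comes from `shutil.disk_usage`, and CPU pressure is approximated by the one-minute load average divided by the core count; `os.getloadavg` is unavailable on Windows, so that value can be missing. Memory is not covered because the standard library has no portable API for it (third-party packages such as psutil are commonly used instead). The function names and the 90% disk and 1.0 load-per-core thresholds are assumptions.

```python
import os
import shutil
from typing import Dict, List, Optional


def server_snapshot(path: str = os.path.abspath(os.sep)) -> Dict[str, Optional[float]]:
    """Disk usage for the volume holding `path`, plus load per CPU core where available."""
    usage = shutil.disk_usage(path)
    load_per_core: Optional[float]
    try:
        load_per_core = os.getloadavg()[0] / (os.cpu_count() or 1)
    except (AttributeError, OSError):
        load_per_core = None  # os.getloadavg does not exist on Windows
    return {"disk_used_pct": 100.0 * usage.used / usage.total, "load_per_core": load_per_core}


def threshold_breaches(snapshot: Dict[str, Optional[float]],
                       max_disk_pct: float = 90.0,
                       max_load_per_core: float = 1.0) -> List[str]:
    """Names of metrics that crossed their alert thresholds."""
    breaches = []
    disk = snapshot.get("disk_used_pct")
    if disk is not None and disk > max_disk_pct:
        breaches.append("disk")
    load = snapshot.get("load_per_core")
    if load is not None and load > max_load_per_core:
        breaches.append("cpu")
    return breaches
```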
Example #1: Monitoring How Much Data Our App Receives
One thing we want to monitor at Stackify is how much data we are receiving. We receive terabytes of logs, errors, metrics, and code-level performance data every week. For example, we want to know how many log messages we receive per minute so that we can correlate that number with other metrics, like server CPU and requests per second.
CUSTOM METRICS MONITORING VIA RETRACE
We are able to do this by using Stackify’s custom metrics functionality. With just a couple lines of code, we can report and then monitor our incoming app log count as part of our larger web performance monitoring strategy.
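Stackify’s own client API is not shown in this guide, so the following is a generic, hypothetical sketch of what custom metrics boil down to: named counters aggregated into one-minute buckets and periodically flushed to a monitoring service. `CustomMetrics`, `increment`, `flush`, and the “Logs Received” metric name are all illustrative, not Stackify’s API.

```python
import threading
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple

# (metric name, minute number since the epoch) -> summed value
Bucket = Tuple[str, int]


class CustomMetrics:
    """Aggregates named counters into one-minute buckets for later reporting."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock
        self._lock = threading.Lock()  # increments may come from many request threads
        self._buckets: Dict[Bucket, float] = defaultdict(float)

    def increment(self, name: str, amount: float = 1.0) -> None:
        minute = int(self.clock() // 60)
        with self._lock:
            self._buckets[(name, minute)] += amount

    def flush(self) -> Dict[Bucket, float]:
        """Return and clear everything collected so far, e.g. to send to a monitoring service."""
        with self._lock:
            data, self._buckets = dict(self._buckets), defaultdict(float)
        return data
```

In log-intake code, a call such as `metrics.increment("Logs Received", len(batch))` per uploaded batch would record the count, with a background job calling `flush()` every minute and forwarding the result.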
Example #2: Redis Statistics
As another example, we also added some tracking around how we use Redis. Redis exposes some statistics of its own, but our tracking lets us understand how our exact code uses it.
REDIS MONITORING VIA RETRACE
Yesterday, we happened to have an issue with Redis on Microsoft Azure. These metrics came in handy to help us quickly identify the problem.
REDIS MONITORING VIA RETRACE
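A minimal way to capture this kind of usage data is to wrap the cache client and count hits, misses, and time per operation. This sketch assumes a client exposing `get(key)` and `set(key, value)` methods, as redis-py’s `redis.Redis` does; `InstrumentedCache` itself is hypothetical, not how Stackify built its tracking.

```python
import time
from collections import Counter, defaultdict
from typing import Any, Callable, Dict, Optional


class InstrumentedCache:
    """Wraps a cache client with get/set methods and records how our code uses it."""

    def __init__(self, client: Any):
        self.client = client
        self.counts: Counter = Counter()                        # "get.hit", "get.miss", "set"
        self.total_ms: Dict[str, float] = defaultdict(float)    # cumulative latency per operation

    def _timed(self, op: str, fn: Callable[..., Any], *args: Any) -> Any:
        start = time.perf_counter()
        try:
            return fn(*args)
        finally:
            self.total_ms[op] += (time.perf_counter() - start) * 1000

    def get(self, key: str) -> Optional[Any]:
        value = self._timed("get", self.client.get, key)
        self.counts["get.hit" if value is not None else "get.miss"] += 1
        return value

    def set(self, key: str, value: Any) -> None:
        self._timed("set", self.client.set, key, value)
        self.counts["set"] += 1

    def hit_ratio(self) -> float:
        lookups = self.counts["get.hit"] + self.counts["get.miss"]
        return self.counts["get.hit"] / lookups if lookups else 0.0
```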
How to Monitor Other “Stuff”
OK, let’s face it: every software application is a snowflake. They all share a lot of common things that are easy to monitor. But there are usually some very specific things about your application that are critical to you. How do you monitor the weird stuff?
Example #1: Monitoring a Daily Background Process
At Stackify, one of the most important things we want to monitor is our billing processes. The last thing we want to do is not bill our customers (again).
Our billing process runs once a day in the middle of the night. It isn’t a web application. It is a scheduled task that is handled by Quartz, our job scheduler. It really isn’t related to web performance monitoring. But it is critical to our web application. How do we monitor it?
We use really good error handling logic in our billing code and use a custom .NET exception type called BillingException. We then monitor our application logs with Retrace to look for any of those exceptions. If we see any of those in our logs, the red phone goes off, and red lights start flashing everywhere (OK, maybe just an SMS alert is sent).
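The original code is .NET; what follows is a rough Python analog of the same pattern, not Stackify’s billing code. Low-level failures are wrapped in a single well-known exception type and logged with their traceback, and a log scan flags any line that mentions that type. `bill_customer`, `run_nightly_billing`, and the negative-amount check are hypothetical stand-ins for real billing logic.

```python
import logging
import re
from typing import Iterable, List, Tuple


class BillingException(Exception):
    """Single, searchable exception type for any failure in the billing run."""


log = logging.getLogger("billing")


def bill_customer(customer_id: int, amount_cents: int) -> None:
    """Hypothetical billing step; real code would charge the customer here."""
    try:
        if amount_cents < 0:
            raise ValueError("negative invoice amount")
    except Exception as err:
        # Wrap every low-level error in BillingException, keeping the original as the cause.
        raise BillingException(f"billing failed for customer {customer_id}") from err


def run_nightly_billing(customers: Iterable[Tuple[int, int]]) -> None:
    for customer_id, amount_cents in customers:
        try:
            bill_customer(customer_id, amount_cents)
        except BillingException:
            log.exception("billing error")  # the logged traceback names BillingException


BILLING_ERROR = re.compile(r"\bBillingException\b")


def billing_alerts(log_lines: Iterable[str]) -> List[str]:
    """Log lines that should trigger an alert (for example, an SMS)."""
    return [line for line in log_lines if BILLING_ERROR.search(line)]
```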
Example #2: Monitoring a Weird SLA Requirement
At my last company, we had a contract with a car company that required us to send them data every day. That company sent us leads for people who wanted to buy cars, we processed them, and then our users did some activities with them. We had to send data back about what was happening with those leads, such as whether the dealership had contacted them.
So how do you monitor if you have done something or not?
In this case, our data model and processing were set up so that we could query SQL Server for a list of status changes that needed to be sent. A console application ran on a schedule to send them every hour or so.
To monitor that we were hitting our SLAs, we made another program that ran a similar query and just alerted us if any records were found by the query.
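Here is a sketch of that watchdog, using an in-memory SQLite database in place of SQL Server so it runs anywhere. The `lead_status_changes` table, its columns, and the two-hour allowance are hypothetical; the idea carried over from the text is simply to query for work that should already be done and alert if the query returns any rows.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


def overdue_status_changes(conn: sqlite3.Connection, now: datetime,
                           max_age: timedelta = timedelta(hours=2)) -> List[Tuple[int, str]]:
    """Status changes still unsent after `max_age`; any row means the SLA is at risk."""
    cutoff = (now - max_age).isoformat(sep=" ")  # same text format as the stored timestamps
    return conn.execute(
        "SELECT lead_id, status FROM lead_status_changes "
        "WHERE sent_at IS NULL AND changed_at < ? ORDER BY changed_at",
        (cutoff,),
    ).fetchall()


def check_sla(conn: sqlite3.Connection, now: datetime, alert: Callable[[str], None]) -> bool:
    """Run the SLA query and call `alert` if anything is overdue. Returns True on a breach."""
    rows = overdue_status_changes(conn, now)
    if rows:
        alert(f"{len(rows)} lead status change(s) not sent within the SLA window")
    return bool(rows)
```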
If your application is important to your business and your users, web performance monitoring is also very important. In this guide, we looked at several things that you should consider monitoring: whether your web application is up or down, but also the performance of specific web requests, application dependencies, metrics, errors, and more.