Case study: Performance Tuning a Web Shop – Part 2
Monitoring with JAMon
When we found a service to be too slow, we wanted to analyze it and identify the bottlenecks. This usually involved one or more remote calls to the database or the mainframe, so we wanted to see which database and mainframe calls were the bottlenecks. Since we tested with representative load, a common Java profiler turned out not to be useful most of the time. Such profilers cannot distinguish individual JDBC queries, nor can they distinguish mainframe calls by the parameters that specify which particular call to make. In addition, they have high overhead and cannot cope with the required load. So we wanted a different, more lightweight way of measuring.
This is where the JAMon API (Java Application Monitor) shines. It is a free, simple, high-performance, thread-safe Java API that allows developers to easily monitor production applications. Figure 1 below shows a screenshot of our JAMon view page. It lists the chosen counters with statistics such as the number of hits, average response time, total response time, standard deviation, and minimum and maximum response time. This differs from how a profiler works. A profiler either samples the stacks or is event based. Stack sampling is expensive. An event-based profiler instruments chosen methods and is notified when these methods are called. It collects this data and transports it from the application process to the profiler process, which may even run remotely. This data transport has its overhead, especially when many places are instrumented. JAMon works differently: it does not store the individual samples, only the statistics, and it keeps them in the same VM. So the hit count is incremented on each method access and the average response time is calculated incrementally. The JAMon results are only read on request, for example by a JSP page. This keeps both memory and CPU usage low, which makes JAMon particularly useful in production, where we measured an overhead of less than 1%.
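To make the idea of incremental statistics concrete, here is a minimal sketch of a counter that keeps only running aggregates and never stores individual samples. It is not JAMon's actual code: the class name `RunningStats` is hypothetical, and using Welford's online algorithm for the standard deviation is our own assumption of a reasonable approach.

```java
/**
 * Hypothetical running-statistics counter (not JAMon's code): it keeps only
 * aggregates, never the individual samples, so memory per counter is constant.
 */
final class RunningStats {
    private long hits;
    private double total;
    private double mean;
    private double m2; // sum of squared deviations from the mean (Welford's algorithm)
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Records one measurement, e.g. an elapsed time in milliseconds. */
    synchronized void add(double value) {
        hits++;
        total += value;
        double delta = value - mean;
        mean += delta / hits;         // incremental average, no sample list needed
        m2 += delta * (value - mean); // incremental update for the variance
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    synchronized long hits() { return hits; }
    synchronized double total() { return total; }
    synchronized double average() { return mean; }
    synchronized double min() { return min; }
    synchronized double max() { return max; }

    /** Sample standard deviation; 0 until there are at least two measurements. */
    synchronized double stdDev() {
        return hits > 1 ? Math.sqrt(m2 / (hits - 1)) : 0.0;
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double ms : new double[] {10, 20, 30}) {
            stats.add(ms);
        }
        // prints "3 hits, avg 20.0 ms, std dev 10.0"
        System.out.println(stats.hits() + " hits, avg " + stats.average() + " ms, std dev " + stats.stdDev());
    }
}
```

Each `add` call does a constant amount of work and the memory per counter stays fixed no matter how many hits the counter receives, which is what keeps this style of monitoring cheap under production load.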
Figure 1. Screenshot of the JAMon summary page with running statistics for each measured counter.
The newest version of JAMon, version 2.7, supports out-of-the-box interception of the JDBC driver as well as of HTTP requests, for instance in Tomcat. This means that with configuration alone you can measure all incoming web traffic and all outgoing calls to the database. We used the API, which is simple and works essentially like a stopwatch, with start and stop. We used Spring AOP to create an interceptor, as shown in the following screenshot.
Figure 2. Invoking JAMon API from a Spring interceptor.
We don’t use the JAMon interceptor provided with Spring, since it is only active in debug mode, which is clearly not what we are looking for. The statistics from JAMon now provide us with very valuable information about the actual system behavior in production, which helps us isolate bottlenecks.
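Figure 2 is only available as a screenshot, so its code is not reproduced here. As a rough, self-contained approximation of the same stopwatch pattern, the sketch below swaps Spring AOP for a plain JDK dynamic proxy (`java.lang.reflect.Proxy`) and JAMon for a hypothetical `StopwatchRegistry`; `OrderService`, `OrderServiceImpl` and `TimingHandler` are likewise illustrative names. With the real libraries, one would typically implement `org.aopalliance.intercept.MethodInterceptor` and wrap `invocation.proceed()` between JAMon's `MonitorFactory.start(label)` and `monitor.stop()`, but the exact code in Figure 2 may differ.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for JAMon: hit count and total elapsed nanoseconds per label. */
final class StopwatchRegistry {
    static final Map<String, LongAdder> HITS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    static void record(String label, long elapsedNanos) {
        HITS.computeIfAbsent(label, k -> new LongAdder()).increment();
        TOTAL_NANOS.computeIfAbsent(label, k -> new LongAdder()).add(elapsedNanos);
    }
}

/** Hypothetical service interface, named after the processOrder counter in the text. */
interface OrderService {
    String processOrder(String orderId);
}

final class OrderServiceImpl implements OrderService {
    @Override
    public String processOrder(String orderId) {
        return "processed " + orderId;
    }
}

/** Interceptor-style handler: wraps a service and times every call made through it. */
final class TimingHandler implements InvocationHandler {
    private final Object target;

    private TimingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Label: implementation class name plus method name.
        String label = target.getClass().getName() + "." + method.getName();
        long start = System.nanoTime(); // start the stopwatch
        try {
            return method.invoke(target, args); // proceed to the real service
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the service's own exception, unwrapped
        } finally {
            StopwatchRegistry.record(label, System.nanoTime() - start); // stop, also on failure
        }
    }

    /** Returns a proxy for the given interface that times every call on the target. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new TimingHandler(target));
    }

    public static void main(String[] args) {
        OrderService service = wrap(new OrderServiceImpl(), OrderService.class);
        service.processOrder("42");
        StopwatchRegistry.HITS.forEach(
                (label, hits) -> System.out.println(label + ": " + hits.sum() + " hit(s)"));
    }
}
```

Stopping the stopwatch in a `finally` block ensures that calls ending in an exception are measured as well; otherwise slow failures, such as timeouts towards the mainframe, would disappear from the statistics.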
The next figure shows where we measure with JAMon in the web shop system.
Figure 3. Measuring the Java service layer with JAMon: incoming calls, in between Spring beans and outgoing calls.
Reporting with JARep
Since we now had this valuable response time information from production, we also wanted to see how response times changed after deploying a new release or a patch. For this, one of us started copy-pasting the counter statistics into Excel and resetting JAMon every morning at around 8:00. He added some VB script and generated a graph of the hits and average response times of a set of interesting counters over time. Now we could see the real effects of changes in production, as well as other interesting behavior such as the weekly pattern and the effect of increasing load on response times. However, as you can imagine, this approach had drawbacks: we did not measure on weekends or holidays, it was error prone, Excel took longer and longer to process all the data, and we wanted a higher resolution than one value per day.
To meet these needs, we developed some code and a JSP page to fetch JAMon data once every five minutes, store it in the database and generate reports from that data. This simple prototype has evolved into what we now call JARep. The next figure gives an overview of how JAMon and JARep are deployed.
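JARep's code is not shown in this article. As a sketch of the collection step only, assuming a supplier that returns the current statistics of all counters (in JARep that data comes from JAMon) and using an in-memory list where JARep uses a database, a five-minute snapshot job could look like this. All type and method names are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical view of one counter's current statistics, as read from the monitor. */
record CounterStats(String label, long hits, double avgMillis, double totalMillis, double maxMillis) {}

/** One stored row: a counter's statistics at a given moment. */
record Snapshot(Instant time, CounterStats stats) {}

final class SnapshotCollector {
    private final Supplier<List<CounterStats>> source; // assumed reader of the live statistics
    private final List<Snapshot> store = new ArrayList<>(); // in-memory stand-in for the database

    SnapshotCollector(Supplier<List<CounterStats>> source) {
        this.source = source;
    }

    /** Stores the current statistics of every counter under one shared timestamp. */
    synchronized void collect(Instant now) {
        for (CounterStats stats : source.get()) {
            store.add(new Snapshot(now, stats));
        }
    }

    synchronized List<Snapshot> rows() {
        return List.copyOf(store);
    }

    /** Runs collect() immediately and then every five minutes on a background thread. */
    ScheduledExecutorService startEveryFiveMinutes() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> collect(Instant.now()), 0, 5, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        SnapshotCollector collector = new SnapshotCollector(() -> List.of(
                new CounterStats("jvm1.OrderService.processOrder", 120, 85.0, 10200.0, 900.0)));
        collector.collect(Instant.now());
        collector.rows().forEach(System.out::println);
    }
}
```

`startEveryFiveMinutes()` returns the scheduler so that the application can shut it down when it stops. Note that JAMon statistics accumulate until they are reset, so per-interval values would have to be derived, for example by differencing consecutive snapshots or by resetting after each fetch; this sketch does not decide which approach JARep uses.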
Figure 4. Deployment of JAMon and JARep for a cluster of four JVMs.
The management of the web shop IT organization felt that the open source tools JMeter and JAMon had helped considerably in achieving their performance goals. In return, they offered to donate JARep to the open source community. This also gave them the opportunity to benefit from further development of the tool. I think this is a great example of how open source can and should work, with all parties winning.
We have put JARep on SourceForge under the GNU General Public License. We are currently using it and developing it further for multiple Xebia customers. We have added a Swing client, deployed with Java Web Start, for better graphics and a richer user experience in general.
The next figures show screenshots of the JSP page with fictional data based on the actual production data. The user chooses the counters, the time period, the time resolution and the diagrams to show, and can toggle aggregation and the option to ignore peaks. The first result screens show the four counters called ‘verwerkenOrder’ (processOrder). Their names are the fully qualified method names, prefixed with the alias of the JVM instance. When aggregating, the average of the four instances is taken for the average response time, the maximum of the four for the maximum time, and the sum of the four for the total time and the number of hits. This gives us the complete picture for the cluster.
If we want to see the values for each JVM individually, we just don’t aggregate, as seen in the following screenshot.
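The aggregation rules for the cluster can be sketched as follows. The record and class names are hypothetical, and the input is assumed to be the four per-JVM rows for one counter and one time slot.

```java
import java.util.List;

/** Hypothetical statistics of one counter on one JVM instance. */
record InstanceStats(String jvmAlias, long hits, double avgMillis, double maxMillis, double totalMillis) {}

/** Aggregated statistics of that counter for the whole cluster. */
record ClusterStats(long hits, double avgMillis, double maxMillis, double totalMillis) {}

final class ClusterAggregator {
    /** Mean of the averages, maximum of the maxima, sum of the hits and of the totals. */
    static ClusterStats aggregate(List<InstanceStats> perJvm) {
        long hits = 0;
        double sumOfAverages = 0;
        double max = 0;
        double total = 0;
        for (InstanceStats s : perJvm) {
            hits += s.hits();
            total += s.totalMillis();
            max = Math.max(max, s.maxMillis());
            sumOfAverages += s.avgMillis();
        }
        // Plain mean of the per-JVM averages, as described in the text.
        double avg = perJvm.isEmpty() ? 0 : sumOfAverages / perJvm.size();
        return new ClusterStats(hits, avg, max, total);
    }

    public static void main(String[] args) {
        List<InstanceStats> perJvm = List.of(
                new InstanceStats("jvm1", 10, 100.0, 500.0, 1000.0),
                new InstanceStats("jvm2", 20, 200.0, 900.0, 4000.0),
                new InstanceStats("jvm3", 30, 300.0, 700.0, 9000.0),
                new InstanceStats("jvm4", 40, 400.0, 600.0, 16000.0));
        System.out.println(aggregate(perJvm));
    }
}
```

For the sample data, the four instances aggregate to 100 hits, an average of 250 ms, a maximum of 900 ms and a total of 30000 ms. The plain mean of the four averages weights each JVM equally; dividing the total time by the total hits instead (30000 / 100 = 300 ms here) would give a hit-weighted average, which differs whenever the load is not evenly balanced across the cluster.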
Another useful feature is showing the top counters, such as the top 10 by average response time or the top 5 by total time. See the next figure, which shows the top 10 by total response time from January until March. It clearly shows the effect on response times of the newer, faster database hardware introduced on the 5th of March, for example in the green line.
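Selecting the top counters comes down to sorting on the chosen metric and keeping the first N. A minimal sketch, with hypothetical names and an in-memory list of per-counter summaries for the selected period:

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical per-counter summary for the selected time period. */
record CounterSummary(String label, double avgMillis, double totalMillis) {}

final class TopCounters {
    /** Returns the n counters with the highest value of the chosen metric, highest first. */
    static List<CounterSummary> top(List<CounterSummary> counters,
                                    ToDoubleFunction<CounterSummary> metric, int n) {
        return counters.stream()
                .sorted(Comparator.comparingDouble(metric).reversed())
                .limit(n)
                .toList();
    }

    public static void main(String[] args) {
        List<CounterSummary> all = List.of(
                new CounterSummary("searchProduct", 50.0, 5000.0),
                new CounterSummary("processOrder", 500.0, 1000.0),
                new CounterSummary("showCatalog", 100.0, 9000.0));
        System.out.println(top(all, CounterSummary::totalMillis, 2)); // top 2 by total time
        System.out.println(top(all, CounterSummary::avgMillis, 1));   // top 1 by average time
    }
}
```

Passing the metric as a function lets the same method produce both the "top by average" and "top by total" views.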
We are currently working on a number of features, such as notifications when thresholds are exceeded. Another item on the wish list is support for arbitrary values rather than only elapsed time, for instance order amounts to see (near) real-time sales. Newer versions of JAMon already support this.
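Threshold notifications are described here as work in progress, so the following is only a hypothetical sketch of the check itself, not JARep code; `ThresholdRule` and `ThresholdChecker` are illustrative names. Because each rule compares a plain `double`, the same check would apply to elapsed times as well as to other values such as order amounts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical rule: alert when a counter's current value exceeds a limit. */
record ThresholdRule(String label, double limit) {}

final class ThresholdChecker {
    /** Returns one alert message for every rule whose counter is above its limit. */
    static List<String> check(Map<String, Double> currentValues, List<ThresholdRule> rules) {
        List<String> alerts = new ArrayList<>();
        for (ThresholdRule rule : rules) {
            Double value = currentValues.get(rule.label());
            if (value != null && value > rule.limit()) { // counters without data are skipped
                alerts.add(rule.label() + " is " + value + ", above limit " + rule.limit());
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        Map<String, Double> averages = Map.of("processOrder", 350.0, "searchProduct", 80.0);
        List<ThresholdRule> rules = List.of(
                new ThresholdRule("processOrder", 300.0),
                new ThresholdRule("searchProduct", 100.0));
        // prints "processOrder is 350.0, above limit 300.0"
        check(averages, rules).forEach(System.out::println);
    }
}
```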
Key lessons learned
The key lessons learned are as follows.
- To predict response times in production and prevent problems, it is important to simulate user behavior in a representative way. You need to avoid unrealistic caching effects. You can achieve this by load testing with Apache JMeter.
- To see real application behavior in production, isolate bottlenecks and react quickly to incidents, it is important to monitor the actual response times of application components in production. This can be achieved by using JAMon.
- To discover changes over time, such as those caused by a new release, or long-term trends, it is important to store performance statistics at component level over time and to be able to analyze them visually. This further facilitates finding bottlenecks and resolving incidents quickly, and it helps you prepare for the future. This can be achieved by using JARep.
The open source tools Apache JMeter, JAMon and JARep each greatly facilitated our performance tuning approach for the web shop. Thanks to these tools we could tune based on evidence and speed up crucial parts of the application by a factor of ten. Moreover, the customer's IT organization now has a process in place that uses these tools to systematically assure performance by testing, tuning and monitoring their web shop.
About Jeroen Borgers
Jeroen Borgers is a senior consultant with Xebia - IT Architects. Xebia is an international IT consultancy and project organization specialized in enterprise Java and agile development. Jeroen helps customers with enterprise Java performance issues and leads Xebia's performance practice. He teaches Java Performance Tuning courses. He has worked on various Java projects in several industries since 1996, as a developer, architect, team lead, quality officer, mentor, auditor, performance tester, tuner and troubleshooter. He has specialized in Java performance since 2005. He is an internationally acknowledged speaker on that topic and regularly publishes articles. You can read more on Xebia's blog at http://blog.xebia.com.