Gathering Useful Data About Your Software


Every developer is occasionally guilty of writing code that they know is not perfect but is good enough for now. In fact, this is often the correct approach: it is usually more useful and appropriate to ship something that works than to spend excessive amounts of time striving for a paragon of algorithmic excellence.

However, every time you add one of these “good enough” solutions to your project, you should plan to revisit the code and clean it up when you have more time to spend on it.

In this article, excerpted from Re-Engineering Legacy Software, we'll talk about some of the steps you need to take when you're gathering data about your software.

We want to gather metrics about our legacy software, in order to help us answer the following questions:

  • What state is the code in to start with? Is it really in as bad shape as you think?
  • What should be our next target for refactoring at any given time?
  • How much progress are we making with our refactoring? Are we improving the quality fast enough to keep up with the entropy introduced by new changes?

First of all, we need to decide what to measure. This depends largely on the particular software, but the simple answer is: measure everything you can. You want as much raw data as you can get your hands on, to help guide you in your decision-making. This may include some of the following metrics, as well as many others that are not on the list.

Bugs and Coding Standard Violations

Static analysis tools can analyze a codebase and detect possible bugs or poorly written code. Static analysis involves looking through the code (either the human-readable source code or the machine-readable compiled code) and flagging any pieces of code that match a pre-defined set of patterns or rules.

For example, a bug-finding tool such as FindBugs may flag any code that fails to close an InputStream that it has opened, because this may result in a resource leak and should thus be considered a bug. A style-checking tool such as Checkstyle searches for code that violates a given set of style rules. For example it may flag any code that is incorrectly indented or is missing a Javadoc comment.

Of course the tools are not perfect, and they produce both false positives (e.g., flagging code as a bug even though it's not) and false negatives (e.g., failing to detect a serious bug). But they provide a very good indication of the overall state of the codebase and can be very useful when deciding your next refactoring target, as they can pinpoint “hotspots” of poor quality code.

For Java code, the big three tools are FindBugs, PMD and Checkstyle.


Performance Tests

One goal of your refactoring may be to improve the performance of a legacy system. If that is the case, you will need to measure that performance.

If you already have performance tests, great! If not, you will need to write some. You can start with very simple tests. For example, let's look at a system with the architecture shown in Figure 1.

Figure 1 Architecture of a system for tracking employees' time worked

The Audit component is in charge of processing the logs output by the nightly batches and generating reports from them. Say the batches output tens of thousands of logs every night, and the Audit component needs to handle a year's worth of logs. That adds up to a lot of data, so we're interested in maximizing the performance of the system.

If you wanted to test the performance of the Audit component, you could start with the following test:

  1. Start the system in a known state
  2. Feed it 1 million lines of dummy log data
  3. Time how long it takes to process the data and generate an audit report
  4. Shut down the system and clean up
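The four steps above can be sketched as a small shell harness. Everything here is a stand-in: the dummy data generator and the trivial awk "report" would be replaced by your real system's start-up, load, and processing hooks.

```shell
# 1. Start the system in a known state (placeholder for your start-up hook).

# 2. Feed it dummy log data: generate 1,000 lines as a stand-in workload.
seq 1000 | awk '{print "2015-01-01\tuser" $1 "\t" ($1 % 480)}' > dummy.log

# 3. Time how long the processing takes (here, a trivial awk "report").
start=$(date +%s)
awk -F'\t' '{sum += $3} END {print "minutes processed:", sum}' dummy.log
end=$(date +%s)
echo "elapsed: $((end - start))s"

# 4. Shut down the system and clean up.
rm -f dummy.log
```

Even a crude harness like this gives you a repeatable number to track from one refactoring to the next.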

Over time, you could extend the test to provide more fine-grained performance data. This might involve making changes to the system under test, e.g., adding performance logging or timing APIs to allow you to measure the performance of various parts of the system.

If starting up the whole system before the test (and tearing it down afterwards) is slow and cumbersome, you may instead want to write more fine-grained tests that measure the performance of individual sub-systems rather than the system as a whole. These tests are often easier to set up and quicker to run, but they depend on being able to run individual parts of the software in isolation. With a legacy application this is often easier said than done, so you may need to put in some refactoring effort before you are able to write tests like these.

Figure 2 The Audit component's processing pipeline

For example, say the Audit component has three stages in its processing pipeline, as shown in Figure 2: parsing the incoming log data, calculating the report's content and finally rendering the report and writing it to a file. You may want to write separate performance tests for each stage, so that you can find the performance bottlenecks in the system. But if the code for each processing stage is highly coupled, it's difficult to test any one of them in isolation. You will need to refactor the code into three separate classes before you can write your performance tests.

Monitor Performance in Production

If your software is a web application, it's very easy to collect performance data from the production system. Any decent web server will be able to output the processing time of every request to a log file. You could write a simple script to aggregate this data to calculate percentile response times per hour, per day, etc.

For example, assuming your web server outputs one access log file per day and the request processing time is the final column of the tab-separated file, the following shell snippet outputs the 99th percentile response time for a given day's accesses. You could run this script every night and email the results to the development team.

awk '{print $NF}' apache_access_$(date +%Y%m%d).log | \
    sort -n | \
    awk '{sorted[c]=$1; c++} END{print sorted[int(NR*0.99-0.5)]}'

Here's what each line of the snippet does:

  1. Select only the final column of the log file
  2. Sort the requests in order of increasing processing time
  3. Print the row that's 99% of the way down the file

Of course this script is very primitive, but at least it provides you with one simple, understandable piece of data per day that you can use to track the quality of your software. Using this as a starting point, you could write a more powerful program in your favorite scripting language. You may wish to consider:

  • Filtering out noise such as images, CSS, JavaScript and other static files
  • Calculating per-URL performance metrics so that you can flag performance hotspots
  • Outputting the results as a graph to make it easier to visualize performance
  • Building an online app to allow team members to view performance trends, once you have a few months' worth of data
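As a sketch of the first of those ideas, here is one way to filter static assets before the percentile calculation. The log layout is an assumption (request path in the first column, processing time in the last), and the sample data is made up purely so the snippet runs on its own:

```shell
# Toy access log: request path, then processing time in the last column.
printf '%s\n' '/index.html 10' '/logo.png 500' '/api/report 20' '/api/report 30' > access.log

# Drop static assets, then compute the 99th percentile as before.
grep -vE '\.(png|gif|jpg|css|js)( |$)' access.log | \
    awk '{print $NF}' | \
    sort -n | \
    awk '{sorted[c]=$1; c++} END{print sorted[int(NR*0.99-0.5)]}'

rm -f access.log
```

Note how the slow-but-irrelevant image request no longer skews the result.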

However, before you get too carried away, be aware that there are already plenty of tools available to help you with this kind of analysis. It's better to use existing open-source tools wherever possible, rather than hacking together a bunch of scripts that merely re-invent the wheel. One excellent tool that I use to measure and visualize the performance of production systems is Kibana.

Kibana is a tool that makes it easy to build dashboards for visualizing log data. It relies on a search engine called Elasticsearch, so before we can use Kibana we need to get our log data into an Elasticsearch index. I usually do this using a system called Fluentd. The great thing about this setup is that log data is fed directly from the production servers to Elasticsearch, making it visible on the Kibana dashboard within seconds. So you can use it not only for visualizing long-term performance trends of your system but also for monitoring the performance of the production system in real-time, allowing you to spot issues and react to them quickly.

Figure 3 shows a typical setup. Application logs are collected by Fluentd and forwarded in real-time to Elasticsearch, where they are indexed and made available for viewing on Kibana dashboards.

Figure 3 Visualizing site performance with Fluentd, Elasticsearch and Kibana

Figure 4 shows a detail from a Kibana dashboard. Kibana lets you visualize your log data in a number of different ways, including line and bar graphs.

Figure 4 Screenshot of a Kibana dashboard

With Kibana it's easy to build dashboards that are understandable to all members of your organization, not just developers. This can be useful when trying to communicate the benefits of your refactoring project or to demonstrate your team's progress to non-technical stakeholders. Permanently displaying the dashboard in a highly visible position in the office can also be a great motivator.

Error Counts

Measuring performance is all well and good, but it doesn't matter how fast your code runs if it's not doing its job correctly, i.e., giving users the results they expect and not throwing any errors.

A count of the number of errors happening in production is a simple but useful indicator of the quality of your software, as seen from the end-user's perspective. If your software is a website, then you could count the number of 500 Internal Server Error responses that your server generates per day. This information should be available in your web server's access logs, so you could write a script to count the error responses every day and email this number to your developers. If you want more detailed error information, such as stack traces, and you want to view errors in real time, I recommend a system called Sentry.
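As a minimal sketch, assuming an Apache combined log format where the status code is the ninth whitespace-separated column, counting a day's 500 responses is a one-liner. The sample log lines here are made up so the snippet is self-contained:

```shell
# Two toy access-log lines in combined log format; column 9 is the status.
printf '%s\n' \
  '1.2.3.4 - - [10/Oct/2015:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "UA"' \
  '1.2.3.4 - - [10/Oct/2015:13:55:37 +0000] "GET /report HTTP/1.1" 500 0 "-" "UA"' > access.log

# Count 500 responses; "+ 0" prints 0 rather than a blank when there are none.
awk '$9 == 500 {count++} END {print count + 0}' access.log

rm -f access.log
```

Against a real log you would simply point the awk command at that day's file, as in the percentile example earlier.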

If your software runs in customers' environments rather than your own data center, you don't have the luxury of having access to all production log data, but it is still possible to estimate the number of errors occurring. For example, you could introduce an automatic error reporting feature into your product that contacts your server whenever an exception occurs. A more low-tech solution would be to simply count the support requests you receive from angry customers!

Timing Common Tasks

Remembering that we are planning to improve the software and its development process as a whole, and not just the code, metrics such as the following may be useful.

Time to Set Up the Development Environment from Scratch

Every time a new member joins your team, ask them to time how long it takes to get a fully functional version of the software and all relevant development tools running on their local machine. You can use automation to reduce this time, thus lowering the barrier to entry for new developers and allowing them to start being productive as soon as possible, but that’s for another article.

Time Taken to Release or Deploy the Project

If creating a new release is taking a long time, it may be a sign that the process has too many manual steps. The process of releasing software is inherently amenable to automation, and automating the process will both speed it up and reduce the probability of human error. Making the release process easier and faster will encourage more frequent releases, which in turn leads to more stable software.

Average Time to Fix a Bug

This metric can be a good indicator of communication between team members. Often a developer will spend days tracking down a bug, only to find out later that another team member had seen a similar problem before and could have fixed the issue within minutes. If bugs are getting fixed more quickly, it's likely that the members of your team are communicating well and sharing valuable information.
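If your issue tracker can export bug open/close timestamps, the metric itself is trivial to compute. The bugs.csv file and its layout here are hypothetical, standing in for whatever export your tracker provides:

```shell
# Hypothetical export: bug id, then opened and closed times as epoch seconds.
printf '%s\n' 'id,opened,closed' '1,1000,87400' '2,2000,9200' > bugs.csv

# Average fix time in hours, skipping the header row.
awk -F, 'NR > 1 {total += $3 - $2; n++} \
    END {printf "average fix time: %.1f hours\n", total / n / 3600}' bugs.csv

rm -f bugs.csv
```

Run nightly, a number like this makes trends visible: a falling average suggests knowledge is being shared; a rising one may flag a communication problem.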

Commonly Used Files

Knowing which files in your project are edited most often can be very useful when deciding your next refactoring target. If one particular class is edited by developers very often, it's an ideal target for refactoring.

Note that this is slightly different from the other metrics described above, as it's not a measure of project quality, but it's still useful data.

You can make use of your version control system to calculate this data automatically. For example, if you're using git, here's a one-liner that will list the 10 files that were edited most often during the last 90 days.

git log --since="90 days ago" --pretty=format:"" --name-only | \
    grep . | \
    sort | uniq -c | \
    sort -nr | head -10

Here is the result of running it against a randomly chosen project, Apache Spark:


 59 project/SparkBuild.scala
 52 pom.xml
 46 core/src/main/scala/org/apache/spark/SparkContext.scala
 33 core/src/main/scala/org/apache/spark/util/Utils.scala
 28 core/pom.xml
 27 core/src/main/scala/org/apache/spark/rdd/RDD.scala
 21 python/pyspark/rdd.py
 21 docs/configuration.md
 17 make-distribution.sh
 17 core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala

This shows that, excluding build files, the most commonly edited file was SparkContext.scala. Thus, if this were a legacy codebase that we were looking to refactor, it would probably be wise to focus our attention on this file.
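If you want to exclude build files from the count up front, the one-liner accepts a git pathspec. Here it runs against a throwaway repository so the snippet is self-contained; the repository and file names are made up for illustration:

```shell
# Build a throwaway repo with one Scala file and one build file.
tmp=$(mktemp -d) && cd "$tmp" && git init -q
echo v1 > Main.scala && echo v1 > pom.xml
git add . && git -c user.email=a@example.com -c user.name=a commit -qm first
echo v2 > Main.scala
git add . && git -c user.email=a@example.com -c user.name=a commit -qm second

# Same pipeline as above, but only counting edits to Scala sources.
git log --since="90 days ago" --pretty=format:"" --name-only -- '*.scala' | \
    grep . | sort | uniq -c | sort -nr | head -10
```

Against a real codebase, you would of course drop the repo-building lines and run the pipeline in place.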

In applications that have been in production for a long time, many areas of the applications become fairly static, while development tends to cluster around just a few hotspots of functionality. In the case of our TimeTrack application, for example, you might find that the UI for registering hours worked hasn't changed for years, whereas managers are regularly coming up with feature requests for new and obscure ways to generate reports. In this case, it would obviously make sense to focus any refactoring efforts on the report generation module.

Measure Everything You Can

I've given you a few examples of data that you can collect, but this list is by no means exhaustive. When it comes to defining and measuring metrics, the possibilities are endless. A quick brainstorming session with your team will no doubt provide you with plenty of other ideas for metrics to measure.

Of course, just because you can measure something, it doesn't mean it's necessarily useful data. You could measure the number of Z's in your codebase, the average number of fingers on a developer's hand or the distance between the production server and the moon, but it's hard to see how these relate to quality!

Silly examples aside, it's always better to have too much information than not enough. A good rule of thumb is, if in doubt, measure it. As you and your team work with the data, you will gradually discover which metrics are most suitable for your particular needs. If a given metric is not working for you, feel free to drop it.
