Using Linux Perf Tools
The Linux performance analysis tool (perf) is a powerful tool to profile applications in GNU/Linux. It works by using a mix of hardware and software counters to provide a very efficient way to measure CPU cycles, instructions, cache misses, etc.
In this article, we are going to focus on how to use this tool to identify hotspots in our application, which is essential when debugging performance problems in production systems or improving the performance of an existing application.
First you need to install perf tools using your distribution package manager. Here are some examples:
#install perf in Archlinux sudo pacman -Sy perf #install perf in Fedora sudo dnf install perf
As a part of this tutorial, we are going to use a sample Node server application, which is located in this Github URL. The directory for this example is cpu_bound; there you can find the following files:
- This is a basic ExpressJS web service, with two basic services /fib and /fast more on those later.
- Is a bash script to run some concurrent network request, this will generate some traffic into our service.
Once you have this file and have installed the dependencies (npm install), we can proceed to check our services to see how well they perform. For this, just run test.sh to execute the following steps:
- Run the Node application.js server.
- Execute 150 curl concurrent requests against the two services (you can decrease this amount).
- Wait for the request to be complete, then kill the node process.
Using a VM with one core and 500 MB of ram, it takes 83 seconds to handle 150 requests. Now the question is, how to make it faster? Like Michael Abrash advises us, we should start by making no assumptions: "Assume nothing. I cannot emphasize this strongly enough—when you care about performance, do your best to improve the code and then measure the improvement. If you don’t measure performance, you’re just guessing, and if you’re guessing, you’re not very likely to write top-notch code."
The first step is to tell Node to emit profiling information. We can do this by passing the --perf-basic-prof flag to the Node command.
Now we can re-run our script, but this time we should tell perf to record what's happening inside the Node process by executing perf record in a new terminal.
perf record -F 99 -p `pgrep -n node` # Example: # F is the sampling frequency in hertz. # p is the proccess ID (PID). # Example: pgrep -n node --> returns the PID of a process called node.
Running this command will generate a file called perf.data in the same folder where you make the call, so I suggest that you pick the folder first before making the call.
Also for convenience, I wrote this script to execute the recording and remove legacy perf.data; you can use it to test multiple combinations of arguments.
This UI shows the following information:
- Overhead: shows the time spent by the CPU.
- Command: shows the process (in this case, node).
- Shared Object: shows the ELF (Executable Linux Format) or where this instruction comes from, in our case, the v8 API send the probing logs to a temporary location in /tmp/perf-5671. Perf takes this file and assembles the timeline.
- Symbol: this has two components; the first is the privilege where [.] means user level and [k] kernel level call, and the second component is the symbol itself.
Looking for Bottlenecks
Here is where the fun starts. We need to look at the data and try figure out where our precious CPU cycles are getting wasted. In this naive example, the answer is clear, but first, let's take a close look at the instruction using 99% of our CPU.
This line references a v8 interpreter/compiler trace, so let's get some more details:
- LazyCompile: This is an event generated by the v8 compiler, meaning that your function is being compiled and optimized only when needed.
- Asterisk*: This is good news, meaning that your code was successfully transformed to native code by the JIT compiler. If you see a tilde (~) that means your code is being interpreted (slow).
If we jump to the application.js:9, we would find that our computational expensive function is just a Fibonacci function. Now is time to plan some optimization to transform this function into constant time one; I will leave that as an exercise for the reader.
The perf command tool should help to reach a better diagnosis of performance issues not only in NodeJS, but also applications written in other languages. Also, it can collect a great variety of CPU/system events like cache-misses, branch miss-prediction, instructions executed, etc. without slowing down the running process.