
Next Generation Linux Tracing With BPF

Sasha Goldshtein takes us through a demo of his BPF tracing tool for Linux, available on GitHub.


BPF is the next Linux tracing superpower, and its potential just keeps growing. The BCC project just merged my latest PR, which introduces USDT probe support in BPF programs. Before we look at the details, here’s an example of what it means:

# trace -p $(pidof node) 'u:node:http__server__request "%s %s (from %s:%d)" arg5, arg6, arg3, arg4'
TIME     PID   COMM FUNC                  -
04:50:44 22185 node http__server__request GET /foofoo (from ::1:51056)
04:50:46 22185 node http__server__request GET / (from ::1:51056)

Yep, that’s Node.js running on Linux with my BPF trace tool attaching to the http__server__request probe, which is invoked for each incoming HTTP request. The argdist tool has support for these probes as well, and you can discover USDT probes easily using the tplist tool.

You can try this example yourself by building Node from source with the --with-dtrace configuration switch. If you have all the prerequisites, it should be as easy as:

git clone --depth=1 https://github.com/nodejs/node
cd node
./configure --with-dtrace
make

Discovering USDT Probes

USDT probes are static tracing markers placed in an executable or library. The probes are just nop instructions emitted by the compiler, whose locations are recorded in the notes section of the ELF binary. Tracing apps can instrument these locations and retrieve probe arguments. Specifically, uprobes (which BPF already supports) can be used to instrument the traced location.

To discover whether an executable or library contains USDT probes, run readelf -n and look for NT_STAPSDT notes. For example:

$ readelf -n node
  stapsdt 0x00000067 NT_STAPSDT (SystemTap probe descriptors)
    Provider: node
    Name: http__server__response
    Location: 0x0000000000ef36c6, Base: 0x0000000001294154, Semaphore: 0x0000000001606c36
    Arguments: 8@%rax 8@-1128(%rbp) -4@-1132(%rbp) -4@-1136(%rbp)
  stapsdt 0x00000060 NT_STAPSDT (SystemTap probe descriptors)
    Provider: node
    Name: http__client__response
    Location: 0x0000000000ef3b7a, Base: 0x0000000001294154, Semaphore: 0x0000000001606c3a
    Arguments: 8@%rdx 8@-1128(%rbp) -4@%eax -4@-1136(%rbp)
  stapsdt 0x00000089 NT_STAPSDT (SystemTap probe descriptors)
    Provider: node
    Name: http__client__request
    Location: 0x0000000000ef412b, Base: 0x0000000001294154, Semaphore: 0x0000000001606c38
    Arguments: 8@%rax 8@%rdx 8@-2168(%rbp) -4@-2172(%rbp) 8@-2216(%rbp) 8@-2224(%rbp) -4@-2176(%rbp)
  stapsdt 0x00000089 NT_STAPSDT (SystemTap probe descriptors)
    Provider: node
    Name: http__server__request
    Location: 0x0000000000ef4854, Base: 0x0000000001294154, Semaphore: 0x0000000001606c34
    Arguments: 8@%r14 8@%rax 8@-4328(%rbp) -4@-4332(%rbp) 8@-4288(%rbp) 8@-4296(%rbp) -4@-4336(%rbp)
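
If you would rather script this step, the note fields can be pulled apart with a few lines of Python. This is only a sketch: the function name parse_stapsdt_notes is my own, the sample text is a trimmed copy of the output above, and it assumes readelf keeps the "Provider / Name / Location / Arguments" layout shown here.

```python
import re

# A trimmed copy of the `readelf -n node` output shown above.
SAMPLE = """\
  stapsdt 0x00000067 NT_STAPSDT (SystemTap probe descriptors)
    Provider: node
    Name: http__server__response
    Location: 0x0000000000ef36c6, Base: 0x0000000001294154, Semaphore: 0x0000000001606c36
    Arguments: 8@%rax 8@-1128(%rbp) -4@-1132(%rbp) -4@-1136(%rbp)
  stapsdt 0x00000089 NT_STAPSDT (SystemTap probe descriptors)
    Provider: node
    Name: http__server__request
    Location: 0x0000000000ef4854, Base: 0x0000000001294154, Semaphore: 0x0000000001606c34
    Arguments: 8@%r14 8@%rax 8@-4328(%rbp) -4@-4332(%rbp) 8@-4288(%rbp) 8@-4296(%rbp) -4@-4336(%rbp)
"""


def parse_stapsdt_notes(text):
    """Return one dict per NT_STAPSDT note found in `readelf -n` output."""
    probes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if "NT_STAPSDT" in line:
            # A new note header starts a new probe record.
            current = {}
            probes.append(current)
        elif current is None:
            continue  # Skip anything before the first stapsdt note.
        elif line.startswith("Provider:"):
            current["provider"] = line.split(":", 1)[1].strip()
        elif line.startswith("Name:"):
            current["name"] = line.split(":", 1)[1].strip()
        elif line.startswith("Location:"):
            # Location, Base and Semaphore share one line as hex values.
            for key, value in re.findall(r"(\w+): (0x[0-9a-fA-F]+)", line):
                current[key.lower()] = int(value, 16)
        elif line.startswith("Arguments:"):
            current["args"] = line.split(":", 1)[1].split()
    return probes


probes = parse_stapsdt_notes(SAMPLE)
for p in probes:
    print("%s:%s at 0x%x with %d args"
          % (p["provider"], p["name"], p["location"], len(p["args"])))
```

On a real binary you would feed it the captured output of readelf -n instead of the embedded sample.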

Parsing this output isn’t hard, but can be a bit confusing. If you prefer something more compact, try the tplist tool from BCC. For example, here’s how to discover all the probes in the node executable, and then list the variables available for one specific probe:

$ tplist -l /home/vagrant/node/node
/home/vagrant/node/node node:gc__start
/home/vagrant/node/node node:gc__done
/home/vagrant/node/node node:net__server__connection
/home/vagrant/node/node node:net__stream__end
/home/vagrant/node/node node:http__server__response
/home/vagrant/node/node node:http__client__response
/home/vagrant/node/node node:http__client__request
/home/vagrant/node/node node:http__server__request

$ tplist -v -l /home/vagrant/node/node 'node:gc__start'
/home/vagrant/node/node node:gc__start [sema 0x1606c3c]
  location 0xef2994 raw args: 4@%esi 4@%edx 8@%rdi
    4 unsigned bytes @ register %esi
    4 unsigned bytes @ register %edx
    8 unsigned bytes @ register %rdi

This output means that the gc__start probe has three arguments — two 32-bit values and one 64-bit value. To understand their meaning, you’d need access to the source or documentation of these probes. For example:

[vagrant@fedora-22-x86-64 tools]$ grep -r gc__start ~/node/*
/home/vagrant/node/src/node.stp:probe node_gc_start = process("node").mark("gc__start")
/home/vagrant/node/src/node_provider.d: probe gc__start(int t, int f, void *isolate);

Looking at the node.stp file in more detail, we can find the way these flags were meant to be interpreted. But instead let’s look at another probe, which is slightly more interesting:

$ grep -A 20 http__server__request node/src/node.stp
probe node_http_server_request = process("node").mark("http__server__request")
  remote = user_string($arg3);
  port = $arg4;
  method = user_string($arg5);
  url = user_string($arg6);
  fd = $arg7;

  probestr = sprintf("%s(remote=%s, port=%d, method=%s, url=%s, fd=%d)",

Now you know that the sixth argument is the request URL, and the fifth argument is the HTTP method for that request.
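
The same reading of node.stp can be done mechanically. The sketch below maps each variable in the probe body to its argument number, and then builds a probe specifier in the style of the trace example at the top of this post. The helper names map_probe_args and trace_spec are hypothetical, and the regular expression only understands the two assignment shapes that appear in this excerpt.

```python
import re

# Assignment lines copied from the node.stp excerpt above.
STP_BODY = """\
  remote = user_string($arg3);
  port = $arg4;
  method = user_string($arg5);
  url = user_string($arg6);
  fd = $arg7;
"""


def map_probe_args(stp_body):
    """Map each variable name to (argument index, is_string)."""
    pattern = re.compile(r"(\w+)\s*=\s*(user_string\()?\$arg(\d+)\)?;")
    mapping = {}
    for name, wrapper, index in pattern.findall(stp_body):
        # A user_string(...) wrapper means the argument is a string pointer.
        mapping[name] = (int(index), bool(wrapper))
    return mapping


def trace_spec(provider, probe, mapping, names):
    """Build a probe specifier for the chosen variables, in order."""
    fmt = " ".join("%s" if mapping[n][1] else "%d" for n in names)
    args = ", ".join("arg%d" % mapping[n][0] for n in names)
    return 'u:%s:%s "%s" %s' % (provider, probe, fmt, args)


mapping = map_probe_args(STP_BODY)
print(trace_spec("node", "http__server__request", mapping, ["method", "url"]))
```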

Other libraries known to contain USDT probes include libc, libpthread, libm, and libstdc++. Try them out with tplist and have fun!

Instrumenting a Program With USDT Probes

To instrument your own program with USDT probes, you need the systemtap-sdt-dev package for your system. Then, the simplest approach is to just #include <sys/sdt.h> and start using the DTRACE_PROBE macros:

if (value < 0) {
  DTRACE_PROBE1(myapp, value_was_negative, value);
}

Occasionally, you’ll want to emit a probe only when a tracing program is attached. This is especially relevant if generating the traced parameters is expensive, and you’d rather do it only when you know someone cares about the probe. For this behavior, you’ll need a bit more of the USDT infrastructure. First, you need a .d file that describes your probes:

provider myapp {
  probe value_was_negative(int val);
  probe app_exited(int exit_code, char *result);
};

Next, use the dtrace tool (it ships with the systemtap-sdt-dev package; don’t confuse it with DTrace of Solaris fame) to generate a header file to include and an object file to link with:

$ dtrace -G -s myapp.d -o myapp-trace.o
$ dtrace -h -s myapp.d -o myapp-trace.h

The generated header file has declarations for your probes, such as MYAPP_VALUE_WAS_NEGATIVE(), and macros that you can use to test if the probe is enabled, such as MYAPP_VALUE_WAS_NEGATIVE_ENABLED(). The way this works is that each probe now has a global variable that needs to be incremented by the tracing program to indicate that the probe should be enabled (this global variable is also called the probe’s semaphore). All that’s left is to actually invoke the probe:

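#include "myapp-trace.h"

int main(void) {
  /* Fire the probe only when a tracer has enabled it. */
  if (MYAPP_APP_EXITED_ENABLED())
    MYAPP_APP_EXITED(0, "Everything's dandy");
  return 0;
}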

Finally, link with myapp-trace.o to obtain an instrumented executable. To verify everything’s right, run tplist or readelf -n on the end result.

BCC Support

Adding BCC support for USDT probes entailed writing a probe enumerator (USDTReader), which parses the NT_STAPSDT ELF notes and uses this information to determine which locations to probe. Probes associated with semaphores need to be enabled by incrementing the value stored at the semaphore’s address in the target process. Yes, this actually requires poking /proc/$PID/mem to enable the probe.
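
To make the semaphore step concrete, here is a minimal sketch of the read-increment-write cycle. It is not the code in BCC; the function name bump_semaphore is made up for illustration. It assumes the semaphore is a 16-bit unsigned counter in native byte order, and that the address from the ELF note can be used as-is, which holds for a non-relocated executable but would need the load address added for a shared library.

```python
import struct


def bump_semaphore(mem_path, address, delta=1):
    """Add `delta` to the 16-bit counter stored at `address` in `mem_path`.

    For a live process, mem_path would be /proc/<pid>/mem and the caller
    needs permission to write to it (typically root). Returns the new value.
    """
    with open(mem_path, "r+b", buffering=0) as mem:
        mem.seek(address)
        (value,) = struct.unpack("=H", mem.read(2))
        value = (value + delta) & 0xFFFF  # Stay within 16 bits.
        mem.seek(address)
        mem.write(struct.pack("=H", value))
    return value
```

A tracer would call bump_semaphore("/proc/%d/mem" % pid, semaphore_address) when attaching, and call it again with delta=-1 when detaching, so that the probe goes quiet once nobody is listening.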

Next, we had to parse the probe’s arguments and determine how to retrieve them at the probe’s locations. The ELF note description contains argument data such as -4@-4(%rbp), which becomes a series of statements in the BPF program, such as:

int tmp;
bpf_probe_read(&tmp, sizeof(tmp), (void *)(ctx->bp - 4));
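
The translation from an argument descriptor to those statements can be sketched in Python. Again, this is an illustration rather than the actual BCC implementation: parse_arg and emit_read are hypothetical names, the register table is a small assumed subset of the x86-64 pt_regs fields, and only plain registers and offset(%register) operands are handled.

```python
import re

# Assumed mapping from x86-64 register names in the ELF note to pt_regs
# field names as seen from a BPF program. Deliberately incomplete.
PT_REGS_FIELD = {
    "rbp": "bp", "rsp": "sp", "rax": "ax", "rdx": "dx",
    "rdi": "di", "rsi": "si", "r14": "r14",
}

# A negative size in the descriptor means a signed value.
C_TYPE = {-4: "int", 4: "unsigned int",
          -8: "long long", 8: "unsigned long long"}

ARG_RE = re.compile(r"^(-?\d+)@(?:(-?\d+)\()?%(\w+)\)?$")


def parse_arg(desc):
    """Split a descriptor such as '-4@-4(%rbp)' into (size, offset, register).

    offset is None when the value lives directly in the register.
    """
    m = ARG_RE.match(desc)
    if m is None:
        raise ValueError("unsupported argument descriptor: %r" % desc)
    offset = m.group(2)
    return int(m.group(1)), (None if offset is None else int(offset)), m.group(3)


def emit_read(desc, var="tmp"):
    """Generate the C statements that fetch one probe argument."""
    size, offset, reg = parse_arg(desc)
    ctype = C_TYPE[size]
    field = PT_REGS_FIELD[reg]
    if offset is None:
        # Register operand: copy straight out of the saved registers.
        return "%s %s = (%s)ctx->%s;" % (ctype, var, ctype, field)
    # Memory operand: read from user memory at register + offset.
    sign = "+" if offset >= 0 else "-"
    return ("%s %s;\n"
            "bpf_probe_read(&%s, sizeof(%s), (void *)(ctx->%s %s %d));"
            % (ctype, var, var, var, field, sign, abs(offset)))


print(emit_read("-4@-4(%rbp)"))
```

Sub-registers such as %esi or %eax, which appear in some of the probes listed earlier, are not in the table and would need their own handling.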

The end result is that you can use the trace and argdist tools to instrument USDT probes and access their arguments. Node is an interesting example, and so are libpthread, libc, and a bunch of other applications compiled with USDT support. See the man pages and examples for argdist and trace on the BCC repository for more information.


Summary

BCC now has support for USDT probes, which are used by various user-mode processes and libraries for static instrumentation and tracing. The trace and argdist tools can record and display USDT tracing data, and the tplist tool can discover USDT probes in an existing process, library, or executable.



Published at DZone with permission of Sasha Goldshtein, DZone MVB. See the original article here.
