Evaluation of PMP Profiling Tools
Take a look at how tools including GDB, eu-stack, and Quickstack stack up against one another when it comes to stack tracing and PMP profiling before debugging.
In this blog post, we’ll look at some of the available PMP profiling tools.
While debugging or analyzing issues with Percona Server for MySQL, we often need a quick understanding of what’s happening on the server. Percona experts frequently use the pt-pmp tool from Percona Toolkit (inspired by the Poor Man's Profiler).
The pt-pmp tool collects application stack traces with GDB and then post-processes them. From this, you get a condensed, ordered list of the stack traces. The list helps you understand where the application spent most of its time: either running something or waiting for something.
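The post-processing step can be sketched in a few lines of shell. This is a minimal illustration in the spirit of the original Poor Man's Profiler, not pt-pmp's actual code. The function name aggregate_stacks is hypothetical, and the sketch assumes GDB's usual "thread apply all bt" output: a "Thread ..." header followed by "#N" frame lines.

```shell
#!/bin/sh
# Collapse backtrace output (read from stdin) into one comma-separated line
# per thread, then count identical stacks and list the most frequent first.
aggregate_stacks() {
  awk '
    # A "Thread N (...)" header starts a new stack: flush the previous one.
    /^Thread/ { if (s != "") print s; s = "" }
    # Frame lines start with "#N". The function name is field 4 when the
    # frame carries an address ("#1  0x... in func (...)"), else field 2.
    substr($1, 1, 1) == "#" {
      fn = ($2 ~ /^0x/) ? $4 : $2
      s = (s == "") ? fn : s "," fn
    }
    END { if (s != "") print s }
  ' | sort | uniq -c | sort -rn
}

# Usage:
#   gdb -ex "set pagination 0" -ex "thread apply all bt" -batch \
#       -p "$(pidof mysqld)" | aggregate_stacks
```

Each output line is a count followed by a collapsed stack, so the stacks shared by the most threads appear at the top.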
Getting a profile with pt-pmp is handy, but it has a cost: it is quite intrusive. To get stack traces, GDB has to attach to each thread of your application, which interrupts it. Under high load, these stops can be significant (15, 30, or even 60 seconds). This means that the pt-pmp approach is not really usable in production.
Below, I’ll describe how to reduce GDB overhead, and also what other tools can be used instead of GDB to get stack traces.
By default, the symbol resolution process in GDB is very slow. As a result, getting stack traces with GDB is quite intrusive (especially under high loads).
There are two options available that can help notably reduce GDB tracing overhead.
Use the readnever patch. RHEL and other distros based on it include GDB with the readnever patch applied. This patch lets you skip unnecessary symbol resolution with the --readnever option. As a result, you get up to 10 times better speed.
Use gdb_index. This feature was added to address the symbol resolution issue by creating and embedding a special index in the binaries. The index is quite compact: I created and embedded gdb_index for the Percona Server binary, and it increased the size by around 7-8MB. Adding the gdb_index makes obtaining stack traces and resolving symbols two to three times faster.
# to check if index already exists:
readelf -S mysqld | grep gdb_index
# to generate index (writes /tmp/mysqld.gdb-index):
gdb -batch mysqld -ex "save gdb-index /tmp" -ex "quit"
# to embed index:
objcopy --add-section .gdb_index=/tmp/mysqld.gdb-index --set-section-flags .gdb_index=readonly mysqld mysqld
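Before choosing between the two options, it can help to check what the local setup already supports. The sketch below is illustrative only. The function names readnever_flag and has_gdb_index are hypothetical, and it assumes that a GDB build supporting the option lists --readnever in its --help output.

```shell
#!/bin/sh
# Reads GDB help text on stdin and prints "--readnever" if the option is
# listed there (assumption: supporting builds mention it in --help).
readnever_flag() {
  if grep -q -e '--readnever'; then
    echo "--readnever"
  fi
}

# Succeeds when the given binary already has a .gdb_index section.
has_gdb_index() {
  readelf -S "$1" 2>/dev/null | grep -q '\.gdb_index'
}

# Usage (the binary path is an example):
#   flag=$(gdb --help | readnever_flag)
#   has_gdb_index /usr/sbin/mysqld && echo "index present"
#   gdb $flag -ex "set pagination 0" -ex "thread apply all bt" \
#       -batch -p "$(pidof mysqld)"
```

If neither check succeeds, plain GDB is the only choice on that host, and the overhead described above applies.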
The eu-stack tool from the elfutils package prints the stack for each thread in a process or core file. Symbol resolution is not well optimized in eu-stack either: by default, if you run it under load, it takes even more time than GDB. But eu-stack lets you skip resolution completely, so it can capture stack frames quickly and resolve them later without any impact on the workload.
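One way to picture the "capture now, resolve later" approach is shown below. This is a rough sketch, not how pt-pmp does it. The function name unique_addrs and the /tmp/raw_stacks.txt path are hypothetical, and the sketch assumes that eu-stack -q prints one "#N 0xADDRESS" line per frame.

```shell
#!/bin/sh
# Reads raw eu-stack -q output on stdin and prints each distinct frame
# address once, so that every address needs to be resolved only one time.
unique_addrs() {
  sed -n 's/^#[0-9][0-9]* *\(0x[0-9a-fA-F][0-9a-fA-F]*\).*/\1/p' | sort -u
}

# Usage:
#   pid=$(pidof mysqld)
#   # Fast step: capture unresolved frames.
#   eu-stack -q -p "$pid" > /tmp/raw_stacks.txt
#   # Slow step, done afterwards: map addresses to function names.
#   unique_addrs < /tmp/raw_stacks.txt | xargs eu-addr2line -f -p "$pid"
```

Resolving with eu-addr2line -p relies on the process still running with the same memory mappings; for a process that has exited, the addresses would have to be matched against the binaries by other means.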
Quickstack is a tool from Facebook that gets stack traces with minimal overhead.
Now, let’s compare all the above profilers. We will measure the time each one needs to take all the stack traces from Percona Server for MySQL under a high load (sysbench OLTP_RW with 512 threads).
The results show that eu-stack (without resolving) got all stack traces in less than a second, and that Quickstack and GDB (with the readnever patch) got very close results. For other profilers, the time was around two to five times higher. This is quite unacceptable for profiling (especially in production).
There is one more note regarding the pt-pmp tool. The current version only supports GDB as the profiler. However, there is a development version of this tool that supports GDB, Quickstack, eu-stack, and eu-stack with offline symbol resolving. It also allows you to look at stack traces for specific threads (tids). So for instance, in the case of Percona Server for MySQL, we can analyze just the purge, cleaner, or IO threads.
Below are the command lines used in testing:
# gdb & gdb+gdb_index
time gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p `pidof mysqld` > /dev/null
# gdb+readnever
time gdb --readnever -ex "set pagination 0" -ex "thread apply all bt" -batch -p `pidof mysqld` > /dev/null
# eu-stack
time eu-stack -s -m -p `pidof mysqld` > /dev/null
# eu-stack without resolving
time eu-stack -q -p `pidof mysqld` > /dev/null
# quickstack - 1 sample
time quickstack -c 1 -p `pidof mysqld` > /dev/null
# quickstack - 1000 samples
time quickstack -c 1000 -p `pidof mysqld` > /dev/null
Published at DZone with permission of Alexey Stroganov , DZone MVB. See the original article here.