Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Troubleshooting With an Interactive Drill-down in a Spectrogram Using Csysdig

DZone's Guide to

Troubleshooting With an Interactive Drill-down in a Spectrogram Using Csysdig

Decisions that can seem negligible, like choosing which API you use when writing to disk, can have a big impact on the performance of your application. Measuring things tends to be extremely important, especially when paired with the right visualizations and the ability to drill down and investigate.

· Performance Zone
Free Resource

Transform incident management with machine learning and analytics to help you maintain optimal performance and availability while keeping pace with the growing demands of digital business with this eBook, brought to you in partnership with BMC.

Don’t want to read? Just watch this 30 second screen capture of spectrogram drill-down in csysdig.


Background on the Sysdig Spectrogram

Around a year ago, I published a blog post featuring a new sysdig chisel called spectrogram. I suggest you go read it, but if you don’t have time here’s a recap of how it works:

  • It captures every system call (e.g. open, close, read, write, socket, etc.) and it measures the call’s latency every 2 seconds. It organizes the calls into buckets ranging from 1 nanosecond to 10 seconds.
  • The spectrogram uses color to indicate how many calls are in each bucket
    • Black means no calls during the last sample
    • Shades of green mean 0 to 100 calls
    • Shades of yellow mean hundreds of calls
    • Shades of red mean thousands of calls or more

As a consequence, the left side of the diagram tends to show stuff that is fast, while the right side shows stuff that is slow.

csysdig spectrogram output









The spectrogram is great at showing patterns and outliers in the behavior of a system or application. But, until now you are left to your own devices to troubleshoot "what is causing this spot in the chart?"

With a little more work, I was able to include it in csysdig, sysdig’s command line ncurses UI, and also enhance it to be more useful.

Making the Spectrogram Interactive in Csysdig

Including the spectrogram in an interactive UI like csysdig makes it possible to enrich it in a couple of nice ways: arbitrary selections and drill-down into events.

The spectrogram can now be applied very easily to arbitrary selections in csysdig by clicking on F12 (for a full spectrogram) or SHIFT+F12 (for a spectrogram that includes file I/O activity only). As an example, you can easily view spectrogram information for a process, a directory, a file, a container, or, thanks to csysdig’s integration with Kubernetes, a pod or a service. Having such a visualization just one keyboard stroke away makes it easy to use it for exploration and troubleshooting.

Even more interestingly, now you can use the mouse to drag over areas of a spectrogram and observe the events that belong to that area in the spectrogram.

csysdig spectrogram output

A rectangular selection in the spectrogram isolates events that belong to a latency range (the width of the rectangle) in a given interval in time (the height of the rectangle).

Once a selection is made, you are brought to a page that the events respecting the criteria of the selection.

Besides being positively fun, this drill down functionality tends to be very useful. Now, your answer is just one mouse drag away.

Here’s an example.

Writing To Disk in C, fwrite() vs write()

We are going to profile the following two simple C programs:

    #include <stdlib.h>
    #include <fcntl.h>

    int main()
    {
        int j;
        char* buf = (char*)malloc(5000 * 1000);
        int fd;

        fd = open("write.bin", O_CREAT | O_WRONLY);

        for(j = 0; j < 5000; j++)
        {
            write(fd, buf, j * 1000);
        }
    }

And:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        int j;
        char* buf = (char*)malloc(5000 * 1000);

        FILE* f;
        f = fopen("write.bin", "w+");

        for(j = 0; j < 5000; j++)
        {
            fwrite(buf, j * 1000, 1, f);
        }
    }

They both do exactly the same things—they perform 5000 writes of increasing sizes to disk. The first one (on top) works at the file descriptor level, while the second one uses the C standard I/O library.

Let’s take a look at the spectrogram of the first program:

csysdig spectrogram output













As expected, latencies increase with write buffer sizes at a pretty regular pace. If we observe the size of the disk writes by drilling down from any portion of the chart, it’s easy to see that they correspond to the ones that the program specifies.

csysdig terminal output





The three isolated dots on the right of the spectrogram are slow writes that take between half a second and a second, likely because of OS cache flushes.

Now, let’s looks at the spectrogram of the second program:

csysdig spectrogram output













In this case, we clearly see a bimodal behavior, with an activity area in the tens of microseconds and another one between 1 and 30 milliseconds. What’s going on?

The right side contains a bunch of increasing size writes that sort of match the buffer size specified by the user in the call to fwrite(), but just a bit smaller. If we select an area in the left side of the chart, on the other hand, we can clearly see that these are all 4096 bytes disk writes.

csysdig terminal output

These writes are fast because they are smaller. This bimodal behavior is caused by the additional layer of buffering introduced by fwrite(). Data is stored by fwrite() before being flushed to disk, the first 4KB are written right away and then the rest of the buffer is flushed in (typically) a single second write.

Which way is the best one? Buffered or unbuffered?

Well, as usual, it depends. In the case above, with big and sequential writes, avoiding the additional buffering offers more predictable latency and slightly better performance. On the other hand, check out what happens when we do many tiny 1-byte writes:

    #include <stdlib.h>
    #include <fcntl.h>

    int main()
    {
        int j;
        char buf[1];
        int fd;

        fd = open("write.bin", O_CREAT | O_WRONLY);

        for(j = 0; j < 5000000; j++)
        {
            write(fd, buf, 1);
        }
    }

And:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        int j;
        char buf[1];
        FILE* f;

        f = fopen("write.bin", "w+");

        for(j = 0; j < 5000000; j++)
        {
            fwrite(buf, 1, 1, f);
        }
    }

This is the spectrum of the top program:

csysdig spectrogram output

As you can see, a lot of write system calls cause the program to take tens of seconds to complete, and generate quite a bit of system stress:

csysdig terminal output




This, on the other hand, is the spectrum of the second program:

csysdig spectrogram output

Pretty short, eh? Everything is coalesced into a few 4096KB buffers, the number of system calls is very low, and the program completes in less than half a second.

csysdig terminal output

Conclusions

Decisions that can seem negligible, like choosing which API you use when writing to disk, can have a big impact on the performance of your application.

The complexity of modern systems often makes it hard to determine what the implications of these decisions are. Measuring things tends to be extremely important, especially when paired with the right visualizations and the ability to drill down and investigate.

Luckily, sysdig and csysdig are very easy to install and can be a valuable aid.

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

Topics:
devops ,troubleshooting ,linux ,tools & methods ,command line ,command line interface ,systems administration

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}