Monitoring on Unix from scratch

Linux, and Unix-like systems in general, bundle many useful command-line tools for monitoring the resources of a machine for performance purposes, tracking parameters such as memory in use, CPU utilization, or disk requests.

The commands listed here, complete with flags and sample output, should be run while the system is under some kind of load, in order to find out, for example, which resources are scarce (usually I/O) and which are not the bottleneck and so don't need upgrading.

There is also a different family of tools that run their own workloads on an otherwise idle system, like Iozone and other benchmarks; this article focuses on monitoring tools that run on an existing production system.


top is perhaps the most famous monitoring tool, one that Linux users like me commonly use at home. By default it works in real time, showing the processes ordered by their CPU utilization.

You can use the -d flag followed by a number of seconds to change the update interval. More importantly, -b runs top in batch mode, which writes the measurements to standard output so that they can be redirected to a file (stop it with CTRL-C).

[11:03:58][giorgio@Desmond:~]$ top -b > top.log
^C[11:04:10][giorgio@Desmond:~]$ head top.log
top - 11:04:09 up  1:09,  3 users,  load average: 0.00, 0.08, 0.12
Tasks: 166 total,   1 running, 164 sleeping,   0 stopped,   1 zombie
Cpu(s):  6.8%us,  2.0%sy,  0.0%ni, 88.0%id,  3.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3344984k total,  2202872k used,  1142112k free,   451016k buffers
Swap:  3905532k total,        0k used,  3905532k free,   929932k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
  920 root      20   0 77108  19m 9348 S    2  0.6   2:16.51 Xorg               
 2515 giorgio   20   0  156m  16m  12m S    2  0.5   0:06.72 gnome-terminal     
 2933 giorgio   20   0  802m 291m  33m S    2  8.9   3:54.36 firefox-bin 
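
Because batch mode writes plain text, a log such as top.log can be filtered with standard tools. The sketch below pulls the I/O wait percentage out of every summary header. The function name top_iowait is hypothetical, and the pattern assumes the "3.1%wa," header layout shown in the sample above; other versions of top print this header differently, so adjust the pattern to your system.

```shell
# Hypothetical helper: print the %wa value of every "Cpu(s):" line read from stdin.
# Assumes the header layout shown in the sample above, e.g. "3.1%wa,".
top_iowait() {
  awk '/^Cpu\(s\):/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /%wa,?$/) {
        sub(/%wa,?$/, "", $i)   # strip the "%wa," suffix, keep the number
        print $i
      }
  }'
}

# Demo on the header line from the sample log above; prints 3.1
printf '%s\n' 'Cpu(s):  6.8%us,  2.0%sy,  0.0%ni, 88.0%id,  3.1%wa,  0.0%hi,  0.0%si,  0.0%st' | top_iowait
```

On a saved log this becomes top_iowait < top.log, while top -b -n 1 | top_iowait takes a single sample (-n sets the number of iterations).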


free displays current RAM statistics. You can use the -m switch to display memory quantities in megabytes instead of kilobytes, the default unit, which makes for long, hard-to-read figures.

By default, free runs only once; the -s flag followed by a number of seconds, such as -s 5, makes it print the information repeatedly at that interval.

The used and free columns show the amount of used and free RAM in the system:

[11:05:29][giorgio@Desmond:~]$ free -m
             total       used       free     shared    buffers     cached
Mem:          3266       2283        983          0        440        908
-/+ buffers/cache:        934       2332
Swap:         3813          0       3813

The -/+ buffers/cache line is the one that you should read. Since empty RAM is wasted RAM, the system keeps blocks that have yet to be written to disk in buffers, and blocks that would otherwise have to be read from disk again in the cache. When needed, this RAM can be freed, so my Ubuntu installation with Firefox and some other open applications occupies 934 megabytes of RAM, not 2283.
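
The arithmetic behind that line can be reproduced from the Mem: row. This is a minimal sketch, assuming the column layout shown above (total, used, free, shared, buffers, cached); the function name free_real_used is hypothetical. Newer versions of free lay out their columns differently, so check your own output before relying on the field positions.

```shell
# Hypothetical helper: recompute "used minus buffers and cache" from the Mem: row.
# Assumes the column order total, used, free, shared, buffers, cached.
free_real_used() {
  awk '/^Mem:/ { print $3 - $6 - $7 }'
}

# Demo on the Mem: row from the sample above; prints 935
printf '%s\n' 'Mem:          3266       2283        983          0        440        908' | free_real_used
```

The result is one megabyte off the 934 printed by free, presumably because each column is rounded to megabytes separately. On a live system the call would be free -m | free_real_used.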


sar reports CPU information by default (which is what I use it for), at an interval in seconds that you choose:

[11:11:53][giorgio@Desmond:~]$ sar 1
Linux 2.6.38-8-generic (Desmond)    06/20/2011  _i686_  (2 CPU)

11:11:54 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:11:55 AM     all      6.03      0.00      1.51      0.00      0.00     92.46
11:11:56 AM     all      5.00      0.00      1.50      0.00      0.00     93.50
11:11:57 AM     all      0.50      0.00      1.00      4.98      0.00     93.53
11:11:58 AM     all      6.53      0.00      0.50      4.02      0.00     88.94

The sum of the user and system times in each row is the CPU utilization (a percentage): the higher this value, the busier the CPU and the less headroom is left. iowait is a handy parameter which tells us the percentage of time the CPU sits idle while waiting for the disk or for another input/output device: a higher percentage points to I/O as the bottleneck.
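
The per-row sum can be computed with a short filter. This is a sketch under the assumption that the output has the six percentage columns shown above; the function name sar_utilization is hypothetical. Columns are counted from the end of the line so that the same code works whether or not the timestamp carries an AM/PM field.

```shell
# Hypothetical helper: for every "all" row of sar output read from stdin, print
# the timestamp, the utilization (%user + %system) and %iowait.
# Assumes the six columns %user %nice %system %iowait %steal %idle.
sar_utilization() {
  awk '$NF ~ /^[0-9.]+$/ && NF >= 8 {
    is_all = 0
    for (i = 1; i <= NF - 6; i++) if ($i == "all") is_all = 1
    if (is_all) printf "%s %.2f %.2f\n", $1, $(NF-5) + $(NF-3), $(NF-2)
  }'
}

# Demo on two rows from the sample above.
sar_utilization <<'EOF'
11:11:54 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:11:55 AM     all      6.03      0.00      1.51      0.00      0.00     92.46
11:11:57 AM     all      0.50      0.00      1.00      4.98      0.00     93.53
EOF
```

The demo prints "11:11:55 7.54 0.00" and "11:11:57 1.50 4.98"; the header row is skipped because its last field is not a number.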


iostat is similar to sar and is included in the same tool suite (sysstat), but by default it also reports I/O information:

[11:13:25][giorgio@Desmond:~]$ iostat 4
Linux 2.6.38-8-generic (Desmond)    06/20/2011  _i686_  (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.40    0.02    1.90    2.80    0.00   88.88

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              26.40        85.54       112.00     402562     527112
sdb               6.45       155.31        21.64     730911     101824

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.50    0.50    0.00   98.50

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               1.00         0.00        16.00          0         64
sdb               1.00         0.00         8.00          0         32

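The CPU measurements are the same as those of sar, but you also get transactions per second (tps) for each device, and kilobytes read and written, both per second and in total. Note that the first report covers the time since the system was booted; only the following reports refer to the selected interval.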
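
Since the first report covers the whole time since boot, it is usually worth discarding when logging. The following sketch keeps only the later reports; the function name iostat_intervals is hypothetical, and the field positions assume the device layout shown above (Device, tps, kB_read/s, kB_wrtn/s, kB_read, kB_wrtn).

```shell
# Hypothetical helper: drop the first iostat report and print device name,
# kB_read/s and kB_wrtn/s for the reports that follow.
# Assumes the device layout shown in the sample above.
iostat_intervals() {
  awk '
    /^avg-cpu:/ { report++ }              # each report starts with an avg-cpu block
    /^Device/   { in_devices = 1; next }  # device rows follow this header
    NF == 0     { in_devices = 0 }        # a blank line ends the device table
    in_devices && report >= 2 { print $1, $3, $4 }
  '
}

# Demo on the two reports from the sample above.
iostat_intervals <<'EOF'
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.40    0.02    1.90    2.80    0.00   88.88

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              26.40        85.54       112.00     402562     527112
sdb               6.45       155.31        21.64     730911     101824

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.50    0.50    0.00   98.50

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               1.00         0.00        16.00          0         64
sdb               1.00         0.00         8.00          0         32
EOF
```

The demo prints "sda 0.00 16.00" and "sdb 0.00 8.00", that is, only the rows of the second report.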


dstat is a general-purpose, comprehensive monitoring tool. By adding flags, you specify the parameters that you want to track in parallel:

  • -c: CPU
  • -d: disk
  • -m: memory usage
  • -n: network; size of data sent and received.
  • -g: virtual memory: pages in and out of main memory.
  • -y: system stats, like number of context switches.


[10:50:41][giorgio@Desmond:~]$ dstat -cdngy
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  8   2  87   4   0   0| 334k  133k|   0     0 |   0     0 | 454  1447
 17   8  75   0   0   0|   0     0 |   0     0 |   0     0 | 885  3445
 10   9  81   0   0   0|   0     0 | 169B  132B|   0     0 | 670  3341
 19   4  75   1   0   0|   0    56k|9683B 6501B|   0     0 | 705  2316 
[10:54:47][giorgio@Desmond:~]$ dstat -mt
------memory-usage----- ----system----
 used  buff  cach  free|     time     
 755M  438M  892M 1181M|20-06 10:54:49
 755M  438M  892M 1181M|20-06 10:54:50
 770M  438M  892M 1166M|20-06 10:54:51^C

dstat is handy because it displays different metrics side by side in a single, customizable output format. The format is easy to parse: by splitting on | and spaces you can extract pretty much any number very quickly. The output is also fixed-width, so every line has the same number of characters.

As you can see, values measured in bytes are displayed with a B, k or M suffix.
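
As an illustration of how little parsing is needed, the sketch below splits each line on | and converts the two values of one group to plain bytes. The function name dstat_group_bytes is hypothetical, the group index depends on the flags passed to dstat, and the conversion assumes that k means 1024 bytes and M means 1024 k.

```shell
# Hypothetical helper: print the two values of one "|"-separated dstat group,
# converted to plain bytes. The group index is passed as the first argument.
# Assumes the suffixes B, k and M, with k = 1024 bytes and M = 1024 k.
dstat_group_bytes() {
  awk -F'|' -v group="$1" '
    function to_bytes(v,   unit, n) {
      unit = substr(v, length(v), 1)
      n = v + 0                        # numeric prefix, e.g. "334k" gives 334
      if (unit == "k") return n * 1024
      if (unit == "M") return n * 1024 * 1024
      return n                         # "B" suffix or a bare 0
    }
    NF >= group + 0 {
      split($(group + 0), cell, " ")   # split the group on runs of blanks
      if (cell[1] ~ /^[0-9]/)          # skip the header rows
        printf "%d %d\n", to_bytes(cell[1]), to_bytes(cell[2])
    }
  '
}

# Demo: disk read and write (group 2) for two rows of the dstat -cdngy sample.
dstat_group_bytes 2 <<'EOF'
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  8   2  87   4   0   0| 334k  133k|   0     0 |   0     0 | 454  1447
 19   4  75   1   0   0|   0    56k|9683B 6501B|   0     0 | 705  2316
EOF
```

The demo prints "342016 136192" and "0 57344". With the same flags, group 3 would give the network columns instead.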

A final thing: the output of these programs is almost always meant for further processing, so you can save it in log files by redirecting it (as shown also with top):

[11:19:27][giorgio@Desmond:~]$ free -m > free.log
[11:19:33][giorgio@Desmond:~]$ cat free.log
             total       used       free     shared    buffers     cached
Mem:          3266       2141       1125          0        443        899
-/+ buffers/cache:        797       2468
Swap:         3813          0       3813
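
When several snapshots end up in the same log file, it helps to know when each line was taken. A minimal sketch: the hypothetical stamp_lines function below prefixes every line it reads with the current date and time, in a format chosen here only for illustration.

```shell
# Hypothetical helper: prefix every line read from stdin with a timestamp.
stamp_lines() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$line"
  done
}

# Demo on one row of the sample above; the timestamp depends on when it runs.
printf '%s\n' 'Swap:         3813          0       3813' | stamp_lines
```

Appending rather than overwriting keeps earlier snapshots: free -m | stamp_lines >> free.log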


Apart from showing you all sorts of voyeuristic system information from the machine I'm writing on, I have introduced several tools that do similar jobs. It's important to be aware that different programs exist, as only some of them may be available on the system where you perform measurements (like a hosted server or a standard virtual machine).

Remember also that, following the Unix tradition, each of these commands has a man page accessible with man command_name (and quittable with q). Good profiling!


