Monitoring on Unix from scratch
Linux, and Unix-like systems in general, bundle many useful command-line tools for monitoring the resources of a machine for performance's sake, tracking parameters such as occupied memory, CPU utilization, or disk requests.
The commands in this list, shown with their flags and sample output, should be run while the system is under some kind of load, in order to find out which resources are scarce (usually I/O) and which are not the bottleneck and therefore shouldn't be upgraded.
There is also a different family of tools that run their own workloads on an otherwise idle system, like Iozone and other benchmarks; this article focuses on monitoring tools that run on an existing production system.
top is perhaps the most famous monitoring tool, one that Linux users like me commonly use at home. By default it runs interactively and refreshes in real time, showing processes ordered by CPU utilization.
You can use the -d flag followed by a number of seconds to change the update interval, but more importantly -b (batch mode) to send top's measurements to standard output, where they can be redirected to a file (stop it with CTRL-C).
[11:03:58][giorgio@Desmond:~]$ top -b > top.log
^C[11:04:10][giorgio@Desmond:~]$ head top.log
top - 11:04:09 up  1:09,  3 users,  load average: 0.00, 0.08, 0.12
Tasks: 166 total,   1 running, 164 sleeping,   0 stopped,   1 zombie
Cpu(s):  6.8%us,  2.0%sy,  0.0%ni, 88.0%id,  3.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3344984k total,  2202872k used,  1142112k free,   451016k buffers
Swap:  3905532k total,        0k used,  3905532k free,   929932k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  920 root      20   0 77108  19m 9348 S    2  0.6   2:16.51 Xorg
 2515 giorgio   20   0  156m  16m  12m S    2  0.5   0:06.72 gnome-terminal
 2933 giorgio   20   0  802m 291m  33m S    2  8.9   3:54.36 firefox-bin
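Once top's batch output is in a log file, the summary lines are easy to pick apart. The following Python snippet is only a sketch: it assumes the Cpu(s) line has exactly the format shown in the sample above (other versions of top may lay this line out differently), and parse_cpu_line is a name made up for this illustration.

```python
import re

# Summary line copied from the sample top output above.
CPU_LINE = "Cpu(s):  6.8%us,  2.0%sy,  0.0%ni, 88.0%id,  3.1%wa,  0.0%hi,  0.0%si,  0.0%st"


def parse_cpu_line(line):
    """Return a dict such as {'us': 6.8, 'sy': 2.0, ...} from a Cpu(s) line.

    Assumes every figure is written as <number>%<label>, as in the sample.
    """
    pairs = re.findall(r"([\d.]+)%(\w+)", line)
    return {label: float(value) for value, label in pairs}


cpu = parse_cpu_line(CPU_LINE)
print("idle:", cpu["id"], "waiting for I/O:", cpu["wa"])
```

The same regular expression can be applied to every Cpu(s) line in top.log to follow how the idle and I/O wait percentages evolve over time.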
free displays current RAM statistics. You can use the -m switch to display memory quantities in megabytes instead of the default kilobytes, which produce long, hard-to-read figures (the -b switch gives bytes).
By default, free runs only once: the -s 5 flag (or any other value) will make it print the information every 5 seconds.
The used and free columns show the amount of used and free RAM in the system:
[11:05:29][giorgio@Desmond:~]$ free -m
             total       used       free     shared    buffers     cached
Mem:          3266       2283        983          0        440        908
-/+ buffers/cache:        934       2332
Swap:         3813          0       3813
The -/+ buffers/cache line is the one that you should read. Since empty RAM is wasted, the system keeps blocks that have yet to be written to disk in buffers, and blocks that would otherwise have to be read again from disk in caches. If necessary, this RAM can be freed, so my Ubuntu installation with Firefox and some other open applications occupies 934 megabytes of RAM, not 2283.
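The arithmetic behind that line can be checked by hand: used minus buffers minus cached. Here is a small Python sketch that does so on the sample output above; it assumes the older free layout shown in this article (versions of free that do not print the -/+ buffers/cache line would need different parsing), and the function name is invented for this example.

```python
# Sample copied from the `free -m` output above.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:          3266       2283        983          0        440        908
-/+ buffers/cache:        934       2332
Swap:         3813          0       3813
"""


def memory_used_by_applications(free_output):
    """Return (reported, estimated) megabytes used by applications.

    reported  is read from the "-/+ buffers/cache" line;
    estimated is used - buffers - cached from the "Mem:" line.
    """
    reported = estimated = None
    for line in free_output.splitlines():
        fields = line.split()
        if line.startswith("Mem:"):
            _total, used, _free, _shared, buffers, cached = map(int, fields[1:7])
            estimated = used - buffers - cached
        elif line.startswith("-/+ buffers/cache:"):
            reported = int(fields[2])
    return reported, estimated


reported, estimated = memory_used_by_applications(FREE_OUTPUT)
print(reported, estimated)  # 934 935
```

The two figures differ by one megabyte, presumably because each column is rounded to whole megabytes before being printed.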
By default (and that is what I use it for) sar reports CPU information, at an interval in seconds that you choose:
[11:11:53][giorgio@Desmond:~]$ sar 1
Linux 2.6.38-8-generic (Desmond)    06/20/2011    _i686_    (2 CPU)

11:11:54 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:11:55 AM     all      6.03      0.00      1.51      0.00      0.00     92.46
11:11:56 AM     all      5.00      0.00      1.50      0.00      0.00     93.50
11:11:57 AM     all      0.50      0.00      1.00      4.98      0.00     93.53
11:11:58 AM     all      6.53      0.00      0.50      4.02      0.00     88.94
The sum of the user and system times in each row is the CPU utilization (a percentage): the higher this value, the closer the CPU is to saturation. iowait is a handy parameter that tells us the percentage of time the CPU sat idle while waiting for the disk or another input/output device: a higher percentage points to I/O as a bottleneck.
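As a rough illustration of that sum, this Python sketch computes the utilization for each row of the sample above. It looks columns up by the names in the header row, so it assumes the header is the first line of the text passed in; cpu_utilization is a hypothetical helper, not part of any tool.

```python
# Header and data rows copied from the `sar 1` output above.
SAR_OUTPUT = """\
11:11:54 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:11:55 AM     all      6.03      0.00      1.51      0.00      0.00     92.46
11:11:56 AM     all      5.00      0.00      1.50      0.00      0.00     93.50
11:11:57 AM     all      0.50      0.00      1.00      4.98      0.00     93.53
11:11:58 AM     all      6.53      0.00      0.50      4.02      0.00     88.94
"""


def cpu_utilization(sar_output):
    """Return a list of (time, %user + %system, %iowait) tuples, one per row."""
    lines = sar_output.splitlines()
    header = lines[0].split()
    user_i = header.index("%user")
    system_i = header.index("%system")
    iowait_i = header.index("%iowait")
    rows = []
    for line in lines[1:]:
        fields = line.split()
        utilization = round(float(fields[user_i]) + float(fields[system_i]), 2)
        rows.append((fields[0], utilization, float(fields[iowait_i])))
    return rows


for time, utilization, iowait in cpu_utilization(SAR_OUTPUT):
    print(time, utilization, iowait)
```

On this sample the utilization stays below 8% in every row, while iowait briefly approaches 5%: the machine is mostly idle.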
iostat is similar to sar and is included in the same tool suite, but by default it also reports I/O information:
[11:13:25][giorgio@Desmond:~]$ iostat 4
Linux 2.6.38-8-generic (Desmond)    06/20/2011    _i686_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.40    0.02    1.90    2.80    0.00   88.88

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              26.40        85.54       112.00     402562     527112
sdb               6.45       155.31        21.64     730911     101824

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.50    0.50    0.00   98.50

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               1.00         0.00        16.00          0         64
sdb               1.00         0.00         8.00          0         32
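The CPU measurements are the same as sar's, but you also get transactions per second for each device, plus kilobytes read and written (both per second and as totals over the selected interval). Note that the first report covers the whole time since boot, which explains its much larger totals; the following ones cover only the 4-second interval.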
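To show how the per-second and per-interval columns relate, here is a Python sketch that splits the sample output above into one dictionary per report. It assumes the column layout shown in this article, with a blank line closing each device table; parse_device_reports is a name chosen for this example.

```python
# The two reports from the `iostat 4` output above.
IOSTAT_OUTPUT = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.40    0.02    1.90    2.80    0.00   88.88

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              26.40        85.54       112.00     402562     527112
sdb               6.45       155.31        21.64     730911     101824

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.50    0.50    0.00   98.50

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               1.00         0.00        16.00          0         64
sdb               1.00         0.00         8.00          0         32
"""


def parse_device_reports(text):
    """Return one {device: {column: value}} dict per "Device:" table."""
    reports = []
    columns = None  # column names of the table being read, if any
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current table
        elif fields[0] == "Device:":
            columns = fields[1:]
            reports.append({})
        elif columns is not None:
            reports[-1][fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return reports


reports = parse_device_reports(IOSTAT_OUTPUT)
second = reports[1]["sda"]
# 16.00 kB/s written over a 4-second interval gives the 64 kB total.
print(second["kB_wrtn/s"] * 4, second["kB_wrtn"])
```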
dstat is a general-purpose, comprehensive monitoring tool. By adding flags, you specify the parameters that you want to track in parallel:
- -c: CPU
- -d: disk
- -m: memory usage
- -n: network; size of data sent and received.
- -g: virtual memory: pages in and out of main memory.
- -y: system stats, like number of context switches.
[10:50:41][giorgio@Desmond:~]$ dstat -cdngy
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  8   2  87   4   0   0| 334k  133k|   0     0 |   0     0 | 454  1447
 17   8  75   0   0   0|   0     0 |   0     0 |   0     0 | 885  3445
 10   9  81   0   0   0|   0     0 | 169B  132B|   0     0 | 670  3341
 19   4  75   1   0   0|   0    56k|9683B 6501B|   0     0 | 705  2316

[10:54:47][giorgio@Desmond:~]$ dstat -mt
------memory-usage----- ----system----
 used  buff  cach  free|     time
 755M  438M  892M 1181M|20-06 10:54:49
 755M  438M  892M 1181M|20-06 10:54:50
 770M  438M  892M 1166M|20-06 10:54:51^C
dstat is handy because it displays different metrics in a custom output format built simply by placing them side by side; the format is easy to parse, and by splitting on | and spaces you can extract pretty much any number very quickly. The output is also fixed-width: each line is always the same number of characters wide.
As you can see, where sizes in bytes apply, units are displayed as a B, k, or M suffix.
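Splitting on | and spaces takes only a few lines. The following Python sketch parses the last data row of the dstat -cdngy sample above; treating k and M as multiples of 1024 is an assumption of this example rather than something the output states, and the helper names are invented. A time column such as the one printed by -t would need separate handling.

```python
# Last data row of the `dstat -cdngy` sample above.
DSTAT_LINE = " 19   4  75   1   0   0|   0    56k|9683B 6501B|   0     0 | 705  2316"

# Assumption: unit suffixes are powers of 1024.
UNITS = {"B": 1, "k": 1024, "M": 1024 ** 2}


def to_number(token):
    """Convert a token such as '56k', '9683B' or '705' to a float."""
    if token[-1] in UNITS:
        return float(token[:-1]) * UNITS[token[-1]]
    return float(token)


def parse_dstat_line(line):
    """Return one list of numbers per |-separated group of columns."""
    return [[to_number(token) for token in group.split()] for group in line.split("|")]


cpu, disk, net, paging, system = parse_dstat_line(DSTAT_LINE)
print("bytes written to disk:", disk[1])
print("bytes received, sent:", net)
```

Each group comes back in the order of the flags on the command line, so the caller decides which label belongs to which list.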
A final thing: the output of these programs is almost always meant for further processing, so you can save it in log files by redirecting it (as already shown with top):
[11:19:27][giorgio@Desmond:~]$ free -m > free.log
[11:19:33][giorgio@Desmond:~]$ cat free.log
             total       used       free     shared    buffers     cached
Mem:          3266       2141       1125          0        443        899
-/+ buffers/cache:        797       2468
Swap:         3813          0       3813
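If you prefer to collect the logs from a script rather than from the shell, the same idea can be sketched in Python. This is only one possible approach: log_command is a hypothetical helper, and the demo runs the Python interpreter itself as a stand-in command so that the snippet works even where free is not installed; on a real system you would pass something like ["free", "-m"] instead.

```python
import subprocess
import sys
import tempfile
from datetime import datetime
from pathlib import Path


def log_command(command, log_path):
    """Run command once and append its output to log_path under a timestamp line."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a") as log:
        log.write(f"# {stamp} {' '.join(command)}\n")
        log.write(result.stdout)
    return result.stdout


# Stand-in command for the demo; replace it with ["free", "-m"] on a real system.
demo_command = [sys.executable, "-c", "print('sample output')"]
demo_log = Path(tempfile.mkdtemp()) / "demo.log"
log_command(demo_command, demo_log)
print(demo_log.read_text())
```

Appending rather than overwriting lets you call the function repeatedly and keep every measurement in a single file.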
Apart from showing you all kinds of voyeuristic system information from the machine I'm writing on, I have introduced several tools that do similar jobs. It's important to be aware of the existence of different programs, as only some of them may be available on the system where you perform measurements (like a hosted server or a standard virtual machine).
Remember also that, following the Unix tradition, each of these commands has a man page accessible with man command_name (and quittable with q). Good profiling!