Back to Basics With Unix: System Visibility

By Charlie Schluting | Apr 9, 2008
http://www.enterprisenetworkingplanet.com/netsysm/article.php/3739711/Back-to-Basics-With-Unix-System-Visibility.htm

Unix systems have forever been opaque and mysterious to many people. They generally don't have nice graphical utilities for displaying system performance information; you have to know how to coax the information you need. Furthermore, you need to know how to interpret the information you're given. Let's take a look at some common system tools that can provide tons of visibility into what the opaque OS is really doing.

Unfortunately, the same tools don't exist universally across all Unix variants. A few commonly underused ones do, however, and that is what we'll focus on first.

Disk Activity

A common source of "slowness" is disk I/O, or rather the lack of available I/O. On Linux especially, this can be difficult to diagnose. Often the load average will climb quickly, but without any corresponding processes in top eating much CPU. That happens because Linux counts processes blocked waiting on I/O (uninterruptible sleep), not just runnable ones, when calculating load average. I've seen load numbers in the tens of thousands on more than one occasion.
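As a rough illustration of that symptom, the sketch below parses a line in the format of Linux's /proc/loadavg and compares the one-minute load with the CPU count. The sample line, the function names, and the threshold rule are all hypothetical and chosen only for illustration; a high load on a mostly idle CPU is a hint to go look at disk I/O, not proof.

```python
def parse_loadavg(text):
    """Parse one line in /proc/loadavg format into a dict."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),    # 1-minute load average
        "load5": float(fields[1]),    # 5-minute load average
        "load15": float(fields[2]),   # 15-minute load average
        "runnable": int(runnable),    # currently runnable scheduling entities
        "total": int(total),          # total scheduling entities
    }


def looks_io_bound(load1, cpu_count, cpu_idle_pct):
    """Hypothetical rule of thumb: more load than CPUs while the CPUs sit mostly idle."""
    return load1 > cpu_count and cpu_idle_pct > 50.0


sample = "12.40 9.85 6.10 1/310 24817"  # made-up sample line, not real data
info = parse_loadavg(sample)
suspect = looks_io_bound(info["load1"], cpu_count=4, cpu_idle_pct=69.57)
print(info["load1"], suspect)
```

On a real Linux system the input would come from reading /proc/loadavg, and the idle percentage from a tool such as iostat.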

The easiest way to see what's happening to your disks is to run the "iostat" program. Via iostat, you can see how many read and write operations are happening per device, how much CPU is being utilized, and how long each transaction takes. Many arguments are available for iostat, so do spend some time with the man page on your specific system. By default, running 'iostat' with no arguments produces a report about disk IO since boot. To see what is happening "now," add a numeric interval as the last argument (for example, 'iostat 5'), which prompts iostat to print a fresh report every that many seconds. On Linux the first report still covers the time since boot, so read the later ones.

Linux will show the number of blocks read or written per second, along with some useful CPU statistics. This is one particularly busy server:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.36    0.07    5.21   23.80    0.00   69.57

Device:   tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda       18.22     15723.35       643.25 65474958946 2678596632
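If you want to feed a report like the one above into a script, a minimal sketch along these lines would do. The function name parse_avg_cpu is hypothetical, and the 512-byte block size is an assumption based on how the sysstat version of iostat documents its block counts; check the man page on your own system.

```python
sample = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.36    0.07    5.21   23.80    0.00   69.57
"""


def parse_avg_cpu(report):
    """Pair each avg-cpu column name with the value on the line below it."""
    lines = report.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("avg-cpu:"):
            names = line.split()[1:]  # drop the "avg-cpu:" label
            values = [float(v) for v in lines[i + 1].split()]
            return dict(zip(names, values))
    raise ValueError("no avg-cpu section found")


cpu = parse_avg_cpu(sample)

# Assumption: one block is 512 bytes, so two blocks make one kilobyte.
blk_read_per_sec = 15723.35
read_kb_per_sec = blk_read_per_sec * 512 / 1024

print(cpu["%iowait"], cpu["%idle"], round(read_kb_per_sec, 1))
```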

Notice that iowait is at nearly 24 percent. This means that roughly 24 percent of the time the CPUs sat idle while waiting for outstanding disk I/O to complete. Some Solaris iostat output (from iostat -xnz) shows a similar thing, just represented differently:

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  295.3   79.7 5657.8  211.0  0.0 10.3    0.0   27.4   0 100 d101
  134.8   16.4 4069.8  116.0  0.0  3.5    0.0   23.3   0  90 d105

The %b (busy) column shows that device d101 is busy 100 percent of the time, meaning it always has a transaction in progress. The average service time (asvc_t) isn't good either: disk operations shouldn't take 27.4ms. Arguably, Solaris's output is friendlier to parse, since it gives the read rate in kilobytes per second rather than blocks. We can quickly calculate that this server is reading about 19KB per read by dividing the number of KB read per second (kr/s) by the number of reads per second (r/s): 5657.8 / 295.3. In short: this disk array is being taxed by a large number of read requests.
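The same division can be done for every device at once. The following is only a sketch that assumes the column layout shown in the sample output above; kb_per_read is a hypothetical helper name.

```python
sample = """\
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  295.3   79.7 5657.8  211.0  0.0 10.3    0.0   27.4   0 100 d101
  134.8   16.4 4069.8  116.0  0.0  3.5    0.0   23.3   0  90 d105
"""


def kb_per_read(report):
    """Return {device: average KB per read} from iostat -xnz style output."""
    lines = report.strip().splitlines()
    names = lines[0].split()
    result = {}
    for line in lines[1:]:
        row = dict(zip(names, line.split()))
        reads = float(row["r/s"])
        if reads > 0:  # skip devices with no reads to avoid dividing by zero
            result[row["device"]] = float(row["kr/s"]) / reads
    return result


sizes = kb_per_read(sample)
for device, size in sizes.items():
    print(f"{device}: {size:.1f} KB per read")
```

For the two devices above this works out to roughly 19KB and 30KB per read.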

Vmstat

The "vmstat" program is also universally available, and extremely useful. It, too, provides vastly different information between operating systems. The vmstat utility will show you statistics about the virtual memory subsystem, or, to put it simply: swap space. It is much more complex than just swap, as nearly every IO operation involves the VM system when pages of memory are allocated. A disk write, network packet send, and the obvious "program allocates RAM" all impact what you see in vmstat.

On Linux, running vmstat with the -d argument will print out statistics about disk IO (-p followed by a partition name reports on a single partition). In Solaris you get some disk information anyway, as seen below:

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr m0 m1 m2 m7   in   sy   cs us sy id
 0 0 0 7856104 526824 386 2401 0 0 0  0  0  3  0  0  0 16586 22969 12576 8 9 83
 1 0 0 7851344 522016 18 678 32 0  0  0  0  2  0  0  0 13048 11737 10197 7 6 86
 0 0 0 7843584 514128 76 3330 197 0 0 0  0  2  0  0  0 4762 131492 4441 16 8 76

A subtle, but important difference between Solaris and Linux is that Solaris will start scanning for pages of memory that can be freed before it will actually start swapping RAM to disk. The 'sr' column, scan rate, will start increasing right before swapping takes place, and continue until some RAM is available. The normal things are available in all operating systems; these include: swap space, free memory, pages in and out (careful, this doesn't mean swapping is happening), page faults, context switches, and some CPU idle/system/user statistics. Once you know how to interpret these items you quickly learn to infer what they indicate about the usage of your system.
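To make the scan-rate check concrete, here is a minimal sketch that pulls the free, pi and sr columns out of the Solaris sample shown earlier. It assumes that exact column layout, and scan_rates is a hypothetical helper name. The columns are located by position in the header because the name "sy" appears twice, so a plain name-to-value mapping would lose one of them.

```python
sample = """\
 r b w   swap  free  re  mf pi po fr de sr m0 m1 m2 m7   in   sy   cs us sy id
 0 0 0 7856104 526824 386 2401 0 0 0  0  0  3  0  0  0 16586 22969 12576 8 9 83
 1 0 0 7851344 522016 18 678 32 0  0  0  0  2  0  0  0 13048 11737 10197 7 6 86
 0 0 0 7843584 514128 76 3330 197 0 0 0  0  2  0  0  0 4762 131492 4441 16 8 76
"""


def scan_rates(report):
    """Extract free memory (KB), page-ins and scan rate from each sample line."""
    lines = report.strip().splitlines()
    header = lines[0].split()
    free, pi, sr = header.index("free"), header.index("pi"), header.index("sr")
    rows = []
    for line in lines[1:]:
        fields = line.split()
        rows.append({
            "free_kb": int(fields[free]),
            "pi": int(fields[pi]),
            "sr": int(fields[sr]),
        })
    return rows


samples = scan_rates(sample)
# A nonzero scan rate is the early warning sign described above.
under_pressure = any(row["sr"] > 0 for row in samples)
print([row["pi"] for row in samples], under_pressure)
```

In this sample pages are being paged in (pi is nonzero on two lines) while the scan rate stays at zero, which is exactly the case where page-ins do not mean the system is swapping.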

The two main programs for finding "slowness" are therefore iostat and vmstat. Before the obligatory tangent into what DTrace can do for you, here are a few other tools that no Unix junkie should leave home without:

lsof: Lists open files (including network ports) for all processes
netstat: Lists all sockets in use by the system
mpstat: Shows CPU statistics (including IO), per-processor

DTrace

We cannot talk about system visibility without mentioning DTrace. Invented by Sun, DTrace provides dynamic tracing of nearly everything about a system. DTrace gives you the ability to ask arbitrary questions about the state of a running system, which works by enabling "probes" placed throughout the kernel. That sounds intimidating, doesn't it?

Let's say that we wanted to know what files were being read or written on our Linux server that has a high iowait percentage. With the standard tools covered so far, there's simply no way to know. Let's ask the same question of Solaris, and instead of learning DTrace, we'll find something useful in the DTraceToolkit. In the kit, you'll find a few neat programs like iosnoop and iotop, which will tell you which processes are doing all the disk IO operations. Neat, but we really want to know which files are being accessed so much. In the FS directory, the rfileio.d script will provide this information. Run it, and you'll see every file that's read or written, along with cache hit statistics. There's no comparably simple way to get this information in other Unixes, and this is just one simple example of how DTrace is invaluable.

The script itself is about 90 lines, inclusive of comments, but the bulk of it deals with cache statistics. An excellent way to start learning DTrace is to simply read the DTraceToolkit scripts.

Don't worry if you're not a Solaris admin: DTrace is coming soon to a FreeBSD near you. SystemTap, a replica of DTrace, will be available for Linux soon as well. Until then, and even afterward, the tools mentioned above will still be invaluable. If you can quickly get disk IO statistics and see whether you're swapping, the majority of system performance problems are solved. DTrace also provides amazing application tracing functionality, and if you're looking at the application itself, you already know the slowness isn't likely being caused by a system problem.