Lay Your Unix System Bare With DTrace

Charlie Schluting

Now that DTrace is available in OS X 10.5 (Leopard), I won’t be called a Solaris bigot for praising it. Thank you, Apple. DTrace is the most innovative software released for any Unix flavor this century. It allows a never-before-imagined level of system visibility, enabling sysadmins, developers, and even users to get answers to previously impossible questions. This week we’ll talk about DTrace and how to use it, then next week we’ll dive into a D scripting tutorial.

DTrace is a Dynamic Tracing facility, originally built into Solaris 10. It enables both programmers and administrators to quickly identify system problems by allowing them to look into exactly what userland programs or the operating system is doing. DTrace has a 41-chapter manual, a large part of which explains the usage of D, the DTrace language. Suspiciously similar to Awk, the D language provides a method by which administrators can ask arbitrary questions of the operating system. With more than 46,000 test points (called probes) available, DTrace provides the most flexible method on the market for diagnosis of in-depth problems. That is not to say it’s overly complex and only useful for complex issues. In fact, the opposite is true.

How it Works

DTrace dynamically modifies a program once it’s loaded into memory. Anything that executes must first be loaded into memory, so a sufficiently intelligent tracing program, like DTrace, has the opportunity to insert code into a program before it runs, or while it is running. Because this means modifying other programs and the kernel itself, DTrace must be run with administrative privileges.

Before DTrace, the only way to debug an application was to recompile it with debugging symbols enabled. This allowed a debugger to run the application, and gather information as it ran. The resulting binary would be much larger, and would also run much slower. DTrace can be used on any application without recompiling it, and even without restarting it. Other user-space programs designed to show you what system calls are being executed, like truss or strace, actually stop the program’s execution after every system call. This creates a huge performance problem, and it can even crash some applications. This is not a concern with DTrace: it can be used on production systems without fear of a crash. It uses no resources when not in use, and adds very little overhead when activated.

User-space programs are one thing, and indeed you can get a bit of information in some form (list of system calls) without DTrace, but finding out what the kernel is doing was historically impossible. DTrace probes, programmable sensors, are present in the kernel, so you can ask almost anything you want. There are more than 40,000 probes that can be activated at will, depending on the OS in question. A given sensor is programmed to provide the information of value to you, and when it’s triggered, DTrace gathers the data.

A DTrace script will often ask for timestamps or arguments to functions. A DTrace user can see how long a function call takes, how often it executes, what the stack trace looks like, and answer many other difficult questions.
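As a rough sketch of the timestamp idea, the snippet below records a timestamp when a system call is entered and subtracts it when the call returns, then prints a latency distribution per program. The choice of read(), the ten-second run time, and the shell variable name prog are assumptions made for illustration; the guard simply skips the run on machines without dtrace or without root.

```shell
#!/bin/sh
# The D program lives in a shell variable. Tracing read() and stopping
# after ten seconds are illustrative choices, not requirements.
prog='
syscall::read:entry { self->ts = timestamp; }
syscall::read:return /self->ts/ {
    @[execname] = quantize(timestamp - self->ts);
    self->ts = 0;
}
profile:::tick-10s { exit(0); }
'

# timestamp is in nanoseconds; quantize() buckets the elapsed times into a
# power-of-two histogram, printed per program name when dtrace exits.
if command -v dtrace >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    dtrace -q -n "$prog"
else
    echo "dtrace unavailable or not running as root; program not run"
fi
```

The thread-local variable self->ts ties each return to its matching entry, and the /self->ts/ predicate ignores returns from calls that were already in flight when tracing began.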

Using DTrace

Users may want to find out certain things about their applications or the kernel their application is running on without becoming a DTrace expert. We’ll cover as much as possible this week without getting too deep into D programming, for those who can benefit from just being able to run basic commands or pre-written scripts. Next week will be all about D programming.

First, it should be noted that we can get a list of all available probes with the command: ‘dtrace -l’. It’s not so useful unless you know what you’re looking for, but you need to know how to find this information if you wish to gather information using a probe that’s not part of a pre-written script.
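Every probe is named by four colon-separated fields, provider:module:function:name, and an empty field matches anything. As a minimal sketch of narrowing the full listing, the snippet below builds such a description and hands it to dtrace -l. The probe_desc helper is hypothetical (it is not part of DTrace), and the open* pattern is only an example; the guard skips the listing where dtrace or root access is missing.

```shell
#!/bin/sh
# probe_desc is a hypothetical helper (not part of DTrace): it joins the
# four fields of a probe description, provider:module:function:name.
probe_desc() {
    printf '%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"
}

# An empty field acts as a wildcard; 'open*' is an illustrative glob.
desc=$(probe_desc syscall '' 'open*' entry)
echo "$desc"

# Listing probes needs dtrace and, normally, root privileges.
if command -v dtrace >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    dtrace -l -P syscall | wc -l   # size of the syscall provider's listing
    dtrace -l -n "$desc"           # only the probes matching the description
fi
```

The same pattern syntax works anywhere a probe description is accepted, so a description tested with -l can be reused unchanged when tracing.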

The DTT (DTrace Toolkit) provides a suite of scripts that can provide so much information themselves, it’s possible that some sysadmins will never need to learn D scripting. The Docs/Contents file included in the DTT explains what each script does. You will find that DTrace can replicate every system-wide statistics tool you’ve ever used (think: iostat, vmstat), but it also goes one step further. The DTT provides the most useful scripts for systems administrators and application developers. Use tcpsnoop to see what processes are sending what packets, or iosnoop to see what processes are writing what files. The ability to see “what” and “how much” leaves one speechless. Before the days of DTrace, admins were often found staring at a terminal wondering, “what’s happening,” or “what’s doing that.” Not any more.

Start with running DTrace yourself. The toolkit provides scripts that make your life easier, but once you get used to DTrace, you can easily start constructing your own.

Let’s begin by asking what system calls are taking place. In this example, we’re asking to instrument all syscall entry points, by specifying the syscall provider and name “entry”:

dtrace -n 'syscall:::entry'
0   9299          ioctl:entry

The sample output line isn’t so useful, as it just shows that some process made an ioctl() call. Something you’ll see over and over again is the command to summarize, and list by process name. The syscall:::entry example above can be modified to summarize which process made the most system calls:

dtrace -n 'syscall:::entry {@[execname]=count(); }'
smbd           5638
save           14378
ruby           182150

It’s clear that this server is relatively busy running Samba, a program called ‘save,’ and a ruby program. The standard tools such as prstat or top should reflect that too. We’re getting into the D scripting realm, so we’ll stop there for now.
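For readers who want one more step before then, here is a hedged sketch of narrowing that summary to a single program. A predicate between slashes restricts the probe to one process name, and keying the aggregation on probefunc counts calls per system call instead of per program. The target name ruby is taken from the sample output above, and the five-second limit is an arbitrary choice.

```shell
#!/bin/sh
# target is a placeholder; ruby is simply the busiest name in the sample.
target=ruby

# /execname == "..."/ is a predicate: the action runs only when it is true.
# probefunc holds the system call name; tick-5s ends the run after 5 seconds.
prog="syscall:::entry /execname == \"$target\"/ { @[probefunc] = count(); }
profile:::tick-5s { exit(0); }"

echo "$prog"

if command -v dtrace >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    dtrace -q -n "$prog"
fi
```

The result is one line per system call name with its count, which shows what the busy process is actually asking the kernel to do.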

The true power of DTrace, for overall system information, can be realized by running the DTrace Toolkit. When you need to delve deeper into a problem, specifically into applications themselves, you’ll need to geek out on DTrace a bit. Come back next week for the full D language tutorial.
