This post is an introduction to DTrace and a provider I implemented for the Smalltalk Squeak/Pharo VM.
DTrace is a powerful, kernel-based dynamic tracing framework. It was originally built for Solaris but is being added to Linux and is shipped with OSX since version 10.5. DTrace can provide very detailed information that is useful when you need to track down bugs or performance issues (e.g., syscalls, I/O events). Programs are dynamically instrumented -- meaning you can deliver fully optimized binaries and start a debugging session on a live production server.
DTrace allows one to add high-level probes, which give the capability of creating static instrumentation points anywhere in the VM. That means, for a high-level language like Smalltalk we can add probes for events such as method enter and method return. So, what does that buy us for Squeak/Pharo? This allows us to use the DTrace technology to trace Smalltalk code execution. DTrace allows unprecedented view across all layers of Smalltalk software. With DTrace we can simultaneously observe and correlate events from Smalltalk, the VM, and the OS.
The provider I’ve implemented currently provides the following events: method enter and return, object allocation, full and incremental GC begin and end, and process activation (that is, when the scheduler transfers control to a Smalltalk process). Similar providers exist for languages such as Java and Ruby.
To wet your appetite, here is an example that illustrates the basic usage of DTrace for Squeak/Pharo. For this example, lets assume our application has a high GC rate so to start tracking down the problem we want to know if there are any classes that are extensively instantiated during a given program run. In the DTrace scripting language (a subset of C named D), we define the probes we are interested in and how to print and aggregate the traced data:
Squeak$target:::proc_transferto
{
self->priority = arg3;
}
Squeak$target:::objectalloc
/ self->priority == 30 /
{
@number[copyinstr(arg0)] = count();
@size[copyinstr(arg0)] = sum(arg1);
}
dtrace:::END
{
... print number and size vars ...
}
The first line captures the firing of the proc_transferto probe, which is triggered whenever a Smalltalk process is activated. Through the variable arg3 we can grab the process’ priority, which we store for later use. With the next statement we tell DTrace that we are also interested in the probe objectalloc, which is fired each time the VM creates a new object. However, we only want the probe to fire when the current process priority is 30. Like this we restrict the triggered events to the processes in question (e.g., only events from Seaside processes, which run at priority 30, but excluding GUI activity).
The variables ‘number’ and ‘size’ are maps that associate each class name with the number of instantiations, respectively with the total sum of bytes allocated. The class name is fetched from arg0, the size of the newly instantiated object from arg1.
The pseudo probe END is run when the tracing session is terminated (that is, when you hit Ctrl-c). In this declaration we specify how to print the results of the aggregated data. The printout shows the top 4 classes:
Number of instances created per class
CLASS NUMBER
Array 8248
Rectangle 19670
Float 41521
Point 79272
Total memory allocated for instances of class (in bytes)
CLASS SIZE
Array 162516
Rectangle 236040
Float 498252
Point 951264
These numbers are taken from tracing the execution of the statement ‘Browser open’ in Pharo. The most instantiated class is Point with 79.000 instances, which together consume 0.9MB (!) of memory. For the complete D source see the file objectalloc.d in the attached zip file (see below).
As a follow up question, we may want to know which methods generated the Point instances. With another small script we can quickly answer also that question.
Rectangle>>intersect: 2534
ImageFillStyle>>initialize 2566
Form>>extent 2792
Point>>asIntegerPoint 5102
SmallInteger>>adaptToPoint:andSend: 7480
Point>>- 10271
Point>>+ 13369
The script remembers for each methodenter and methodreturn even the current class and method in the variable self->currentcaller. On each objectalloc event we then do a @methods[self->currentcaller] = count().
1. Download the ZIP file:
2. Unzip and execute start.command (just double click). This should start up a Pharo image using the instrumented VM.
3. Open a terminal window and execute
sudo dtrace -s objectalloc.d -p `ps -ef | grep squeak | grep -v grep | awk '{ print $2 }'`
4. In the Pharo image evaluate
[ Browser open ] forkAt: 30
5. When the browser has opened (takes a while), quit the D script (Ctr-c). Its output is printed to the terminal.
Follow my previous blog post that explains how to build a Squeak/Pharo VM from zero. Before generating the sources, file in the following changeset:
Also, you need to add the following header file into your ../platforms/unix/src/vm/
methodenter
arg0: name of class
arg1: name of method
arg2: primitive index (0 if not a primitive)
arg3: successFlag of the previously run primitive (false if method is run due to a primitive failure; for non-primitive methods arg3 is always true)
methodreturn
arg0: name of class
arg1: name of method
arg2: primitive index (0 if not a primitive)
objectalloc
arg0: name of class
arg1: size in bytes
gcbegin
gcend
igcbegin
igcend
proc_transferto
arg0: the previous process' pid (hash of Process instance)
arg1: the previous process' priority
arg2: the new process' pid
arg3: the new process' priority
While the low level probes of DTrace do not add overhead to the programs (because of dynamic instrumentation), the ones I added to the VM (aka. User-land Statically Defined Tracing (USDT)) do slow down the execution. The overhead compared to a vanilla VM are:
Bytecodes/sec
Sends/sec
1.5%
5.3%
This is not a huge slowdown, but it is still measurable. There may also be room for tuning the C code I added to the VM.
There are some more probes that would be useful to add. One of these would be a custom probe that can be triggered via a primitive from Smalltalk code. Like this one could fire high-level events, for instance, Seaside-start-request and Seaside-end-request, which would allow one to associate or scope traces to specific Seaside requests. Another useful feature would be to be able to access the current Smalltalk stack. This would allow for sampling; similar to message tally.
DTrace by itself does not provide visualizations but since its output can be formatted as required (using printf) and piped into a file, one can feed the data to a visualization tool (e.g., print output in MSE format, import it into Moose, and create interactive visualizations with Mondrian).
The interesting use cases will emerge from debugging real-world problems. What makes DTrace a compelling technique is that it allows one to seamlessly combine data from different providers. For instance, you can now figure out which of your Smalltalk code triggers expensive I/O events because it hits the disk. Or you could add Firefox probes and see how control flows between Ajax and Smalltalk.
If you find this useful or if you have interesting scripts to share, please drop me a message.
Posted by Adrian Lienhard on May 10, 2010