This post is an introduction to DTrace and a provider I implemented for the Smalltalk Squeak/Pharo VM.
DTrace is a powerful, kernel-based dynamic tracing framework. It was originally built for Solaris but is being added to Linux and is shipped with OSX since version 10.5. DTrace can provide very detailed information that is useful when you need to track down bugs or performance issues (e.g., syscalls, I/O events). Programs are dynamically instrumented -- meaning you can deliver fully optimized binaries and start a debugging session on a live production server.
DTrace allows one to add high-level probes, which give the capability of creating static instrumentation points anywhere in the VM. That means, for a high-level language like Smalltalk we can add probes for events such as method enter and method return. So, what does that buy us for Squeak/Pharo? This allows us to use the DTrace technology to trace Smalltalk code execution. DTrace allows unprecedented view across all layers of Smalltalk software. With DTrace we can simultaneously observe and correlate events from Smalltalk, the VM, and the OS.
The provider I’ve implemented currently provides the following events: method enter and return, object allocation, full and incremental GC begin and end, and process activation (that is, when the scheduler transfers control to a Smalltalk process). Similar providers exist for languages such as Java and Ruby.
To wet your appetite, here is an example that illustrates the basic usage of DTrace for Squeak/Pharo. For this example, lets assume our application has a high GC rate so to start tracking down the problem we want to know if there are any classes that are extensively instantiated during a given program run. In the DTrace scripting language (a subset of C named D), we define the probes we are interested in and how to print and aggregate the traced data:
Squeak$target:::proc_transferto
{
self->priority = arg3;
}
Squeak$target:::objectalloc
/ self->priority == 30 /
{
@number[copyinstr(arg0)] = count();
@size[copyinstr(arg0)] = sum(arg1);
}
dtrace:::END
{
... print number and size vars ...
}
The first line captures the firing of the proc_transferto probe, which is triggered whenever a Smalltalk process is activated. Through the variable arg3 we can grab the process’ priority, which we store for later use. With the next statement we tell DTrace that we are also interested in the probe objectalloc, which is fired each time the VM creates a new object. However, we only want the probe to fire when the current process priority is 30. Like this we restrict the triggered events to the processes in question (e.g., only events from Seaside processes, which run at priority 30, but excluding GUI activity).
The variables ‘number’ and ‘size’ are maps that associate each class name with the number of instantiations, respectively with the total sum of bytes allocated. The class name is fetched from arg0, the size of the newly instantiated object from arg1.
The pseudo probe END is run when the tracing session is terminated (that is, when you hit Ctrl-c). In this declaration we specify how to print the results of the aggregated data. The printout shows the top 4 classes:
Number of instances created per class
CLASS NUMBER
Array 8248
Rectangle 19670
Float 41521
Point 79272
Total memory allocated for instances of class (in bytes)
CLASS SIZE
Array 162516
Rectangle 236040
Float 498252
Point 951264
These numbers are taken from tracing the execution of the statement ‘Browser open’ in Pharo. The most instantiated class is Point with 79.000 instances, which together consume 0.9MB (!) of memory. For the complete D source see the file objectalloc.d in the attached zip file (see below).
As a follow up question, we may want to know which methods generated the Point instances. With another small script we can quickly answer also that question.
Rectangle>>intersect: 2534
ImageFillStyle>>initialize 2566
Form>>extent 2792
Point>>asIntegerPoint 5102
SmallInteger>>adaptToPoint:andSend: 7480
Point>>- 10271
Point>>+ 13369
The script remembers for each methodenter and methodreturn even the current class and method in the variable self->currentcaller. On each objectalloc event we then do a @methods[self->currentcaller] = count().
1. Download the ZIP file:
2. Unzip and execute start.command (just double click). This should start up a Pharo image using the instrumented VM.
3. Open a terminal window and execute
sudo dtrace -s objectalloc.d -p `ps -ef | grep squeak | grep -v grep | awk '{ print $2 }'`
4. In the Pharo image evaluate
[ Browser open ] forkAt: 30
5. When the browser has opened (takes a while), quit the D script (Ctr-c). Its output is printed to the terminal.
Follow my previous blog post that explains how to build a Squeak/Pharo VM from zero. Before generating the sources, file in the following changeset:
Also, you need to add the following header file into your ../platforms/unix/src/vm/
methodenter
arg0: name of class
arg1: name of method
arg2: primitive index (0 if not a primitive)
arg3: successFlag of the previously run primitive (false if method is run due to a primitive failure; for non-primitive methods arg3 is always true)
methodreturn
arg0: name of class
arg1: name of method
arg2: primitive index (0 if not a primitive)
objectalloc
arg0: name of class
arg1: size in bytes
gcbegin
gcend
igcbegin
igcend
proc_transferto
arg0: the previous process' pid (hash of Process instance)
arg1: the previous process' priority
arg2: the new process' pid
arg3: the new process' priority
While the low level probes of DTrace do not add overhead to the programs (because of dynamic instrumentation), the ones I added to the VM (aka. User-land Statically Defined Tracing (USDT)) do slow down the execution. The overhead compared to a vanilla VM are:
Bytecodes/sec
Sends/sec
1.5%
5.3%
This is not a huge slowdown, but it is still measurable. There may also be room for tuning the C code I added to the VM.
There are some more probes that would be useful to add. One of these would be a custom probe that can be triggered via a primitive from Smalltalk code. Like this one could fire high-level events, for instance, Seaside-start-request and Seaside-end-request, which would allow one to associate or scope traces to specific Seaside requests. Another useful feature would be to be able to access the current Smalltalk stack. This would allow for sampling; similar to message tally.
DTrace by itself does not provide visualizations but since its output can be formatted as required (using printf) and piped into a file, one can feed the data to a visualization tool (e.g., print output in MSE format, import it into Moose, and create interactive visualizations with Mondrian).
The interesting use cases will emerge from debugging real-world problems. What makes DTrace a compelling technique is that it allows one to seamlessly combine data from different providers. For instance, you can now figure out which of your Smalltalk code triggers expensive I/O events because it hits the disk. Or you could add Firefox probes and see how control flows between Ajax and Smalltalk.
If you find this useful or if you have interesting scripts to share, please drop me a message.
Posted by Adrian Lienhard on May 10, 2010
I've been building and modifying Squeak VMs for a couple of years, but each time I started afresh it took me considerable amount of time to set up a working toolchain. Changes in Pharo, VMMaker, platform C code, make files, in system libraries etc. can introduce subtle problems and the official documentation typically is obsolete. To help me – and others – be faster next time, I wrote the following step-by-step guide to build a Unix VM on OSX 10.6.
Make sure you have xcode installed (http://developer.apple.com/tools/xcode/).
Install cmake >= 2.6.2 from http://www.cmake.org/
Create a work directory on your desktop (it is important it has no spaces!):
mkdir ~/Desktop/vmbuilding; cd ~/Desktop/vmbuilding
Let’s download a fresh Pharo image for later testing. In case you are behind a proxy, set the variable http_proxy to make wget work.
export http_proxy=http://<proxy server>
wget http://gforge.inria.fr/frs/download.php/25397/Pharo1.0-10508-rc2dev10.01.2.zip
unzip Pharo1.0-10508-rc2dev10.01.2.zip
Next we check out the VM platform-specific C sources and make files
svn co http://squeakvm.org/svn/squeak/trunk/platforms/ -r 2151
Note the use of hardcoded revision numbers to make sure these instructions still work in the future.
We start by compiling the source that come with Subversion. This allows us to test the basic setup without the sources we will generate ourselves.
cd platforms; mkdir build; cd build
../unix/cmake/configure --without-RomePlugin --without-vm-display-Quartz
make
If you got here without any errors you can try to start the new VM. Since Quartz does not work anymore on Snow Leopard, we use X11 as display driver and hence have to start up X11 first:
open /Applications/Utilities/X11.app
./squeak ~/Desktop/vmbuilding/Pharo1.0-10508-rc2dev10.01.2/Pharo1.0-10508-rc2dev10.01.2.image &
The Pharo image should start up now. Print “SmalltalkImage current vmVersion”, which shows the version of the image from which the VM source was built.
The result should be 'Squeak3.10.2 of ''5 June 2008'' [latest update: #7179]'.
Print and store the result of “1 tinyBenchmarks” then quit the image.
Start up the same image with your Mac VM:
open ~/Desktop/vmbuilding/Pharo1.0-10508-rc2dev10.01.2/Pharo1.0-10508-rc2dev10.01.2.image
In a workspace in the Pharo image do the following. If you are behind a proxy first let Pharo know. Then load VMMaker through the following Metacello config (thanks Torsten!):
HTTPSocket useProxyServerNamed: '<proxy server>' port: <port>.
Gofer new
squeaksource: 'MetacelloRepository';
package: 'ConfigurationOfVMMaker';
load.
((Smalltalk at: #ConfigurationOfVMMaker) project version: '1.2') load
This takes some time. When it finished loading, we open the VMMaker GUI:
Preferences disable: #raiseDeprecatedWarnings.
VMMakerTool openInWorld
First let’s backup the original sources
mv ../unix/src ../unix/src_original
In the VMMaker GUI:
Change path to platform code: /Users/<user>/Desktop/vmbuilding/platforms
Change path to generated sources: /Users/<user>/Desktop/vmbuilding/platforms/unix/src
Don’t forget to accept the changed field by cmd-s. Leave the other settings unchanged.
Make all plugins internal (right-click in list of plugins). Then select as not built (you can drag and drop between lists): Aio, FFT, FT2, InternetConfig, MacMenubar, Mpeg3, QuickTime, Security, Sound, TestOSA
Select as external: B3DAccelerator, FFI, FileCopy, HostWindow, MIDI, UUID, XDisplayControl.
Note, that I haven't cared to make Freetype work and have just excluded it above. In a related blog post, Laurent describes how to patch Freetype to compile on Linux.
Evaluate the following expression to force the use of LF for line breaks (else the awk script triggered by cmake does not properly work).
MultiByteFileStream defaultToLF.
Click the button Entire to generate the VM sources in the directory ./unix/src/. After the VMMaker has completed, save and quit the image.
We create a new build directory and compile the source we created in step 3.
mkdir ../build_new; cd ../build_new
../unix/cmake/configure --without-vm-display-Quartz
make
./squeak ~/Desktop/vmbuilding/Pharo1.0-10508-rc2dev10.01.2/Pharo1.0-10508-rc2dev10.01.2.image &
If you get here without an error, you made it! Congrats! ;)
A good way to test the healthiness of a new VM is to run the tiny benchmarks.
1 tinyBenchmarks
The result should be similar to the ones you obtained in step 1.
On my MacBook Pro 10.6.2 with 2.4GHz Intel Core 2 Duo, gcc 4.2.1 I got
479400749 bytecodes/sec; 12303286 sends/sec.
Posted by Adrian Lienhard on March 11, 2010