Contact Search RSS Feed

Filter: Yearly view
Filter (Chronologic, Yearly view)
2010>>
January(1)
February
March(1)
April(1)
May(1)
June
July
August
September
October
November
December

Smalltak meets DTrace

EDIT: the VM patches and the examples are available from Github now. If you make changes, please send me pull requests!

This post is an introduction to DTrace and a provider I implemented for the Smalltalk Squeak/Pharo VM.

What is DTrace and why is it interesting for Smalltalk?

DTrace is a powerful, kernel-based dynamic tracing framework. It was originally built for Solaris but is being added to Linux and is shipped with OSX since version 10.5. DTrace can provide very detailed information that is useful when you need to track down bugs or performance issues (e.g., syscalls, I/O events). Programs are dynamically instrumented -- meaning you can deliver fully optimized binaries and start a debugging session on a live production server.

DTrace allows one to add high-level probes, which give the capability of creating static instrumentation points anywhere in the VM. That means, for a high-level language like Smalltalk we can add probes for events such as method enter and method return. So, what does that buy us for Squeak/Pharo? This allows us to use the DTrace technology to trace Smalltalk code execution. DTrace allows unprecedented view across all layers of Smalltalk software. With DTrace we can simultaneously observe and correlate events from Smalltalk, the VM, and the OS.

The provider I’ve implemented currently provides the following events: method enter and return, object allocation, full and incremental GC begin and end, and process activation (that is, when the scheduler transfers control to a Smalltalk process). Similar providers exist for languages such as Java and Ruby.

A simple example

To wet your appetite, here is an example that illustrates the basic usage of DTrace for Squeak/Pharo. For this example, lets assume our application has a high GC rate so to start tracking down the problem we want to know if there are any classes that are extensively instantiated during a given program run. In the DTrace scripting language (a subset of C named D), we define the probes we are interested in and how to print and aggregate the traced data:

Squeak$target:::proc_transferto
{
  self->priority = arg3;
}

Squeak$target:::objectalloc
/ self->priority == 30 /
{
  @number[copyinstr(arg0)] = count();
  @size[copyinstr(arg0)] = sum(arg1);
}

dtrace:::END
{
   ... print number and size vars ...
}

The first line captures the firing of the proc_transferto probe, which is triggered whenever a Smalltalk process is activated. Through the variable arg3 we can grab the process’ priority, which we store for later use. With the next statement we tell DTrace that we are also interested in the probe objectalloc, which is fired each time the VM creates a new object. However, we only want the probe to fire when the current process priority is 30. Like this we restrict the triggered events to the processes in question (e.g., only events from Seaside processes, which run at priority 30, but excluding GUI activity).

The variables ‘number’ and ‘size’ are maps that associate each class name with the number of instantiations, respectively with the total sum of bytes allocated. The class name is fetched from arg0, the size of the newly instantiated object from arg1.

The pseudo probe END is run when the tracing session is terminated (that is, when you hit Ctrl-c). In this declaration we specify how to print the results of the aggregated data. The printout  shows the top 4 classes:

Number of instances created per class

CLASS           NUMBER
Array             8248
Rectangle      19670
Float              41521
Point              79272

Total memory allocated for instances of class (in bytes)
CLASS           SIZE
Array              162516
Rectangle      236040
Float              498252
Point              951264

These numbers are taken from tracing the execution of the statement ‘Browser open’ in Pharo. The most instantiated class is Point with 79.000 instances, which together consume 0.9MB (!) of memory. For the complete D source see the file objectalloc.d in the attached zip file (see below).

As a follow up question, we may want to know which methods generated the Point instances. With another small script we can quickly answer also that question.

Rectangle>>intersect:                                    2534
ImageFillStyle>>initialize                            2566
Form>>extent                                                 2792
Point>>asIntegerPoint                                  5102
SmallInteger>>adaptToPoint:andSend:     7480
Point>>-                                                          10271
Point>>+                                                         13369

The script remembers for each methodenter and methodreturn even the current class and method in the variable self->currentcaller. On each objectalloc event we then do a @methods[self->currentcaller] = count().

Quick start (for now only OS X)

1. Download the ZIP file:

2. Unzip and execute start.command (just double click). This should start up a Pharo image using the instrumented VM.

3. Open a terminal window and execute

sudo dtrace -s objectalloc.d -p `ps -ef | grep squeak | grep -v grep | awk '{ print $2 }'`

4. In the Pharo image evaluate

[ Browser open ] forkAt: 30

5. When the browser has opened (takes a while), quit the D script (Ctr-c). Its output is printed to the terminal.

How to build your own DTrace-instrumented VM

Follow my previous blog post that explains how to build a Squeak/Pharo VM from zero. Before generating the sources, file in the following changeset:

Also, you need to add the following header file into your ../platforms/unix/src/vm/

EDIT: the VM patches and the examples are available from Github now. If you make changes, please send me pull requests!

 

Currently available probes

methodenter
       arg0: name of class
       arg1: name of method
       arg2: primitive index (0 if not a primitive)
       arg3: successFlag of the previously run primitive (false if method is run due to a primitive failure; for non-primitive methods arg3 is always true)

methodreturn
       arg0: name of class
       arg1: name of method
       arg2: primitive index (0 if not a primitive)

objectalloc
      arg0: name of class
      arg1: size in bytes

gcbegin

gcend

igcbegin

igcend

proc_transferto
      arg0: the previous process' pid (hash of Process instance)
      arg1: the previous process' priority
      arg2: the new process' pid
      arg3: the new process' priority

Performance overhead

While the low level probes of DTrace do not add overhead to the programs (because of dynamic instrumentation), the ones I added to the VM (aka. User-land Statically Defined Tracing (USDT)) do slow down the execution. The overhead compared to a vanilla VM are:

Bytecodes/sec
Sends/sec

1.5%
5.3%

This is not a huge slowdown, but it is still measurable. There may also be room for tuning the C code I added to the VM.

Where to go from here?

There are some more probes that would be useful to add. One of these would be a custom probe that can be triggered via a primitive from Smalltalk code. Like this one could fire high-level events, for instance, Seaside-start-request and Seaside-end-request, which would allow one to associate or scope traces to specific Seaside requests. Another useful feature would be to be able to access the current Smalltalk stack. This would allow for sampling; similar to message tally.

DTrace by itself does not provide visualizations but since its output can be formatted as required (using printf) and piped into a file, one can feed the data to a visualization tool (e.g., print output in MSE format, import it into Moose, and create interactive visualizations with Mondrian).

The interesting use cases will emerge from debugging real-world problems. What makes DTrace a compelling technique is that it allows one to seamlessly combine data from different providers. For instance, you can now figure out which of your Smalltalk code triggers expensive I/O events because it hits the disk. Or you could add Firefox probes and see how control flows between Ajax and Smalltalk.

If you find this useful or if you have interesting scripts to share, please drop me a message.

Posted by Adrian Lienhard on May 10, 2010


Building a Pharo/Squeak VM from first Principles

I've been building and modifying Squeak VMs for a couple of years, but each time I started afresh it took me considerable amount of time to set up a working toolchain. Changes in Pharo, VMMaker, platform C code, make files, in system libraries etc. can introduce subtle problems and the official documentation typically is obsolete. To help me – and others – be faster next time, I wrote the following step-by-step guide to build a Unix VM on OSX 10.6.

0. Preparation

Make sure you have xcode installed (http://developer.apple.com/tools/xcode/).
Install cmake >= 2.6.2 from http://www.cmake.org/

Create a work directory on your desktop (it is important it has no spaces!):

mkdir ~/Desktop/vmbuilding; cd ~/Desktop/vmbuilding

Let’s download a fresh Pharo image for later testing. In case you are behind a proxy, set the variable http_proxy to make wget work.

export http_proxy=http://<proxy server>
wget http://gforge.inria.fr/frs/download.php/25397/Pharo1.0-10508-rc2dev10.01.2.zip
unzip Pharo1.0-10508-rc2dev10.01.2.zip

Next we check out the VM platform-specific C sources and make files

svn co http://squeakvm.org/svn/squeak/trunk/platforms/ -r 2151

Note the use of hardcoded revision numbers to make sure these instructions still work in the future.

1. Compile from existing source

We start by compiling the source that come with Subversion. This allows us to test the basic setup without the sources we will generate ourselves.

cd platforms; mkdir build; cd build
../unix/cmake/configure --without-RomePlugin --without-vm-display-Quartz
make

If you got here without any errors you can try to start the new VM. Since Quartz does not work anymore on Snow Leopard, we use X11 as display driver and hence have to start up X11 first:

open /Applications/Utilities/X11.app
./squeak ~/Desktop/vmbuilding/Pharo1.0-10508-rc2dev10.01.2/Pharo1.0-10508-rc2dev10.01.2.image &

The Pharo image should start up now. Print “SmalltalkImage current vmVersion”, which shows the version of the image from which the VM source was built.
The result should be 'Squeak3.10.2 of ''5 June 2008'' [latest update: #7179]'.

Print and store the result of “1 tinyBenchmarks” then quit the image.

2. Install VMMaker

Start up the same image with your Mac VM:
open ~/Desktop/vmbuilding/Pharo1.0-10508-rc2dev10.01.2/Pharo1.0-10508-rc2dev10.01.2.image

In a workspace in the Pharo image do the following. If you are behind a proxy first let Pharo know. Then load VMMaker through the following Metacello config (thanks Torsten!):

HTTPSocket useProxyServerNamed: '<proxy server>' port: <port>.
Gofer new
      squeaksource: 'MetacelloRepository';
      package: 'ConfigurationOfVMMaker';
      load.
((Smalltalk at: #ConfigurationOfVMMaker) project version: '1.2') load

This takes some time. When it finished loading, we open the VMMaker GUI:

Preferences disable: #raiseDeprecatedWarnings.
VMMakerTool openInWorld

3. Configure VMMaker and generate sources

First let’s backup the original sources

mv ../unix/src ../unix/src_original

In the VMMaker GUI:
Change path to platform code: /Users/<user>/Desktop/vmbuilding/platforms
Change path to generated sources: /Users/<user>/Desktop/vmbuilding/platforms/unix/src
Don’t forget to accept the changed field by cmd-s. Leave the other settings unchanged.

Make all plugins internal (right-click in list of plugins). Then select as not built (you can drag and drop between lists): Aio, FFT, FT2, InternetConfig, MacMenubar, Mpeg3, QuickTime, Security, Sound, TestOSA

Select as external:  B3DAccelerator, FFI, FileCopy, HostWindow, MIDI, UUID, XDisplayControl.

Note, that I haven't cared to make Freetype work and have just excluded it above. In a related blog post, Laurent describes how to patch Freetype to compile on Linux.

Evaluate the following expression to force the use of LF for line breaks (else the awk script triggered by cmake does not properly work).

MultiByteFileStream defaultToLF.

Click the button Entire to generate the VM sources in the directory ./unix/src/. After the VMMaker has completed, save and quit the image.

4. Compile the new sources

We create a new build directory and compile the source we created in step 3.

mkdir ../build_new; cd ../build_new
../unix/cmake/configure --without-vm-display-Quartz
make
./squeak ~/Desktop/vmbuilding/Pharo1.0-10508-rc2dev10.01.2/Pharo1.0-10508-rc2dev10.01.2.image &

If you get here without an error, you made it! Congrats! ;)

5. Benchmarks

A good way to test the healthiness of a new VM is to run the tiny benchmarks.

1 tinyBenchmarks

The result should be similar to the ones you obtained in step 1.

On my MacBook Pro 10.6.2 with 2.4GHz Intel Core 2 Duo, gcc 4.2.1 I got
479400749 bytecodes/sec; 12303286 sends/sec.

Posted by Adrian Lienhard on March 11, 2010


Archive

Filter (Chronologic, Yearly view)
2010>>
January(1)
February
March(1)
April(1)
May(1)
June
July
August
September
October
November
December

CMSBOX Powered by cmsbox