Frank Ch. Eigler
fche at redhat.com
Sun Jun 29 18:04:23 PDT 2008
Please forgive me for "crashing" the discussion party here. I would
like to clarify some systemtap-related issues that people have raised.
(I'm one of its developers.) I'll just list individual points,
roughly in the order they were raised. For a fuller treatment of any of
the topics, please involve our public <systemtap at sources.redhat.com>
mailing list.
* postgres, other dtrace-probe-instrumented userspace programs
We aim to piggyback on these efforts by reusing the dtrace
instrumentation calls embedded into postgres etc., if at all possible.
* "klunky and prone to break in unexpected ways"
There's a germ of truth there, but OTOH the case James ran into
involved complications beyond normal symbolic debugging too
(possibly having to search separately compiled modules for
definitions of opaque struct-pointer types). We're working on it;
our bug/feature list is in public bugzilla.
* "unhappy week with dwarf"
Guilty as charged. :-)
* kprobes, markers
Performance of kprobes-based probes is about 1 us per hit overhead.
Markers are on the order of tens of nanoseconds, which makes a huge
difference for frequently-hit probes. We'd be happy to interface to
other event sources like ftrace or whatever, as long as they provide
suitable kernel-module-accessible APIs.
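To put those two figures in perspective, here is a rough back-of-envelope
sketch in Python. The 1 us figure is the one quoted above; 50 ns is an
assumed stand-in for "tens of nanoseconds", and the hit rates are made up
for illustration.

```python
# Rough per-CPU overhead estimate for a probe hit at a given rate.
KPROBE_COST_S = 1e-6      # ~1 us per hit (figure quoted above)
MARKER_COST_S = 50e-9     # ASSUMED value for "tens of nanoseconds"

def overhead_fraction(hits_per_second, cost_per_hit_s):
    """Fraction of one CPU-second spent in probe overhead."""
    return hits_per_second * cost_per_hit_s

for rate in (1_000, 100_000, 1_000_000):   # hypothetical hit rates
    k = overhead_fraction(rate, KPROBE_COST_S)
    m = overhead_fraction(rate, MARKER_COST_S)
    print(f"{rate:>9} hits/s: kprobe {k:7.2%}  marker {m:7.2%}")
```

Under these assumptions, a probe hit a million times per second would
cost roughly a whole CPU with kprobes, versus about 5% with a marker.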
* user-space probing
We're finally getting very close on this. Yes, it'd use the IBM
uprobes prototype above the Red Hat utrace work as a lower layer,
which we hope will get upstream as soon as possible. It will behave
analogously to dtrace: executing probes in kernel space. If it can
be made safe (and we think it can), it's a huge performance win over
trying to do it in userspace (with some gang of debugging processes
It's a fine special-purpose tool. We hope to hook into the same
sorts of underlying hardware performance counters to enable the same
profiling capability in systemtap - except well integrated with the
rest of the probing events / scripts. perfmon2 upstream would be
* dtrace "just works"
Yeah, so I hear, but think about how different their target
environment is. Their kernel hardly changes (several fixed APIs,
ABIs): this has huge implications. Their kernel developers were
willing to accept inserted probes (~ markers) and a bunch of build
system changes (debug info subset transcribing). Here in linux land,
we suffer
multifaceted tensions and it is hard to go toward a goal without
obstructions (well-meaning as they may be).
A bunch of third-party scripts are often conflated with "dtrace".
Matching that is just a matter of growing the user community enough,
and giving them a good tool to build on top of. A growing set of
runnable end-user scripts is already packaged with systemtap,
intended for use by nonexperts. More help (e.g. concise problem
statements about what you'd like to measure/see) would be welcome.
* integrating systemtap runtime into kernel
We did some analysis of how much of the runtime code is novel &
relevant to the kernel. We came up with a fraction
like 20% (IIRC; still searching for a link to the thread). Some of
the code is indeed in need of some cleanup love.
Some of it has been necessary to work around kernel disruptions
(e.g., unexporting stuff like kallsyms_lookup). The parts that are
deeply kernel-version-sensitive (and would thus benefit from your
maintenance) are quite small. We're still open to trying to pursue
copying/upstreaming some of this code into the kernel.
Theodore is mistaken that we are deflecting the job of tapset (probe
macro; abstracting architecture and kernel version-change -
$foo->bar->baz, function names) authorship. We have asked for help,
and have received a little, but the group has in fact authored a
growing collection of this stuff.
We would welcome having tapsets be included with the kernel and
cared for by you guys.
Yes, it's very helpful & necessary if one wants to place probes at
just about any statement and extract just about any data value.
It's the same prerequisite that crash or kgdb would have, since we
operate at a similar level of object/source code visibility. Other
distros are learning to package this admittedly bulky data up, so
it'll be a matter of a largish download for distro users. Kernel
developers will of course have the data generated locally already.
We've recently gained the ability to work on symbol table level data
only. It's a compromise technology: it shrinks the installation
footprint but we get only function-entry probes; we lose data
typing; can only get at ABI-dictated positional integral arguments.
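As a rough illustration of what symbol-table-only data gives you, the
Python sketch below reads text in the "address type name" layout used by
/proc/kallsyms and lists the code symbols, which are the only places an
entry probe could go. The sample lines and addresses are made up, and
this is not a description of how systemtap itself does it.

```python
# Sketch: list candidate function-entry probe points from symbol-table
# text in the /proc/kallsyms layout: "<hex address> <type> <name> [module]".
# SAMPLE is invented for illustration; real names and addresses will differ.
SAMPLE = """\
c0123450 T sys_open
c0123900 t do_something_static
c0456000 D some_data_symbol
f8801000 t helper_fn\t[some_module]
"""

def function_symbols(text):
    """Return {name: address} for text (code) symbols, types 't' and 'T'."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue                      # skip malformed lines
        addr, sym_type, name = fields[0], fields[1], fields[2]
        if sym_type in ("t", "T"):        # text section, treated here as functions
            result[name] = int(addr, 16)
    return result

for name, addr in sorted(function_symbols(SAMPLE).items()):
    print(f"{name}: entry at {addr:#x}")
```

Note what is absent from this kind of data: no line numbers, no types,
no variable locations. That is why only entry probes and positional
integral arguments are available in this mode.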
* systemtap building
The only thing unusual about building it is the use of the
elfutils library to parse elf/dwarf data; links to that are provided
and one can link to a private copy if the system lacks it.
* systemtap releases
True, we've been spotty with formal releases, though they are
archived and available, and we're moving to a more regular release
schedule very shortly. The weekly snapshots have been good (except
a recent unfortunate regression that hits 2.6.25 kernels
particularly badly - that's holding up the new release plans).
Thanks for reading; sorry about the length.
More information about the Ksummit-2008-discuss