[Ksummit-2008-discuss] DTrace

Masami Hiramatsu mhiramat at redhat.com
Mon Jun 30 14:10:23 PDT 2008


James Bottomley wrote:
> On Sun, 2008-06-29 at 21:04 -0400, Frank Ch. Eigler wrote:
>> * kprobes, markers
>>   Performance of kprobes-based probes is about 1 us per hit overhead.
>>   Markers are on the order of tens of nanoseconds, which makes a huge
>>   difference for frequently-hit probes.  We'd be happy to interface to
>>   other event sources like ftrace or whatever, as long as they provide
>>   suitable kernel-module-accessible APIs.
> There were two specific latencies of concern to the financial trading
> house type end user: One was the latency from execution to run.  This is
> caused mostly by the module build and insertion.  I really can't see
> this getting better except by divorcing systemtap from having to use the
> whole of the kernel build infrastructure.  To do that, we need to begin
> putting a lot of the C fragments that make up that infrastructure into
> the kernel to lessen the load.  It would actually be nice finally to get
> to the point where you simply link the probe routines with a special
> module stub (built as part of the kernel) and insert it.

I agree. Compiling the systemtap runtime code into an independent module (or
object file) could reduce build time.
(However, I think it depends on the script you write. If you probe all
of the sys_* functions, the function search time becomes long.)

> The other is the probe execution latency.  Since the institutions are
> tracing transactions on the order of milliseconds, microsecond latencies
> in the probes do give them cause for concern (it only takes a few probe
> points to add up to a significant perturbation).

Markers have another benefit: they let you probe irq handlers.
Since kprobes uses exceptions and isn't recursive, it can't probe
irq-related functions. Markers can probe them because they don't use
any exceptions.
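
To put the latency figures quoted above in perspective, here is a rough
back-of-the-envelope sketch. The 1 us kprobe cost is the figure quoted in this
thread; the 50 ns marker cost (standing in for "tens of nanoseconds"), the
1 ms transaction time, and the hit counts are assumptions for illustration
only, not measurements.

```python
# Rough per-transaction probe overhead, using the figures quoted in this thread.
KPROBE_NS = 1_000            # ~1 us per kprobe hit (quoted figure)
MARKER_NS = 50               # "tens of nanoseconds"; 50 ns is an assumed value
TRANSACTION_NS = 1_000_000   # assumed 1 ms transaction


def overhead_percent(cost_ns, hits, transaction_ns=TRANSACTION_NS):
    """Return total probe overhead as a percentage of the transaction time."""
    return 100.0 * cost_ns * hits / transaction_ns


# Print the overhead for a few assumed hit counts per transaction.
for hits in (1, 10, 100):
    kprobe = overhead_percent(KPROBE_NS, hits)
    marker = overhead_percent(MARKER_NS, hits)
    print(f"{hits:>3} hits: kprobes {kprobe:.3f}%  markers {marker:.3f}%")
```

Under these assumptions the gap only starts to matter once a transaction
crosses many probe points, which is the "frequently-hit probes" case.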

>> * integrating systemtap runtime into kernel
>>   We did some analysis about how much of the runtime code contains
>>   novel & relevant code to the kernel.  We came up with a fraction
>>   like 20% (IIRC; still searching for a link to the thread).  Some of
>>   the code is indeed in need of some cleanup love.  
>>   Some of it has been necessary to work around kernel disruptions
>>   (e.g., unexporting stuff like kallsyms_lookup).  The parts that are
>>   deeply kernel-version-sensitive (and would thus benefit from your
>>   maintenance) are quite small.  We're still open to trying to pursue
>>   copying/upstreaming some of this code into the kernel.
> Actually, this one is an example of a wrong approach.  What you're
> effectively doing is trying to implement an ABI for staprun in these
> files (as well as various helpers for the modules).  The workaround for
> kallsyms_lookup is pretty horrible as well ... especially as the kernel
> has its own address to symbol string converter.
> This is a lot of what needs to be cleaned up and simplified.  The
> interface between systemtap and the kernel is essentially a private ABI
> and we should treat it as such, so all the helpers for the modules and
> the necessary implementers of the ABI should be in kernel ... there
> shouldn't be any (if done right) carried around as C fragments with
> kernel version ifdefs ...

Also, some of them should be isolated from the kernel itself.
For example, systemtap cannot call do_gettimeofday() because it
is not recursive, so systemtap now has its own time.c.
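
As a toy user-space model of that problem (not kernel code): the sketch below
assumes the time function reads its value under a sequence-counter style retry
loop, which is an assumption made for illustration. ToySeqClock and
handler_calling_clock are hypothetical names. If a probe fires in the middle
of an update and its handler calls the same read path, the reader can never
see a stable counter.

```python
class ToySeqClock:
    """Toy stand-in for a clock guarded by a sequence counter (hypothetical)."""

    def __init__(self):
        self.seq = 0   # even: stable, odd: update in progress
        self.now = 0

    def read(self, max_retries=1000):
        """Reader retry loop; a real reader would spin with no retry limit."""
        for _ in range(max_retries):
            start = self.seq
            value = self.now
            if start % 2 == 0 and start == self.seq:
                return value
        raise RuntimeError("reader would spin forever: update in progress")

    def update(self, new_now, probe=None):
        """Writer path; 'probe' models a probe point hit mid-update."""
        self.seq += 1          # begin update (seq becomes odd)
        if probe is not None:
            probe(self)        # probe handler runs inside the update
        self.now = new_now
        self.seq += 1          # end update (seq becomes even)


def handler_calling_clock(clock):
    """A probe handler that calls back into the interrupted code path."""
    return clock.read()


clock = ToySeqClock()
clock.update(100)
print("normal read:", clock.read())

try:
    clock.update(200, probe=handler_calling_clock)
except RuntimeError as err:
    print("probe handler stuck:", err)
```

The retry limit exists only so the demonstration terminates. A separate time
source for the probe handler avoids re-entering the interrupted path at all.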

Thank you,

Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com
