[Ksummit-2008-discuss] DTrace

Frank Ch. Eigler fche at redhat.com
Sun Jun 29 18:04:23 PDT 2008

Hi -

Please forgive me for "crashing" the discussion party here.  I would
like to clarify some systemtap-related issues that people have raised.
(I'm one of its developers.)  I'll just list individual points,
roughly in the order they were raised.  For a fuller treatment of any of
the topics, please involve our public <systemtap at sources.redhat.com>
mailing list.

* postgres, other dtrace-probe-instrumented userspace programs

  We aim to piggyback on these efforts by reusing the dtrace
  instrumentation calls embedded into postgres etc., if at all
  possible.

* "klunky and prone to break in unexpected ways"

  There's a germ of truth there, but OTOH the case James ran into
  involved complications beyond normal symbolic debugging too
  (possibly having to search separately compiled modules for
  definitions of opaque struct-pointer types).  We're working on it;
  our bug/feature list is in public bugzilla.

* "unhappy week with dwarf"

  Guilty as charged. :-)

* kprobes, markers

  Performance of kprobes-based probes is about 1 us per hit overhead.
  Markers are on the order of tens of nanoseconds, which makes a huge
  difference for frequently-hit probes.  We'd be happy to interface to
  other event sources like ftrace or whatever, as long as they provide
  suitable kernel-module-accessible APIs.
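
  To put those two figures in perspective, here is a rough
  back-of-the-envelope sketch of how much of one CPU goes to probe
  overhead at various hit rates.  The 1 us kprobe cost is the figure
  quoted above; the 50 ns marker cost is only an assumed midpoint of
  "tens of nanoseconds", and the function name is made up for
  illustration.

```python
# Per-hit costs in nanoseconds.
KPROBE_NS = 1000   # "about 1 us per hit", as quoted above
MARKER_NS = 50     # ASSUMPTION: a midpoint for "tens of nanoseconds"


def cpu_fraction(hits_per_sec, ns_per_hit):
    """Fraction of one CPU-second consumed by probe overhead alone."""
    return hits_per_sec * ns_per_hit / 1e9


# Compare the two mechanisms at a few hypothetical hit rates.
for rate in (1_000, 100_000, 1_000_000):
    k = cpu_fraction(rate, KPROBE_NS)
    m = cpu_fraction(rate, MARKER_NS)
    print(f"{rate:>9} hits/s: kprobe {k:7.2%}  marker {m:7.2%}")
```

  Under these assumptions, a probe hit a million times per second
  would occupy an entire CPU with kprobes but only a few percent of
  one with markers.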

* user-space probing

  We're finally getting very close on this.  Yes, it'd use the IBM
  uprobes prototype above the Red Hat utrace work as a lower layer,
  which we hope will get upstream as soon as possible.  It will behave
  analogously to dtrace: executing probes in kernel space.  If it can
  be made safe (and we think it can), it's a huge performance win over
  trying to do it in userspace (with some gang of debugging processes
  or whatever).

* oprofile

  It's a fine special-purpose tool.  We hope to hook into the same
  sorts of underlying hardware performance counters to enable the same
  profiling capability in systemtap - except well integrated with the
  rest of the probing events / scripts.  perfmon2 upstream would be
  very helpful.

* dtrace "just works"

  Yeah, so I hear, but think about how different their target
  environment is.  Their kernel hardly changes (several fixed APIs,
  ABIs): this has huge implications.  Their kernel was willing to
  take inserted probes (~ markers) and a bunch of build system
  changes (debug info subset transcribing).  Here in linux land, we suffer
  multifaceted tensions and it is hard to go toward a goal without
  obstructions (well-meaning as they may be).

  A bunch of third-party scripts are often conflated with "dtrace";
  matching that is just a matter of growing the user community enough,
  and giving them a good tool to build on top of.  A growing set of
  runnable end-user scripts is already packaged with systemtap,
  intended for use by nonexperts.  More help (e.g. concise problem
  statements about what you'd like to measure/see) would be welcome.

* integrating systemtap runtime into kernel

  We did some analysis of how much of the runtime code is novel &
  relevant to the kernel.  We came up with a fraction like 20% (IIRC;
  still searching for a link to the thread).  Some of the code is
  indeed in need of some cleanup love.

  Some of it has been necessary to work around kernel disruptions
  (e.g., unexporting stuff like kallsyms_lookup).  The parts that are
  deeply kernel-version-sensitive (and would thus benefit from your
  maintenance) are quite small.  We're still open to trying to pursue
  copying/upstreaming some of this code into the kernel.

* tapsets

  Theodore is mistaken that we are deflecting the job of tapset (probe
  macro; abstracting architecture and kernel version-change -
  $foo->bar->baz, function names) authorship.  We have asked for help,
  and have received a little, but the group has in fact authored a
  growing collection of this stuff.

  We would welcome having tapsets be included with the kernel and
  cared for by you guys.

* debuginfo

  Yes, it's very helpful & necessary if one wants to place probes at
  just about any statement and extract just about any data value.
  It's the same prerequisite that crash or kgdb would have, since we
  operate at a similar level of object/source code visibility.  Other
  distros are learning to package this admittedly bulky data up, so
  it'll be a matter of a largish download for distro users. Kernel
  developers will of course have the data generated locally already.

  We've recently gained the ability to work on symbol table level data
  only.  It's a compromise technology: it shrinks the installation
  footprint but we get only function-entry probes; we lose data
  typing; can only get at ABI-dictated positional integral arguments.
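
  As an illustration of what "ABI-dictated positional integral
  arguments" means, here is a small sketch.  It assumes the x86-64
  System V calling convention, where the first six integer or pointer
  arguments arrive in fixed registers.  The helper name and the
  register snapshot are hypothetical, and this is not systemtap code.

```python
# x86-64 System V ABI: the first six integer/pointer arguments are
# passed in these registers, in this order.
X86_64_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def positional_arg(regs, n):
    """Return the n-th (1-based) integer argument from a register
    snapshot taken at function entry (hypothetical helper)."""
    if not 1 <= n <= len(X86_64_INT_ARG_REGS):
        raise ValueError("this sketch handles register arguments 1..6 only")
    return regs[X86_64_INT_ARG_REGS[n - 1]]


# Made-up register snapshot at the entry of some three-argument function.
regs = {"rdi": 3, "rsi": 0x7FFD1000, "rdx": 4096, "rcx": 0, "r8": 0, "r9": 0}

# Without debuginfo there are no names or types: the arguments are
# just "first", "second", "third" raw integers.
first, second, third = (positional_arg(regs, i) for i in (1, 2, 3))
print(first, hex(second), third)
```

  The point is that, without type information, a pointer such as the
  second argument is just a number; following it into a struct field
  still needs the full debuginfo.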

* systemtap building

  The only thing unusual with building the thing is the use of the
  elfutils library to parse elf/dwarf data; links to that are provided
  and one can link to a private copy if the system lacks it.

* systemtap releases

  True, we've been spotty with formal releases, though they are
  archived and available, and we're moving to a more regular release
  schedule very shortly.  The weekly snapshots have been good (except
  a recent unfortunate regression that hits 2.6.25 kernels
  particularly badly - that's holding up the new release plans).

Thanks for reading; sorry about the length.

- FChE
