[Ksummit-2012-discuss] [ATTEND] maintaining, SCSI/block, bugs

Roland Dreier roland at kernel.org
Sun Jun 17 00:32:43 UTC 2012


I'd like to attend the 2012 Kernel Summit.  There are a number of
areas where I think I could contribute to discussions:

 - I'm the maintainer of the InfiniBand/RDMA subsystem, which is
   moderately active (ca. 50-100 patches merged per kernel release,
   and if you listen to Christoph Lameter, RDMA, kernel bypass, etc. are
   becoming more and more important ;) and presents the typical
   challenge that many areas of the kernel face: hardware vendors add
   new features and want to get support out to users quickly, so there
   is lots of new code to merge, while the fraction of the community
   that reviews patches or cleans up technical debt remains low.

   This ties into the topic that tglx raised and I certainly feel the
   tension in how best to spend my own limited time.  Vendors
   certainly wish I would merge more stuff so that features can reach
   users who definitely want them -- so most of the time, most of the
   small community who actually care about InfiniBand/RDMA are mad at
   me for holding back one patchset or another.  But across the longer
   term, I know that I need to take the time to review things or else
   we'll be killed by a thousand maintainability nits or even big
   architectural goofups.

   And I have to feel that the traditional answers of just saying no to
   patches and trying to get more reviewers involved are not
   sufficient.  The InfiniBand/RDMA community seems to be very slowly
   maturing but history shows that vendors will create a whole
   OpenFabrics Alliance and maintain OFED for years just to have a way
   of shipping code instead of just getting their code upstream, and
   it would be good to talk about how we can avoid this in the future.
   (Perhaps holding up the history of OFED as an example to other
   outsiders might be sufficient...)

 - I've also recently become involved in the relatively new SCSI
   target subsystem (drivers/target).  Having a SCSI target in the
   kernel provides all sorts of interesting possibilities for testing
   the SCSI (initiator) stack via error injection -- one can imagine a
   quite useful testing setup within a single laptop using iSCSI
   between VMs, and a hackable target also makes coding support
   for new exotic SCSI commands much more accessible.

   The SCSI target stack is also a good example of merging a codebase
   with a long out-of-tree heritage and all the issues that entails --
   and it is very gratifying how fast progress in cleaning the code up
   has come.  So perhaps a bit of counterpoint to all the gloom and
   doom about adding bugs and bitrot everywhere.

 - As part of my $DAYJOB I work on a high-performance storage box that
   uses SSDs on a SAS fabric.  I'm very interested in discussing
   Jens's plans for a "third way" for block devices that lets us get
   the best of both request queue and make_request drivers.

 - Finally, $DAYJOB more generally involves shipping a very high-end
   embedded system (24 threads, 96 GB of RAM, but still theoretically
   an appliance).  I think my session at last year's Kernel Summit
   unfortunately got sidetracked on specifics about the block/SCSI
   stack (and we do seem to be close to getting SCSI device unplug
   working now), but I'm more convinced than ever that we need to
   think hard about how we "harden" error paths.  It really sucks to
   try and debug crashes that happen once a month when some rare event
   hits, and the only debugging info you have is whatever crash trace
   you get back from the customer site.

   I know that the traditional answer to this is that these are just
   bugs to fix, and we'll fix them.  But I still think we need to
   think about creative ways to make things more reliable or at least
   more debuggable.  I'd be curious to know how this ties into Dave
   Jones's testing / Fedora debugging work.

 - [Also we'd love to work on mmap_sem hold times, which is another
   thing we run into trying to run a multithreaded app on a reasonably
   large memory system.]

