[Ksummit-2012-discuss] [ATTEND] kernel core dump and "dying breath"

Thu Jun 21 11:14:49 UTC 2012

On Thu, 2012-06-21 at 05:57 -0500, Jason Wessel wrote:
> On 06/21/2012 05:37 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Jun 21, 2012 at 4:05 AM, Cong Wang <xiyou.wangcong at gmail.com> wrote:
> >> Hi, all,
> >>
> >> I would like to bring the kernel "dying breath" topic
> >> to the Kernel Summit this year. It will contain some recent
> >> technologies like pstore, ramconsole etc., and of course
> >> should also cover kdump and netconsole too.
> 
> If we are going to make netconsole actually work reliably in all
> contexts, kgdboe can be revived.  Currently there are places
> netconsole simply doesn't work and that printk you were looking
> for... It is never going to be delivered.
> 
> > 
> > Are there any future projects in the pipeline? Most of these deal with
> > depositing somewhere "why it crashed" information, but are there any
> > that try to omit the cause on the next boot?
> > 
> 
> Self healing eh?  Short of using a different kernel to run some
> scripts to look at the crash itself and take action such as booting a
> new kernel, OR booting the existing kernel and looking at the previous
> crash information to attempt to take corrective action, I am not aware
> of anything.  Things like fsck take this sort of action today, but I
> believe you are asking about something of an entirely different level.
> 
> The one thing that does come to mind is that if you did save
> information about the prior crash, you could probably get more
> information on the next crash by automatically inserting a kprobe
> at the crash address that could collect more information automatically
> into the ftrace buffer or "something" depending on the original crash.

So I think we're advocating something much more simplistic.  The basic
problem is that a lot of kernel crashes are very hard for the average
user to report:  for many, the best we get is a photo of the crash
screen with most of the relevant information scrolled off the top.
Serial or net consoles may be standard dev tools, but they're also too
complex for most average users (although, to their credit, there are a
lot who try).

So the question becomes: could we make the gathering of this data
easier, so it just becomes a cut and paste from a log file (or even an
automatic submission, depending on what permissions you gave your
distro)?  The thought is that a lot of systems are coming with areas of
NVRAM today, so we could write the dying gasp of a crashing kernel to
this NVRAM area (this avoids most of the problems that trying to write
it to disk had, and NVRAM isn't cleared on BIOS init) and then spit it
out on next boot in a form that can be easily consumed by automated bug
reporting tools or easily cut and pasted into an email.
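
As a rough illustration of the "spit it out on next boot" half of this
idea, here is a minimal Python sketch.  It assumes the saved records
show up as plain files under the pstore mount point (commonly
/sys/fs/pstore); the function names and the report layout are
hypothetical and not part of any existing tool.

```python
import os


def collect_crash_records(pstore_dir="/sys/fs/pstore"):
    """Return (name, text) pairs for each record file found.

    Assumption: each saved crash record is a regular file in pstore_dir.
    A missing directory simply yields an empty list.
    """
    records = []
    if not os.path.isdir(pstore_dir):
        return records
    for name in sorted(os.listdir(pstore_dir)):
        path = os.path.join(pstore_dir, name)
        if not os.path.isfile(path):
            continue
        # Crash records may contain stray bytes, so decode leniently.
        with open(path, "r", errors="replace") as f:
            records.append((name, f.read()))
    return records


def format_report(records):
    """Join the records into one block of text suitable for pasting."""
    if not records:
        return "No saved crash records found.\n"
    parts = []
    for name, text in records:
        parts.append("==== %s ====" % name)
        parts.append(text.rstrip("\n"))
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(format_report(collect_crash_records()), end="")
```

Run early in boot, something like this could hand a bug reporting tool
one text blob, or leave it in a log file for the user to paste.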

James