[Ksummit-2012-discuss] [ATTEND] kernel core dump and "dying breath"

Srivatsa S. Bhat srivatsa.bhat at linux.vnet.ibm.com
Thu Jun 21 11:33:46 UTC 2012


On 06/21/2012 04:44 PM, James Bottomley wrote:

> On Thu, 2012-06-21 at 05:57 -0500, Jason Wessel wrote:
>> On 06/21/2012 05:37 AM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Jun 21, 2012 at 4:05 AM, Cong Wang <xiyou.wangcong at gmail.com> wrote:
>>>> Hi, all,
>>>>
>>>> I would like to bring the kernel "dying breath" topic
>>>> to the Kernel Summit this year. It will cover some recent
>>>> technologies like pstore, ramconsole etc., and of course
>>>> should also cover kdump and netconsole.
>>
>> If we are going to make netconsole actually work reliably in all
>> contexts, kgdboe can be revived. Currently there are contexts where
>> netconsole simply doesn't work, and the printk you were looking
>> for is never going to be delivered.
>>
>>>
>>> Are there any future projects in the pipeline? Most of these deal with
>>> depositing "why it crashed" information somewhere, but are there any
>>> that try to avoid the cause on the next boot?
>>>
>>
>> Self healing, eh? Short of using a different kernel to run some
>> scripts that look at the crash itself and take action, such as booting a
>> new kernel, OR booting the existing kernel and looking at the previous
>> crash information to attempt corrective action, I am not aware
>> of anything. Things like fsck take this sort of action today, but I
>> believe you are asking about something on an entirely different level.
>>
>> The one thing that does come to mind is that if you did save
>> information about the prior crash, you could probably get more
>> information on the next crash by automatically inserting a kprobe
>> at the crash address to collect more data into the ftrace
>> buffer or "something", depending on the original crash.
> 
> So I think we're advocating something much simpler. The basic
> problem is that a lot of kernel crashes are very hard for the average
> user to report: for many, the best we get is a photo of the crash
> screen with most of the relevant information scrolled off the top.
> Serial or net consoles may be standard dev tools, but they're also too
> complex for most users (although, to their credit, there are a lot
> who try).
> 
> So the question becomes: could we make gathering this data easier,
> so that it becomes a cut and paste from a log file (or even an automatic
> submission, depending on what permissions you gave your distro)? The
> thought is that a lot of systems come with areas of NVRAM today,
> so we could write the dying gasp of a crashing kernel to this NVRAM area
> (this avoids most of the problems of trying to write it to disk, and
> NVRAM isn't cleared on BIOS init) and then spit it out on the next boot in a
> form that can be easily consumed by automated bug reporting tools or
> easily cut and pasted into an email.
> 


Fedora has a bug reporting tool called ABRT, which automatically files
bug reports with bugzilla.redhat.com (after getting the required permissions
from the user). It monitors the kernel ring buffer, among other sources, and
whenever it finds a warning or stack trace, it notifies the user, gathers all
the relevant data and files a bug report. (Of course, the machine must
still be up and running for that to work :-)) It usually works pretty well
for warnings, lockdep splats and similar problems that don't bring the
machine down entirely. It would be interesting to see how this could be
extended to file bug reports on the subsequent boot (when the kernel
crashed entirely on the previous boot), by saving the crash data using
the techniques you mentioned above.
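To make the next-boot half of this concrete, here is a minimal sketch of a collector a reporting tool could run after a crash. It assumes the crash data was saved through pstore, that the pstore filesystem is mounted at /sys/fs/pstore, and that the saved kernel log records have names starting with "dmesg-" (e.g. dmesg-ramoops-0). collect_dying_breath() is a hypothetical helper, not part of ABRT or any existing tool.

```python
from pathlib import Path

# Assumption: pstore is mounted here; this mount point is a convention, not guaranteed.
PSTORE_DIR = "/sys/fs/pstore"


def collect_dying_breath(pstore_dir=PSTORE_DIR, prefix="dmesg-"):
    """Hypothetical helper: concatenate saved crash records into one report.

    Every regular file whose name starts with `prefix` is read, in file-name
    order, and emitted under a "--- <name> ---" header so the result can be
    attached to a bug report or pasted into an email.
    Returns an empty string if the pstore directory does not exist.
    """
    root = Path(pstore_dir)
    if not root.is_dir():
        return ""
    parts = []
    for record in sorted(root.iterdir()):
        if record.is_file() and record.name.startswith(prefix):
            # Replace undecodable bytes rather than failing on a damaged record.
            text = record.read_text(errors="replace")
            parts.append(f"--- {record.name} ---\n{text}")
    return "\n".join(parts)
```

The sketch deliberately leaves the records in place: removing a file from the pstore filesystem tells the backend it may reuse that space, so a real tool would only do that once the report has been safely saved or submitted. Reading the pstore directory typically requires root.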

Regards,
Srivatsa S. Bhat
