[Ksummit-2012-discuss] [ATTEND] kernel core dump and "dying breath"

Konrad Rzeszutek Wilk konrad.wilk at oracle.com
Thu Jun 21 15:23:37 UTC 2012


On Thu, Jun 21, 2012 at 05:57:29AM -0500, Jason Wessel wrote:
> On 06/21/2012 05:37 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Jun 21, 2012 at 4:05 AM, Cong Wang <xiyou.wangcong at gmail.com> wrote:
> >> Hi, all,
> >>
> >> I would like to bring the kernel "dying breath" topic to the
> >> Kernel Summit this year. It would cover some recent
> >> technologies such as pstore and ramconsole, and of course
> >> should also cover kdump and netconsole.
> 
> If we are going to make netconsole actually work reliably in all
> contexts, kgdboe can be revived.  Currently there are places
> netconsole simply doesn't work and that printk you were looking
> for... It is never going to be delivered.
> 
> > 
> > Are there any future projects in the pipeline? Most of these deal with
> > depositing "why it crashed" information somewhere, but are there any
> > that try to avoid the cause on the next boot?
> > 
> 
> Self healing eh?  Short of using a different kernel to run some

<nods>
> scripts to look at the crash itself and take action such as booting a
> new kernel, OR booting the existing kernel and looking at the previous
> crash information to attempt to take corrective action, I am not aware
> of anything.  Things like fsck take this sort of action today, but I
> believe you are asking about something on an entirely different level.

Some drivers have these knobs to set more conservative options. For example
network drivers can skip using MSI and instead use legacy interrupts.

I was wondering if it would make sense to introduce a '.fallback' function
(to be implemented by the drivers) so that if we boot into the new
kernel it would try to use a conservative approach to hardware.
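
To make the idea a bit more concrete, here is a rough userspace sketch
of what such a hook could look like. Nothing here is an existing kernel
interface: struct demo_driver_ops, demo_bring_up() and the
prev_boot_crashed flag are all made-up names for illustration, and how
the kernel would learn that the previous boot crashed is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interrupt modes a network driver might support. */
enum irq_mode { IRQ_MSI, IRQ_LEGACY };

/* Hypothetical device state, reduced to the one knob discussed above. */
struct demo_dev {
	const char *name;
	enum irq_mode irq_mode;
};

/*
 * Hypothetical driver ops table. The .fallback member is the proposed
 * hook; it is an assumption of this sketch, not a real kernel API.
 */
struct demo_driver_ops {
	int (*probe)(struct demo_dev *dev);
	void (*fallback)(struct demo_dev *dev);	/* optional, may be NULL */
};

/* Normal bring-up: pick the fast path (MSI). */
static int demo_probe(struct demo_dev *dev)
{
	dev->irq_mode = IRQ_MSI;
	return 0;
}

/* Conservative settings: skip MSI and use legacy interrupts. */
static void demo_fallback(struct demo_dev *dev)
{
	dev->irq_mode = IRQ_LEGACY;
}

static const struct demo_driver_ops demo_ops = {
	.probe = demo_probe,
	.fallback = demo_fallback,
};

/*
 * Hypothetical core helper: probe the device, then apply the driver's
 * conservative settings if the previous boot ended in a crash and the
 * driver provides a fallback.
 */
static int demo_bring_up(const struct demo_driver_ops *ops,
			 struct demo_dev *dev, bool prev_boot_crashed)
{
	int ret = ops->probe(dev);

	if (ret)
		return ret;
	if (prev_boot_crashed && ops->fallback != NULL)
		ops->fallback(dev);
	return 0;
}
```

With this shape, a driver that has no conservative mode simply leaves
.fallback unset and comes up the same way on every boot.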

> 
> The one thing that does come to mind is that if you did save
> information about the prior crash, you could probably get more
> information on the next crash by automatically inserting a kprobe
> at the crash address that could collect more information
> into the ftrace buffer or "something" depending on the original crash.

That is pretty neat.
> 
> For the really tricky sorts of problems like memory corruption
> however, the addresses tend to move around so this is not likely to
> help much.  I tend to fall back to kdb, and the "kdb death script" (a

They move? Is it that the whole memory bank is busted? Could the
second reboot mark the DIMM as unsavory and not use it?
> toy of mine that is not in the mainline), where you can assign an
> action to output all the commands you would have otherwise typed, and
> then reboot.
> Jason.

