[Ksummit-2008-discuss] Tracking of regressions and suspend/resume of devices

Rafael J. Wysocki rjw at sisk.pl
Wed Jun 4 12:12:53 PDT 2008


On Monday, 2 of June 2008, Grant Grundler wrote:
> On Sat, May 31, 2008 at 7:25 PM, Arjan van de Ven <arjan at linux.intel.com> wrote:
> > Benjamin Herrenschmidt wrote:
> >> On Fri, 2008-05-30 at 13:21 +0200, Rafael J. Wysocki wrote:
> >>> Second, since patches that introduce a new framework for suspend/resume of
> >>> devices are targeted at 2.6.27, I'd like to discuss that too, because the next
> >>> step will be to adapt drivers to the new framework which is an enormous task
> >>> and many people are likely to be involved.  For this purpose we'll need some
> >>> official guidance in writing device drivers' suspend and resume callbacks,
> >>> among other things, and IMO it's important to reach an agreement on how to do
> >>> these things.
> 
> Suspend/resume would be an excellent BOF/hack session.
> 
> >> After last KS, I started writing some kind of device driver writer's guide
> >> to power management, though unfortunately I didn't finish it. Now with the
> >> APIs changing in a significant way, I doubt I can reuse much of that, but
> >> I believe it's still a good idea and we should maybe have a small group
> >> BOF'ing to try to put together something here.
> 
> Even if the APIs are changing, *anything* would be a good starting point
> since nothing exists right now.
> 
> >> Writing good documentation isn't trivial, and I discovered by
> >> experience that explaining how PM works and what drivers have to do is
> >> even less so. So it makes sense to work in a small group to put together
> >> the structure of the guide before we start filling it in. Unless we do that
> >> beforehand on IRC :-)
> >>
> >
> > documentation only goes so far...
> 
> But without documentation, I have no clue what sort of things
> tulip_suspend/resume should be doing when it's broken.
> See: http://bugzilla.kernel.org/show_bug.cgi?id=8952
> 
> This bug would have been resolved a lot faster if I had something like
> Documentation/PCI/pci.txt but for suspend/resume. Would it make more
> sense to add suspend/resume documentation to each subsystem,
> e.g. add something specific to NIC, SCSI, USB, block, graphics, etc.?
> 
> 
> > We can do other things to make suspend/resume more robust in drivers.
> 
> Creating a test harness is an excellent idea. More of the kernel could do
> with a test harness and/or error injection. Sounds like another
> BOF/Hacking topic.
> 
> But I expect many NIC drivers to fail because the relationship between
> init_one/remove_one and suspend/resume isn't clear from staring at code.
> And the NIC drivers are pretty inconsistent on how they implement
> suspend/resume.
> 
> > Example: For NIC drivers, we could have a library helper function or something
> > that calls the driver's suspend method on "ifconfig down", and resume on going back up.
> > The need for a library function comes from the unfortunate corner case that you
> > can only do this after, say, 10 seconds, to keep stupid DHCP clients from bouncing
> > the physical link all the time.
> > This would make sure the suspend/resume methods get tested a lot, and in addition,
> > there is a ton of overlap right now between "down" and "suspend", since both do
> > pretty much equivalent power management steps.
> 
> Exactly. This is why I am asking whoever is "engineering" the change
> for 2.6.27 to provide some basic documentation. I'm happy to be a
> reviewer/editor for that document if given something to start with.

Well, I think that the documentation should be provided along with some
implementation examples, but I'd like the people who actually write
drivers to participate in creating them.  So, the idea is to choose one or a
couple of drivers from each distinct category, implement the new callbacks for
them and provide that code as examples, but I'll need some help.

Thanks,
Rafael

