[Ksummit-2008-discuss] Tracking of regressions and suspend/resume of devices

Grant Grundler grundler at google.com
Mon Jun 2 09:57:46 PDT 2008


On Sat, May 31, 2008 at 7:25 PM, Arjan van de Ven <arjan at linux.intel.com> wrote:
> Benjamin Herrenschmidt wrote:
>> On Fri, 2008-05-30 at 13:21 +0200, Rafael J. Wysocki wrote:
>>> Second, since patches that introduce a new framework for suspend/resume of
>>> devices are targeted at 2.6.27, I'd like to discuss that too, because the next
>>> step will be to adapt drivers to the new framework which is an enormous task
>>> and many people are likely to be involved.  For this purpose we'll need some
>>> official guidance in writing device drivers' suspend and resume callbacks,
>>> among other things, and IMO it's important to reach an agreement on how to do
>>> these things.

Suspend/resume would be an excellent BOF/hack session.

>> After last KS, I start writing some kind of device driver writer guide
>> to power management, though unfortunately didn't finish it. Now with the
>> APIs changing in significant way, I doubt I can re-use much of that but
>> I believe it's still a good idea and we should maybe have a small group
>> BOF'ing to try to put together something here.

Even if the APIs are changing, *anything* would be a good starting point
since nothing exists right now.

>> Writing a good documentation isn't trivial, and I discovered by
>> experience that explaining how PM works and what drivers have to do is
>> even less. So it makes sense to work in a small group to put together
>> the structure of the guide before we start filling it. Unless we do that
>> before hand on IRC :-)
>>
>
> documentation only goes so far...

But without documentation, I have no clue what sort of things
tulip_suspend/resume should be doing when it's broken.
See: http://bugzilla.kernel.org/show_bug.cgi?id=8952

This bug would have been resolved a lot faster if I had something like
Documentation/PCI/pci.txt but for suspend/resume. Would it make more
sense to add suspend/resume documentation to each subsystem?
e.g. add something specific to NIC, SCSI, USB, block, graphics, etc.


> We can do other things to make suspend/resume more robust in drivers.

Creating a test harness is an excellent idea. More of the kernel could do
with a test harness and/or error injection. Sounds like another
BOF/Hacking topic.

But I expect many NIC drivers to fail because the relationship between
init_one/remove_one and suspend/resume isn't clear from staring at the code.
And the NIC drivers are pretty inconsistent in how they implement
suspend/resume.

> Example: For NIC drivers, we could have a library helper function or something
> that calls the drivers suspend method on "ifconfig down", and resume on going back up.
> The need for a library function comes from the unfortunate corner case that you
> can only do this after, say, 10 seconds, to avoid stupid dhcp clients from bouncing
> the physical link all the time.
> This would make sure the suspend/resume methods get tested a lot, and in addition,
> there is a ton of overlap right now between "down" and "suspend", since both do
> pretty much equivalent power management steps.

Exactly. This is why I am asking whoever is "engineering" the change
for 2.6.27 to provide some basic documentation. I'm happy to be a
reviewer/editor for that document if given something to start with.
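
To make the helper idea quoted above a bit more concrete, here is a rough
user-space sketch of the "suspend on down, but only after a grace period"
logic. It is not kernel code and not an existing API: every name here
(nic_sim, nic_sim_down, nic_sim_tick, nic_sim_up, NIC_DOWN_GRACE_SECS) is
made up for illustration, and the 10 second value is just the "say, 10
seconds" figure from the quoted text.

```c
#include <stdbool.h>

/* Hypothetical grace period, taken from the "say, 10 seconds" above. */
#define NIC_DOWN_GRACE_SECS 10

/* Hypothetical stand-in for a NIC and its power state. */
struct nic_sim {
    bool up;            /* interface is administratively up */
    bool suspended;     /* the driver's suspend method has run */
    long down_since;    /* time of the last "ifconfig down", in seconds */
    int suspend_calls;  /* how often the suspend method was called */
    int resume_calls;   /* how often the resume method was called */
};

/* Stand-in for the driver's real suspend method. */
static void nic_sim_suspend(struct nic_sim *nic)
{
    nic->suspended = true;
    nic->suspend_calls++;
}

/* Stand-in for the driver's real resume method. */
static void nic_sim_resume(struct nic_sim *nic)
{
    nic->suspended = false;
    nic->resume_calls++;
}

/* "ifconfig down": only record the time, do not suspend yet. */
void nic_sim_down(struct nic_sim *nic, long now)
{
    nic->up = false;
    nic->down_since = now;
}

/* Periodic check: suspend once the interface has stayed down long enough. */
void nic_sim_tick(struct nic_sim *nic, long now)
{
    if (!nic->up && !nic->suspended &&
        now - nic->down_since >= NIC_DOWN_GRACE_SECS)
        nic_sim_suspend(nic);
}

/* "ifconfig up": resume only if the grace period had actually expired. */
void nic_sim_up(struct nic_sim *nic)
{
    if (nic->suspended)
        nic_sim_resume(nic);
    nic->up = true;
}
```

With this shape, a dhcp client that bounces the link within the grace period
never triggers suspend/resume at all, while an interface left down for longer
goes through the same suspend and resume methods that a system suspend would
use.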

thanks,
grant

