[linux-pm] calling runtime PM from system PM methods

Fri Jun 10 16:14:36 PDT 2011

"Rafael J. Wysocki" <rjw at sisk.pl> writes:

[...]

> Whether or not user space has disabled runtime PM _doesn't_ _matter_ for
> system suspend, because _you_ _can't_ call pm_runtime_suspend(), or
> pm_runtime_put_sync(), from a driver's .suspend() callback _anyway_.
> The reason is that doing that would cause the subsystem's (or power
> domain's in this case) .runtime_suspend() callback to be invoked and
> that's incorrect.  Namely, it would require the subsystem (power domain)
> to expect that its .runtime_suspend() would always be executed indirectly
> as a result of calling its .suspend() (through the driver's callback)
> and that expectation may or may not be met (depending on the driver's
> design).

So here's an interesting scenario which I think triggers the same
problem as you highlight above.

Assume you have a driver that's using runtime PM on a per-xfer basis.
Before each xfer, it does a pm_runtime_get_sync(), after each xfer it
does a pm_runtime_put_sync() (for this example, it's important that it's
a _put_sync()).  The _put_sync() might happen in an ISR, or possibly in
a thread waiting on a completion which is awoken by the ISR, etc. etc.
(The runtime PM callbacks are IRQ-safe, and the device is marked as such.)

The driver is in the middle of an xfer and a system suspend request
happens.

The driver's ->suspend() callback happens, and the driver

- enables/disables wakeups based on device_may_wakeup()
- prevents future xfers
- waits for current xfer to finish

As soon as the xfer finishes, the driver gets notified (completion,
callback, IRQ, whatever) and calls pm_runtime_put_sync(), which triggers
subsys->runtime_suspend --> driver->runtime_suspend.
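
To make the sequence concrete, here is a minimal user-space sketch of it.
This is a toy model and not kernel code: struct toy_dev and every toy_*
function are hypothetical stand-ins, the counter only imitates
dev->power.usage_count, and the "wait for the current xfer" step is
modeled by running the completion path inline.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's runtime PM core. */
struct toy_dev {
    int usage_count;             /* imitates dev->power.usage_count */
    bool in_system_suspend;      /* true while ->suspend() is running */
    bool runtime_suspended;
    int nested_runtime_suspends; /* runtime_suspend calls made inside ->suspend() */
};

/* Stands in for subsys->runtime_suspend --> driver->runtime_suspend. */
static void toy_runtime_suspend(struct toy_dev *d)
{
    if (d->in_system_suspend)
        d->nested_runtime_suspends++;
    d->runtime_suspended = true;
}

/* Stands in for pm_runtime_get_sync() at the start of an xfer. */
static void toy_get_sync(struct toy_dev *d)
{
    d->usage_count++;
    d->runtime_suspended = false;
}

/* Stands in for pm_runtime_put_sync() when the xfer completes. */
static void toy_put_sync(struct toy_dev *d)
{
    assert(d->usage_count > 0);
    if (--d->usage_count == 0)
        toy_runtime_suspend(d);
}

/* Stands in for the driver's ->suspend(): new xfers are blocked and the
 * in-flight xfer is waited for, which runs its completion path. */
static void toy_system_suspend(struct toy_dev *d)
{
    d->in_system_suspend = true;
    toy_put_sync(d);             /* completion drops the last reference */
    d->in_system_suspend = false;
}

/* An xfer is in flight when the system suspend request arrives.
 * Returns how many runtime_suspend calls were nested inside ->suspend(). */
int toy_run_scenario(void)
{
    struct toy_dev d = {0};

    toy_get_sync(&d);
    toy_system_suspend(&d);
    return d.nested_runtime_suspends;
}
```

In this model toy_run_scenario() reports one nested call, even though
toy_system_suspend() is only waiting for the xfer and never sets out to
runtime-suspend the device.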

While the driver's ->suspend() callback doesn't directly call
pm_runtime_put_sync(), the act of waiting for the xfer to finish
causes the subsystem/driver->runtime_suspend callbacks to be called
during the subsystem/driver->suspend callback, which is the same problem
as you highlight above.

Based on your commit that removed incrementing the usage count across
suspend[1], you mentioned "we can rely on subsystems and device drivers
to avoid doing that unnecessarily."  The above example shows that this
type of thing might not be that obvious to detect and thus avoid.

I suspect the solution to the above will be to add back the usage count
increment across system suspend, but I'm hoping not.  IMO, it would be
more flexible to allow the subsystems to decide.  The subsystems could
provide locking (or manage dev->power.usage_count) themselves if
necessary.  For example, the subsystem's ->prepare() could call
pm_runtime_get_noresume() if it wants to avoid the "nesting" of
callbacks.
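
A rough user-space sketch of that idea follows.  Again this is a toy
model under stated assumptions, not kernel code: the toy_* names are
hypothetical, toy_get_noresume() only imitates what
pm_runtime_get_noresume() does to the usage count, and dropping the
extra reference in a ->complete() step is my assumption about where a
subsystem would choose to release it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's runtime PM core. */
struct toy_dev {
    int usage_count;             /* imitates dev->power.usage_count */
    bool in_system_suspend;      /* true while ->suspend() is running */
    int nested_runtime_suspends; /* runtime_suspend calls made inside ->suspend() */
};

/* Imitates pm_runtime_get_noresume(): bump the count, touch nothing else. */
static void toy_get_noresume(struct toy_dev *d)
{
    d->usage_count++;
}

/* Imitates pm_runtime_put_sync(): suspend when the count reaches zero. */
static void toy_put_sync(struct toy_dev *d)
{
    assert(d->usage_count > 0);
    if (--d->usage_count == 0 && d->in_system_suspend)
        d->nested_runtime_suspends++;
}

/* Runs the mid-xfer suspend scenario.  If hold_ref is true, the
 * subsystem holds an extra reference from ->prepare() to ->complete().
 * Returns how many runtime_suspend calls were nested inside ->suspend(). */
int toy_run_with_prepare(bool hold_ref)
{
    struct toy_dev d = {0};

    d.usage_count = 1;           /* an xfer is in flight */

    if (hold_ref)
        toy_get_noresume(&d);    /* subsystem ->prepare() */

    d.in_system_suspend = true;  /* driver ->suspend() starts */
    toy_put_sync(&d);            /* xfer completes while we wait */
    d.in_system_suspend = false; /* driver ->suspend() returns */

    if (hold_ref)
        toy_put_sync(&d);        /* subsystem ->complete() */

    return d.nested_runtime_suspends;
}
```

With hold_ref set, the put at xfer completion leaves the count at one,
so nothing is nested; without it the model reproduces the nesting.  A
subsystem that does not mind the nesting simply never takes the
reference.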

A related question: does the pm_wq need to be freezable?  From
Documentation/power/runtime_pm.txt:

* The power management workqueue pm_wq in which bus types and device drivers can
  put their PM-related work items.  It is strongly recommended that pm_wq be
  used for queuing all work items related to run-time PM, because this allows
  them to be synchronized with system-wide power transitions (suspend to RAM,
  hibernation and resume from system sleep states).  pm_wq is declared in
  include/linux/pm_runtime.h and defined in kernel/power/main.c.

Is "synchronized with system-wide power transitions" correct here?
Rather than synchronize, using a freezable workqueue actually _prevents_
runtime PM events (at least async ones.)
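
The distinction I mean can be shown with a small user-space sketch.  It
is only a toy: struct toy_wq and the toy_* functions are hypothetical,
an unfrozen queue is assumed to run work immediately (a real workqueue
hands it to a worker thread), and the "sync" path stands for any call
that runs in the caller's context instead of going through pm_wq.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a freezable queue: hypothetical, not the kernel workqueue API. */
struct toy_wq {
    bool frozen;
    int pending;     /* async items queued but not yet run */
    int async_done;  /* async items that have run */
    int sync_done;   /* sync calls that ran in the caller's context */
};

struct toy_result {
    int async_while_frozen;
    int sync_while_frozen;
    int async_after_thaw;
};

static void toy_run_pending(struct toy_wq *wq)
{
    wq->async_done += wq->pending;
    wq->pending = 0;
}

/* Async request: goes through the queue, so it is held while frozen. */
static void toy_queue_async(struct toy_wq *wq)
{
    wq->pending++;
    if (!wq->frozen)
        toy_run_pending(wq);
}

/* Sync request: bypasses the queue entirely. */
static void toy_call_sync(struct toy_wq *wq)
{
    wq->sync_done++;
}

static void toy_thaw(struct toy_wq *wq)
{
    assert(wq->frozen);
    wq->frozen = false;
    toy_run_pending(wq);
}

/* Issues one async and one sync request while frozen, then thaws. */
struct toy_result toy_freeze_demo(void)
{
    struct toy_wq wq = {0};
    struct toy_result r;

    wq.frozen = true;            /* system suspend froze the queue */
    toy_queue_async(&wq);
    toy_call_sync(&wq);
    r.async_while_frozen = wq.async_done;
    r.sync_while_frozen = wq.sync_done;

    toy_thaw(&wq);               /* resume: held work finally runs */
    r.async_after_thaw = wq.async_done;
    return r;
}
```

In the model the async item does not run until the thaw, while the sync
call goes straight through during the frozen window.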

Again, proper locking (or management of dev->power.usage_count) at the
subsystem level would get you the same effect, but still leave
flexibility to the subsystem/pwr_domain layer.

Kevin

P.S. the commit below[1] removed the usage count increment/decrement
     across system suspend/resume, but Documentation/power/runtime_pm.txt
     still refers to it.  Patch below[2] removes it, assuming you're
     not planning on adding it back.  ;)

[1]
commit e8665002477f0278f84f898145b1f141ba26ee26
Author: Rafael J. Wysocki <rjw at sisk.pl>
Date:   Sat Feb 12 01:42:41 2011 +0100

    PM: Allow pm_runtime_suspend() to succeed during system suspend

    The dpm_prepare() function increments the runtime PM reference
    counters of all devices to prevent pm_runtime_suspend() from
    executing subsystem-level callbacks.  However, this was supposed to
    guard against a specific race condition that cannot happen, because
    the power management workqueue is freezable, so pm_runtime_suspend()
    can only be called synchronously during system suspend and we can
    rely on subsystems and device drivers to avoid doing that
    unnecessarily.

    Make dpm_prepare() drop the runtime PM reference to each device
    after making sure that runtime resume is not pending for it.

    Signed-off-by: Rafael J. Wysocki <rjw at sisk.pl>
    Acked-by: Kevin Hilman <khilman at ti.com>

[2]