[linux-pm] [RFC PATCH v2 1/3] PM / Core: suspend_again callback for device PM.

MyungJoo Ham myungjoo.ham at samsung.com
Mon Apr 25 18:31:09 PDT 2011

A system or a device may need to control suspend/wakeup events. It may
want to wakeup the system after a predefined amount of time or at a
predefined event decided while entering suspend for polling or delayed
work. Then, it may want to enter suspend again if its predefined wakeup
condition is the only wakeup reason and there is no outstanding events;
thus, it does not wakeup the userspace unnecessary and keeps suspended
as long as possible (saving the power).

Enabling a system to wakeup after a specified time can be easily
achieved by using RTC. However, to enter suspend again immediately
without invoking userland, we need additional features in the
suspend framework.

Such need comes from:

1. Monitoring a critical device status without interrupts that can
wakeup the system. (in-suspend polling)
 An example is ambient temperature monitoring that needs to shut down
the system or a specific device function if it is too hot or cold. The
temperature of a specific device may be needed to be monitored as well;
e.g., a charger monitors battery temperature in order to stop charging
if overheated.

2. Execute critical "delayed work" at suspend.
 A driver or a system/board may have a delayed work (or any similar
things) that it wants to execute at the requested time.
 For example, some chargers want to check the battery voltage some
time (e.g., 30 seconds) after the battery is fully charged and the
charger has stopped. Then, the charger restarts charging if the voltage has
dropped more than a threshold, which is smaller than "restart-charger"
voltage, which is a threshold to restart charging regardless of the
time passed.

This patch allows a device to provide "suspend_again" callback with
struct dev_pm_ops in driver. With suspend_again callbacks registered
along with supporting subsystems, the suspend framework
(kernel/power/suspend.c) tries to enter suspend again if conditions are met.
In general, in order to support suspend-again, subsystems simply need to
return the value returned by the suspend_again of device as the
pm_generic_suspend_again does.

The system enters the suspend again if and only if all of the following
conditions are met:
1. None of suspend_again ops returned "I want to stop suspend" by
returning a negative number.
2. At least one of suspend_again ops returned "I want to suspend again"
by returning a positive number.

suspend_again ops may return "I do not care. This wakeup is not related
with me." by providing zero.

suspend_again ops may return a negative number in order to override
other devices' "suspend again" request and to wakeup fully. Devices
that poll sensors during suspend may need this if any outstanding status
that requires to notify the userland is found. Conventional suspend
wakeup sources may also need this to override SUSPEND_AGAIN devices.
(we may need to lookup every IRQ registered with set_irq_wake and
register IRQS that are related with suspend_ops separatedly. this is
not implemented yet anyway and discusses below with item 3.)

Anyway, the following features may need to be added later:

1. An API to allow devices to express next desired wakeup-time. Then,
the framework will combine them and setup RTC alarm accordingly and
save/restore previously registered RTC alarms.
=> For this, rtc_timer_init/start/cancel might work for any rtc that
has wakeup enabled. We need to express whether the rtc_timer is going to
be used for suspend_again so that it would not fully wakeup the system.
=> Then, we need a bogus platform-device of rtc to see if the
current wakeup event is whether a) rtc-induced and b) suspend-again
related rtc event. This may be implemented at drivers/rtc/interface.c?

2. Create a method to declare a specific instance of delayed-work is to
be executed in suspend by waking up the system in the middle of
suspend. Then, let the framework handle those "critical" delayed-work
in suspend.
=> If the workitem 1 with identification method is done, this will
probably be an easy feature to add. (virtually already implemented with 1.)

3. If a device says "suspend again, >0" and there is another wakeup
source pending (e.g., power button) without suspend_again ops, the
system will enter suspend again. In such a case, the system should not
suspend again. We may need to see if irqs that are enabled by
set_irq_wake() (and not related to suspend_ops devices)
are pending at syscore_suspend_again(). Maybe we need to add
something like "set_irq_wake_with_suspend_again" so that IRQs with
suspend_again ops implemented are ignored for the
"override-suspend-again-continue" checking.
=> For wakeup-enabled IRQs, such behavior of fully-waking up the system
(by returning a negative number at suspend_again) should be default and
should not require any modification; the change should be transparent to
the conventional wakeup sources. Thus, we should modify IRQ core to
do this by default and add an option not to return < 0 optionally
(selected by suspend-again users). The implementation of 1 and 2 should
include this in their code as well.

In this patch, the suspend-again loops at "suspend_devices_and_enter"
because at that point,
1. every device is resume and are ready to be used.
2. userspace is still not invoked (transparent to it)
3. console is accessible as it is outside of suspend/resume_console.
4. it is outside of trace_machine_suspend(); thus, we can have seperated
traces for each instance of suspend_again.

For the RFC release, I have set the point of "suspend-again" after
suspend_ops->end(). However, I'm not so sure about where to set the
suspend-again point. Because in-suspend polling, which may require
I/O with other devices, is supposed to be executed at suspend-again ops,
the suspend-again point is configured to be as late as possible in
suspend_devices_and_enter(). In order to reduce the number of devices
waked up, we may need to set the suspend-again point ealier.

Signed-off-by: MyungJoo Ham <myungjoo.ham at samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park at samsung.com>

Changes from v1:
	- moved from syscore to dev_pm_ops
	- added generic ops for subsystems.
	- cleaned up suspend_again code at kernel/power/suspend.
 drivers/base/power/generic_ops.c |   17 ++++++++++
 drivers/base/power/main.c        |   61 ++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h               |   17 ++++++++++
 kernel/power/suspend.c           |    4 ++-
 4 files changed, 98 insertions(+), 1 deletions(-)

diff --git a/drivers/base/power/generic_ops.c b/drivers/base/power/generic_ops.c
index 42f97f9..0976452 100644
--- a/drivers/base/power/generic_ops.c
+++ b/drivers/base/power/generic_ops.c
@@ -213,6 +213,22 @@ int pm_generic_restore(struct device *dev)
 	return __pm_generic_resume(dev, PM_EVENT_RESTORE);
+ * pm_generic_suspend_again - Generic suspend-again callback for subsystems.
+ * @dev: Device to check for suspend-again request.
+ */
+int pm_generic_suspend_again(struct device *dev)
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	if (!pm || !pm->suspend_again)
+		return 0;
+	return pm->suspend_again(dev);
 #endif /* CONFIG_PM_SLEEP */
 struct dev_pm_ops generic_subsys_pm_ops = {
@@ -223,6 +239,7 @@ struct dev_pm_ops generic_subsys_pm_ops = {
 	.thaw = pm_generic_thaw,
 	.poweroff = pm_generic_poweroff,
 	.restore = pm_generic_restore,
+	.suspend_again = pm_generic_suspend_again,
 	.runtime_suspend = pm_generic_runtime_suspend,
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 052dc53..a5b659f 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1082,3 +1082,64 @@ int device_pm_wait_for_dev(struct device *subordinate, struct device *dev)
 	return async_error;
+ * dpm_suspend_again - Let devices poll or run code while the system is kept
+ *	suspended in the view of userland.
+ *
+ * Execute all "suspend_again" callbacks of devices and return non-zero value
+ * if all of the following conditions are met:
+ * 1) There is a device that wants to suspend again (returned > 0)
+ * 2) There is no device that wants to wake up (returned < 0)
+ */
+int dpm_suspend_again(void)
+	int suspend_again = 0;
+	int wakeup = 0;
+	struct device *dev;
+	mutex_lock(&dpm_list_mtx);
+	list_for_each_entry(dev, &dpm_list, power.entry) {
+		const struct dev_pm_ops *ops = NULL;
+		int result = 0;
+		get_device(dev);
+		mutex_unlock(&dpm_list_mtx);
+		device_lock(dev);
+		if (dev->type && dev->type->pm) {
+			ops = dev->type->pm;
+			dev_dbg(dev, "type suspend_again ");
+		} else if (dev->class && dev->class->pm) {
+			ops = dev->class->pm;
+			dev_dbg(dev, "class suspend_again ");
+		} else if (dev->bus && dev->bus->pm) {
+			ops = dev->bus->pm;
+			dev_dbg(dev, "suspend_again ");
+		}
+		if (ops && ops->suspend_again)
+			result = ops->suspend_again(dev);
+		device_unlock(dev);
+		mutex_lock(&dpm_list_mtx);
+		put_device(dev);
+		if (result > 0)
+			suspend_again++;
+		if (result < 0)
+			wakeup++;
+	}
+	mutex_unlock(&dpm_list_mtx);
+	pr_debug("%d devices wants to suspend again. "
+		 "%d devices wants to wakeup.\n", suspend_again, wakeup);
+	if (suspend_again && !wakeup) {
+		pr_info("Suspending Again.\n");
+		return 1;
+	}
+	pr_info("No devices wants to suspend again or a device wants to wakeup.\n");
+	return 0;
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 512e091..d3a9c67 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -145,6 +145,15 @@ typedef struct pm_message {
  *	actions required for resuming the device that need interrupts to be
  *	disabled
+ * @suspend_again: Tell the system whether the device wants to either
+ *	suspend again (return > 0), wake up (return < 0), or not-care
+ *	(return = 0). If a device wants to poll sensors or execute some code
+ *	during suspended, suspend_again callback is the place assuming that
+ *	periodic-wakeup or alarm-wakeup is already setup. Note that if a
+ *	device returns "wakeup" with an under-zero value, it overrides
+ *	other devices' "suspend again" return values. This allows to
+ *	execute some codes while being kept suspended in the view of userland.
+ *
  * @freeze_noirq: Complete the operations of ->freeze() by carrying out any
  *	actions required for freezing the device that need interrupts to be
  *	disabled
@@ -212,6 +221,7 @@ struct dev_pm_ops {
 	int (*restore)(struct device *dev);
 	int (*suspend_noirq)(struct device *dev);
 	int (*resume_noirq)(struct device *dev);
+	int (*suspend_again)(struct device *dev);
 	int (*freeze_noirq)(struct device *dev);
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
@@ -544,6 +554,7 @@ extern void dpm_resume_end(pm_message_t state);
 extern void device_pm_unlock(void);
 extern int dpm_suspend_noirq(pm_message_t state);
 extern int dpm_suspend_start(pm_message_t state);
+extern int dpm_suspend_again(void);
 extern void __suspend_report_result(const char *function, void *fn, int ret);
@@ -563,6 +574,11 @@ static inline int dpm_suspend_start(pm_message_t state)
 	return 0;
+static int dpm_suspend_again(void)
+	return 0;
 #define suspend_report_result(fn, ret)		do {} while (0)
 static inline int device_pm_wait_for_dev(struct device *a, struct device *b)
@@ -585,5 +601,6 @@ extern int pm_generic_freeze(struct device *dev);
 extern int pm_generic_thaw(struct device *dev);
 extern int pm_generic_restore(struct device *dev);
 extern int pm_generic_poweroff(struct device *dev);
+extern int pm_generic_suspend_again(struct device *dev);
 #endif /* _LINUX_PM_H */
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 08515b4..b5a84a4 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -291,7 +291,9 @@ int enter_state(suspend_state_t state)
 		goto Finish;
 	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
-	error = suspend_devices_and_enter(state);
+	do {
+		error = suspend_devices_and_enter(state);
+	} while (!error && dpm_suspend_again());
 	pr_debug("PM: Finishing wakeup.\n");

More information about the linux-pm mailing list