[linux-pm] [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated

Sun Mar 9 12:56:53 PDT 2008

On Sun, 9 Mar 2008, Rafael J. Wysocki wrote:

> > Didn't we decide it would be better to call @prepare in a separate
> > pass?
> 
> Afterwards I realized that it would be racy.  Namely, once a new device is
> registered, it may require ->prepare() to be called before ->suspend(), so if
> that happens after we've called ->prepare() for all devices, we'll be in
> trouble.

There's nothing wrong with doing it this way.  However the race you're 
worried about will never happen if drivers are written correctly.  Once 
prepare has been called for all devices, no new devices should be 
registered.

> > In any case, drivers aren't in a position to detect concurrent 
> > registrations.  The PM core shouldn't require them to return -EAGAIN; 
> > it can detect these registrations by itself.
> 
> In the appended new version of the patch, the core does it.  However, in theory
> there may be situations in which the core can't detect a race condition and
> recover from it, so I think it's reasonable to leave the possibility to signal
> such a condition by the driver.

I don't mind taking special action when drivers return -EAGAIN.  The 
point is that drivers shouldn't be _required_ to do it.

> > Finally, there needs to be a new "prepared" field added to dev_pm_info,
> > right next to the "sleeping" field.  Mention that it will be set prior
> > to calling @prepare and cleared prior to calling @complete.  Even
> > though it is read-only to drivers, they will find it useful.
> 
> Yes, I noticed that after sending the patch.  I handled it in a slightly
> different way, by converting 'sleeping' into 'status' that may assum one of
> the three values: DPM_ON, DPM_SUSPENDING, DPM_OFF.

Okay, that should work just as well.

> + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> + *	its hardware state.  Prevent new children of the device from being
> + *	registered and prevent new calls to the probe method from being made
> + *	after @prepare() returns.  Also, if a concurrent child registration
> + *	or a call to the probe method already in progress is detected by
> + *	@prepare(), it may return -EAGAIN, so that the PM core can execute it
> + *	once again (e.g. after suspending the newly registered child) to recover
> + *	from the race condition. This method is executed for all kinds of
> + *	suspend transitions and is immediately followed by one of the suspend
> + *	callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
> + *	[NOTE: The PM core is able to detect a registration of a new child of a
> + *	device concurrent with this device's @prepare() callback, in which case
> + *	@complete() will be called for the device and the core will try again to
> + *	suspend it later.  However, the core is NOT able to handle a call to
> + *	the probe method concurrent with @prepare().]

I think you can leave out this NOTE.  Direct probe method calls by 
drivers are pretty rare; almost all probing is done by the driver core.

> + * @complete: Undo the changes made by @prepare().  This method is executed for
> + *	all kinds of resume transitions, immediately following one of the resume
> + *	callbacks: @resume(), @thaw(), @restore(), or @recover().  Also executed
> + *	if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
> + *	immediately following a successful @prepare() fails OR if @prepare()
> + *	returns -EAGAIN.

The last part is wrong.  If @prepare returns an error (such as -EAGAIN) 
then @complete shouldn't be called.  What you mean is that @complete 
will be invoked immediately following a successful @prepare if the PM 
core detects that a new child was registered while @prepare was 
running.

> +enum dpm_state {
> +	DPM_ON,
> +	DPM_SUSPENDING,
> +	DPM_OFF,
> +};

There should be kerneldoc for dpm_state.  Drivers may very well want to 
check dev->power.status to see whether a new child registration should 
be blocked.  The documentation should be very clear that 
dev->power.status doesn't describe the device's actual power state; 
rather it describes which method calls the PM core has made or is 
about to make.

> @@ -71,22 +71,29 @@ int device_pm_add(struct device *dev)
>  		 dev->bus ? dev->bus->name : "No Bus",
>  		 kobject_name(&dev->kobj));
>  	mutex_lock(&dpm_list_mtx);
> -	if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
> -		if (dev->parent->power.sleeping)
> -			dev_warn(dev,
> -				"parent %s is sleeping, will not add\n",
> +	if (all_sleeping) {
> +		dev_warn(dev, "devices are sleeping, will not add\n");
> +		goto Failure;
> +	}
> +	if (dev->parent) {
> +		if (dev->parent->power.status == DPM_OFF) {
> +			dev_warn(dev, "parent %s is sleeping, will not add\n",
>  				dev->parent->bus_id);
> -		else
> -			dev_warn(dev, "devices are sleeping, will not add\n");
> -		WARN_ON(true);
> -		error = -EBUSY;
> -	} else {
> -		error = dpm_sysfs_add(dev);
> -		if (!error)
> -			list_add_tail(&dev->power.entry, &dpm_active);
> +			goto Failure;
> +		} else if (dev->parent->power.status == DPM_SUSPENDING) {
> +			dev->parent->power.status == DPM_ON;
> +		}

This part puzzled me until I read the rest of the patch.  It's a good
way to look specifically for new _child_ registrations as opposed to
any new registrations.

> @@ -418,22 +629,37 @@ static int dpm_suspend(pm_message_t stat
>  		struct list_head *entry = dpm_active.prev;
>  		struct device *dev = to_device(entry);
>  
> -		WARN_ON(dev->parent && dev->parent->power.sleeping);
> +		WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF);
> +		dev->power.status = DPM_SUSPENDING;
> +		mutex_unlock(&dpm_list_mtx);
> +
> +		error = prepare_device(dev, state);
> +
> +		mutex_lock(&dpm_list_mtx);
> +		if (error == -EAGAIN)
> +			dev->power.status = DPM_ON;
> +		else if(error)
> +			break;

The last four lines should be:

		if (error) {
			dev->power.status = DPM_ON;
			if (error == -EAGAIN)
				continue;
			break;
		}

Insert:		/* Was a child added during prepare_device()? */
> +		if (dev->power.status == DPM_ON) {
> +			mutex_unlock(&dpm_list_mtx);
> +
> +			complete_device(dev, resume_event(state));
>  
> -		dev->power.sleeping = true;
> +			mutex_lock(&dpm_list_mtx);
> +			continue;
> +		}
> +		dev->power.status = DPM_OFF;
>  		mutex_unlock(&dpm_list_mtx);
> +
>  		error = suspend_device(dev, state);
> +
>  		mutex_lock(&dpm_list_mtx);
>  		if (error) {
> -			printk(KERN_ERR "Could not suspend device %s: "
> -					"error %d%s\n",
> -					kobject_name(&dev->kobj),
> -					error,
> -					(error == -EAGAIN ?
> -					" (please convert to suspend_late)" :
> -					""));
> -			dev->power.sleeping = false;
> -			break;
> +			dev->power.status = DPM_ON;
> +			mutex_unlock(&dpm_list_mtx);
> +
> +			complete_device(dev, resume_event(state));
> +			return error;

Early return?  I guess it's okay...

>  		}
>  		if (!list_empty(&dev->power.entry))
>  			list_move(&dev->power.entry, &dpm_off);

Looks good.

Alan Stern