[PATCH 5/6] c/r: correctly restore pgid

Oren Laadan orenl at librato.com
Tue Sep 8 09:40:46 PDT 2009



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl at librato.com):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl at librato.com):
>>>> The main challenge with restoring the pgid of tasks is that the
>>>> original "owner" (the process with that pid) might have exited
>>>> already. I call these "ghost" pgids. 'mktree' does create these
>>>> processes, but they then exit without participating in the restart.
>>>>
>>>> To solve this, this patch introduces a RESTART_GHOST flag, used for
>>>> "ghost" owners that are created only to pass their pgid to other
>>>> tasks. ('mktree' now makes them call restart(2) instead of exiting).
>>>>
>>>> When a "ghost" task calls restart(2), it will be placed on a wait
>>>> queue until the restart completes and then exit. This guarantees that
>>>> the pgid that it owns remains available for all (regular) restarting
>>>> tasks for when they need it.
>>>>
>>>> Regular tasks perform the restart as before, except that they also
>>>> now restore their old pgrp, which is guaranteed to exist.
>>>>
>>>> Changelog [v1]:
>>>>   - Verify that pgid owner is a thread-group-leader.
>>>>   - Handle the case of pgid/sid == 0 using root's parent pid-ns
>>>>
>>>> Signed-off-by: Oren Laadan <orenl at cs.columbia.edu>
>>>> ---
>>>>  checkpoint/process.c             |  106 ++++++++++++++++++++++++-
>>>>  checkpoint/restart.c             |  158 ++++++++++++++++++++++++++------------
>>>>  checkpoint/sys.c                 |    3 +-
>>>>  include/linux/checkpoint.h       |   11 ++-
>>>>  include/linux/checkpoint_hdr.h   |    3 +
>>>>  include/linux/checkpoint_types.h |    6 +-
>>>>  6 files changed, 230 insertions(+), 57 deletions(-)
>>>>
>>>> diff --git a/checkpoint/process.c b/checkpoint/process.c
>>>> index 40b2580..5d6bdb9 100644
>>>> --- a/checkpoint/process.c
>>>> +++ b/checkpoint/process.c
>>>> @@ -23,6 +23,57 @@
>>>>  #include <linux/syscalls.h>
>>>>
>>>>
>>>> +pid_t ckpt_pid_nr(struct ckpt_ctx *ctx, struct pid *pid)
>>>> +{
>>>> +	return pid ? pid_nr_ns(pid, ctx->root_nsproxy->pid_ns) : CKPT_PID_NULL;
>>>> +}
>>>> +
>>>> +/* must be called with tasklist_lock or rcu_read_lock() held */
>>>> +struct pid *_ckpt_find_pgrp(struct ckpt_ctx *ctx, pid_t pgid)
>>>> +{
>>>> +	struct task_struct *p;
>>>> +	struct pid *pgrp;
>>>> +
>>>> +	if (pgid == 0) {
>>>> +		/*
>>>> +		 * At checkpoint the pgid owner lived in an ancestor
>>>> +		 * pid-ns. The best we can do (sanely and safely) is
>>>> +		 * to examine the parent of this restart's root: if in
>>>> +		 * a distinct pid-ns, use its pgrp; otherwise fail.
>>>> +		 */
>>>> +		p = ctx->root_task->real_parent;
>>>> +		if (p->nsproxy->pid_ns == current->nsproxy->pid_ns)
>>>> +			return NULL;
>>>> +		pgrp = task_pgrp(p);
>>>> +	} else {
>>>> +		/*
>>>> +		 * Find the owner process of this pgid (it must exist
>>>> +		 * if pgrp exists). It must be a thread group leader.
>>>> +		 */
>>>> +		pgrp = find_vpid(pgid);
>>>> +		p = pid_task(pgrp, PIDTYPE_PID);
>>>> +		if (!p || !thread_group_leader(p))
>>>> +			return NULL;
>>>> +		/*
>>>> +		 * The pgrp must "belong" to our restart tree (compare
>>>> +		 * p->checkpoint_ctx to ours). This prevents malicious
>>>> +		 * input from (guessing and) using unrelated pgrps. If
>>>> +		 * the owner is dead, then it doesn't have a context,
>>>> +		 * so instead compare against its (real) parent's.
>>>> +		 */
>>>> +		if (p->exit_state == EXIT_ZOMBIE)
>>>> +			p = p->real_parent;
>>>> +		if (p->checkpoint_ctx != ctx)
>>>> +			return NULL;
>>>> +	}
>>>> +
>>>> +	if (task_session(current) != task_session(p))
>>>> +		return NULL;
>>>> +
>>>> +	return pgrp;
>>>> +}
>>>> +
>>>> +
>>>>  #ifdef CONFIG_FUTEX
>>>>  static void save_task_robust_futex_list(struct ckpt_hdr_task *h,
>>>>  					struct task_struct *t)
>>>> @@ -94,8 +145,8 @@ static int checkpoint_task_struct(struct ckpt_ctx *ctx, struct task_struct *t)
>>>>  		h->exit_signal = t->exit_signal;
>>>>  		h->pdeath_signal = t->pdeath_signal;
>>>>
>>>> -		h->set_child_tid = t->set_child_tid;
>>>> -		h->clear_child_tid = t->clear_child_tid;
>>>> +		h->set_child_tid = (unsigned long) t->set_child_tid;
>>> note that set_child_tid is an int (signed), not a long.  Same on
>>> x86, but not on other arches.  Shouldn't lose info so could be worse.
>> {set,clear}_child_tid are both pointers to user space: it's an address
>> in userspace, so we save it as 'unsigned long'.
>>
>> {clear,set}_child_tid is defined in include/linux/sched.h ... how can
>> it differ for different archs ?
> 
> sizeof long differs for different archs.  Not the type of x_child_tid.

Sure. In all ckpt headers, pointers always get a __u64 regardless
of arch, to cover both 32- and 64-bit.
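
To illustrate the convention, here is a rough user-space sketch; it is
not the patch's code. The names demo_hdr_task, demo_ptr_to_u64 and
demo_u64_to_ptr are made up for the example, and uintptr_t stands in
for the kernel's (unsigned long) cast. The idea is only that the
on-disk field has the same width on every arch, and that the reader
checks the value fits before narrowing it back to a pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header record: pointer fields are always 64 bits wide. */
struct demo_hdr_task {
	uint64_t set_child_tid;
	uint64_t clear_child_tid;
};

/* The record layout must not depend on the native pointer size. */
static_assert(sizeof(struct demo_hdr_task) == 16,
	      "header layout must be identical on 32- and 64-bit");

/*
 * Widen an address to 64 bits. Going through an unsigned integer
 * type zero-extends on 32-bit instead of sign-extending.
 */
static uint64_t demo_ptr_to_u64(const void *uaddr)
{
	return (uint64_t)(uintptr_t)uaddr;
}

/*
 * Narrow a saved value back to a native pointer. Returns -1 if the
 * value does not fit (e.g. a 64-bit image read on a 32-bit host).
 */
static int demo_u64_to_ptr(uint64_t val, void **out)
{
	if (val > UINTPTR_MAX)
		return -1;
	*out = (void *)(uintptr_t)val;
	return 0;
}
```

The range check on the read side is an assumption of this sketch; how
the restart code validates such fields is not shown in this thread.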

Oren.

> 
>>> On the whole,
>>>
>>> Acked-by: Serge Hallyn <serue at us.ibm.com>
>> Thanks. I have a few fixes for the code piled up, and now c/r of 'screen'
>> with a couple of shells is working :)
> 
> Cool!

