[PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map

Tue Jan 1 10:12:10 UTC 2013

"Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:

> Hi Eric,
>
> On Fri, Dec 28, 2012 at 10:20 PM, Eric W. Biederman
> <ebiederm at xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
>
> [...]
>
>>>> For writing you are correct about the mapping to the parent (but that is
>>>> not an exception; it is a restriction on who can write to the file).
>>>
>>> So, by the way, I added this sentence to the page:
>>>
>>>               In   order   to   write   to   the   /proc/[pid]/uid_map
>>>               (/proc/[pid]/gid_map) file,  a  process  must  have  the
>>>               CAP_SETUID (CAP_SETGID) capability in the user namespace
>>>               of the process pid.
>>>
>>> Is that correct?
>>
>> Yes.
>>
>>> But, there appear to be more rules than this governing whether a
>>> process can write to the file (i.e., various other -EPERM cases). What
>>> are the rules?
>>
>> In general you must have CAP_SETUID (CAP_SETGID) in the parent user
>> namespace as well.  The one exception to that is if you are mapping
>> your current uid and gid.
>
> Can you clarify what you mean by "mapping your own UID and GID" please
> (i.e., who is "you" in that sentence).

At the time of the clone() or unshare() call that creates a new user
namespace, the kuid and kgid of the process do not change.

setuid() and setgid() fail until the mappings have been set up.

Therefore the caller is allowed to map any single uid to the uid of the
caller in the parent user namespace.  Likewise the caller is allowed to
map any single gid to the gid of the caller in the parent user
namespace.
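To make the single-ID case concrete, here is a minimal C sketch, assuming a
kernel that lets unprivileged users create user namespaces. The process
records its uid before unshare(2) (afterwards getuid() no longer returns it,
since no mapping exists yet), then writes one line mapping uid 0 inside the
new namespace to that uid in the parent. format_single_id_map() and
map_self_as_root() are hypothetical names for illustration; only uid_map is
shown.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build the single line "inside outside 1\n" that maps
 * one ID inside the new namespace to one ID in the parent namespace.
 * Returns the line length, or -1 if buf is too small. */
int format_single_id_map(char *buf, size_t len,
                         unsigned long inside, unsigned long outside)
{
    int n = snprintf(buf, len, "%lu %lu 1\n", inside, outside);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: create a new user namespace and map uid 0 in it to
 * the caller's uid in the parent namespace.  Returns 0 or -errno. */
int map_self_as_root(void)
{
    char line[64];
    uid_t outer_uid = getuid();   /* must be read before unshare() */

    if (unshare(CLONE_NEWUSER) == -1)
        return -errno;

    int n = format_single_id_map(line, sizeof line, 0,
                                 (unsigned long)outer_uid);
    if (n < 0)
        return -EINVAL;

    int fd = open("/proc/self/uid_map", O_WRONLY);
    if (fd == -1)
        return -errno;
    ssize_t w = write(fd, line, (size_t)n);   /* whole map in one write */
    int err = (w == n) ? 0 : (w == -1 ? -errno : -EIO);
    close(fd);
    return err;
}
```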

>> A rose by any other name will smell as
>> sweet.  In practice this means you must be root to map to uids or gids
>> other than your own, which preserves the current limits on setuid and
>> setgid.
>>
>> Additionally, the writer must see the map file with the lower user
>> namespace being the parent user namespace.  This means you must be
>> inside the user namespace itself, or in its parent user namespace, to
>> write to the user namespace's mapping file.
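Taken together, the write-permission rules above can be modelled as a small
predicate. This is a sketch under stated assumptions, not the kernel's code:
struct userns, struct writer and may_write_map() are hypothetical, and the
capability checks the kernel performs with ns_capable() are reduced to two
booleans supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace: each records its parent, and the
 * initial namespace has parent == NULL. */
struct userns {
    const struct userns *parent;
};

/* Hypothetical summary of the process trying to write the map file. */
struct writer {
    const struct userns *ns;  /* user namespace the writer is in */
    bool cap_in_target;       /* CAP_SETUID in the target namespace */
    bool cap_in_parent;       /* CAP_SETUID in the target's parent */
};

/* Sketch of the rules: the writer must be in the target namespace or its
 * parent, must hold the capability in the target namespace, and must also
 * hold it in the parent unless it maps only its own ID (one line, count 1). */
bool may_write_map(const struct writer *w, const struct userns *target,
                   bool maps_only_own_id)
{
    if (w->ns != target && w->ns != target->parent)
        return false;
    if (!w->cap_in_target)
        return false;
    return w->cap_in_parent || maps_only_own_id;
}
```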
>
> Okay -- I added some words on this point.
>
>> For /proc/[pid]/projid_map, which will be interesting once xfs
>> has kuid/kgid support, there are no capability checks because xfs lets
>> anyone have any projid.
>>
>> This is one of the few cases where it almost matters to understand
>> how ns_capable works when you are not in the user namespace in question,
>> and that goes to what is a parent user namespace.  If you would like
>> some more detail on that please ask.
>>
>>>> The complete rule for the user namespace of the second value is:
>>>>
>>>> - If the user namespace of the opener of the file and the user namespace
>>>>   of the process do not match, the user namespace of the opener of the
>>>>   file is used.
>>>>
>>>> - If the user namespace of the opener of the file and the user namespace
>>>>   of the process are the same, the parent user namespace of the process
>>>>   is used for the second value.
>>>
>>> Could you give an example of the last case? (What I'm really seeking,
>>> I think, is clarification of "parent user namespace". Does that mean
>>> "user namespace of the process that created the user namespace of this
>>> process"?)
>>
>> User namespaces form a tree.  What you can do in one user namespace is a
>> subset of what you can do in the parent user namespace.
>>
>> The parent user namespace is the user namespace of the process that
>> calls unshare or clone with CLONE_NEWUSER.
>
> Thanks.
>
>> The last case is the common case of /proc/self/uid_map.  And you see how
>> your uids map into the user namespace of the creator of your user
>> namespace.
>
> Okay -- got it now.
>
>> With the default being just:         0          0 4294967295
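Each line, including that default one, is three whitespace-separated
numbers, so reading a map file is straightforward. A sketch with
hypothetical names (struct id_extent, parse_map_line(), print_self_uid_map());
it assumes the values fit in unsigned long, which is at least 32 bits.

```c
#include <stdio.h>

/* One line of a uid_map/gid_map file: three whitespace-separated numbers. */
struct id_extent {
    unsigned long first;        /* field 1: start in the namespace of pid */
    unsigned long lower_first;  /* field 2: start in the other namespace */
    unsigned long count;        /* field 3: length of the range */
};

/* Parse one line as read from the file; returns 0 on success, -1 otherwise. */
int parse_map_line(const char *line, struct id_extent *e)
{
    return sscanf(line, "%lu %lu %lu", &e->first, &e->lower_first,
                  &e->count) == 3 ? 0 : -1;
}

/* Print the caller's view of its own uid_map.  Returns 0 or -1. */
int print_self_uid_map(void)
{
    FILE *f = fopen("/proc/self/uid_map", "r");
    char line[128];
    struct id_extent e;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        if (parse_map_line(line, &e) == 0)
            printf("%lu -> %lu (count %lu)\n",
                   e.first, e.lower_first, e.count);
    fclose(f);
    return 0;
}
```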
>
> Right.
>
>>>> While very wordy I think the rule makes a lot of intuitive and practical
>>>> sense.  Especially since it is non-trivial to come up with the chain of
>>>> user namespaces a process is in.
>
> Yes, I see what you mean.
>
> [...]
>
>> Thank you very much for your time and patience in getting a good
>> description of the user namespace.
>
> Well, we're not done yet, but we're getting there. Below, I've pasted
> the current text from proc(5). Could you please take a look, and let
> me know of any errors or improvements.
>
> Cheers,
>
> Michael
>
>        /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.5)
>               These  files  expose the mappings for user and group IDs
>               inside the user namespace  for  the  process  pid.   The
>               description  here  explains  the  details  for  uid_map;
>               gid_map is exactly the same, but each instance of  "user
>               ID" is replaced by "group ID".
>
>               The  uid_map  file  exposes the mapping of user IDs from
>               the user namespace of the process pid to the user names‐
>               pace of the process that opened uid_map (but see a qual‐
>               ification to this point below).  In  other  words,  pro‐
>               cesses that are in different user namespaces will poten‐
>               tially see different values when reading from a particu‐
>               lar  uid_map file, depending on the user ID mappings for
>               the user namespaces of the reading processes.
>
>               Each line in the file specifies a 1-to-1 mapping of a
>               range of contiguous user IDs between two user
>               namespaces.  The specification in each line takes the
>               form of three numbers delimited by white space.  The
>               first two numbers specify the starting user ID in each
>               user namespace.  The third number specifies the length
>               of the mapped range.  In detail, the fields are
>               interpreted as follows:
>
>               (1) The  start  of  the  range  of  user IDs in the user
>                   namespace of the process pid.
>
>               (2) The start of the range of user IDs to which the user
>                   IDs  specified  by  field one map.  How field two is
>                   interpreted depends  on  whether  the  process  that
>                   opened  uid_map  and the process pid are in the same
>                   user namespace, as follows:
>
>                   a) If the two processes are in different user names‐
>                      paces:  field two is the start of a range of user
>                      IDs in the user namespace  of  the  process  that
>                      opened uid_map.
>
>                   b) If  the two processes are in the same user names‐
>                      pace: field two is the start of the range of user
>                      IDs  in  the parent user namespace of the process
>                      pid.  (The "parent user namespace"  is  the  user
>                      namespace  of  the  process  that  created a user
>                      namespace via a call to  unshare(2)  or  clone(2)
>                      with  the CLONE_NEWUSER flag.)  This case enables
>                      the opener of uid_map (the common  case  here  is
>                      opening /proc/self/uid_map) to see the mapping of
>                      user IDs into the user namespace of  the  process
>                      that created this user namespace.
>
>               (3) The  length  of the range of user IDs that is mapped
>                   between the two user namespaces.
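Given the three fields, translating an ID from the namespace of the process
pid (field one) to the namespace that field two refers to is a range lookup.
The sketch below assumes 32-bit IDs and uses hypothetical names (struct
id_extent, translate_id()); it does the end-of-range arithmetic in 64 bits so
that first + count cannot overflow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct id_extent {
    uint32_t first;        /* field 1: start in the namespace of pid */
    uint32_t lower_first;  /* field 2: start in the other namespace */
    uint32_t count;        /* field 3: length of the range */
};

/* Map `id` from field-1 space to field-2 space using `n` map lines.
 * Returns false if no line covers `id`. */
bool translate_id(const struct id_extent *map, size_t n, uint32_t id,
                  uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        /* 64-bit end of range so first + count cannot wrap around */
        uint64_t end = (uint64_t)map[i].first + map[i].count;
        if (id >= map[i].first && (uint64_t)id < end) {
            *out = map[i].lower_first + (id - map[i].first);
            return true;
        }
    }
    return false;
}
```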
>
>               After the creation of a new user namespace, the  uid_map
>               file  may be written to exactly once to specify the map‐
>               ping of user IDs in the new user namespace.  (An attempt
>               to write more than once to the file fails with the error
>               EPERM.)
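Because the file can be written only once, the whole mapping has to be
delivered in a single write(2); splitting it over several calls would make
the second one fail. A minimal sketch, with write_map_once() as a
hypothetical helper name:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the complete map text with one write(2) call, since the map file
 * accepts only a single write.  Returns 0 on success or -errno. */
int write_map_once(const char *path, const char *map)
{
    size_t len = strlen(map);
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -errno;
    ssize_t n = write(fd, map, len);
    /* capture the result before close() can change errno */
    int err = (n == (ssize_t)len) ? 0 : (n == -1 ? -errno : -EIO);
    close(fd);
    return err;
}
```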
>
>               The lines written to uid_map must conform to the follow‐
>               ing rules:
>
>               *  The  three fields must be valid numbers, and the last
>                  field must be greater than 0.
>
>               *  Lines are terminated by newline characters.
>
>               *  There is an (arbitrary) limit on the number of  lines
>                  in  the  file.   As  at  Linux 3.8, the limit is five
>                  lines.
>
>               *  The range of user IDs specified in each  line  cannot
>                  overlap  with  the ranges in any other lines.  In the
>                  current implementation (Linux 3.8), this  requirement
>                  is  satisfied  by  a  simplistic implementation that
>                  imposes the further requirement that  the  values  in
>                  both  field 1 and field 2 of successive lines must be
>                  in ascending numerical order.
>
>               Writes that violate the above rules fail with the  error
>               EINVAL.
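The rules above can be checked in user space before writing, which gives a
clearer diagnostic than a bare EINVAL. This sketch follows the rules as
listed, including the five-line limit as of Linux 3.8 and the ascending-order
requirement, implemented here as each line starting at or after the end of
the previous one in both fields (which also rules out overlap). It is not the
kernel's parser: validate_map(), parse_u32() and MAP_MAX_LINES are
hypothetical names, and treating the final newline as optional and rejecting
an empty map are assumptions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_MAX_LINES 5   /* the Linux 3.8 limit quoted above */

/* Parse one unsigned 32-bit field after optional blanks; returns a pointer
 * just past it, or NULL if there is no valid number. */
static const char *parse_u32(const char *s, uint32_t *out)
{
    char *end;
    while (*s == ' ' || *s == '\t')
        s++;
    if (*s < '0' || *s > '9')      /* no sign, no empty field */
        return NULL;
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno != 0 || v > UINT32_MAX)
        return NULL;
    *out = (uint32_t)v;
    return end;
}

/* Check text destined for uid_map against the listed rules.
 * Returns 0 if acceptable, -EINVAL otherwise. */
int validate_map(const char *text)
{
    uint64_t prev_end1 = 0, prev_end2 = 0;   /* ends of previous ranges */
    int lines = 0;
    const char *p = text;

    while (*p != '\0') {
        uint32_t first, lower, count;
        if (++lines > MAP_MAX_LINES)
            return -EINVAL;
        if (!(p = parse_u32(p, &first)) || !(p = parse_u32(p, &lower)) ||
            !(p = parse_u32(p, &count)) || count == 0)
            return -EINVAL;
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\n')
            p++;
        else if (*p != '\0')       /* assumption: last newline optional */
            return -EINVAL;
        /* Ascending, non-overlapping order in both field 1 and field 2. */
        if (first < prev_end1 || lower < prev_end2)
            return -EINVAL;
        prev_end1 = (uint64_t)first + count;
        prev_end2 = (uint64_t)lower + count;
    }
    return lines > 0 ? 0 : -EINVAL;  /* assumption: empty map rejected */
}
```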
>
>               In    order    for   a   process   to   write   to   the
>               /proc/[pid]/uid_map (/proc/[pid]/gid_map) file, the fol‐
>               lowing requirements must be met:
>
>               *  The  process  must  have  the CAP_SETUID (CAP_SETGID)
>                  capability in the user namespace of the process pid.
>
>               *  The process must  have  the  CAP_SETUID  (CAP_SETGID)
>                  capability in the parent user namespace.
>
>               *  The  process  must be in either the user namespace of
>                  the process pid or inside the parent  user  namespace
>                  of the process pid.

That sounds right.

In addition, /proc/[pid]/projid_map was added in Linux 3.7 and obeys the
same rules, except that no capabilities are required to set the mapping.

I suspect it is probably easier to add a quick mention of projid_map
instead of repeating all of the text, but I could be wrong.  In any event
I will hold off on projid_map until we get the uid_map and gid_map
text solid.

Eric