[PATCH 11/11][v3]: Enable multiple instances of devpts

sukadev at us.ibm.com sukadev at us.ibm.com
Wed Sep 3 22:35:51 PDT 2008


From: Sukadev Bhattiprolu <sukadev at us.ibm.com>
Subject: [PATCH 11/11]: Enable multiple instances of devpts

To support containers, allow multiple instances of the devpts filesystem,
such that indices of ptys allocated in one instance are independent
of ptys allocated in other instances of devpts.

But to preserve backward compatibility, enable this support for multiple
instances under the new mount option, '-o newinstance'.

IOW, devpts must support both single-mount and multiple-mount semantics.
If the filesystem is mounted without the 'newinstance' option (as in current
start-up scripts) the new mount simply binds to the initial kernel mount
of devpts and thus current behavior is preserved.

If the 'newinstance' option is specified (by new startup scripts) a new
instance of the devpts fs is created and any ptys created in this instance
are independent of the ptys in other mounts of devpts.

Eg: A container startup script could do the following:

	$ ns_exec -cm /bin/bash
	$ umount /dev/pts
	$ mount -t devpts -o newinstance lxcpts /dev/pts
	$ mount -o bind /dev/pts/ptmx /dev/ptmx
	$ sshd -p 6710

where 'ns_exec -cm /bin/bash' calls clone() with the CLONE_NEWNS flag
and execs /bin/bash in the child process.  A pty created by the sshd
is not visible in the original mount of /dev/pts.
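
The 'ns_exec' helper itself is not part of this patch. As a rough
illustration only, the sketch below approximates the mount-namespace part
of 'ns_exec -cm' using fork() followed by unshare(CLONE_NEWNS), instead of
a direct clone(CLONE_NEWNS) call. The function name run_in_new_mount_ns()
and the 126/127 exit codes are assumptions made for this example.

```c
#define _GNU_SOURCE		/* for unshare() and CLONE_NEWNS */
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] in a child process that has its own
 * mount namespace.
 *
 * Returns the child's exit status, 126 if unshare() failed (typically
 * EPERM when the caller lacks CAP_SYS_ADMIN), 127 if exec failed, or
 * -1 on fork/wait errors or abnormal termination.
 */
int run_in_new_mount_ns(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: detach from the parent's mount namespace. */
		if (unshare(CLONE_NEWNS) != 0) {
			perror("unshare(CLONE_NEWNS)");
			_exit(126);
		}
		execvp(argv[0], argv);
		perror("execvp");
		_exit(127);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With sufficient privileges, the shell started this way can run the
umount/mount commands above without changing the mount table that the
parent process sees.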

USER-SPACE-IMPACT:

	In the 'legacy mode' (i.e. the '-o newinstance' option is never specified),
	there should be no change in behavior.

	In multi-instance mode (i.e. the '-o newinstance' mount option is specified
	at least once), the following user-space issues should be noted.
	
	1. The multi-instance mounts have a 'ptmx' node created/destroyed
	   automatically when devpts is mounted/unmounted. The legacy-mode
	   mounts do not have this node.

	2. To effectively use the multi-instance mode, applications/libraries
	    should open "/dev/pts/ptmx" instead of "/dev/ptmx", but obviously
	    this would fail in the legacy mode.
	    
	    To work in either legacy or multi-instance mode, applications
	    could replace:

		master_fd = open("/dev/ptmx", flags);

	    with

		if (access("/dev/pts/ptmx", F_OK) == 0)
			master_fd = open("/dev/pts/ptmx", flags);
		else
			master_fd = open("/dev/ptmx", flags);

	   To maintain backward compatibility, administrators or startup
	   scripts can "redirect" open of /dev/ptmx to /dev/pts/ptmx in
	   multi-instance mode using a bind mount.

		mount -t devpts -o newinstance devpts /dev/pts
		mount -o bind /dev/pts/ptmx /dev/ptmx

	3. A multi-instance mount that is not accompanied by the above bind
	   mount would result in an unusable/unreachable tty for applications
	   that open "/dev/ptmx", i.e.

	   	mount -t devpts -o newinstance lxcpts /dev/pts

	   followed by:

		open("/dev/ptmx")

	    would create a pty, say /dev/pts/7, in the initial kernel mount.
	    But /dev/pts/7 would be invisible in the new mount.

	    TODO:

		- We need to document this clearly somewhere (or can the kernel
		  automatically establish the bind mount?).
	   
	4. The permissions for "/dev/pts/ptmx" node should be specified when
	   mounting /dev/pts, using the '-o ptmxmode=%o' mount option (default
	   is 0666). 
	   
	   	mount -t devpts -o newinstance -o ptmxmode=0644 devpts /dev/pts

	   The permissions can later be changed as usual with 'chmod'.

	   	chmod 666 /dev/pts/ptmx
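
As a sketch of the fallback described in item 2, the helper below tries
each candidate path in order rather than calling access() first, which
avoids a window between the check and the open. The names open_first()
and open_ptmx() are hypothetical and are not part of this patch or of
any library.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Open the first entry of @paths that can be opened with @flags.
 *
 * Returns the file descriptor, or -1 if none of the paths could be
 * opened. If @which is non-NULL, it receives the index of the path
 * that was actually used.
 */
int open_first(const char *const paths[], size_t n, int flags, size_t *which)
{
	for (size_t i = 0; i < n; i++) {
		int fd = open(paths[i], flags);

		if (fd >= 0) {
			if (which)
				*which = i;
			return fd;
		}
	}
	return -1;
}

/* Prefer the per-instance ptmx node, fall back to the legacy one. */
int open_ptmx(int flags)
{
	static const char *const candidates[] = {
		"/dev/pts/ptmx",	/* multi-instance mount */
		"/dev/ptmx",		/* legacy mode */
	};

	return open_first(candidates,
			  sizeof(candidates) / sizeof(candidates[0]),
			  flags, NULL);
}
```

A caller would then use open_ptmx(O_RDWR | O_NOCTTY) wherever it
previously opened "/dev/ptmx" directly.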

TODO:

	- Document impact of not bind mounting /dev/pts/ptmx after a 
	  multi-instance mount

	- Can we print some friendly message either from kernel or in common
	  user-space commands when this disconnect happens ?

	  
Implementation note:

	See comments in new get_sb_ref() function in fs/super.c on why
	get_sb_single() cannot be directly used.

Changelog[v3]:
	- Rename new mount option to 'newinstance'

	- Create ptmx nodes only in 'newinstance' mounts

	- Bugfix: parse_mount_options() modifies @data but since we need to
	  parse the @data twice (once in devpts_get_sb() and once during
	  do_remount_sb()), parse a local copy of @data in devpts_get_sb().
	  (restructured code in devpts_get_sb() to fix this)
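
The @data bugfix above can be reproduced in user space. The following is
a minimal sketch, not the kernel code: it imitates by hand what strsep()
does to its input (each ',' is overwritten with '\0') so that it depends
only on standard C, and then shows the copy-before-parse approach. The
function names has_newinstance_destructive() and has_newinstance() are
made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Scan a comma-separated option string for "newinstance".
 * Like a strsep()-based parser, this overwrites every ',' in @data
 * with '\0', so @data is truncated to its first option afterwards.
 */
int has_newinstance_destructive(char *data)
{
	int found = 0;
	char *p = data;

	while (p != NULL) {
		char *comma = strchr(p, ',');

		if (comma)
			*comma = '\0';	/* the side effect on @data */
		if (strcmp(p, "newinstance") == 0)
			found = 1;
		p = comma ? comma + 1 : NULL;
	}
	return found;
}

/*
 * Non-destructive variant: parse a private copy and leave @data intact,
 * so that a later parser still sees the full option string.
 * Returns 1 or 0, or -1 if the copy could not be allocated.
 */
int has_newinstance(const char *data)
{
	size_t len;
	char *copy;
	int rc;

	if (!data)
		return 0;

	len = strlen(data) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;

	memcpy(copy, data, len);
	rc = has_newinstance_destructive(copy);
	free(copy);

	return rc;
}
```

After has_newinstance_destructive() runs on "mode=620,newinstance", a
second parse of the same buffer sees only "mode=620", which is the kind
of incorrect second parse the changelog entry describes.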

Changelog[v2]:
	- Support both single-mount and multiple-mount semantics and
	  provide '-onewmnt' option to select the semantics.

Signed-off-by: Sukadev Bhattiprolu <sukadev at us.ibm.com>

---
 fs/devpts/inode.c  |  162 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/super.c         |   43 ++++++++++++++
 include/linux/fs.h |    2 
 3 files changed, 203 insertions(+), 4 deletions(-)

Index: linux-2.6.27-rc3-tty/fs/devpts/inode.c
===================================================================
--- linux-2.6.27-rc3-tty.orig/fs/devpts/inode.c	2008-09-03 21:34:36.000000000 -0700
+++ linux-2.6.27-rc3-tty/fs/devpts/inode.c	2008-09-03 21:53:59.000000000 -0700
@@ -42,10 +42,11 @@ struct pts_mount_opts {
 	gid_t   gid;
 	umode_t mode;
 	umode_t ptmxmode;
+	int newinstance;
 };
 
 enum {
-	Opt_uid, Opt_gid, Opt_mode, Opt_ptmxmode,
+	Opt_uid, Opt_gid, Opt_mode, Opt_ptmxmode, Opt_newinstance,
 	Opt_err
 };
 
@@ -54,6 +55,7 @@ static match_table_t tokens = {
 	{Opt_gid, "gid=%u"},
 	{Opt_mode, "mode=%o"},
 	{Opt_ptmxmode, "ptmxmode=%o"},
+	{Opt_newinstance, "newinstance"},
 	{Opt_err, NULL}
 };
 
@@ -85,6 +87,7 @@ static int parse_mount_options(char *dat
 	opts->gid     = 0;
 	opts->mode    = DEVPTS_DEFAULT_MODE;
 	opts->ptmxmode = DEVPTS_DEFAULT_PTMX_MODE;
+	opts->newinstance = 0;
 
 	while ((p = strsep(&data, ",")) != NULL) {
 		substring_t args[MAX_OPT_ARGS];
@@ -118,6 +121,9 @@ static int parse_mount_options(char *dat
 				return -EINVAL;
 			opts->ptmxmode = option & S_IALLUGO;
 			break;
+		case Opt_newinstance:
+			opts->newinstance = 1;
+			break;
 		default:
 			printk(KERN_ERR "devpts: called with bogus options\n");
 			return -EINVAL;
@@ -127,6 +133,53 @@ static int parse_mount_options(char *dat
 	return 0;
 }
 
+/*
+ * Safely parse the mount options in @data and update @opts.
+ *
+ * devpts ends up parsing options several times during mount, due to the
+ * two modes of operation it supports.
+ *
+ * The initial mount of single-instance mode parses options two times:
+ * 	- in devpts_get_sb() to determine the type of mount
+ * 	- in devpts_remount (when get_sb_single() calls do_remount_sb())
+ *
+ * Subsequent mounts in single-instance mode parse options two times:
+ * 	- in devpts_get_sb() to determine type of mount
+ * 	- in devpts_remount (when get_sb_single() calls do_remount_sb())
+ *
+ * Multi-instance mount parses options two times:
+ * 	- in devpts_get_sb() to determine type of mount
+ * 	- in new_pts_mount() to record options
+ *
+ * Since the options are parsed from more than one place, there does
+ * not seem to be a way to parse them once and then save and reuse
+ * the results.
+ *
+ * As if this was not messy enough, parsing of options modifies the @data
+ * making subsequent parsing incorrect. Hence the safe_parse_mount_options().
+ *
+ * Return: 0 On success, -errno on error
+ */
+static int safe_parse_mount_options(void *data, struct pts_mount_opts *opts)
+{
+	int rc;
+	void *datacp;
+
+	if (!data)
+		return 0;
+
+	/* Use kstrdup() ?  */
+	datacp = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!datacp)
+		return -ENOMEM;
+
+	memcpy(datacp, data, PAGE_SIZE);
+	rc = parse_mount_options((char *)datacp, opts);
+	kfree(datacp);
+
+	return rc;
+}
+
 static int devpts_remount(struct super_block *sb, int *flags, char *data)
 {
 	struct pts_fs_info *fsi = DEVPTS_SB(sb);
@@ -145,7 +198,10 @@ static int devpts_show_options(struct se
 	if (opts->setgid)
 		seq_printf(seq, ",gid=%u", opts->gid);
 	seq_printf(seq, ",mode=%03o", opts->mode);
-	seq_printf(seq, ",ptmxmode=%03o", opts->ptmxmode);
+	if (opts->newinstance) {
+		seq_printf(seq, ",ptmxmode=%03o", opts->ptmxmode);
+		seq_printf(seq, ",newinstance");
+	}
 
 	return 0;
 }
@@ -259,10 +315,107 @@ int mknod_ptmx(struct super_block *sb)
 	return 0;
 }
 
+/*
+ * Mount or remount the initial kernel mount of devpts. This type of
+ * mount maintains the legacy, single-instance semantics.
+ */
+static int init_pts_mount(struct file_system_type *fs_type, int flags,
+		void *data, struct vfsmount *mnt)
+{
+	int err;
+
+	if (!devpts_mnt) {
+		err = get_sb_single(fs_type, flags, data, devpts_fill_super,
+				mnt);
+		if (!err)
+			devpts_mnt = mnt;
+
+		return err;
+	}
+
+	return get_sb_ref(devpts_mnt->mnt_sb, flags, data, mnt);
+}
+
+/*
+ * Mount a new (private) instance of devpts. This is selected via
+ * the '-o newinstance' mount option and the PTYs created in this
+ * instance are independent of the PTYs in other devpts instances.
+ *
+ * This type of mount is used in containers to provide isolated PTYs.
+ */
+static int new_pts_mount(struct file_system_type *fs_type, int flags,
+		void *data, struct vfsmount *mnt)
+{
+	int err;
+	struct pts_fs_info *fsi;
+	struct pts_mount_opts *opts;
+
+	printk(KERN_NOTICE "devpts: newinstance mount\n");
+
+	err = get_sb_nodev(fs_type, flags, data, devpts_fill_super, mnt);
+	if (err)
+		return err;
+
+	/*
+	 * Parse mount options here rather than in devpts_fill_super()
+	 * to avoid unnecessary repetition of the parsing in single-
+	 * instance mode.
+	 */
+	fsi = DEVPTS_SB(mnt->mnt_sb);
+	opts = &fsi->mount_opts;
+
+	err = parse_mount_options(data, opts);
+	if (err)
+		goto fail;
+
+	err = mknod_ptmx(mnt->mnt_sb);
+	if (err)
+		goto fail;
+
+	return 0;
+
+fail:
+	dput(mnt->mnt_sb->s_root);
+	deactivate_super(mnt->mnt_sb);
+	return err;
+}
+
+/*
+ * Check if 'newinstance' mount option was specified in @data.
+ *
+ * Return: -errno  	on error (eg: invalid mount options specified)
+ * 	 : 1 		if 'newinstance' mount option was specified
+ * 	 : 0 		if 'newinstance' mount option was NOT specified
+ */
+static int is_new_instance_mount(void *data)
+{
+	int rc;
+	struct pts_mount_opts opts;
+
+	if (!data)
+		return 0;
+
+	rc = safe_parse_mount_options(data, &opts);
+	if (!rc)
+		rc = opts.newinstance;
+
+	return rc;
+}
+
+
 static int devpts_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
-	return get_sb_single(fs_type, flags, data, devpts_fill_super, mnt);
+	int new;
+
+	new = is_new_instance_mount(data);
+	if (new < 0)
+		return new;
+
+	if (new)
+		return new_pts_mount(fs_type, flags, data, mnt);
+
+	return init_pts_mount(fs_type, flags, data, mnt);
 }
 
 
@@ -393,8 +546,9 @@ void devpts_pty_kill(struct tty_struct *
 	if (dentry && !IS_ERR(dentry)) {
 		inode->i_nlink--;
 		d_delete(dentry);
-		dput(dentry);
+		dput(dentry);		/* d_lookup in devpts_pty_new */
 	}
+	dput(dentry);			/* d_find_alias above */
 
 	mutex_unlock(&root->d_inode->i_mutex);
 }
Index: linux-2.6.27-rc3-tty/fs/super.c
===================================================================
--- linux-2.6.27-rc3-tty.orig/fs/super.c	2008-09-03 21:28:11.000000000 -0700
+++ linux-2.6.27-rc3-tty/fs/super.c	2008-09-03 21:59:42.000000000 -0700
@@ -883,6 +883,49 @@ int get_sb_single(struct file_system_typ
 
 EXPORT_SYMBOL(get_sb_single);
 
+int get_sb_ref(struct super_block *sb, int flags, void *data,
+		struct vfsmount *mnt)
+{
+	int err;
+
+	/*
+	 * UGLY:
+	 *
+	 * This is needed to support multiple mounts in devpts while
+	 * preserving backward compatibility of the current 'single-mount'
+	 * semantics.
+	 *
+	 * devpts cannot simply use get_sb_single() because get_sb_single(),
+	 * or more specifically sget(), finds the most recent mount of devpts.
+	 * But that recent mount may not be the initial kernel mount (user
+	 * may have mounted with the '-onewinstance' option since the initial
+	 * mount and get_sb_single() would pick that super-block).
+	 *
+	 * Assuming caller has a valid/initialized sb, unroll essentials of
+	 * get_sb_single() here.
+	 */
+	spin_lock(&sb_lock);
+
+	if (!grab_super(sb)) {
+		/*
+		 * TODO: any more cleanup?
+		 */
+		return -EAGAIN;
+	}
+
+	err = do_remount_sb(sb, flags, data, 0);
+	if (err) {
+		/*
+		 * (don't deactivate_super() here - it's from initial pts mount)
+		 *
+		 * TODO: any more cleanup?
+		 */
+		up_write(&sb->s_umount);
+		return err;
+	}
+	return simple_set_mnt(mnt, sb);
+}
+
 struct vfsmount *
 vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void *data)
 {
Index: linux-2.6.27-rc3-tty/include/linux/fs.h
===================================================================
--- linux-2.6.27-rc3-tty.orig/include/linux/fs.h	2008-09-03 21:28:11.000000000 -0700
+++ linux-2.6.27-rc3-tty/include/linux/fs.h	2008-09-03 21:34:47.000000000 -0700
@@ -1516,6 +1516,8 @@ extern int get_sb_nodev(struct file_syst
 	int flags, void *data,
 	int (*fill_super)(struct super_block *, void *, int),
 	struct vfsmount *mnt);
+extern int get_sb_ref(struct super_block *sb, int flags, void *data,
+	struct vfsmount *mnt);
 void generic_shutdown_super(struct super_block *sb);
 void kill_block_super(struct super_block *sb);
 void kill_anon_super(struct super_block *sb);

