1. 23 Apr, 2008 3 commits
    • M
      [patch 5/7] vfs: mountinfo: allow using process root · a1a2c409
      Miklos Szeredi committed
      Allow /proc/<pid>/mountinfo to use the root of <pid> to calculate
      mountpoints.
      
       - move definition of 'struct proc_mounts' to <linux/mnt_namespace.h>
       - add the process's namespace and root to this structure
       - pass a pointer to 'struct proc_mounts' into seq_operations
      
      In addition the following cleanups are made:
      
       - use a common open function for /proc/<pid>/{mounts,mountstats}
       - surround namespace.c part of these proc files with #ifdef CONFIG_PROC_FS
       - make the seq_operations structures const
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • M
      [patch 4/7] vfs: mountinfo: add mount peer group ID · 719f5d7f
      Miklos Szeredi committed
      Add a unique ID to each peer group using the IDR infrastructure.  The
      identifiers are reused after the peer group dissolves.
      
      The IDR structures are protected by holding namespace_sem for write
      while allocating or deallocating IDs.
      
      IDs are allocated when a previously unshared vfsmount becomes the
      first member of a peer group.  When a new member is added to an
      existing group, the ID is copied from one of the old members.
      
      IDs are freed when the last member of a peer group is unshared.
      
      Setting the MNT_SHARED flag on members of a subtree is done as a
      separate step, after all the IDs have been allocated.  This way an
      allocation failure can be cleaned up easily, without affecting the
      propagation state.
      
      Based on design sketch by Al Viro.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
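The two-step scheme described in 719f5d7f (allocate every ID first, set the shared flag afterwards) can be sketched in user space. This is a toy model under stated assumptions, not the kernel code: `struct toy_mount`, `toy_alloc_group_id()` and `toy_make_shared()` are hypothetical names, the allocator is a plain counter rather than IDR, and all mounts are assumed to start outside any peer group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mount {
    int group_id;   /* 0 means "no peer group" in this sketch */
    bool shared;    /* stands in for MNT_SHARED */
};

/* Hypothetical allocator: hands out 1, 2, 3, ... and fails past the limit. */
static int toy_next_id = 1;
static int toy_id_limit = 3;

int toy_alloc_group_id(void)
{
    if (toy_next_id > toy_id_limit)
        return -1;
    return toy_next_id++;
}

/*
 * Phase 1: give every mount an ID.  On failure, undo the IDs assigned so
 * far and report the error; no mount has been marked shared yet.
 * Phase 2: only after all IDs exist, mark every mount shared.
 */
int toy_make_shared(struct toy_mount *mnts, size_t n)
{
    assert(mnts != NULL);
    for (size_t i = 0; i < n; i++) {
        int id = toy_alloc_group_id();
        if (id < 0) {
            /* Roll back phase 1 (a real allocator would also release the IDs). */
            while (i-- > 0)
                mnts[i].group_id = 0;
            return -1;
        }
        mnts[i].group_id = id;
    }
    for (size_t i = 0; i < n; i++)
        mnts[i].shared = true;
    return 0;
}
```

Because the flag is only set in the second loop, a failed allocation never leaves a mount that looks shared but has no group ID.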
    • M
      [patch 3/7] vfs: mountinfo: add mount ID · 73cd49ec
      Miklos Szeredi committed
      Add a unique ID to each vfsmount using the IDR infrastructure.  The
      identifiers are reused after the vfsmount is freed.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
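The ID reuse mentioned in 73cd49ec can be illustrated with a small user-space allocator. This is only a sketch: the kernel uses the IDR infrastructure, whereas the bitmap below is a simplified stand-in, and `TOY_MAX_IDS`, `toy_alloc_id()` and `toy_free_id()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_IDS 64   /* hypothetical limit, not a kernel value */

static bool toy_id_used[TOY_MAX_IDS];

/* Return the lowest free ID, or -1 if the table is full. */
int toy_alloc_id(void)
{
    for (int id = 0; id < TOY_MAX_IDS; id++) {
        if (!toy_id_used[id]) {
            toy_id_used[id] = true;
            return id;
        }
    }
    return -1;
}

/* Release an ID so that a later allocation can hand it out again. */
void toy_free_id(int id)
{
    assert(id >= 0 && id < TOY_MAX_IDS && toy_id_used[id]);
    toy_id_used[id] = false;
}
```

Allocating twice, freeing the first ID and allocating again returns the freed ID, which is the reuse behaviour the commit message describes.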
  2. 22 Apr, 2008 3 commits
  3. 19 Apr, 2008 3 commits
    • D
      [PATCH] r/o bind mounts: honor mount writer counts at remount · 2e4b7fcd
      Dave Hansen committed
      Originally from: Herbert Poetzl <herbert@13thfloor.at>
      
      This is the core of the read-only bind mount patch set.
      
      Note that this does _not_ add a "ro" option directly to the bind mount
      operation.  If you require such a mount, you must first do the bind, then
      follow it up with a 'mount -o remount,ro' operation.
      
      If you wish to have a r/o bind mount of /foo on /bar:
      
      	mount --bind /foo /bar
      	mount -o remount,ro /bar
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • D
      [PATCH] r/o bind mounts: track numbers of writers to mounts · 3d733633
      Dave Hansen committed
      This is the real meat of the entire series.  It actually
      implements the tracking of the number of writers to a mount.
      However, it causes scalability problems because there can be
      hundreds of cpus doing open()/close() on files on the same mnt at
      the same time.  Even an atomic_t in the mnt has massive scaling
      problems because the cacheline gets so terribly contended.
      
      This uses a statically-allocated percpu variable.  All want/drop
      operations are local to a cpu as long as that cpu operates on the same
      mount, and there are no writer count imbalances.  Writer count
      imbalances happen when a write is taken on one cpu, and released
      on another, like when an open/close pair is performed on two
      different cpus.
      
      Upon a remount,ro request, all of the data from the percpu
      variables is collected (expensive, but very rare) and we determine
      if there are any outstanding writers to the mount.
      
      I've written a little benchmark to sit in a loop for a couple of
      seconds in several cpus in parallel doing open/write/close loops.
      
      http://sr71.net/~dave/linux/openbench.c
      
      The code in here is a worst-possible case for this patch.  It
      does opens on a _pair_ of files in two different mounts in parallel.
      This should cause my code to lose its "operate on the same mount"
      optimization completely.  This worst-case scenario causes a 3%
      degradation in the benchmark.
      
      I could probably get rid of even this 3%, but it would be more
      complex than what I have here, and I think this is getting into
      acceptable territory.  In practice, I expect writing more than 3
      bytes to a file, as well as disk I/O, to mask any effects that this
      has.
      
      (To get rid of that 3%, we could have a #defined number of mounts
      in the percpu variable.  So, instead of a CPU getting to operate only
      on percpu data when it accesses only one mount, it could stay on
      percpu data when it only accesses N or fewer mounts.)
      
      [AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
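The idea in 3d733633 of keeping writer counts per cpu and summing them only at remount time can be modelled in user space. This is a simplified, single-threaded sketch under stated assumptions: the patch itself uses a percpu variable with a "same mount" optimization and proper locking, while here each mount simply carries one slot per cpu. `TOY_NR_CPUS`, `struct toy_mount` and the `toy_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4   /* hypothetical CPU count for the sketch */

struct toy_mount {
    long writers[TOY_NR_CPUS];  /* one slot per CPU; a slot may go negative */
    bool readonly;
};

/* Take write access on behalf of the given CPU; fails on a r/o mount. */
int toy_want_write(struct toy_mount *m, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    if (m->readonly)
        return -1;
    m->writers[cpu]++;
    return 0;
}

/* Release write access; the releasing CPU may differ from the taking one. */
void toy_drop_write(struct toy_mount *m, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    m->writers[cpu]--;
}

/* Expensive but rare: sum every CPU's slot to find outstanding writers. */
long toy_count_writers(const struct toy_mount *m)
{
    long total = 0;
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        total += m->writers[cpu];
    return total;
}

/* Refuse the r/w to r/o transition while any writer is outstanding. */
int toy_remount_ro(struct toy_mount *m)
{
    if (toy_count_writers(m) != 0)
        return -1;
    m->readonly = true;
    return 0;
}
```

A write taken on cpu 0 and dropped on cpu 2 leaves the slots at +1 and -1. That is the imbalance the message refers to, and the sum is still zero, so the remount is allowed.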
    • D
      [PATCH] r/o bind mounts: stub functions · 8366025e
      Dave Hansen committed
      This patch adds two functions, mnt_want_write() and mnt_drop_write().  These
      are used like a lock pair around any fs operations that might cause a write
      to the filesystem.
      
      Before these can become useful, we must first cover each place in the VFS
      where writes are performed with a want/drop pair.  When that is complete, we
      can actually introduce code that will safely check the counts before allowing
      r/w<->r/o transitions to occur.
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
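The "lock pair" calling convention from 8366025e might look like this from a caller's point of view. This is a user-space sketch, not the kernel API: `toy_mnt_want_write()` and `toy_mnt_drop_write()` only mimic the shape of mnt_want_write() and mnt_drop_write(), and the assumption that the stub fails with -EROFS on a read-only mount is mine, not stated in the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mount {
    bool readonly;
    int writes_done;   /* counts operations that modified the "filesystem" */
};

/* Stub: no counting yet, only a read-only check (assumed behaviour). */
int toy_mnt_want_write(struct toy_mount *m)
{
    assert(m != NULL);
    return m->readonly ? -EROFS : 0;
}

/* Stub: nothing to undo until writer counts are actually tracked. */
void toy_mnt_drop_write(struct toy_mount *m)
{
    (void)m;
}

/* A write operation bracketed by the want/drop pair. */
int toy_touch(struct toy_mount *m)
{
    int err = toy_mnt_want_write(m);
    if (err)
        return err;          /* no drop needed: want did not succeed */
    m->writes_done++;        /* the operation that may write */
    toy_mnt_drop_write(m);
    return 0;
}
```

Every successful want is matched by exactly one drop, and a failed want is not followed by a drop. That is the discipline later counting code would rely on.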
  4. 28 Mar, 2008 5 commits
  5. 15 Feb, 2008 6 commits
  6. 09 Feb, 2008 2 commits
  7. 07 Feb, 2008 1 commit
  8. 25 Jan, 2008 2 commits
  9. 21 Oct, 2007 1 commit
  10. 20 Oct, 2007 1 commit
    • P
      pid namespaces: introduce MS_KERNMOUNT flag · 8bf9725c
      Pavel Emelyanov committed
      This flag tells the .get_sb callback that this is a kern_mount() call, so
      that it can trust the *data pointer to be a valid in-kernel one.  If this
      flag is passed from a user process, it is cleared, since the *data pointer
      is not a valid kernel object.
      
      Looking a few steps ahead: this will be needed for proc to create the
      superblock and store a valid pid namespace on it during namespace
      creation.  The reason why the namespace cannot live without a proc mount
      is described in the appropriate patch.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
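The flag handling described in 8bf9725c amounts to masking one bit on the user path and setting it on the kernel path. The sketch below is a user-space illustration only: the `TOY_` constants and all three functions are hypothetical, and the bit position chosen for the flag is an assumption of the sketch.

```c
#include <assert.h>

#define TOY_MS_RDONLY    (1UL << 0)
#define TOY_MS_KERNMOUNT (1UL << 22)   /* bit position assumed for the sketch */

/* What a .get_sb-style callback would ask before trusting its data pointer. */
int toy_data_is_trusted(unsigned long flags)
{
    return (flags & TOY_MS_KERNMOUNT) != 0;
}

/* Path taken by an in-kernel mount: mark the request as kernel-originated. */
unsigned long toy_kern_mount_flags(unsigned long flags)
{
    return flags | TOY_MS_KERNMOUNT;
}

/* Path taken by a user request: strip the flag so it cannot be faked. */
unsigned long toy_user_mount_flags(unsigned long flags)
{
    flags &= ~TOY_MS_KERNMOUNT;
    assert(!toy_data_is_trusted(flags));
    return flags;
}
```

Other flags pass through unchanged on both paths; only the trust bit differs.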
  11. 17 Oct, 2007 1 commit
  12. 20 Jul, 2007 1 commit
    • P
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt committed
      Slab destructors were no longer supported after Christoph's c59def9f
      change. They've been BUGs for both slab and slub, and slob never
      supported them either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  13. 17 Jul, 2007 4 commits
  14. 09 May, 2007 4 commits
    • M
      check privileges before setting mount propagation · ee6f9582
      Miklos Szeredi committed
      There's a missing check for CAP_SYS_ADMIN in do_change_type().
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      Introduce a handy list_first_entry macro · b5e61818
      Pavel Emelianov committed
      There are many places in the kernel where constructions like
      
         foo = list_entry(head->next, struct foo_struct, list);
      
      are used.
      The code might look more descriptive and neat if it used the macro
      
         list_first_entry(head, type, member) \
                   list_entry((head)->next, type, member)
      
      Here is the macro itself and examples of its usage in the generic code.
      If it turns out to be useful, I can prepare a set of patches to
      inject it into arch-specific code, drivers, networking, etc.
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
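The macro from b5e61818 can be tried outside the kernel with a minimal re-creation of the list primitives. The macro bodies follow the commit message; everything else is an assumption made so the sketch stands alone: `list_entry` is rebuilt here with `offsetof`, and `toy_list_init()`, `toy_list_add_tail()` and `toy_first_value()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Recover the containing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The macro introduced by the commit: the entry right after the head. */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

struct foo_struct {
    int value;
    struct list_head list;
};

/* An empty circular list points at itself in both directions. */
void toy_list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link item in just before head, i.e. at the tail of the list. */
void toy_list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

int toy_first_value(struct list_head *head)
{
    /* The macro is only meaningful on a non-empty list. */
    assert(head->next != head);
    return list_first_entry(head, struct foo_struct, list)->value;
}
```

On an empty list `head->next` is the head itself, so the macro would return a pointer that is not a real entry; callers are expected to check for emptiness first, as the helper above does.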
    • M
      add filesystem subtype support · 79c0b2df
      Miklos Szeredi committed
      There's a slight problem with filesystem type representation in fuse
      based filesystems.
      
      From the kernel's view, there are just two filesystem types: fuse and
      fuseblk.  From the user's view there are lots of different filesystem
      types.  The user is not even much concerned if the filesystem is fuse based
      or not.  So there's a conflict of interest in how this should be
      represented in fstab, mtab and /proc/mounts.
      
      The current scheme is to encode the real filesystem type in the mount
      source.  So an sshfs mount looks like this:
      
        sshfs#user@server:/   /mnt/server    fuse   rw,nosuid,nodev,...
      
      This url-ish syntax works OK for sshfs and similar filesystems.  However
      for block device based filesystems (ntfs-3g, zfs) it doesn't work, since
      the kernel expects the mount source to be a real device name.
      
      A possibly better scheme would be to encode the real type in the type
      field as "type.subtype".  So fuse mounts would look like this:
      
        /dev/hda1       /mnt/windows   fuseblk.ntfs-3g   rw,...
        user@server:/   /mnt/server    fuse.sshfs        rw,nosuid,nodev,...
      
      This patch adds the necessary code to the kernel so that this can be
      correctly displayed in /proc/mounts.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
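The "type.subtype" encoding proposed in 79c0b2df can be parsed by splitting at the first dot. The helper below is a user-space sketch of that split, not the code the patch adds; `toy_split_fstype()` is a hypothetical name and the truncation behaviour for short buffers is my own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the part of type before the first '.' into base (truncating if
 * base is too small) and return a pointer to the subtype, or NULL when
 * the type has no subtype.
 */
const char *toy_split_fstype(const char *type, char *base, size_t base_len)
{
    assert(type != NULL && base != NULL && base_len > 0);

    const char *dot = strchr(type, '.');
    size_t n = dot ? (size_t)(dot - type) : strlen(type);

    if (n >= base_len)
        n = base_len - 1;
    memcpy(base, type, n);
    base[n] = '\0';

    return dot ? dot + 1 : NULL;
}
```

With this split, "fuseblk.ntfs-3g" yields the base type "fuseblk" and the subtype "ntfs-3g", while a plain type such as "ext3" yields no subtype.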
    • B
      Merge sys_clone()/sys_unshare() nsproxy and namespace handling · e3222c4e
      Badari Pulavarty committed
      sys_clone() and sys_unshare() both make copies of nsproxy and its associated
      namespaces.  But they have different code paths.
      
      This patch merges all the nsproxy and its associated namespace copy/clone
      handling (as much as possible).  Posted on container list earlier for
      feedback.
      
      - Create a new nsproxy and its associated namespaces and pass it back to
        the caller to attach it to the right process.
      
      - Changed all copy_*_ns() routines to return a new copy of namespace
        instead of attaching it to task->nsproxy.
      
      - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.
      
      - Removed unnecessary !ns checks from copy_*_ns() and added BUG_ON()
        just in case.
      
      - Get rid of all individual unshare_*_ns() routines and make use of
        copy_*_ns() instead.
      
      [akpm@osdl.org: cleanups, warning fix]
      [clg@fr.ibm.com: remove dup_namespaces() declaration]
      [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
      [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <containers@lists.osdl.org>
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 12 Feb, 2007 1 commit
  16. 14 Dec, 2006 1 commit
  17. 09 Dec, 2006 1 commit