1. 28 Apr, 2008 1 commit
  2. 25 Apr, 2008 1 commit
  3. 23 Apr, 2008 5 commits
    • [patch 7/7] vfs: mountinfo: show dominating group id · 97e7e0f7
      Miklos Szeredi committed
      Show the peer group ID of the nearest dominating group that intersects
      the mount's namespace.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [patch 6/7] vfs: mountinfo: add /proc/<pid>/mountinfo · 2d4d4864
      Ram Pai committed
      [mszeredi@suse.cz] rewrite and split big patch into manageable chunks
      
      /proc/mounts in its current form lacks important information:
      
       - propagation state
       - root of mount for bind mounts
       - the st_dev value used within the filesystem
       - identifier for each mount and its parent
      
      It also suffers from the following problems:
      
       - not easily extendable
       - ambiguity of mountpoints within a chrooted environment
       - doesn't distinguish between filesystem dependent and independent options
       - doesn't distinguish between per mount and per super block options
      
      This patch introduces /proc/<pid>/mountinfo which attempts to address
      all these deficiencies.
      
      Code shared between /proc/<pid>/mounts and /proc/<pid>/mountinfo is
      extracted into separate functions.
      
      Thanks to Al Viro for the help in getting the design right.
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
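To make the list of added information concrete, here is a minimal userspace sketch that parses one mountinfo line. It assumes the field layout described in the kernel's proc documentation: mount ID, parent ID, major:minor, root, mount point, per-mount options, zero or more optional fields, a lone "-" separator, filesystem type, mount source, per-superblock options. The struct and function names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed mountinfo line (hypothetical struct for this sketch). */
struct mountinfo_entry {
    int mount_id;
    int parent_id;
    unsigned int major, minor;  /* st_dev of files on this filesystem */
    char root[256];             /* root of the mount within the filesystem */
    char mount_point[256];      /* relative to the process's root */
    char mount_opts[256];       /* per-mount options */
    char optional[256];         /* e.g. "shared:1 master:2", may be empty */
    char fs_type[64];
    char source[256];
    char super_opts[256];       /* per-superblock options */
};

/* Returns 0 on success, -1 if the line does not match the assumed layout. */
static int parse_mountinfo_line(const char *line, struct mountinfo_entry *e)
{
    int used = 0;
    const char *sep;
    size_t opt_len;

    assert(line != NULL && e != NULL);

    /* Fixed leading fields; %n records how many characters were consumed. */
    if (sscanf(line, "%d %d %u:%u %255s %255s %255s%n",
               &e->mount_id, &e->parent_id, &e->major, &e->minor,
               e->root, e->mount_point, e->mount_opts, &used) != 7)
        return -1;

    /* Optional fields run up to the lone "-" separator. */
    sep = strstr(line + used, " - ");
    if (sep == NULL)
        return -1;

    opt_len = (size_t)(sep - (line + used));
    if (opt_len >= sizeof(e->optional))
        return -1;
    if (opt_len > 0) {
        /* Skip the single space that follows the mount options. */
        memcpy(e->optional, line + used + 1, opt_len - 1);
        e->optional[opt_len - 1] = '\0';
    } else {
        e->optional[0] = '\0';
    }

    /* Fields after the separator. */
    if (sscanf(sep + 3, "%63s %255s %255s",
               e->fs_type, e->source, e->super_opts) != 3)
        return -1;
    return 0;
}
```

Scanning for the separator rather than counting fields is what lets new optional fields be added later without breaking a parser like this one.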
    • [patch 5/7] vfs: mountinfo: allow using process root · a1a2c409
      Miklos Szeredi committed
      Allow /proc/<pid>/mountinfo to use the root of <pid> to calculate
      mountpoints.
      
       - move definition of 'struct proc_mounts' to <linux/mnt_namespace.h>
       - add the process's namespace and root to this structure
       - pass a pointer to 'struct proc_mounts' into seq_operations
      
      In addition the following cleanups are made:
      
       - use a common open function for /proc/<pid>/{mounts,mountstat}
       - surround namespace.c part of these proc files with #ifdef CONFIG_PROC_FS
       - make the seq_operations structures const
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [patch 4/7] vfs: mountinfo: add mount peer group ID · 719f5d7f
      Miklos Szeredi committed
      Add a unique ID to each peer group using the IDR infrastructure.  The
      identifiers are reused after the peer group dissolves.
      
      The IDR structures are protected by holding namespace_sem for write
      while allocating or deallocating IDs.
      
      IDs are allocated when a previously unshared vfsmount becomes the
      first member of a peer group.  When a new member is added to an
      existing group, the ID is copied from one of the old members.
      
      IDs are freed when the last member of a peer group is unshared.
      
      Setting the MNT_SHARED flag on members of a subtree is done as a
      separate step, after all the IDs have been allocated.  This way an
      allocation failure can be cleaned up easily, without affecting the
      propagation state.
      
      Based on design sketch by Al Viro.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
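The two-step scheme described in this commit (allocate every ID first, set the shared flag only afterwards) can be illustrated with a small userspace sketch. This is not the kernel code: the IDR is replaced by a tiny fixed table that hands out the lowest free ID, locking is omitted, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUP_IDS 4  /* tiny on purpose, so exhaustion is easy to trigger */

/* Stand-in for a vfsmount; group_id 0 means "not in a peer group". */
struct mount {
    int group_id;
    bool shared;
};

static bool id_in_use[MAX_GROUP_IDS + 1];

/* Hand out the lowest free ID, so released IDs are reused. */
static int alloc_group_id(void)
{
    int id;

    for (id = 1; id <= MAX_GROUP_IDS; id++) {
        if (!id_in_use[id]) {
            id_in_use[id] = true;
            return id;
        }
    }
    return -1;  /* no IDs left */
}

static void free_group_id(int id)
{
    assert(id >= 1 && id <= MAX_GROUP_IDS);
    id_in_use[id] = false;
}

/* Mark all n mounts shared, or change nothing observable on failure. */
static int make_subtree_shared(struct mount *m, size_t n)
{
    size_t i;

    /* Step 1: allocate IDs only; the shared flags stay untouched. */
    for (i = 0; i < n; i++) {
        int id;

        if (m[i].group_id != 0)
            continue;  /* already a member of a peer group */
        id = alloc_group_id();
        if (id < 0)
            goto rollback;
        m[i].group_id = id;
    }

    /* Step 2: cannot fail, so the propagation state changes atomically. */
    for (i = 0; i < n; i++)
        m[i].shared = true;
    return 0;

rollback:
    /* An ID on a mount that is not yet shared was allocated by this call. */
    while (i-- > 0) {
        if (!m[i].shared && m[i].group_id != 0) {
            free_group_id(m[i].group_id);
            m[i].group_id = 0;
        }
    }
    return -1;
}
```

Because no flag is set until every allocation has succeeded, the rollback path only has to release IDs; it never needs to undo a propagation change.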
    • [patch 3/7] vfs: mountinfo: add mount ID · 73cd49ec
      Miklos Szeredi committed
      Add a unique ID to each vfsmount using the IDR infrastructure.  The
      identifiers are reused after the vfsmount is freed.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 22 Apr, 2008 3 commits
  5. 19 Apr, 2008 3 commits
    • [PATCH] r/o bind mounts: honor mount writer counts at remount · 2e4b7fcd
      Dave Hansen committed
      Originally from: Herbert Poetzl <herbert@13thfloor.at>
      
      This is the core of the read-only bind mount patch set.
      
      Note that this does _not_ add a "ro" option directly to the bind mount
      operation.  If you require such a mount, you must first do the bind, then
      follow it up with a 'mount -o remount,ro' operation.
      
      If you wish to have an r/o bind mount of /foo on /bar:
      
      	mount --bind /foo /bar
      	mount -o remount,ro /bar
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [PATCH] r/o bind mounts: track numbers of writers to mounts · 3d733633
      Dave Hansen committed
      This is the real meat of the entire series.  It actually
      implements the tracking of the number of writers to a mount.
      However, it causes scalability problems because there can be
      hundreds of cpus doing open()/close() on files on the same mnt at
      the same time.  Even an atomic_t in the mnt has massive scaling
      problems because the cacheline gets so terribly contended.
      
      This uses a statically-allocated percpu variable.  All want/drop
      operations are local to a cpu as long as that cpu operates on the same
      mount, and there are no writer count imbalances.  Writer count
      imbalances happen when a write is taken on one cpu, and released
      on another, like when an open/close pair is performed on two different cpus.
      
      Upon a remount,ro request, all of the data from the percpu
      variables is collected (expensive, but very rare) and we determine
      if there are any outstanding writers to the mount.
      
      I've written a little benchmark to sit in a loop for a couple of
      seconds in several cpus in parallel doing open/write/close loops.
      
      http://sr71.net/~dave/linux/openbench.c
      
      The code in here is a worst-possible case for this patch.  It
      does opens on a _pair_ of files in two different mounts in parallel.
      This should cause my code to lose its "operate on the same mount"
      optimization completely.  This worst-case scenario causes a 3%
      degradation in the benchmark.
      
      I could probably get rid of even this 3%, but it would be more
      complex than what I have here, and I think this is getting into
      acceptable territory.  In practice, I expect writing more than 3
      bytes to a file, as well as disk I/O to mask any effects that this
      has.
      
      (To get rid of that 3%, we could have a #defined number of mounts
      in the percpu variable.  So, instead of a CPU getting to operate only
      on percpu data when it accesses only one mount, it could stay on
      percpu data when it accesses N or fewer mounts.)
      
      [AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
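The idea of counting writers per cpu and summing only at remount time can be sketched in userspace as follows. This is a deliberately simplified model, not the patch's data structure: it keeps one counter per cpu per mount, passes the cpu number explicitly, and ignores all locking and memory ordering. NR_CPUS_SKETCH and every function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4  /* hypothetical cpu count for the model */

struct mount {
    /* Per-cpu deltas; a single slot may go negative, only the sum matters. */
    long writers[NR_CPUS_SKETCH];
    bool readonly;
};

/* Fast path: touches only the calling cpu's slot. */
static int want_write(struct mount *m, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    if (m->readonly)
        return -1;
    m->writers[cpu]++;
    return 0;
}

/* May run on a different cpu than the matching want_write(). */
static void drop_write(struct mount *m, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    m->writers[cpu]--;
}

/* Slow path: collect every slot (expensive, but only needed at remount). */
static long count_writers(const struct mount *m)
{
    long sum = 0;
    int cpu;

    for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += m->writers[cpu];
    return sum;
}

/* Refuse the r/w -> r/o transition while any writer is outstanding. */
static int remount_readonly(struct mount *m)
{
    if (count_writers(m) != 0)
        return -1;
    m->readonly = true;
    return 0;
}
```

A write taken on cpu 0 and released on cpu 1 leaves the slots at +1 and -1, which is the "imbalance" the commit message mentions; the sum is still zero, so the remount is allowed.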
    • [PATCH] r/o bind mounts: stub functions · 8366025e
      Dave Hansen committed
      This patch adds two functions, mnt_want_write() and mnt_drop_write().  These
      are used like a lock pair around any fs operations that might cause a write
      to the filesystem.
      
      Before these can become useful, we must first cover each place in the VFS
      where writes are performed with a want/drop pair.  When that is complete, we
      can actually introduce code that will safely check the counts before allowing
      r/w<->r/o transitions to occur.
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
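The "lock pair" usage at a call site can be sketched with userspace stand-ins. The `_sketch` names below are hypothetical mocks, not the kernel functions; the only behaviour assumed is that the want call fails with -EROFS on a read-only mount and that every successful want is matched by exactly one drop.

```c
#include <assert.h>
#include <errno.h>

/* Mock mount: a read-only flag plus a count of in-flight writers. */
struct vfsmount_sketch {
    int readonly;
    int writers;
};

static int mnt_want_write_sketch(struct vfsmount_sketch *mnt)
{
    assert(mnt != 0);
    if (mnt->readonly)
        return -EROFS;
    mnt->writers++;
    return 0;
}

static void mnt_drop_write_sketch(struct vfsmount_sketch *mnt)
{
    mnt->writers--;
}

/* A write-type operation, bracketed by the want/drop pair. */
static int touch_file_sketch(struct vfsmount_sketch *mnt, int *mtime, int now)
{
    int err = mnt_want_write_sketch(mnt);

    if (err)
        return err;  /* nothing was taken, so nothing to drop */
    *mtime = now;    /* the modification itself */
    mnt_drop_write_sketch(mnt);
    return 0;
}
```

The early return before the modification is the point of the pattern: once every write path is bracketed this way, a mount with a zero writer count is known to have no write in progress.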
  6. 28 Mar, 2008 5 commits
  7. 15 Feb, 2008 6 commits
  8. 09 Feb, 2008 2 commits
  9. 07 Feb, 2008 1 commit
  10. 25 Jan, 2008 2 commits
  11. 21 Oct, 2007 1 commit
  12. 20 Oct, 2007 1 commit
    • pid namespaces: introduce MS_KERNMOUNT flag · 8bf9725c
      Pavel Emelyanov committed
      This flag tells the .get_sb callback that this is a kern_mount() call, so
      it can trust the *data pointer to be a valid in-kernel one.  If this flag is
      passed from a user process, it is cleared, since the *data pointer is then
      not a valid kernel object.
      
      Looking a few steps ahead: this will be needed for proc to create the
      superblock and store a valid pid namespace on it during namespace
      creation.  The reason why the namespace cannot live without a proc mount is
      described in the appropriate patch.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 17 Oct, 2007 1 commit
  14. 20 Jul, 2007 1 commit
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt committed
      Slab destructors were no longer supported after Christoph's c59def9f
      change.  They've been BUGs for both slab and slub, and slob never
      supported them either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  15. 17 Jul, 2007 4 commits
  16. 09 May, 2007 3 commits
    • check privileges before setting mount propagation · ee6f9582
      Miklos Szeredi committed
      There's a missing check for CAP_SYS_ADMIN in do_change_type().
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Introduce a handy list_first_entry macro · b5e61818
      Pavel Emelianov committed
      There are many places in the kernel where a construction like
      
         foo = list_entry(head->next, struct foo_struct, list);
      
      is used.
      The code might look more descriptive and neat when using the macro
      
         #define list_first_entry(head, type, member) \
                   list_entry((head)->next, type, member)
      
      Here is the macro itself and examples of its usage in the generic code.
      If it turns out to be useful, I can prepare a set of patches to
      introduce it into arch-specific code, drivers, networking, etc.
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
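For readers outside the kernel tree, the macro can be tried in userspace with a minimal re-creation of the list helpers. The `list_head`, `container_of` and `list_entry` definitions below are simplified stand-ins written for this sketch (the kernel versions add type checking), and `list_init` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* The macro from this commit: first element of a non-empty list. */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
    assert(item != NULL && head != NULL);
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

struct foo_struct {
    int value;
    struct list_head list;
};
```

With `a` and `b` appended in that order, `list_first_entry(&head, struct foo_struct, list)` yields a pointer to `a`. On an empty list `head->next` points back at the head itself, so the result is not a valid entry; callers are expected to check for emptiness first.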
    • add filesystem subtype support · 79c0b2df
      Miklos Szeredi committed
      There's a slight problem with filesystem type representation in fuse
      based filesystems.
      
      From the kernel's view, there are just two filesystem types: fuse and
      fuseblk.  From the user's view there are lots of different filesystem
      types.  The user is not even much concerned if the filesystem is fuse based
      or not.  So there's a conflict of interest in how this should be
      represented in fstab, mtab and /proc/mounts.
      
      The current scheme is to encode the real filesystem type in the mount
      source.  So an sshfs mount looks like this:
      
        sshfs#user@server:/   /mnt/server    fuse   rw,nosuid,nodev,...
      
      This url-ish syntax works OK for sshfs and similar filesystems.  However
      for block device based filesystems (ntfs-3g, zfs) it doesn't work, since
      the kernel expects the mount source to be a real device name.
      
      A possibly better scheme would be to encode the real type in the type
      field as "type.subtype".  So fuse mounts would look like this:
      
        /dev/hda1       /mnt/windows   fuseblk.ntfs-3g   rw,...
        user@server:/   /mnt/server    fuse.sshfs        rw,nosuid,nodev,...
      
      This patch adds the necessary code to the kernel so that this can be
      correctly displayed in /proc/mounts.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
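A tool reading the type field would need to separate the two halves of "type.subtype". The following is a minimal userspace sketch of that split, assuming the first '.' is the delimiter; `split_fs_type` is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the base type ("fuseblk" from "fuseblk.ntfs-3g") into type_buf and
 * return a pointer to the subtype inside full, or NULL if there is none.
 * The base type is truncated if type_buf is too small.
 */
static const char *split_fs_type(const char *full, char *type_buf,
                                 size_t buf_len)
{
    const char *dot = strchr(full, '.');
    size_t len = dot ? (size_t)(dot - full) : strlen(full);

    assert(buf_len > 0);
    if (len >= buf_len)
        len = buf_len - 1;
    memcpy(type_buf, full, len);
    type_buf[len] = '\0';

    /* A trailing dot with nothing after it counts as "no subtype". */
    return (dot != NULL && dot[1] != '\0') ? dot + 1 : NULL;
}
```

For "fuse.sshfs" this yields the base type "fuse" and the subtype "sshfs", while a plain "ext3" yields "ext3" and no subtype, so existing entries keep parsing unchanged.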