1. 15 3月, 2011 1 次提交
    • A
      New AT_... flag: AT_EMPTY_PATH · f52e0c11
      Al Viro 提交于
      For name_to_handle_at(2) we'll want both ...at()-style syscall that
      would be usable for non-directory descriptors (with empty relative
      pathname).  Introduce new flag (AT_EMPTY_PATH) to deal with that and
      corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to
      deal with the latter.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f52e0c11
  2. 14 3月, 2011 4 次提交
    • A
      reduce vfs_path_lookup() to do_path_lookup() · 5b6ca027
      Al Viro 提交于
      New lookup flag: LOOKUP_ROOT.  nd->root is set (and held) by caller,
      path_init() starts walking from that place and all pathname resolution
      machinery never drops nd->root if that flag is set.  That turns
      vfs_path_lookup() into a special case of do_path_lookup() *and*
      gets us down to 3 callers of link_path_walk(), making it finally
      feasible to rip the handling of trailing symlink out of link_path_walk().
      That will not only simply the living hell out of it, but make life
      much simpler for unionfs merge.  Trailing symlink handling will
      become iterative, which is a good thing for stack footprint in
      a lot of situations as well.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5b6ca027
    • A
      get rid of nd->file · 70e9b357
      Al Viro 提交于
      Don't stash the struct file * used as starting point of walk in nameidata;
      pass file ** to path_init() instead.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      70e9b357
    • A
      untangle the "need_reval_dot" mess · 16c2cd71
      Al Viro 提交于
      instead of ad-hackery around need_reval_dot(), do the following:
      set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute
      symlink traversal, on ".." and on procfs-style symlinks.  Clear on
      normal components, leave unchanged on ".".  Non-nested callers of
      link_path_walk() call handle_reval_path(), which checks that flag
      is set and that fs does want the final revalidate thing, then does
      ->d_revalidate().  In link_path_walk() all the return_reval stuff
      is gone.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      16c2cd71
    • A
      kill path_lookup() · c9c6cac0
      Al Viro 提交于
      all remaining callers pass LOOKUP_PARENT to it, so
      flags argument can die; renamed to kern_path_parent()
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c9c6cac0
  3. 16 1月, 2011 2 次提交
    • D
      Add an AT_NO_AUTOMOUNT flag to suppress terminal automount · 6f45b656
      David Howells 提交于
      Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount
      point directories.  This can be used by fstatat() users to permit the
      gathering of attributes on an automount point and also prevent
      mass-automounting of a directory of automount points by ls.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6f45b656
    • D
      Add a dentry op to allow processes to be held during pathwalk transit · cc53ce53
      David Howells 提交于
      Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
      sleep when it tries to transit away from one of that filesystem's directories
      during a pathwalk.  The operation is keyed off a new dentry flag
      (DCACHE_MANAGE_TRANSIT).
      
      The filesystem is allowed to be selective about which processes it holds and
      which it permits to continue on or prohibits from transiting from each flagged
      directory.  This will allow autofs to hold up client processes whilst letting
      its userspace daemon through to maintain the directory or the stuff behind it
      or mounted upon it.
      
      The ->d_manage() dentry operation:
      
      	int (*d_manage)(struct path *path, bool mounting_here);
      
      takes a pointer to the directory about to be transited away from and a flag
      indicating whether the transit is undertaken by do_add_mount() or
      do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.
      
      It should return 0 if successful and to let the process continue on its way;
      -EISDIR to prohibit the caller from skipping to overmounted filesystems or
      automounting, and to use this directory; or some other error code to return to
      the user.
      
      ->d_manage() is called with namespace_sem writelocked if mounting_here is true
      and no other locks held, so it may sleep.  However, if mounting_here is true,
      it may not initiate or wait for a mount or unmount upon the parameter
      directory, even if the act is actually performed by userspace.
      
      Within fs/namei.c, follow_managed() is extended to check with d_manage() first
      on each managed directory, before transiting away from it or attempting to
      automount upon it.
      
      follow_down() is renamed follow_down_one() and should only be used where the
      filesystem deliberately intends to avoid management steps (e.g. autofs).
      
      A new follow_down() is added that incorporates the loop done by all other
      callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
      and CIFS do use it, their use is removed by converting them to use
      d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
      also takes an extra parameter to indicate if it is being called from mount code
      (with namespace_sem writelocked) which it passes to d_manage().  follow_down()
      ignores automount points so that it can be used to mount on them.
      
      __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
      DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
      sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
      that determine whether to abort or not itself.  That would allow the autofs
      daemon to continue on in rcu-walk mode.
      
      Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
      required as every tranist from that directory will cause d_manage() to be
      invoked.  It can always be set again when necessary.
      
      ==========================
      WHAT THIS MEANS FOR AUTOFS
      ==========================
      
      Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
      trigger the automounting of indirect mounts, and both of these can be called
      with i_mutex held.
      
      autofs knows that the i_mutex will be held by the caller in lookup(), and so
      can drop it before invoking the daemon - but this isn't so for d_revalidate(),
      since the lock is only held on _some_ of the code paths that call it.  This
      means that autofs can't risk dropping i_mutex from its d_revalidate() function
      before it calls the daemon.
      
      The bug could manifest itself as, for example, a process that's trying to
      validate an automount dentry that gets made to wait because that dentry is
      expired and needs cleaning up:
      
      	mkdir         S ffffffff8014e05a     0 32580  24956
      	Call Trace:
      	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
      	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
      	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
      	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
      	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
      	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
      	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
      	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4
      
      versus the automount daemon which wants to remove that dentry, but can't
      because the normal process is holding the i_mutex lock:
      
      	automount     D ffffffff8014e05a     0 32581      1              32561
      	Call Trace:
      	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
      	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
      	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
      	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
      	 [<ffffffff8005d229>] tracesys+0x71/0xe0
      	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
      
      which means that the system is deadlocked.
      
      This patch allows autofs to hold up normal processes whilst the daemon goes
      ahead and does things to the dentry tree behind the automouter point without
      risking a deadlock as almost no locks are held in d_manage() and none in
      d_automount().
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Was-Acked-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cc53ce53
  4. 07 1月, 2011 2 次提交
    • N
      fs: rcu-walk for path lookup · 31e6b01f
      Nick Piggin 提交于
      Perform common cases of path lookups without any stores or locking in the
      ancestor dentry elements. This is called rcu-walk, as opposed to the current
      algorithm which is a refcount based walk, or ref-walk.
      
      This results in far fewer atomic operations on every path element,
      significantly improving path lookup performance. It also avoids cacheline
      bouncing on common dentries, significantly improving scalability.
      
      The overall design is like this:
      * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
      * Take the RCU lock for the entire path walk, starting with the acquiring
        of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
        not required for dentry persistence.
      * synchronize_rcu is called when unregistering a filesystem, so we can
        access d_ops and i_ops during rcu-walk.
      * Similarly take the vfsmount lock for the entire path walk. So now mnt
        refcounts are not required for persistence. Also we are free to perform mount
        lookups, and to assume dentry mount points and mount roots are stable up and
        down the path.
      * Have a per-dentry seqlock to protect the dentry name, parent, and inode,
        so we can load this tuple atomically, and also check whether any of its
        members have changed.
      * Dentry lookups (based on parent, candidate string tuple) recheck the parent
        sequence after the child is found in case anything changed in the parent
        during the path walk.
      * inode is also RCU protected so we can load d_inode and use the inode for
        limited things.
      * i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
      * i_op can be loaded.
      
      When we reach the destination dentry, we lock it, recheck lookup sequence,
      and increment its refcount and mountpoint refcount. RCU and vfsmount locks
      are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
      not match, we can not drop rcu-walk gracefully at the current point in the
      lokup, so instead return -ECHILD (for want of a better errno). This signals the
      path walking code to re-do the entire lookup with a ref-walk.
      
      Aside from the final dentry, there are other situations that may be encounted
      where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
      a reference on the last good dentry) and continue with a ref-walk. Again, if
      we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
      using ref-walk. But it is very important that we can continue with ref-walk
      for most cases, particularly to avoid the overhead of double lookups, and to
      gain the scalability advantages on common path elements (like cwd and root).
      
      The cases where rcu-walk cannot continue are:
      * NULL dentry (ie. any uncached path element)
      * parent with d_inode->i_op->permission or ACLs
      * dentries with d_revalidate
      * Following links
      
      In future patches, permission checks and d_revalidate become rcu-walk aware. It
      may be possible eventually to make following links rcu-walk aware.
      
      Uncached path elements will always require dropping to ref-walk mode, at the
      very least because i_mutex needs to be grabbed, and objects allocated.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      31e6b01f
    • N
      fs: dcache remove dcache_lock · b5c84bf6
      Nick Piggin 提交于
      dcache_lock no longer protects anything. remove it.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b5c84bf6
  5. 23 12月, 2009 1 次提交
  6. 12 12月, 2009 1 次提交
  7. 21 9月, 2009 1 次提交
  8. 12 6月, 2009 3 次提交
  9. 09 5月, 2009 1 次提交
  10. 01 1月, 2009 1 次提交
  11. 23 10月, 2008 3 次提交
  12. 27 7月, 2008 5 次提交
  13. 15 2月, 2008 4 次提交
  14. 14 2月, 2008 1 次提交
  15. 17 10月, 2007 1 次提交
    • C
      partially fix up the lookup_one_noperm mess · eead1911
      Christoph Hellwig 提交于
      Try to fix the mess created by sysfs braindamage.
      
       - refactor code internal to fs/namei.c a little to avoid too much
         duplication:
      	o __lookup_hash_kern is renamed back to __lookup_hash
      	o the old __lookup_hash goes away, permission checks moves to
      	  the two callers
      	o useless inline qualifiers on above functions go away
       - lookup_one_len_kern loses it's last argument and is renamed to
         lookup_one_noperm to make it's useage a little more clear
       - added kerneldoc comments to describe lookup_one_len aswell as
         lookup_one_noperm and make it very clear that no one should use
         the latter ever.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eead1911
  16. 20 7月, 2007 3 次提交
  17. 28 4月, 2007 1 次提交
  18. 09 12月, 2006 1 次提交
  19. 01 10月, 2006 1 次提交
  20. 30 9月, 2006 1 次提交
  21. 15 7月, 2006 1 次提交
  22. 01 4月, 2006 1 次提交