1. 04 May, 2014 2 commits
    • more graceful recovery in umount_collect() · 9c8c10e2
      Authored by Al Viro
      Start with shrink_dcache_parent(), then scan what remains.
      
      First of all, BUG() is very much an overkill here; we are holding
      ->s_umount, and hitting BUG() means that a lot of interesting stuff
      will be hanging after that point (sync(2), for example).  Moreover,
      in cases when there had been more than one leak, we'll be better
      off reporting all of them.  And more than just the last component
      of pathname - %pd is there for just such uses...
      
      That was the last user of dentry_lru_del(), so kill it off...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • don't remove from shrink list in select_collect() · fe91522a
      Authored by Al Viro
      	If we find something already on a shrink list, just increment
      data->found and do nothing else.  Loops in shrink_dcache_parent() and
      check_submounts_and_drop() will do the right thing - everything we
      did put into our list will be evicted and if there had been nothing,
      but data->found got non-zero, well, we have somebody else shrinking
      those guys; just try again.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 01 May, 2014 5 commits
  3. 20 April, 2014 1 commit
    • fix races between __d_instantiate() and checks of dentry flags · 22213318
      Authored by Al Viro
      In non-lazy walk we need to be careful about dentry switching from
      negative to positive - both ->d_flags and ->d_inode are updated,
      and in some places we might see only one store.  The cases where
      dentry has been obtained by dcache lookup with ->i_mutex held on
      parent are safe - ->d_lock and ->i_mutex provide all the barriers
      we need.  However, there are several places where we run into
      trouble:
      	* do_last() fetches ->d_inode, then checks ->d_flags and
      assumes that inode won't be NULL unless d_is_negative() is true.
      Race with e.g. creat() - we might have fetched the old value of
      ->d_inode (still NULL) and new value of ->d_flags (already not
      DCACHE_MISS_TYPE).  Lin Ming has observed and reported the resulting
      oops.
      	* a bunch of places check ->d_inode for being non-NULL,
      then check ->d_flags for "is it a symlink".  Race with symlink(2)
      in case our CPU sees ->d_inode update first - we see non-NULL
      there, but ->d_flags still contains DCACHE_MISS_TYPE instead of
      DCACHE_SYMLINK_TYPE.  Result: false negative on "should we follow
      link here?", with subsequent unpleasantness.
      
      Cc: stable@vger.kernel.org # 3.13 and 3.14 need that one
      Reported-and-tested-by: Lin Ming <minggr@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 01 April, 2014 1 commit
  5. 23 March, 2014 1 commit
  6. 16 March, 2014 1 commit
    • drm: add pseudo filesystem for shared inodes · 31bbe16f
      Authored by David Herrmann
      Our current DRM design uses a single address_space for all users of the
      same DRM device. However, there is no way to create an anonymous
      address_space without an underlying inode. Therefore, we wait for the
      first ->open() callback on a registered char-dev and take over the inode
      of the char-dev. This worked well so far, but has several drawbacks:
       - We screw with FS internals and rely on some non-obvious invariants like
         inode->i_mapping being the same as inode->i_data for char-devs.
       - We don't have any address_space prior to the first ->open() from
         user-space. This leads to ugly fallback code and we cannot allocate
         global objects early.
      
      As pointed out by Al Viro, fs/anon_inode.c is *not* supposed to be used by
      drivers for anonymous inode-allocation. Therefore, this patch follows the
      proposed alternative solution and adds a pseudo filesystem mount-point to
      DRM. We can then allocate private inodes including a private address_space
      for each DRM device at initialization time.
      
      Note that we could use:
        sysfs_get_inode(sysfs_mnt->mnt_sb, drm_device->dev->kobj.sd);
      to get access to the underlying sysfs-inode of a "struct device" object.
      However, most of this information is currently hidden and it's not clear
      whether this address_space is suitable for driver access. Thus, unless
      Linux allows anonymous address_space objects or driver-core provides a
      public inode per device, we're left with our own private internal mount
      point.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
  7. 27 January, 2014 1 commit
  8. 26 January, 2014 1 commit
    • vfs: Remove second variable named error in __dentry_path · a8323da0
      Authored by Eric W. Biederman
      In commit  232d2d60
      Author: Waiman Long <Waiman.Long@hp.com>
      Date:   Mon Sep 9 12:18:13 2013 -0400
      
          dcache: Translating dentry into pathname without taking rename_lock
      
      The __dentry_path locking was changed and the variable error was
      intended to be moved outside of the loop.  Unfortunately the inner
      declaration of error was not removed, resulting in a version of
      __dentry_path that will never return an error.
      
      Remove the problematic inner declaration of error and allow
      __dentry_path to return errors once again.
      
      Cc: stable@vger.kernel.org
      Cc: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
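
The shadowing bug described in this commit message can be reproduced in a few lines of user-space C. This is only a minimal sketch: `prepend_step()`, `build_path_shadowed()` and `build_path_fixed()` are hypothetical names invented for illustration and are not the kernel's `__dentry_path()` code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one step of path building: it fails when
 * the buffer has no room left.  Not the kernel's helper. */
static int prepend_step(size_t *room, size_t need)
{
    if (*room < need)
        return -ENAMETOOLONG;
    *room -= need;
    return 0;
}

/* Buggy shape: the inner "int error" shadows the outer one, so the
 * value returned at the end is always the initial 0. */
static int build_path_shadowed(size_t room, size_t need, int steps)
{
    int error = 0;

    for (int i = 0; i < steps; i++) {
        int error = prepend_step(&room, need);  /* shadows outer error */
        if (error)
            break;
    }
    return error;
}

/* Fixed shape: a single declaration outside the loop. */
static int build_path_fixed(size_t room, size_t need, int steps)
{
    int error = 0;

    for (int i = 0; i < steps; i++) {
        error = prepend_step(&room, need);
        if (error)
            break;
    }
    return error;
}
```

Building with `-Wshadow` would flag the inner declaration in the buggy variant.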
  9. 13 December, 2013 1 commit
  10. 27 November, 2013 1 commit
    • vfs: In d_path don't call d_dname on a mount point · f48cfddc
      Authored by Eric W. Biederman
      Aditya Kali (adityakali@google.com) wrote:
      > Commit bf056bfa:
      > "proc: Fix the namespace inode permission checks." converted
      > the namespace files into symlinks. The same commit changed
      > the way namespace bind mounts appear in /proc/mounts:
      >   $ mount --bind /proc/self/ns/ipc /mnt/ipc
      > Originally:
      >   $ cat /proc/mounts | grep ipc
      >   proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0
      >
      > After commit bf056bfa:
      >   $ cat /proc/mounts | grep ipc
      >   proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
      >
      > This breaks userspace which expects the 2nd field in
      > /proc/mounts to be a valid path.
      
      The symlinks /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to
      dentries allocated with d_alloc_pseudo that we can mount, and
      that have interesting names printed out with d_dname.
      
      When these files are bind mounted /proc/mounts is not currently
      displaying the mount point correctly because d_dname is called instead
      of just displaying the path where the file is mounted.
      
      Solve this by adding an explicit check to distinguish mounted pseudo
      inodes and unmounted pseudo inodes.  Unmounted pseudo inodes always
      use the mount of their filesystem as the mnt_root in their path, making
      these two cases easy to distinguish.
      
      CC: stable@vger.kernel.org
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reported-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
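
The distinction this commit message draws between mounted and unmounted pseudo dentries can be sketched as a single predicate. The structures below are heavily reduced, hypothetical stand-ins for the kernel ones, `has_d_dname` replaces the real `d_op->d_dname` pointer check, and the exact condition used in the kernel may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct dentry {
    struct dentry *d_parent;   /* points to itself for a root dentry */
    bool has_d_dname;          /* stands in for d_op && d_op->d_dname */
};

struct vfsmount {
    struct dentry *mnt_root;
};

struct path {
    struct vfsmount *mnt;
    struct dentry *dentry;
};

#define IS_ROOT(x) ((x) == (x)->d_parent)

/* Use the synthetic d_dname() name only for an unmounted pseudo dentry.
 * A dentry that is the root of the mount it was reached through has
 * been mounted somewhere, so its real mount point should be printed. */
static bool should_use_d_dname(const struct path *path)
{
    const struct dentry *d = path->dentry;

    return d->has_d_dname &&
           (!IS_ROOT(d) || d != path->mnt->mnt_root);
}
```

With an internal mount whose `mnt_root` is the filesystem root, the predicate is true for a pseudo dentry; once the same dentry is the `mnt_root` of a bind mount, it is false and the ordinary path is shown.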
  11. 16 November, 2013 3 commits
  12. 13 November, 2013 2 commits
  13. 09 November, 2013 7 commits
    • dcache: don't clear DCACHE_DISCONNECTED too early · f80de2cd
      Authored by J. Bruce Fields
      DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
      connected all the way up to the root of the filesystem.  It *shouldn't*
      be cleared as soon as the dentry is connected to a parent.  That will
      cause bugs at least on exportable filesystems.
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries · e1a24bb0
      Authored by J. Bruce Fields
      I can't for the life of me see any reason why anyone should care whether
      a dentry that is never hooked into the dentry cache would need
      DCACHE_DISCONNECTED set.
      
      This originates from 4b936885 "fs: improve scalability of pseudo
      filesystems", which probably just made the false assumption that
      DCACHE_DISCONNECTED was meant to be set on anything not connected to a
      parent somehow.
      
      So this is just confusing.  Ideally the only uses of DCACHE_DISCONNECTED
      would be in the filehandle-lookup code, which needs it to ensure
      dentries are connected into the dentry tree before use.
      
      I left d_alloc_pseudo there even though it's now equivalent to
      __d_alloc(), just on the theory the name is better documentation of its
      intended use outside dcache.c.
      
      Cc: Nick Piggin <npiggin@kernel.dk>
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • dcache: use IS_ROOT to decide where dentry is hashed · 7632e465
      Authored by J. Bruce Fields
      Every hashed dentry is either hashed in the dentry_hashtable, or a
      superblock's s_anon list.
      
      __d_drop() assumes it can determine which is the case by checking
      DCACHE_DISCONNECTED; this is not true.
      
      It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
      only hashed on dentry_hashtable, but is fully connected to its parents
      back to the root.
      
      But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
      attempts to connect a directory (found by filehandle lookup) back to
      root by ascending to parents and performing lookups one at a time.  It
      does not clear DCACHE_DISCONNECTED until it's done, and that is not at
      all an atomic process.
      
      In particular, it is possible for DCACHE_DISCONNECTED to be set on a
      dentry which is hashed on the dentry_hashtable.
      
      Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
      *does* work:
      
      Dentries are hashed only by:
      
      	- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.
      
      	- __d_rehash, called by _d_rehash: hashes to the dentry's
      	  parent, and all callers of _d_rehash appear to have d_parent
      	  set to a "real" parent.
      	- __d_rehash, called by __d_move: rehashes the moved dentry to
      	  hash chain determined by target, and assigns target's d_parent
      	  to its d_parent, before dropping the dentry's d_lock.
      
      Therefore I believe it's safe for a holder of a dentry's d_lock to
      assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
      true.
      
      I believe the incorrect assumption about DCACHE_DISCONNECTED was
      originally introduced by ceb5bdc2 "fs: dcache per-bucket dcache hash
      locking".
      
      Also add a comment while we're here.
      
      Cc: Nick Piggin <npiggin@kernel.dk>
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Reviewed-by: NeilBrown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
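
The rule this commit message argues for (a hashed dentry is on the anonymous list if and only if it is its own parent) can be sketched as follows. `struct dentry` is reduced to the one field needed, and `enum hash_home` and `hash_home_of()` are hypothetical names used only for illustration; the sketch also assumes the caller already knows the dentry is hashed and holds its d_lock.

```c
#include <stddef.h>

/* Reduced, hypothetical stand-in for the kernel's struct dentry. */
struct dentry {
    struct dentry *d_parent;   /* points to itself for a root dentry */
};

/* A dentry is a root when it is its own parent. */
#define IS_ROOT(x) ((x) == (x)->d_parent)

/* Hypothetical labels for the two places a hashed dentry can live. */
enum hash_home {
    ON_SB_ANON,            /* the superblock's anonymous list */
    ON_DENTRY_HASHTABLE    /* the global dentry hash table */
};

/* Decide from parentage alone, not from DCACHE_DISCONNECTED. */
static enum hash_home hash_home_of(const struct dentry *dentry)
{
    return IS_ROOT(dentry) ? ON_SB_ANON : ON_DENTRY_HASHTABLE;
}
```

The point of the design is that parentage and hashing change together under d_lock, whereas DCACHE_DISCONNECTED is cleared only after a multi-step reconnect finishes.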
    • VFS: Put a small type field into struct dentry::d_flags · b18825a7
      Authored by David Howells
      Put a type field into struct dentry::d_flags to indicate if the dentry is one
      of the following types that relate particularly to pathwalk:
      
      	Miss (negative dentry)
      	Directory
      	"Automount" directory (defective - no i_op->lookup())
      	Symlink
      	Other (regular, socket, fifo, device)
      
      The type field is set to one of the first five types on a dentry by calls to
      __d_instantiate() and d_obtain_alias() from information in the inode (if one is
      given).
      
      The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
      dentry as a negative dentry.
      
      Accessors provided are:
      
      	d_set_type(dentry, type)
      	d_is_directory(dentry)
      	d_is_autodir(dentry)
      	d_is_symlink(dentry)
      	d_is_file(dentry)
      	d_is_negative(dentry)
      	d_is_positive(dentry)
      
      A bunch of checks in pathname resolution switched to those.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
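
A small type field packed into a flags word, with accessors in the style listed in this commit message, might look like the sketch below. The bit position, the `SK_` constants and the `sk_` function names are assumptions made up for illustration; they are not the kernel's actual values or definitions.

```c
#include <stdbool.h>

/* Hypothetical layout: three bits of the flags word hold the type. */
#define SK_TYPE_SHIFT 20u
#define SK_TYPE_MASK  (7u << SK_TYPE_SHIFT)

enum sk_type {
    SK_MISS      = 0u << SK_TYPE_SHIFT,   /* negative dentry */
    SK_DIRECTORY = 1u << SK_TYPE_SHIFT,
    SK_AUTODIR   = 2u << SK_TYPE_SHIFT,
    SK_SYMLINK   = 3u << SK_TYPE_SHIFT,
    SK_FILE      = 4u << SK_TYPE_SHIFT    /* regular, socket, fifo, device */
};

struct sk_dentry {
    unsigned int d_flags;
};

/* Replace the type bits and leave every other flag untouched. */
static void sk_set_type(struct sk_dentry *d, unsigned int type)
{
    d->d_flags = (d->d_flags & ~SK_TYPE_MASK) | type;
}

static unsigned int sk_get_type(const struct sk_dentry *d)
{
    return d->d_flags & SK_TYPE_MASK;
}

static bool sk_is_directory(const struct sk_dentry *d)
{
    return sk_get_type(d) == SK_DIRECTORY;
}

static bool sk_is_symlink(const struct sk_dentry *d)
{
    return sk_get_type(d) == SK_SYMLINK;
}

static bool sk_is_negative(const struct sk_dentry *d)
{
    return sk_get_type(d) == SK_MISS;
}

static bool sk_is_positive(const struct sk_dentry *d)
{
    return !sk_is_negative(d);
}
```

Encoding "miss" as zero means that clearing the type bits is enough to turn an entry back into a negative one.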
    • fold __d_shrink() into its only remaining caller · b61625d2
      Authored by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • RCU'd vfsmounts · 48a066e7
      Authored by Al Viro
      * RCU-delayed freeing of vfsmounts
      * vfsmount_lock replaced with a seqlock (mount_lock)
      * sequence number from mount_lock is stored in nameidata->m_seq and
      used when we exit RCU mode
      * new vfsmount flag - MNT_SYNC_UMOUNT.  Set by umount_tree() when its
      caller knows that vfsmount will have no surviving references.
      * synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
      and doing pending mntput().
      * new helper: legitimize_mnt(mnt, seq).  Checks the mount_lock sequence
      number against seq, then grabs reference to mnt.  Then it rechecks mount_lock
      again to close the race and either returns success or drops the reference it
      has acquired.  The subtle point is that in case of MNT_SYNC_UMOUNT we can
      simply decrement the refcount and sod off - aforementioned synchronize_rcu()
      makes sure that final mntput() won't come until we leave RCU mode.  We need
      that, since we don't want to end up with some lazy pathwalk racing with
      umount() and stealing the final mntput() from it - caller of umount() may
      expect it to return only once the fs is shut down and we don't want to break
      that.  In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
      full-blown mntput() in case of mount_lock sequence number mismatch happening
      just as we'd grabbed the reference, but in those cases we won't be stealing
      the final mntput() from anything that would care.
      * mntput_no_expire() doesn't lock anything on the fast path now.  Incidentally,
      SMP and UP cases are handled the same way - no ifdefs there.
      * normal pathname resolution does *not* do any writes to mount_lock.  It does,
      of course, bump the refcounts of vfsmount and dentry in the very end, but that's
      it.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • switch shrink_dcache_for_umount() to use of d_walk() · 42c32608
      Authored by Al Viro
      we have too many iterators in fs/dcache.c...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  14. 06 November, 2013 1 commit
  15. 01 November, 2013 1 commit
    • vfs: decrapify dput(), fix cache behavior under normal load · 358eec18
      Authored by Linus Torvalds
      We do not want to dirty the dentry->d_flags cacheline in dput() just to
      set the DCACHE_REFERENCED flag when it is already set in the common case
      anyway.  This way the first cacheline of the dentry (which contains the
      RCU lookup information etc) can stay shared among multiple CPUs.
      
      This finishes off some of the details of all the scalability patches
      merged during the merge window.
      
      Also don't mark dentry_kill() for inlining, since it's the uncommon path
      and inlining it just makes the common path slower due to extra function
      entry/exit overhead.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
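
The "test before set" idea in this commit message can be sketched in plain C. The flag value, `struct sk_entry` and `sk_mark_referenced()` are hypothetical; the `stores` counter exists only so the effect can be observed and has no counterpart in the kernel.

```c
/* Hypothetical flag value, chosen only for this sketch. */
#define SK_REFERENCED 0x40u

struct sk_entry {
    unsigned int flags;
    unsigned int stores;   /* demo-only count of writes to flags */
};

/* Write the flag only when it is not already set.  In the common case
 * the flag is set, so the function only reads the word and the cache
 * line holding it can stay shared between CPUs. */
static void sk_mark_referenced(struct sk_entry *e)
{
    if (!(e->flags & SK_REFERENCED)) {
        e->flags |= SK_REFERENCED;
        e->stores++;
    }
}
```

An unconditional `e->flags |= SK_REFERENCED` would produce the same final value but would store on every call, which is the behaviour the commit avoids.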
  16. 25 October, 2013 2 commits
  17. 22 October, 2013 1 commit
    • vfs: fix new kernel-doc warnings · 69c88dc7
      Authored by Randy Dunlap
      Move kernel-doc notation to immediately before its function to eliminate
      kernel-doc warnings introduced by commit db14fc3a ("vfs: add
      d_walk()")
      
        Warning(fs/dcache.c:1343): No description found for parameter 'data'
        Warning(fs/dcache.c:1343): No description found for parameter 'dentry'
        Warning(fs/dcache.c:1343): Excess function parameter 'parent' description in 'check_mount'
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 15 September, 2013 1 commit
  19. 14 September, 2013 1 commit
    • vfs: fix dentry LRU list handling and nr_dentry_unused accounting · 89dc77bc
      Authored by Linus Torvalds
      The LRU list changes interacted badly with our nr_dentry_unused
      accounting, and even worse with the new DCACHE_LRU_LIST bit logic.
      
      This introduces helper functions to make sure everything follows the
      proper dcache d_lru list rules: the dentry cache is complicated by the
      fact that some of the hotpaths don't even want to look at the LRU list
      at all, and the fact that we use the same list entry in the dentry for
      both the LRU list and for our temporary shrinking lists when removing
      things from the LRU.
      
      The helper functions temporarily have some extra sanity checking for the
      flag bits that have to match the current LRU state of the dentry.  We'll
      remove that before the final 3.12 release, but considering how easy it
      is to get wrong, this first cleanup version has some very particular
      sanity checking.
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 13 September, 2013 6 commits
    • vfs: make d_path() get the root path under RCU · 68f0d9d9
      Authored by Linus Torvalds
      This avoids the spinlocks and refcounts in the d_path() sequence too
      (used by /proc and various other entities).  See commit 8b19e341 for
      the equivalent getcwd() system call path.
      
      And unlike getcwd(), d_path() doesn't copy the result to user space, so
      I don't need to fear _that_ particular bug happening again.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: use __getname/__putname for getcwd() system call · 3272c544
      Authored by Linus Torvalds
      It's a pathname.  It should use the pathname allocators and
      deallocators, and PATH_MAX instead of PAGE_SIZE.  Never mind that the
      two are commonly the same.
      
      With this, the allocations scale up nicely too, and I can do getcwd()
      system calls at a rate of about 300M/s, with no lock contention
      anywhere.
      
      Of course, nobody sane does that, especially since getcwd() is
      traditionally a very slow operation in Unix.  But this was also the
      simplest way to benchmark the prepend_path() improvements by Waiman, and
      once I saw the profiles I couldn't leave it well enough alone.
      
      But apart from being a performance improvement (from using per-cpu slab
      allocators instead of the raw page allocator), it's actually a valid and
      real cleanup.
      Signed-off-by: Linus "OCD" Torvalds <torvalds@linux-foundation.org>
    • vfs: don't copy things to user space holding the rcu readlock · ff812d72
      Authored by Linus Torvalds
      Oops.  That wasn't very smart.  We don't actually need the RCU lock any
      more by the time we copy the cwd string to user space, but I had
      stupidly surrounded the whole thing with it.
      
      Introduced by commit 8b19e341 ("vfs: make getcwd() get the root and
      pwd path under rcu")
      
      Is-a-big-hairy-idiot: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: make getcwd() get the root and pwd path under rcu · 8b19e341
      Authored by Linus Torvalds
      This allows us to skip all the crazy spinlocks and reference count
      updates, and instead use the fs sequence read-lock to get an atomic
      snapshot of the root and cwd information.
      
      We might want to make the rule that "prepend_path()" is always called
      with the RCU lock held, but the RCU lock nests fine and this is the
      minimal fix.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: move get_fs_root_and_pwd() to single caller · 5762482f
      Authored by Linus Torvalds
      Let's not pollute the include files with inline functions that are only
      used in a single place.  Especially not if we decide we might want to
      change the semantics of said function to make it more efficient..
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dcache: get/release read lock in read_seqbegin_or_lock() & friend · 18129977
      Authored by Waiman Long
      This patch modifies read_seqbegin_or_lock() and need_seqretry() to use
      newly introduced read_seqlock_excl() and read_sequnlock_excl()
      primitives so that they won't change the sequence number even if they
      fall back to take the lock.  This is OK as no change to the protected
      data structure is being made.
      
      It will prevent one fallback to lock taking from cascading into a series
      of lock taking reducing performance because of the sequence number
      change.  It will also allow other sequence readers to go forward while
      an exclusive reader lock is taken.
      
      This patch also updates some of the inaccurate comments in the code.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      To: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
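
The behaviour this commit message describes (a reader that first tries a lockless pass and falls back to taking the lock without bumping the sequence number) can be modelled in a single-threaded user-space sketch. Everything prefixed `sk_` is hypothetical; a real implementation needs a real lock, atomics and memory barriers, none of which are modelled here.

```c
#include <stdbool.h>

/* Single-threaded model of a sequence lock: a counter plus a lock flag. */
struct sk_seqlock {
    unsigned int sequence;
    bool locked;
};

/* A writer changes the protected data and bumps the sequence twice
 * (odd while writing, even again when done); collapsed to +2 here. */
static void sk_write(struct sk_seqlock *sl)
{
    sl->sequence += 2;
}

/* Even *seq: lockless pass, snapshot the sequence number.
 * Odd *seq: locked pass, take the lock and leave the sequence alone. */
static void sk_read_begin_or_lock(struct sk_seqlock *sl, int *seq)
{
    if (!(*seq & 1))
        *seq = (int)sl->sequence;
    else
        sl->locked = true;
}

/* Only a lockless pass can need a retry. */
static bool sk_need_retry(const struct sk_seqlock *sl, int seq)
{
    return !(seq & 1) && sl->sequence != (unsigned int)seq;
}

/* Drop the lock if the locked pass was used. */
static void sk_done(struct sk_seqlock *sl, int seq)
{
    if (seq & 1)
        sl->locked = false;
}
```

A caller would start with `seq = 0`, and on `sk_need_retry()` set `seq = 1` and repeat the read. Because the locked pass never touches `sequence`, other lockless readers that overlap with it are not forced into their own fallback.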