1. 02 Dec 2016 · 1 commit
    • NFSv4: add flock_owner to open context · 532d4def
      NeilBrown authored
      An open file description (struct file) in a given process can be
      associated with two different lock owners.
      
      It can have a Posix lock owner which will be different in each process
      that has a fd on the file.
      It can have a Flock owner which will be the same in all processes.
      
      When searching for a lock stateid to use, we need to consider both of these
      owners.
      
      So add a new "flock_owner" to the "nfs_open_context" (of which there
      is one for each open file description).
      
      This flock_owner does not need to be reference-counted as there is a
      1-1 relation between 'struct file' and nfs open contexts,
      and it will never be part of a list of contexts.  So there is no need
      for a 'flock_context' - just the owner is enough.
      
      The io_count included in the (Posix) lock_context provides no
      guarantee that all read-aheads that could use the state have
      completed, so not supporting it for flock locks is not a serious
      problem.  Synchronization between flock and read-ahead can be added
      later if needed.
      
      When creating an open_context for a non-opening create call, we don't have
      a 'struct file' to pass in, so the lock context gets initialized with
      a NULL owner, but this will never be used.
      
      The flock_owner is not used at all in this patch; that will come later.
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
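A minimal userspace sketch of the two-owner rule above. It assumes a simplified model where the POSIX owner is identified by the process and the flock owner by the shared open file description. `struct file_model`, `struct proc_model`, `struct open_ctx_model` and the helpers are hypothetical stand-ins, not kernel structures. `owner_matches()` only illustrates the later stateid search that the message anticipates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's struct file / files_struct. */
struct file_model { int id; };   /* one open file description */
struct proc_model { int pid; };  /* one process */

/* Hypothetical model of an open context: one per open file description. */
struct open_ctx_model {
    const void *flock_owner;     /* shared by every process using the file */
};

static void open_ctx_init(struct open_ctx_model *ctx,
                          const struct file_model *filp)
{
    /* A non-opening create has no file, so the owner stays NULL (unused). */
    ctx->flock_owner = filp;
}

/* POSIX owner: differs per process even for one file description. */
static const void *get_posix_owner(const struct proc_model *p)
{
    return p;
}

static const void *get_flock_owner(const struct open_ctx_model *ctx)
{
    return ctx->flock_owner;
}

/* Hypothetical stateid search check: lock state recorded under either
 * owner is usable by this process through this open context. */
static int owner_matches(const struct open_ctx_model *ctx,
                         const struct proc_model *p, const void *owner)
{
    return owner == get_posix_owner(p) || owner == get_flock_owner(ctx);
}
```

In this model, two processes that share one open file description (for example, after fork) see the same flock owner but different POSIX owners.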
  2. 28 Sep 2016 · 1 commit
  3. 27 Sep 2016 · 1 commit
    • fs: make remaining filesystems use .rename2 · 1cd66c93
      Miklos Szeredi authored
      This is trivial to do:
      
       - add flags argument to foo_rename()
       - check if flags is zero
       - assign foo_rename() to .rename2 instead of .rename
      
      This doesn't mean it's impossible to support RENAME_NOREPLACE for these
      filesystems, but it is not as trivial as it is for local filesystems.
      RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
      for a file to be created on one host while it is overwritten by rename on
      another host).
      
      Filesystems converted:
      
      9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.
      
      After this, we can get rid of the duplicate interfaces for rename.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: David Howells <dhowells@redhat.com> [AFS]
      Acked-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jan Harkes <jaharkes@cs.cmu.edu>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
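The three conversion steps can be sketched as follows. `foo_rename`, `struct dir_ops_model` and `MODEL_RENAME_NOREPLACE` are hypothetical names. The last one mirrors the value of the uapi `RENAME_NOREPLACE` flag. This sketch assumes the convention of returning `-EINVAL` for flags that are not supported.

```c
#include <assert.h>
#include <errno.h>

/* Mirrors the uapi RENAME_NOREPLACE value; defined locally so the sketch
 * is self-contained. */
#define MODEL_RENAME_NOREPLACE (1u << 0)

/* Hypothetical operations table holding only the rename2-style hook. */
struct dir_ops_model {
    int (*rename2)(const char *from, const char *to, unsigned int flags);
};

static int renames_done;         /* stands in for the existing rename work */

/* Step 1: foo_rename() gains a flags argument. */
static int foo_rename(const char *from, const char *to, unsigned int flags)
{
    (void)from;
    (void)to;
    /* Step 2: no flags are supported yet, so reject anything nonzero. */
    if (flags)
        return -EINVAL;
    renames_done++;
    return 0;
}

/* Step 3: register it as ->rename2 instead of ->rename. */
static const struct dir_ops_model foo_dir_ops = {
    .rename2 = foo_rename,
};
```

A plain rename (flags == 0) behaves as before. A caller asking for RENAME_NOREPLACE gets a clean error instead of a non-atomic emulation.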
  4. 23 Sep 2016 · 1 commit
  5. 06 Jul 2016 · 2 commits
  6. 27 Jun 2016 · 1 commit
    • make nfs_atomic_open() call d_drop() on all ->open_context() errors. · d20cb71d
      Al Viro authored
      In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
      the unconditional d_drop() after ->open_context() had been removed.  It had
      been correct for success cases (there ->open_context() itself had been doing
      dcache manipulations), but not for error ones.  Only one of those (ENOENT)
      got a compensatory d_drop() added in that commit, but in fact it should've
      been done for all errors.  As it is, the case of an O_CREAT non-exclusive open
      on a hashed negative dentry racing with e.g. symlink creation from another
      client ended up with ->open_context() getting an error and proceeding to
      call nfs_lookup() on a hashed dentry, which would've instantly triggered
      BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
      d_splice_alias()).
      
      Cc: stable@vger.kernel.org # v3.10+
      Tested-by: Oleg Drokin <green@linuxhacker.ru>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
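Here is a toy model of the fixed error path. It assumes a dentry is reduced to a single `hashed` flag and that every error falls through to the lookup fallback. The real code distinguishes more cases. All names are hypothetical. Where the text describes a BUG_ON(), `lookup_model()` returns -1 instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical dentry model: only tracks whether it is hashed. */
struct dentry_model { bool hashed; };

static void d_drop_model(struct dentry_model *d)
{
    d->hashed = false;
}

/* Lookup fallback: reaching it with a hashed dentry is the BUG_ON()
 * case from the text; the model reports it as -1 instead. */
static int lookup_model(const struct dentry_model *d)
{
    return d->hashed ? -1 : 0;
}

/* Error path after the fix: every open error unhashes the dentry before
 * falling back to lookup, not only -ENOENT. */
static int atomic_open_model(struct dentry_model *d, int open_err)
{
    if (open_err == 0)
        return 0;           /* success: the open step handled the dcache */
    d_drop_model(d);
    return lookup_model(d);
}
```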
  7. 25 Jun 2016 · 2 commits
  8. 16 Jun 2016 · 1 commit
  9. 11 Jun 2016 · 1 commit
    • vfs: make the string hashes salt the hash · 8387ff25
      Linus Torvalds authored
      We always mixed in the parent pointer into the dentry name hash, but we
      did it late at lookup time.  It turns out that we can simplify that
      lookup-time action by salting the hash with the parent pointer early
      instead of late.
      
      A few other users of our string hashes also wanted to mix in their own
      pointers into the hash, and those are updated to use the same mechanism.
      
      Hash users that don't have any particular initial salt can just use the
      NULL pointer as the no-salt value.
      
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
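To illustrate salting a name hash up front, here is a toy multiplicative hash seeded with a pointer. It is not the kernel's `full_name_hash()`. `salted_hash()` is a hypothetical helper, and a NULL salt plays the no-salt role described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy string hash; the salt (e.g. a parent pointer) seeds the state
 * before any character is mixed in. */
static uint64_t salted_hash(const void *salt, const char *name, size_t len)
{
    uint64_t h = (uint64_t)(uintptr_t)salt;   /* NULL salt == no salt */

    for (size_t i = 0; i < len; i++)
        h = h * 31 + (unsigned char)name[i];
    return h;
}
```

In this toy, 31 is odd, so multiplying by 31 is invertible modulo 2^64. Two distinct salts therefore always give distinct hashes for the same name. The parent no longer has to be mixed in separately at lookup time.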
  10. 30 May 2016 · 2 commits
  11. 09 May 2016 · 1 commit
    • nfs: per-name sillyunlink exclusion · 884be175
      Al Viro authored
      Use d_alloc_parallel() for sillyunlink/lookup exclusion, and an
      explicit rwsem (nfs_rmdir() being a writer and nfs_call_unlink()
      a reader) for rmdir/sillyunlink exclusion.
      
      That ought to make lookup/readdir/!O_CREAT atomic_open really
      parallel on NFS.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
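The reader/writer split can be modelled with a single-threaded, non-blocking lock. `struct rwsem_model` and the `rw_try_*` / `rw_release_*` helpers are hypothetical. They only record who would be admitted and do not reproduce the kernel rwsem API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-blocking reader/writer lock model (not a real rwsem). */
struct rwsem_model {
    int readers;
    bool writer;
};

/* Reader side: taken around each sillyunlink; readers may overlap. */
static bool rw_try_read(struct rwsem_model *s)
{
    if (s->writer)
        return false;
    s->readers++;
    return true;
}

static void rw_release_read(struct rwsem_model *s)
{
    assert(s->readers > 0);
    s->readers--;
}

/* Writer side: rmdir takes it exclusively, so it excludes every
 * in-flight sillyunlink (reader). */
static bool rw_try_write(struct rwsem_model *s)
{
    if (s->writer || s->readers > 0)
        return false;
    s->writer = true;
    return true;
}

static void rw_release_write(struct rwsem_model *s)
{
    assert(s->writer);
    s->writer = false;
}
```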
  12. 03 May 2016 · 1 commit
  13. 05 Apr 2016 · 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it likely never will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it's a constant source of confusion whether
      PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using the
      script below.  For some reason, coccinelle doesn't patch header files;
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is the revert of changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
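The rationale is plain arithmetic. Once PAGE_CACHE_SHIFT equals PAGE_SHIFT, the rewritten shifts are shifts by zero. The sketch assumes 4 KiB pages (a PAGE_SHIFT of 12) and defines the macros locally in their common kernel form, so the values are illustrative rather than tied to any architecture. `old_index()` is a hypothetical helper.

```c
#include <assert.h>

/* Assumes 4 KiB pages for illustration. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(addr) (((addr) + PAGE_SIZE - 1) & PAGE_MASK)

/* The old alias was always defined equal to PAGE_SHIFT... */
#define PAGE_CACHE_SHIFT PAGE_SHIFT

/* ...so this shift is by zero, which is why the patch deletes it. */
static unsigned long old_index(unsigned long idx)
{
    return idx << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}
```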
  14. 27 Mar 2016 · 1 commit
  15. 14 Mar 2016 · 1 commit
  16. 23 Jan 2016 · 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      These are parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      with inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please use these for access to ->i_mutex; over the coming cycle
      ->i_mutex will become an rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
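A userspace sketch of the wrapper pattern. It assumes a trivial single-threaded mutex model (`struct mutex_model`, `mutex_*_model`) in place of the kernel's `struct mutex`. The wrapper names mirror those introduced by the commit, but the bodies are illustrative, and `lock_nested` is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded mutex model. */
struct mutex_model { bool locked; };

static void mutex_lock_model(struct mutex_model *m)
{
    assert(!m->locked);          /* would deadlock in this model */
    m->locked = true;
}

static void mutex_unlock_model(struct mutex_model *m) { m->locked = false; }

static int mutex_trylock_model(struct mutex_model *m)
{
    if (m->locked)
        return 0;                /* 1 on success, 0 if already held */
    m->locked = true;
    return 1;
}

static int mutex_is_locked_model(const struct mutex_model *m)
{
    return m->locked;
}

struct inode_model { struct mutex_model i_mutex; };

/* inode_foo(inode) == mutex_foo(&inode->i_mutex) */
static void inode_lock(struct inode_model *inode)
{
    mutex_lock_model(&inode->i_mutex);
}

static void inode_unlock(struct inode_model *inode)
{
    mutex_unlock_model(&inode->i_mutex);
}

static int inode_trylock(struct inode_model *inode)
{
    return mutex_trylock_model(&inode->i_mutex);
}

static int inode_is_locked(const struct inode_model *inode)
{
    return mutex_is_locked_model(&inode->i_mutex);
}
```

Callers only see `inode_lock()` and the other wrappers. Switching `i_mutex` to another lock type later then means editing these wrappers rather than every call site.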
  17. 15 Jan 2016 · 1 commit
    • Make sure that highmem pages are not added to symlink page cache · e8ecde25
      Al Viro authored
      inode_nohighmem() is sufficient to make sure that page_get_link()
      won't try to allocate a highmem page.  Moreover, it is sufficient
      to make sure that page_symlink/__page_symlink won't do the same
      thing.  However, any filesystem that manually preseeds the symlink's
      page cache upon symlink(2) needs to make sure that the page it
      inserts there won't be a highmem one.
      
      Fortunately, only nfs and shmem have run afoul of that...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
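A sketch of the idea using hypothetical flag bits (`M_GFP_*`). Their values are illustrative and do not follow the kernel's `gfp_t` encoding. `nohighmem_model()` is a simplified stand-in for `inode_nohighmem()` that just clears the highmem bit. `preseed_gfp()` models a filesystem that takes its allocation flags from the mapping when it preseeds the symlink page.

```c
#include <assert.h>

/* Hypothetical allocation-flag bits; illustrative values only. */
#define M_GFP_BASE     0x1u
#define M_GFP_HIGHMEM  0x2u
#define M_GFP_HIGHUSER (M_GFP_BASE | M_GFP_HIGHMEM)

struct mapping_model { unsigned int gfp_mask; };

/* Simplified stand-in for inode_nohighmem(): forbid highmem pages for
 * allocations made through this mapping. */
static void nohighmem_model(struct mapping_model *m)
{
    m->gfp_mask &= ~M_GFP_HIGHMEM;
}

/* In this sketch, a filesystem preseeding the symlink page derives its
 * flags from the mapping instead of hard-coding a highmem-capable mask. */
static unsigned int preseed_gfp(const struct mapping_model *m)
{
    return m->gfp_mask;
}
```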
  18. 29 Dec 2015 · 2 commits
  19. 04 Nov 2015 · 1 commit
  20. 18 Aug 2015 · 2 commits
  21. 01 Jul 2015 · 1 commit
  22. 24 Jun 2015 · 1 commit
  23. 24 Apr 2015 · 1 commit
    • NFS: Don't attempt to decode missing directory entries · ce85cfbe
      Benjamin Coddington authored
      If a READDIR reply comes back without any page data, avoid a NULL pointer
      dereference in xdr_copy_to_scratch().
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      IP: [<ffffffff813a378d>] memcpy+0xd/0x110
      ...
      Call Trace:
      	? xdr_inline_decode+0x7a/0xb0 [sunrpc]
      	nfs3_decode_dirent+0x73/0x320 [nfsv3]
      	nfs_readdir_page_filler+0xd5/0x4e0 [nfs]
      	? nfs3_rpc_wrapper.constprop.9+0x42/0xc0 [nfsv3]
      	nfs_readdir_xdr_to_array+0x1fa/0x330 [nfs]
      	? mem_cgroup_commit_charge+0xac/0x160
      	? nfs_readdir_xdr_to_array+0x330/0x330 [nfs]
      	nfs_readdir_filler+0x22/0x90 [nfs]
      	do_read_cache_page+0x7e/0x1a0
      	read_cache_page+0x1c/0x20
      	nfs_readdir+0x18e/0x660 [nfs]
      	? nfs3_xdr_dec_getattr3res+0x80/0x80 [nfsv3]
      	iterate_dir+0x97/0x130
      	SyS_getdents+0x94/0x120
      	? fillonedir+0xd0/0xd0
      	system_call_fastpath+0x12/0x17
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
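The fix amounts to a guard before the copy. This model is hypothetical (`struct readdir_reply_model`, `decode_entries_model()`). It treats a reply with no page data as containing zero entries instead of copying from a NULL buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical READDIR reply model: page data may be absent. */
struct readdir_reply_model {
    const char *pages;
    size_t len;
};

/* Copies up to cap bytes of entry data into scratch and returns the
 * number of bytes decoded; 0 means there was nothing to decode. */
static size_t decode_entries_model(const struct readdir_reply_model *r,
                                   char *scratch, size_t cap)
{
    if (r->pages == NULL || r->len == 0)
        return 0;            /* no page data: skip decoding entirely */

    size_t n = r->len < cap ? r->len : cap;
    memcpy(scratch, r->pages, n);
    return n;
}
```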
  24. 16 Apr 2015 · 1 commit
  25. 02 Mar 2015 · 2 commits
  26. 20 Nov 2014 · 2 commits
  27. 05 Nov 2014 · 1 commit
  28. 09 Oct 2014 · 2 commits
  29. 04 Aug 2014 · 4 commits
    • NFS: fix two problems in lookup_revalidate in RCU-walk · 50d77739
      NeilBrown authored
      1/ rcu_dereference isn't correct: that field isn't
         RCU protected.  It could potentially change at any time,
         so ACCESS_ONCE might be justified.
      
         Changes to ->d_parent are protected by ->d_seq.  However,
         that isn't always checked after ->d_revalidate is called,
         so it is safest to keep the double-check that ->d_parent
         hasn't changed at the end of these functions.
      
      2/ In nfs4_lookup_revalidate, "->d_parent" was forgotten,
         so 'parent' was not the parent of 'dentry'.
         This fails safe, as the context is that dentry->d_inode is
         NULL, and the result of parent->d_inode being NULL is
         that ECHILD is returned, which is always safe.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
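A model of the double-check that the first point keeps. It assumes a dentry is reduced to a parent pointer and that validity is a per-directory verdict. `struct dentry_m` and `revalidate_rcu_model()` are hypothetical. The `racing_parent` argument simulates a concurrent rename that happens after `->d_parent` is sampled and before the final check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical dentry model: a parent pointer plus a directory verdict. */
struct dentry_m {
    struct dentry_m *d_parent;
    int dir_valid;
};

static int revalidate_rcu_model(struct dentry_m *dentry,
                                struct dentry_m *racing_parent)
{
    /* Sample ->d_parent once through a volatile access, in the spirit of
     * ACCESS_ONCE; the field is not RCU-protected. */
    struct dentry_m *parent = *(struct dentry_m *volatile *)&dentry->d_parent;
    int ret = parent->dir_valid;        /* checks made via `parent` */

    if (racing_parent)
        dentry->d_parent = racing_parent;   /* simulated concurrent rename */

    /* If the parent changed, the checks may have used the wrong
     * directory: bail out so the VFS retries in ref-walk mode. */
    if (dentry->d_parent != parent)
        return -ECHILD;
    return ret;
}
```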
    • NFS: allow lockless access to access_cache · f682a398
      NeilBrown authored
      The access cache is used during RCU-walk path lookups, so it is best
      to avoid locking if possible as taking a lock kills concurrency.
      
      The rbtree is not rcu-safe and cannot easily be made so.
      Instead we simply check the last (i.e. most recent) entry on the LRU
      list.  If this doesn't match, then we return -ECHILD and retry in
      lock/refcount mode.
      
      This requires freeing the nfs_access_entry struct with rcu, and
      requires using rcu access primitives when adding entries to the lru and
      when examining the last entry.
      
      Calling put_rpccred before kfree_rcu looks a bit odd, but as
      put_rpccred already provides rcu protection, we know that the cred will
      not actually be freed until the next grace period, so any concurrent
      access will be safe.
      
      This patch provides about a 5% performance improvement on a stat-heavy
      synthetic workload with 4 threads on a 2-core CPU.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
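The lookup strategy can be sketched with a small array standing in for the LRU list, with the most recent entry last. All names are hypothetical. The locked path uses a linear scan where the kernel uses an rbtree. The key point is that a miss in RCU-walk mode means "retry with locks" (`-ECHILD`), not "no entry".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cache entry: a credential id and its cached access mask. */
struct access_entry_m { int cred; int mask; };

/* Hypothetical LRU model: entries[0..n-1], most recent entry last. */
struct access_cache_m { struct access_entry_m entries[8]; size_t n; };

/* RCU-walk path: takes no lock, so it looks only at the most recent
 * entry. A miss means "retry in lock/refcount mode", not "absent". */
static int access_get_rcu_model(const struct access_cache_m *c, int cred,
                                int *mask)
{
    if (c->n == 0)
        return -ECHILD;
    const struct access_entry_m *last = &c->entries[c->n - 1];
    if (last->cred != cred)
        return -ECHILD;
    *mask = last->mask;
    return 0;
}

/* Locked path: free to search every entry. */
static int access_get_locked_model(const struct access_cache_m *c, int cred,
                                   int *mask)
{
    for (size_t i = c->n; i-- > 0;) {
        if (c->entries[i].cred == cred) {
            *mask = c->entries[i].mask;
            return 0;
        }
    }
    return -ENOENT;
}
```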
    • NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU · 1fa1e384
      NeilBrown authored
      It fails with -ECHILD rather than make an RPC call.
      
      This allows nfs_lookup_revalidate to call it in RCU-walk mode.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
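A sketch of the LOOKUP_RCU behaviour, with a counter standing in for the RPC. `M_LOOKUP_RCU` is an illustrative flag value and `verify_inode_model()` is hypothetical. The only rule it encodes is the one stated above: in RCU-walk mode, a check that would need an RPC returns `-ECHILD` instead.

```c
#include <assert.h>
#include <errno.h>

#define M_LOOKUP_RCU 0x1u   /* illustrative value, not the kernel's */

static int rpc_calls;       /* counts simulated round trips to the server */

/* Hypothetical verify step. Fresh attributes need no work; stale ones
 * need an RPC, which RCU-walk cannot make because it must not block. */
static int verify_inode_model(int attrs_stale, unsigned int flags)
{
    if (!attrs_stale)
        return 0;
    if (flags & M_LOOKUP_RCU)
        return -ECHILD;     /* let the VFS retry in ref-walk mode */
    rpc_calls++;            /* ref-walk: revalidate over the wire */
    return 0;
}
```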
    • NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU · 912a108d
      NeilBrown authored
      This requires nfs_check_verifier to take an rcu_walk flag, and requires
      an rcu version of nfs_revalidate_inode which returns -ECHILD rather
      than making an RPC call.
      
      With this, nfs_lookup_revalidate can call nfs_neg_need_reval in
      RCU-walk mode.
      
      We can also move the LOOKUP_RCU check past the nfs_check_verifier()
      call in nfs_lookup_revalidate.
      
      If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from
      doing a full check, they return a status indicating that a revalidation
      is required.  As this revalidation will not be possible in RCU_WALK
      mode, -ECHILD will ultimately be returned, which is the desired result.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
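A sketch of the resulting control flow. It assumes the checks report either "valid" or "revalidation needed", and all names are hypothetical. The cheap verifier check now runs before the RCU-walk bail-out. A dentry that can be confirmed without blocking therefore succeeds in RCU-walk mode, while one that needs revalidation yields `-ECHILD`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of the verifier / negative-dentry checks. */
enum check_model { CHECK_VALID, CHECK_NEED_REVAL };

/* With rcu_walk set, a check that would have to block (here: refreshing
 * stale attributes) is not attempted; it reports CHECK_NEED_REVAL. */
static enum check_model check_verifier_model(bool verifier_matches,
                                             bool attrs_fresh, bool rcu_walk)
{
    if (!verifier_matches)
        return CHECK_NEED_REVAL;
    if (!attrs_fresh && rcu_walk)
        return CHECK_NEED_REVAL;
    return CHECK_VALID;     /* ref-walk: assume the refresh succeeded */
}

/* Returns 1 if the dentry is still valid, -ECHILD to retry in ref-walk
 * mode, or 0 when a full revalidation (e.g. a LOOKUP RPC) is required. */
static int lookup_revalidate_model(bool verifier_matches, bool attrs_fresh,
                                   bool rcu_walk)
{
    if (check_verifier_model(verifier_matches, attrs_fresh, rcu_walk) ==
        CHECK_VALID)
        return 1;
    if (rcu_walk)
        return -ECHILD;     /* the full check cannot run in RCU-walk mode */
    return 0;
}
```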