  1. 28 Jun, 2017: 1 commit
  2. 06 May, 2017: 1 commit
  3. 21 Apr, 2017: 1 commit
    • NFS: switch back to ->iterate() · b044f645
      Benjamin Coddington committed
      NFS has some optimizations for readdir to choose between using READDIR or
      READDIRPLUS based on workload, and which NFS operation to use is determined
      by subsequent interactions with lookup, d_revalidate, and getattr.
      
      Concurrent use of nfs_readdir() via ->iterate_shared() can cause those
      optimizations to repeatedly invalidate the pagecache used to store
      directory entries during readdir(), which causes some very bad performance
      for directories with many entries (more than about 10000).
      
      There are a couple of ways to fix this in NFS, but no fix would be as
      simple as going back to ->iterate() to serialize nfs_readdir(), and
      neither fix I tested performed as well as going back to ->iterate().

      The first required taking the directory's i_lock for each entry, which
      resulted in terrible contention.
      
      The second way adds another flag to the nfs_inode, and so keeps the
      optimizations working for large directories.  The difference from using
      ->iterate() here is that much more memory is consumed for a given workload
      without any performance gain.
      
      The workings of nfs_readdir() are such that concurrent users are serialized
      within read_cache_page() waiting to retrieve pages of entries from the
      server.  By serializing this work in iterate_dir() instead, contention for
      cache pages is reduced.  Waiting processes can have an uncontended pass at
      the entirety of the directory's pagecache once previous processes have
      completed filling it.
      
      v2 - Keep the bits needed for parallel lookup
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
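      The serialization argument above can be modelled in userspace. What follows is a hypothetical pthreads analogue, not NFS code. `dir_cache`, `fill_from_server()` and `run_readers()` are invented names, and the mutex stands in for the serialization that iterate_dir() provides for ->iterate(). Whichever reader takes the lock first fills the cache. Every later reader walks the filled cache without refetching, so there is no repeated invalidation.

```c
#include <pthread.h>

/* Hypothetical model: a directory whose entries are cached after one fetch. */
#define NR_ENTRIES 16
#define MAX_READERS 64

struct dir_cache {
    pthread_mutex_t lock;   /* models serializing ->iterate() callers */
    int filled;             /* non-zero once the cache holds every entry */
    int fills;              /* number of times the cache was (re)filled */
    int entries[NR_ENTRIES];
};

/* Stand-in for fetching READDIR results from the server. */
static void fill_from_server(struct dir_cache *c)
{
    for (int i = 0; i < NR_ENTRIES; i++)
        c->entries[i] = i;
    c->filled = 1;
    c->fills++;
}

/* One serialized readdir pass: fill once, then walk the cache uncontended. */
static int serialized_readdir(struct dir_cache *c)
{
    int seen = 0;

    pthread_mutex_lock(&c->lock);
    if (!c->filled)
        fill_from_server(c);
    for (int i = 0; i < NR_ENTRIES; i++)
        seen += (c->entries[i] >= 0);
    pthread_mutex_unlock(&c->lock);
    return seen;
}

static void *reader(void *arg)
{
    serialized_readdir(arg);
    return NULL;
}

/* Run up to MAX_READERS concurrent readers; return how many fills occurred. */
int run_readers(int nthreads)
{
    struct dir_cache c = { .lock = PTHREAD_MUTEX_INITIALIZER };
    pthread_t t[MAX_READERS];

    if (nthreads > MAX_READERS)
        nthreads = MAX_READERS;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, reader, &c);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return c.fills;
}
```

      However many readers run, `run_readers()` reports exactly one fill. The first lock holder populates the cache, and every later holder finds `filled` already set.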
  4. 28 Mar, 2017: 1 commit
  5. 09 Feb, 2017: 1 commit
  6. 20 Dec, 2016: 2 commits
  7. 10 Dec, 2016: 1 commit
  8. 05 Dec, 2016: 1 commit
  9. 03 Dec, 2016: 3 commits
  10. 02 Dec, 2016: 1 commit
    • NFSv4: add flock_owner to open context · 532d4def
      NeilBrown committed
      An open file description (struct file) in a given process can be
      associated with two different lock owners.
      
      It can have a POSIX lock owner, which will be different in each process
      that has an fd on the file.
      It can have a flock owner, which will be the same in all processes.
      
      When searching for a lock stateid to use, we need to consider both of these
      owners.
      
      So add a new "flock_owner" to the "nfs_open_context" (of which there
      is one for each open file description).
      
      This flock_owner does not need to be reference-counted as there is a
      1-1 relation between 'struct file' and nfs open contexts,
      and it will never be part of a list of contexts.  So there is no need
      for a 'flock_context' - just the owner is enough.
      
      The io_count included in the (Posix) lock_context provides no
      guarantee that all read-aheads that could use the state have
      completed, so not supporting it for flock locks is not a serious
      problem.  Synchronization between flock and read-ahead can be added
      later if needed.
      
      When creating an open_context for a non-opening create call, we don't have
      a 'struct file' to pass in, so the lock context gets initialized with
      a NULL owner, but this will never be used.
      
      The flock_owner is not used at all in this patch; that will come later.
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
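      The owner search described above can be sketched as a lookup that accepts either owner. All types and `find_lock_stateid()` are hypothetical simplifications, not the NFS client's structures. The patch itself does not yet use flock_owner. The sketch only shows why the open context must carry that owner: a process holds a different POSIX owner from another process that shares the same open file description, yet it must still find the flock state through the shared context.

```c
#include <stddef.h>

/* Hypothetical, simplified types; not the kernel's definitions. */
struct file_desc { int id; };               /* stands in for struct file */

struct open_ctx {
    const struct file_desc *flock_owner;    /* same in every process sharing the file */
};

struct lock_state {
    const void *owner;    /* either a POSIX (per-process) owner or a flock owner */
    int stateid;
};

/* Return the stateid of a lock held by either owner, or -1 if none matches. */
int find_lock_stateid(const struct lock_state *locks, size_t n,
                      const void *posix_owner, const struct open_ctx *ctx)
{
    for (size_t i = 0; i < n; i++) {
        if (locks[i].owner == posix_owner)
            return locks[i].stateid;
        if (ctx->flock_owner != NULL && locks[i].owner == ctx->flock_owner)
            return locks[i].stateid;
    }
    return -1;   /* no lock state: a caller would fall back to the open stateid */
}
```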
  11. 28 Sep, 2016: 1 commit
  12. 27 Sep, 2016: 1 commit
    • M
      fs: make remaining filesystems use .rename2 · 1cd66c93
      Miklos Szeredi committed
      This is trivial to do:
      
       - add flags argument to foo_rename()
       - check if flags is zero
       - assign foo_rename() to .rename2 instead of .rename
      
      This doesn't mean it's impossible to support RENAME_NOREPLACE for these
      filesystems, but it is not as trivial as it is for local filesystems.
      RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
      for a file to be created on one host while it is overwritten by rename on
      another host).
      
      Filesystems converted:
      
      9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.
      
      After this, we can get rid of the duplicate interfaces for rename.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: David Howells <dhowells@redhat.com> [AFS]
      Acked-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jan Harkes <jaharkes@cs.cmu.edu>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
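      The three conversion steps can be sketched in C. `struct dir_ops`, `foo_rename()` and `foo_dir_ops` are hypothetical stand-ins, not the kernel's `struct inode_operations`. Returning -EINVAL for unsupported flags is an assumption made for the sketch. The point is the shape of the change: the handler gains `flags`, rejects any non-zero value, and is installed as `.rename2`.

```c
#include <errno.h>

/* Hypothetical operations table that only has the new-style hook. */
struct dir_ops {
    int (*rename2)(const char *old_name, const char *new_name, unsigned int flags);
};

/* Step 1: foo_rename() gains a flags argument. */
static int foo_rename(const char *old_name, const char *new_name, unsigned int flags)
{
    (void)old_name;
    (void)new_name;
    /* Step 2: no atomic RENAME_NOREPLACE support here, so refuse any flag. */
    if (flags)
        return -EINVAL;
    return 0;   /* a real filesystem would perform the rename here */
}

/* Step 3: foo_rename() is assigned to .rename2 instead of .rename. */
const struct dir_ops foo_dir_ops = {
    .rename2 = foo_rename,
};
```

      Filesystems converted this way keep their existing rename behaviour for flags == 0. Callers that request extra semantics get a clean error instead of a silently non-atomic rename.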
  13. 23 Sep, 2016: 1 commit
  14. 06 Jul, 2016: 2 commits
  15. 27 Jun, 2016: 1 commit
    • make nfs_atomic_open() call d_drop() on all ->open_context() errors. · d20cb71d
      Al Viro committed
      In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
      the unconditional d_drop() after ->open_context() was removed.  Its removal
      was correct for the success cases (there ->open_context() itself had been
      doing the dcache manipulations), but not for the error ones.  Only one of
      those (ENOENT) got a compensatory d_drop() added in that commit, but in fact
      it should have been done for all errors.  As it is, an O_CREAT non-exclusive
      open on a hashed negative dentry racing with, e.g., symlink creation from
      another client ended up with ->open_context() getting an error and
      proceeding to call nfs_lookup() on a hashed dentry, which would have
      instantly triggered BUG_ON() in d_materialise_unique() (or, these days,
      its equivalent in d_splice_alias()).
      
      Cc: stable@vger.kernel.org # v3.10+
      Tested-by: Oleg Drokin <green@linuxhacker.ru>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
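      The fix amounts to widening one condition. This is a hypothetical model: `struct dentry_model`, `open_error_old()` and `open_error_fixed()` are invented names, and `d_drop_model()` merely clears a flag. The old path unhashed the dentry only for -ENOENT. The fixed path unhashes it for every negative return, before any fallback to a lookup.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical dentry model: only the "hashed" state matters here. */
struct dentry_model { bool hashed; };

static void d_drop_model(struct dentry_model *d) { d->hashed = false; }

/* Old behaviour: only -ENOENT got a compensatory drop. */
int open_error_old(struct dentry_model *d, int err)
{
    if (err == -ENOENT)
        d_drop_model(d);
    return err;
}

/* Fixed behaviour: any ->open_context() error unhashes the dentry. */
int open_error_fixed(struct dentry_model *d, int err)
{
    if (err < 0)
        d_drop_model(d);
    return err;
}
```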
  16. 25 Jun, 2016: 2 commits
  17. 16 Jun, 2016: 1 commit
  18. 11 Jun, 2016: 1 commit
    • vfs: make the string hashes salt the hash · 8387ff25
      Linus Torvalds committed
      We always mixed in the parent pointer into the dentry name hash, but we
      did it late at lookup time.  It turns out that we can simplify that
      lookup-time action by salting the hash with the parent pointer early
      instead of late.
      
      A few other users of our string hashes also wanted to mix in their own
      pointers into the hash, and those are updated to use the same mechanism.
      
      Hash users that don't have any particular initial salt can just use the
      NULL pointer as a no-salt.
      
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
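      Salting up front can be illustrated with a toy hash. This is an FNV-1a style loop chosen for brevity, not the kernel's `full_name_hash()`, and `salted_name_hash()` is a hypothetical name. The salt pointer (for example a parent dentry) seeds the state before any byte of the name is mixed in. A NULL salt means no salt.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative salted string hash (FNV-1a style mixing, not the kernel's
 * implementation). The salt seeds the initial state rather than being
 * combined with the finished name hash at lookup time.
 */
uint64_t salted_name_hash(const void *salt, const char *name, size_t len)
{
    uint64_t h = 14695981039346656037ULL ^ (uint64_t)(uintptr_t)salt;

    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)name[i];   /* mix in one byte */
        h *= 1099511628211ULL;         /* odd multiplier: a bijection mod 2^64 */
    }
    return h;
}
```

      The sketch keeps the full 64 bits on purpose. Every step is a bijection on the state, so two different salts always give different hashes for the same name. A real dcache hash would then fold the value to its table size.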
  19. 30 May, 2016: 2 commits
  20. 09 May, 2016: 1 commit
    • nfs: per-name sillyunlink exclusion · 884be175
      Al Viro committed
      Use d_alloc_parallel() for sillyunlink/lookup exclusion and an explicit
      rwsem (nfs_rmdir() being the writer and nfs_call_unlink() the reader)
      for rmdir/sillyunlink exclusion.
      
      That ought to make lookup/readdir/!O_CREAT atomic_open really
      parallel on NFS.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
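      The rmdir/sillyunlink exclusion is ordinary reader/writer locking. What follows is a hypothetical model built on C11 atomics. It offers only try-lock operations, so it never blocks. `rw_model` and its functions are invented names. In the scheme described, nfs_call_unlink() takes the read side, so several sillyunlinks may overlap. nfs_rmdir() takes the write side and so excludes all of them.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* state: 0 = free, >0 = number of readers, -1 = held by a writer. */
struct rw_model { atomic_int state; };

/* Reader side (sillyunlink): succeeds unless a writer holds the lock. */
bool read_trylock(struct rw_model *s)
{
    int cur = atomic_load(&s->state);

    while (cur >= 0) {
        /* On failure, cur is reloaded and the writer check is repeated. */
        if (atomic_compare_exchange_weak(&s->state, &cur, cur + 1))
            return true;
    }
    return false;
}

void read_unlock(struct rw_model *s) { atomic_fetch_sub(&s->state, 1); }

/* Writer side (rmdir): succeeds only when nobody holds the lock. */
bool write_trylock(struct rw_model *s)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&s->state, &expected, -1);
}

void write_unlock(struct rw_model *s) { atomic_store(&s->state, 0); }
```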
  21. 03 May, 2016: 1 commit
  22. 05 Apr, 2016: 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.

      This promise never materialized, and it likely never will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it is a constant source of confusion whether a
      PAGE_CACHE_* or a PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.

      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header files;
      I called spatch for them manually.

      The only adjustment after coccinelle is reverting the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 27 Mar, 2016: 1 commit
  24. 14 Mar, 2016: 1 commit
  25. 23 Jan, 2016: 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      These wrappers parallel mutex_{lock,unlock,trylock,is_locked,lock_nested}:
      inode_foo(inode) is mutex_foo(&inode->i_mutex).

      Please use them for access to ->i_mutex; over the coming cycle
      ->i_mutex will become an rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
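      The wrapper pattern is easy to mirror in userspace. A pthread mutex plays the role of ->i_mutex here. `struct inode_model` and the `*_model` functions are hypothetical. Callers go only through the wrappers, so replacing the mutex with a reader/writer lock later changes these few functions and not their users.

```c
#include <pthread.h>

/* Simplified stand-in for struct inode; only the lock matters here. */
struct inode_model {
    pthread_mutex_t i_mutex;
};

/* inode_foo(inode) is just mutex_foo(&inode->i_mutex). */
static inline void inode_lock_model(struct inode_model *inode)
{
    pthread_mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock_model(struct inode_model *inode)
{
    pthread_mutex_unlock(&inode->i_mutex);
}

/* Returns 1 on success and 0 if already locked, like mutex_trylock(). */
static inline int inode_trylock_model(struct inode_model *inode)
{
    return pthread_mutex_trylock(&inode->i_mutex) == 0;
}
```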
  26. 15 Jan, 2016: 1 commit
    • Make sure that highmem pages are not added to symlink page cache · e8ecde25
      Al Viro committed
      inode_nohighmem() is sufficient to make sure that page_get_link()
      won't try to allocate a highmem page.  Moreover, it is sufficient
      to make sure that page_symlink/__page_symlink won't do the same
      thing.  However, any filesystem that manually preseeds the symlink's
      page cache upon symlink(2) needs to make sure that the page it
      inserts there won't be a highmem one.
      
      Fortunately, only nfs and shmem have run afoul of that...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  27. 29 Dec, 2015: 2 commits
  28. 04 Nov, 2015: 1 commit
  29. 18 Aug, 2015: 2 commits
  30. 01 Jul, 2015: 1 commit
  31. 24 Jun, 2015: 1 commit
  32. 24 Apr, 2015: 1 commit
    • NFS: Don't attempt to decode missing directory entries · ce85cfbe
      Benjamin Coddington committed
      If a READDIR reply comes back without any page data, avoid a NULL pointer
      dereference in xdr_copy_to_scratch().
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      IP: [<ffffffff813a378d>] memcpy+0xd/0x110
      ...
      Call Trace:
      	? xdr_inline_decode+0x7a/0xb0 [sunrpc]
      	nfs3_decode_dirent+0x73/0x320 [nfsv3]
      	nfs_readdir_page_filler+0xd5/0x4e0 [nfs]
      	? nfs3_rpc_wrapper.constprop.9+0x42/0xc0 [nfsv3]
      	nfs_readdir_xdr_to_array+0x1fa/0x330 [nfs]
      	? mem_cgroup_commit_charge+0xac/0x160
      	? nfs_readdir_xdr_to_array+0x330/0x330 [nfs]
      	nfs_readdir_filler+0x22/0x90 [nfs]
      	do_read_cache_page+0x7e/0x1a0
      	read_cache_page+0x1c/0x20
      	nfs_readdir+0x18e/0x660 [nfs]
      	? nfs3_xdr_dec_getattr3res+0x80/0x80 [nfsv3]
      	iterate_dir+0x97/0x130
      	SyS_getdents+0x94/0x120
      	? fillonedir+0xd0/0xd0
      	system_call_fastpath+0x12/0x17
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
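      The defensive check can be sketched as a guard in front of the copy. `struct readdir_reply` and `decode_next_entry()` are hypothetical. The error codes are illustrative choices, not what the NFS client actually returns. The point is that a reply carrying no page data is rejected before memcpy() is reached, instead of being dereferenced as in the trace above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reply buffer: page data may be absent in a short READDIR reply. */
struct readdir_reply {
    const unsigned char *page_data;   /* NULL when the server sent no entries */
    size_t page_len;
};

/* Copy len bytes of the next entry, refusing to touch missing page data. */
int decode_next_entry(const struct readdir_reply *r, size_t offset,
                      unsigned char *out, size_t len)
{
    if (r->page_data == NULL || r->page_len == 0)
        return -EAGAIN;   /* illustrative: nothing to decode */
    if (offset > r->page_len || len > r->page_len - offset)
        return -EIO;      /* illustrative: entry would run past the buffer */
    memcpy(out, r->page_data + offset, len);
    return 0;
}
```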