1. 18 8月, 2015 2 次提交
  2. 23 7月, 2015 3 次提交
  3. 01 7月, 2015 1 次提交
  4. 02 6月, 2015 1 次提交
    • N
      NFS: report more appropriate block size for directories. · 7ef5ca4f
      NeilBrown 提交于
      In glibc 2.21 (and several previous), a call to opendir() will
      result in a 32K (BUFSIZ*4) buffer being allocated and passed to
      getdents.
      
      However a call to fdopendir() results in an 'fstat' request to
      determine block size and a matching buffer allocated for subsequent
      use with getdents.  This will typically be 1M.
      
      The first getdents call on an NFS directory will always use
      READDIR_PLUS (or NFSv4 equivalent) if available.  Subsequent getdents
      calls only use this more expensive version if some 'stat' requests are
      made between the getdents calls.
      
      For this reason it is good to keep at least that first getdents call
      relatively short.  When fdopendir() and readdir() is used on a large
      directory, it takes approximately 32 times as long to complete as
      using "opendir".  Current versions of 'find' use fdopendir() and
      demonstrate this slowness.
      
      'stat' on a directory currently returns the 'wsize'.  This number has
      no meaning on directories.
      Actual READDIR requests are limited to ->dtsize, which itself is
      capped at 4 pages, coincidently the same as BUFSIZ*4.
      So this is a meaningful number to use as the blocksize on directories,
      and has the effect of making 'find' on large directories go a lot
      faster.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      7ef5ca4f
  5. 24 4月, 2015 3 次提交
    • F
      nfs: Remove unneeded casts in nfs · c456aacf
      Firo Yang 提交于
      Don't unnecessarily cast allocation return value in
      fs/nfs/inode.c::nfs_alloc_inode().
      Signed-off-by: NFiro Yang <firogm@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      c456aacf
    • A
      nfs: Fetch MOUNTED_ON_FILEID when updating an inode · ea96d1ec
      Anna Schumaker 提交于
      2ef47eb1 (NFS: Fix use of nfs_attr_use_mounted_on_fileid()) was a good
      start to fixing a circular directory structure warning for NFS v4
      "junctioned" mountpoints.  Unfortunately, further testing continued to
      generate this error.
      
      My server is configured like this:
      
      anna@nfsd ~ % df
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/vda1       9.1G  2.0G  6.5G  24% /
      /dev/vdc1      1014M   33M  982M   4% /exports
      /dev/vdc2      1014M   33M  982M   4% /exports/vol1
      /dev/vdc3      1014M   33M  982M   4% /exports/vol1/vol2
      
      anna@nfsd ~ % cat /etc/exports
      /exports/          *(rw,async,no_subtree_check,no_root_squash)
      /exports/vol1/     *(rw,async,no_subtree_check,no_root_squash)
      /exports/vol1/vol2 *(rw,async,no_subtree_check,no_root_squash)
      
      I've been running chown across the entire mountpoint twice in a row to
      hit this problem.  The first run succeeds, but the second one fails with
      the circular directory warning along with:
      
      anna@client ~ % dmesg
      [Apr 3 14:28] NFS: server 192.168.100.204 error: fileid changed
                    fsid 0:39: expected fileid 0x100080, got 0x80
      
      WHere 0x80 is the mountpoint's fileid and 0x100080 is the mounted-on
      fileid.
      
      This patch fixes the issue by requesting an updated mounted-on fileid
      from the server during nfs_update_inode(), and then checking that the
      fileid stored in the nfs_inode matches either the fileid or mounted-on
      fileid returned by the server.
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      ea96d1ec
    • A
      NFS: Don't zap caches on fallocate() · 9a51940b
      Anna Schumaker 提交于
      This patch adds a GETATTR to the end of ALLOCATE and DEALLOCATE
      operations so we can set the updated inode size and change attribute
      directly.  DEALLOCATE will still need to release pagecache pages, so
      nfs42_proc_deallocate() now calls truncate_pagecache_range() before
      contacting the server.
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      9a51940b
  6. 16 4月, 2015 1 次提交
  7. 28 3月, 2015 3 次提交
  8. 04 3月, 2015 2 次提交
    • T
      NFS: Don't write enable new pages while an invalidation is proceeding · ef070dcb
      Trond Myklebust 提交于
      nfs_vm_page_mkwrite() should wait until the page cache invalidation
      is finished. This is the second patch in a 2 patch series to deprecate
      the NFS client's reliance on nfs_release_page() in the context of
      nfs_invalidate_mapping().
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      ef070dcb
    • T
      NFS: Fix a regression in the read() syscall · 874f9463
      Trond Myklebust 提交于
      When invalidating the page cache for a regular file, we want to first
      sync all dirty data to disk and then call invalidate_inode_pages2().
      The latter relies on nfs_launder_page() and nfs_release_page() to deal
      respectively with dirty pages, and unstable written pages.
      
      When commit 95905446 ("NFS: avoid deadlocks with loop-back mounted
      NFS filesystems.") changed the behaviour of nfs_release_page(), then it
      made it possible for invalidate_inode_pages2() to fail with an EBUSY.
      Unfortunately, that error is then propagated back to read().
      
      Let's therefore work around the problem for now by protecting the call
      to sync the data and invalidate_inode_pages2() so that they are atomic
      w.r.t. the addition of new writes.
      Later on, we can revisit whether or not we still need nfs_launder_page()
      and nfs_release_page().
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      874f9463
  9. 02 3月, 2015 8 次提交
  10. 14 2月, 2015 1 次提交
  11. 31 1月, 2015 1 次提交
  12. 22 1月, 2015 1 次提交
    • A
      NFS: Fix use of nfs_attr_use_mounted_on_fileid() · 2ef47eb1
      Anna Schumaker 提交于
      This function call was being optimized out during nfs_fhget(), leading
      to situations where we have a valid fileid but still want to use the
      mounted_on_fileid.  For example, imagine we have our server configured
      like this:
      
      server % df
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/vda1       9.1G  6.5G  1.9G  78% /
      /dev/vdb1       487M  2.3M  456M   1% /exports
      /dev/vdc1       487M  2.3M  456M   1% /exports/vol1
      /dev/vdd1       487M  2.3M  456M   1% /exports/vol2
      
      If our client mounts /exports and tries to do a "chown -R" across the
      entire mountpoint, we will get a nasty message warning us about a circular
      directory structure.  Running chown with strace tells me that each directory
      has the same device and inode number:
      
      newfstatat(AT_FDCWD, "/nfs/", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0
      newfstatat(4, "vol1", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0
      newfstatat(4, "vol2", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0
      
      With this patch the mounted_on_fileid values are used for st_ino, so the
      directory loop warning isn't reported.
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      2ef47eb1
  13. 21 1月, 2015 1 次提交
  14. 26 11月, 2014 1 次提交
  15. 25 11月, 2014 1 次提交
    • W
      NFS: fix subtle change in COMMIT behavior · cb1410c7
      Weston Andros Adamson 提交于
      Recent work in the pgio layer made it possible for there to be more than one
      request per page. This caused a subtle change in commit behavior, because
      write.c:nfs_commit_unstable_pages compares the number of *pages* waiting for
      writeback against the number of requests on a commit list to choose when to
      send a COMMIT in a non-blocking flush.
      
      This is probably hard to hit in normal operation - you have to be using
      rsize/wsize < PAGE_SIZE, or pnfs with lots of boundaries that are not page
      aligned to have a noticeable change in behavior.
      Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      cb1410c7
  16. 13 11月, 2014 1 次提交
  17. 01 10月, 2014 1 次提交
  18. 11 9月, 2014 1 次提交
  19. 05 8月, 2014 1 次提交
    • E
      NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes · 65b38851
      Eric W. Biederman 提交于
      The usage of pid_ns->child_reaper->nsproxy->net_ns in
      nfs_server_list_open and nfs_client_list_open is not safe.
      
      /proc for a pid namespace can remain mounted after the all of the
      process in that pid namespace have exited.  There are also times
      before the initial process in a pid namespace has started or after the
      initial process in a pid namespace has exited where
      pid_ns->child_reaper can be NULL or stale.  Making the idiom
      pid_ns->child_reaper->nsproxy a double whammy of problems.
      
      Luckily all that needs to happen is to move /proc/fs/nfsfs/servers and
      /proc/fs/nfsfs/volumes under /proc/net to /proc/net/nfsfs/servers and
      /proc/net/nfsfs/volumes and add a symlink from the original location,
      and to use seq_open_net as it has been designed.
      
      Cc: stable@vger.kernel.org
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      65b38851
  20. 04 8月, 2014 1 次提交
    • N
      NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU · 912a108d
      NeilBrown 提交于
      This requires nfs_check_verifier to take an rcu_walk flag, and requires
      an rcu version of nfs_revalidate_inode which returns -ECHILD rather
      than making an RPC call.
      
      With this, nfs_lookup_revalidate can call nfs_neg_need_reval in
      RCU-walk mode.
      
      We can also move the LOOKUP_RCU check past the nfs_check_verifier()
      call in nfs_lookup_revalidate.
      
      If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from
      doing a full check, they return a status indicating that a revalidation
      is required.  As this revalidation will not be possible in RCU_WALK
      mode, -ECHILD will ultimately be returned, which is the desired result.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      912a108d
  21. 16 7月, 2014 2 次提交
    • N
      sched: Allow wait_on_bit_action() functions to support a timeout · c1221321
      NeilBrown 提交于
      It is currently not possible for various wait_on_bit functions
      to implement a timeout.
      
      While the "action" function that is called to do the waiting
      could certainly use schedule_timeout(), there is no way to carry
      forward the remaining timeout after a false wake-up.
      As false-wakeups a clearly possible at least due to possible
      hash collisions in bit_waitqueue(), this is a real problem.
      
      The 'action' function is currently passed a pointer to the word
      containing the bit being waited on.  No current action functions
      use this pointer.  So changing it to something else will be a
      little noisy but will have no immediate effect.
      
      This patch changes the 'action' function to take a pointer to
      the "struct wait_bit_key", which contains a pointer to the word
      containing the bit so nothing is really lost.
      
      It also adds a 'private' field to "struct wait_bit_key", which
      is initialized to zero.
      
      An action function can now implement a timeout with something
      like
      
      static int timed_out_waiter(struct wait_bit_key *key)
      {
      	unsigned long waited;
      	if (key->private == 0) {
      		key->private = jiffies;
      		if (key->private == 0)
      			key->private -= 1;
      	}
      	waited = jiffies - key->private;
      	if (waited > 10 * HZ)
      		return -EAGAIN;
      	schedule_timeout(waited - 10 * HZ);
      	return 0;
      }
      
      If any other need for context in a waiter were found it would be
      easy to use ->private for some other purpose, or even extend
      "struct wait_bit_key".
      
      My particular need is to support timeouts in nfs_release_page()
      to avoid deadlocks with loopback mounted NFS.
      
      While wait_on_bit_timeout() would be a cleaner interface, it
      will not meet my need.  I need the timeout to be sensitive to
      the state of the connection with the server, which could change.
       So I need to use an 'action' interface.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c1221321
    • N
      sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown 提交于
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since first version of this patch (against 3.15) two new action
      functions appeared, on in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74316201
  22. 25 6月, 2014 2 次提交
  23. 18 4月, 2014 1 次提交