1. 14 Feb, 2015 1 commit
  2. 04 Feb, 2015 5 commits
  3. 21 Jan, 2015 1 commit
  4. 17 Jan, 2015 3 commits
  5. 25 Nov, 2014 2 commits
  6. 13 Nov, 2014 1 commit
  7. 25 Sep, 2014 3 commits
    • NFS: avoid waiting at all in nfs_release_page when congested. · 353db796
      Committed by NeilBrown
      If nfs_release_page() is called on a sequence of pages which are all
      in the same file which is blocked on COMMIT, each page could
      contribute a 1 second delay which could become excessive.  I have
      seen delays of as much as 208 seconds.
      
      To keep the delay to one second, mark the bdi as write-congested
      if the commit didn't finish.  Once it does finish, the
      write-congested flag will be cleared by nfs_commit_release_pages().
      
      With this, the longest total delay in try_to_free_pages that I have
      seen is under 3 seconds.  With no waiting in nfs_release_page at all
      I have seen delays of nearly 1.5 seconds.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFS: avoid deadlocks with loop-back mounted NFS filesystems. · 95905446
      Committed by NeilBrown
      Support for loop-back mounted NFS filesystems is useful when NFS is
      used to access shared storage in a high-availability cluster.
      
      If the node running the NFS server fails, some other node can mount the
      filesystem and start providing NFS service.  If that node already had
      the filesystem NFS mounted, it will now have it loop-back mounted.
      
      nfsd can suffer a deadlock when allocating memory and entering direct
      reclaim.
      While direct reclaim does not write to the NFS filesystem it can send
      and wait for a COMMIT through nfs_release_page().
      
      This patch modifies nfs_release_page() to wait a limited time for the
      commit to complete - one second.  If the commit doesn't complete
      in this time, nfs_release_page() will fail.  This means it might now
      fail in some cases where it wouldn't before.  These cases are only
      when 'gfp' includes '__GFP_WAIT'.
      
      nfs_release_page() is only called by try_to_release_page(), and that
      can only be called on an NFS page with required 'gfp' flags from
       - page_cache_pipe_buf_steal() in splice.c
       - shrink_page_list() in vmscan.c
       - invalidate_inode_pages2_range() in truncate.c
      
      The first two handle failure quite safely.  The last is only called
      after ->launder_page() has been called, and that will have waited
      for the commit to finish already.
      
      So aborting if the commit takes longer than 1 second is perfectly safe.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFS: don't use STABLE writes during writeback. · e87b4c7a
      Committed by NeilBrown
      commit b31268ac
        NFS: Use stable writes when not doing a bulk flush
      
      was a bit heavy handed.
      The particular problem that led to this patch was that
      small writes to an O_SYNC file were being written as UNSTABLE writes
      followed by a commit.
      This is appropriate for large writes (which require multiple NFS
      requests) but for small writes (single NFS request), using
      NFS_FILE_SYNC is more efficient.
      
      So that patch causes the code to select between the two methods
      depending on how many nfs requests get generated.
      
      Unfortunately this ends up applying to non O_SYNC writes as well.
      In particular if you memory-map a file and update random pages, then
      when they are eventually written out by writeback they will go as
      NFS_FILE_SYNC.  This is inefficient and slows down the application.
      
      So: only set FLUSH_COND_STABLE when wbc->sync_mode is WB_SYNC_ALL.
      With this patch:
       O_SYNC writes are NFS_FILE_SYNC for single requests, and NFS_UNSTABLE
          followed by COMMIT for multiple requests
       Writing immediately before close or fsync follows the same pattern.
       Non-O_SYNC writes without an fsync or close eventually get flushed
       out as UNSTABLE and a commit follows eventually as appropriate.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
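The selection rule described in this commit message can be written down as a small decision function. This is a simplified sketch, not the kernel code: the enum and function names are hypothetical, and the only inputs modelled are whether writeback runs with WB_SYNC_ALL and how many NFS requests were generated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; a model of the decision, not the kernel's types. */
enum model_stability {
    MODEL_UNSTABLE_THEN_COMMIT, /* NFS_UNSTABLE writes followed by COMMIT */
    MODEL_FILE_SYNC             /* a single NFS_FILE_SYNC write */
};

/*
 * sync_all models wbc->sync_mode == WB_SYNC_ALL (O_SYNC, fsync, close).
 * nr_requests is the number of NFS requests the flush generated.
 */
static enum model_stability model_pick_stability(bool sync_all, int nr_requests)
{
    assert(nr_requests >= 1);
    /* With the patch, the conditional-stable flag is set only for WB_SYNC_ALL. */
    bool cond_stable = sync_all;
    if (cond_stable && nr_requests == 1)
        return MODEL_FILE_SYNC;
    return MODEL_UNSTABLE_THEN_COMMIT;
}
```

Background writeback of a single dirty mmap'd page corresponds to `model_pick_stability(false, 1)`, which stays UNSTABLE in this model.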
  8. 13 Sep, 2014 1 commit
  9. 11 Sep, 2014 2 commits
  10. 23 Aug, 2014 3 commits
  11. 04 Aug, 2014 4 commits
  12. 16 Jul, 2014 1 commit
    • sched: Remove proliferation of wait_on_bit() action functions · 74316201
      Committed by NeilBrown
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{,_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible".
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps' (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since the first version of this patch (against 3.15) two new action
      functions appeared, one in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
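The split between an explicit-action variant and a variant with an implicit standard action might be modelled in userspace roughly as follows. This is a sketch under loud assumptions: the `model_*` names and signatures are invented for illustration and do not match the kernel's, the "sleep" is simulated by a counter that clears the bit after a fixed number of calls (standing in for another thread clearing it), and where exactly the signal check lives is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's signatures. */
enum model_mode { MODEL_UNINTERRUPTIBLE, MODEL_INTERRUPTIBLE };

struct model_task {
    bool signal_pending;    /* a signal is waiting for this task */
    int sleeps_until_clear; /* simulated: bit clears after this many sleeps */
    int sleeps;             /* how many times the action has "slept" */
};

typedef int (*model_action_fn)(struct model_task *t, unsigned long *word, int bit);

static bool model_test_bit(const unsigned long *word, int bit)
{
    return (*word >> bit) & 1UL;
}

/* Explicit variant: the caller supplies the action that does the waiting. */
static int model_wait_on_bit_action(struct model_task *t, unsigned long *word,
                                    int bit, model_action_fn action,
                                    enum model_mode mode)
{
    assert(action != NULL && bit >= 0 && bit < 32);
    while (model_test_bit(word, bit)) {
        /* The mode, not the action function, decides whether to error out. */
        if (mode == MODEL_INTERRUPTIBLE && t->signal_pending)
            return 1; /* non-zero only; no specific error code */
        int ret = action(t, word, bit);
        if (ret)
            return ret;
    }
    return 0;
}

/* Standard action: one simulated schedule(); the bit clears eventually. */
static int model_bit_wait(struct model_task *t, unsigned long *word, int bit)
{
    t->sleeps++;
    if (t->sleeps >= t->sleeps_until_clear)
        *word &= ~(1UL << bit);
    return 0;
}

/* Implicit variant: no action argument, the standard one is used. */
static int model_wait_on_bit(struct model_task *t, unsigned long *word,
                             int bit, enum model_mode mode)
{
    return model_wait_on_bit_action(t, word, bit, model_bit_wait, mode);
}
```

A caller of the implicit variant checks only for a non-zero return and substitutes whatever error code suits its own context.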
  13. 13 Jul, 2014 5 commits
  14. 25 Jun, 2014 5 commits
  15. 29 May, 2014 3 commits
    • nfs: page group support in nfs_mark_uptodate · d72ddcba
      Committed by Weston Andros Adamson
      Change how nfs_mark_uptodate checks to see if writes cover a whole page.
      
      This patch should have no effect yet since all page groups currently
      have one request, but will come into play when pg_test functions are
      modified to split pages into sub-page regions.
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • nfs: page group syncing in write path · 20633f04
      Committed by Weston Andros Adamson
      Operations that modify state for a whole page must be synchronized across
      all requests within a page group. In the write path, this is calling
      end_page_writeback and removing the head request from an inode.
      Both of these operations should not be called until all requests
      in a page group have reached the point where they would call them.
      
      This patch should have no effect yet since all page groups currently
      have one request, but will come into play when pg_test functions are
      modified to split pages into sub-page regions.
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
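The rule that a whole-page operation runs only after every request in the group has reached it can be illustrated with a simple counter. This is not how the patch itself does it; the counter is a swapped-in technique chosen for brevity, the names are hypothetical, and the model is single-threaded with no locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page group; not a kernel structure. */
struct model_group {
    int nr_reqs;         /* requests in the page group */
    int nr_done;         /* requests that have reached the sync point */
    int writeback_ended; /* times the whole-page operation has run */
};

/*
 * Called once per request when it reaches the point where it would end
 * page writeback. Only the last arrival performs the whole-page step.
 * Returns true for that last arrival.
 */
static bool model_req_reached_sync_point(struct model_group *g)
{
    assert(g->nr_done < g->nr_reqs);
    g->nr_done++;
    if (g->nr_done == g->nr_reqs) {
        g->writeback_ended++; /* stands in for end_page_writeback() */
        return true;
    }
    return false;
}
```

With a single request per group, as the commit message says is currently always the case, the first call is also the last and the behavior is unchanged.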
    • nfs: add support for multiple nfs reqs per page · 2bfc6e56
      Committed by Weston Andros Adamson
      Add "page groups" - a circular list of nfs requests (struct nfs_page)
      that all reference the same page. This gives nfs read and write paths
      the ability to account for sub-page regions independently.  This
      somewhat follows the design of struct buffer_head's sub-page
      accounting.
      
      Only "head" requests are ever added/removed from the inode list in
      the buffered write path. "head" and "sub" requests are treated the
      same through the read path and the rest of the write/commit path.
      Requests are given an extra reference across the life of the list.
      
      Page groups are never rejoined after being split. If the read/write
      request fails and the client falls back to another path (i.e. revert
      to MDS in PNFS case), the already split requests are pushed through
      the recoalescing code again, which may split them further and then
      coalesce them into properly sized requests on the wire. Fragmentation
      shouldn't be a problem with the current design, because we flush all
      requests in page group when a non-contiguous request is added, so
      the only time resplitting should occur is on a resend of a read or
      write.
      
      This patch lays the groundwork for sub-page splitting, but does not
      actually do any splitting. For now all page groups have one request
      as pg_test functions don't yet split pages. There are several related
      patches that are needed to support multiple requests per page group.
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
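A circular list of requests that all reference one page, with a designated head, could look roughly like the sketch below. The struct layout, field names, and helper functions are assumptions made for illustration and are not taken from struct nfs_page; the coverage check additionally assumes requests are added in ascending offset order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for the example */

/* Hypothetical request type; field names are illustrative only. */
struct model_req {
    unsigned int offset;         /* start of the region within the page */
    unsigned int bytes;          /* length of the region */
    struct model_req *head;      /* first request of the page group */
    struct model_req *this_page; /* next request in the circular list */
};

/* A new group is a one-element circle: the head points at itself. */
static void model_group_init(struct model_req *head,
                             unsigned int offset, unsigned int bytes)
{
    head->offset = offset;
    head->bytes = bytes;
    head->head = head;
    head->this_page = head;
}

/* Links a "sub" request in at the tail of the circle, just before head. */
static void model_group_add(struct model_req *head, struct model_req *req,
                            unsigned int offset, unsigned int bytes)
{
    struct model_req *tail = head;
    assert(head->head == head);
    while (tail->this_page != head)
        tail = tail->this_page;
    req->offset = offset;
    req->bytes = bytes;
    req->head = head;
    req->this_page = head;
    tail->this_page = req;
}

/* True if the group's regions cover the whole page without a gap. */
static bool model_group_covers_page(const struct model_req *head)
{
    unsigned int covered = 0;
    const struct model_req *r = head;
    do {
        if (r->offset > covered)
            return false; /* gap before this region */
        if (r->offset + r->bytes > covered)
            covered = r->offset + r->bytes;
        r = r->this_page;
    } while (r != head);
    return covered >= MODEL_PAGE_SIZE;
}
```

A group with a single full-page request covers the page trivially, which matches the note above that all page groups currently have one request.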