1. 15 4月, 2015 1 次提交
    • K
      page_writeback: clean up mess around cancel_dirty_page() · b9ea2515
      Konstantin Khlebnikov 提交于
      This patch replaces cancel_dirty_page() with a helper function
      account_page_cleaned() which only updates counters.  It's called from
      truncate_complete_page() and from try_to_free_buffers() (hack for ext3).
      Page is locked in both cases, page-lock protects against concurrent
      dirtiers: see commit 2d6d7f98 ("mm: protect set_page_dirty() from
      ongoing truncation").
      
      Delete_from_page_cache() shouldn't be called for dirty pages, they must
      be handled by caller (either written or truncated).  This patch treats
      final dirty accounting fixup at the end of __delete_from_page_cache() as
      a debug check and adds WARN_ON_ONCE() around it.  If something removes
      dirty pages without proper handling that might be a bug and unwritten
      data might be lost.
      
      Hugetlbfs has no dirty pages accounting, ClearPageDirty() is enough
      here.
      
      cancel_dirty_page() in nfs_wb_page_cancel() is redundant.  This is
      helper for nfs_invalidate_page() and it's called only in case complete
      invalidation.
      
      The mess was started in v2.6.20 after commits 46d2277c ("Clean up
      and make try_to_free_buffers() not race with dirty pages") and
      3e67c098 ("truncate: clear page dirtiness before running
      try_to_free_buffers()") first was reverted right in v2.6.20 in commit
      ecdfc978 ("Resurrect 'try_to_free_buffers()' VM hackery"), second in
      v2.6.25 commit a2b34564 ("Fix dirty page accounting leak with ext3
      data=journal").
      
      Custom fixes were introduced between these points.  NFS in v2.6.23, commit
      1b3b4a1a ("NFS: Fix a write request leak in nfs_invalidate_page()").
      Kludge in __delete_from_page_cache() in v2.6.24, commit 3a692790 ("Do
      dirty page accounting when removing a page from the page cache").  Since
      v2.6.25 all of them are redundant.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9ea2515
  2. 02 3月, 2015 1 次提交
  3. 14 2月, 2015 2 次提交
  4. 04 2月, 2015 5 次提交
  5. 21 1月, 2015 1 次提交
  6. 17 1月, 2015 3 次提交
  7. 25 11月, 2014 2 次提交
  8. 13 11月, 2014 1 次提交
  9. 25 9月, 2014 3 次提交
    • N
      NFS: avoid waiting at all in nfs_release_page when congested. · 353db796
      NeilBrown 提交于
      If nfs_release_page() is called on a sequence of pages which are all
      in the same file which is blocked on COMMIT, each page could
      contribute a 1 second delay which could be come excessive.  I have
      seen delays of as much as 208 seconds.
      
      To keep the delay to one second, mark the bdi as write-congested
      if the commit didn't finished.  Once it does finish, the
      write-congested flag will be cleared by nfs_commit_release_pages().
      
      With this, the longest total delay in try_to_free_pages that I have
      seen is under 3 seconds.  With no waiting in nfs_release_page at all
      I have seen delays of nearly 1.5 seconds.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      353db796
    • N
      NFS: avoid deadlocks with loop-back mounted NFS filesystems. · 95905446
      NeilBrown 提交于
      Support for loop-back mounted NFS filesystems is useful when NFS is
      used to access shared storage in a high-availability cluster.
      
      If the node running the NFS server fails, some other node can mount the
      filesystem and start providing NFS service.  If that node already had
      the filesystem NFS mounted, it will now have it loop-back mounted.
      
      nfsd can suffer a deadlock when allocating memory and entering direct
      reclaim.
      While direct reclaim does not write to the NFS filesystem it can send
      and wait for a COMMIT through nfs_release_page().
      
      This patch modifies nfs_release_page() to wait a limited time for the
      commit to complete - one second.  If the commit doesn't complete
      in this time, nfs_release_page() will fail.  This means it might now
      fail in some cases where it wouldn't before.  These cases are only
      when 'gfp' includes '__GFP_WAIT'.
      
      nfs_release_page() is only called by try_to_release_page(), and that
      can only be called on an NFS page with required 'gfp' flags from
       - page_cache_pipe_buf_steal() in splice.c
       - shrink_page_list() in vmscan.c
       - invalidate_inode_pages2_range() in truncate.c
      
      The first two handle failure quite safely.  The last is only called
      after ->launder_page() has been called, and that will have waited
      for the commit to finish already.
      
      So aborting if the commit takes longer than 1 second is perfectly safe.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      95905446
    • N
      NFS: don't use STABLE writes during writeback. · e87b4c7a
      NeilBrown 提交于
      commit b31268ac
        FS: Use stable writes when not doing a bulk flush
      
      was a bit heavy handed.
      The particular problem that lead to this patch was that
      small writes to an O_SYNC file we being written as UNSTABLE writes
      followed by a commit.
      This is appropriate for large writes (which require multiple NFS
      requests) but for small writes (single NFS request), using
      NFS_FILE_SYNC is more efficient.
      
      So that patch causes the code to select between the two methods
      depending on how many nfs requests get generated.
      
      Unfortunately this ends up applying to non O_SYNC writes as well.
      In particular if you memory-map a file and update random pages, then
      when they are eventually written out by writeback they will go as
      NFS_FILE_SYNC.  This is inefficient and slows down the application.
      
      So: only set FLUSH_COND_STABLE when wbc->sync_mode is WB_SYNC_ALL.
      With this patch:
       O_SYNC writes are NFS_FILE_SYNC for single requests, and NFS_UNSTABLE
          followed by COMMIT for multiple requests
       Writing immediately before close of fsync follow the same pattern.
       Non-O_SYNC writes without an fsync of close eventually get flushed
       out as UNSTABLE and a commit follows eventually as appropriate.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      e87b4c7a
  10. 13 9月, 2014 1 次提交
  11. 11 9月, 2014 2 次提交
  12. 23 8月, 2014 3 次提交
  13. 04 8月, 2014 4 次提交
  14. 16 7月, 2014 1 次提交
    • N
      sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown 提交于
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since first version of this patch (against 3.15) two new action
      functions appeared, on in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74316201
  15. 13 7月, 2014 5 次提交
  16. 25 6月, 2014 5 次提交