1. 28 3月, 2015 2 次提交
  2. 04 3月, 2015 2 次提交
    • T
      NFS: Don't write enable new pages while an invalidation is proceeding · ef070dcb
      Trond Myklebust 提交于
      nfs_vm_page_mkwrite() should wait until the page cache invalidation
      is finished. This is the second patch in a 2 patch series to deprecate
      the NFS client's reliance on nfs_release_page() in the context of
      nfs_invalidate_mapping().
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      ef070dcb
    • T
      NFS: Fix a regression in the read() syscall · 874f9463
      Trond Myklebust 提交于
      When invalidating the page cache for a regular file, we want to first
      sync all dirty data to disk and then call invalidate_inode_pages2().
      The latter relies on nfs_launder_page() and nfs_release_page() to deal
      respectively with dirty pages, and unstable written pages.
      
      When commit 95905446 ("NFS: avoid deadlocks with loop-back mounted
      NFS filesystems.") changed the behaviour of nfs_release_page(), then it
      made it possible for invalidate_inode_pages2() to fail with an EBUSY.
      Unfortunately, that error is then propagated back to read().
      
      Let's therefore work around the problem for now by protecting the call
      to sync the data and invalidate_inode_pages2() so that they are atomic
      w.r.t. the addition of new writes.
      Later on, we can revisit whether or not we still need nfs_launder_page()
      and nfs_release_page().
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      874f9463
  3. 02 3月, 2015 1 次提交
  4. 11 2月, 2015 1 次提交
  5. 14 10月, 2014 1 次提交
  6. 25 9月, 2014 3 次提交
    • N
      NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() · 1aff5256
      NeilBrown 提交于
      Now that nfs_release_page() doesn't block indefinitely, other deadlock
      avoidance mechanisms aren't needed.
       - it doesn't hurt for kswapd to block occasionally.  If it doesn't
         want to block it would clear __GFP_WAIT.  The current_is_kswapd()
         was only added to avoid deadlocks and we have a new approach for
         that.
       - memory allocation in the SUNRPC layer can very rarely try to
         ->releasepage() a page it is trying to handle.  The deadlock
         is removed as nfs_release_page() doesn't block indefinitely.
      
      So we don't need to set PF_FSTRANS for sunrpc network operations any
      more.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      1aff5256
    • N
      NFS: avoid waiting at all in nfs_release_page when congested. · 353db796
      NeilBrown 提交于
      If nfs_release_page() is called on a sequence of pages which are all
      in the same file which is blocked on COMMIT, each page could
      contribute a 1 second delay which could be come excessive.  I have
      seen delays of as much as 208 seconds.
      
      To keep the delay to one second, mark the bdi as write-congested
      if the commit didn't finished.  Once it does finish, the
      write-congested flag will be cleared by nfs_commit_release_pages().
      
      With this, the longest total delay in try_to_free_pages that I have
      seen is under 3 seconds.  With no waiting in nfs_release_page at all
      I have seen delays of nearly 1.5 seconds.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      353db796
    • N
      NFS: avoid deadlocks with loop-back mounted NFS filesystems. · 95905446
      NeilBrown 提交于
      Support for loop-back mounted NFS filesystems is useful when NFS is
      used to access shared storage in a high-availability cluster.
      
      If the node running the NFS server fails, some other node can mount the
      filesystem and start providing NFS service.  If that node already had
      the filesystem NFS mounted, it will now have it loop-back mounted.
      
      nfsd can suffer a deadlock when allocating memory and entering direct
      reclaim.
      While direct reclaim does not write to the NFS filesystem it can send
      and wait for a COMMIT through nfs_release_page().
      
      This patch modifies nfs_release_page() to wait a limited time for the
      commit to complete - one second.  If the commit doesn't complete
      in this time, nfs_release_page() will fail.  This means it might now
      fail in some cases where it wouldn't before.  These cases are only
      when 'gfp' includes '__GFP_WAIT'.
      
      nfs_release_page() is only called by try_to_release_page(), and that
      can only be called on an NFS page with required 'gfp' flags from
       - page_cache_pipe_buf_steal() in splice.c
       - shrink_page_list() in vmscan.c
       - invalidate_inode_pages2_range() in truncate.c
      
      The first two handle failure quite safely.  The last is only called
      after ->launder_page() has been called, and that will have waited
      for the commit to finish already.
      
      So aborting if the commit takes longer than 1 second is perfectly safe.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      95905446
  7. 11 9月, 2014 2 次提交
    • J
      nfs: fix RCU cl_xprt handling in nfs_swap_activate/deactivate · dad2b015
      Jeff Layton 提交于
      sparse says:
      
      fs/nfs/file.c:543:60: warning: incorrect type in argument 1 (different address spaces)
      fs/nfs/file.c:543:60:    expected struct rpc_xprt *xprt
      fs/nfs/file.c:543:60:    got struct rpc_xprt [noderef] <asn:4>*cl_xprt
      fs/nfs/file.c:548:53: warning: incorrect type in argument 1 (different address spaces)
      fs/nfs/file.c:548:53:    expected struct rpc_xprt *xprt
      fs/nfs/file.c:548:53:    got struct rpc_xprt [noderef] <asn:4>*cl_xprt
      
      cl_xprt is RCU-managed, so we need to take care to dereference and use
      it while holding the RCU read lock.
      
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      dad2b015
    • C
      pnfs: add flag to force read-modify-write in ->write_begin · 612aa983
      Christoph Hellwig 提交于
      Like all block based filesystems, the pNFS block layout driver can't read
      or write at a byte granularity and thus has to perform read-modify-write
      cycles on writes smaller than this granularity.
      
      Add a flag so that the core NFS code always reads a whole page when
      starting a smaller write, so that we can do it in the place where the VFS
      expects it instead of doing in very deadlock prone way in the writeback
      handler.
      
      Note that in theory we could do less than page size reads here for disks
      that have a smaller sector size which are served by a server with a smaller
      pnfs block size.  But so far that doesn't seem like a worthwhile
      optimization.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      612aa983
  8. 10 9月, 2014 1 次提交
  9. 16 7月, 2014 1 次提交
    • N
      sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown 提交于
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since first version of this patch (against 3.15) two new action
      functions appeared, on in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74316201
  10. 12 6月, 2014 1 次提交
  11. 02 6月, 2014 1 次提交
    • J
      locks: ensure that fl_owner is always initialized properly in flock and lease codepaths · 130d1f95
      Jeff Layton 提交于
      Currently, the fl_owner isn't set for flock locks. Some filesystems use
      byte-range locks to simulate flock locks and there is a common idiom in
      those that does:
      
          fl->fl_owner = (fl_owner_t)filp;
          fl->fl_start = 0;
          fl->fl_end = OFFSET_MAX;
      
      Since flock locks are generally "owned" by the open file description,
      move this into the common flock lock setup code. The fl_start and fl_end
      fields are already set appropriately, so remove the unneeded setting of
      that in flock ops in those filesystems as well.
      
      Finally, the lease code also sets the fl_owner as if they were owned by
      the process and not the open file description. This is incorrect as
      leases have the same ownership semantics as flock locks. Set them the
      same way. The lease code doesn't actually use the fl_owner value for
      anything, so this is more for consistency's sake than a bugfix.
      Reported-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: NJeff Layton <jlayton@poochiereds.net>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (Staging portion)
      Acked-by: NJ. Bruce Fields <bfields@fieldses.org>
      130d1f95
  12. 07 5月, 2014 5 次提交
  13. 08 4月, 2014 1 次提交
  14. 06 1月, 2014 1 次提交
  15. 25 10月, 2013 1 次提交
  16. 04 9月, 2013 1 次提交
    • A
      NFS avoid expired credential keys for buffered writes · dc24826b
      Andy Adamson 提交于
      We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
      context key) that is about to expire or has expired.  We currently will
      paint ourselves into a corner by returning success to the applciation
      for such a buffered WRITE, only to discover that we do not have permission when
      we attempt to flush the WRITE (and potentially associated COMMIT) to disk.
      
      Use the RPC layer credential key timeout and expire routines which use a
      a watermark, gss_key_expire_timeo. We test the key in nfs_file_write.
      
      If a WRITE is using a credential with a key that will expire within
      watermark seconds, flush the inode in nfs_write_end and send only
      NFS_FILE_SYNC WRITEs by adding nfs_ctx_key_to_expire to nfs_need_sync_write.
      Note that this results in single page NFS_FILE_SYNC WRITEs.
      Signed-off-by: NAndy Adamson <andros@netapp.com>
      [Trond: removed a pr_warn_ratelimited() for now]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      dc24826b
  17. 22 8月, 2013 1 次提交
  18. 04 7月, 2013 1 次提交
    • M
      fs: nfs: inform the VM about pages being committed or unstable · f919b196
      Mel Gorman 提交于
      VM page reclaim uses dirty and writeback page states to determine if
      flushers are cleaning pages too slowly and that page reclaim should
      stall waiting on flushers to catch up.  Page state in NFS is a bit more
      complex and a clean page can be unreclaimable due to being unstable
      which is effectively "dirty" from the perspective of the VM from reclaim
      context.  Similarly, if the inode is currently being committed then it's
      similar to being under writeback.
      
      This patch adds a is_dirty_writeback() handled for NFS that checks if a
      pages backing inode is being committed and should be accounted as
      writeback and if a page has private state indicating that it is
      effectively dirty.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Zlatko Calusic <zcalusic@bitsync.net>
      Cc: dormando <dormando@rydia.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f919b196
  19. 22 5月, 2013 1 次提交
    • L
      mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner 提交于
      Currently there is no way to truncate partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for file system truncate
      operation to work properly. However more file systems now support punch
      hole feature and it can benefit from mm supporting truncating page just
      up to the certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make a use of the new length argument implementing range invalidation.
      
      Actual file system implementations will follow except the file systems
      where the changes are really simple and should not change the behaviour
      in any way .Implementation for truncate_page_range() which will be able
      to accept page unaligned ranges will follow as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      d47992f8
  20. 09 4月, 2013 1 次提交
  21. 23 2月, 2013 1 次提交
  22. 18 12月, 2012 1 次提交
  23. 09 10月, 2012 1 次提交
    • K
      mm: kill vma flag VM_CAN_NONLINEAR · 0b173bc4
      Konstantin Khlebnikov 提交于
      Move actual pte filling for non-linear file mappings into the new special
      vma operation: ->remap_pages().
      
      Filesystems must implement this method to get non-linear mapping support,
      if it uses filemap_fault() then generic_file_remap_pages() can be used.
      
      Now device drivers can implement this method and obtain nonlinear vma support.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0b173bc4
  24. 29 9月, 2012 2 次提交
  25. 12 9月, 2012 1 次提交
  26. 01 8月, 2012 3 次提交
  27. 31 7月, 2012 2 次提交