1. 20 9月, 2016 1 次提交
  2. 04 9月, 2016 1 次提交
  3. 20 7月, 2016 1 次提交
    • S
      sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags · ce52914e
      Scott Mayhew 提交于
      A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
      not really safe to use the the generic_cred->acred->ac_flags to store
      the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
      KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
      KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
      with the auth_cred to be in a state where they're perpetually doing 4K
      NFS_FILE_SYNC writes.
      
      This can be reproduced as follows:
      
      1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
      They do not need to be the same export, nor do they even need to be from
      the same NFS server.  Also, v3 is fine.
      $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
      $ sudo mount -o v3,sec=sys server2:/export /mnt/sys
      
      2. As the normal user, before accessing the kerberized mount, kinit with
      a short lifetime (but not so short that renewing the ticket would leave
      you within the 4-minute window again by the time the original ticket
      expires), e.g.
      $ kinit -l 10m -r 60m
      
      3. Do some I/O to the kerberized mount and verify that the writes are
      wsize, UNSTABLE:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      4. Wait until you're within 4 minutes of key expiry, then do some more
      I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
      set.  Verify that the writes are 4K, FILE_SYNC:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      5. Now do some I/O to the sec=sys mount.  This will cause
      RPC_CRED_NO_CRKEY_TIMEOUT to be set:
      $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1
      
      6. Writes for that user will now be permanently 4K, FILE_SYNC for that
      user, regardless of which mount is being written to, until you reboot
      the client.  Renewing the kerberos ticket (assuming it hasn't already
      expired) will have no effect.  Grabbing a new kerberos ticket at this
      point will have no effect either.
      
      Move the flag to the auth->au_flags field (which is currently unused)
      and rename it slightly to reflect that it's no longer associated with
      the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
      rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
      add the inode to the arg list of nfs_ctx_key_to_expire so we can
      determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
      Signed-off-by: NScott Mayhew <smayhew@redhat.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      ce52914e
  4. 06 7月, 2016 5 次提交
  5. 22 6月, 2016 4 次提交
  6. 02 5月, 2016 1 次提交
  7. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  8. 17 3月, 2016 2 次提交
  9. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  10. 08 1月, 2016 1 次提交
  11. 01 1月, 2016 1 次提交
  12. 29 12月, 2015 1 次提交
  13. 07 11月, 2015 1 次提交
    • M
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman 提交于
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  14. 23 10月, 2015 1 次提交
  15. 08 9月, 2015 1 次提交
  16. 18 8月, 2015 2 次提交
  17. 11 6月, 2015 1 次提交
    • J
      sunrpc: keep a count of swapfiles associated with the rpc_clnt · 3c87ef6e
      Jeff Layton 提交于
      Jerome reported seeing a warning pop when working with a swapfile on
      NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
      holding the rcu_read_lock and that function can sleep.
      
      To fix that, we need to take a reference to the xprt while holding the
      rcu_read_lock, set the socket up for swapping and then drop that
      reference. But, xprt_put is not exported and having NFS deal with the
      underlying xprt is a bit of layering violation anyway.
      
      Fix this by adding a set of activate/deactivate functions that take a
      rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
      nfs_swap_deactivate call those.
      
      Also, add a per-rpc_clnt atomic counter to keep track of the number of
      active swapfiles associated with it. When the counter does a 0->1
      transition, we enable swapping on the xprt, when we do a 1->0 transition
      we disable swapping on it.
      
      This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
      flag. If non-swapper and swapper clnts are sharing a xprt, then we only
      need to flag the tasks from the swapper clnt with that flag.
      Acked-by: NMel Gorman <mgorman@suse.de>
      Reported-by: NJerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
      Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      3c87ef6e
  18. 16 4月, 2015 1 次提交
  19. 12 4月, 2015 2 次提交
  20. 28 3月, 2015 2 次提交
  21. 26 3月, 2015 1 次提交
  22. 04 3月, 2015 2 次提交
    • T
      NFS: Don't write enable new pages while an invalidation is proceeding · ef070dcb
      Trond Myklebust 提交于
      nfs_vm_page_mkwrite() should wait until the page cache invalidation
      is finished. This is the second patch in a 2 patch series to deprecate
      the NFS client's reliance on nfs_release_page() in the context of
      nfs_invalidate_mapping().
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      ef070dcb
    • T
      NFS: Fix a regression in the read() syscall · 874f9463
      Trond Myklebust 提交于
      When invalidating the page cache for a regular file, we want to first
      sync all dirty data to disk and then call invalidate_inode_pages2().
      The latter relies on nfs_launder_page() and nfs_release_page() to deal
      respectively with dirty pages, and unstable written pages.
      
      When commit 95905446 ("NFS: avoid deadlocks with loop-back mounted
      NFS filesystems.") changed the behaviour of nfs_release_page(), then it
      made it possible for invalidate_inode_pages2() to fail with an EBUSY.
      Unfortunately, that error is then propagated back to read().
      
      Let's therefore work around the problem for now by protecting the call
      to sync the data and invalidate_inode_pages2() so that they are atomic
      w.r.t. the addition of new writes.
      Later on, we can revisit whether or not we still need nfs_launder_page()
      and nfs_release_page().
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      874f9463
  23. 02 3月, 2015 1 次提交
  24. 11 2月, 2015 1 次提交
  25. 14 10月, 2014 1 次提交
  26. 25 9月, 2014 3 次提交
    • N
      NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() · 1aff5256
      NeilBrown 提交于
      Now that nfs_release_page() doesn't block indefinitely, other deadlock
      avoidance mechanisms aren't needed.
       - it doesn't hurt for kswapd to block occasionally.  If it doesn't
         want to block it would clear __GFP_WAIT.  The current_is_kswapd()
         was only added to avoid deadlocks and we have a new approach for
         that.
       - memory allocation in the SUNRPC layer can very rarely try to
         ->releasepage() a page it is trying to handle.  The deadlock
         is removed as nfs_release_page() doesn't block indefinitely.
      
      So we don't need to set PF_FSTRANS for sunrpc network operations any
      more.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      1aff5256
    • N
      NFS: avoid waiting at all in nfs_release_page when congested. · 353db796
      NeilBrown 提交于
      If nfs_release_page() is called on a sequence of pages which are all
      in the same file which is blocked on COMMIT, each page could
      contribute a 1 second delay which could be come excessive.  I have
      seen delays of as much as 208 seconds.
      
      To keep the delay to one second, mark the bdi as write-congested
      if the commit didn't finished.  Once it does finish, the
      write-congested flag will be cleared by nfs_commit_release_pages().
      
      With this, the longest total delay in try_to_free_pages that I have
      seen is under 3 seconds.  With no waiting in nfs_release_page at all
      I have seen delays of nearly 1.5 seconds.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      353db796
    • N
      NFS: avoid deadlocks with loop-back mounted NFS filesystems. · 95905446
      NeilBrown 提交于
      Support for loop-back mounted NFS filesystems is useful when NFS is
      used to access shared storage in a high-availability cluster.
      
      If the node running the NFS server fails, some other node can mount the
      filesystem and start providing NFS service.  If that node already had
      the filesystem NFS mounted, it will now have it loop-back mounted.
      
      nfsd can suffer a deadlock when allocating memory and entering direct
      reclaim.
      While direct reclaim does not write to the NFS filesystem it can send
      and wait for a COMMIT through nfs_release_page().
      
      This patch modifies nfs_release_page() to wait a limited time for the
      commit to complete - one second.  If the commit doesn't complete
      in this time, nfs_release_page() will fail.  This means it might now
      fail in some cases where it wouldn't before.  These cases are only
      when 'gfp' includes '__GFP_WAIT'.
      
      nfs_release_page() is only called by try_to_release_page(), and that
      can only be called on an NFS page with required 'gfp' flags from
       - page_cache_pipe_buf_steal() in splice.c
       - shrink_page_list() in vmscan.c
       - invalidate_inode_pages2_range() in truncate.c
      
      The first two handle failure quite safely.  The last is only called
      after ->launder_page() has been called, and that will have waited
      for the commit to finish already.
      
      So aborting if the commit takes longer than 1 second is perfectly safe.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      95905446