1. 21 4月, 2017 1 次提交
  2. 20 9月, 2016 3 次提交
    • C
      SUNRPC: Separate buffer pointers for RPC Call and Reply messages · 68778945
      Chuck Lever 提交于
      For xprtrdma, the RPC Call and Reply buffers are involved in real
      I/O operations.
      
      To start with, the DMA direction of the I/O for a Call is opposite
      that of a Reply.
      
      In the current arrangement, the Reply buffer address is on a
      four-byte alignment just past the call buffer. Would be friendlier
      on some platforms if that was at a DMA cache alignment instead.
      
      Because the current arrangement allocates a single memory region
      which contains both buffers, the RPC Reply buffer often contains a
      page boundary in it when the Call buffer is large enough (which is
      frequent).
      
      It would be a little nicer for setting up DMA operations (and
      possible registration of the Reply buffer) if the two buffers were
      separated, well-aligned, and contained as few page boundaries as
      possible.
      
      Now, I could just pad out the single memory region used for the pair
      of buffers. But frequently that would mean a lot of unused space to
      ensure the Reply buffer did not have a page boundary.
      
      Add a separate pointer to rpc_rqst that points right to the RPC
      Reply buffer. This makes no difference to xprtsock, but it will help
      xprtrdma in subsequent patches.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      68778945
    • C
      SUNRPC: Generalize the RPC buffer release API · 3435c74a
      Chuck Lever 提交于
      xprtrdma needs to allocate the Call and Reply buffers separately.
      TBH, the reliance on using a single buffer for the pair of XDR
      buffers is transport implementation-specific.
      
      Instead of passing just the rq_buffer into the buf_free method, pass
      the task structure and let buf_free take care of freeing both
      XDR buffers at once.
      
      There's a micro-optimization here. In the common case, both
      xprt_release and the transport's buf_free method were checking if
      rq_buffer was NULL. Now the check is done only once per RPC.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      3435c74a
    • C
      SUNRPC: Generalize the RPC buffer allocation API · 5fe6eaa1
      Chuck Lever 提交于
      xprtrdma needs to allocate the Call and Reply buffers separately.
      TBH, the reliance on using a single buffer for the pair of XDR
      buffers is transport implementation-specific.
      
      Transports that want to allocate separate Call and Reply buffers
      will ignore the "size" argument anyway.  Don't bother passing it.
      
      The buf_alloc method can't return two pointers. Instead, make the
      method's return value an error code, and set the rq_buffer pointer
      in the method itself.
      
      This gives call_allocate an opportunity to terminate an RPC instead
      of looping forever when a permanent problem occurs. If a request is
      just bogus, or the transport is in a state where it can't allocate
      resources for any request, there needs to be a way to kill the RPC
      right there and not loop.
      
      This immediately fixes a rare problem in the backchannel send path,
      which loops if the server happens to send a CB request whose
      call+reply size is larger than a page (which it shouldn't do yet).
      
      One more issue: looks like xprt_inject_disconnect was incorrectly
      placed in the failure path in call_allocate. It needs to be in the
      success path, as it is for other call-sites.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      5fe6eaa1
  3. 14 6月, 2016 2 次提交
  4. 06 2月, 2016 1 次提交
  5. 14 12月, 2015 1 次提交
  6. 23 9月, 2015 1 次提交
  7. 18 9月, 2015 1 次提交
  8. 05 9月, 2015 1 次提交
  9. 13 3月, 2015 1 次提交
  10. 12 3月, 2015 1 次提交
  11. 25 1月, 2015 2 次提交
  12. 25 11月, 2014 2 次提交
  13. 25 9月, 2014 1 次提交
  14. 16 7月, 2014 1 次提交
    • N
      sched: Allow wait_on_bit_action() functions to support a timeout · c1221321
      NeilBrown 提交于
      It is currently not possible for various wait_on_bit functions
      to implement a timeout.
      
      While the "action" function that is called to do the waiting
      could certainly use schedule_timeout(), there is no way to carry
      forward the remaining timeout after a false wake-up.
      As false-wakeups a clearly possible at least due to possible
      hash collisions in bit_waitqueue(), this is a real problem.
      
      The 'action' function is currently passed a pointer to the word
      containing the bit being waited on.  No current action functions
      use this pointer.  So changing it to something else will be a
      little noisy but will have no immediate effect.
      
      This patch changes the 'action' function to take a pointer to
      the "struct wait_bit_key", which contains a pointer to the word
      containing the bit so nothing is really lost.
      
      It also adds a 'private' field to "struct wait_bit_key", which
      is initialized to zero.
      
      An action function can now implement a timeout with something
      like
      
      static int timed_out_waiter(struct wait_bit_key *key)
      {
      	unsigned long waited;
      	if (key->private == 0) {
      		key->private = jiffies;
      		if (key->private == 0)
      			key->private -= 1;
      	}
      	waited = jiffies - key->private;
      	if (waited > 10 * HZ)
      		return -EAGAIN;
      	schedule_timeout(waited - 10 * HZ);
      	return 0;
      }
      
      If any other need for context in a waiter were found it would be
      easy to use ->private for some other purpose, or even extend
      "struct wait_bit_key".
      
      My particular need is to support timeouts in nfs_release_page()
      to avoid deadlocks with loopback mounted NFS.
      
      While wait_on_bit_timeout() would be a cleaner interface, it
      will not meet my need.  I need the timeout to be sensitive to
      the state of the connection with the server, which could change.
       So I need to use an 'action' interface.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c1221321
  15. 29 5月, 2014 1 次提交
    • D
      net, sunrpc: suppress allocation warning in rpc_malloc() · c6c8fe79
      David Rientjes 提交于
      rpc_malloc() allocates with GFP_NOWAIT without making any attempt at
      reclaim so it easily fails when low on memory.  This ends up spamming the
      kernel log:
      
      SLAB: Unable to allocate memory on node 0 (gfp=0x4000)
        cache: kmalloc-8192, object size: 8192, order: 1
        node 0: slabs: 207/207, objs: 207/207, free: 0
      rekonq: page allocation failure: order:1, mode:0x204000
      CPU: 2 PID: 14321 Comm: rekonq Tainted: G           O  3.15.0-rc3-12.gfc9498b-desktop+ #6
      Hardware name: System manufacturer System Product Name/M4A785TD-V EVO, BIOS 2105    07/23/2010
       0000000000000000 ffff880010ff17d0 ffffffff815e693c 0000000000204000
       ffff880010ff1858 ffffffff81137bd2 0000000000000000 0000001000000000
       ffff88011ffebc38 0000000000000001 0000000000204000 ffff88011ffea000
      Call Trace:
       [<ffffffff815e693c>] dump_stack+0x4d/0x6f
       [<ffffffff81137bd2>] warn_alloc_failed+0xd2/0x140
       [<ffffffff8113be19>] __alloc_pages_nodemask+0x7e9/0xa30
       [<ffffffff811824a8>] kmem_getpages+0x58/0x140
       [<ffffffff81183de6>] fallback_alloc+0x1d6/0x210
       [<ffffffff81183be3>] ____cache_alloc_node+0x123/0x150
       [<ffffffff81185953>] __kmalloc+0x203/0x490
       [<ffffffffa06b0ee2>] rpc_malloc+0x32/0xa0 [sunrpc]
       [<ffffffffa06a6999>] call_allocate+0xb9/0x170 [sunrpc]
       [<ffffffffa06b19d8>] __rpc_execute+0x88/0x460 [sunrpc]
       [<ffffffffa06b2da9>] rpc_execute+0x59/0xc0 [sunrpc]
       [<ffffffffa06a932b>] rpc_run_task+0x6b/0x90 [sunrpc]
       [<ffffffffa077b5c1>] nfs4_call_sync_sequence+0x51/0x80 [nfsv4]
       [<ffffffffa077d45d>] _nfs4_do_setattr+0x1ed/0x280 [nfsv4]
       [<ffffffffa0782a72>] nfs4_do_setattr+0x72/0x180 [nfsv4]
       [<ffffffffa078334c>] nfs4_proc_setattr+0xbc/0x140 [nfsv4]
       [<ffffffffa074a7e8>] nfs_setattr+0xd8/0x240 [nfs]
       [<ffffffff811baa71>] notify_change+0x231/0x380
       [<ffffffff8119cf5c>] chmod_common+0xfc/0x120
       [<ffffffff8119df80>] SyS_chmod+0x40/0x90
       [<ffffffff815f4cfd>] system_call_fastpath+0x1a/0x1f
      ...
      
      If the allocation fails, simply return NULL and avoid spamming the kernel
      log.
      Reported-by: NMarc Dietrich <marvin24@gmx.de>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      c6c8fe79
  16. 21 3月, 2014 1 次提交
  17. 05 9月, 2013 1 次提交
  18. 07 6月, 2013 3 次提交
  19. 23 5月, 2013 1 次提交
    • T
      SUNRPC: Prevent an rpc_task wakeup race · a3c3cac5
      Trond Myklebust 提交于
      The lockless RPC_IS_QUEUED() test in __rpc_execute means that we need to
      be careful about ordering the calls to rpc_test_and_set_running(task) and
      rpc_clear_queued(task). If we get the order wrong, then we may end up
      testing the RPC_TASK_RUNNING flag after __rpc_execute() has looped
      and changed the state of the rpc_task.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
      a3c3cac5
  20. 12 5月, 2013 1 次提交
    • C
      freezer: add unsafe versions of freezable helpers for NFS · 416ad3c9
      Colin Cross 提交于
      NFS calls the freezable helpers with locks held, which is unsafe
      and will cause lockdep warnings when 6aa97070 "lockdep: check
      that no locks held at freeze time" is reapplied (it was reverted
      in dbf520a9).  NFS shouldn't be doing this, but it has
      long-running syscalls that must hold a lock but also shouldn't
      block suspend.  Until NFS freeze handling is rewritten to use a
      signal to exit out of the critical section, add new *_unsafe
      versions of the helpers that will not run the lockdep test when
      6aa97070 is reapplied, and call them from NFS.
      
      In practice the likley result of holding the lock while freezing
      is that a second task blocked on the lock will never freeze,
      aborting suspend, but it is possible to manufacture a case using
      the cgroup freezer, the lock, and the suspend freezer to create
      a deadlock.  Silencing the lockdep warning here will allow
      problems to be found in other drivers that may have a more
      serious deadlock risk, and prevent new problems from being added.
      Signed-off-by: NColin Cross <ccross@android.com>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      416ad3c9
  21. 25 3月, 2013 1 次提交
  22. 31 1月, 2013 1 次提交
  23. 09 1月, 2013 1 次提交
    • T
      SUNRPC: Ensure we release the socket write lock if the rpc_task exits early · 87ed5003
      Trond Myklebust 提交于
      If the rpc_task exits while holding the socket write lock before it has
      allocated an rpc slot, then the usual mechanism for releasing the write
      lock in xprt_release() is defeated.
      
      The problem occurs if the call to xprt_lock_write() initially fails, so
      that the rpc_task is put on the xprt->sending wait queue. If the task
      exits after being assigned the lock by __xprt_lock_write_func, but
      before it has retried the call to xprt_lock_and_alloc_slot(), then
      it calls xprt_release() while holding the write lock, but will
      immediately exit due to the test for task->tk_rqstp != NULL.
      Reported-by: NChris Perl <chris.perl@gmail.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org [>= 3.1]
      87ed5003
  24. 05 1月, 2013 1 次提交
  25. 06 12月, 2012 2 次提交
  26. 05 11月, 2012 4 次提交
  27. 29 9月, 2012 1 次提交
  28. 01 8月, 2012 1 次提交
    • M
      nfs: enable swap on NFS · a564b8f0
      Mel Gorman 提交于
      Implement the new swapfile a_ops for NFS and hook up ->direct_IO.  This
      will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
      PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
      ->connect() method.
      
      PF_MEMALLOC should allow the allocation of struct socket and related
      objects and the early (re)setting of SOCK_MEMALLOC should allow us to
      receive the packets required for the TCP connection buildup.
      
      [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
      [dfeng@redhat.com: Fix handling of multiple swap files]
      [a.p.zijlstra@chello.nl: Original patch]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a564b8f0
  29. 31 7月, 2012 1 次提交
    • J
      nfs: skip commit in releasepage if we're freeing memory for fs-related reasons · 5cf02d09
      Jeff Layton 提交于
      We've had some reports of a deadlock where rpciod ends up with a stack
      trace like this:
      
          PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
           #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
           #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
           #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
           #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
           #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
           #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
           #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
           #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
           #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
           #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
          #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
          #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
          #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
          #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
          #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
          #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
          #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
          #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
          #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
          #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
          #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
          #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
          #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
          #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
          #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
          #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
      
      rpciod is trying to allocate memory for a new socket to talk to the
      server. The VM ends up calling ->releasepage to get more memory, and it
      tries to do a blocking commit. That commit can't succeed however without
      a connected socket, so we deadlock.
      
      Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
      socket allocation, and having nfs_release_page check for that flag when
      deciding whether to do a commit call. Also, set PF_FSTRANS
      unconditionally in rpc_async_schedule since that function can also do
      allocations sometimes.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
      5cf02d09