1. 28 9月, 2016 14 次提交
  2. 23 9月, 2016 15 次提交
  3. 20 9月, 2016 11 次提交
    • C
      nfs: cover ->migratepage with CONFIG_MIGRATION · f844cd0d
      Chao Yu 提交于
      It will be more clean to use CONFIG_MIGRATION to cover nfs' private
      .migratepage in nfs_file_aops like we do in other part of nfs
      operations.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      f844cd0d
    • D
      sunrpc: fix write space race causing stalls · d48f9ce7
      David Vrabel 提交于
      Write space becoming available may race with putting the task to sleep
      in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
      race does not work.
      
      This (edited) partial trace illustrates the problem:
      
         [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
         [2] xs_write_space <-xs_tcp_write_space
         [3] xprt_write_space <-xs_write_space
         [4] rpc_task_sleep: task:43546@5 ...
         [5] xs_write_space <-xs_tcp_write_space
      
      [1] Task 43546 runs but is out of write space.
      
      [2] Space becomes available, xs_write_space() clears the
          SOCKWQ_ASYNC_NOSPACE bit.
      
      [3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
          this has not yet been queued and the wake up is lost.
      
      [4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
          which queues task 43546.
      
      [5] The call to sk->sk_write_space() at the end of xs_nospace() (which
          is supposed to handle the above race) does not call
          xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
          thus the task is not woken.
      
      Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace()
      so the second call to sk->sk_write_space() calls xprt_write_space().
      Suggested-by: NTrond Myklebust <trondmy@primarydata.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      cc: stable@vger.kernel.org # 4.4
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      d48f9ce7
    • J
      pnfs: add a new mechanism to select a layout driver according to an ordered list · ca440c38
      Jeff Layton 提交于
      Currently, the layout driver selection code always chooses the first one
      from the list. That's not really ideal however, as the server can send
      the list of layout types in any order that it likes. It's up to the
      client to select the best one for its needs.
      
      This patch adds an ordered list of preferred driver types and has the
      selection code sort the list of available layout drivers according to it.
      Any unrecognized layout type is sorted to the end of the list.
      
      For now, the order of preference is hardcoded, but it should be possible
      to make this configurable in the future.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NJ. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      ca440c38
    • C
      xprtrdma: Eliminate rpcrdma_receive_worker() · 496b77a5
      Chuck Lever 提交于
      Clean up: the extra layer of indirection doesn't add value.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      496b77a5
    • C
      xprtrdma: Rename rpcrdma_receive_wc() · 1519e969
      Chuck Lever 提交于
      Clean up: When converting xprtrdma to use the new CQ API, I missed a
      spot. The naming convention elsewhere is:
      
        {svc_rdma,rpcrdma}_wc_{operation}
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      1519e969
    • C
      xprtrmda: Report address of frmr, not mw · eeb30613
      Chuck Lever 提交于
      Tie frwr debugging messages together by always reporting the address
      of the frwr.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      eeb30613
    • C
      xprtrdma: Support larger inline thresholds · 44829d02
      Chuck Lever 提交于
      The Version One default inline threshold is still 1KB. But allow
      testing with thresholds up to 64KB.
      
      This maximum is somewhat arbitrary. There's no fundamental
      architectural limit I'm aware of, but it's good to keep the size of
      Receive buffers reasonable. Now that Send can use a s/g list, a
      Send buffer is only as large as each RPC requires. Receive buffers
      are always the size of the inline threshold, however.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      44829d02
    • C
      xprtrdma: Use gathered Send for large inline messages · 655fec69
      Chuck Lever 提交于
      An RPC Call message that is sent inline but that has a data payload
      (ie, one or more items in rq_snd_buf's page list) must be "pulled
      up:"
      
      - call_allocate has to reserve enough RPC Call buffer space to
      accommodate the data payload
      
      - call_transmit has to memcopy the rq_snd_buf's page list and tail
      into its head iovec before it is sent
      
      As the inline threshold is increased beyond its current 1KB default,
      however, this means data payloads of more than a few KB are copied
      by the host CPU. For example, if the inline threshold is increased
      just to 4KB, then NFS WRITE requests up to 4KB would involve a
      memcpy of the NFS WRITE's payload data into the RPC Call buffer.
      This is an undesirable amount of participation by the host CPU.
      
      The inline threshold may be much larger than 4KB in the future,
      after negotiation with a peer server.
      
      Instead of copying the components of rq_snd_buf into its head iovec,
      construct a gather list of these components, and send them all in
      place. The same approach is already used in the Linux server's
      RPC-over-RDMA reply path.
      
      This mechanism also eliminates the need for rpcrdma_tail_pullup,
      which is used to manage the XDR pad and trailing inline content when
      a Read list is present.
      
      This requires that the pages in rq_snd_buf's page list be DMA-mapped
      during marshaling, and unmapped when a data-bearing RPC is
      completed. This is slightly less efficient for very small I/O
      payloads, but significantly more efficient as data payload size and
      inline threshold increase past a kilobyte.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      655fec69
    • C
      xprtrdma: Basic support for Remote Invalidation · c8b920bb
      Chuck Lever 提交于
      Have frwr's ro_unmap_sync recognize an invalidated rkey that appears
      as part of a Receive completion. Local invalidation can be skipped
      for that rkey.
      
      Use an out-of-band signaling mechanism to indicate to the server
      that the client is prepared to receive RDMA Send With Invalidate.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      c8b920bb
    • C
      xprtrdma: Client-side support for rpcrdma_connect_private · 87cfb9a0
      Chuck Lever 提交于
      Send an RDMA-CM private message on connect, and look for one during
      a connection-established event.
      
      Both sides can communicate their various implementation limits.
      Implementations that don't support this sideband protocol ignore it.
      
      Once the client knows the server's inline threshold maxima, it can
      adjust the use of Reply chunks, and eliminate most use of Position
      Zero Read chunks. Moderately-sized I/O can be done using a pure
      inline RDMA Send instead of RDMA operations that require memory
      registration.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      87cfb9a0
    • C
      rpcrdma: RDMA/CM private message data structure · ff06bd19
      Chuck Lever 提交于
      Introduce data structure used by both client and server to exchange
      implementation details during RDMA/CM connection establishment.
      
      This is an experimental out-of-band exchange between Linux
      RPC-over-RDMA Version One implementations, replacing the deprecated
      CCP (see RFC 5666bis). The purpose of this extension is to enable
      prototyping of features that might be introduced in a subsequent
      version of RPC-over-RDMA.
      
      Suggested by Christoph Hellwig and Devesh Sharma.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      ff06bd19