1. 30 Nov 2016, 2 commits
  2. 20 Sep 2016, 12 commits
    • xprtrdma: Support larger inline thresholds · 44829d02
      Authored by Chuck Lever
      The Version One default inline threshold is still 1KB. But allow
      testing with thresholds up to 64KB.
      
      This maximum is somewhat arbitrary. There's no fundamental
      architectural limit I'm aware of, but it's good to keep the size of
      Receive buffers reasonable. Now that Send can use a s/g list, a
      Send buffer is only as large as each RPC requires. Receive buffers
      are always the size of the inline threshold, however.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
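
      A minimal sketch of the kind of bounds check this enables; the constant and
      identifier names below are illustrative, not the upstream ones:

      #include <linux/kernel.h>

      /* RPC-over-RDMA Version One default and an assumed 64KB testing ceiling */
      #define SKETCH_DEF_INLINE_THRESH        1024
      #define SKETCH_MAX_INLINE_THRESH        65536

      /* Clamp an administrator-supplied inline threshold into the allowed range */
      static unsigned int sketch_clamp_inline(unsigned int requested)
      {
              return clamp_t(unsigned int, requested,
                             SKETCH_DEF_INLINE_THRESH, SKETCH_MAX_INLINE_THRESH);
      }
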
    • xprtrdma: Use gathered Send for large inline messages · 655fec69
      Authored by Chuck Lever
      An RPC Call message that is sent inline but that has a data payload
      (ie, one or more items in rq_snd_buf's page list) must be "pulled
      up:"
      
      - call_allocate has to reserve enough RPC Call buffer space to
      accommodate the data payload
      
      - call_transmit has to memcpy the rq_snd_buf's page list and tail
      into its head iovec before it is sent
      
      As the inline threshold is increased beyond its current 1KB default,
      however, this means data payloads of more than a few KB are copied
      by the host CPU. For example, if the inline threshold is increased
      just to 4KB, then NFS WRITE requests up to 4KB would involve a
      memcpy of the NFS WRITE's payload data into the RPC Call buffer.
      This is an undesirable amount of participation by the host CPU.
      
      The inline threshold may be much larger than 4KB in the future,
      after negotiation with a peer server.
      
      Instead of copying the components of rq_snd_buf into its head iovec,
      construct a gather list of these components, and send them all in
      place. The same approach is already used in the Linux server's
      RPC-over-RDMA reply path.
      
      This mechanism also eliminates the need for rpcrdma_tail_pullup,
      which is used to manage the XDR pad and trailing inline content when
      a Read list is present.
      
      This requires that the pages in rq_snd_buf's page list be DMA-mapped
      during marshaling, and unmapped when a data-bearing RPC is
      completed. This is slightly less efficient for very small I/O
      payloads, but significantly more efficient as data payload size and
      inline threshold increase past a kilobyte.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
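
      A condensed sketch of the gathered Send idea. The verbs calls (ib_dma_map_*,
      ib_post_send) are the real API; the SGE cap, the omission of the tail iovec,
      and the lack of error handling and completion-time unmapping are
      simplifications:

      #include <rdma/ib_verbs.h>
      #include <linux/sunrpc/xdr.h>
      #include <linux/string.h>
      #include <linux/mm.h>

      enum { SKETCH_MAX_SEND_SGES = 16 };     /* assumed cap on Send SGEs */

      static int sketch_post_gathered_send(struct ib_qp *qp, struct ib_device *dev,
                                           struct xdr_buf *xdr, u32 lkey)
      {
              struct ib_sge sge[SKETCH_MAX_SEND_SGES];
              struct ib_send_wr wr, *bad_wr;
              struct page **ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);
              unsigned int page_base = offset_in_page(xdr->page_base);
              unsigned int remaining = xdr->page_len;
              int i, n = 0;

              /* Segment 0: the head iovec (transport header + RPC call header) */
              sge[n].addr   = ib_dma_map_single(dev, xdr->head[0].iov_base,
                                                xdr->head[0].iov_len, DMA_TO_DEVICE);
              sge[n].length = xdr->head[0].iov_len;
              sge[n].lkey   = lkey;
              n++;

              /* Segments 1..N: the data payload pages, sent in place (no pull-up) */
              for (i = 0; remaining && n < SKETCH_MAX_SEND_SGES; i++, n++) {
                      unsigned int len = min_t(unsigned int, remaining,
                                               PAGE_SIZE - page_base);

                      sge[n].addr   = ib_dma_map_page(dev, ppages[i], page_base,
                                                      len, DMA_TO_DEVICE);
                      sge[n].length = len;
                      sge[n].lkey   = lkey;
                      remaining -= len;
                      page_base = 0;
              }

              memset(&wr, 0, sizeof(wr));
              wr.opcode     = IB_WR_SEND;
              wr.send_flags = IB_SEND_SIGNALED;
              wr.sg_list    = sge;
              wr.num_sge    = n;
              return ib_post_send(qp, &wr, &bad_wr);
      }

      Each mapped segment has to be unmapped again when the Send completes, which
      is why data-bearing requests now track their DMA mappings until the RPC is
      done.
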
    • xprtrdma: Basic support for Remote Invalidation · c8b920bb
      Authored by Chuck Lever
      Have frwr's ro_unmap_sync recognize an invalidated rkey that appears
      as part of a Receive completion. Local invalidation can be skipped
      for that rkey.
      
      Use an out-of-band signaling mechanism to indicate to the server
      that the client is prepared to receive RDMA Send With Invalidate.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
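
      A sketch of the receive-side check. IB_WC_WITH_INVALIDATE and
      wc->ex.invalidate_rkey are the real verbs fields; the surrounding MR
      bookkeeping is only indicated in comments:

      #include <rdma/ib_verbs.h>

      static void sketch_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
      {
              u32 inv_rkey = 0;

              if (wc->status != IB_WC_SUCCESS)
                      return;

              /* The peer used Send With Invalidate: this rkey is already invalid */
              if (wc->wc_flags & IB_WC_WITH_INVALIDATE)
                      inv_rkey = wc->ex.invalidate_rkey;

              /* Later, in ro_unmap_sync: for each MR registered for this RPC,
               * skip posting IB_WR_LOCAL_INV when its rkey equals inv_rkey. */
              (void)inv_rkey;
      }
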
    • xprtrdma: Eliminate "ia" argument in rpcrdma_{alloc, free}_regbuf · 13650c23
      Authored by Chuck Lever
      Clean up. The "ia" argument is no longer used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Replace DMA_BIDIRECTIONAL · 99ef4db3
      Authored by Chuck Lever
      The use of DMA_BIDIRECTIONAL is discouraged by DMA-API.txt.
      Fortunately, xprtrdma now knows which direction I/O is going as
      soon as it allocates each regbuf.
      
      The RPC Call and Reply buffers are no longer the same regbuf. They
      can each be labeled correctly now. The RPC Reply buffer is never
      part of either a Send or Receive WR, but it can be part of a Reply
      chunk, which is mapped and registered via ->ro_map. So it is not
      DMA mapped when it is allocated (DMA_NONE), to avoid a double-
      mapping.
      
      Since Receive buffers are no longer DMA_BIDIRECTIONAL and their
      contents are never modified by the host CPU, DMA-API-HOWTO.txt
      suggests that a DMA sync before posting each buffer should be
      unnecessary. (See my_card_interrupt_handler).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
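
      A simplified illustration of per-regbuf DMA directions; the structure and
      helper are stand-ins, while ib_dma_map_single, ib_dma_mapping_error and the
      dma_data_direction values are the real APIs:

      #include <rdma/ib_verbs.h>
      #include <linux/errno.h>

      struct sketch_regbuf {                  /* simplified stand-in for a regbuf */
              void                    *base;
              size_t                  len;
              enum dma_data_direction dir;    /* fixed at allocation time */
              u64                     dma_addr;
      };

      static int sketch_regbuf_map(struct ib_device *dev, struct sketch_regbuf *rb)
      {
              /* Reply buffers are DMA_NONE here: they are mapped later by
               * ->ro_map when they become part of a Reply chunk. */
              if (rb->dir == DMA_NONE)
                      return 0;

              rb->dma_addr = ib_dma_map_single(dev, rb->base, rb->len, rb->dir);
              return ib_dma_mapping_error(dev, rb->dma_addr) ? -EIO : 0;
      }

      /* Directions chosen at allocation:
       *   Send buffers     DMA_TO_DEVICE
       *   Receive buffers  DMA_FROM_DEVICE
       *   Reply buffers    DMA_NONE (registered via ->ro_map instead)
       */
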
    • xprtrdma: Use smaller buffers for RPC-over-RDMA headers · 08cf2efd
      Authored by Chuck Lever
      Commit 94931746 ("xprtrdma: Limit number of RDMA segments in
      RPC-over-RDMA headers") capped the number of chunks that may appear
      in RPC-over-RDMA headers. The maximum header size can be estimated
      and fixed to avoid allocating buffer space that is never used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
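
      As an illustration of the estimate, a worst-case header size can be computed
      from the segment cap. The per-field sizes below follow the RPC-over-RDMA wire
      format in general terms, but the cap and the exact worst-case layout are
      assumptions, not the values used upstream:

      #include <linux/types.h>

      enum {
              /* fixed part of the transport header: xid, vers, credits, proc */
              SKETCH_HDR_FIXED  = 4 * sizeof(__be32),
              /* one Read list entry: item discriminator + position + handle +
               * length + 64-bit offset */
              SKETCH_READ_SEG   = 4 * sizeof(__be32) + sizeof(__be64),
              /* one plain segment in a Reply chunk: handle + length + offset */
              SKETCH_PLAIN_SEG  = 2 * sizeof(__be32) + sizeof(__be64),
      };

      static size_t sketch_max_hdr_size(unsigned int max_segs)
      {
              return SKETCH_HDR_FIXED +
                     max_segs * SKETCH_READ_SEG + sizeof(__be32) + /* Read list */
                     sizeof(__be32) +                         /* empty Write list */
                     2 * sizeof(__be32) +                     /* Reply chunk header */
                     max_segs * SKETCH_PLAIN_SEG;
      }

      With a single-digit segment cap this comes to a few hundred bytes, so the
      header regbuf no longer needs to be speculatively large.
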
    • xprtrdma: Initialize separate RPC call and reply buffers · 9c40c49f
      Authored by Chuck Lever
      RPC-over-RDMA needs to separate its RPC call and reply buffers.
      
       o When an RPC Call is sent, rq_snd_buf is DMA mapped for an RDMA
         Send operation using DMA_TO_DEVICE
      
       o If the client expects a large RPC reply, it DMA maps rq_rcv_buf
         as part of a Reply chunk using DMA_FROM_DEVICE
      
      The two mappings are for data movement in opposite directions.
      
      DMA-API.txt suggests that if these mappings share a DMA cacheline,
      bad things can happen. This could occur in the final bytes of
      rq_snd_buf and the first bytes of rq_rcv_buf if the two buffers
      happen to share a DMA cacheline.
      
      On x86_64 the cacheline size is typically 8 bytes, and RPC call
      messages are usually much smaller than the send buffer, so this
      hasn't been a noticeable problem. But the DMA cacheline size can be
      larger on other platforms.
      
      Also, often rq_rcv_buf starts most of the way into a page, thus
      an additional RDMA segment is needed to map and register the end of
      that buffer. Try to avoid that scenario to reduce the cost of
      registering and invalidating Reply chunks.
      
      Instead of carrying a single regbuf that covers both rq_snd_buf and
      rq_rcv_buf, each struct rpcrdma_req now carries one regbuf for
      rq_snd_buf and one regbuf for rq_rcv_buf.
      
      Some incidental changes worth noting:
      
      - To clear out some spaghetti, refactor xprt_rdma_allocate.
      - The value stored in rg_size is the same as the value stored in
        the iov.length field, so eliminate rg_size
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
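
      The shape of the per-request buffers after this change, sketched with names
      that approximate but are not guaranteed to match the upstream structure:

      struct rpcrdma_regbuf;                  /* opaque here */

      struct sketch_rpcrdma_req {
              struct rpcrdma_regbuf   *rl_rdmabuf;   /* RPC-over-RDMA header,
                                                      * DMA_TO_DEVICE */
              struct rpcrdma_regbuf   *rl_sendbuf;   /* backs rq_snd_buf,
                                                      * DMA_TO_DEVICE */
              struct rpcrdma_regbuf   *rl_recvbuf;   /* backs rq_rcv_buf, DMA_NONE
                                                      * until registered as a
                                                      * Reply chunk */
      };

      Each buffer can now be sized, aligned, and DMA-mapped independently, which is
      what the SUNRPC changes below make possible.
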
    • SUNRPC: Add a transport-specific private field in rpc_rqst · 5a6d1db4
      Authored by Chuck Lever
      Currently there's a hidden and indirect mechanism for finding the
      rpcrdma_req that goes with an rpc_rqst. It depends on getting from
      the rq_buffer pointer in struct rpc_rqst to the struct
      rpcrdma_regbuf that controls that buffer, and then to the struct
      rpcrdma_req it goes with.
      
      This was done back in the day to avoid the need to add a per-rqst
      pointer or to alter the buf_free API when support for RPC-over-RDMA
      was introduced.
      
      I'm about to change the way regbufs work to support larger inline
      thresholds. Now is a good time to replace this indirect mechanism
      with something that is more straightforward. I guess this should be
      considered a clean up.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
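
      A sketch of the direct mapping this enables; the name of the new private
      field is an assumption here, and the helpers are illustrative:

      #include <linux/sunrpc/xprt.h>

      struct rpcrdma_req;                     /* opaque here */

      /* Record the rpcrdma_req when buffers are allocated for this rqst */
      static void sketch_set_rdma_req(struct rpc_rqst *rqst,
                                      struct rpcrdma_req *req)
      {
              rqst->rq_xprtdata = req;        /* assumed field name */
      }

      /* Replaces the old rq_buffer -> regbuf -> rpcrdma_req indirection */
      static struct rpcrdma_req *sketch_rpcr_to_rdmar(struct rpc_rqst *rqst)
      {
              return rqst->rq_xprtdata;
      }
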
    • SUNRPC: Separate buffer pointers for RPC Call and Reply messages · 68778945
      Authored by Chuck Lever
      For xprtrdma, the RPC Call and Reply buffers are involved in real
      I/O operations.
      
      To start with, the DMA direction of the I/O for a Call is opposite
      that of a Reply.
      
      In the current arrangement, the Reply buffer address is on a
      four-byte alignment just past the Call buffer. It would be
      friendlier on some platforms if it were at a DMA cache alignment
      instead.
      
      Because the current arrangement allocates a single memory region
      which contains both buffers, the RPC Reply buffer often contains a
      page boundary when the Call buffer is large enough (which is
      frequent).
      
      It would be a little nicer for setting up DMA operations (and
      possible registration of the Reply buffer) if the two buffers were
      separated, well-aligned, and contained as few page boundaries as
      possible.
      
      Now, I could just pad out the single memory region used for the pair
      of buffers. But frequently that would mean a lot of unused space to
      ensure the Reply buffer did not have a page boundary.
      
      Add a separate pointer to rpc_rqst that points right to the RPC
      Reply buffer. This makes no difference to xprtsock, but it will help
      xprtrdma in subsequent patches.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
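
      A sketch of how a socket-style transport can still satisfy both pointers
      from one allocation, while xprtrdma is free to point the Reply pointer at a
      separate, well-aligned buffer; rq_buffer, rq_callsize and rq_rcvsize are
      existing rpc_rqst fields, and the new rq_rbuffer name is taken from the
      description above but should be treated as an assumption:

      #include <linux/slab.h>
      #include <linux/sunrpc/xprt.h>

      static int sketch_buf_alloc_single_region(struct rpc_rqst *rqst, gfp_t gfp)
      {
              void *buf = kmalloc(rqst->rq_callsize + rqst->rq_rcvsize, gfp);

              if (!buf)
                      return -ENOMEM;
              rqst->rq_buffer  = buf;                                 /* RPC Call */
              rqst->rq_rbuffer = (char *)buf + rqst->rq_callsize;     /* RPC Reply */
              return 0;
      }
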
    • SUNRPC: Generalize the RPC buffer release API · 3435c74a
      Authored by Chuck Lever
      xprtrdma needs to allocate the Call and Reply buffers separately.
      TBH, the reliance on using a single buffer for the pair of XDR
      buffers is transport implementation-specific.
      
      Instead of passing just the rq_buffer into the buf_free method, pass
      the task structure and let buf_free take care of freeing both
      XDR buffers at once.
      
      There's a micro-optimization here. In the common case, both
      xprt_release and the transport's buf_free method were checking if
      rq_buffer was NULL. Now the check is done only once per RPC.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
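
      A sketch of the reshaped release path, with the single NULL check in the
      generic code and the whole task handed to the transport; the surrounding
      xprt_release logic is simplified:

      #include <linux/sunrpc/sched.h>
      #include <linux/sunrpc/xprt.h>

      static void sketch_release_buffers(struct rpc_task *task)
      {
              struct rpc_rqst *req = task->tk_rqstp;

              if (req->rq_buffer)             /* checked once per RPC */
                      req->rq_xprt->ops->buf_free(task);
      }
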
    • SUNRPC: Generalize the RPC buffer allocation API · 5fe6eaa1
      Authored by Chuck Lever
      xprtrdma needs to allocate the Call and Reply buffers separately.
      TBH, the reliance on using a single buffer for the pair of XDR
      buffers is transport implementation-specific.
      
      Transports that want to allocate separate Call and Reply buffers
      will ignore the "size" argument anyway.  Don't bother passing it.
      
      The buf_alloc method can't return two pointers. Instead, make the
      method's return value an error code, and set the rq_buffer pointer
      in the method itself.
      
      This gives call_allocate an opportunity to terminate an RPC instead
      of looping forever when a permanent problem occurs. If a request is
      just bogus, or the transport is in a state where it can't allocate
      resources for any request, there needs to be a way to kill the RPC
      right there and not loop.
      
      This immediately fixes a rare problem in the backchannel send path,
      which loops if the server happens to send a CB request whose
      call+reply size is larger than a page (which it shouldn't do yet).
      
      One more issue: looks like xprt_inject_disconnect was incorrectly
      placed in the failure path in call_allocate. It needs to be in the
      success path, as it is for other call-sites.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
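
      A sketch of the new contract from the caller's side: buf_alloc sets
      rq_buffer itself and returns an errno, so a permanent failure can end the
      RPC instead of looping. rpc_delay and rpc_exit are the real SUNRPC calls;
      the rest is a simplified stand-in for call_allocate:

      #include <linux/sunrpc/sched.h>
      #include <linux/sunrpc/xprt.h>

      static void sketch_call_allocate(struct rpc_task *task)
      {
              struct rpc_xprt *xprt = task->tk_rqstp->rq_xprt;
              int status = xprt->ops->buf_alloc(task);

              if (status == 0)
                      return;                 /* rq_buffer has been set */
              if (status == -ENOMEM) {
                      /* transient shortage: back off and retry the allocation */
                      rpc_delay(task, HZ >> 4);
                      return;
              }
              /* anything else is permanent: terminate this RPC now */
              rpc_exit(task, status);
      }
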
    • xprtrdma: Eliminate INLINE_THRESHOLD macros · eb342e9a
      Authored by Chuck Lever
      Clean up: r_xprt is already available everywhere these macros are
      invoked, so just dereference that directly.
      
      RPCRDMA_INLINE_PAD_VALUE is no longer used, so it can simply be
      removed.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  3. 12 Jul 2016, 4 commits
    • xprtrdma: Place registered MWs on a per-req list · 9d6b0409
      Authored by Chuck Lever
      Instead of placing registered MWs sparsely into the rl_segments
      array, place these MWs on a per-req list.
      
      ro_unmap_{sync,safe} can then simply pull those MWs off the list
      instead of walking through the array.
      
      This change significantly reduces the size of struct rpcrdma_req
      by removing nsegs and rl_mw from every array element.
      
      As an additional clean-up, chunk co-ordinates are returned in the
      "*mw" output argument so they are no longer needed in every
      array element.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
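
      A reduced sketch of the list-based bookkeeping; the list.h calls are the
      real kernel API, while the structures are cut down to the relevant fields
      and the names are approximations:

      #include <linux/list.h>

      struct sketch_mw {
              struct list_head        mw_list;
              u32                     mw_handle;      /* rkey for this registration */
      };

      struct sketch_req {
              struct list_head        rl_registered;  /* MWs backing this RPC */
      };

      static void sketch_remember_mw(struct sketch_req *req, struct sketch_mw *mw)
      {
              list_add_tail(&mw->mw_list, &req->rl_registered);
      }

      static void sketch_unmap_all(struct sketch_req *req)
      {
              struct sketch_mw *mw;

              /* No more walking a sparse rl_segments array: just drain the list */
              while ((mw = list_first_entry_or_null(&req->rl_registered,
                                                    struct sketch_mw, mw_list))) {
                      list_del(&mw->mw_list);
                      /* post IB_WR_LOCAL_INV (FRWR) or ib_unmap_fmr (FMR) for
                       * mw->mw_handle here, then return the MW to the free pool */
              }
      }
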
    • xprtrdma: Allocate MRs on demand · e2ac236c
      Authored by Chuck Lever
      Frequent MR list exhaustion can impact I/O throughput, so enough MRs
      are always created during transport set-up to prevent running out.
      This means more MRs are created than most workloads need.
      
      Commit 94f58c58 ("xprtrdma: Allow Read list and Reply chunk
      simultaneously") introduced support for sending two chunk lists per
      RPC, which consumes more MRs per RPC.
      
      Instead of trying to provision more MRs, introduce a mechanism for
      allocating MRs on demand. A few MRs are allocated during transport
      set-up to kick things off.
      
      This significantly reduces the average number of MRs per transport
      while allowing the MR count to grow for workloads or devices that
      need more MRs.
      
      FRWR with mlx4 allocated almost 400 MRs per transport before this
      patch. Now it starts with 32.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
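
      A sketch of on-demand growth: consumers take MRs from a small free list, and
      an empty list kicks a worker that allocates another batch. The workqueue,
      list and spinlock calls are real kernel APIs; all other names, the batch
      size, and the placeholder allocation are illustrative:

      #include <linux/list.h>
      #include <linux/spinlock.h>
      #include <linux/workqueue.h>
      #include <linux/slab.h>

      struct sketch_mw {
              struct list_head mw_list;
      };

      static LIST_HEAD(sketch_mw_free);
      static DEFINE_SPINLOCK(sketch_mw_lock);

      static void sketch_mw_refresh(struct work_struct *work)
      {
              int i;

              /* Allocate a modest batch (plus the underlying ib_alloc_mr() calls
               * in the real code) and add the new MWs to the free list. */
              for (i = 0; i < 8; i++) {
                      struct sketch_mw *mw = kzalloc(sizeof(*mw), GFP_KERNEL);

                      if (!mw)
                              break;
                      spin_lock(&sketch_mw_lock);
                      list_add(&mw->mw_list, &sketch_mw_free);
                      spin_unlock(&sketch_mw_lock);
              }
      }
      static DECLARE_WORK(sketch_mw_refresh_work, sketch_mw_refresh);

      static struct sketch_mw *sketch_get_mw(void)
      {
              struct sketch_mw *mw;

              spin_lock(&sketch_mw_lock);
              mw = list_first_entry_or_null(&sketch_mw_free,
                                            struct sketch_mw, mw_list);
              if (mw)
                      list_del(&mw->mw_list);
              spin_unlock(&sketch_mw_lock);

              if (!mw)        /* exhausted: grow the pool, caller retries later */
                      schedule_work(&sketch_mw_refresh_work);
              return mw;
      }
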
    • xprtrdma: Honor ->send_request API contract · 7a89f9c6
      Authored by Chuck Lever
      Commit c93c6223 ("xprtrdma: Disconnect on registration failure")
      added a disconnect for some RPC marshaling failures. This is needed
      only in a handful of cases, but it was triggering for simple stuff
      like temporary resource shortages. Try to straighten this out.
      
      Fix up the lower layers so they don't return -ENOMEM or other error
      codes that the RPC client's FSM doesn't explicitly recognize.
      
      Also fix up the places in the send_request path that do want a
      disconnect. For example, when ib_post_send or ib_post_recv fail,
      this is a sign that there is a send or receive queue resource
      miscalculation. That should be rare, and is a sign of a software
      bug. But xprtrdma can recover: disconnect to reset the transport and
      start over.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
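
      A sketch of the distinction: marshaling problems return errnos the RPC state
      machine already understands, while a failed post (a queue accounting bug)
      resets the connection. xprt_disconnect_done is the real call; the helper is
      a placeholder:

      #include <linux/sunrpc/xprt.h>
      #include <linux/errno.h>

      /* Called when ib_post_send() or ib_post_recv() fails for this transport */
      static int sketch_post_failed(struct rpc_xprt *xprt)
      {
              /* A send or receive queue miscalculation: rare, and a software bug,
               * but recoverable. Reset the transport and start over. */
              xprt_disconnect_done(xprt);
              return -ENOTCONN;
      }

      /* Marshaling failures, by contrast, return codes the RPC client's FSM
       * recognizes (for example a temporary resource shortage) and never force
       * a disconnect. */
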
    • xprtrdma: Refactor MR recovery work queues · 505bbe64
      Authored by Chuck Lever
      I found that commit ead3f26e ("xprtrdma: Add ro_unmap_safe
      memreg method"), which introduces ro_unmap_safe, never wired up the
      FMR recovery worker.
      
      The FMR and FRWR recovery work queues both do the same thing.
      Instead of setting up separate work queues for this, schedule a
      single delayed worker to deal with both, since recovering MRs is
      not performance-critical.
      
      Fixes: ead3f26e ("xprtrdma: Add ro_unmap_safe memreg method")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
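
      A sketch of the shared deferred recovery path; the workqueue, list and
      spinlock calls are real kernel APIs and everything else is illustrative:

      #include <linux/list.h>
      #include <linux/spinlock.h>
      #include <linux/workqueue.h>

      static LIST_HEAD(sketch_recovery_list);
      static DEFINE_SPINLOCK(sketch_recovery_lock);

      static void sketch_mr_recovery_worker(struct work_struct *work)
      {
              /* Drain sketch_recovery_list; for each broken MR either deregister
               * and re-allocate it (FRWR) or unmap it (FMR), then return it to
               * the free pool. */
      }
      static DECLARE_DELAYED_WORK(sketch_recovery_work, sketch_mr_recovery_worker);

      /* One entry point serves both memory registration modes */
      static void sketch_defer_mr_recovery(struct list_head *mw)
      {
              spin_lock(&sketch_recovery_lock);
              list_add_tail(mw, &sketch_recovery_list);
              spin_unlock(&sketch_recovery_lock);
              schedule_delayed_work(&sketch_recovery_work, 0);
      }
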
  4. 18 May 2016, 3 commits
  5. 20 Jan 2016, 1 commit
  6. 19 Dec 2015, 1 commit
  7. 03 Nov 2015, 4 commits
  8. 28 Sep 2015, 1 commit
  9. 06 Aug 2015, 3 commits
  10. 13 Jun 2015, 3 commits
  11. 11 Jun 2015, 2 commits
  12. 05 Jun 2015, 1 commit
    • rpcrdma: Merge svcrdma and xprtrdma modules into one · ffe1f0df
      Authored by Chuck Lever
      Bi-directional RPC support means code in svcrdma.ko invokes a bit of
      code in xprtrdma.ko, and vice versa. To avoid loader/linker loops,
      merge the server and client side modules together into a single
      module.
      
      When backchannel capabilities are added, the combined module will
      register all needed transport capabilities so that Upper Layer
      consumers automatically have everything needed to create a
      bi-directional transport connection.
      
      Module aliases are added for backwards compatibility with user
      space, which still may expect svcrdma.ko or xprtrdma.ko to be
      present.
      
      This commit reverts commit 2e8c12e1 ("xprtrdma: add separate
      Kconfig options for NFSoRDMA client and server support") and
      provides a single CONFIG option for enabling the new module.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
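
      A sketch of the combined module's boilerplate; the MODULE_ALIAS lines
      reflect what the commit describes, while the init/exit bodies are only
      indicated:

      #include <linux/module.h>
      #include <linux/init.h>

      MODULE_ALIAS("svcrdma");        /* old server-side module name */
      MODULE_ALIAS("xprtrdma");       /* old client-side module name */
      MODULE_LICENSE("Dual BSD/GPL");

      static int __init sketch_rpcrdma_init(void)
      {
              /* register the client transport and the server-side svc_xprt class
               * from the same module, so a bi-directional connection has
               * everything it needs once this module loads */
              return 0;
      }
      module_init(sketch_rpcrdma_init);

      static void __exit sketch_rpcrdma_exit(void)
      {
              /* unregister both in reverse order */
      }
      module_exit(sketch_rpcrdma_exit);
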
  13. 31 Mar 2015, 3 commits