1. 18 May 2016 (4 commits)
    • xprtrdma: Prevent inline overflow · 302d3deb
      Committed by Chuck Lever
      When deciding whether to send a Call inline, rpcrdma_marshal_req
      doesn't take into account header bytes consumed by chunk lists.
      This results in Call messages on the wire that are sometimes larger
      than the inline threshold.
      
      Likewise, when a Write list or Reply chunk is in play, the server's
      reply has to emit an RDMA Send that includes a larger-than-minimal
      RPC-over-RDMA header.
      
      The actual size of a Call message cannot be estimated until after
      the chunk lists have been registered. Thus the size of each
      RPC-over-RDMA header can be estimated only after chunks are
      registered; but the decision to register chunks is based on the size
      of that header. Chicken, meet egg.
      
      The best a client can do is estimate header size based on the
      largest header that might occur, and then ensure that inline content
      is always smaller than that.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
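
      The estimate can be sketched as follows. The constants and the helper
      name below are illustrative assumptions, not the patch's actual code,
      and the bound deliberately over-counts: a single request never
      populates all three chunk lists fully, so the real estimate can be
      tighter.

          #include <stdio.h>

          /* Illustrative RPC-over-RDMA v1 wire sizes (assumptions) */
          #define HDRLEN_MIN    28   /* fixed fields + empty-list markers */
          #define SEGMENT_SIZE  16   /* 32-bit handle/length, 64-bit offset */

          /* Hypothetical helper: conservative upper bound on a Call
           * header when every chunk list carries maxsegs segments.
           * Inline content must fit under threshold minus this bound. */
          static unsigned int max_call_header_size(unsigned int maxsegs)
          {
                  unsigned int size = HDRLEN_MIN;

                  /* Read list: per-entry marker (4) + position (4) + segment */
                  size += maxsegs * (4 + 4 + SEGMENT_SIZE);

                  /* Write list: marker (4) + segment count (4) + segments */
                  size += 4 + 4 + maxsegs * SEGMENT_SIZE;

                  /* Reply chunk: same shape as a Write chunk */
                  size += 4 + 4 + maxsegs * SEGMENT_SIZE;

                  return size;
          }

          int main(void)
          {
                  /* With 8 segments the worst case stays under 1KB. */
                  printf("max header: %u bytes\n", max_call_header_size(8));
                  return 0;
          }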
    • xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers · 94931746
      Committed by Chuck Lever
      Send buffer space is shared between the RPC-over-RDMA header and
      an RPC message. A large RPC-over-RDMA header means less space is
      available for the associated RPC message, which then has to be
      moved via an RDMA Read or Write.
      
      As more segments are added to the chunk lists, the header increases
      in size.  Typical modern hardware needs only a few segments to
      convey the maximum payload size, but some devices and registration
      modes may need a lot of segments to convey data payload. Sometimes
      so many are needed that the remaining space in the Send buffer is
      not enough for the RPC message. Sending such a message usually
      fails.
      
      To ensure a transport can always make forward progress, cap the
      number of RDMA segments that are allowed in chunk lists. This
      prevents less-capable devices and memory registrations from
      consuming a large portion of the Send buffer by reducing the
      maximum data payload that can be conveyed with such devices.
      
      For now I choose an arbitrary maximum of 8 RDMA segments. This
      allows a maximum size RPC-over-RDMA header to fit nicely in the
      current 1024 byte inline threshold with over 700 bytes remaining
      for an inline RPC message.
      
      The current maximum data payload of NFS READ or WRITE requests is
      one megabyte. To convey that payload on a client with 4KB pages,
      each chunk segment would need to handle 32 or more data pages. This
      is well within the capabilities of FMR. For physical registration,
      the maximum payload size on platforms with 4KB pages is reduced to
      32KB.
      
      For FRWR, a device's maximum page list depth would need to be at
      least 34 to support the maximum 1MB payload. A device with a smaller
      maximum page list depth means the maximum data payload is reduced
      when using that device.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
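
      The payload arithmetic in the message is easy to check; the sketch
      below just reproduces it. RPCRDMA_MAX_SEGS is the cap this patch
      chooses, the other constants are stated in the message.

          #include <stdio.h>

          #define RPCRDMA_MAX_SEGS  8               /* cap from this patch */
          #define PAGE_SIZE_4K      4096u
          #define NFS_MAX_PAYLOAD   (1024u * 1024)  /* 1MB READ/WRITE max */

          int main(void)
          {
                  unsigned int pages = NFS_MAX_PAYLOAD / PAGE_SIZE_4K; /* 256 */

                  /* FMR: each of the 8 segments must cover 32+ pages. */
                  printf("pages per segment: %u\n", pages / RPCRDMA_MAX_SEGS);

                  /* Physical registration: one page per segment, so the
                   * ceiling drops to 8 x 4KB = 32KB. */
                  printf("physical ceiling: %u KB\n",
                         RPCRDMA_MAX_SEGS * PAGE_SIZE_4K / 1024);

                  /* FRWR: 256 data pages across 8 MRs is 32 pages each,
                   * plus room for an unaligned head and tail, hence the
                   * "at least 34" page list depth above. */
                  return 0;
          }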
    • xprtrdma: Bound the inline threshold values · 29c55422
      Committed by Chuck Lever
      Currently the sysctls that allow setting the inline threshold allow
      any value to be set.
      
      Small values only make the transport run slower. The default 1KB
      setting is as low as is reasonable. And the logic that decides how
      to divide a Send buffer between RPC-over-RDMA header and RPC message
      assumes (but does not check) that the lower bound is not crazy (say,
      57 bytes).
      
      Send and receive buffers share a page with some control information.
      Values larger than about 3KB can't be supported, currently.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
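
      A bounded sysctl of this kind typically goes through
      proc_dointvec_minmax. The kernel-style sketch below shows the
      pattern; the exact bound values are assumptions (the message only
      says 1KB is a sane floor and about 3KB the current ceiling), and
      this is not the patch's literal diff.

          #include <linux/sysctl.h>

          static unsigned int xprt_rdma_max_inline_read = 1024; /* default */
          static unsigned int min_inline_size = 1024; /* slower below this */
          static unsigned int max_inline_size = 3068; /* hypothetical ~3KB */

          static struct ctl_table xr_tunables_table[] = {
                  {
                          .procname     = "rdma_max_inline_read",
                          .data         = &xprt_rdma_max_inline_read,
                          .maxlen       = sizeof(unsigned int),
                          .mode         = 0644,
                          /* proc_dointvec_minmax rejects writes outside
                           * the range [*extra1, *extra2] */
                          .proc_handler = proc_dointvec_minmax,
                          .extra1       = &min_inline_size,
                          .extra2       = &max_inline_size,
                  },
                  { },
          };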
    • sunrpc: Advertise maximum backchannel payload size · 6b26cc8c
      Committed by Chuck Lever
      RPC-over-RDMA transports have a limit on how large a backward
      direction (backchannel) RPC message can be. Ensure that the NFSv4.x
      CREATE_SESSION operation advertises this limit to servers.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
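
      Roughly, the transport reports the largest backward-direction
      message it can receive, and CREATE_SESSION's back-channel
      attributes are clamped to that. A sketch of the RPC-over-RDMA
      side, with struct internals elided: backchannel messages must fit
      inline, so the smaller inline threshold minus the fixed header
      bounds the payload. The function name and constants here are
      approximations, not the committed code.

          #include <stddef.h>

          #define RPCRDMA_HDRLEN_MIN 28  /* fixed RPC-over-RDMA header */

          static size_t bc_maxpayload_sketch(size_t inline_rsize,
                                             size_t inline_wsize)
          {
                  /* Both a backchannel call and its reply must fit
                   * inline, so the smaller threshold governs. */
                  size_t maxmsg = inline_rsize < inline_wsize ?
                                  inline_rsize : inline_wsize;
                  return maxmsg - RPCRDMA_HDRLEN_MIN;
          }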
  2. 15 March 2016 (9 commits)
  3. 02 March 2016 (11 commits)
  4. 17 February 2016 (1 commit)
  5. 20 January 2016 (11 commits)
  6. 07 January 2016 (1 commit)
    • Revert "svcrdma: Do not send XDR roundup bytes for a write chunk" · 3daa020f
      Committed by J. Bruce Fields
      This reverts commit 6f18dc89.
      
      Just as one example, it appears this code could do the wrong thing in
      the case of a two-byte NFS READ that crosses a page boundary.
      
      Chuck says: "In that case, nfsd would pass down an xdr_buf that has one
      byte in a page, one byte in another page, and a two-byte XDR pad. The
      logic introduced by this optimization would be fooled, and neither the
      second byte nor the XDR pad would be written to the client."
      
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
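
      The failure mode follows from the XDR roundup arithmetic;
      everything below is illustration, not code from either commit.

          #include <stdio.h>

          #define XDR_ROUNDUP(n)  (((n) + 3u) & ~3u)

          int main(void)
          {
                  unsigned int payload = 2;  /* two-byte READ result */
                  unsigned int pad = XDR_ROUNDUP(payload) - payload;

                  /* On the wire: byte 0 at the end of one page, byte 1
                   * at the start of the next, then a 2-byte pad. A
                   * heuristic that trims trailing roundup bytes by
                   * length alone can cut into the second data byte when
                   * the payload straddles pages, which is the bug
                   * described above. */
                  printf("payload=%u pad=%u wire=%u\n",
                         payload, pad, payload + pad);  /* 2 2 4 */
                  return 0;
          }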
  7. 23 December 2015 (1 commit)
  8. 19 December 2015 (2 commits)
    • xprtrdma: Revert commit e7104a2a ('xprtrdma: Cap req_cqinit'). · 26ae9d1c
      Committed by Chuck Lever
      The root of the problem was that sends (especially unsignalled
      FASTREG and LOCAL_INV Work Requests) were not properly flow-
      controlled, which allowed a send queue overrun.
      
      Now that the RPC/RDMA reply handler waits for invalidation to
      complete, the send queue is properly flow-controlled. Thus this
      limit is no longer necessary.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Invalidate in the RPC reply handler · 68791649
      Committed by Chuck Lever
      There is a window between the time the RPC reply handler wakes the
      waiting RPC task and when xprt_release() invokes ops->buf_free.
      During this time, memory regions containing the data payload may
      still be accessed by a broken or malicious server, but the RPC
      application has already been allowed access to the memory containing
      the RPC request's data payloads.
      
      The server should be fenced from client memory containing RPC data
      payloads _before_ the RPC application is allowed to continue.
      
      This change also more strongly enforces send queue accounting. There
      is a maximum number of RPC calls allowed to be outstanding. When an
      RPC/RDMA transport is set up, just enough send queue resources are
      allocated to handle registration, Send, and invalidation WRs for
      each of those RPCs at the same time.
      
      Before, additional RPC calls could be dispatched while invalidation
      WRs were still consuming send WQEs. When invalidation WRs backed
      up, dispatching additional RPCs resulted in a send queue overrun.
      
      Now the reply handler blocks until invalidation is complete, so a
      new RPC cannot be dispatched until there are enough send queue
      resources for it to proceed.
      
      Still to do: If an RPC exits early (say, ^C), the reply handler has
      no opportunity to perform invalidation. Currently, xprt_rdma_free()
      still frees remaining RDMA resources, which could deadlock.
      Additional changes are needed to handle invalidation properly in this
      case.
      Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
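
      The ordering this patch establishes can be sketched like this.
      ro_unmap_sync and xprt_complete_rqst are real kernel symbols of
      this era; the surrounding glue is condensed and the struct
      definitions are elided, so treat this as a sketch rather than the
      committed function.

          /* Condensed sketch of the reply path after this patch. */
          static void rpcrdma_reply_handler_sketch(struct rpcrdma_xprt *r_xprt,
                                                   struct rpcrdma_req *req,
                                                   struct rpc_rqst *rqst)
          {
                  /* 1. Fence the server: synchronously invalidate the
                   *    MRs backing this RPC's data payloads, so a broken
                   *    or malicious server can no longer touch them.
                   *    This also drains the invalidation WRs' send
                   *    queue slots. */
                  if (req->rl_nchunks)
                          r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt, req);

                  /* 2. Only then wake the waiting RPC task: the
                   *    application regains access to its buffers
                   *    strictly after the fence, and the next RPC
                   *    cannot dispatch before send queue resources are
                   *    actually free. */
                  xprt_complete_rqst(rqst->rq_task,
                                     rqst->rq_reply_bytes_recvd);
          }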