1. 18 May 2016, 7 commits
    • xprtrdma: Allow Read list and Reply chunk simultaneously · 94f58c58
      Committed by Chuck Lever
      rpcrdma_marshal_req() makes a simplifying assumption: that NFS
      operations with large Call messages have small Reply messages, and
      vice versa. Therefore with RPC-over-RDMA, only one chunk type is
      ever needed for each Call/Reply pair: when one direction needs
      chunks, the other always fits inline.
      
      In fact, this assumption is asserted in the code:
      
        if (rtype != rpcrdma_noch && wtype != rpcrdma_noch) {
                dprintk("RPC:       %s: cannot marshal multiple chunk lists\n",
                        __func__);
                return -EIO;
        }
      
      But RPCSEC_GSS breaks this assumption. Because krb5i and krb5p
      perform data transformation on RPC messages before they are
      transmitted, direct data placement techniques cannot be used;
      RPC messages must be sent as Long messages in both directions.
      All such calls are sent with a Position Zero Read chunk, and all
      such replies are handled with a Reply chunk. Thus the client must
      provide every Call/Reply pair with both a Read list and a Reply
      chunk.
      
      Without any special security in effect, NFSv4 WRITEs may now also
      use the Read list and provide a Reply chunk simultaneously. The
      marshal_req logic was preventing that, so an NFSv4 WRITE with a
      large payload whose reply included a GETATTR result larger than
      the inline threshold would fail.
      
      The code that encodes each chunk list is now completely contained in
      its own function. There is some code duplication, but the trade-off
      is that the overall logic is clearer.
      
      Note that all three chunk lists now share the rl_segments array.
      Some additional per-req accounting is necessary to track this
      usage. For the same reasons that the above simplifying assumption
      has held true for so long, I don't expect more array elements will
      be needed at this time.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
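
      A minimal userspace sketch of the shared-segment accounting
      described above; the names here (struct rr_req, reserve_segments,
      the encode_* helpers) are hypothetical stand-ins, not the kernel's
      rl_segments code. The point is that each chunk-list encoder is
      self-contained and draws from one shared array, with a per-request
      counter doing the extra accounting:

        #include <stdio.h>

        #define MAX_SEGMENTS 8

        struct rr_req {
                unsigned int segments[MAX_SEGMENTS]; /* shared by all chunk lists */
                unsigned int nsegs;                  /* entries consumed so far */
        };

        /* Reserve n entries from the shared array; -1 when exhausted. */
        static int reserve_segments(struct rr_req *req, unsigned int n)
        {
                if (req->nsegs + n > MAX_SEGMENTS)
                        return -1;
                req->nsegs += n;
                return 0;
        }

        /* Each chunk-list type gets its own self-contained encoder. */
        static int encode_read_list(struct rr_req *req, unsigned int n)
        {
                return reserve_segments(req, n);
        }

        static int encode_reply_chunk(struct rr_req *req, unsigned int n)
        {
                return reserve_segments(req, n);
        }

        int main(void)
        {
                struct rr_req req = { .nsegs = 0 };

                /* a krb5i/krb5p-style request needs both lists at once */
                if (encode_read_list(&req, 4) || encode_reply_chunk(&req, 4))
                        return 1;
                printf("segments in use: %u of %u\n", req.nsegs, MAX_SEGMENTS);
                return 0;
        }
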
    • xprtrdma: Update comments in rpcrdma_marshal_req() · 88b18a12
      Committed by Chuck Lever
      Update documenting comments to reflect code changes over the past
      year.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Avoid using Write list for small NFS READ requests · cce6deeb
      Committed by Chuck Lever
      Avoid the latency and interrupt overhead of registering a Write
      chunk when handling NFS READ requests of a few hundred bytes or
      less.
      
      This change does not interoperate with Linux NFS/RDMA servers
      that do not have commit 9d11b51c ('svcrdma: Fix send_reply()
      scatter/gather set-up'). Commit 9d11b51c was introduced in v4.3,
      and is included in 4.2.y, 4.1.y, and 3.18.y.
      
      Oracle bug 22925946 has been filed to request that the above fix
      be included in the Oracle Linux UEK4 NFS/RDMA server.
      
      Red Hat bugzillas 1327280 and 1327554 have been filed to request
      that RHEL NFS/RDMA server backports include the above fix.
      
      Workaround: Replace the "proto=rdma,port=20049" mount options
      with "proto=tcp" until commit 9d11b51c is applied to your
      NFS server.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
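
      A hedged sketch of the decision this patch describes; the
      threshold constant and function names are illustrative, not the
      kernel's:

        #include <stdbool.h>
        #include <stdio.h>

        /* assumed inline receive threshold, in bytes */
        #define INLINE_THRESHOLD 1024

        /* true when the READ reply fits inline and needs no Write chunk */
        static bool read_reply_fits_inline(unsigned int hdr_bytes,
                                           unsigned int payload_bytes)
        {
                return hdr_bytes + payload_bytes <= INLINE_THRESHOLD;
        }

        int main(void)
        {
                /* a few-hundred-byte READ: skip Write chunk registration */
                printf("300-byte READ inline? %d\n",
                       read_reply_fits_inline(200, 300));
                /* a large READ still needs a registered Write chunk */
                printf("64KB READ inline? %d\n",
                       read_reply_fits_inline(200, 65536));
                return 0;
        }
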
    • xprtrdma: Prevent inline overflow · 302d3deb
      Committed by Chuck Lever
      When deciding whether to send a Call inline, rpcrdma_marshal_req
      doesn't take into account header bytes consumed by chunk lists.
      This results in Call messages on the wire that are sometimes larger
      than the inline threshold.
      
      Likewise, when a Write list or Reply chunk is in play, the server's
      reply has to emit an RDMA Send that includes a larger-than-minimal
      RPC-over-RDMA header.
      
      The actual size of a Call message cannot be estimated until after
      the chunk lists have been registered. Thus the size of each
      RPC-over-RDMA header can be estimated only after chunks are
      registered; but the decision to register chunks is based on the size
      of that header. Chicken, meet egg.
      
      The best a client can do is estimate header size based on the
      largest header that might occur, and then ensure that inline content
      is always smaller than that.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
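
      A sketch of the "assume the largest header that might occur" rule,
      using illustrative XDR sizes rather than the kernel's constants:

        #include <stdbool.h>
        #include <stdio.h>

        #define INLINE_THRESHOLD 1024
        #define MAX_SEGMENTS     8

        /* illustrative XDR sizes, in bytes */
        #define HDR_FIXED 28 /* xid, vers, credits, proc + 3 list discriminators */
        #define READ_SEG  24 /* item flag + position + handle + length + offset */

        /* Worst case: a full Read list; computable before any registration. */
        static unsigned int max_header_size(void)
        {
                return HDR_FIXED + MAX_SEGMENTS * READ_SEG;
        }

        /* Send inline only if the message plus worst-case header fits. */
        static bool fits_inline(unsigned int rpc_msg_bytes)
        {
                return max_header_size() + rpc_msg_bytes <= INLINE_THRESHOLD;
        }

        int main(void)
        {
                printf("worst-case header: %u bytes\n", max_header_size());
                printf("700-byte call inline? %d\n", fits_inline(700));
                printf("900-byte call inline? %d\n", fits_inline(900));
                return 0;
        }
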
    • xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers · 94931746
      Committed by Chuck Lever
      Send buffer space is shared between the RPC-over-RDMA header and
      an RPC message. A large RPC-over-RDMA header means less space is
      available for the associated RPC message, which then has to be
      moved via an RDMA Read or Write.
      
      As more segments are added to the chunk lists, the header increases
      in size.  Typical modern hardware needs only a few segments to
      convey the maximum payload size, but some devices and registration
      modes may need a lot of segments to convey data payload. Sometimes
      so many are needed that the remaining space in the Send buffer is
      not enough for the RPC message. Sending such a message usually
      fails.
      
      To ensure a transport can always make forward progress, cap the
      number of RDMA segments allowed in chunk lists. This prevents
      less-capable devices and memory registration modes from consuming
      a large portion of the Send buffer; the trade-off is a reduced
      maximum data payload on such devices.
      
      For now I choose an arbitrary maximum of 8 RDMA segments. This
      allows a maximum size RPC-over-RDMA header to fit nicely in the
      current 1024 byte inline threshold with over 700 bytes remaining
      for an inline RPC message.
      
      The current maximum data payload of NFS READ or WRITE requests is
      one megabyte. To convey that payload on a client with 4KB pages,
      each chunk segment would need to handle 32 or more data pages. This
      is well within the capabilities of FMR. For physical registration,
      the maximum payload size on platforms with 4KB pages is reduced to
      32KB.
      
      For FRWR, a device's maximum page list depth would need to be at
      least 34 to support the maximum 1MB payload. A device with a smaller
      maximum page list depth means the maximum data payload is reduced
      when using that device.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
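
      The arithmetic from the last two paragraphs, as a small standalone
      program; the constants mirror the text (1MB maximum payload, 4KB
      pages, the 8-segment cap) rather than kernel symbols, and the
      "+ 2" in the FRWR line is my assumption about slack for unaligned
      buffers, chosen to match the stated minimum depth of 34:

        #include <stdio.h>

        #define MAX_PAYLOAD  (1024 * 1024) /* NFS READ/WRITE maximum */
        #define PAGE_4K      4096
        #define MAX_SEGMENTS 8

        int main(void)
        {
                unsigned int pages = MAX_PAYLOAD / PAGE_4K;        /* 256 */

                /* FMR: each of the 8 segments covers this many pages */
                printf("pages per segment: %u\n",
                       pages / MAX_SEGMENTS);                      /* 32 */

                /* physical registration: one page per segment */
                printf("physical max payload: %u KB\n",
                       MAX_SEGMENTS * PAGE_4K / 1024);             /* 32 */

                /* FRWR: 32 pages per segment plus slack -> depth 34 */
                printf("FRWR page list depth: %u\n",
                       pages / MAX_SEGMENTS + 2);
                return 0;
        }
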
    • xprtrdma: Bound the inline threshold values · 29c55422
      Committed by Chuck Lever
      Currently, the sysctls that set the inline threshold accept any
      value at all.
      
      Small values only make the transport run slower. The default 1KB
      setting is as low as is reasonable. And the logic that decides how
      to divide a Send buffer between RPC-over-RDMA header and RPC message
      assumes (but does not check) that the lower bound is not crazy (say,
      57 bytes).
      
      Send and receive buffers share a page with some control
      information, so values larger than about 3KB cannot currently
      be supported.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
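
      A sketch of bounding the tunable; the floor and ceiling come from
      the text (the 1KB default as a floor, roughly 3KB as a ceiling
      because buffers share a page with control data), and the names
      are made up:

        #include <stdio.h>

        #define INLINE_MIN 1024 /* the default; lower only slows the transport */
        #define INLINE_MAX 3068 /* ~3KB; buffers share a page with control data */

        static unsigned int clamp_inline_threshold(unsigned int requested)
        {
                if (requested < INLINE_MIN)
                        return INLINE_MIN;
                if (requested > INLINE_MAX)
                        return INLINE_MAX;
                return requested;
        }

        int main(void)
        {
                printf("57    -> %u\n", clamp_inline_threshold(57));
                printf("2048  -> %u\n", clamp_inline_threshold(2048));
                printf("65536 -> %u\n", clamp_inline_threshold(65536));
                return 0;
        }
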
    • sunrpc: Advertise maximum backchannel payload size · 6b26cc8c
      Committed by Chuck Lever
      RPC-over-RDMA transports have a limit on how large a backward
      direction (backchannel) RPC message can be. Ensure that the NFSv4.x
      CREATE_SESSION operation advertises this limit to servers.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
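
      A sketch of the advertisement itself; the struct names and fields
      are hypothetical stand-ins for the sunrpc transport operation and
      the CREATE_SESSION arguments:

        #include <stdio.h>

        struct bc_xprt {
                unsigned int max_bc_payload; /* transport's backchannel ceiling */
        };

        struct create_session_args {
                unsigned int bc_max_resp_size; /* advertised to the server */
        };

        /* Never advertise more than the transport can actually carry. */
        static void set_bc_attrs(struct create_session_args *args,
                                 const struct bc_xprt *xprt,
                                 unsigned int wanted)
        {
                args->bc_max_resp_size = wanted < xprt->max_bc_payload ?
                                         wanted : xprt->max_bc_payload;
        }

        int main(void)
        {
                struct bc_xprt rdma = { .max_bc_payload = 1024 };
                struct create_session_args args;

                set_bc_attrs(&args, &rdma, 4096);
                printf("advertised backchannel max: %u\n",
                       args.bc_max_resp_size);
                return 0;
        }
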
  2. 15 Mar 2016, 9 commits
  3. 02 Mar 2016, 11 commits
  4. 17 Feb 2016, 1 commit
  5. 20 Jan 2016, 11 commits
  6. 07 Jan 2016, 1 commit
    • Revert "svcrdma: Do not send XDR roundup bytes for a write chunk" · 3daa020f
      Committed by J. Bruce Fields
      This reverts commit 6f18dc89.
      
      Just as one example, it appears this code could do the wrong thing in
      the case of a two-byte NFS READ that crosses a page boundary.
      
      Chuck says: "In that case, nfsd would pass down an xdr_buf that has one
      byte in a page, one byte in another page, and a two-byte XDR pad. The
      logic introduced by this optimization would be fooled, and neither the
      second byte nor the XDR pad would be written to the client."
      
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
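
      An illustration of the failure case Chuck describes, not the
      svcrdma code itself; it just counts the bytes of a 2-byte payload
      that straddles a page boundary, plus its 2-byte XDR pad:

        #include <stdio.h>

        #define PAGE 4096

        int main(void)
        {
                unsigned int len = 2;                  /* READ payload bytes */
                unsigned int pad = (4 - len % 4) % 4;  /* XDR roundup: 2 */
                unsigned int start = PAGE - 1;         /* last byte of a page */

                unsigned int first  = PAGE - start;        /* 1 byte */
                unsigned int second = len - first + pad;   /* 1 byte + pad */

                printf("first page: %u byte(s), second page: %u byte(s)\n",
                       first, second);
                printf("skipping the roundup would drop %u of %u bytes\n",
                       second, len + pad);
                return 0;
        }
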