1. 09 Apr 2021 (1 commit)
  2. 28 Jul 2020 (2 commits)
    • svcrdma: CM event handler clean up · b297fed6
      Committed by Chuck Lever
      Now that there's a core tracepoint that reports these events, there's
      no need to maintain dprintk() call sites in each arm of the switch
      statements.

      We also refresh the documenting comments.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • svcrdma: Remove transport reference counting · 365e9992
      Committed by Chuck Lever
      Jason tells me that a ULP cannot rely on getting an ESTABLISHED
      and DISCONNECTED event pair for each connection, so transport
      reference counting in the CM event handler will never be reliable.

      Now that we have ib_drain_qp(), svcrdma should no longer need to
      hold transport references while Sends and Receives are posted. So
      remove the get/put call sites in the CM event handlers.

      This eliminates a significant source of locked memory bus traffic.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  3. 14 Jul 2020 (1 commit)
  4. 18 May 2020 (3 commits)
  5. 18 Apr 2020 (1 commit)
    • svcrdma: Fix leak of svc_rdma_recv_ctxt objects · 23cf1ee1
      Committed by Chuck Lever
      Use the xpo_release_rqst transport method to ensure that each
      rqstp's svc_rdma_recv_ctxt object is released even when the server
      cannot return a Reply for that rqstp.

      Without this fix, each RPC whose Reply cannot be sent leaks one
      svc_rdma_recv_ctxt. This is a 2.5KB structure, a 4KB DMA-mapped
      Receive buffer, and any pages that might be part of the Reply
      message.

      The leak is infrequent unless the network fabric is unreliable or
      Kerberos is in use, because GSS sequence window overruns, which
      result in connection loss, are more common on fast transports.

      Fixes: 3a88092e ("svcrdma: Preserve Receive buffer until svc_rdma_sendto")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  6. 17 Mar 2020 (2 commits)
    • svcrdma: Remove svcrdma_cm_event() trace point · 2426ddfd
      Committed by Chuck Lever
      Clean up. This trace point is no longer needed because the RDMA/core
      CMA code has an equivalent trace point, added by commit
      ed999f82 ("RDMA/cma: Add trace points in RDMA Connection
      Manager").
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: Fix NFSv4 READ on RDMA when using readv · 41205539
      Committed by Chuck Lever
      svcrdma expects that the payload falls precisely into the xdr_buf
      page vector. This does not seem to be the case for
      nfsd4_encode_readv().

      This code is called only when fops->splice_read is missing or when
      RQ_SPLICE_OK is clear, so it's not a noticeable problem in many
      common cases.

      Add a new transport method, ->xpo_read_payload, so that when a READ
      payload does not fit exactly in rq_res's page vector, the XDR
      encoder can inform the RPC transport exactly where that payload is,
      without the payload's XDR pad.

      That way, when a Write chunk is present, the transport knows what
      byte range in the Reply message is supposed to be matched with the
      chunk.

      Note that the Linux NFS server implementation of NFS/RDMA can
      currently handle only one Write chunk per RPC-over-RDMA message.
      This simplifies the implementation of this fix.

      Fixes: b0420980 ("nfsd4: allow exotic read compounds")
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  7. 19 Aug 2019 (2 commits)
  8. 05 Aug 2019 (1 commit)
  9. 07 Jul 2019 (1 commit)
  10. 20 Jun 2019 (1 commit)
    • svcrdma: Ignore source port when computing DRC hash · 1e091c3b
      Committed by Chuck Lever
      The DRC appears to be effectively empty after an RPC/RDMA transport
      reconnect. The problem is that each connection uses a different
      source port, which defeats the DRC hash.

      Clients always have to disconnect before they send retransmissions
      to reset the connection's credit accounting, so every retransmit
      on NFS/RDMA misses the DRC.

      An NFS/RDMA client's IP source port is meaningless for RDMA
      transports. The transport layer typically sets the source port value
      on the connection to a random ephemeral port. The server already
      ignores it for the "secure port" check. See commit 16e4d93f
      ("NFSD: Ignore client's source port on RDMA transports").

      The Linux NFS server's DRC resolves XID collisions from the same
      source IP address by using the checksum of the first 200 bytes of
      the RPC call header.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  11. 07 Feb 2019 (3 commits)
    • svcrdma: Remove syslog warnings in work completion handlers · 8820bcaa
      Committed by Chuck Lever
      These warnings can produce a lot of log noise and can be triggered
      by client misbehavior. Since there are trace points in these
      handlers now, there's no need to spam the log.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Squelch compiler warning when SUNRPC_DEBUG is disabled · c7920f06
      Committed by Chuck Lever
        CC [M]  net/sunrpc/xprtrdma/svc_rdma_transport.o
      linux/net/sunrpc/xprtrdma/svc_rdma_transport.c: In function ‘svc_rdma_accept’:
      linux/net/sunrpc/xprtrdma/svc_rdma_transport.c:452:19: warning: variable ‘sap’ set but not used [-Wunused-but-set-variable]
        struct sockaddr *sap;
                         ^
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Remove max_sge check at connect time · e248aa7b
      Committed by Chuck Lever
      Two and a half years ago, the client was changed to use gathered
      Send for larger inline messages, in commit 655fec69 ("xprtrdma:
      Use gathered Send for large inline messages"). Several fixes were
      required because there are a few in-kernel device drivers whose
      max_sge is 3, and these were broken by the change.

      Apparently my memory is going, because some time later, I submitted
      commit 25fd86ec ("svcrdma: Don't overrun the SGE array in
      svc_rdma_send_ctxt"), and after that, commit f3c1fd0e ("svcrdma:
      Reduce max_send_sges"). These too incorrectly assumed in-kernel
      device drivers would have more than a few Send SGEs available.

      The fix for the server side is not the same. This is because the
      fundamental problem on the server is that, whether or not the client
      has provisioned a chunk for the RPC reply, the server must squeeze
      even the most complex RPC replies into a single RDMA Send. Failing
      in the send path because of Send SGE exhaustion should never be an
      option.

      Therefore, instead of failing when the send path runs out of SGEs,
      switch to using a bounce buffer mechanism to handle RPC replies that
      are too complex for the device to send directly. That allows us to
      remove the max_sge check and enables drivers with a small max_sge to
      work again.
      Reported-by: Don Dutile <ddutile@redhat.com>
      Fixes: 25fd86ec ("svcrdma: Don't overrun the SGE array in ...")
      Cc: stable@vger.kernel.org
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  12. 28 Dec 2018 (3 commits)
  13. 30 Oct 2018 (1 commit)
  14. 10 Aug 2018 (1 commit)
  15. 19 Jun 2018 (1 commit)
  16. 12 May 2018 (9 commits)
    • svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt · 25fd86ec
      Committed by Chuck Lever
      Receive buffers are always the same size, but each Send WR has a
      variable number of SGEs, based on the contents of the xdr_buf being
      sent.

      While assembling a Send WR, keep track of the number of SGEs so that
      we don't exceed the device's maximum, or walk off the end of the
      Send SGE array.

      For now, the Send path just fails if it exceeds the maximum.

      The current logic in svc_rdma_accept bases the maximum number of
      Send SGEs on the largest NFS request that can be sent or received.
      In the transport layer, the limit is actually based on the
      capabilities of the underlying device, not on properties of the
      Upper Layer Protocol.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Introduce svc_rdma_send_ctxt · 4201c746
      Committed by Chuck Lever
      svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
      free list. This eliminates the overhead of calling kmalloc / kfree,
      both of which grab a globally shared lock that disables interrupts.
      Introduce a replacement for svc_rdma_op_ctxt that is built
      especially for the svcrdma Send path.

      Subsequent patches will take advantage of this new structure by
      allocating real resources, which are then cached in these objects.
      The allocations are freed when the transport is torn down.

      I've renamed the structure so that static type checking can be used
      to ensure that uses of op_ctxt and send_ctxt are not confused. As an
      additional clean-up, structure fields are renamed to conform with
      kernel coding conventions.

      Additional clean-ups:
      - Handle svc_rdma_send_ctxt_get allocation failure at each call
        site, rather than pre-allocating and hoping we guessed correctly
      - All send_ctxt_put call sites request page freeing, so remove
        the @free_pages argument
      - All send_ctxt_put call sites unmap SGEs, so fold that into
        svc_rdma_send_ctxt_put
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Persistently allocate and DMA-map Receive buffers · 3316f063
      Committed by Chuck Lever
      The current Receive path uses an array of pages which are allocated
      and DMA-mapped when each Receive WR is posted, and then handed off
      to the upper layer in rqstp::rq_arg. The page flip releases unused
      pages in the rq_pages pagelist. This mechanism introduces a
      significant amount of overhead.

      So instead, kmalloc the Receive buffer, and leave it DMA-mapped
      while the transport remains connected. This confers a number of
      benefits:

      * Each Receive WR requires only one receive SGE, no matter how large
        the inline threshold is. This helps the server-side NFS/RDMA
        transport operate on less-capable RDMA devices.

      * The Receive buffer is left allocated and mapped all the time. This
        relieves svc_rdma_post_recv of the overhead of allocating and
        DMA-mapping a fresh buffer.

      * svc_rdma_wc_receive no longer has to DMA-unmap the Receive buffer.
        It has to DMA-sync only the number of bytes that were received.

      * svc_rdma_build_arg_xdr no longer has to free a page in rq_pages
        for each page in the Receive buffer, making it a constant-time
        function.

      * The Receive buffer is now plugged directly into the rq_arg's
        head[0] iovec, and can be larger than a page without spilling
        over into rq_arg's page list. This enables simplification of
        the RDMA Read path in subsequent patches.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Remove sc_rq_depth · 2c577bfe
      Committed by Chuck Lever
      Clean up: there is no need to retain rq_depth in struct
      svcxprt_rdma; it is used only in svc_rdma_accept().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Introduce svc_rdma_recv_ctxt · ecf85b23
      Committed by Chuck Lever
      svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
      free list. This eliminates the overhead of calling kmalloc / kfree,
      both of which grab a globally shared lock that disables interrupts.
      To reduce contention further, separate the use of these objects in
      the Receive and Send paths in svcrdma.

      Subsequent patches will take advantage of this separation by
      allocating real resources, which are then cached in these objects.
      The allocations are freed when the transport is torn down.

      I've renamed the structure so that static type checking can be used
      to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
      additional clean-up, structure fields are renamed to conform with
      kernel coding conventions.

      As a final clean-up, helpers related to recv_ctxt are moved closer
      to the functions that use them.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Trace key RDMA API events · bd2abef3
      Committed by Chuck Lever
      This includes:
        * Posting on the Send and Receive queues
        * Send, Receive, Read, and Write completion
        * Connect upcalls
        * QP errors
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Trace key RPC/RDMA protocol events · 98895edb
      Committed by Chuck Lever
      This includes:
        * Transport accept and tear-down
        * Decisions about using Write and Reply chunks
        * Each RDMA segment that is handled
        * Whenever an RDMA_ERR is sent

      As a clean-up, I've standardized the order of the includes, and
      removed some now-redundant dprintk call sites.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Use passed-in net namespace when creating RDMA listener · 8dafcbee
      Committed by Chuck Lever
      Ensure each RDMA listener and its child transports are created in
      the same net namespace as the user that started the NFS service.
      This is similar to how listener sockets are created in
      svc_create_socket, and is required for enabling container support.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  17. 04 Apr 2018 (2 commits)
  18. 21 Mar 2018 (2 commits)
  19. 19 Jan 2018 (1 commit)
  20. 08 Nov 2017 (1 commit)
  21. 06 Sep 2017 (1 commit)
    • svcrdma: Estimate Send Queue depth properly · 26fb2254
      Committed by Chuck Lever
      The rdma_rw API adjusts max_send_wr upwards during the
      rdma_create_qp() call. If the ULP actually wants to take advantage
      of these extra resources, it must increase the size of its send
      completion queue (created before rdma_create_qp is called) and
      increase its send queue accounting limit.

      Use the new rdma_rw_mr_factor API to figure out the correct value
      to use for the Send Queue and Send Completion Queue depths.

      Also, ensure that the chosen Send Queue depth for a newly created
      transport does not overrun the QP WR limit of the underlying device.

      Lastly, there's no longer a need to carry the Send Queue depth in
      struct svcxprt_rdma, since the value is used only in the
      svc_rdma_accept() path.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>