1. May 12, 2018 (5 commits)
    • svcrdma: Simplify svc_rdma_recv_ctxt_put · 1e5f4160
      By Chuck Lever
      Currently svc_rdma_recv_ctxt_put's callers have to know whether they
      want to free the ctxt's pages or not. This means developers have to
      know when and why to set that free_pages argument.
      
      Instead, the ctxt should carry that information with it so that
      svc_rdma_recv_ctxt_put does the right thing no matter who is
      calling.
      
      We want to keep track of the number of pages in the Receive buffer
      separately from the number of pages pulled over by RDMA Read. This
      is so that the correct number of pages can be freed properly and
      that number is well-documented.
      
      So now, rc_hdr_count is the number of pages consumed by head[0]
      (i.e., the page index where the Read chunk should start); and
      rc_page_count is always the number of pages that need to be released
      when the ctxt is put.
      
      The @free_pages argument is no longer needed.
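      The accounting described above can be sketched in plain user-space C. This is a simplified model, not the kernel code: the field names `rc_hdr_count` and `rc_page_count` follow the patch, while the structure layout and the use of malloc/free in place of page allocation are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define RC_MAX_PAGES 16

/* Simplified model of the receive context described above. */
struct svc_rdma_recv_ctxt {
	void *rc_pages[RC_MAX_PAGES];
	int rc_hdr_count;	/* pages consumed by head[0] */
	int rc_page_count;	/* pages to release when the ctxt is put */
};

/* No @free_pages argument: the ctxt itself carries the information,
 * so the put path does the right thing no matter who calls it. */
static void svc_rdma_recv_ctxt_put(struct svc_rdma_recv_ctxt *ctxt)
{
	for (int i = 0; i < ctxt->rc_page_count; i++)
		free(ctxt->rc_pages[i]);
	ctxt->rc_page_count = 0;
}
```

      Callers now only pass the ctxt; the decision about what to free has moved into the ctxt itself.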
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Introduce svc_rdma_recv_ctxt · ecf85b23
      By Chuck Lever
      svc_rdma_op_ctxts are pre-allocated and maintained on a per-xprt
      free list. This eliminates the overhead of calling kmalloc / kfree,
      both of which grab a globally shared lock that disables interrupts.
      To reduce contention further, separate the use of these objects in
      the Receive and Send paths in svcrdma.
      
      Subsequent patches will take advantage of this separation by
      allocating real resources which are then cached in these objects.
      The allocations are freed when the transport is torn down.
      
      I've renamed the structure so that static type checking can be used
      to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
      additional clean up, structure fields are renamed to conform with
      kernel coding conventions.
      
      As a final clean up, helpers related to recv_ctxt are moved closer
      to the functions that use them.
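      The free-list pattern described above can be modeled in a short user-space sketch. This is an assumption-laden illustration: the kernel version protects the list with a spin lock and caches real DMA resources, both omitted here, and the names `ctxt_get`/`ctxt_put`/`free_head` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct recv_ctxt {
	struct recv_ctxt *next;	/* free-list linkage */
	/* cached per-ctxt resources would live here */
};

struct xprt {
	struct recv_ctxt *free_head;	/* per-transport free list */
};

/* Fast path: pop from the free list, avoiding the allocator
 * (and its globally shared, interrupt-disabling lock). */
static struct recv_ctxt *ctxt_get(struct xprt *x)
{
	struct recv_ctxt *c = x->free_head;
	if (c) {
		x->free_head = c->next;
		return c;
	}
	return calloc(1, sizeof(*c));	/* slow path */
}

/* Return the ctxt to the free list instead of freeing it. */
static void ctxt_put(struct xprt *x, struct recv_ctxt *c)
{
	c->next = x->free_head;
	x->free_head = c;
}
```

      Keeping separate Receive and Send free lists, as the patch does, reduces contention further because the two paths never touch each other's list.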
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Trace key RDMA API events · bd2abef3
      By Chuck Lever
      This includes:
        * Posting on the Send and Receive queues
        * Send, Receive, Read, and Write completion
        * Connect upcalls
        * QP errors
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Trace key RPC/RDMA protocol events · 98895edb
      By Chuck Lever
      This includes:
        * Transport accept and tear-down
        * Decisions about using Write and Reply chunks
        * Each RDMA segment that is handled
        * Whenever an RDMA_ERR is sent
      
      As a clean-up, I've standardized the order of the includes, and
      removed some now redundant dprintk call sites.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  2. March 21, 2018 (1 commit)
  3. January 19, 2018 (1 commit)
  4. July 13, 2017 (2 commits)
    • svcrdma: Properly compute .len and .buflen for received RPC Calls · 71641d99
      By Chuck Lever
      When an RPC-over-RDMA request is received, the Receive buffer
      contains a Transport Header possibly followed by an RPC message.
      
      Even though rq_arg.head[0] (as passed to NFSD) does not contain the
      Transport Header, currently rq_arg.len includes the size of the
      Transport Header.
      
      That violates the intent of the xdr_buf API contract. .buflen should
      include everything, but .len should be exactly the length of the RPC
      message in the buffer.
      
      The rq_arg fields are summed together at the end of
      svc_rdma_recvfrom to obtain the correct return value. rq_arg.len
      really ought to contain the correct number of bytes already, but it
      currently doesn't due to the above misbehavior.
      
      Let's instead ensure that .buflen includes the length of the
      transport header, and that .len is always equal to head.iov_len +
      .page_len + tail.iov_len.
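      The invariant can be expressed in a few lines of C. This is a model of the xdr_buf fields named above, not the kernel definition; `hdr_len` is a stand-in for the transport header size, and `set_arg_lengths` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct kvec { size_t iov_len; };

struct xdr_buf {
	struct kvec head, tail;
	size_t page_len;
	size_t len;	/* length of the RPC message only */
	size_t buflen;	/* whole receive buffer, incl. transport header */
};

/* Enforce the xdr_buf contract described above:
 * .len covers exactly the RPC message; .buflen covers everything. */
static void set_arg_lengths(struct xdr_buf *arg, size_t hdr_len)
{
	arg->len = arg->head.iov_len + arg->page_len + arg->tail.iov_len;
	arg->buflen = hdr_len + arg->len;
}
```

      With this in place, summing the rq_arg fields at the end of svc_rdma_recvfrom and reading rq_arg.len agree by construction.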
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Use generic RDMA R/W API in RPC Call path · cafc7398
      By Chuck Lever
      The current svcrdma recvfrom code path has a lot of detail about
      registration mode and the type of port (iWARP, IB, etc).
      
      Instead, use the RDMA core's generic R/W API. This shares code with
      other RDMA-enabled ULPs and manages the gory details of buffer
      registration and the posting of RDMA Read Work Requests.
      
      Since the Read list marshaling code is being replaced, I took the
      opportunity to replace C structure-based XDR encoding code with more
      portable code that uses pointer arithmetic.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  5. June 29, 2017 (5 commits)
    • svcrdma: Don't account for Receive queue "starvation" · 2d6491a5
      By Chuck Lever
      From what I can tell, calling ->recvfrom when there is no work to do
      is a normal part of operation. This is the only way svc_recv can
      tell when there is no more data ready to receive on the transport.
      
      Neither the TCP nor the UDP transport implementations have a
      "starve" metric.
      
      The cost of receive starvation accounting is bumping an atomic, which
      results in extra (IMO unnecessary) bus traffic between CPU sockets,
      while holding a spin lock.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Improve Reply chunk sanity checking · ca5c76ab
      By Chuck Lever
      Identify malformed transport headers and unsupported chunk
      combinations as early as possible.
      
      - Ensure that segment lengths are not crazy.
      
      - Ensure that the Reply chunk's segment count is not crazy.
      
      With a 1KB inline threshold, the largest number of Write segments
      that can be conveyed is about 60 (for an RDMA_NOMSG Reply message).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Improve Write chunk sanity checking · 3c22f326
      By Chuck Lever
      Identify malformed transport headers and unsupported chunk
      combinations as early as possible.
      
      - Reject RPC-over-RDMA messages that contain more than one Write
      chunk, since this implementation does not support more than one per
      message.
      
      - Ensure that segment lengths are not crazy.
      
      - Ensure that the chunk's segment count is not crazy.
      
      With a 1KB inline threshold, the largest number of Write segments
      that can be conveyed is about 60 (for an RDMA_NOMSG Reply message).
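      A hypothetical shape for such an early sanity check is sketched below. The limit of 60 comes from the inline-threshold arithmetic above; the segment layout, function name, and return convention are illustrative assumptions, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WRITE_SEGS 60	/* roughly what a 1KB inline threshold allows */

/* Reject a Write chunk whose segment count or segment lengths are
 * crazy, before any further processing. Returns 0 if sane, -1 if not. */
static int write_chunk_sane(size_t nsegs, const uint32_t *seg_len,
			    uint32_t max_seg_len)
{
	if (nsegs == 0 || nsegs > MAX_WRITE_SEGS)
		return -1;
	for (size_t i = 0; i < nsegs; i++)
		if (seg_len[i] == 0 || seg_len[i] > max_seg_len)
			return -1;
	return 0;
}
```

      Checking this while parsing the transport header means a malformed message is dropped before any memory registration or RDMA work is attempted.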
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Improve Read chunk sanity checking · e77340e0
      By Chuck Lever
      Identify malformed transport headers and unsupported chunk
      combinations as early as possible.
      
      - Reject RPC-over-RDMA messages that contain more than one Read chunk,
        since this implementation currently does not support more than one
        per RPC transaction.
      
      - Ensure that segment lengths are not crazy.
      
      - Remove the segment count check. With a 1KB inline threshold, the
        largest number of Read segments that can be conveyed is about 40
        (for an RDMA_NOMSG Call message). This is nowhere near
        RPCSVC_MAXPAGES. As far as I can tell, that was just a sanity
        check and does not enforce an implementation limit.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Remove svc_rdma_marshal.c · a80a3234
      By Chuck Lever
      svc_rdma_marshal.c has one remaining exported function --
      svc_rdma_xdr_decode_req -- and it has a single call site. Take
      the same approach as the sendto path, and move this function
      into the source file where it is called.
      
      This is a refactoring change only.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  6. April 26, 2017 (2 commits)
  7. February 9, 2017 (2 commits)
  8. January 13, 2017 (1 commit)
  9. December 1, 2016 (4 commits)
    • svcrdma: Remove unused variable in rdma_copy_tail() · f5426d37
      By Chuck Lever
      Clean up.
      
      linux-2.6/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c: In function
       ‘rdma_copy_tail’:
      linux-2.6/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:376:6: warning:
       variable ‘ret’ set but not used [-Wunused-but-set-variable]
        int ret;
            ^
      
      Fixes: a97c331f ("svcrdma: Handle additional inline content")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Remove svc_rdma_op_ctxt::wc_status · 96a58f9c
      By Chuck Lever
      Clean up: Completion status is already reported in the individual
      completion handlers. Save a few bytes in struct svc_rdma_op_ctxt.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Remove DMA map accounting · dd6fd213
      By Chuck Lever
      Clean up: sc_dma_used is not required for correct operation. It is
      simply a debugging tool to report when svcrdma has leaked DMA maps.
      
      However, manipulating an atomic has a measurable CPU cost, and DMA
      map accounting specific to svcrdma will be meaningless once svcrdma
      is converted to use the new generic r/w API.
      
      A similar kind of debug accounting can be done simply by enabling
      the IOMMU or by using CONFIG_DMA_API_DEBUG, CONFIG_IOMMU_DEBUG, and
      CONFIG_IOMMU_LEAK.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Renovate sendto chunk list parsing · 5fdca653
      By Chuck Lever
      The current sendto code appears to support clients that provide only
      one of a Read list, a Write list, or a Reply chunk. My reading of
      that code is that it doesn't support the following cases:
      
       - Read list + Write list
       - Read list + Reply chunk
       - Write list + Reply chunk
       - Read list + Write list + Reply chunk
      
      The protocol allows more than one Read or Write chunk in those
      lists. Some clients do send a Read list and Reply chunk
      simultaneously. NFSv4 WRITE uses a Read list for the data payload,
      and a Reply chunk because the GETATTR result in the reply can
      contain a large object like an ACL.
      
      Generalize one of the sendto code paths needed to support all of
      the above cases, and attempt to ensure that only one pass is done
      through the RPC Call's transport header to gather chunk list
      information for building the reply.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  10. September 23, 2016 (1 commit)
    • svcrdma: Tail iovec leaves an orphaned DMA mapping · cace564f
      By Chuck Lever
      The ctxt's count field is overloaded to mean the number of pages in
      the ctxt->page array and the number of SGEs in the ctxt->sge array.
      Typically these two numbers are the same.
      
      However, when an inline RPC reply is constructed from an xdr_buf
      with a tail iovec, the head and tail often occupy the same page,
      but each are DMA mapped independently. In that case, ->count equals
      the number of pages, but it does not equal the number of SGEs.
      There's one more SGE, for the tail iovec. Hence there is one more
      DMA mapping than there are pages in the ctxt->page array.
      
      This isn't a real problem until the server's iommu is enabled. Then
      each RPC reply that has content in that iovec orphans a DMA mapping
      that consists of real resources.
      
      krb5i and krb5p always populate that tail iovec. After a couple
      million sent krb5i/p RPC replies, the NFS server starts behaving
      erratically. Reboot is needed to clear the problem.
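      The mismatch can be modeled in a few lines of C. This is purely illustrative: `map_reply` and `tail_shares_head_page` are hypothetical names for the situation described above, not kernel symbols.

```c
#include <assert.h>

struct sge_accounting {
	int pages;	/* entries in ctxt->page */
	int sges;	/* DMA mappings in ctxt->sge */
};

/* When the head and tail iovecs share a page, the tail still gets
 * its own DMA mapping, so SGEs outnumber pages by one. Unmapping
 * only 'pages' mappings then orphans the tail's mapping. */
static struct sge_accounting map_reply(int npages, int tail_shares_head_page)
{
	struct sge_accounting a = { npages, npages };
	if (tail_shares_head_page)
		a.sges++;
	return a;
}
```

      The fix is to account for, and unmap, SGEs rather than pages, so the overloaded ->count no longer has to mean both things.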
      
      Fixes: 9d11b51c ("svcrdma: Fix send_reply() scatter/gather set-up")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  11. May 14, 2016 (6 commits)
  12. March 2, 2016 (6 commits)
  13. January 20, 2016 (2 commits)
  14. October 29, 2015 (1 commit)
  15. October 12, 2015 (1 commit)