1. 09 Feb 2017, 1 commit
    • svcrdma: Clean up RPC-over-RDMA Reply header encoder · 98fc21d3
      Authored by Chuck Lever
      Replace C structure-based XDR encoding with pointer arithmetic.
      Pointer arithmetic is considered more portable, and is used
      throughout the kernel's existing XDR encoders. The gcc optimizer
      generates similar assembler code either way.
      
      Byte-swapping before a memory store on x86 typically results in an
      instruction pipeline stall. Avoid byte-swapping when encoding a new
      header.
      
      svcrdma currently doesn't alter a connection's credit grant value
      after the connection has been accepted, so it is effectively a
      constant. Cache the byte-swapped value in a separate field.
      
      Christoph suggested pulling the header encoding logic into the only
      function that uses it.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
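The encoding style this commit describes — pointer arithmetic plus a credit value byte-swapped once at accept time — can be sketched in userspace C. Everything below is a hypothetical stand-in (htonl() playing the role of the kernel's cpu_to_be32(), fake_rdma_xprt standing in for the real transport struct), not the actual svcrdma code:

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htonl() stands in for the kernel's cpu_to_be32() */

/* Hypothetical transport: the credit grant is swapped to big-endian
 * once, when the connection is accepted, and cached. */
struct fake_rdma_xprt {
    uint32_t sc_fc_credits;   /* pre-swapped (big-endian) credit grant */
};

/* Encode a simplified reply header with pointer arithmetic instead of
 * a C struct overlay.  The cached big-endian credit value is stored
 * as-is, so no byte swap happens on this hot path. */
static uint32_t *encode_reply_hdr(uint32_t *p, uint32_t xid_be,
                                  const struct fake_rdma_xprt *xprt)
{
    *p++ = xid_be;                 /* already big-endian: copied from the call */
    *p++ = htonl(1);               /* RPC-over-RDMA version 1 */
    *p++ = xprt->sc_fc_credits;    /* cached constant: plain store, no swap */
    *p++ = htonl(0);               /* message type (illustrative value) */
    return p;                      /* caller continues encoding here */
}
```

The returned pointer lets the caller keep appending XDR items, which is how the kernel's existing encoders chain together.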
  2. 01 Dec 2016, 4 commits
    • svcrdma: Break up dprintk format in svc_rdma_accept() · 07257450
      Authored by Chuck Lever
      The current code results in:
      
      Nov  7 14:50:19 klimt kernel: svcrdma: newxprt->sc_cm_id=ffff88085590c800,
       newxprt->sc_pd=ffff880852a7ce00#012    cm_id->device=ffff88084dd20000,
       sc_pd->device=ffff88084dd20000#012    cap.max_send_wr = 272#012
       cap.max_recv_wr = 34#012    cap.max_send_sge = 32#012
       cap.max_recv_sge = 32
      Nov  7 14:50:19 klimt kernel: svcrdma: new connection ffff880855908000
       accepted with the following attributes:#012    local_ip        :
       10.0.0.5#012    local_port#011     : 20049#012    remote_ip       :
       10.0.0.2#012    remote_port     : 59909#012    max_sge         : 32#012
       max_sge_rd      : 30#012    sq_depth        : 272#012    max_requests    :
       32#012    ord             : 16
      
      Split up the output over multiple dprintks and take the opportunity
      to fix the display of IPv6 addresses.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Remove svc_rdma_op_ctxt::wc_status · 96a58f9c
      Authored by Chuck Lever
      Clean up: Completion status is already reported in the individual
      completion handlers. Save a few bytes in struct svc_rdma_op_ctxt.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Remove DMA map accounting · dd6fd213
      Authored by Chuck Lever
      Clean up: sc_dma_used is not required for correct operation. It is
      simply a debugging tool to report when svcrdma has leaked DMA maps.
      
      However, manipulating an atomic has a measurable CPU cost, and DMA
      map accounting specific to svcrdma will be meaningless once svcrdma
      is converted to use the new generic r/w API.
      
      A similar kind of debug accounting can be done simply by enabling
      the IOMMU or by using CONFIG_DMA_API_DEBUG, CONFIG_IOMMU_DEBUG, and
      CONFIG_IOMMU_LEAK.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Remove BH-disabled spin locking in svc_rdma_send() · e4eb42ce
      Authored by Chuck Lever
      svcrdma's current SQ accounting algorithm takes sc_lock and disables
      bottom-halves while posting all RDMA Read, Write, and Send WRs.
      
      This is relatively heavyweight serialization. And note that Write and
      Send are already fully serialized by the xpt_mutex.
      
      Using a single atomic_t should be all that is necessary to guarantee
      that ib_post_send() is called only when there is enough space on the
      send queue. This is what the other RDMA-enabled storage targets do.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
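The single-atomic SQ accounting scheme described above can be sketched with C11 atomics; fake_sq and the helper names are hypothetical, not svcrdma symbols. A slot is reserved before posting and returned by the completion handler, with no lock and no BH disabling:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical SQ accounting: one atomic counter of free send-queue
 * slots replaces the BH-disabling spinlock. */
struct fake_sq {
    atomic_int avail;   /* free SQ entries */
};

/* Try to claim one SQ slot; back off if the queue is full. */
static bool sq_reserve(struct fake_sq *sq)
{
    if (atomic_fetch_sub(&sq->avail, 1) > 0)
        return true;                  /* slot claimed: safe to post the WR */
    atomic_fetch_add(&sq->avail, 1);  /* queue full: undo and retry later */
    return false;
}

/* The send completion handler gives the slot back. */
static void sq_release(struct fake_sq *sq)
{
    atomic_fetch_add(&sq->avail, 1);
}
```

The optimistic decrement-then-check pattern means two racing posters can both momentarily drive the counter negative, but each one that sees a non-positive old value backs out, so ib_post_send() is never called without a free slot.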
  3. 14 Nov 2016, 1 commit
    • sunrpc: svc_age_temp_xprts_now should not call setsockopt on non-tcp transports · ea08e392
      Authored by Scott Mayhew
      This fixes the following panic that can occur with NFSoRDMA.
      
      general protection fault: 0000 [#1] SMP
      Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
      scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
      scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
      mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma
      ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr
      irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core
      lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei
      ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd
      auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
      crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper
      syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core
      tg3 crct10dif_pclmul drm crct10dif_common
      ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash
      dm_log dm_mod
      CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1
      Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015
      Workqueue: events check_lifetime
      task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000
      RIP: 0010:[<ffffffff8168d847>]  [<ffffffff8168d847>]
      _raw_spin_lock_bh+0x17/0x50
      RSP: 0018:ffff88031f587ba8  EFLAGS: 00010206
      RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072
      RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77
      R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072
      R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff880322a40000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
      20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002
      ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32
      0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0
      Call Trace:
      [<ffffffff81557830>] lock_sock_nested+0x20/0x50
      [<ffffffff8155ae08>] sock_setsockopt+0x78/0x940
      [<ffffffff81096acb>] ? lock_timer_base.isra.33+0x2b/0x50
      [<ffffffff8155397d>] kernel_setsockopt+0x4d/0x50
      [<ffffffffa0386284>] svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc]
      [<ffffffffa03b681d>] nfsd_inetaddr_event+0x9d/0xd0 [nfsd]
      [<ffffffff81691ebc>] notifier_call_chain+0x4c/0x70
      [<ffffffff810b687d>] __blocking_notifier_call_chain+0x4d/0x70
      [<ffffffff810b68b6>] blocking_notifier_call_chain+0x16/0x20
      [<ffffffff815e8538>] __inet_del_ifa+0x168/0x2d0
      [<ffffffff815e8cef>] check_lifetime+0x25f/0x270
      [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
      [<ffffffff810a8d76>] worker_thread+0x126/0x410
      [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
      [<ffffffff810b052f>] kthread+0xcf/0xe0
      [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
      [<ffffffff81696418>] ret_from_fork+0x58/0x90
      [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
      Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f
      44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 <f0> 0f
      c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f
      RIP  [<ffffffff8168d847>] _raw_spin_lock_bh+0x17/0x50
      RSP <ffff88031f587ba8>
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately")
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
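The panic happens because kernel_setsockopt() is called on an RDMA transport, whose private data is not a socket, so lock_sock_nested() dereferences garbage. The shape of the fix — refusing to touch socket options unless the transport really is a TCP socket — can be sketched as follows; the fake_* types are hypothetical stand-ins for struct svc_xprt and its class:

```c
#include <string.h>
#include <stdbool.h>

/* Minimal stand-ins for svc_xprt and its transport class. */
struct fake_xprt_class { const char *xcl_name; };
struct fake_xprt       { const struct fake_xprt_class *xpt_class; };

/* Only a TCP transport carries a struct sock; calling setsockopt on an
 * RDMA transport dereferences non-socket memory.  Gate the socket
 * option behind a transport-class check. */
static bool may_setsockopt(const struct fake_xprt *xprt)
{
    return strcmp(xprt->xpt_class->xcl_name, "tcp") == 0;
}
```

In svc_age_temp_xprts_now(), such a check would skip the TCP_NODELAY setsockopt for RDMA transports while still closing them.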
  4. 24 Sep 2016, 1 commit
  5. 23 Sep 2016, 3 commits
    • svcrdma: support Remote Invalidation · 25d55296
      Authored by Chuck Lever
      Support Remote Invalidation. A private message is exchanged with
      the client upon RDMA transport connect that indicates whether
      Send With Invalidation may be used by the server to send RPC
      replies. The invalidate_rkey is arbitrarily chosen from among
      rkeys present in the RPC-over-RDMA header's chunk lists.
      
      Send With Invalidate improves performance only when clients can
      recognize, while processing an RPC reply, that an rkey has already
      been invalidated. That has been submitted as a separate change.
      
      In the future, the RPC-over-RDMA protocol might support Remote
      Invalidation properly. The protocol needs to enable signaling
      between peers to indicate when Remote Invalidation can be used
      for each individual RPC.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
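The "arbitrarily chosen" invalidate_rkey selection can be sketched like this; the helper and its flattened view of the chunk-list rkeys are hypothetical, not the svcrdma implementation:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical flattened list of the rkeys found in an RPC-over-RDMA
 * header's chunk lists.  Any one of them is a valid target for Send
 * With Invalidate, so the first will do. */
static int choose_invalidate_rkey(const uint32_t *rkeys, size_t n,
                                  uint32_t *out)
{
    if (n == 0)
        return -1;       /* no chunks: fall back to a plain Send */
    *out = rkeys[0];     /* arbitrary pick, per the commit description */
    return 0;
}
```

When no rkey is available (or the client did not advertise support), the server simply posts an ordinary Send instead.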
    • svcrdma: Server-side support for rpcrdma_connect_private · cc9d8340
      Authored by Chuck Lever
      Prepare to receive an RDMA-CM private message when handling a new
      connection attempt, and send a similar message as part of connection
      acceptance.
      
      Both sides can communicate their various implementation limits.
      Implementations that don't support this sideband protocol ignore it.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Tail iovec leaves an orphaned DMA mapping · cace564f
      Authored by Chuck Lever
      The ctxt's count field is overloaded to mean the number of pages in
      the ctxt->page array and the number of SGEs in the ctxt->sge array.
      Typically these two numbers are the same.
      
      However, when an inline RPC reply is constructed from an xdr_buf
      with a tail iovec, the head and tail often occupy the same page,
      but each is DMA mapped independently. In that case, ->count equals
      the number of pages, but it does not equal the number of SGEs.
      There's one more SGE, for the tail iovec. Hence there is one more
      DMA mapping than there are pages in the ctxt->page array.
      
      This isn't a real problem until the server's iommu is enabled. Then
      each RPC reply that has content in that iovec orphans a DMA mapping
      that consists of real resources.
      
      krb5i and krb5p always populate that tail iovec. After a couple
      million sent krb5i/p RPC replies, the NFS server starts behaving
      erratically. Reboot is needed to clear the problem.
      
      Fixes: 9d11b51c ("svcrdma: Fix send_reply() scatter/gather set-up")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
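Keeping the page count and the SGE count separate, as the fix does, can be sketched like this; fake_ctxt is a hypothetical stand-in for svc_rdma_op_ctxt, not its real layout:

```c
#include <stddef.h>

/* Hypothetical ctxt bookkeeping: track mapped SGEs separately from
 * pages.  When a tail iovec shares the head's page, one page backs
 * two DMA mappings, so a single overloaded "count" cannot drive the
 * unmap loop without leaking one mapping. */
struct fake_ctxt {
    size_t page_count;   /* entries in the ctxt's page array */
    size_t mapped_sges;  /* DMA mappings in the ctxt's sge array */
};

/* Unmap by SGE count, not page count; returns the number of mappings
 * released so a leak would be visible. */
static size_t fake_unmap_dma(struct fake_ctxt *ctxt)
{
    size_t n = ctxt->mapped_sges;   /* NOT page_count */
    ctxt->mapped_sges = 0;
    return n;
}
```

In the head-plus-tail case described above, page_count is 1 but mapped_sges is 2; unmapping by page count would orphan the tail's mapping exactly as the commit describes.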
  6. 14 May 2016, 3 commits
  7. 02 Mar 2016, 4 commits
    • svcrdma: Use new CQ API for RPC-over-RDMA server send CQs · be99bb11
      Authored by Chuck Lever
      Calling ib_poll_cq() to sort through WCs during a completion is a
      common pattern amongst RDMA consumers. Since commit 14d3a3b2
      ("IB: add a proper completion queue abstraction"), WC sorting can
      be handled by the IB core.
      
      By converting to this new API, svcrdma is made a better neighbor to
      other RDMA consumers, as it allows the core to schedule the delivery
      of completions more fairly amongst all active consumers.
      
      This new API also aims each completion at a function that is
      specific to the WR's opcode. Thus the ctxt->wr_op field and the
      switch in process_context is replaced by a set of methods that
      handle each completion type.
      
      Because each ib_cqe carries a pointer to a completion method, the
      core can now post operations on a consumer's QP, and handle the
      completions itself.
      
      The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no
      longer updated.
      
      As a clean-up, the cq_event_handler, the dto_tasklet, and all
      associated locking are removed, as they are no longer referenced or
      used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
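The per-opcode dispatch that replaces the wr_op switch can be modeled in userspace; fake_cqe stands in for struct ib_cqe, and nothing below is the actual IB core code:

```c
#include <stddef.h>

/* Model of the new CQ API: each posted WR embeds a small cqe carrying
 * a pointer to its completion handler, so the poller needs no wr_op
 * field and no opcode switch. */
struct fake_cqe {
    void (*done)(struct fake_cqe *cqe);
};

struct fake_send_ctxt {
    struct fake_cqe cqe;   /* embedded, like ib_cqe in an op ctxt */
    int completed;
};

/* Opcode-specific handler: recovers the enclosing context from the
 * embedded cqe, container_of-style. */
static void send_done(struct fake_cqe *cqe)
{
    struct fake_send_ctxt *ctxt =
        (struct fake_send_ctxt *)((char *)cqe -
                                  offsetof(struct fake_send_ctxt, cqe));
    ctxt->completed = 1;
}

/* The "core" simply invokes whatever handler the WR carried. */
static void fake_cq_dispatch(struct fake_cqe *cqe)
{
    cqe->done(cqe);
}
```

Because the handler travels with the WR, the core can also post its own operations on the consumer's QP and field their completions itself, as the text above notes.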
    • svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs · 8bd5ba86
      Authored by Chuck Lever
      Calling ib_poll_cq() to sort through WCs during a completion is a
      common pattern amongst RDMA consumers. Since commit 14d3a3b2
      ("IB: add a proper completion queue abstraction"), WC sorting can
      be handled by the IB core.
      
      By converting to this new API, svcrdma is made a better neighbor to
      other RDMA consumers, as it allows the core to schedule the delivery
      of completions more fairly amongst all active consumers.
      
      Because each ib_cqe carries a pointer to a completion method, the
      core can now post operations on a consumer's QP, and handle the
      completions itself.
      
      svcrdma receive completions no longer use the dto_tasklet. Each
      polled Receive WC is now handled individually in soft IRQ context.
      
      The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod
      metrics are no longer updated.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Make RDMA_ERROR messages work · a6081b82
      Authored by Chuck Lever
      Fix several issues with svc_rdma_send_error():
      
       - Post a receive buffer to replace the one that was consumed by
         the incoming request
       - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE
       - No need to put_page _and_ free pages in svc_rdma_put_context
       - Make sure the sge is set up completely in case the error
         path goes through svc_rdma_unmap_dma()
       - Replace the use of ENOSYS, which has a reserved meaning
      
      Related fixes in svc_rdma_recvfrom():
      
       - Don't leak the ctxt associated with the incoming request
       - Don't close the connection after sending an error reply
       - Let svc_rdma_send_error() figure out the right header error code
      
      As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c
      with other similar functions. There is some common logic in these
      functions that could someday be combined to reduce code duplication.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: svc_rdma_post_recv() should close connection on error · bf36387a
      Authored by Chuck Lever
      Clean up: Most svc_rdma_post_recv() call sites close the transport
      connection when a receive cannot be posted. Wrap that in a common
      helper.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
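A sketch of such a common helper, folding "close the connection on a failed post" into one place; all names below are hypothetical stand-ins, not the svcrdma functions:

```c
#include <stdbool.h>

/* Hypothetical transport with a test knob that makes posting fail. */
struct fake_conn {
    bool closed;
    int  fail_post;   /* nonzero: fake_post_recv() reports failure */
};

static int fake_post_recv(struct fake_conn *c)
{
    return c->fail_post ? -1 : 0;
}

/* Every call site wants the same error handling, so wrap it: post a
 * receive, and tear the connection down if the post fails. */
static int post_recv_or_close(struct fake_conn *c)
{
    int ret = fake_post_recv(c);
    if (ret)
        c->closed = true;   /* close the transport on error */
    return ret;
}
```

Call sites then shrink to a single call, and the close-on-error policy cannot be forgotten at any of them.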
  8. 20 Jan 2016, 9 commits
  9. 23 Dec 2015, 1 commit
  10. 03 Nov 2015, 1 commit
  11. 29 Oct 2015, 2 commits
  12. 31 Aug 2015, 2 commits
  13. 29 Aug 2015, 1 commit
  14. 11 Aug 2015, 1 commit
  15. 21 Jul 2015, 2 commits
  16. 13 Jun 2015, 1 commit
  17. 05 Jun 2015, 3 commits