1. 15 Jan, 2020 10 commits
    • xprtrdma: Destroy rpcrdma_rep when Receive is flushed · 85810388
      Chuck Lever committed
      This reduces the hardware and memory footprint of an unconnected
      transport.
      
      At some point in the future, transport reconnect will allow
      resolving the destination IP address through a different device. The
      current change enables reps for the new connection to be allocated
      on whichever NUMA node the new device affines to after a reconnect.
      
      Note that this does not destroy _all_ the transport's reps... there
      will be a few that are still part of a running RPC completion.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Allocate and map transport header buffers at connect time · b78de1dc
      Chuck Lever committed
      Currently the underlying RDMA device is chosen at transport set-up
      time. But it will soon be at connect time instead.
      
      The maximum size of a transport header is based on device
      capabilities. Thus transport header buffers have to be allocated
      _after_ the underlying device has been chosen (via address and route
      resolution); i.e., in the connect worker.
      
      Thus, move the allocation of transport header buffers to the connect
      worker, after the point at which the underlying RDMA device has been
      chosen.
      
      This also means the RDMA device is available to do a DMA mapping of
      these buffers at connect time, instead of in the hot I/O path. Make
      that optimization as well.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Refactor frwr_is_supported · 25868e61
      Chuck Lever committed
      Refactor: Perform the "is supported" check in rpcrdma_ep_create()
      instead of in rpcrdma_ia_open(). frwr_open() is where most of the
      logic to query device attributes is already located.
      
      The current code displays a redundant error message when the device
      does not support FRWR. As an additional clean-up, this patch removes
      the extra message.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Eliminate per-transport "max pages" · 18d065a5
      Chuck Lever committed
      To support device hotplug and migrating a connection between devices
      of different capabilities, we have to guarantee that all in-kernel
      devices can support the same max NFS payload size (1 megabyte).
      
      This means that possibly one or two in-tree devices are no longer
      supported for NFS/RDMA because they cannot support 1MB rsize/wsize.
      The only one I confirmed was cxgb3, but it has already been removed
      from the kernel.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Refactor initialization of ep->rep_max_requests · 7581d901
      Chuck Lever committed
      Clean up: there is no need to keep two copies of the same value.
      Also, in subsequent patches, rpcrdma_ep_create() will be called in
      the connect worker rather than at set-up time.
      
      Minor fix: Initialize the transport's sendctx to the value based on
      the capabilities of the underlying device, not the maximum setting.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Make sendctx queue lifetime the same as connection lifetime · cb586dec
      Chuck Lever committed
      The size of the sendctx queue depends on the value stored in
      ia->ri_max_send_sges. This value is determined by querying the
      underlying device.
      
      Eventually, rpcrdma_ia_open() and rpcrdma_ep_create() will be called
      in the connect worker rather than at transport set-up time. The
      underlying device will not have been chosen at transport set-up time.
      
      The sendctx queue will thus have to be created after the underlying
      device has been chosen via address and route resolution; in other
      words, in the connect worker.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Eliminate ri_max_send_sges · 2e870368
      Chuck Lever committed
      Clean-up. The max_send_sge value also happens to be stored in
      ep->rep_attr. Let's keep just a single copy.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Fix oops in Receive handler after device removal · 671c450b
      Chuck Lever committed
      Since v5.4, a device removal occasionally triggered this oops:
      
      Dec  2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
      Dec  2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
      Dec  2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
      Dec  2 17:13:53 manet kernel: PGD 0 P4D 0
      Dec  2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
      Dec  2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G        W         5.4.0-00050-g53717e43af61 #883
      Dec  2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Dec  2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
      Dec  2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
      Dec  2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
      Dec  2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
      Dec  2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
      Dec  2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
      Dec  2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
      Dec  2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
      Dec  2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
      Dec  2 17:13:53 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
      Dec  2 17:13:53 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Dec  2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
      Dec  2 17:13:53 manet kernel: Call Trace:
      Dec  2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
      Dec  2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
      Dec  2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: kthread+0xf4/0xf9
      Dec  2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
      Dec  2 17:13:53 manet kernel: ret_from_fork+0x24/0x30
      
      The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
      is still pointing to the old ib_device, which has been freed. The
      only way that is possible is if this rpcrdma_rep was not destroyed
      by rpcrdma_ia_remove.
      
      Debugging showed that was indeed the case: this rpcrdma_rep was
      still in use by a completing RPC at the time of the device removal,
      and thus wasn't on the rep free list. So, it was not found by
      rpcrdma_reps_destroy().
      
      The fix is to introduce a list of all rpcrdma_reps so that they all
      can be found when a device is removed. That list is used to perform
      only regbuf DMA unmapping, replacing that call to
      rpcrdma_reps_destroy().
      
      Meanwhile, to prevent corruption of this list, I've moved the
      destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
      rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
      not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
      protecting the rb_all_reps list.
      
      Fixes: b0b227f0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
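The fix described in the "Fix oops in Receive handler after device removal" commit can be illustrated with a small user-space sketch. This is not the kernel code: `struct rep`, `struct rep_pool`, and every function name below are hypothetical stand-ins, and a boolean replaces the real regbuf DMA mapping. It only shows why a walk of the free list misses a rep that is held by a completing RPC, while a walk of a list of all reps does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for rpcrdma_rep: only what this sketch needs. */
struct rep {
    struct rep *next_all;   /* link on the list of all reps */
    struct rep *next_free;  /* link on the free list while idle */
    bool dma_mapped;        /* stands in for the regbuf DMA mapping */
};

struct rep_pool {
    struct rep *all;        /* every rep that exists */
    struct rep *free;       /* only reps not in use by an RPC */
};

/* Create-time bookkeeping: a new rep goes on both lists. */
static void pool_add(struct rep_pool *p, struct rep *r)
{
    r->dma_mapped = true;
    r->next_all = p->all;
    p->all = r;
    r->next_free = p->free;
    p->free = r;
}

/* Take a rep for an in-flight RPC: it leaves the free list only. */
static struct rep *pool_get(struct rep_pool *p)
{
    struct rep *r = p->free;

    if (r) {
        p->free = r->next_free;
        r->next_free = NULL;
    }
    return r;
}

/* Pre-fix behaviour in this model: only idle reps are found. */
static size_t unmap_free_only(struct rep_pool *p)
{
    size_t n = 0;

    for (struct rep *r = p->free; r; r = r->next_free)
        if (r->dma_mapped) {
            r->dma_mapped = false;
            n++;
        }
    return n;
}

/* Post-fix behaviour in this model: every rep is found, busy or idle. */
static size_t unmap_all(struct rep_pool *p)
{
    size_t n = 0;

    for (struct rep *r = p->all; r; r = r->next_all)
        if (r->dma_mapped) {
            r->dma_mapped = false;
            n++;
        }
    return n;
}
```

Both helpers return the number of reps they unmapped, so a caller can compare the two walks directly. The sketch has no locking; the commit message relies on rpcrdma_xprt_drain() to keep other paths from touching the list during the walk.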
    • xprtrdma: Fix completion wait during device removal · 13cb886c
      Chuck Lever committed
      I've found that on occasion, "rmmod <dev>" will hang if an NFS mount
      is under load.
      
      Ensure that ri_remove_done is initialized only just before the
      transport is woken up to force a close. This avoids the completion
      possibly getting initialized again while the CM event handler is
      waiting for a wake-up.
      
      Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA from under an NFS mount")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
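The hazard named in the "Fix completion wait during device removal" commit, a completion being initialized again around the time it is signalled, can be modelled with a toy, single-threaded counter. This is only a sketch under simplifying assumptions: `toy_completion` and its helpers are hypothetical, they are not the kernel's completion API, and the real bug involves concurrent contexts that this model flattens into one fixed ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a completion: a count of signals not yet consumed. */
struct toy_completion {
    unsigned int done;
};

static void toy_init(struct toy_completion *c)
{
    c->done = 0;
}

static void toy_complete(struct toy_completion *c)
{
    c->done++;
}

/* Returns true if a waiter would be released, consuming one signal. */
static bool toy_try_wait(struct toy_completion *c)
{
    if (c->done == 0)
        return false;
    c->done--;
    return true;
}

/* Problem ordering: a second init lands after the signal was sent. */
static bool removal_with_stray_reinit(void)
{
    struct toy_completion remove_done;

    toy_init(&remove_done);      /* initialized early */
    toy_complete(&remove_done);  /* close path reports "removal done" */
    toy_init(&remove_done);      /* stray re-init wipes the signal */
    return toy_try_wait(&remove_done);
}

/* Fixed ordering: init happens once, just before forcing the close. */
static bool removal_with_late_init(void)
{
    struct toy_completion remove_done;

    toy_init(&remove_done);
    toy_complete(&remove_done);
    return toy_try_wait(&remove_done);
}
```

In this model `removal_with_stray_reinit()` reports that the waiter is never released, which corresponds to the hang, while `removal_with_late_init()` reports a normal wake-up.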
    • xprtrdma: Fix create_qp crash on device unload · b32b9ed4
      Chuck Lever committed
      On device re-insertion, the RDMA device driver crashes trying to set
      up a new QP:
      
      Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
      Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
      Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
      Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
      Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
      Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G        W         5.4.0 #852
      Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
      Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
      Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
      Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
      Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
      Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
      Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
      Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
      Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
      Nov 27 16:32:06 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
      Nov 27 16:32:06 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
      Nov 27 16:32:06 manet kernel: Call Trace:
      Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
      Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
      Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
      Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
      Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]
      
      The fix is to copy the qp_init_attr struct that was just created by
      rpcrdma_ep_create() instead of using the one from the previous
      connection instance.
      
      Fixes: 98ef77d1 ("xprtrdma: Send Queue size grows after a reconnect")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 24 Oct, 2019 10 commits
  3. 27 Aug, 2019 2 commits
  4. 22 Aug, 2019 1 commit
  5. 21 Aug, 2019 9 commits
  6. 20 Aug, 2019 1 commit
    • xprtrdma: Boost maximum transport header size · f3c66a2f
      Chuck Lever committed
      Although I haven't seen any performance results that justify it,
      I've received several complaints that NFS/RDMA no longer supports
      a maximum rsize and wsize of 1MB. These days it is somewhat smaller.
      
      To simplify the logic that determines whether a chunk list is
      necessary, the implementation uses a fixed maximum size of the
      transport header. Currently that maximum size is 256 bytes, one
      quarter of the default inline threshold size for RPC/RDMA v1.
      
      Since commit a7886849 ("xprtrdma: Reduce max_frwr_depth"), the
      size of chunks is also smaller to take advantage of inline page
      lists in device internal MR data structures.
      
      The combination of these two design choices has reduced the maximum
      NFS rsize and wsize that can be used for most RNIC/HCAs. Increasing
      the maximum transport header size and the maximum number of RDMA
      segments it can contain increases the negotiated maximum rsize/wsize
      on common RNIC/HCAs.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
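The link between transport header size and maximum rsize/wsize in the "Boost maximum transport header size" commit can be made concrete with some rough arithmetic. The sketch below assumes the RPC/RDMA version 1 XDR layout from RFC 8166: 28 bytes of fixed header fields and empty-list discriminators, and 24 bytes per Read list entry. The function names, the 16-pages-per-segment figure, and the 4096-byte page size are illustrative assumptions, not values taken from the commit or from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed RPC/RDMA v1 sizes, in bytes, following the RFC 8166 XDR. */
#define HDR_FIXED_BYTES   28u  /* xid, vers, credit, proc, 3 list words */
#define READ_ENTRY_BYTES  24u  /* present flag, position, handle, length, offset */

/* How many Read list entries fit in a header of the given size. */
static size_t max_read_segments(size_t hdr_bytes)
{
    if (hdr_bytes < HDR_FIXED_BYTES)
        return 0;
    return (hdr_bytes - HDR_FIXED_BYTES) / READ_ENTRY_BYTES;
}

/* Largest payload those segments can describe.
 * pages_per_segment and page_bytes are hypothetical inputs. */
static size_t max_payload_bytes(size_t segments, size_t pages_per_segment,
                                size_t page_bytes)
{
    return segments * pages_per_segment * page_bytes;
}
```

Under these assumptions a 256-byte header holds 9 Read segments, and 9 segments of 16 pages of 4096 bytes describe 589824 bytes, which is short of 1MB. A larger header holds more segments, which is the direction the commit describes.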
  7. 05 Aug, 2019 1 commit
  8. 09 Jul, 2019 6 commits