1. 26 May, 2021 1 commit
  2. 26 Apr, 2021 10 commits
  3. 06 Feb, 2021 3 commits
  4. 11 Nov, 2020 5 commits
  5. 22 Jun, 2020 2 commits
  6. 27 Mar, 2020 9 commits
  7. 15 Jan, 2020 6 commits
    • C
      xprtrdma: Allocate and map transport header buffers at connect time · b78de1dc
      Committed by Chuck Lever
      Currently the underlying RDMA device is chosen at transport set-up
      time, but it will soon be chosen at connect time instead.
      
      The maximum size of a transport header is based on device
      capabilities. Thus transport header buffers have to be allocated
      _after_ the underlying device has been chosen (via address and route
      resolution); i.e., in the connect worker.
      
      Thus, move the allocation of transport header buffers to the connect
      worker, after the point at which the underlying RDMA device has been
      chosen.
      
      This also means the RDMA device is available to do a DMA mapping of
      these buffers at connect time, instead of in the hot I/O path. Make
      that optimization as well.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
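The ordering described in this commit message can be illustrated with a minimal userspace sketch. It is not the xprtrdma code: `fake_device`, `hdr_buf`, `transport`, and the three functions are hypothetical names invented for illustration, `malloc` stands in for buffer allocation, and the `mapped` flag stands in for a DMA mapping. The point is only that header buffers are sized and mapped once the device is known, at connect time, rather than at set-up time or per I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct fake_device {
	size_t max_hdr_size;	/* largest transport header the device allows */
};

struct hdr_buf {
	void *data;
	size_t len;
	int mapped;		/* 1 once "mapped" for the chosen device */
};

struct transport {
	const struct fake_device *dev;	/* NULL until connect time */
	struct hdr_buf *hdrs;
	size_t nr_hdrs;
};

/* Set-up time: no device yet, so header buffers cannot be sized. */
static void transport_setup(struct transport *t, size_t nr_hdrs)
{
	t->dev = NULL;
	t->hdrs = NULL;
	t->nr_hdrs = nr_hdrs;
}

/* Release header buffers, e.g. on disconnect or before a reconnect. */
static void transport_release_hdrs(struct transport *t)
{
	size_t i;

	if (!t->hdrs)
		return;
	for (i = 0; i < t->nr_hdrs; i++)
		free(t->hdrs[i].data);
	free(t->hdrs);
	t->hdrs = NULL;
	t->dev = NULL;
}

/* Connect time: the device is known, so allocate and map once here. */
static int transport_connect(struct transport *t,
			     const struct fake_device *dev)
{
	size_t i;

	assert(dev != NULL && t->nr_hdrs > 0);
	transport_release_hdrs(t);	/* drop buffers sized for an old device */

	t->hdrs = calloc(t->nr_hdrs, sizeof(*t->hdrs));
	if (!t->hdrs)
		return -1;
	for (i = 0; i < t->nr_hdrs; i++) {
		t->hdrs[i].data = malloc(dev->max_hdr_size);
		if (!t->hdrs[i].data) {
			transport_release_hdrs(t);
			return -1;
		}
		t->hdrs[i].len = dev->max_hdr_size;
		t->hdrs[i].mapped = 1;	/* mapped up front, not in the I/O path */
	}
	t->dev = dev;
	return 0;
}
```

In this sketch a reconnect to a different device simply calls `transport_connect()` again, which discards buffers sized for the previous device before allocating new ones.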
    • C
      xprtrdma: Refactor frwr_is_supported · 25868e61
      Committed by Chuck Lever
      Refactor: Perform the "is supported" check in rpcrdma_ep_create()
      instead of in rpcrdma_ia_open(). frwr_open() is where most of the
      logic to query device attributes is already located.
      
      The current code displays a redundant error message when the device
      does not support FRWR. As an additional clean-up, this patch removes
      the extra message.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • C
      xprtrdma: Eliminate per-transport "max pages" · 18d065a5
      Committed by Chuck Lever
      To support device hotplug and migrating a connection between devices
      of different capabilities, we have to guarantee that all in-kernel
      devices can support the same max NFS payload size (1 megabyte).
      
      This means that possibly one or two in-tree devices are no longer
      supported for NFS/RDMA because they cannot support 1MB rsize/wsize.
      The only one I confirmed was cxgb3, but it has already been removed
      from the kernel.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
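To make the 1 megabyte requirement concrete, the arithmetic can be sketched as below. This is an illustration only: `pages_spanned()` and `device_can_carry_max_payload()` are hypothetical helpers, and the worst-case alignment rule used here (a payload starting at the last byte of a page) is an assumption of the sketch, not something stated in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* The fixed maximum NFS payload size named in the commit message. */
#define MAX_PAYLOAD_BYTES ((size_t)1024 * 1024)

/* Number of pages touched by a buffer of len bytes starting at offset. */
static size_t pages_spanned(size_t offset, size_t len, size_t page_size)
{
	size_t first, last;

	assert(page_size > 0);
	if (len == 0)
		return 0;
	first = offset / page_size;
	last = (offset + len - 1) / page_size;
	return last - first + 1;
}

/*
 * Hypothetical check: can a device that handles dev_max_pages pages per
 * payload carry the maximum payload even when the buffer is unaligned?
 * The worst case assumed here starts at the last byte of a page.
 */
static int device_can_carry_max_payload(size_t dev_max_pages,
					size_t page_size)
{
	return dev_max_pages >=
	       pages_spanned(page_size - 1, MAX_PAYLOAD_BYTES, page_size);
}
```

With 4096-byte pages, a page-aligned 1 MiB payload spans 256 pages and an unaligned one can span 257, so under this sketch's assumption a device limited to fewer pages could not offer the full rsize/wsize.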
    • C
      xprtrdma: Refactor initialization of ep->rep_max_requests · 7581d901
      Committed by Chuck Lever
      Clean up: there is no need to keep two copies of the same value.
      Also, in subsequent patches, rpcrdma_ep_create() will be called in
      the connect worker rather than at set-up time.
      
      Minor fix: Initialize the transport's sendctx to the value based on
      the capabilities of the underlying device, not the maximum setting.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • C
      xprtrdma: Eliminate ri_max_send_sges · 2e870368
      Committed by Chuck Lever
      Clean-up. The max_send_sge value also happens to be stored in
      ep->rep_attr. Let's keep just a single copy.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • C
      xprtrdma: Fix oops in Receive handler after device removal · 671c450b
      Committed by Chuck Lever
      Since v5.4, a device removal has occasionally triggered this oops:
      
      Dec  2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
      Dec  2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
      Dec  2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
      Dec  2 17:13:53 manet kernel: PGD 0 P4D 0
      Dec  2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
      Dec  2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G        W         5.4.0-00050-g53717e43af61 #883
      Dec  2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Dec  2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
      Dec  2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
      Dec  2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
      Dec  2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
      Dec  2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
      Dec  2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
      Dec  2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
      Dec  2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
      Dec  2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
      Dec  2 17:13:53 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
      Dec  2 17:13:53 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Dec  2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
      Dec  2 17:13:53 manet kernel: Call Trace:
      Dec  2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
      Dec  2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
      Dec  2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: kthread+0xf4/0xf9
      Dec  2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
      Dec  2 17:13:53 manet kernel: ret_from_fork+0x24/0x30
      
      The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
      is still pointing to the old ib_device, which has been freed. The
      only way that is possible is if this rpcrdma_rep was not destroyed
      by rpcrdma_ia_remove.
      
      Debugging showed that was indeed the case: this rpcrdma_rep was
      still in use by a completing RPC at the time of the device removal,
      and thus wasn't on the rep free list. So, it was not found by
      rpcrdma_reps_destroy().
      
      The fix is to introduce a list of all rpcrdma_reps so that they all
      can be found when a device is removed. That list is used to perform
      only regbuf DMA unmapping, replacing that call to
      rpcrdma_reps_destroy().
      
      Meanwhile, to prevent corruption of this list, I've moved the
      destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
      rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
      not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
      protecting the rb_all_reps list.
      
      Fixes: b0b227f0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
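The reasoning in this commit message (an in-use object is absent from the free list, so a walk of the free list misses it) can be shown with a small userspace sketch. It does not reproduce the kernel data structures: `rep`, `rep_pool`, and all functions below are hypothetical, the lists are plain singly linked lists, and `dma_mapped` is just a flag. The sketch is single-threaded; the commit relies on rpcrdma_xprt_drain() to keep the list stable while it is walked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a receive buffer object. */
struct rep {
	struct rep *all_next;	/* link on the list of every rep */
	struct rep *free_next;	/* link on the free list, when idle */
	int dma_mapped;
};

struct rep_pool {
	struct rep *all;	/* every rep ever added */
	struct rep *free;	/* only reps not currently in use */
};

/* Create-time: a new rep goes on both lists and starts out mapped. */
static void pool_add(struct rep_pool *p, struct rep *r)
{
	assert(r != NULL);
	r->dma_mapped = 1;
	r->all_next = p->all;
	p->all = r;
	r->free_next = p->free;
	p->free = r;
}

/* Take a rep for an in-flight RPC: it leaves the free list only. */
static struct rep *pool_get(struct rep_pool *p)
{
	struct rep *r = p->free;

	if (r) {
		p->free = r->free_next;
		r->free_next = NULL;
	}
	return r;
}

/* Old behaviour in this sketch: only idle reps are found. */
static int unmap_free_only(struct rep_pool *p)
{
	struct rep *r;
	int n = 0;

	for (r = p->free; r; r = r->free_next)
		if (r->dma_mapped) {
			r->dma_mapped = 0;
			n++;
		}
	return n;
}

/* New behaviour in this sketch: every rep is found, in use or not. */
static int unmap_all(struct rep_pool *p)
{
	struct rep *r;
	int n = 0;

	for (r = p->all; r; r = r->all_next)
		if (r->dma_mapped) {
			r->dma_mapped = 0;
			n++;
		}
	return n;
}
```

If three reps are added and one is taken with `pool_get()`, `unmap_free_only()` leaves that in-use rep mapped against the departing device, whereas `unmap_all()` reaches it through the second list.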
  8. 24 Oct, 2019 4 commits