1. 17 Jan, 2020 3 commits
  2. 16 Jan, 2020 2 commits
  3. 14 Jan, 2020 4 commits
  4. 08 Jan, 2020 4 commits
  5. 13 Dec, 2019 1 commit
  6. 24 Nov, 2019 1 commit
    • RDMA/odp: Use mmu_interval_notifier_insert() · f25a546e
      Committed by Jason Gunthorpe
      Replace the internal interval tree based mmu notifier with the new common
      mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
      deadlock that can be triggered in ODP:
      
       zap_page_range()
        mmu_notifier_invalidate_range_start()
         [..]
          ib_umem_notifier_invalidate_range_start()
             down_read(&per_mm->umem_rwsem)
        unmap_single_vma()
          [..]
            __split_huge_page_pmd()
              mmu_notifier_invalidate_range_start()
              [..]
                 ib_umem_notifier_invalidate_range_start()
                    down_read(&per_mm->umem_rwsem)   // DEADLOCK
      
              mmu_notifier_invalidate_range_end()
                 up_read(&per_mm->umem_rwsem)
        mmu_notifier_invalidate_range_end()
           up_read(&per_mm->umem_rwsem)
      
      The umem_rwsem is held across the range_start/end as the ODP algorithm for
      invalidate_range_end cannot tolerate changes to the interval
      tree. However, due to the nested invalidation regions the second
      down_read() can deadlock if there are competing writers. The new core code
      provides an alternative scheme to solve this problem.
      
      Fixes: ca748c39 ("RDMA/umem: Get rid of per_mm->notifier_count")
      Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
      Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
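The deadlock in the trace above can be replayed with a toy, single-threaded model of a fair reader/writer semaphore. This is only a sketch under stated assumptions: the `toy_*` names are hypothetical, the model is not the kernel rwsem implementation, and it assumes the fairness rule that new readers queue behind a waiting writer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fair reader/writer semaphore (hypothetical names,
 * not the kernel implementation). */
struct toy_rwsem {
	int readers;          /* read holds currently granted */
	bool writer_waiting;  /* a writer is queued behind the readers */
};

/* Returns true if the read lock is granted, false if the caller would block. */
static bool toy_down_read(struct toy_rwsem *sem)
{
	/* Assumed fairness rule: new readers queue behind a waiting writer. */
	if (sem->writer_waiting)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* A writer arrives; it is granted only when no reader holds the lock. */
static bool toy_down_write(struct toy_rwsem *sem)
{
	if (sem->readers > 0) {
		sem->writer_waiting = true;
		return false;
	}
	sem->writer_waiting = false;
	return true;
}

/* Replays the nested invalidation. Returns true if the inner
 * down_read() would block while the outer read hold is still in place. */
static bool nested_invalidate_deadlocks(bool competing_writer)
{
	struct toy_rwsem sem = { 0, false };
	bool inner_granted;

	toy_down_read(&sem);                 /* outer range_start */
	if (competing_writer)
		toy_down_write(&sem);        /* writer queues behind the outer reader */
	inner_granted = toy_down_read(&sem); /* nested range_start */
	if (inner_granted)
		toy_up_read(&sem);           /* nested range_end */
	/* The outer up_read() is never reached when the inner call blocks. */
	return !inner_granted;
}
```

In this model, `nested_invalidate_deadlocks(true)` reports a deadlock because the inner reader waits for the writer while the writer waits for the outer reader. Without a competing writer the nesting is harmless.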
  7. 23 Nov, 2019 1 commit
  8. 13 Nov, 2019 1 commit
  9. 07 Nov, 2019 2 commits
  10. 29 Oct, 2019 1 commit
    • RDMA/core: Fix ib_dma_max_seg_size() · ecdfdfdb
      Committed by Bart Van Assche
      If dev->dma_device->params == NULL then the maximum DMA segment size is 64
      KB. See also the dma_get_max_seg_size() implementation. This patch fixes
      the following kernel warning:
      
        DMA-API: infiniband rxe0: mapping sg segment longer than device claims to support [len=126976] [max=65536]
        WARNING: CPU: 4 PID: 4848 at kernel/dma/debug.c:1220 debug_dma_map_sg+0x3d9/0x450
        RIP: 0010:debug_dma_map_sg+0x3d9/0x450
        Call Trace:
         srp_queuecommand+0x626/0x18d0 [ib_srp]
         scsi_queue_rq+0xd02/0x13e0 [scsi_mod]
         __blk_mq_try_issue_directly+0x2b3/0x3f0
         blk_mq_request_issue_directly+0xac/0xf0
         blk_insert_cloned_request+0xdf/0x170
         dm_mq_queue_rq+0x43d/0x830 [dm_mod]
         __blk_mq_try_issue_directly+0x2b3/0x3f0
         blk_mq_request_issue_directly+0xac/0xf0
         blk_mq_try_issue_list_directly+0xb8/0x170
         blk_mq_sched_insert_requests+0x23c/0x3b0
         blk_mq_flush_plug_list+0x529/0x730
         blk_flush_plug_list+0x21f/0x260
         blk_mq_make_request+0x56b/0xf20
         generic_make_request+0x196/0x660
         submit_bio+0xae/0x290
         blkdev_direct_IO+0x822/0x900
         generic_file_direct_write+0x110/0x200
         __generic_file_write_iter+0x124/0x2a0
         blkdev_write_iter+0x168/0x270
         aio_write+0x1c4/0x310
         io_submit_one+0x971/0x1390
         __x64_sys_io_submit+0x12a/0x390
         do_syscall_64+0x6f/0x2e0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Link: https://lore.kernel.org/r/20191025225830.257535-2-bvanassche@acm.org
      Cc: <stable@vger.kernel.org>
      Fixes: 0b5cb330 ("RDMA/srp: Increase max_segment_size")
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
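The fallback rule described in this commit message can be sketched in user-space C. The `toy_*` types and names below are hypothetical stand-ins, not the kernel structures, and treating a zero size as "unset" is an assumption modeled on the commit's pointer to dma_get_max_seg_size().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SZ_64K 0x10000u  /* 65536 bytes, the default named in the commit text */

/* Hypothetical stand-ins for the device and its DMA parameters. */
struct toy_dma_params { unsigned int max_segment_size; };
struct toy_device { struct toy_dma_params *params; };

/* Maximum DMA segment size: the device's own limit if it has one,
 * otherwise the 64 KB default. */
static unsigned int toy_max_seg_size(const struct toy_device *dev)
{
	assert(dev != NULL);
	/* Assumption: a zero size is treated the same as missing params. */
	if (dev->params && dev->params->max_segment_size)
		return dev->params->max_segment_size;
	return TOY_SZ_64K;
}
```

With no parameters set, the model returns 65536, which matches the `max=65536` in the warning; a segment of `len=126976` exceeds that limit, which is what the DMA debug check complained about.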
  11. 23 Oct, 2019 4 commits
  12. 22 Aug, 2019 2 commits
  13. 21 Aug, 2019 1 commit
  14. 12 Aug, 2019 1 commit
  15. 06 Aug, 2019 1 commit
  16. 05 Aug, 2019 1 commit
  17. 01 Aug, 2019 2 commits
    • RDMA/devices: Remove the lock around remove_client_context · 9cd58817
      Committed by Jason Gunthorpe
      Due to the complexity of client->remove() callbacks it is desirable to not
      hold any locks while calling them. Remove the last one by tracking only
      the highest client ID and running backwards from there over the xarray.
      
      Since the only purpose of that lock was to protect the linked list, we can
      drop the lock.

      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081841.32345-3-leon@kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
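The idea of tracking only the highest client ID and walking backwards can be sketched with a plain array standing in for the xarray. All `toy_*` names are hypothetical, and recording the ID stands in for calling client->remove(); the real code is not shown in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CLIENTS 8

struct toy_client { int id; };

/* Sparse table indexed by client ID; a stand-in for the xarray. */
struct toy_registry {
	struct toy_client *slots[TOY_MAX_CLIENTS];
	int highest_id; /* -1 when no client is registered */
};

static void toy_registry_init(struct toy_registry *r)
{
	for (int i = 0; i < TOY_MAX_CLIENTS; i++)
		r->slots[i] = NULL;
	r->highest_id = -1;
}

static void toy_register(struct toy_registry *r, struct toy_client *c)
{
	assert(c->id >= 0 && c->id < TOY_MAX_CLIENTS);
	r->slots[c->id] = c;
	if (c->id > r->highest_id)
		r->highest_id = c->id;
}

/* Walk from the highest ID down to 0, skipping empty slots.
 * Writes the removed IDs to out in removal order and returns the count. */
static int toy_remove_all(struct toy_registry *r, int *out)
{
	int n = 0;

	for (int id = r->highest_id; id >= 0; id--) {
		struct toy_client *c = r->slots[id];

		if (!c)
			continue;
		r->slots[id] = NULL;
		out[n++] = c->id; /* stands in for client->remove() */
	}
	r->highest_id = -1;
	return n;
}
```

Registering clients 0, 5 and 2 in any order and then calling `toy_remove_all()` removes them as 5, 2, 0, without needing a linked list or a lock to protect one.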
    • RDMA/devices: Do not deadlock during client removal · 621e55ff
      Committed by Jason Gunthorpe
      lockdep reports:
      
         WARNING: possible circular locking dependency detected
      
         modprobe/302 is trying to acquire lock:
         0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990
      
         but task is already holding lock:
         000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]
      
         which lock already depends on the new lock.
      
         the existing dependency chain (in reverse order) is:
      
         -> #2 (&device->client_data_rwsem){++++}:
                down_read+0x3f/0x160
                ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
                cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
                cm_process_work+0x29/0x110 [ib_cm]
                cm_req_handler+0x10f5/0x1c00 [ib_cm]
                cm_work_handler+0x54c/0x311d [ib_cm]
                process_one_work+0x4aa/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
                process_one_work+0x45f/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #0 ((wq_completion)ib_cm){+.+.}:
                lock_acquire+0xc8/0x1d0
                flush_workqueue+0x102/0x990
                cm_remove_one+0x30e/0x3c0 [ib_cm]
                remove_client_context+0x94/0xd0 [ib_core]
                disable_device+0x10a/0x1f0 [ib_core]
                __ib_unregister_device+0x5a/0xe0 [ib_core]
                ib_unregister_device+0x21/0x30 [ib_core]
                mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
                __mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
                mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
                mlx5_remove_device+0x144/0x150 [mlx5_core]
                mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
                mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
                __x64_sys_delete_module+0x227/0x350
                do_syscall_64+0xc3/0x6a4
                entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Which is due to the read side of the client_data_rwsem being obtained
      recursively through a work queue flush during cm client removal.
      
      The lock is being held across the remove in remove_client_context() so
      that the function is a fence, once it returns the client is removed. This
      is required so that the two callers do not proceed with destruction until
      the client completes removal.
      
      Instead of using client_data_rwsem use the existing device unregistration
      refcount and add a similar client unregistration (client->uses) refcount.
      
      This will fence the two unregistration paths without holding any locks.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 921eab11 ("RDMA/devices: Re-organize device.c locking")
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
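The refcount-based fence that replaces the rwsem can be illustrated with a small single-threaded model. This is a sketch under assumptions: the `toy_*` names are hypothetical, a boolean stands in for the completion that an unregistering thread would wait on, and real code would need atomic operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a client with a "uses" reference count. */
struct toy_fence_client {
	int uses;       /* 1 for the registration itself, +1 per active user */
	bool uses_zero; /* stands in for a completion signalled at zero */
};

static void toy_client_init(struct toy_fence_client *c)
{
	c->uses = 1;
	c->uses_zero = false;
}

/* Take a reference unless unregistration already drove the count to zero. */
static bool toy_client_get(struct toy_fence_client *c)
{
	if (c->uses == 0)
		return false;
	c->uses++;
	return true;
}

static void toy_client_put(struct toy_fence_client *c)
{
	assert(c->uses > 0);
	if (--c->uses == 0)
		c->uses_zero = true; /* wake whoever waits in unregister */
}

/* Drop the registration reference. Returns true if the fence is already
 * satisfied; otherwise the caller would wait for uses_zero, holding no lock. */
static bool toy_client_unregister(struct toy_fence_client *c)
{
	toy_client_put(c);
	return c->uses_zero;
}
```

In this model an unregistration that starts while a user still holds a reference does not complete until that user calls `toy_client_put()`, and later `toy_client_get()` attempts fail. The fence holds without any lock being held across the remove callback.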
  18. 09 Jul, 2019 2 commits
    • RDMA/core: Provide RDMA DIM support for ULPs · da662979
      Committed by Yamin Friedman
      Added the interface in the infiniband driver that applies the rdma_dim
      adaptive moderation. There is now a special function for allocating an
      ib_cq that uses rdma_dim.
      
      Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
      NVMf between two equal end-hosts with 56 cores across a Mellanox switch
      using null_blk device:
      
      IO READS without DIM:
      blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
      512B     | 3.8GiB/s | 7.7M | 1401  usec               | 2442  usec
      4k       | 7.0GiB/s | 1.8M | 4817  usec               | 6587  usec
      64k      | 10.7GiB/s| 175k | 9896  usec               | 10028 usec
      
      IO WRITES without DIM:
      blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
      512B     | 3.6GiB/s | 7.5M | 1434  usec               | 2474  usec
      4k       | 6.3GiB/s | 1.6M | 938   usec               | 1221  usec
      64k      | 10.7GiB/s| 175k | 8979  usec               | 12780 usec
      
      IO READS with DIM:
      blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
      512B     | 4GiB/s   | 8.2M | 816    usec              | 889   usec
      4k       | 10.1GiB/s| 2.65M| 3359   usec              | 5080  usec
      64k      | 10.7GiB/s| 175k | 9896   usec              | 10028 usec
      
      IO WRITES with DIM:
      blk size | BW       | IOPS  | 99th percentile latency | 99.99th latency
      512B     | 3.9GiB/s | 8.1M  | 799   usec              | 922   usec
      4k       | 9.6GiB/s | 2.5M  | 717   usec              | 1004  usec
      64k      | 10.7GiB/s| 176k  | 8586  usec              | 12256 usec
      
      The rdma_dim algorithm was designed to measure the effectiveness of
      moderation on the flow in a general way and thus should be appropriate
      for all RDMA storage protocols.
      
      rdma_dim is configured to be the default option based on performance
      improvement seen after extensive tests.

      Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
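The size of the improvement in the tables above is easier to judge as a relative change. The helper below is a hypothetical convenience for that arithmetic, not part of the patch; the figures fed to it are taken from the tables.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Positive means an increase, negative a decrease. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

For 4k reads, `pct_change(1.8, 2.65)` gives an IOPS gain of roughly 47%, and for 512B reads, `pct_change(1401, 816)` gives a 99th percentile latency reduction of roughly 42%. The 64k rows are unchanged for reads, which suggests those runs were already bandwidth-bound.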
    • IB/mlx5: Report correctly tag matching rendezvous capability · 89705e92
      Committed by Danit Goldberg
      Userspace expects the IB_TM_CAP_RC bit to indicate that the device
      supports RC transport tag matching with rendezvous offload. However the
      firmware splits this into two capabilities for eager and rendezvous tag
      matching.
      
      Only if the FW supports both modes should userspace be told the tag
      matching capability is available.
      
      Cc: <stable@vger.kernel.org> # 4.13
      Fixes: eb761894 ("IB/mlx5: Fill XRQ capabilities")
      Signed-off-by: Danit Goldberg <danitg@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
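The reporting rule in this commit message amounts to a logical AND of the two firmware capabilities. The sketch below uses hypothetical names and a hypothetical flag value; it does not reproduce the mlx5 capability macros or the real value of IB_TM_CAP_RC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TM_CAP_RC 0x1u /* hypothetical flag value, for illustration only */

/* Hypothetical view of the two firmware capabilities. */
struct toy_fw_caps {
	bool eager;      /* FW supports eager tag matching */
	bool rendezvous; /* FW supports rendezvous offload for RC */
};

/* Flags reported to userspace: the RC tag matching bit is set only
 * when the firmware supports both modes. */
static unsigned int toy_tm_flags(const struct toy_fw_caps *fw)
{
	assert(fw != NULL);
	return (fw->eager && fw->rendezvous) ? TOY_TM_CAP_RC : 0u;
}
```

A device whose firmware supports only eager matching therefore reports no tag matching capability in this model, which is the behaviour the commit message asks for.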
  19. 05 Jul, 2019 4 commits
  20. 24 Jun, 2019 2 commits