1. 01 Aug 2019: 2 commits
    • RDMA/devices: Remove the lock around remove_client_context · 9cd58817
      Committed by Jason Gunthorpe
      Due to the complexity of client->remove() callbacks it is desirable to not
      hold any locks while calling them. Remove the last one by tracking only
      the highest client ID and running backwards from there over the xarray.
      
      Since the only purpose of that lock was to protect the linked list, we can
      drop the lock.
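      For reference, a minimal sketch of the backwards walk this describes
      (variable and helper names follow the upstream patch as best recalled,
      so treat them as assumptions):

         /*
          * Sketch of disable_device() after this change: no lock is held
          * while client->remove() runs. Snapshot the highest client ID,
          * then walk the client xarray downwards from it (LIFO order).
          */
         static void disable_device(struct ib_device *device)
         {
                 u32 cid;

                 down_read(&clients_rwsem);
                 cid = highest_client_id;
                 up_read(&clients_rwsem);

                 while (cid) {
                         cid--;
                         remove_client_context(device, cid);
                 }
         }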
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081841.32345-3-leon@kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      9cd58817
    • RDMA/devices: Do not deadlock during client removal · 621e55ff
      Committed by Jason Gunthorpe
      lockdep reports:
      
         WARNING: possible circular locking dependency detected
      
         modprobe/302 is trying to acquire lock:
         0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990
      
         but task is already holding lock:
         000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]
      
         which lock already depends on the new lock.
      
         the existing dependency chain (in reverse order) is:
      
         -> #2 (&device->client_data_rwsem){++++}:
                down_read+0x3f/0x160
                ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
                cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
                cm_process_work+0x29/0x110 [ib_cm]
                cm_req_handler+0x10f5/0x1c00 [ib_cm]
                cm_work_handler+0x54c/0x311d [ib_cm]
                process_one_work+0x4aa/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
                process_one_work+0x45f/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #0 ((wq_completion)ib_cm){+.+.}:
                lock_acquire+0xc8/0x1d0
                flush_workqueue+0x102/0x990
                cm_remove_one+0x30e/0x3c0 [ib_cm]
                remove_client_context+0x94/0xd0 [ib_core]
                disable_device+0x10a/0x1f0 [ib_core]
                __ib_unregister_device+0x5a/0xe0 [ib_core]
                ib_unregister_device+0x21/0x30 [ib_core]
                mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
                __mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
                mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
                mlx5_remove_device+0x144/0x150 [mlx5_core]
                mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
                mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
                __x64_sys_delete_module+0x227/0x350
                do_syscall_64+0xc3/0x6a4
                entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Which is due to the read side of the client_data_rwsem being obtained
      recursively through a work queue flush during cm client removal.
      
      The lock is held across the remove in remove_client_context() so that
      the function acts as a fence: once it returns, the client is removed.
      This is required so that the two callers do not proceed with
      destruction until the client completes removal.
      
      Instead of using client_data_rwsem, use the existing device
      unregistration refcount and add a similar client unregistration
      (client->uses) refcount.

      This fences the two unregistration paths without holding any locks.
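      A minimal sketch of the refcount-plus-completion fence described above
      (member and helper names mirror the upstream patch from memory and
      should be treated as assumptions):

         /* Each ib_client gains a usage count and a completion to wait on. */
         struct ib_client {
                 /* ... existing members ... */
                 refcount_t uses;
                 struct completion uses_zero;
         };

         static void ib_client_put(struct ib_client *client)
         {
                 /* The last put wakes the waiting unregistration path. */
                 if (refcount_dec_and_test(&client->uses))
                         complete(&client->uses_zero);
         }

         /* Hypothetical unregistration path: drop the initial reference,
          * then wait for all other users to drain, with no rwsem held
          * across client->remove(). */
         static void fence_client(struct ib_client *client)
         {
                 ib_client_put(client);
                 wait_for_completion(&client->uses_zero);
         }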
      
      Cc: <stable@vger.kernel.org>
      Fixes: 921eab11 ("RDMA/devices: Re-organize device.c locking")
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      621e55ff
  2. 09 Jul 2019: 2 commits
    • RDMA/core: Provide RDMA DIM support for ULPs · da662979
      Committed by Yamin Friedman
      Add an interface to the InfiniBand driver that applies rdma_dim
      adaptive moderation. There is now a special function for allocating
      an ib_cq that uses rdma_dim.
      
      Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
      NVMf between two equal end-hosts with 56 cores across a Mellanox switch
      using null_blk device:
      
      IO READS without DIM:
      blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
      512B     | 3.8GiB/s | 7.7M | 1401  usec               | 2442  usec
      4k       | 7.0GiB/s | 1.8M | 4817  usec               | 6587  usec
      64k      | 10.7GiB/s| 175k | 9896  usec               | 10028 usec
      
      IO WRITES without DIM:
      blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
      512B     | 3.6GiB/s | 7.5M | 1434  usec               | 2474  usec
      4k       | 6.3GiB/s | 1.6M | 938   usec               | 1221  usec
      64k      | 10.7GiB/s| 175k | 8979  usec               | 12780 usec
      
      IO READS with DIM:
      blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
      512B     | 4GiB/s   | 8.2M | 816    usec              | 889   usec
      4k       | 10.1GiB/s| 2.65M| 3359   usec              | 5080  usec
      64k      | 10.7GiB/s| 175k | 9896   usec              | 10028 usec
      
      IO WRITES with DIM:
      blk size | BW       | IOPS  | 99th percentile latency | 99.99th latency
      512B     | 3.9GiB/s | 8.1M  | 799   usec              | 922   usec
      4k       | 9.6GiB/s | 2.5M  | 717   usec              | 1004  usec
      64k      | 10.7GiB/s| 176k  | 8586  usec              | 12256 usec
      
      The rdma_dim algorithm was designed to measure the effectiveness of
      moderation on the flow in a general way and thus should be appropriate
      for all RDMA storage protocols.
      
      rdma_dim is configured as the default option based on the performance
      improvement seen in extensive testing.
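      For illustration, a sketch of how the core applies a chosen dim
      profile to a CQ (based on a reading of the upstream cq.c change;
      the exact names are assumptions):

         #include <linux/dim.h>

         /* Worker queued by rdma_dim(): apply the moderation profile the
          * algorithm selected via the driver's modify_cq hook. */
         static void ib_cq_rdma_dim_work(struct work_struct *w)
         {
                 struct dim *dim = container_of(w, struct dim, work);
                 struct ib_cq *cq = dim->priv;
                 u16 usec = rdma_dim_prof[dim->profile_ix].usec;
                 u16 comps = rdma_dim_prof[dim->profile_ix].comps;

                 dim->state = DIM_START_MEASURE;
                 cq->device->ops.modify_cq(cq, comps, usec);
         }

      Each completion-queue poll then feeds its completion count to
      rdma_dim(dim, completed), which accumulates statistics and queues
      this work whenever a better profile is predicted.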
      Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      da662979
    • IB/mlx5: Report correctly tag matching rendezvous capability · 89705e92
      Committed by Danit Goldberg
      Userspace expects the IB_TM_CAP_RC bit to indicate that the device
      supports RC transport tag matching with rendezvous offload. However the
      firmware splits this into two capabilities for eager and rendezvous tag
      matching.
      
      Only if the FW supports both modes should userspace be told the tag
      matching capability is available.
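      Roughly, the fix gates the reported flag on both firmware capability
      bits; a sketch (the capability-bit and flag names are recalled from
      the mlx5 code and should be treated as assumptions):

         /* Only advertise RC tag matching with rendezvous offload when the
          * FW supports both the eager and the rendezvous modes. */
         if (MLX5_CAP_GEN(mdev, tag_matching) &&
             MLX5_CAP_GEN(mdev, rndv_offload_rc))
                 props->tm_caps.flags |= IB_TM_CAP_RC;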
      
      Cc: <stable@vger.kernel.org> # 4.13
      Fixes: eb761894 ("IB/mlx5: Fill XRQ capabilities")
      Signed-off-by: Danit Goldberg <danitg@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      89705e92
  3. 05 Jul 2019: 4 commits
  4. 24 Jun 2019: 10 commits
  5. 21 Jun 2019: 1 commit
  6. 19 Jun 2019: 2 commits
  7. 18 Jun 2019: 1 commit
  8. 12 Jun 2019: 2 commits
  9. 11 Jun 2019: 3 commits
  10. 22 May 2019: 2 commits
    • RDMA/core: Make ib_destroy_cq() void · 890ac8d9
      Committed by Leon Romanovsky
      Kernel destroy-CQ flows can't fail, and callers are not interested in
      the return value of ib_destroy_cq(), so make it void.
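      In effect the API change is just the following (a sketch; the wrapper
      body and the ib_destroy_cq_user() name are assumptions about the
      surrounding code at the time):

         /* Previously: int ib_destroy_cq(struct ib_cq *cq); */

         static inline void ib_destroy_cq(struct ib_cq *cq)
         {
                 /* Kernel destroy flows cannot fail; warn loudly if a
                  * driver ever reports an error anyway. */
                 WARN_ON_ONCE(ib_destroy_cq_user(cq, NULL));
         }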
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      890ac8d9
    • RDMA/srp: Rename SRP sysfs name after IB device rename trigger · dc1435c0
      Committed by Leon Romanovsky
      The SRP logic used the device name and port index as a symlink to the
      relevant kobject. If the IB device is renamed, the prior name will be
      re-used by the next device plugged in, and sysfs will panic when SRP
      tries to re-use the same name.
      
       mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
       sysfs: cannot create duplicate filename '/class/infiniband_srp/srp-mlx5_0-1'
       CPU: 3 PID: 1107 Comm: modprobe Not tainted 5.1.0-for-upstream-perf-2019-05-12_15-09-52-87 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
       Call Trace:
        dump_stack+0x5a/0x73
        sysfs_warn_dup+0x58/0x70
        sysfs_do_create_link_sd.isra.2+0xa3/0xb0
        device_add+0x33f/0x660
        srp_add_one+0x301/0x4f0 [ib_srp]
        add_client_context+0x99/0xe0 [ib_core]
        enable_device_and_get+0xd1/0x1b0 [ib_core]
        ib_register_device+0x533/0x710 [ib_core]
        ? mutex_lock+0xe/0x30
        __mlx5_ib_add+0x23/0x70 [mlx5_ib]
        mlx5_add_device+0x4e/0xd0 [mlx5_core]
        mlx5_register_interface+0x85/0xc0 [mlx5_core]
        ? 0xffffffffa0791000
        do_one_initcall+0x4b/0x1cb
        ? kmem_cache_alloc_trace+0xc6/0x1d0
        ? do_init_module+0x22/0x21f
        do_init_module+0x5a/0x21f
        load_module+0x17f2/0x1ca0
        ? m_show+0x1c0/0x1c0
        __do_sys_finit_module+0x94/0xe0
        do_syscall_64+0x48/0x120
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f157cce10d9
      
      The following module load/unload sequence was used to trigger the panic:
       sudo modprobe ib_srp
       sudo modprobe -r mlx5_ib
       sudo modprobe -r mlx5_core
       sudo modprobe mlx5_core
      
      Have SRP track the name of the core device so that it can't have a name
      collision.
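      A sketch of the shape of the fix: the SRP client reacts to device
      renames through an ib_client rename hook and renames its own sysfs
      entries to match (callback and field names here are assumptions):

         /* Rename each srp_host kobject to track the new core device name. */
         static void srp_rename_dev(struct ib_device *device, void *client_data)
         {
                 struct srp_device *srp_dev = client_data;
                 struct srp_host *host;

                 list_for_each_entry(host, &srp_dev->dev_list, list) {
                         char name[IB_DEVICE_NAME_MAX + 10];

                         snprintf(name, sizeof(name), "srp-%s-%d",
                                  dev_name(&device->dev), host->port);
                         device_rename(&host->dev, name);
                 }
         }

         static struct ib_client srp_client = {
                 .name   = "srp",
                 .add    = srp_add_one,
                 .remove = srp_remove_one,
                 .rename = srp_rename_dev,       /* assumed hook */
         };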
      
      Fixes: d21943dd ("RDMA/core: Implement IB device rename function")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      dc1435c0
  11. 07 May 2019: 3 commits
  12. 03 May 2019: 2 commits
  13. 02 May 2019: 1 commit
  14. 25 Apr 2019: 1 commit
  15. 09 Apr 2019: 3 commits
  16. 02 Apr 2019: 1 commit