1. 01 Aug, 2019 2 commits
    • RDMA/devices: Remove the lock around remove_client_context · 9cd58817
      Committed by Jason Gunthorpe
      Due to the complexity of client->remove() callbacks it is desirable to not
      hold any locks while calling them. Remove the last one by tracking only
      the highest client ID and running backwards from there over the xarray.
      
      Since the only purpose of that lock was to protect the linked list, we can
      drop the lock.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081841.32345-3-leon@kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
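      The backward walk described in this commit can be sketched in
      userspace C. This is only an illustration under stated assumptions: a
      fixed-size array stands in for the kernel xarray, and MAX_CLIENTS,
      register_client(), remove_all_clients() and the removal_order log are
      hypothetical names, not the ib_core code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIENTS 8	/* hypothetical bound; the kernel uses an xarray */

struct client {
	unsigned int id;
	void (*remove)(struct client *client);
};

/* Stand-in for the xarray of clients, indexed by client ID. */
static struct client *clients[MAX_CLIENTS];
/* Only the highest ID is tracked (stored as one past it); no linked list. */
static unsigned int highest_client_id;

/* Log of removal order, used only to observe the sketch's behaviour. */
static unsigned int removal_order[MAX_CLIENTS];
static unsigned int removal_count;

static void record_remove(struct client *client)
{
	removal_order[removal_count++] = client->id;
}

static void register_client(struct client *client, unsigned int id)
{
	assert(id < MAX_CLIENTS && clients[id] == NULL);
	client->id = id;
	client->remove = record_remove;
	clients[id] = client;
	if (id + 1 > highest_client_id)
		highest_client_id = id + 1;
}

/* Run backwards from the highest ID; no lock is held across ->remove(). */
static void remove_all_clients(void)
{
	unsigned int id = highest_client_id;

	while (id > 0) {
		struct client *client = clients[--id];

		if (client == NULL)
			continue;	/* client IDs may be sparse */
		clients[id] = NULL;
		client->remove(client);
	}
	highest_client_id = 0;
}
```

      Registering clients with IDs 0, 2 and 5 and then calling
      remove_all_clients() invokes the callbacks in the order 5, 2, 0, the
      reverse of registration order.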
    • RDMA/devices: Do not deadlock during client removal · 621e55ff
      Committed by Jason Gunthorpe
      lockdep reports:
      
         WARNING: possible circular locking dependency detected
      
         modprobe/302 is trying to acquire lock:
         0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990
      
         but task is already holding lock:
         000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]
      
         which lock already depends on the new lock.
      
         the existing dependency chain (in reverse order) is:
      
         -> #2 (&device->client_data_rwsem){++++}:
                down_read+0x3f/0x160
                ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
                cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
                cm_process_work+0x29/0x110 [ib_cm]
                cm_req_handler+0x10f5/0x1c00 [ib_cm]
                cm_work_handler+0x54c/0x311d [ib_cm]
                process_one_work+0x4aa/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
                process_one_work+0x45f/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #0 ((wq_completion)ib_cm){+.+.}:
                lock_acquire+0xc8/0x1d0
                flush_workqueue+0x102/0x990
                cm_remove_one+0x30e/0x3c0 [ib_cm]
                remove_client_context+0x94/0xd0 [ib_core]
                disable_device+0x10a/0x1f0 [ib_core]
                __ib_unregister_device+0x5a/0xe0 [ib_core]
                ib_unregister_device+0x21/0x30 [ib_core]
                mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
                __mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
                mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
                mlx5_remove_device+0x144/0x150 [mlx5_core]
                mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
                mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
                __x64_sys_delete_module+0x227/0x350
                do_syscall_64+0xc3/0x6a4
                entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Which is due to the read side of the client_data_rwsem being obtained
      recursively through a work queue flush during cm client removal.
      
      The lock is held across the remove in remove_client_context() so that
      the function acts as a fence: once it returns, the client is removed.
      This is required so that the two callers do not proceed with destruction
      until the client completes removal.
      
      Instead of using client_data_rwsem use the existing device unregistration
      refcount and add a similar client unregistration (client->uses) refcount.
      
      This will fence the two unregistration paths without holding any locks.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 921eab11 ("RDMA/devices: Re-organize device.c locking")
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
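      A refcount used as a removal fence, as this commit describes for
      client->uses, might look roughly like the following userspace sketch.
      It is written under assumptions: C11 atomics replace the kernel
      refcount primitives, a flag with a busy-wait replaces a sleeping wait,
      and every name other than `uses` (including uses_zero) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct client {
	atomic_uint uses;	/* starts at 1 for the registration itself */
	atomic_bool uses_zero;	/* set when the last reference is dropped */
};

static void client_init(struct client *c)
{
	atomic_init(&c->uses, 1);
	atomic_init(&c->uses_zero, false);
}

/* Take a reference only if the client is not already being torn down. */
static bool client_try_get(struct client *c)
{
	unsigned int old = atomic_load(&c->uses);

	while (old != 0) {
		/* On failure, old is reloaded and the zero check is repeated. */
		if (atomic_compare_exchange_weak(&c->uses, &old, old + 1))
			return true;
	}
	return false;
}

static void client_put(struct client *c)
{
	unsigned int old = atomic_fetch_sub(&c->uses, 1);

	assert(old > 0);	/* catch an unbalanced put */
	if (old == 1)
		atomic_store(&c->uses_zero, true);
}

/*
 * Fence: drop the registration reference, then wait for every in-flight
 * user to finish. No lock is held while waiting. Kernel code would sleep
 * here; this sketch simply spins.
 */
static void client_unregister_fence(struct client *c)
{
	client_put(c);
	while (!atomic_load(&c->uses_zero))
		;
}
```

      Once client_unregister_fence() returns, client_try_get() fails, so no
      new user can start working with the client while the caller proceeds
      with destruction.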
  2. 23 Jul, 2019 1 commit
    • IB/hfi1: Unreserve a flushed OPFN request · 2b74c878
      Committed by Kaike Wan
      When an OPFN request is flushed, the request is completed without
      unreserving itself from the send queue. Subsequently, when a new
      request is posted, the following warning is triggered:
      
      WARNING: CPU: 4 PID: 8130 at rdmavt/qp.c:1761 rvt_post_send+0x72a/0x880 [rdmavt]
      Call Trace:
      [<ffffffffbbb61e41>] dump_stack+0x19/0x1b
      [<ffffffffbb497688>] __warn+0xd8/0x100
      [<ffffffffbb4977cd>] warn_slowpath_null+0x1d/0x20
      [<ffffffffc01c941a>] rvt_post_send+0x72a/0x880 [rdmavt]
      [<ffffffffbb4dcabe>] ? account_entity_dequeue+0xae/0xd0
      [<ffffffffbb61d645>] ? __kmalloc+0x55/0x230
      [<ffffffffc04e1a4c>] ib_uverbs_post_send+0x37c/0x5d0 [ib_uverbs]
      [<ffffffffc04e5e36>] ? rdma_lookup_put_uobject+0x26/0x60 [ib_uverbs]
      [<ffffffffc04dbce6>] ib_uverbs_write+0x286/0x460 [ib_uverbs]
      [<ffffffffbb6f9457>] ? security_file_permission+0x27/0xa0
      [<ffffffffbb641650>] vfs_write+0xc0/0x1f0
      [<ffffffffbb64246f>] SyS_write+0x7f/0xf0
      [<ffffffffbbb74ddb>] system_call_fastpath+0x22/0x27
      
      This patch fixes the problem by moving rvt_qp_wqe_unreserve() into
      rvt_qp_complete_swqe() to simplify the code and make it less
      error-prone.
      
      Fixes: ca95f802 ("IB/hfi1: Unreserve a reserved request when it is completed")
      Link: https://lore.kernel.org/r/20190715164528.74174.31364.stgit@awfm-01.aw.intel.com
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
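      The design choice in this fix, releasing the reservation inside the
      single completion helper so that no completion path can forget it, can
      be illustrated with a small sketch. All types and function names below
      are hypothetical stand-ins and do not reproduce the rdmavt or hfi1
      code.

```c
#include <assert.h>
#include <stdbool.h>

struct send_wqe {
	bool reserved;			/* request holds a reserved slot */
};

struct send_queue {
	unsigned int reserved_used;	/* reserved slots currently in use */
	unsigned int completed;		/* completions delivered */
};

static void sq_reserve(struct send_queue *sq, struct send_wqe *wqe)
{
	wqe->reserved = true;
	sq->reserved_used++;
}

static void sq_unreserve(struct send_queue *sq, struct send_wqe *wqe)
{
	if (!wqe->reserved)
		return;
	assert(sq->reserved_used > 0);
	wqe->reserved = false;
	sq->reserved_used--;
}

/* Single completion helper: every path releases the reservation here. */
static void sq_complete(struct send_queue *sq, struct send_wqe *wqe)
{
	sq_unreserve(sq, wqe);
	sq->completed++;
}

/* Normal completion and flush both go through the same helper. */
static void sq_complete_normal(struct send_queue *sq, struct send_wqe *wqe)
{
	sq_complete(sq, wqe);
}

static void sq_flush(struct send_queue *sq, struct send_wqe *wqe)
{
	sq_complete(sq, wqe);
}
```

      With this shape, a flushed request leaves reserved_used at zero, so a
      later post does not find a stale reservation.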
  3. 09 Jul, 2019 3 commits
  4. 05 Jul, 2019 8 commits
  5. 29 Jun, 2019 6 commits
  6. 26 Jun, 2019 1 commit
    • RDMA/netlink: Audit policy settings for netlink attributes · 34d65cd8
      Committed by Doug Ledford
      For every string attribute that we currently use only as output and do
      not accept as input, set the string length to
      RDMA_NLDEV_ATTR_EMPTY_STRING, which is defined as 1.  That way we will
      only accept a null string for that element.  This will prevent someone
      from writing a new input routine that uses the element without also
      updating the policy to have a valid value.
      
      Also while there, make sure the existing entries that are valid have the
      correct policy, if not, correct the policy.  Remove unnecessary checks
      for nla_strlcpy() overflow once the policy has been set correctly.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  7. 24 Jun, 2019 12 commits
  8. 21 Jun, 2019 3 commits
  9. 19 Jun, 2019 2 commits
  10. 18 Jun, 2019 1 commit
  11. 12 Jun, 2019 1 commit