1. 01 Aug, 2019 1 commit
    • RDMA/devices: Do not deadlock during client removal · 621e55ff
      Jason Gunthorpe committed
      lockdep reports:
      
         WARNING: possible circular locking dependency detected
      
         modprobe/302 is trying to acquire lock:
         0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990
      
         but task is already holding lock:
         000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]
      
         which lock already depends on the new lock.
      
         the existing dependency chain (in reverse order) is:
      
         -> #2 (&device->client_data_rwsem){++++}:
                down_read+0x3f/0x160
                ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
                cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
                cm_process_work+0x29/0x110 [ib_cm]
                cm_req_handler+0x10f5/0x1c00 [ib_cm]
                cm_work_handler+0x54c/0x311d [ib_cm]
                process_one_work+0x4aa/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
                process_one_work+0x45f/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #0 ((wq_completion)ib_cm){+.+.}:
                lock_acquire+0xc8/0x1d0
                flush_workqueue+0x102/0x990
                cm_remove_one+0x30e/0x3c0 [ib_cm]
                remove_client_context+0x94/0xd0 [ib_core]
                disable_device+0x10a/0x1f0 [ib_core]
                __ib_unregister_device+0x5a/0xe0 [ib_core]
                ib_unregister_device+0x21/0x30 [ib_core]
                mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
                __mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
                mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
                mlx5_remove_device+0x144/0x150 [mlx5_core]
                mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
                mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
                __x64_sys_delete_module+0x227/0x350
                do_syscall_64+0xc3/0x6a4
                entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is due to the read side of the client_data_rwsem being obtained
      recursively through a work queue flush during cm client removal.
      
      The lock is held across the remove in remove_client_context() so that
      the function acts as a fence: once it returns, the client is removed.
      This is required so that the two callers do not proceed with destruction
      until the client completes removal.
      
      Instead of using client_data_rwsem, use the existing device
      unregistration refcount and add a similar client unregistration
      (client->uses) refcount.
      
      This will fence the two unregistration paths without holding any locks.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 921eab11 ("RDMA/devices: Re-organize device.c locking")
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
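The refcount fence described in the commit above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `struct client`, `client_get()`, `client_put()` and `client_unregister()` are hypothetical names, and the mutex plus condition variable are an assumption of this sketch standing in for whatever primitives the kernel uses. The mutex only protects the counter; no lock is held while a user is working with the client, which is the property that avoids the lockdep cycle.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a client with a "uses" count. */
struct client {
    pthread_mutex_t lock;   /* protects 'uses' only, never held by users */
    pthread_cond_t  idle;   /* signalled when 'uses' reaches zero */
    unsigned int    uses;   /* starts at 1: the registration's own reference */
};

void client_init(struct client *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->idle, NULL);
    c->uses = 1;
}

/* Take a reference; fails once the count has already dropped to zero. */
bool client_get(struct client *c)
{
    bool ok;

    pthread_mutex_lock(&c->lock);
    ok = c->uses != 0;
    if (ok)
        c->uses++;
    pthread_mutex_unlock(&c->lock);
    return ok;
}

/* Drop a reference and wake the unregistering thread on the last put. */
void client_put(struct client *c)
{
    pthread_mutex_lock(&c->lock);
    assert(c->uses > 0);
    if (--c->uses == 0)
        pthread_cond_broadcast(&c->idle);
    pthread_mutex_unlock(&c->lock);
}

/* Fence: drop the registration reference, then wait for all users to finish. */
void client_unregister(struct client *c)
{
    pthread_mutex_lock(&c->lock);
    assert(c->uses > 0);
    c->uses--;
    while (c->uses != 0)
        pthread_cond_wait(&c->idle, &c->lock);
    pthread_mutex_unlock(&c->lock);
}
```

When `client_unregister()` returns, every earlier `client_get()` has been matched by a `client_put()`, so the caller can proceed with destruction without having held a lock across the removal work.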
  2. 23 Jul, 2019 1 commit
    • IB/hfi1: Unreserve a flushed OPFN request · 2b74c878
      Kaike Wan committed
      When an OPFN request is flushed, the request is completed without
      unreserving itself from the send queue. Subsequently, when a new
      request is posted to the send queue, the following warning is triggered:
      
      WARNING: CPU: 4 PID: 8130 at rdmavt/qp.c:1761 rvt_post_send+0x72a/0x880 [rdmavt]
      Call Trace:
      [<ffffffffbbb61e41>] dump_stack+0x19/0x1b
      [<ffffffffbb497688>] __warn+0xd8/0x100
      [<ffffffffbb4977cd>] warn_slowpath_null+0x1d/0x20
      [<ffffffffc01c941a>] rvt_post_send+0x72a/0x880 [rdmavt]
      [<ffffffffbb4dcabe>] ? account_entity_dequeue+0xae/0xd0
      [<ffffffffbb61d645>] ? __kmalloc+0x55/0x230
      [<ffffffffc04e1a4c>] ib_uverbs_post_send+0x37c/0x5d0 [ib_uverbs]
      [<ffffffffc04e5e36>] ? rdma_lookup_put_uobject+0x26/0x60 [ib_uverbs]
      [<ffffffffc04dbce6>] ib_uverbs_write+0x286/0x460 [ib_uverbs]
      [<ffffffffbb6f9457>] ? security_file_permission+0x27/0xa0
      [<ffffffffbb641650>] vfs_write+0xc0/0x1f0
      [<ffffffffbb64246f>] SyS_write+0x7f/0xf0
      [<ffffffffbbb74ddb>] system_call_fastpath+0x22/0x27
      
      This patch fixes the problem by moving rvt_qp_wqe_unreserve() into
      rvt_qp_complete_swqe() to simplify the code and make it less
      error-prone.
      
      Fixes: ca95f802 ("IB/hfi1: Unreserve a reserved request when it is completed")
      Link: https://lore.kernel.org/r/20190715164528.74174.31364.stgit@awfm-01.aw.intel.com
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
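The idea behind the fix above, releasing the reservation inside the one completion helper so that no completion path can forget it, can be illustrated with a small sketch. All names here (`struct sendq`, `struct wqe`, `WQE_RESERVED`, `sq_reserve()`, `sq_complete()`, `sq_flush()`) are hypothetical and are not the rdmavt data structures or API.

```c
#include <assert.h>
#include <stdbool.h>

#define WQE_RESERVED 0x1u   /* hypothetical flag: request holds a reserved slot */

struct wqe {
    unsigned int flags;
    bool done;
};

struct sendq {
    int reserved_used;      /* reserved slots currently taken */
    int reserved_max;       /* reserved slots available in total */
};

/* Claim a reserved slot for a request; fails when none are left. */
bool sq_reserve(struct sendq *q, struct wqe *w)
{
    if (q->reserved_used >= q->reserved_max)
        return false;
    q->reserved_used++;
    w->flags |= WQE_RESERVED;
    w->done = false;
    return true;
}

/*
 * Single completion helper. Because the unreserve lives here, every
 * caller that completes a request also releases its reserved slot.
 */
void sq_complete(struct sendq *q, struct wqe *w)
{
    if (w->flags & WQE_RESERVED) {
        assert(q->reserved_used > 0);
        q->reserved_used--;
        w->flags &= ~WQE_RESERVED;
    }
    w->done = true;
}

/* Flush path: completes the request through the same helper. */
void sq_flush(struct sendq *q, struct wqe *w)
{
    sq_complete(q, w);
}
```

With the unreserve placed in each caller instead, a path such as the flush could complete the request while leaving the slot counted as in use, which is the kind of inconsistency the warning in the commit message reports.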
  3. 09 Jul, 2019 3 commits
  4. 05 Jul, 2019 8 commits
  5. 29 Jun, 2019 6 commits
  6. 26 Jun, 2019 1 commit
    • RDMA/netlink: Audit policy settings for netlink attributes · 34d65cd8
      Doug Ledford committed
      For every string attribute that we currently use only as output and do
      not accept as input, set the string length to
      RDMA_NLDEV_ATTR_EMPTY_STRING, which is defined as 1.  That way we will
      only accept a null string for that element.  This will prevent someone
      from writing a new input routine that uses the element without also
      updating the policy to have a valid value.
      
      While there, make sure the existing entries that are valid have the
      correct policy, and correct any that do not.  Remove unnecessary checks
      for nla_strlcpy() overflow once the policy has been set correctly.
      
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
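The policy idea in the commit above can be sketched with a simplified, userspace validation table. This is not the kernel's `nla_policy` machinery: the attribute names, `struct str_policy`, `attr_string_ok()` and the rule that the limit counts the terminating NUL are all assumptions made for illustration. Under that assumed rule, a limit of 1 leaves room for the terminator only, so an output-only attribute accepts nothing but the empty string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ATTR_EMPTY_STRING 1     /* hypothetical: room for the NUL byte only */

/* Hypothetical attribute identifiers for this sketch. */
enum attr_id {
    ATTR_DEV_NAME,              /* accepted as input */
    ATTR_DRIVER_DETAIL,         /* output only */
    ATTR_MAX
};

struct str_policy {
    size_t max_len;             /* maximum bytes, including the NUL */
};

static const struct str_policy policy[ATTR_MAX] = {
    [ATTR_DEV_NAME]      = { .max_len = 64 },
    [ATTR_DRIVER_DETAIL] = { .max_len = ATTR_EMPTY_STRING },
};

/* Accept a string only if it is NUL-terminated and within the limit. */
bool attr_string_ok(enum attr_id id, const char *data, size_t len)
{
    if ((unsigned int)id >= ATTR_MAX)
        return false;
    if (len == 0 || len > policy[id].max_len)
        return false;
    return data[len - 1] == '\0';
}
```

Because the length check happens once in the table-driven validator, an input routine that copies the string afterwards has no need for its own overflow check, which mirrors the cleanup the commit describes.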
  7. 24 Jun, 2019 12 commits
  8. 21 Jun, 2019 3 commits
  9. 19 Jun, 2019 2 commits
  10. 18 Jun, 2019 1 commit
  11. 12 Jun, 2019 2 commits