1. 13 Mar 2020, 12 commits
  2. 17 Jan 2020, 1 commit
  3. 16 Jan 2020, 2 commits
    • RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths · 8ffc3248
      Jason Gunthorpe authored
      Until recently it was not possible for userspace to specify a different
      IOVA, but with the new ibv_reg_mr_iova() library call this can now be done.
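
      For context, a hedged userspace sketch of the rdma-core call (pd, buf,
      len and iova are assumed inputs, and the access flag is just an example):

        #include <infiniband/verbs.h>

        /* Register [buf, buf + len) but present it to the HCA at a
         * caller-chosen IOVA rather than at the user VA. */
        static struct ibv_mr *reg_at_iova(struct ibv_pd *pd, void *buf,
                                          size_t len, uint64_t iova)
        {
                return ibv_reg_mr_iova(pd, buf, len, iova,
                                       IBV_ACCESS_LOCAL_WRITE);
        }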
      
      To compute the user_va we must compute:
        user_va = (iova - iova_start) + user_va_start
      
      while being cautious of overflow and other math problems.
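
      A minimal sketch of that translation, assuming hypothetical names and
      using the kernel's check_{sub,add}_overflow() helpers from
      <linux/overflow.h> to reject wrapping math:

        #include <linux/errno.h>
        #include <linux/overflow.h>
        #include <linux/types.h>

        static int iova_to_user_va(u64 iova, u64 iova_start,
                                   u64 user_va_start, u64 *user_va)
        {
                u64 offset;

                /* an iova below iova_start would wrap the subtraction */
                if (check_sub_overflow(iova, iova_start, &offset))
                        return -EFAULT;
                /* and the addition must not wrap either */
                if (check_add_overflow(user_va_start, offset, user_va))
                        return -EFAULT;
                return 0;
        }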
      
      The iova is not reliably stored in the mmkey when the MR is created. Only
      the cached creation path (the common one) sets it, so it must also be set
      when creating uncached MRs.
      
      Fix the weird use of iova when computing the starting page index in the
      MR. In the normal case, when iova == umem.address:
        iova & (~(BIT(page_shift) - 1)) ==
        ALIGN_DOWN(umem.address, odp->page_size) ==
        ib_umem_start(odp)
      
      And when the iova is different, using it in arithmetic with a user_va is wrong.
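
      As an illustration only, the identity above as compilable C (all
      parameters are assumed inputs; it holds only when iova == umem.address):

        #include <linux/bits.h>
        #include <linux/kernel.h>
        #include <linux/types.h>

        static bool start_aligned_same(u64 iova, u64 umem_address,
                                       unsigned int page_shift)
        {
                /* masking the iova == aligning the address down */
                return (iova & ~(BIT_ULL(page_shift) - 1)) ==
                       ALIGN_DOWN(umem_address, 1ULL << page_shift);
        }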
      
      Finally, do not allow an implicit ODP to be created with a non-zero IOVA
      as we have no support for that.
      
      Fixes: 7bdf65d4 ("IB/mlx5: Handle page faults")
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    • IB: Allow calls to ib_umem_get from kernel ULPs · c320e527
      Moni Shoua authored
      So far the assumption was that ib_umem_get() and ib_umem_odp_get()
      are called from flows that start in UVERBS and therefore have a user
      context. This assumption restricts flows that are initiated by ULPs
      and need the service that ib_umem_get() provides.
      
      This patch changes ib_umem_get() and ib_umem_odp_get() to take the IB
      device directly, relying on the fact that both UVERBS and ULPs set that
      field correctly.
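
      A call-site sketch under the post-patch signature described above (the
      access flag is an assumed example):

        #include <rdma/ib_umem.h>

        /* A kernel ULP can now pin user memory with only the ib_device;
         * no uverbs user context is required. */
        static struct ib_umem *ulp_pin(struct ib_device *device,
                                       unsigned long addr, size_t size)
        {
                return ib_umem_get(device, addr, size,
                                   IB_ACCESS_LOCAL_WRITE);
        }
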
      Reviewed-by: Guy Levi <guyle@mellanox.com>
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
  4. 04 Jan 2020, 2 commits
  5. 24 Nov 2019, 1 commit
    • RDMA/odp: Use mmu_interval_notifier_insert() · f25a546e
      Jason Gunthorpe authored
      Replace the internal interval tree based mmu notifier with the new common
      mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
      deadlock that can be triggered in ODP:
      
       zap_page_range()
        mmu_notifier_invalidate_range_start()
         [..]
          ib_umem_notifier_invalidate_range_start()
             down_read(&per_mm->umem_rwsem)
        unmap_single_vma()
          [..]
            __split_huge_page_pmd()
              mmu_notifier_invalidate_range_start()
              [..]
                 ib_umem_notifier_invalidate_range_start()
                    down_read(&per_mm->umem_rwsem)   // DEADLOCK
      
              mmu_notifier_invalidate_range_end()
                 up_read(&per_mm->umem_rwsem)
        mmu_notifier_invalidate_range_end()
           up_read(&per_mm->umem_rwsem)
      
      The umem_rwsem is held across the range_start/end as the ODP algorithm for
      invalidate_range_end cannot tolerate changes to the interval
      tree. However, due to the nested invalidation regions the second
      down_read() can deadlock if there are competing writers. The new core code
      provides an alternative scheme to solve this problem.
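
      A minimal usage sketch of the new API; the ops, callback and helper
      names here are assumed, not the actual ODP conversion:

        #include <linux/mmu_notifier.h>

        static bool sketch_invalidate(struct mmu_interval_notifier *mni,
                                      const struct mmu_notifier_range *range,
                                      unsigned long cur_seq)
        {
                /* unmap DMA covering [range->start, range->end) without
                 * taking any lock the nested path could also take */
                mmu_interval_set_seq(mni, cur_seq);
                return true;
        }

        static const struct mmu_interval_notifier_ops sketch_ops = {
                .invalidate = sketch_invalidate,
        };

        /* one notifier per umem replaces the shared interval tree */
        static int sketch_register(struct mmu_interval_notifier *mni,
                                   struct mm_struct *mm,
                                   unsigned long start, unsigned long length)
        {
                return mmu_interval_notifier_insert(mni, mm, start,
                                                    length, &sketch_ops);
        }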
      
      Fixes: ca748c39 ("RDMA/umem: Get rid of per_mm->notifier_count")
      Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
      Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  6. 17 Nov 2019, 1 commit
  7. 01 Nov 2019, 1 commit
  8. 29 Oct 2019, 6 commits
    • RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy · 09689703
      Jason Gunthorpe authored
      For creation, as soon as the umem_odp is created the notifier can be
      called; however, the underlying MR may not have been set up yet. This
      would cause problems if mlx5_ib_invalidate_range() runs. There is some
      confusing/unlocked/racy code that might be trying to solve this, but
      without locks it isn't going to work right.
      
      Instead trivially solve the problem by short-circuiting the invalidation
      if there are not yet any DMA mapped pages. By definition there is nothing
      to invalidate in this case.
      
      The create code will have the umem fully setup before anything is DMA
      mapped, and npages is fully locked by the umem_mutex.
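
      A sketch of that short-circuit with hypothetical naming; because npages
      only changes under umem_mutex, testing it under the same mutex is
      race-free:

        #include <rdma/ib_umem_odp.h>

        static void sketch_invalidate_range(struct ib_umem_odp *umem_odp)
        {
                mutex_lock(&umem_odp->umem_mutex);
                /* no DMA mapped pages: by definition nothing to do */
                if (!umem_odp->npages)
                        goto out;
                /* ... unmap pages and invalidate HW translations ... */
        out:
                mutex_unlock(&umem_odp->umem_mutex);
        }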
      
      For destroy, invalidate the entire MR at the HW to stop DMA, then DMA
      unmap the pages before destroying the MR. This drives npages to zero and
      prevents similar racing with invalidate while the MR is undergoing
      destruction.
      
      Arguably it would be better if the umem was created after the MR and
      destroyed before, but that would require a big rework of the MR code.
      
      Fixes: 6aec21f6 ("IB/mlx5: Page faults handling infrastructure")
      Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Rework implicit ODP destroy · 5256edcb
      Jason Gunthorpe authored
      Use SRCU in a sensible way by removing all MRs in the implicit tree from
      the two xarrays (the update operation), then a synchronize, followed by a
      normal single threaded teardown.
      
      This departs only a little from the normal pattern in that there can
      still be some work pending in the unbound wq that may also require a
      workqueue flush. This is tracked with a single atomic, consolidating the
      redundant existing atomics and wait queue.
      
      For understandability, the entire ODP implicit create/destroy flow now
      largely lives in a single pair of functions within odp.c, with a few
      support functions for tearing down an unused child.
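
      A destroy-flow sketch of that scheme; the struct and field names below
      are assumed from this description, not verified against the tree:

        #include "mlx5_ib.h"

        static void sketch_destroy_implicit(struct mlx5_ib_dev *dev,
                                            struct mlx5_ib_mr *imr)
        {
                /* 1. the "update": unpublish the MR from the xarrays */
                xa_erase(&dev->odp_mkeys, mlx5_base_mkey(imr->mmkey.key));

                /* 2. after this, no SRCU reader can still see it */
                synchronize_srcu(&dev->odp_srcu);

                /* 3. the single atomic: wait out pending unbound work */
                wait_event(imr->q_deferred_work,
                           !atomic_read(&imr->num_deferred_work));

                /* ... normal single-threaded teardown of children ... */
        }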
      
      Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Use a dedicated mkey xarray for ODP · 806b101b
      Jason Gunthorpe authored
      There is a per-device xarray used to store every mkey in the system.
      However, this xarray is now only read by ODP for certain ODP-designated
      MRs (ODP, implicit ODP, MW, DEVX_INDIRECT).
      
      Create an xarray only for use by ODP, that only contains ODP related
      MKeys. This xarray is protected by SRCU and all erases are protected by a
      synchronize.
      
      This improves performance:
      
       - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp()
         can be deleted. The xarray will also consume fewer nodes.
      
       - Normal MRs are never mixed with ODP MRs in an SRCU data structure, so
         the performance-sucking synchronize_srcu() on every MR destruction is
         not needed.
      
       - No smp_load_acquire(live) and xa_load() double barrier on read
      
      Due to the SRCU locking scheme care must be taken with the placement of
      the xa_store(). Once it completes the MR is immediately visible to other
      threads and only through a xa_erase() & synchronize_srcu() cycle could it
      be destroyed.
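
      A reader-side sketch of the scheme (field names assumed): SRCU, not a
      refcount, keeps the entry alive between xa_load() and use:

        #include "mlx5_ib.h"

        static struct mlx5_core_mkey *sketch_odp_lookup(struct mlx5_ib_dev *dev,
                                                        u32 key, int *srcu_idx)
        {
                *srcu_idx = srcu_read_lock(&dev->odp_srcu);
                /* valid until the matching srcu_read_unlock(); destroy
                 * is xa_erase() + synchronize_srcu() + free */
                return xa_load(&dev->odp_mkeys, mlx5_base_mkey(key));
        }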
      
      Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Split sig_err MR data into its own xarray · 50211ec9
      Jason Gunthorpe authored
      The locking model for signature is completely different from ODP's, so do
      not share the same xarray that relies on SRCU locking to support ODP.
      
      Simply store the active mlx5_core_sig_ctx's in an xarray when signature
      MRs are created and rely on trivial xarray locking to serialize
      everything.
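
      A store-side sketch under that model (the xarray field name is an
      assumption); the xarray's internal spinlock is the only serialization:

        #include "mlx5_ib.h"

        static int sketch_track_sig_mr(struct mlx5_ib_dev *dev,
                                       struct mlx5_ib_mr *mr)
        {
                /* plain xarray locking; no SRCU cycle on destroy */
                return xa_err(xa_store(&dev->sig_mrs,
                                       mlx5_base_mkey(mr->mmkey.key),
                                       &mr->mmkey, GFP_KERNEL));
        }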
      
      The overhead of storing only a handful of SIG-related MRs is going to be
      much less than an xarray full of every mkey.
      
      Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Use irq xarray locking for mkey_table · 1524b12a
      Jason Gunthorpe authored
      The mkey_table xarray is touched by the reg_mr_callback() function, which
      is called from hard-irq context. Thus all other uses of xa_lock must use
      the _irq variants.
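
      A sketch of the rule on a generic table; xa_erase_irq() is the
      interrupt-safe variant such call sites switch to:

        #include <linux/xarray.h>

        static void *sketch_mkey_erase(struct xarray *mkey_table,
                                       unsigned long index)
        {
                /* disables interrupts around xa_lock so a hard-irq
                 * reg_mr_callback() cannot deadlock against us */
                return xa_erase_irq(mkey_table, index);
        }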
      
        WARNING: inconsistent lock state
        5.4.0-rc1 #12 Not tainted
        --------------------------------
        inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
        python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes:
        ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30
        {IN-HARDIRQ-W} state was registered at:
          lock_acquire+0xe1/0x200
          _raw_spin_lock_irqsave+0x35/0x50
          reg_mr_callback+0x2dd/0x450 [mlx5_ib]
          mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core]
          mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core]
         [..]
      
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&(&xa->xa_lock)->rlock#3);
          <Interrupt>
            lock(&(&xa->xa_lock)->rlock#3);
      
         *** DEADLOCK ***
      
        2 locks held by python3/343:
         #0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs]
         #1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs]
      
        stack backtrace:
        CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x86/0xca
         print_usage_bug.cold.50+0x2e5/0x355
         mark_lock+0x871/0xb50
         ? match_held_lock+0x20/0x250
         ? check_usage_forwards+0x240/0x240
         __lock_acquire+0x7de/0x23a0
         ? __kasan_check_read+0x11/0x20
         ? mark_lock+0xae/0xb50
         ? mark_held_locks+0xb0/0xb0
         ? find_held_lock+0xca/0xf0
         lock_acquire+0xe1/0x200
         ? xa_erase+0x12/0x30
         _raw_spin_lock+0x2a/0x40
         ? xa_erase+0x12/0x30
         xa_erase+0x12/0x30
         mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib]
         uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs]
         uverbs_free_mw+0x1a/0x20 [ib_uverbs]
         destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs]
         [..]
      
      Fixes: 04177915 ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases")
      Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.ca
      Acked-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  9. 09 Oct 2019, 1 commit
  10. 05 Oct 2019, 4 commits
  11. 22 Aug 2019, 3 commits
  12. 21 Aug 2019, 2 commits
  13. 01 Aug 2019, 1 commit
  14. 25 Jul 2019, 3 commits