提交 · e0ebd8eb36ed850a22a9a0ca83edc4a40ad67c16 · openeuler / Kernel

17 1月, 2020 1 次提交

RDMA/mlx5: Set relaxed ordering when requested · d6de0bb1

由 Michael Guralnik 提交于 1月 08, 2020

Enable relaxed ordering in the mkey context when requested. As relaxed
ordering is not currently supported in UMR, disable UMR usage for relaxed
ordering MRs.

Link: https://lore.kernel.org/r/1578506740-22188-11-git-send-email-yishaih@mellanox.comSigned-off-by: NMichael Guralnik <michaelgur@mellanox.com>
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d6de0bb1

16 1月, 2020 2 次提交

RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths · 8ffc3248

由 Jason Gunthorpe 提交于 1月 15, 2020

Till recently it was not possible for userspace to specify a different
IOVA, but with the new ibv_reg_mr_iova() library call this can be done.

To compute the user_va we must compute:
  user_va = (iova - iova_start) + user_va_start

while being cautious of overflow and other math problems.

The iova is not reliably stored in the mmkey when the MR is created. Only
the cached creation path (the common one) set it, so it must also be set
when creating uncached MRs.

Fix the weird use of iova when computing the starting page index in the
MR. In the normal case, when iova == umem.address:
  iova & (~(BIT(page_shift) - 1)) ==
  ALIGN_DOWN(umem.address, odp->page_size) ==
  ib_umem_start(odp)

And when iova is different using it in math with a user_va is wrong.

Finally, do not allow an implicit ODP to be created with a non-zero IOVA
as we have no support for that.

Fixes: 7bdf65d4 ("IB/mlx5: Handle page faults")
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

8ffc3248

IB: Allow calls to ib_umem_get from kernel ULPs · c320e527

由 Moni Shoua 提交于 1月 15, 2020

So far the assumption was that ib_umem_get() and ib_umem_odp_get()
are called from flows that start in UVERBS and therefore has a user
context. This assumption restricts flows that are initiated by ULPs
and need the service that ib_umem_get() provides.

This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device
directly by relying on the fact that both UVERBS and ULPs sets that
field correctly.
Reviewed-by: NGuy Levi <guyle@mellanox.com>
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

c320e527

04 1月, 2020 2 次提交

RDMA/mlx5: use true,false for bool variable · 2ab367a7

由 zhengbin 提交于 12月 24, 2019

Fixes coccicheck warning:

drivers/infiniband/hw/mlx5/mr.c:150:2-26: WARNING: Assignment of 0/1 to bool variable
drivers/infiniband/hw/mlx5/mr.c:1455:2-26: WARNING: Assignment of 0/1 to bool variable
drivers/infiniband/hw/mlx5/qp.c:1874:6-20: WARNING: Assignment of 0/1 to bool variable

Link: https://lore.kernel.org/r/1577176812-2238-6-git-send-email-zhengbin13@huawei.comReported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: Nzhengbin <zhengbin13@huawei.com>
Acked-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

2ab367a7

IB/mlx5: Unify ODP MR code paths to allow extra flexibility · cbe4b8f0

由 Artemy Kovalyov 提交于 12月 22, 2019

Building MR translation table in the ODP case requires additional
flexibility, namely random access to DMA addresses. Make both direct and
indirect ODP MR use same code path, separated from the non-ODP MR code
path.

With the restructuring the correct page_shift is now used around
__mlx5_ib_populate_pas().

Fixes: d2183c6f ("RDMA/umem: Move page_shift from ib_umem to ib_odp_umem")
Link: https://lore.kernel.org/r/20191222124649.52300-2-leon@kernel.orgSigned-off-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

cbe4b8f0

24 11月, 2019 1 次提交

RDMA/odp: Use mmu_interval_notifier_insert() · f25a546e

由 Jason Gunthorpe 提交于 11月 12, 2019

Replace the internal interval tree based mmu notifier with the new common
mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
deadlock that can be triggered in ODP:

 zap_page_range()
  mmu_notifier_invalidate_range_start()
   [..]
    ib_umem_notifier_invalidate_range_start()
       down_read(&per_mm->umem_rwsem)
  unmap_single_vma()
    [..]
      __split_huge_page_pmd()
        mmu_notifier_invalidate_range_start()
        [..]
           ib_umem_notifier_invalidate_range_start()
              down_read(&per_mm->umem_rwsem)   // DEADLOCK

        mmu_notifier_invalidate_range_end()
           up_read(&per_mm->umem_rwsem)
  mmu_notifier_invalidate_range_end()
     up_read(&per_mm->umem_rwsem)

The umem_rwsem is held across the range_start/end as the ODP algorithm for
invalidate_range_end cannot tolerate changes to the interval
tree. However, due to the nested invalidation regions the second
down_read() can deadlock if there are competing writers. The new core code
provides an alternative scheme to solve this problem.

Fixes: ca748c39 ("RDMA/umem: Get rid of per_mm->notifier_count")
Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.caTested-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f25a546e

17 11月, 2019 1 次提交

IB/umem: remove the dmasync argument to ib_umem_get · 72b894b0

由 Christoph Hellwig 提交于 11月 13, 2019

The argument is always ignored, so remove it.

Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Acked-by: NMichal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

72b894b0

01 11月, 2019 1 次提交

RDMA/mlx5: Return proper error value · 546d3009

由 Leon Romanovsky 提交于 10月 29, 2019

Returned value from mlx5_mr_cache_alloc() is checked to be error or real
pointer. Return proper error code instead of NULL which is not checked
later.

Fixes: 81713d37 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

546d3009

29 10月, 2019 6 次提交

RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy · 09689703

由 Jason Gunthorpe 提交于 10月 09, 2019

For creation, as soon as the umem_odp is created the notifier can be
called, however the underlying MR may not have been setup yet. This would
cause problems if mlx5_ib_invalidate_range() runs. There is some
confusing/ulocked/racy code that might by trying to solve this, but
without locks it isn't going to work right.

Instead trivially solve the problem by short-circuiting the invalidation
if there are not yet any DMA mapped pages. By definition there is nothing
to invalidate in this case.

The create code will have the umem fully setup before anything is DMA
mapped, and npages is fully locked by the umem_mutex.

For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap
the pages before destroying the MR. This drives npages to zero and
prevents similar racing with invalidate while the MR is undergoing
destruction.

Arguably it would be better if the umem was created after the MR and
destroyed before, but that would require a big rework of the MR code.

Fixes: 6aec21f6 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

09689703

RDMA/mlx5: Rework implicit ODP destroy · 5256edcb

由 Jason Gunthorpe 提交于 10月 09, 2019

Use SRCU in a sensible way by removing all MRs in the implicit tree from
the two xarrays (the update operation), then a synchronize, followed by a
normal single threaded teardown.

This is only a little unusual from the normal pattern as there can still
be some work pending in the unbound wq that may also require a workqueue
flush. This is tracked with a single atomic, consolidating the redundant
existing atomics and wait queue.

For understand-ability the entire ODP implicit create/destroy flow now
largely exists in a single pair of functions within odp.c, with a few
support functions for tearing down an unused child.

Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

5256edcb

RDMA/mlx5: Delete struct mlx5_priv->mkey_table · 74bddb36

由 Jason Gunthorpe 提交于 10月 09, 2019

No users are left, delete it.

Link: https://lore.kernel.org/r/20191009160934.3143-5-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

74bddb36

RDMA/mlx5: Use a dedicated mkey xarray for ODP · 806b101b

由 Jason Gunthorpe 提交于 10月 09, 2019

There is a per device xarray storing mkeys that is used to store every
mkey in the system. However, this xarray is now only read by ODP for
certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT).

Create an xarray only for use by ODP, that only contains ODP related
MKeys. This xarray is protected by SRCU and all erases are protected by a
synchronize.

This improves performance:

 - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp()
   can be deleted. The xarray will also consume fewer nodes.

 - normal MR's are never mixed with ODP MRs in a SRCU data structure so
   performance sucking synchronize_srcu() on every MR destruction is not
   needed.

 - No smp_load_acquire(live) and xa_load() double barrier on read

Due to the SRCU locking scheme care must be taken with the placement of
the xa_store(). Once it completes the MR is immediately visible to other
threads and only through a xa_erase() & synchronize_srcu() cycle could it
be destroyed.

Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

806b101b

RDMA/mlx5: Split sig_err MR data into its own xarray · 50211ec9

由 Jason Gunthorpe 提交于 10月 09, 2019

The locking model for signature is completely different than ODP, do not
share the same xarray that relies on SRCU locking to support ODP.

Simply store the active mlx5_core_sig_ctx's in an xarray when signature
MRs are created and rely on trivial xarray locking to serialize
everything.

The overhead of storing only a handful of SIG related MRs is going to be
much less than an xarray full of every mkey.

Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

50211ec9

RDMA/mlx5: Use irq xarray locking for mkey_table · 1524b12a

由 Jason Gunthorpe 提交于 10月 24, 2019

The mkey_table xarray is touched by the reg_mr_callback() function which
is called from a hard irq. Thus all other uses of xa_lock must use the
_irq variants.

  WARNING: inconsistent lock state
  5.4.0-rc1 #12 Not tainted
  --------------------------------
  inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
  python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes:
  ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30
  {IN-HARDIRQ-W} state was registered at:
    lock_acquire+0xe1/0x200
    _raw_spin_lock_irqsave+0x35/0x50
    reg_mr_callback+0x2dd/0x450 [mlx5_ib]
    mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core]
    mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core]
   [..]

   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&xa->xa_lock)->rlock#3);
    <Interrupt>
      lock(&(&xa->xa_lock)->rlock#3);

   *** DEADLOCK ***

  2 locks held by python3/343:
   #0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs]
   #1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs]

  stack backtrace:
  CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x86/0xca
   print_usage_bug.cold.50+0x2e5/0x355
   mark_lock+0x871/0xb50
   ? match_held_lock+0x20/0x250
   ? check_usage_forwards+0x240/0x240
   __lock_acquire+0x7de/0x23a0
   ? __kasan_check_read+0x11/0x20
   ? mark_lock+0xae/0xb50
   ? mark_held_locks+0xb0/0xb0
   ? find_held_lock+0xca/0xf0
   lock_acquire+0xe1/0x200
   ? xa_erase+0x12/0x30
   _raw_spin_lock+0x2a/0x40
   ? xa_erase+0x12/0x30
   xa_erase+0x12/0x30
   mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib]
   uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs]
   uverbs_free_mw+0x1a/0x20 [ib_uverbs]
   destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs]
   [..]

Fixes: 04177915 ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases")
Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.caAcked-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

1524b12a

09 10月, 2019 1 次提交

IB/mlx5: Introduce and use mkey context setting helper routine · 03232cc4

由 Parav Pandit 提交于 10月 06, 2019

Introduce and use set_mkc_access_pd_addr_fields() which sets mkey
context's access rights, PD, address fields. Thereby avoid the code
duplication.

Link: https://lore.kernel.org/r/20191006155443.31068-1-leon@kernel.orgSigned-off-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

03232cc4

05 10月, 2019 4 次提交

RDMA/mlx5: Add missing synchronize_srcu() for MW cases · 04177915

由 Jason Gunthorpe 提交于 10月 01, 2019

While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.

Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.

This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.

Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()

Fixes: 534fd7aa ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

04177915

RDMA/mlx5: Put live in the correct place for ODP MRs · aa603815

由 Jason Gunthorpe 提交于 10月 01, 2019

live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:

    CPU0                                     CPU1
mlx5_ib_alloc_implicit_mr()
 implicit_mr_alloc()
  live = 1
 imr->umem = umem
                                    num_pending_prefetch_inc()
                                      if (live)
				        atomic_inc(num_pending_prefetch)
 atomic_set(num_pending_prefetch,0) // Overwrites other thread's store

Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.

Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.

Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().

The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.

Fixes: 6aec21f6 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

aa603815

RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu · aa116b81

由 Jason Gunthorpe 提交于 10月 01, 2019

During destroy setting live = 0 and then synchronize_srcu() prevents
num_pending_prefetch from incrementing, and also, ensures that all work
holding that count is queued on the WQ. Testing before causes races of the
form:

    CPU0                                         CPU1
  dereg_mr()
                                          mlx5_ib_advise_mr_prefetch()
            				   srcu_read_lock()
                                            num_pending_prefetch_inc()
					      if (!live)
   live = 0
   atomic_read() == 0
     // skip flush_workqueue()
                                              atomic_inc()
 					      queue_work();
            				   srcu_read_unlock()
   WARN_ON(atomic_read())  // Fails

Swap the order so that the synchronize_srcu() prevents this.

Fixes: a6bc3875 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

aa116b81

RDMA/mlx5: Do not allow rereg of a ODP MR · 880505cf

由 Jason Gunthorpe 提交于 10月 01, 2019

This code is completely broken, the umem of a ODP MR simply cannot be
discarded without a lot more locking, nor can an ODP mkey be blithely
destroyed via destroy_mkey().

Fixes: 6aec21f6 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

880505cf

22 8月, 2019 3 次提交

RDMA/odp: Provide ib_umem_odp_release() to undo the allocs · 0446cad9

由 Jason Gunthorpe 提交于 8月 19, 2019

Now that there are allocator APIs that return the ib_umem_odp directly
it should be freed through a umem_odp free'er as well.

Link: https://lore.kernel.org/r/20190819111710.18440-8-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0446cad9

RDMA/odp: Split creating a umem_odp from ib_umem_get · 261dc53f

由 Jason Gunthorpe 提交于 8月 19, 2019

This is the last creation API that is overloaded for both, there is very
little code sharing and a driver has to be specifically ready for a
umem_odp to be created to use the odp version.

Link: https://lore.kernel.org/r/20190819111710.18440-7-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

261dc53f

RDMA/odp: Make it clearer when a umem is an implicit ODP umem · fd7dbf03

由 Jason Gunthorpe 提交于 8月 19, 2019

Implicit ODP umems are special, they don't have any page lists, they don't
exist in the interval tree and they are never DMA mapped.

Instead of trying to guess this based on a zero length use an explicit
flag.

Further, do not allow non-implicit umems to be 0 size.

Link: https://lore.kernel.org/r/20190819111710.18440-4-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

fd7dbf03

21 8月, 2019 2 次提交

IB/mlx5: Fix MR re-registration flow to use UMR properly · 25a45172

由 Moni Shoua 提交于 8月 15, 2019

The UMR WQE in the MR re-registration flow requires that
modify_atomic and modify_entity_size capabilities are enabled.
Therefore, check that the these capabilities are present before going to
umr flow and go through slow path if not.

Fixes: c8d75a98 ("IB/mlx5: Respect new UMR capabilities")
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NGuy Levi <guyle@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-8-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

25a45172

IB/mlx5: Consolidate use_umr checks into single function · 0e6613b4

由 Moni Shoua 提交于 8月 15, 2019

Introduce helper function to unify various use_umr checks.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-6-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

0e6613b4

01 8月, 2019 1 次提交

IB/mlx5: Fix MR registration flow to use UMR properly · e5366d30

由 Guy Levi 提交于 7月 31, 2019

Driver shouldn't allow to use UMR to register a MR when
umr_modify_atomic_disabled is set. Otherwise it will always end up with a
failure in the post send flow which sets the UMR WQE to modify atomic access
right.

Fixes: c8d75a98 ("IB/mlx5: Respect new UMR capabilities")
Signed-off-by: NGuy Levi <guyle@mellanox.com>
Reviewed-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731081929.32559-1-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

e5366d30

25 7月, 2019 4 次提交

IB/mlx5: Fix clean_mr() to work in the expected order · b9332dad

由 Yishai Hadas 提交于 7月 23, 2019

Any dma map underlying the MR should only be freed once the MR is fenced
at the hardware.

As of the above we first destroy the MKEY and just after that can safely
call to dma_unmap_single().

Link: https://lore.kernel.org/r/20190723065733.4899-6-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.3
Fixes: 8a187ee5 ("IB/mlx5: Support the new memory registration API")
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b9332dad

IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache · 9ec4483a

由 Yishai Hadas 提交于 7月 23, 2019

Fix unreg_umr to move the MR to a kernel owned PD (i.e. the UMR PD) which
can't be accessed by userspace.

This ensures that nothing can continue to access the MR once it has been
placed in the kernels cache for reuse.

MRs in the cache continue to have their HW state, including DMA tables,
present. Even though the MR has been invalidated, changing the PD provides
an additional layer of protection against use of the MR.

Link: https://lore.kernel.org/r/20190723065733.4899-5-leon@kernel.org
Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9ec4483a

IB/mlx5: Use direct mkey destroy command upon UMR unreg failure · afd14174

由 Yishai Hadas 提交于 7月 23, 2019

Use a direct firmware command to destroy the mkey in case the unreg UMR
operation has failed.

This prevents a case that a mkey will leak out from the cache post a
failure to be destroyed by a UMR WR.

In case the MR cache limit didn't reach a call to add another entry to the
cache instead of the destroyed one is issued.

In addition, replaced a warn message to WARN_ON() as this flow is fatal
and can't happen unless some bug around.

Link: https://lore.kernel.org/r/20190723065733.4899-4-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.10
Fixes: 49780d42 ("IB/mlx5: Expose MR cache for mlx5_ib")
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

afd14174

IB/mlx5: Fix unreg_umr to ignore the mkey state · 6a053953

由 Yishai Hadas 提交于 7月 23, 2019

Fix unreg_umr to ignore the mkey state and do not fail if was freed.  This
prevents a case that a user space application already changed the mkey
state to free and then the UMR operation will fail leaving the mkey in an
inappropriate state.

Link: https://lore.kernel.org/r/20190723065733.4899-3-leon@kernel.org
Cc: <stable@vger.kernel.org> # 3.19
Fixes: 968e78dd ("IB/mlx5: Enhance UMR support to allow partial page table update")
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6a053953

25 6月, 2019 1 次提交

net/mlx5: Convert mkey_table to XArray · 792c4e9d

由 Matthew Wilcox 提交于 6月 20, 2019

The lock protecting the data structure does not need to be an rwlock.  The
only read access to the lock is in an error path, and if that's limiting
your scalability, you have bigger performance problems.

Eliminate mlx5_mkey_table in favour of using the xarray directly.
reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
be called in interrupt context.

This also fixes a minor bug where SRCU locking was being used on the radix
tree read side, when RCU was needed too.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

792c4e9d

24 6月, 2019 5 次提交

RDMA/mlx5: Refactor MR descriptors allocation · 7796d2a3

由 Max Gurtovoy 提交于 6月 11, 2019

Improve code readability using static helpers for each memory region
type. Re-use the common logic to get smaller functions that are easy
to maintain and reduce code duplication.
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

7796d2a3

RDMA/mlx5: Use PA mapping for PI handover · 2563e2f3

由 Max Gurtovoy 提交于 6月 11, 2019

If possibe, avoid doing a UMR operation to register data and protection
buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the
SG lists using PA access. This is safe, since the internal key for data
and protection never exposed to the remote server (only signature key
might be exposed). If PA mappings are not possible, perform mapping
using MTT/KLM descriptors.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1266.4K/1262.4K    1720.1K/1732.1K
4k    793139/570902      1129.6K/773982
32k   72660/72086        97229/96164

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1590.2K/1600.1K    1828.2K/1830.3K
4k    1078.1K/937272     1142.1K/815304
32k   77012/77369        98125/97435
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Suggested-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

2563e2f3

RDMA/mlx5: Improve PI handover performance · de0ae958

由 Israel Rukshin 提交于 6月 11, 2019

In some loads, there is performance degradation when using KLM mkey
instead of MTT mkey. This is because KLM descriptor access is via
indirection that might require more HW resources and cycles.
Using KLM descriptor is not necessary when there are no gaps at the
data/metadata sg lists. As an optimization, use MTT mkey whenever it
is possible. For that matter, allocate internal MTT mkey and choose the
effective pi_mr for in transaction according to the required mapping
scheme.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o/baseline):

bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1262.4K/1243.3K/1147.1K    1732.1K/1725.1K/1423.8K
4k    570902/571233/457874       773982/743293/642080
32k   72086/72388/71933          96164/71789/93249

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1600.1K/1572.1K/1393.3K    1830.3K/1823.5K/1557.2K
4k    937272/921992/762934       815304/753772/646071
32k   77369/75052/72058          97435/73180/94612
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Suggested-by: NMax Gurtovoy <maxg@mellanox.com>
Suggested-by: NIdan Burstein <idanb@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

de0ae958

RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR code · 5c171cbe

由 Israel Rukshin 提交于 6月 11, 2019

IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY
was used.
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

5c171cbe

RDMA/mlx5: Implement mlx5_ib_map_mr_sg_pi and mlx5_ib_alloc_mr_integrity · 6c984472

由 Max Gurtovoy 提交于 6月 11, 2019

mlx5_ib_map_mr_sg_pi() will map the PI and data dma mapped SG lists to the
mlx5 memory region prior to the registration operation. In the new
API, the mlx5 driver will allocate an internal memory region for the
UMR operation to register both PI and data SG lists. The internal MR
will use KLM mode in order to map 2 (possibly non-contiguous/non-align)
SG lists using 1 memory key. In the new API, each ULP will use 1 memory
region for the signature operation (instead of 3 in the old API). This
memory region will have a key that will be exposed to remote server to
perform RDMA operation. The internal memory key that will map the SG lists
will stay private.
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6c984472

21 6月, 2019 1 次提交

RDMA: Check umem pointer validity prior to release · 836a0fbb

由 Leon Romanovsky 提交于 6月 16, 2019

Update ib_umem_release() to behave similarly to kfree() and allow
submitting NULL pointer as safe input to this function.

Fixes: a52c8e24 ("RDMA: Clean destroy CQ in drivers do not return errors")
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

836a0fbb

22 5月, 2019 1 次提交

RDMA/umem: Move page_shift from ib_umem to ib_odp_umem · d2183c6f

由 Jason Gunthorpe 提交于 5月 20, 2019

This value has always been set to PAGE_SHIFT in the core code, the only
thing that does differently was the ODP path. Move the value into the ODP
struct and still use it for ODP, but change all the non-ODP things to just
use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly.
Reviewed-by: NShiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

d2183c6f

06 5月, 2019 2 次提交

IB/mlx5: Add steering SW ICM device memory type · 25c13324

由 Ariel Levkovich 提交于 5月 05, 2019

This patch adds support for allocating, deallocating and registering a new
device memory type, STEERING_SW_ICM.  This memory can be allocated and
used by a privileged user for direct rule insertion and management of the
device's steering tables.

The type is provided by the user via the dedicated attribute in the
alloc_dm ioctl command.
Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
Reviewed-by: NEli Cohen <eli@mellanox.com>
Reviewed-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

25c13324

IB/mlx5: Support device memory type attribute · 3b113a1e

由 Ariel Levkovich 提交于 5月 05, 2019

This patch intoruduces a new mlx5_ib driver attribute to the DM allocation
method - the DM type.

In order to allow addition of new types in downstream patches this patch
also refactors the allocation, deallocation and registration handlers to
consider the requested type and perform the necessary actions according to
it.

Since not all future device memory types will be such that are mapped to
user memory, the mandatory page index output attribute is modified to be
optional.
Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
Reviewed-by: NEli Cohen <eli@mellanox.com>
Reviewed-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3b113a1e

11 4月, 2019 1 次提交

RDMA/mlx5: Move rep into port struct · 6a4d00be

由 Mark Bloch 提交于 3月 28, 2019

In preparation of moving into a model of single IB device multiple ports
move rep to be part of the port structure. We mark a representor device by
setting is_rep, no functional change with this patch.
Signed-off-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6a4d00be

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功