1. 13 Mar 2020, 12 commits
  2. 17 Jan 2020, 1 commit
  3. 16 Jan 2020, 2 commits
    • RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths · 8ffc3248
      Jason Gunthorpe authored
      Until recently it was not possible for userspace to specify a different
      IOVA, but with the new ibv_reg_mr_iova() library call this can now be done.
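
      For context, a hedged userspace sketch of the rdma-core call (pd, buf,
      len and iova are assumed inputs, and the access flag is just an example):

        #include <infiniband/verbs.h>

        /* Register [buf, buf + len) but present it to the HCA at a
         * caller-chosen IOVA rather than at the user VA. */
        static struct ibv_mr *reg_at_iova(struct ibv_pd *pd, void *buf,
                                          size_t len, uint64_t iova)
        {
                return ibv_reg_mr_iova(pd, buf, len, iova,
                                       IBV_ACCESS_LOCAL_WRITE);
        }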
      
      To compute the user_va we must compute:
        user_va = (iova - iova_start) + user_va_start
      
      while being cautious of overflow and other math problems.
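
      A minimal sketch of that translation, assuming hypothetical names and
      using the kernel's check_{sub,add}_overflow() helpers from
      <linux/overflow.h> to reject wrapping math:

        #include <linux/errno.h>
        #include <linux/overflow.h>
        #include <linux/types.h>

        static int iova_to_user_va(u64 iova, u64 iova_start,
                                   u64 user_va_start, u64 *user_va)
        {
                u64 offset;

                /* an iova below iova_start would wrap the subtraction */
                if (check_sub_overflow(iova, iova_start, &offset))
                        return -EFAULT;
                /* and the addition must not wrap either */
                if (check_add_overflow(user_va_start, offset, user_va))
                        return -EFAULT;
                return 0;
        }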
      
      The iova is not reliably stored in the mmkey when the MR is created. Only
      the cached creation path (the common one) sets it, so it must also be set
      when creating uncached MRs.
      
      Fix the weird use of iova when computing the starting page index in the
      MR. In the normal case, when iova == umem.address:
        iova & (~(BIT(page_shift) - 1)) ==
        ALIGN_DOWN(umem.address, odp->page_size) ==
        ib_umem_start(odp)
      
      And when the iova is different, using it in arithmetic with a user_va is wrong.
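
      As an illustration only, the identity above as compilable C (all
      parameters are assumed inputs; it holds only when iova == umem.address):

        #include <linux/bits.h>
        #include <linux/kernel.h>
        #include <linux/types.h>

        static bool start_aligned_same(u64 iova, u64 umem_address,
                                       unsigned int page_shift)
        {
                /* masking the iova == aligning the address down */
                return (iova & ~(BIT_ULL(page_shift) - 1)) ==
                       ALIGN_DOWN(umem_address, 1ULL << page_shift);
        }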
      
      Finally, do not allow an implicit ODP to be created with a non-zero IOVA
      as we have no support for that.
      
      Fixes: 7bdf65d4 ("IB/mlx5: Handle page faults")
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    • IB: Allow calls to ib_umem_get from kernel ULPs · c320e527
      Moni Shoua authored
      So far the assumption was that ib_umem_get() and ib_umem_odp_get()
      are called from flows that start in UVERBS and therefore have a user
      context. This assumption restricts flows that are initiated by ULPs
      and need the service that ib_umem_get() provides.
      
      This patch changes ib_umem_get() and ib_umem_odp_get() to take the IB
      device directly, relying on the fact that both UVERBS and ULPs set that
      field correctly.
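
      A call-site sketch under the post-patch signature described above (the
      access flag is an assumed example):

        #include <rdma/ib_umem.h>

        /* A kernel ULP can now pin user memory with only the ib_device;
         * no uverbs user context is required. */
        static struct ib_umem *ulp_pin(struct ib_device *device,
                                       unsigned long addr, size_t size)
        {
                return ib_umem_get(device, addr, size,
                                   IB_ACCESS_LOCAL_WRITE);
        }
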
      Reviewed-by: Guy Levi <guyle@mellanox.com>
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
  4. 04 Jan 2020, 2 commits
  5. 24 Nov 2019, 1 commit
    • RDMA/odp: Use mmu_interval_notifier_insert() · f25a546e
      Jason Gunthorpe authored
      Replace the internal interval tree based mmu notifier with the new common
      mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
      deadlock that can be triggered in ODP:
      
       zap_page_range()
        mmu_notifier_invalidate_range_start()
         [..]
          ib_umem_notifier_invalidate_range_start()
             down_read(&per_mm->umem_rwsem)
        unmap_single_vma()
          [..]
            __split_huge_page_pmd()
              mmu_notifier_invalidate_range_start()
              [..]
                 ib_umem_notifier_invalidate_range_start()
                    down_read(&per_mm->umem_rwsem)   // DEADLOCK
      
              mmu_notifier_invalidate_range_end()
                 up_read(&per_mm->umem_rwsem)
        mmu_notifier_invalidate_range_end()
           up_read(&per_mm->umem_rwsem)
      
      The umem_rwsem is held across the range_start/end as the ODP algorithm for
      invalidate_range_end cannot tolerate changes to the interval
      tree. However, due to the nested invalidation regions the second
      down_read() can deadlock if there are competing writers. The new core code
      provides an alternative scheme to solve this problem.
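
      A minimal usage sketch of the new API; the ops, callback and helper
      names here are assumed, not the actual ODP conversion:

        #include <linux/mmu_notifier.h>

        static bool sketch_invalidate(struct mmu_interval_notifier *mni,
                                      const struct mmu_notifier_range *range,
                                      unsigned long cur_seq)
        {
                /* unmap DMA covering [range->start, range->end) without
                 * taking any lock the nested path could also take */
                mmu_interval_set_seq(mni, cur_seq);
                return true;
        }

        static const struct mmu_interval_notifier_ops sketch_ops = {
                .invalidate = sketch_invalidate,
        };

        /* one notifier per umem replaces the shared interval tree */
        static int sketch_register(struct mmu_interval_notifier *mni,
                                   struct mm_struct *mm,
                                   unsigned long start, unsigned long length)
        {
                return mmu_interval_notifier_insert(mni, mm, start,
                                                    length, &sketch_ops);
        }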
      
      Fixes: ca748c39 ("RDMA/umem: Get rid of per_mm->notifier_count")
      Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
      Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  6. 17 Nov 2019, 1 commit
  7. 01 Nov 2019, 1 commit
  8. 29 Oct 2019, 6 commits
    • RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy · 09689703
      Jason Gunthorpe authored
      For creation, as soon as the umem_odp is created the notifier can be
      called; however, the underlying MR may not have been set up yet. This
      would cause problems if mlx5_ib_invalidate_range() runs. There is some
      confusing/unlocked/racy code that might be trying to solve this, but
      without locks it isn't going to work right.
      
      Instead trivially solve the problem by short-circuiting the invalidation
      if there are not yet any DMA mapped pages. By definition there is nothing
      to invalidate in this case.
      
      The create code will have the umem fully setup before anything is DMA
      mapped, and npages is fully locked by the umem_mutex.
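
      A sketch of that short-circuit with hypothetical naming; because npages
      only changes under umem_mutex, testing it under the same mutex is
      race-free:

        #include <rdma/ib_umem_odp.h>

        static void sketch_invalidate_range(struct ib_umem_odp *umem_odp)
        {
                mutex_lock(&umem_odp->umem_mutex);
                /* no DMA mapped pages: by definition nothing to do */
                if (!umem_odp->npages)
                        goto out;
                /* ... unmap pages and invalidate HW translations ... */
        out:
                mutex_unlock(&umem_odp->umem_mutex);
        }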
      
      For destroy, invalidate the entire MR at the HW to stop DMA, then DMA
      unmap the pages before destroying the MR. This drives npages to zero and
      prevents similar racing with invalidate while the MR is undergoing
      destruction.
      
      Arguably it would be better if the umem was created after the MR and
      destroyed before, but that would require a big rework of the MR code.
      
      Fixes: 6aec21f6 ("IB/mlx5: Page faults handling infrastructure")
      Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Rework implicit ODP destroy · 5256edcb
      Jason Gunthorpe authored
      Use SRCU in a sensible way by removing all MRs in the implicit tree from
      the two xarrays (the update operation), then a synchronize, followed by a
      normal single threaded teardown.
      
      This departs only a little from the normal pattern in that there can
      still be some work pending in the unbound wq that may also require a
      workqueue flush. This is tracked with a single atomic, consolidating the
      redundant existing atomics and wait queue.
      
      For understandability, the entire ODP implicit create/destroy flow now
      largely lives in a single pair of functions within odp.c, with a few
      support functions for tearing down an unused child.
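
      A destroy-flow sketch of that scheme; the struct and field names below
      are assumed from this description, not verified against the tree:

        #include "mlx5_ib.h"

        static void sketch_destroy_implicit(struct mlx5_ib_dev *dev,
                                            struct mlx5_ib_mr *imr)
        {
                /* 1. the "update": unpublish the MR from the xarrays */
                xa_erase(&dev->odp_mkeys, mlx5_base_mkey(imr->mmkey.key));

                /* 2. after this, no SRCU reader can still see it */
                synchronize_srcu(&dev->odp_srcu);

                /* 3. the single atomic: wait out pending unbound work */
                wait_event(imr->q_deferred_work,
                           !atomic_read(&imr->num_deferred_work));

                /* ... normal single-threaded teardown of children ... */
        }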
      
      Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Use a dedicated mkey xarray for ODP · 806b101b
      Jason Gunthorpe authored
      There is a per-device xarray used to store every mkey in the system.
      However, this xarray is now only read by ODP for certain ODP-designated
      MRs (ODP, implicit ODP, MW, DEVX_INDIRECT).
      
      Create an xarray only for use by ODP, that only contains ODP related
      MKeys. This xarray is protected by SRCU and all erases are protected by a
      synchronize.
      
      This improves performance:
      
       - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp()
         can be deleted. The xarray will also consume fewer nodes.
      
       - Normal MRs are never mixed with ODP MRs in an SRCU data structure, so
         the performance-sucking synchronize_srcu() on every MR destruction is
         not needed.
      
       - No smp_load_acquire(live) and xa_load() double barrier on read
      
      Due to the SRCU locking scheme care must be taken with the placement of
      the xa_store(). Once it completes the MR is immediately visible to other
      threads and only through a xa_erase() & synchronize_srcu() cycle could it
      be destroyed.
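
      A reader-side sketch of the scheme (field names assumed): SRCU, not a
      refcount, keeps the entry alive between xa_load() and use:

        #include "mlx5_ib.h"

        static struct mlx5_core_mkey *sketch_odp_lookup(struct mlx5_ib_dev *dev,
                                                        u32 key, int *srcu_idx)
        {
                *srcu_idx = srcu_read_lock(&dev->odp_srcu);
                /* valid until the matching srcu_read_unlock(); destroy
                 * is xa_erase() + synchronize_srcu() + free */
                return xa_load(&dev->odp_mkeys, mlx5_base_mkey(key));
        }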
      
      Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Split sig_err MR data into its own xarray · 50211ec9
      Jason Gunthorpe authored
      The locking model for signature is completely different from ODP's, so do
      not share the same xarray that relies on SRCU locking to support ODP.
      
      Simply store the active mlx5_core_sig_ctx's in an xarray when signature
      MRs are created and rely on trivial xarray locking to serialize
      everything.
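
      A store-side sketch under that model (the xarray field name is an
      assumption); the xarray's internal spinlock is the only serialization:

        #include "mlx5_ib.h"

        static int sketch_track_sig_mr(struct mlx5_ib_dev *dev,
                                       struct mlx5_ib_mr *mr)
        {
                /* plain xarray locking; no SRCU cycle on destroy */
                return xa_err(xa_store(&dev->sig_mrs,
                                       mlx5_base_mkey(mr->mmkey.key),
                                       &mr->mmkey, GFP_KERNEL));
        }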
      
      The overhead of storing only a handful of SIG-related MRs is going to be
      much less than an xarray full of every mkey.
      
      Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Use irq xarray locking for mkey_table · 1524b12a
      Jason Gunthorpe authored
      The mkey_table xarray is touched by the reg_mr_callback() function, which
      is called from hard-irq context. Thus all other uses of xa_lock must use
      the _irq variants.
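
      A sketch of the rule on a generic table; xa_erase_irq() is the
      interrupt-safe variant such call sites switch to:

        #include <linux/xarray.h>

        static void *sketch_mkey_erase(struct xarray *mkey_table,
                                       unsigned long index)
        {
                /* disables interrupts around xa_lock so a hard-irq
                 * reg_mr_callback() cannot deadlock against us */
                return xa_erase_irq(mkey_table, index);
        }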
      
        WARNING: inconsistent lock state
        5.4.0-rc1 #12 Not tainted
        --------------------------------
        inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
        python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes:
        ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30
        {IN-HARDIRQ-W} state was registered at:
          lock_acquire+0xe1/0x200
          _raw_spin_lock_irqsave+0x35/0x50
          reg_mr_callback+0x2dd/0x450 [mlx5_ib]
          mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core]
          mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core]
         [..]
      
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&(&xa->xa_lock)->rlock#3);
          <Interrupt>
            lock(&(&xa->xa_lock)->rlock#3);
      
         *** DEADLOCK ***
      
        2 locks held by python3/343:
         #0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs]
         #1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs]
      
        stack backtrace:
        CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x86/0xca
         print_usage_bug.cold.50+0x2e5/0x355
         mark_lock+0x871/0xb50
         ? match_held_lock+0x20/0x250
         ? check_usage_forwards+0x240/0x240
         __lock_acquire+0x7de/0x23a0
         ? __kasan_check_read+0x11/0x20
         ? mark_lock+0xae/0xb50
         ? mark_held_locks+0xb0/0xb0
         ? find_held_lock+0xca/0xf0
         lock_acquire+0xe1/0x200
         ? xa_erase+0x12/0x30
         _raw_spin_lock+0x2a/0x40
         ? xa_erase+0x12/0x30
         xa_erase+0x12/0x30
         mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib]
         uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs]
         uverbs_free_mw+0x1a/0x20 [ib_uverbs]
         destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs]
         [..]
      
      Fixes: 04177915 ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases")
      Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.ca
      Acked-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  9. 09 Oct 2019, 1 commit
  10. 05 Oct 2019, 4 commits
  11. 22 Aug 2019, 3 commits
  12. 21 Aug 2019, 2 commits
  13. 01 Aug 2019, 1 commit
  14. 25 Jul 2019, 3 commits