提交 · 9162420dde49c9a8f4819f28bf2d5c675fb12552 · openeuler / Kernel

29 10月, 2019 5 次提交

RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree · 9162420d

由 Jason Gunthorpe 提交于 10月 09, 2019

Instead of rewriting all the IOVA's to 0 as things progress down the tree
make the IOVA of the children equal to placement in the tree. This makes
things easier to understand by keeping mmkey.iova == HW configuration.

Link: https://lore.kernel.org/r/20191009160934.3143-8-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9162420d

RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it · c2edcd69

由 Jason Gunthorpe 提交于 10月 09, 2019

This makes the routines easier to understand, particularly with respect
the locking requirements of the entire sequence. The implicit_mr_alloc()
had a lot of ifs specializing it to each of the callers, and only a very
small amount of code was actually shared.

Following patches will cause the flow in the two functions to diverge
further.

Link: https://lore.kernel.org/r/20191009160934.3143-7-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c2edcd69

RDMA/mlx5: Rework implicit_mr_get_data · 3d5f3c54

由 Jason Gunthorpe 提交于 10月 09, 2019

This function is intended to loop across each MTT chunk in the implicit
parent that intersects the range [io_virt, io_virt+bnct).  But it is has a
confusing construction, so:

- Consistently use imr and odp_imr to refer to the implicit parent
  to avoid confusion with the normal mr and odp of the child
- Directly compute the inclusive start/end indexes by shifting. This is
  clearer to understand the intent and avoids any errors from unaligned
  values of addr
- Iterate directly over the range of MTT indexes, do not make a loop
  out of goto
- Follow 'success oriented flow', with goto error unwind
- Directly calculate the range of idx's that need update_xlt
- Ensure that any leaf MR added to the interval tree always results in an
  update to the XLT

Link: https://lore.kernel.org/r/20191009160934.3143-6-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3d5f3c54

RDMA/mlx5: Use a dedicated mkey xarray for ODP · 806b101b

由 Jason Gunthorpe 提交于 10月 09, 2019

There is a per device xarray storing mkeys that is used to store every
mkey in the system. However, this xarray is now only read by ODP for
certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT).

Create an xarray only for use by ODP, that only contains ODP related
MKeys. This xarray is protected by SRCU and all erases are protected by a
synchronize.

This improves performance:

 - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp()
   can be deleted. The xarray will also consume fewer nodes.

 - normal MR's are never mixed with ODP MRs in a SRCU data structure so
   performance sucking synchronize_srcu() on every MR destruction is not
   needed.

 - No smp_load_acquire(live) and xa_load() double barrier on read

Due to the SRCU locking scheme care must be taken with the placement of
the xa_store(). Once it completes the MR is immediately visible to other
threads and only through a xa_erase() & synchronize_srcu() cycle could it
be destroyed.

Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

806b101b

RDMA/mlx5: Use SRCU properly in ODP prefetch · fb985e27

由 Jason Gunthorpe 提交于 10月 09, 2019

When working with SRCU protected xarrays the xarray itself should be the
SRCU 'update' point. Instead prefetch is using live as the SRCU update
point and this prevents switching the locking design to use the xarray
instead.

To solve this the prefetch must only read from the xarray once, and hold
on to the actual MR pointer for the duration of the async
operation. Incrementing num_pending_prefetch delays destruction of the MR,
so it is suitable.

Prefetch calls directly to the pagefault_mr using the MR pointer and only
does a single xarray lookup.

All the testing if a MR is prefetchable or not is now done only in the
prefetch code and removed from the pagefault critical path.

Link: https://lore.kernel.org/r/20191009160934.3143-2-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

fb985e27

05 10月, 2019 3 次提交

RDMA/mlx5: Put live in the correct place for ODP MRs · aa603815

由 Jason Gunthorpe 提交于 10月 01, 2019

live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:

    CPU0                                     CPU1
mlx5_ib_alloc_implicit_mr()
 implicit_mr_alloc()
  live = 1
 imr->umem = umem
                                    num_pending_prefetch_inc()
                                      if (live)
				        atomic_inc(num_pending_prefetch)
 atomic_set(num_pending_prefetch,0) // Overwrites other thread's store

Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.

Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.

Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().

The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.

Fixes: 6aec21f6 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

aa603815

RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages() · 9dc775e7

由 Jason Gunthorpe 提交于 10月 01, 2019

This fixes a race of the form:
    CPU0                               CPU1
mlx5_ib_invalidate_range()     mlx5_ib_invalidate_range()
				 // This one actually makes npages == 0
				 ib_umem_odp_unmap_dma_pages()
				 if (npages == 0 && !dying)
  // This one does nothing
  ib_umem_odp_unmap_dma_pages()
  if (npages == 0 && !dying)
     dying = 1;
                                    dying = 1;
				    schedule_work(&umem_odp->work);
     // Double schedule of the same work
     schedule_work(&umem_odp->work);  // BOOM

npages and dying must be read and written under the umem_mutex lock.

Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call
mlx5_ib_update_xlt, and both need to be done in the same locking region,
hoist the lock out of unmap.

This avoids an expensive double critical section in
mlx5_ib_invalidate_range().

Fixes: 81713d37 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9dc775e7

RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR · f28b1932

由 Jason Gunthorpe 提交于 10月 01, 2019

mlx5_ib_update_xlt() must be protected against parallel free of the MR it
is accessing, also it must be called single threaded while updating the
HW. Otherwise we can have races of the form:

    CPU0                               CPU1
  mlx5_ib_update_xlt()
   mlx5_odp_populate_klm()
     odp_lookup() == NULL
     pklm = ZAP
                                      implicit_mr_get_data()
 				        implicit_mr_alloc()
 					  <update interval tree>
					mlx5_ib_update_xlt
					  mlx5_odp_populate_klm()
					    odp_lookup() != NULL
					    pklm = VALID
					   mlx5_ib_post_send_wait()

    mlx5_ib_post_send_wait() // Replaces VALID with ZAP

This can be solved by putting both the SRCU and the umem_mutex lock around
every call to mlx5_ib_update_xlt(). This ensures that the content of the
interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr)
will not change while it is running, and thus the posted WRs to update the
KLM will always reflect the correct information.

The race above will resolve by either having CPU1 wait till CPU0 completes
the ZAP or CPU0 will run after the add and instead store VALID.

The pagefault path adding children already holds the umem_mutex and SRCU,
so the only missed lock is during MR destruction.

Fixes: 81713d37 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.caReviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f28b1932

17 9月, 2019 1 次提交

IB/mlx5: Use the original address for the page during free_pages · 130c2c57

由 Danit Goldberg 提交于 9月 16, 2019

The removal of 'buffer' in the patch below caused free_page() to use a
value that had been offset since the wqe pointer is adjusted while the
routine runs.

The current implementation of free_pages() rounds down to a pfn,
discarding the adjustment, but this is not the right way to use the
API. Preserve the initial value and use it for free_page().

Fixes: 0f51427b ("RDMA/mlx5: Cleanup WQE page fault handler")
Link: https://lore.kernel.org/r/20190916064818.19823-2-leon@kernel.orgSigned-off-by: NDanit Goldberg <danitg@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

130c2c57

28 8月, 2019 2 次提交

IB/mlx5: Add page fault handler for DC initiator WQE · 75e46fc0

由 Michael Guralnik 提交于 8月 19, 2019

Parsing DC initiator WQEs upon page fault requires skipping an address
vector segment, as in UD WQEs.

Link: https://lore.kernel.org/r/20190819120815.21225-4-leon@kernel.orgSigned-off-by: NMichael Guralnik <michaelgur@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

75e46fc0

IB/mlx5: Remove check of FW capabilities in ODP page fault handling · 29af9498

由 Michael Guralnik 提交于 8月 19, 2019

As page fault handling is initiated by FW, there is no need to check that
the ODP supports the operation and transport.

Link: https://lore.kernel.org/r/20190819120815.21225-3-leon@kernel.orgSigned-off-by: NMichael Guralnik <michaelgur@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

29af9498

22 8月, 2019 7 次提交

RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr · fba0e448

由 Jason Gunthorpe 提交于 8月 19, 2019

These are the same thing since mr always comes from odp->private. It is
confusing to reference the same memory via two names.

Link: https://lore.kernel.org/r/20190819111710.18440-13-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

fba0e448

RDMA/mlx5: Use ib_umem_start instead of umem.address · a705f3e3

由 Jason Gunthorpe 提交于 8月 19, 2019

These are subtly different, the address is the original VA requested
during umem_get, while ib_umem_start() is the version that is rounded to
the proper page size, ie is the true start of the umem's dma map.

Link: https://lore.kernel.org/r/20190819111710.18440-12-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

a705f3e3

RDMA/core: Make invalidate_range a device operation · ce51346f

由 Moni Shoua 提交于 8月 19, 2019

The callback function 'invalidate_range' is implemented in a driver so the
place for it is in the ib_device_ops structure and not in ib_ucontext.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NGuy Levi <guyle@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Link: https://lore.kernel.org/r/20190819111710.18440-11-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

ce51346f

RDMA/odp: Provide ib_umem_odp_release() to undo the allocs · 0446cad9

由 Jason Gunthorpe 提交于 8月 19, 2019

Now that there are allocator APIs that return the ib_umem_odp directly
it should be freed through a umem_odp free'er as well.

Link: https://lore.kernel.org/r/20190819111710.18440-8-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0446cad9

RDMA/odp: Make the three ways to create a umem_odp clear · f20bef6a

由 Jason Gunthorpe 提交于 8月 19, 2019

The three paths to build the umem_odps are kind of muddled, they are:
- As a normal ib_mr umem
- As a child in an implicit ODP umem tree
- As the root of an implicit ODP umem tree

Only the first two are actually umem's, the last is an abuse.

The implicit case can only be triggered by explicit driver request, it
should never be co-mingled with the normal case. While we are here, make
sensible function names and add some comments to make this clearer.

Link: https://lore.kernel.org/r/20190819111710.18440-6-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f20bef6a

RDMA/odp: Make it clearer when a umem is an implicit ODP umem · fd7dbf03

由 Jason Gunthorpe 提交于 8月 19, 2019

Implicit ODP umems are special, they don't have any page lists, they don't
exist in the interval tree and they are never DMA mapped.

Instead of trying to guess this based on a zero length use an explicit
flag.

Further, do not allow non-implicit umems to be 0 size.

Link: https://lore.kernel.org/r/20190819111710.18440-4-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

fd7dbf03

RDMA/odp: Iterate over the whole rbtree directly · f993de88

由 Jason Gunthorpe 提交于 8月 19, 2019

Instead of intersecting a full interval, just iterate over every element
directly. This is faster and clearer.

Link: https://lore.kernel.org/r/20190819111710.18440-3-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f993de88

21 8月, 2019 1 次提交

IB/mlx5: Report and handle ODP support properly · 00815752

由 Moni Shoua 提交于 8月 15, 2019

ODP depends on the several device capabilities, among them is the ability
to send UMR WQEs with that modify atomic and entity size of the MR.
Therefore, only if all conditions to send such a UMR WQE are met then
driver can report that ODP is supported. Use this check of conditions
in all places where driver needs to know about ODP support.

Also, implicit ODP support depends on ability of driver to send UMR WQEs
for an indirect mkey. Therefore, verify that all conditions to do so are
met when reporting support.

Fixes: c8d75a98 ("IB/mlx5: Respect new UMR capabilities")
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NGuy Levi <guyle@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-7-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

00815752

08 8月, 2019 1 次提交

IB/mlx5: Fix implicit MR release flow · f591822c

由 Yishai Hadas 提交于 8月 05, 2019

Once implicit MR is being called to be released by
ib_umem_notifier_release() its leaves were marked as "dying".

However, when dereg_mr()->mlx5_ib_free_implicit_mr()->mr_leaf_free() is
called, it skips running the mr_leaf_free_action (i.e. umem_odp->work)
when those leaves were marked as "dying".

As such ib_umem_release() for the leaves won't be called and their MRs
will be leaked as well.

When an application exits/killed without calling dereg_mr we might hit the
above flow.

This fatal scenario is reported by WARN_ON() upon
mlx5_ib_dealloc_ucontext() as ibcontext->per_mm_list is not empty, the
call trace can be seen below.

Originally the "dying" mark as part of ib_umem_notifier_release() was
introduced to prevent pagefault_mr() from returning a success response
once this happened. However, we already have today the completion
mechanism so no need for that in those flows any more.  Even in case a
success response will be returned the firmware will not find the pages and
an error will be returned in the following call as a released mm will
cause ib_umem_odp_map_dma_pages() to permanently fail mmget_not_zero().

Fix the above issue by dropping the "dying" from the above flows.  The
other flows that are using "dying" are still needed it for their
synchronization purposes.

   WARNING: CPU: 1 PID: 7218 at
   drivers/infiniband/hw/mlx5/main.c:2004
		  mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib]
   CPU: 1 PID: 7218 Comm: ibv_rc_pingpong Tainted: G     E
	       5.2.0-rc6+ #13
   Call Trace:
   uverbs_destroy_ufile_hw+0xb5/0x120 [ib_uverbs]
   ib_uverbs_close+0x1f/0x80 [ib_uverbs]
   __fput+0xbe/0x250
   task_work_run+0x88/0xa0
   do_exit+0x2cb/0xc30
   ? __fput+0x14b/0x250
   do_group_exit+0x39/0xb0
   get_signal+0x191/0x920
   ? _raw_spin_unlock_bh+0xa/0x20
   ? inet_csk_accept+0x229/0x2f0
   do_signal+0x36/0x5e0
   ? put_unused_fd+0x5b/0x70
   ? __sys_accept4+0x1a6/0x1e0
   ? inet_hash+0x35/0x40
   ? release_sock+0x43/0x90
   ? _raw_spin_unlock_bh+0xa/0x20
   ? inet_listen+0x9f/0x120
   exit_to_usermode_loop+0x5c/0xc6
   do_syscall_64+0x182/0x1b0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 81713d37 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20190805083010.21777-1-leon@kernel.orgSigned-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f591822c

01 8月, 2019 1 次提交

RDMA/mlx5: Remove DEBUG ODP code · d129e3f4

由 Leon Romanovsky 提交于 7月 31, 2019

Delete DEBUG ODP dead code which is leftover from development
stage and doesn't need to be part of the upstream kernel.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731115627.5433-1-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

d129e3f4

25 7月, 2019 1 次提交

IB/mlx5: Prevent concurrent MR updates during invalidation · 296e3a2a

由 Moni Shoua 提交于 7月 23, 2019

The device requires that memory registration work requests that update the
address translation table of a MR will be fenced if posted together. This
scenario can happen when address ranges are invalidated by the mmu in
separate concurrent calls to the invalidation callback.

We prefer to block concurrent address updates for a single MR over fencing
since making the decision if a WQE needs fencing will be more expensive
and fencing all WQEs is a too radical choice.

Further, it isn't clear that this code can even run safely concurrently,
so a lock is a safer choice.

Fixes: b4cfe447 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
Link: https://lore.kernel.org/r/20190723065733.4899-8-leon@kernel.orgSigned-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

296e3a2a

23 7月, 2019 1 次提交

IB/mlx5: Replace kfree with kvfree · b7f406bb

由 Chuhong Yuan 提交于 7月 17, 2019

Memory allocated by kvzalloc should not be freed by kfree(), use kvfree()
instead.

Fixes: 813e90b1 ("IB/mlx5: Add advise_mr() support")
Link: https://lore.kernel.org/r/20190717082101.14196-1-hslester96@gmail.comSigned-off-by: NChuhong Yuan <hslester96@gmail.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b7f406bb

04 7月, 2019 1 次提交

net/mlx5: Use event mask based on device capabilities · b9a7ba55

由 Yishai Hadas 提交于 6月 30, 2019

Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.

As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Acked-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

b9a7ba55

25 6月, 2019 1 次提交

net/mlx5: Convert mkey_table to XArray · 792c4e9d

由 Matthew Wilcox 提交于 6月 20, 2019

The lock protecting the data structure does not need to be an rwlock.  The
only read access to the lock is in an error path, and if that's limiting
your scalability, you have bigger performance problems.

Eliminate mlx5_mkey_table in favour of using the xarray directly.
reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
be called in interrupt context.

This also fixes a minor bug where SRCU locking was being used on the radix
tree read side, when RCU was needed too.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

792c4e9d

14 6月, 2019 4 次提交

net/mlx5: Add EQ enable/disable API · 1f8a7bee

由 Yuval Avnery 提交于 6月 10, 2019

Previously, EQ joined the chain notifier on creation.
This forced the caller to be ready to handle events before creating
the EQ through eq_create_generic interface.

To help the caller control when the created EQ will be attached to the
IRQ, add enable/disable API.
Signed-off-by: NYuval Avnery <yuvalav@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

1f8a7bee

net/mlx5: Use a single IRQ for all async EQs · 81bfa206

由 Ariel Levkovich 提交于 6月 10, 2019

The patch modifies the IRQ allocation so that all async EQs are
assigned to the same IRQ resulting in more available IRQs for
completion EQs.

The changes are using the support for IRQ sharing and EQ polling budget
that was introduced in previous patches so when the shared interrupt is
triggered, the kernel will serially call the handler of each of the
sharing EQs with a certain budget of EQEs to poll in order to prevent
starvation.
Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

81bfa206

net/mlx5: Separate IRQ request/free from EQ life cycle · 24163189

由 Yuval Avnery 提交于 6月 10, 2019

Instead of requesting IRQ with eq creation, IRQs will be requested
before EQ table creation.
Instead of freeing the IRQs after EQ destroy, free IRQs after eq
table destroy.
Signed-off-by: NYuval Avnery <yuvalav@mellanox.com>
Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

24163189

net/mlx5: Change interrupt handler to call chain notifier · ca390799

由 Yuval Avnery 提交于 6月 10, 2019

Multiple EQs may share the same IRQ in subsequent patches.

Instead of calling the IRQ handler directly, the EQ will register
to an atomic chain notfier.

The Linux built-in shared IRQ is not used because it forces the caller
to disable the IRQ and clear affinity before free_irq() can be called.

This patch is the first step in the separation of IRQ and EQ logic.
Signed-off-by: NYuval Avnery <yuvalav@mellanox.com>
Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

ca390799

22 5月, 2019 1 次提交

RDMA/umem: Move page_shift from ib_umem to ib_odp_umem · d2183c6f

由 Jason Gunthorpe 提交于 5月 20, 2019

This value has always been set to PAGE_SHIFT in the core code, the only
thing that does differently was the ODP path. Move the value into the ODP
struct and still use it for ODP, but change all the non-ODP things to just
use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly.
Reviewed-by: NShiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

d2183c6f

09 4月, 2019 1 次提交

RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs · d10bcf94

由 Shiraz Saleem 提交于 4月 02, 2019

Combine contiguous regions of PAGE_SIZE pages into single scatter list
entry while building the scatter table for a umem. This minimizes the
number of the entries in the scatter list and reduces the DMA mapping
overhead, particularly with the IOMMU.

Set default max_seg_size in core for IB devices to 2G and do not combine
if we exceed this limit.

Also, purge npages in struct ib_umem as we now DMA map the umem SGL with
sg_nents and npage computation is not needed. Drivers should now be using
ib_umem_num_pages(), so fix the last stragglers.

Move npages tracking to ib_umem_odp as ODP drivers still need it.
Suggested-by: NJason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Acked-by: NAdit Ranadive <aditr@vmware.com>
Signed-off-by: NShiraz Saleem <shiraz.saleem@intel.com>
Tested-by: NGal Pressman <galpress@amazon.com>
Tested-by: NSelvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d10bcf94

04 4月, 2019 1 次提交

RDMA/mlx5: Cleanup WQE page fault handler · 0f51427b

由 Leon Romanovsky 提交于 2月 25, 2019

Refactor the page fault handler to be more readable and extensible, this
cleanup was triggered by the error reported below. The code structure made
it unclear to the automatic tools to identify that such a flow is not
possible in real life because "requestor != NULL" means that "qp != NULL"
too.

    drivers/infiniband/hw/mlx5/odp.c:1254 mlx5_ib_mr_wqe_pfault_handler()
    error: we previously assumed 'qp' could be null (see line 1230)

Fixes: 08100fad ("IB/mlx5: Add ODP SRQ support")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0f51427b

28 3月, 2019 2 次提交

IB/mlx5: Compare only index part of a memory window rkey · d623dfd2

由 Artemy Kovalyov 提交于 3月 19, 2019

The InfiniBand Architecture Specification section 10.6.7.2.4 TYPE 2 MEMORY
WINDOWS says that if the CI supports the Base Memory Management Extensions
defined in this specification, the R_Key format for a Type 2 Memory Window
must consist of:

* 24 bit index in the most significant bits of the R_Key, which is owned
  by the CI, and
* 8 bit key in the least significant bits of the R_Key, which is owned by
  the Consumer.

This means that the kernel should compare only the index part of a R_Key
to determine equality with another R_Key.

Fixes: db570d7d ("IB/mlx5: Add ODP support to MW")
Signed-off-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d623dfd2

IB/mlx5: Reset access mask when looping inside page fault handler · 1abe186e

由 Moni Shoua 提交于 3月 19, 2019

If page-fault handler spans multiple MRs then the access mask needs to
be reset before each MR handling or otherwise write access will be
granted to mapped pages instead of read-only.

Cc: <stable@vger.kernel.org> # 3.19
Fixes: 7bdf65d4 ("IB/mlx5: Handle page faults")
Reported-by: NJerome Glisse <jglisse@redhat.com>
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

1abe186e

04 3月, 2019 1 次提交

IB/mlx5: Set correct write permissions for implicit ODP MR · 7095ec3c

由 Moni Shoua 提交于 2月 25, 2019

The write access of an implicit MR is inherited to all of its children.
Therefore we must set the correct write access to the parent MR.

Pass full access_flags when creating umem to let it calculate write access
correctly.

Fixes: da6a496a ("IB/mlx5: Ranges in implicit ODP MR inherit its write access")
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

7095ec3c

22 2月, 2019 2 次提交

IB/mlx5: Validate correct PD before prefetch MR · 81dd4c4b

由 Moni Shoua 提交于 2月 17, 2019

When prefetching odp mr it is required to verify that pd of the mr is
identical to the pd for which the advise_mr request arrived with.

This check was missing from synchronous flow and is added now.

Fixes: 813e90b1 ("IB/mlx5: Add advise_mr() support")
Reported-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

81dd4c4b

IB/mlx5: Protect against prefetch of invalid MR · a6bc3875

由 Moni Shoua 提交于 2月 17, 2019

When deferring a prefetch request we need to protect against MR or PD
being destroyed while the request is still enqueued.

The first step is to validate that PD owns the lkey that describes the MR
and that the MR that the lkey refers to is owned by that PD.

The second step is to dequeue all requests when MR is destroyed.

Since PD can't be destroyed while it owns MRs it is guaranteed that when a
worker wakes up the request it refers to is still valid.

Now, it is possible to refrain from taking a reference on the device since
it is assured to be present as pd.

While that, replace the dedicated ordered workqueue with the system
unbound workqueue to reuse an existing resource and improve
performance. This will also fix a bug of queueing to the wrong workqueue.

Fixes: 813e90b1 ("IB/mlx5: Add advise_mr() support")
Reported-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

a6bc3875

05 2月, 2019 3 次提交

IB/mlx5: Advertise XRC ODP support · 6141f8fa

由 Moni Shoua 提交于 1月 22, 2019

Query all per transport caps for XRC and set the appropriate bits in the
per transport field of the advertised struct.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NMajd Dibbiny <majd@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6141f8fa

IB/mlx5: Advertise SRQ ODP support for supported transports · 2e68dace

由 Moni Shoua 提交于 1月 22, 2019

ODP support in SRQ is per transport capability. Based on device
capabilities set this flag in device structure for future queries.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NMajd Dibbiny <majd@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

2e68dace

IB/mlx5: Add ODP SRQ support · 08100fad

由 Moni Shoua 提交于 1月 22, 2019

Add changes to the WQE page-fault handler to

1. Identify that the event is for a SRQ WQE
2. Pass SRQ object instead of a QP to the function that reads the WQE
3. Parse the SRQ WQE with respect to its structure

The rest is handled as for regular RQ WQE.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NMajd Dibbiny <majd@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

08100fad

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功