提交 · 1fb7f8973f51ca1a7ffe61a2c779ed15f57f3d82 · openeuler / Kernel

26 3月, 2021 1 次提交

RDMA: Support more than 255 rdma ports · 1fb7f897

由 Mark Bloch 提交于 3月 01, 2021

Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs. HW/Spec defined values must also not be changed.

With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.

When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.

The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that relies in
verbs won't be able to cope with this change. At this stage, we are
extending the interfaces that are using vendor channel solely

Once the limitation is lifted mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device and
it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other
ULPs aren't effected by this change and their sysfs/interfaces that are
exposes to userspace can remain unchanged.

While here cleanup some alignment issues and remove unneeded sanity
checks (mainly in rdmavt),

Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.orgSigned-off-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

1fb7f897

24 3月, 2021 2 次提交

RDMA/hns: Support to query firmware version · 847d19a4

由 Lang Cheng 提交于 3月 16, 2021

Implement the ops named get_dev_fw_str to support ib_get_device_fw_str().

Link: https://lore.kernel.org/r/1615882161-53827-1-git-send-email-liweihang@huawei.comSigned-off-by: NLang Cheng <chenglang@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

847d19a4

RDMA/mlx5: Create ODP EQ only when ODP MR is created · ad50294d

由 Shay Drory 提交于 3月 14, 2021

There is no need to create the ODP EQ if the user doesn't use ODP MRs.
Hence, create it only when the first ODP MR is created. This EQ will be
destroyed only when the device is unloaded.
This will decrease the number of EQs created per device. for example: If
we creates 1K devices (SF/VF/etc'), than we will decrease the num of EQs
by 1K.

Link: https://lore.kernel.org/r/20210314125418.179716-1-leon@kernel.orgSigned-off-by: NShay Drory <shayd@nvidia.com>
Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

ad50294d

23 3月, 2021 1 次提交

RDMA/hns: Fix memory corruption when allocating XRCDN · 783cf673

由 Weihang Li 提交于 3月 22, 2021

It's incorrect to cast the type of pointer to xrcdn from (u32 *) to
(unsigned long *), then pass it into hns_roce_bitmap_alloc(), this will
lead to a memory corruption.

Fixes: 32548870 ("RDMA/hns: Add support for XRC on HIP09")
Link: https://lore.kernel.org/r/1616381069-51759-1-git-send-email-liweihang@huawei.comReported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

783cf673

22 3月, 2021 4 次提交

RDMA/cma: Remove unused leftovers in cma code · 87115951

由 Gal Pressman 提交于 3月 14, 2021

Commit ee1c60b1 ("IB/SA: Modify SA to implicitly cache Class Port
info") removed the class_port_info_context struct usage, remove a couple
of leftovers.

Link: https://lore.kernel.org/r/20210314143427.76101-1-galpress@amazon.comSigned-off-by: NGal Pressman <galpress@amazon.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

87115951

RDMA: Delete not-used static inline functions · fdb68dd3

由 Leon Romanovsky 提交于 3月 14, 2021

Perform mass deletion of static inline functions that are not used.

Link: https://lore.kernel.org/r/20210314133908.291945-3-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

fdb68dd3

RDMA: Fix kernel-doc compilation warnings · ae360f41

由 Leon Romanovsky 提交于 3月 14, 2021

This patch fixes bunch of kernel-doc compilation warnings like below:

drivers/infiniband/hw/i40iw/i40iw_cm.c:4372: warning: expecting prototype for i40iw_ifdown_notify(). Prototype was for i40iw_if_notify() instead

Link: https://lore.kernel.org/r/20210314133908.291945-2-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

ae360f41

RDMA/mlx5: Add missing returned error check of mlx5_ib_dereg_mr · b5486430

由 Leon Romanovsky 提交于 3月 14, 2021

Fix the following smatch error:

drivers/infiniband/hw/mlx5/mr.c:1950 mlx5_ib_dereg_mr() error: uninitialized symbol 'rc'.

Fixes: e6fb246c ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Link: https://lore.kernel.org/r/20210314082250.10143-1-leon@kernel.orgReported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

b5486430

12 3月, 2021 13 次提交

RDMA/mlx5: Allow larger pages in DevX umem · 7610ab57

由 Jason Gunthorpe 提交于 3月 04, 2021

The umem DMA list calculation was locked at 4k pages due to confusion
around how this API works and is used when larger pages are present.

The conclusion is:

 - umem's cannot extend past what is mapped into the process, so creating
   a lage page size and referring to a sub-range is not allowed

 - umem's must always have a page offset of zero, except for sub PAGE_SIZE
   umems

 - The feature of umem_offset to create multiple objects inside a umem
   is buggy and isn't used anyplace. Thus we can assume all users of the
   current API have umem_offset == 0 as well

Provide a new page size calculator that limits the DMA list to the VA
range and enforces umem_offset == 0.

Allow user space to specify the page sizes which it can accept, this
bitmap must be derived from the intended use of the umem, based on
per-usage HW limitations.

Link: https://lore.kernel.org/r/20210304130501.1102577-4-leon@kernel.orgSigned-off-by: NYishai Hadas <yishaih@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

7610ab57

IB/core: Split uverbs_get_const/default to consider target type · 2904bb37

由 Yishai Hadas 提交于 3月 04, 2021

Change uverbs_get_const/uverbs_get_const_default to work properly with
both signed/unsigned parameters.

Current APIs mix s64 and u64 which leads to incorrect check when u64
value was supplied and its upper bit was set. In that case
uverbs_get_const() / uverbs_get_const_default() lower bound check may
fail unexpectedly, target is unsigned (lower bound is 0) but value
became negative as of the s64 usage.

Split to have two different APIs, no change to callers as the required
API will be called internally according to the target type.

Link: https://lore.kernel.org/r/20210304130501.1102577-3-leon@kernel.orgSigned-off-by: NYishai Hadas <yishaih@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

2904bb37

IB/core: Drop WARN_ON() from ib_umem_find_best_pgsz() · 3f32dc0f

由 Yishai Hadas 提交于 3月 04, 2021

The WARN_ON() issued as part of ib_umem_find_best_pgsz() blocked cases
when only page sizes larger than PAGE_SIZE were set, drop it to enable
those cases.

In addition, there is no need to have a specific check for zero
pgsz_bitmap, the function will do its job and return 0 at the end if
nothing match will be found.

Link: https://lore.kernel.org/r/20210304130501.1102577-2-leon@kernel.orgSigned-off-by: NYishai Hadas <yishaih@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

3f32dc0f

RDMA/mlx5: Fix mlx5 rates to IB rates map · 6fe6e568

由 Mark Zhang 提交于 3月 04, 2021

Correct the map between mlx5 rates and corresponding ib rates, as they
don't always have a fixed offset between them.

Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20210304124517.1100608-4-leon@kernel.orgSigned-off-by: NMark Zhang <markzhang@nvidia.com>
Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

6fe6e568

RDMA/mlx5: Fix query RoCE port · 7852546f

由 Maor Gottlieb 提交于 3月 04, 2021

mlx5_is_roce_enabled returns the devlink RoCE init value, therefore it
should be used only when driver is loaded. Instead we just need to read
the roce_en field.

In addition, rename mlx5_is_roce_enabled to mlx5_is_roce_init_enabled.

Fixes: 7a58779e ("IB/mlx5: Improve query port for representor port")
Link: https://lore.kernel.org/r/20210304124517.1100608-2-leon@kernel.orgSigned-off-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

7852546f

RDMA/mlx5: Rename mlx5_mr_cache_invalidate() to revoke_mr() · 14d05b55

由 Jason Gunthorpe 提交于 3月 04, 2021

Now that this is only used in a few places in mr.c give it a sensible
name. It has nothing to do with the cache and can be invoked on any
MR. DMA is stopped and the user cannot touch the MR any further once it
completes.

Link: https://lore.kernel.org/r/20210304120745.1090751-5-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

14d05b55

RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr() · e6fb246c

由 Jason Gunthorpe 提交于 3月 04, 2021

Now that the SRCU stuff has been removed the entire MR destroy logic can
be made a lot simpler. Currently there are many different ways to destroy a
MR and it makes it really hard to do this task correctly. Route all
destruction through mlx5_ib_dereg_mr() and make it work for all
situations.

Since it turns out all the different MR types do basically the same thing
this removes a lot of knowledge of MR internals from ODP and leaves ODP
just exporting an operation to clean up children.

This fixes a few weird corner cases bugs and firmly uses the correct
ordering of the MR destruction:
 - Stop parallel access to the mkey via the ODP xarray
 - Stop DMA
 - Release the umem
 - Clean up ODP children
 - Free/Recycle the MR

Link: https://lore.kernel.org/r/20210304120745.1090751-4-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

e6fb246c

RDMA/mlx5: Use a union inside mlx5_ib_mr · f18ec422

由 Jason Gunthorpe 提交于 3月 04, 2021

The struct mlx5_ib_mr can be used for three different things, but only one
at a time:

 - In the user MR cache
 - As a kernel MR
 - As a user MR

Overlay the three things into a single union with the following rules:

 - If the mr is found on the cache_ent->head list then it is a cache MR
   and umem == NULL. The entire union is zero after the MR is removed from
   the cache.

 - If umem != NULL or type == IB_MR_TYPE_USER then it is a user MR.

 - If umem == NULL then it is a kernel MR

This reduces the size of struct mlx5_ib_mr to 552 bytes from 702.

The only place the three flows overlap in the code is during dereg, so add
a few extra checks along there.

Link: https://lore.kernel.org/r/20210304120745.1090751-3-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

f18ec422

RDMA/mlx5: Zero out ODP related items in the mlx5_ib_mr · a639e667

由 Jason Gunthorpe 提交于 3月 04, 2021

All of the ODP code assumes when it calls mlx5_mr_cache_alloc() the ODP
related fields are zero'd. This is true if the MR was just allocated, but
if the MR is recycled through the cache then the values are never zero'd.

This causes a bug in the odp_stats, they don't reset when the MR is
reallocated, also is_odp_implicit is never 0'd.

So we can use memset on a block of the mlx5_ib_mr reorganize the structure
to put all the data that can be zero'd by the cache at the end.

It is organized as an anonymous struct because the next patch will make
this a union.

Delete the unused smr_info. Don't set the kernel only desc_size on the
user path. No longer any need to zero mr->parent before freeing it, the
memset() will get it now.

Fixes: a3de94e3 ("IB/mlx5: Introduce ODP diagnostic counters")
Link: https://lore.kernel.org/r/20210304120745.1090751-2-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

a639e667

RDMA/hns: Add support for XRC on HIP09 · 32548870

由 Wenpeng Liang 提交于 3月 04, 2021

The HIP09 supports XRC transport service, it greatly saves the number of
QPs required to connect all processes in a large cluster.

Link: https://lore.kernel.org/r/1614826558-35423-1-git-send-email-liweihang@huawei.comSigned-off-by: NWenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

32548870

RDMA/rtrs-clt: Use rdma_event_msg in log · c33d516a

由 Jack Wang 提交于 2月 22, 2021

It's easier to understand a string instead of enum.

Link: https://lore.kernel.org/r/20210222141551.54345-2-jinpu.wang@cloud.ionos.comSigned-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

c33d516a

RDMA/rtrs: Use new shared CQ mechanism · 3b89e92c

由 Jack Wang 提交于 2月 22, 2021

Have the driver use shared CQs which provids a ~10%-20% improvement during
test.

Instead of opening a CQ for each QP per connection, a CQ for each QP will
be provided by the RDMA core driver that will be shared between the QPs on
that core reducing interrupt overhead.

Link: https://lore.kernel.org/r/20210222141551.54345-1-jinpu.wang@cloud.ionos.comSigned-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

3b89e92c

RDMA/core: Remove unused req_ncomp_notif device operation · f675ba12

由 Gal Pressman 提交于 3月 11, 2021

The request_ncomp_notif device operation and function are unused, remove
them.

Link: https://lore.kernel.org/r/20210311150921.23726-1-galpress@amazon.comSigned-off-by: NGal Pressman <galpress@amazon.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

f675ba12

11 3月, 2021 2 次提交

RDMA/iwcm: Allow AFONLY binding for IPv6 addresses · e35ecb46

由 Bernard Metzler 提交于 2月 19, 2021

Binding IPv6 address/port to AF_INET6 domain only is provided via
rdma_set_afonly(), but was not signalled to the provider. Applications
like NFS/RDMA bind the same port to both IPv4 and IPv6 addresses
simultaneously and thus rely on it working correctly.

Link: https://lore.kernel.org/r/20210219143441.1068-1-bmt@zurich.ibm.comTested-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NBernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

e35ecb46

RDMA/hns: Use new SQ doorbell register for HIP09 · 0f00571f

由 Lang Cheng 提交于 2月 23, 2021

HIP09 uses new address space to map SQ doorbell registers, the doorbell of
each QP is isolated based on the size of 64KB, which can improve the
performance in concurrency scenarios.

Link: https://lore.kernel.org/r/1614082833-23130-1-git-send-email-liweihang@huawei.comSigned-off-by: NLang Cheng <chenglang@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

0f00571f

06 3月, 2021 3 次提交

RDMA/rxe: Fix errant WARN_ONCE in rxe_completer() · 545c4ab4

由 Bob Pearson 提交于 3月 04, 2021

In rxe_comp.c in rxe_completer() the function free_pkt() did not clear skb
which triggered a warning at 'done:' and could possibly at 'exit:'. The
WARN_ONCE() calls are not actually needed. The call to free_pkt() is
moved to the end to clearly show that all skbs are freed.

Fixes: 899aba89 ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.comSigned-off-by: NBob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

545c4ab4

RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt() · 5e4a7ccc

由 Bob Pearson 提交于 3月 04, 2021

rxe_rcv_mcast_pkt() dropped a reference to ib_device when no error
occurred causing an underflow on the reference counter. This code is
cleaned up to be clearer and easier to read.

5e4a7ccc

RDMA/rxe: Fix missed IB reference counting in loopback · 21e27ac8

由 Bob Pearson 提交于 3月 04, 2021

When the noted patch below extending the reference taken by
rxe_get_dev_from_net() in rxe_udp_encap_recv() until each skb is freed it
was not matched by a reference in the loopback path resulting in
underflows.

21e27ac8

04 3月, 2021 2 次提交

RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc · cca7f12b

由 Leon Romanovsky 提交于 3月 02, 2021

Fix the following W=1 compilation warning:

drivers/infiniband/core/uverbs_ioctl.c:108: warning: expecting prototype for uverbs_alloc(). Prototype was for _uverbs_alloc() instead

Fixes: 461bb2ee ("IB/uverbs: Add a simple allocator to uverbs_attr_bundle")
Link: https://lore.kernel.org/r/20210302074214.1054299-3-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

cca7f12b

RDMA/mlx5: Set correct kernel-doc identifier · f9180399

由 Leon Romanovsky 提交于 3月 02, 2021

The W=1 allmodconfig build produces the following warning:

drivers/infiniband/hw/mlx5/odp.c:1086: warning: wrong kernel-doc identifier on line:
  * Parse a series of data segments for page fault handling.

Fix it by changing /** to be /* as it is written in kernel-doc
documentation.

Fixes: 5e769e44 ("RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'")
Link: https://lore.kernel.org/r/20210302074214.1054299-2-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

f9180399

02 3月, 2021 3 次提交

IB/mlx5: Add missing error code · 3a9b3d45

由 YueHaibing 提交于 2月 22, 2021

Set err to -ENOMEM if kzalloc fails instead of 0.

Fixes: 75973853 ("IB/mlx5: Enable subscription for device events over DEVX")
Link: https://lore.kernel.org/r/20210222122343.19720-1-yuehaibing@huawei.comSigned-off-by: NYueHaibing <yuehaibing@huawei.com>
Acked-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

3a9b3d45

RDMA/rxe: Fix missing kconfig dependency on CRYPTO · 475f23b8

由 Julian Braha 提交于 2月 19, 2021

When RDMA_RXE is enabled and CRYPTO is disabled, Kbuild gives the
following warning:

 WARNING: unmet direct dependencies detected for CRYPTO_CRC32
   Depends on [n]: CRYPTO [=n]
   Selected by [y]:
   - RDMA_RXE [=y] && (INFINIBAND_USER_ACCESS [=y] || !INFINIBAND_USER_ACCESS [=y]) && INET [=y] && PCI [=y] && INFINIBAND [=y] && INFINIBAND_VIRT_DMA [=y]

This is because RDMA_RXE selects CRYPTO_CRC32, without depending on or
selecting CRYPTO, despite that config option being subordinate to CRYPTO.

Fixes: cee2688e ("IB/rxe: Offload CRC calculation when possible")
Signed-off-by: NJulian Braha <julianbraha@gmail.com>
Link: https://lore.kernel.org/r/21525878.NYvzQUHefP@ubuntu-mate-laptopSigned-off-by: NJason Gunthorpe <jgg@nvidia.com>

475f23b8

RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep · 221384df

由 Saeed Mahameed 提交于 3月 01, 2021

ib_send_cm_sidr_rep() {
	spin_lock_irqsave()
        cm_send_sidr_rep_locked() {
                ...
        	spin_lock_irq()
                ....
                spin_unlock_irq() <--- this will enable interrupts
        }
        spin_unlock_irqrestore()
}

spin_unlock_irqrestore() expects interrupts to be disabled but the
internal spin_unlock_irq() will always enable hard interrupts.

Fix this by replacing the internal spin_{lock,unlock}_irq() with
irqsave/restore variants.

It fixes the following kernel trace:

 raw_local_irq_restore() called with IRQs enabled
 WARNING: CPU: 2 PID: 20001 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20

 Call Trace:
  _raw_spin_unlock_irqrestore+0x4e/0x50
  ib_send_cm_sidr_rep+0x3a/0x50 [ib_cm]
  cma_send_sidr_rep+0xa1/0x160 [rdma_cm]
  rdma_accept+0x25e/0x350 [rdma_cm]
  ucma_accept+0x132/0x1cc [rdma_ucm]
  ucma_write+0xbf/0x140 [rdma_ucm]
  vfs_write+0xc1/0x340
  ksys_write+0xb3/0xe0
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 87c4c774 ("RDMA/cm: Protect access to remote_sidr_table")
Link: https://lore.kernel.org/r/20210301081844.445823-1-leon@kernel.orgSigned-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Reviewed-by: NJason Gunthorpe <jgg@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

221384df

17 2月, 2021 9 次提交

RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR() · ed408529

由 Jack Wang 提交于 2月 16, 2021

smatch gives the warning:

drivers/infiniband/ulp/rtrs/rtrs-srv.c:1805 rtrs_rdma_connect() warn: passing zero to 'PTR_ERR'

Which is trying to say smatch has shown that srv is not an error pointer
and thus cannot be passed to PTR_ERR.

The solution is to move the list_add() down after full initilization of
rtrs_srv. To avoid holding the srv_mutex too long, only hold it during the
list operation as suggested by Leon.

Fixes: 03e9b33a ("RDMA/rtrs: Only allow addition of path to an already established session")
Link: https://lore.kernel.org/r/20210216143807.65923-1-jinpu.wang@cloud.ionos.comReported-by: Nkernel test robot <lkp@intel.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

ed408529

RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodes · 2b5715fc

由 Nicolas Morey-Chaisemartin 提交于 2月 05, 2021

The current code computes a number of channels per SRP target and spreads
them equally across all online NUMA nodes.  Each channel is then assigned
a CPU within this node.

In the case of unbalanced, or even unpopulated nodes, some channels do not
get a CPU associated and thus do not get connected.  This causes the SRP
connection to fail.

This patch solves the issue by rewriting channel computation and
allocation:

- Drop channel to node/CPU association as it had no real effect on
  locality but added unnecessary complexity.

- Tweak the number of channels allocated to reduce CPU contention when
  possible:
  - Up to one channel per CPU (instead of up to 4 by node)
  - At least 4 channels per node, unless ch_count module parameter is
    used.

Link: https://lore.kernel.org/r/9cb4d9d3-30ad-2276-7eff-e85f7ddfb411@suse.comSigned-off-by: NNicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

2b5715fc

RDMA/mlx5: Fail QP creation if the device can not support the CQE TS · 2fe8d4b8

由 Aharon Landau 提交于 2月 09, 2021

In ConnectX6Dx device, HW can work in real time timestamp mode according
to the device capabilities per RQ/SQ/QP.

When the flag IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION is set, the user
expect to get TS on the CQEs in free running format, so we need to fail
the QP creation if the current mode of the device doesn't support it.

Link: https://lore.kernel.org/r/20210209131107.698833-3-leon@kernel.orgSigned-off-by: NAharon Landau <aharonl@nvidia.com>
Signed-off-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

2fe8d4b8

RDMA/mlx5: Allow CQ creation without attached EQs · 7232c132

由 Tal Gilboa 提交于 2月 11, 2021

The traditional DevX CQ creation flow goes through mlx5_core_create_cq()
which checks that the given EQN corresponds to an existing EQ and attaches
a devx handler to the EQN for the CQ.

In some cases the EQ will not be a kernel EQ, but will be controlled by
modify CQ, don't block creating these just because the EQN can't be found
in the kernel.

Link: https://lore.kernel.org/r/20210211085549.1277674-1-leon@kernel.orgSigned-off-by: NTal Gilboa <talgi@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

7232c132

RDMA/rtrs-srv-sysfs: fix missing put_device · e2853c49

由 Gioh Kim 提交于 2月 12, 2021

put_device() decreases the ref-count and then the device will be
cleaned-up, while at is also add missing put_device in
rtrs_srv_create_once_sysfs_root_folders

This patch solves a kmemleak error as below:

  unreferenced object 0xffff88809a7a0710 (size 8):
    comm "kworker/4:1H", pid 113, jiffies 4295833049 (age 6212.380s)
    hex dump (first 8 bytes):
      62 6c 61 00 6b 6b 6b a5                          bla.kkk.
    backtrace:
      [<0000000054413611>] kstrdup+0x2e/0x60
      [<0000000078e3120a>] kobject_set_name_vargs+0x2f/0xb0
      [<00000000f1a17a6b>] dev_set_name+0xab/0xe0
      [<00000000d5502e32>] rtrs_srv_create_sess_files+0x2fb/0x314 [rtrs_server]
      [<00000000ed11a1ef>] rtrs_srv_info_req_done+0x631/0x800 [rtrs_server]
      [<000000008fc5aa8f>] __ib_process_cq+0x94/0x100 [ib_core]
      [<00000000a9599cb4>] ib_cq_poll_work+0x32/0xc0 [ib_core]
      [<00000000cfc376be>] process_one_work+0x4bc/0x980
      [<0000000016e5c96a>] worker_thread+0x78/0x5c0
      [<00000000c20b8be0>] kthread+0x191/0x1e0
      [<000000006c9c0003>] ret_from_fork+0x3a/0x50

Fixes: baa5b28b ("RDMA/rtrs-srv: Replace device_register with device_initialize and device_add")
Link: https://lore.kernel.org/r/20210212134525.103456-5-jinpu.wang@cloud.ionos.comSigned-off-by: NGioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

e2853c49

RDMA/rtrs-srv: fix memory leak by missing kobject free · f7452a7e

由 Gioh Kim 提交于 2月 12, 2021

kmemleak reported an error as below:

  unreferenced object 0xffff8880674b7640 (size 64):
    comm "kworker/4:1H", pid 113, jiffies 4296403507 (age 507.840s)
    hex dump (first 32 bytes):
      69 70 3a 31 39 32 2e 31 36 38 2e 31 32 32 2e 31  ip:192.168.122.1
      31 30 40 69 70 3a 31 39 32 2e 31 36 38 2e 31 32  10@ip:192.168.12
    backtrace:
      [<0000000054413611>] kstrdup+0x2e/0x60
      [<0000000078e3120a>] kobject_set_name_vargs+0x2f/0xb0
      [<00000000ca2be3ee>] kobject_init_and_add+0xb0/0x120
      [<0000000062ba5e78>] rtrs_srv_create_sess_files+0x14c/0x314 [rtrs_server]
      [<00000000b45b7217>] rtrs_srv_info_req_done+0x5b1/0x800 [rtrs_server]
      [<000000008fc5aa8f>] __ib_process_cq+0x94/0x100 [ib_core]
      [<00000000a9599cb4>] ib_cq_poll_work+0x32/0xc0 [ib_core]
      [<00000000cfc376be>] process_one_work+0x4bc/0x980
      [<0000000016e5c96a>] worker_thread+0x78/0x5c0
      [<00000000c20b8be0>] kthread+0x191/0x1e0
      [<000000006c9c0003>] ret_from_fork+0x3a/0x50

It is caused by the not-freed kobject of rtrs_srv_sess.  The kobject
embedded in rtrs_srv_sess has ref-counter 2 after calling
process_info_req(). Therefore it must call kobject_put twice.  Currently
it calls kobject_put only once at rtrs_srv_destroy_sess_files because
kobject_del removes the state_in_sysfs flag and then kobject_put in
free_sess() is not called.

This patch moves kobject_del() into free_sess() so that the kobject of
rtrs_srv_sess can be freed. And also this patch adds the missing call of
sysfs_remove_group() to clean-up the sysfs directory.

Fixes: 9cb83748 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210212134525.103456-4-jinpu.wang@cloud.ionos.comSigned-off-by: NGioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

f7452a7e

RDMA/rtrs: Only allow addition of path to an already established session · 03e9b33a

由 Md Haris Iqbal 提交于 2月 12, 2021

While adding a path from the client side to an already established
session, it was possible to provide the destination IP to a different
server. This is dangerous.

This commit adds an extra member to the rtrs_msg_conn_req structure, named
first_conn; which is supposed to notify if the connection request is the
first for that session or not.

On the server side, if a session does not exist but the first_conn
received inside the rtrs_msg_conn_req structure is 1, the connection
request is failed. This signifies that the connection request is for an
already existing session, and since the server did not find one, it is an
wrong connection request.

Fixes: 6a98d71d ("RDMA/rtrs: client: main functionality")
Fixes: 9cb83748 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210212134525.103456-3-jinpu.wang@cloud.ionos.comSigned-off-by: NMd Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: NLutz Pogrell <lutz.pogrell@cloud.ionos.com>
Signed-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

03e9b33a

RDMA/rtrs-srv: Fix stack-out-of-bounds · e6daa8f6

由 Jack Wang 提交于 2月 12, 2021

  BUG: KASAN: stack-out-of-bounds in _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib]
  Read of size 4 at addr ffff8880d5a7f980 by task kworker/0:1H/565

  CPU: 0 PID: 565 Comm: kworker/0:1H Tainted: G           O      5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10
  Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00       09/04/2012
  Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
  Call Trace:
   dump_stack+0x96/0xe0
   print_address_description.constprop.4+0x1f/0x300
   ? irq_work_claim+0x2e/0x50
   __kasan_report.cold.8+0x78/0x92
   ? _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib]
   kasan_report+0x10/0x20
   _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib]
   ? check_chain_key+0x1d7/0x2e0
   ? _mlx4_ib_post_recv+0x630/0x630 [mlx4_ib]
   ? lockdep_hardirqs_on+0x1a8/0x290
   ? stack_depot_save+0x218/0x56e
   ? do_profile_hits.isra.6.cold.13+0x1d/0x1d
   ? check_chain_key+0x1d7/0x2e0
   ? save_stack+0x4d/0x80
   ? save_stack+0x19/0x80
   ? __kasan_slab_free+0x125/0x170
   ? kfree+0xe7/0x3b0
   rdma_write_sg+0x5b0/0x950 [rtrs_server]

The problem is when we send imm_wr, the type should be ib_rdma_wr, so hw
driver like mlx4 can do rdma_wr(wr), so fix it by use the ib_rdma_wr as
type for imm_wr.

Fixes: 9cb83748 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210212134525.103456-2-jinpu.wang@cloud.ionos.comSigned-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: NGioh Kim <gi-oh.kim@cloud.ionos.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

e6daa8f6

RDMA/rxe: Remove unused pkt->offset · bf139b58

由 Bob Pearson 提交于 2月 11, 2021

The pkt->offset field is never used except to assign it to 0. But it adds
lots of unneeded code. This patch removes the field and related code. This
causes a measurable improvement in performance.

Link: https://lore.kernel.org/r/20210211210455.3274-1-rpearson@hpe.comSigned-off-by: NBob Pearson <rpearson@hpe.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

bf139b58

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功