提交 · 391c6bd5ac80094a5a8984d7ca20df7e3ec5b837 · openeuler / Kernel

22 4月, 2021 5 次提交

RDMA/nldev: Return SRQ information · 391c6bd5

由 Neta Ostrovsky 提交于 4月 18, 2021

Extend the RDMA nldev return a SRQ information, like SRQ number, SRQ type,
PD number, CQ number and process ID that created that SRQ.

Sample output:

$ rdma res show srq
dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f0 srqn 4 type BASIC pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC pdn 10 pid 3584 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 6 type BASIC pdn 11 pid 3590 comm ibv_srq_pingpon
dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f1 srqn 1 type BASIC pdn 4 pid 3586 comm ibv_srq_pingpon

Link: https://lore.kernel.org/r/322f9210b95812799190dd4a0fb92f3a3bba0333.1618753110.git.leonro@nvidia.comSigned-off-by: NNeta Ostrovsky <netao@nvidia.com>
Reviewed-by: NMark Zhang <markzhang@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

391c6bd5

RDMA/restrack: Add support to get resource tracking for SRQ · 48f8a70e

由 Neta Ostrovsky 提交于 4月 18, 2021

In order to track SRQ resources, a new restrack object is initialized and
added to the resource tracking database.

Link: https://lore.kernel.org/r/0db71c409f24f2f6b019bf8797a8fed96fe7079c.1618753110.git.leonro@nvidia.comSigned-off-by: NNeta Ostrovsky <netao@nvidia.com>
Reviewed-by: NMark Zhang <markzhang@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

48f8a70e

RDMA/nldev: Return context information · 12ce208f

由 Neta Ostrovsky 提交于 4月 18, 2021

Extend the RDMA nldev return a context information, like ctx number and
process ID that created that context. This functionality is helpful to
find orphan contexts that are not closed for some reason.

Sample output:

$ rdma res show ctx
dev ibp8s0f0 ctxn 0 pid 980 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 1 pid 981 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 2 pid 992 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

$ rdma res show ctx dev ibp8s0f1
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

Link: https://lore.kernel.org/r/5c956acfeac4e9d532988575f3da7d64cb449374.1618753110.git.leonro@nvidia.comSigned-off-by: NNeta Ostrovsky <netao@nvidia.com>
Reviewed-by: NMark Zhang <markzhang@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

12ce208f

RDMA/core: Add CM to restrack after successful attachment to a device · cb5cd0ea

由 Shay Drory 提交于 4月 18, 2021

The device attach triggers addition of CM_ID to the restrack DB.
However, when error occurs, we releasing this device, but defer CM_ID
release. This causes to the situation where restrack sees CM_ID that
is not valid anymore.

As a solution, add the CM_ID to the resource tracking DB only after the
attachment is finished.

Found by syzcaller:
infiniband syz0: added syz_tun
rdma_rxe: ignoring netdev event = 10 for syz_tun
infiniband syz0: set down
infiniband syz0: ib_query_port failed (-19)
restrack: ------------[ cut here    ]------------
infiniband syz0: BUG: RESTRACK detected leak of resources
restrack: User CM_ID object allocated by syz-executor716 is not freed
restrack: ------------[ cut here    ]------------

Fixes: b09c4d70 ("RDMA/restrack: Improve readability in task name management")
Link: https://lore.kernel.org/r/ab93e56ba831eac65c322b3256796fa1589ec0bb.1618753862.git.leonro@nvidia.comSigned-off-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

cb5cd0ea

RDMA/cma: Skip device which doesn't support CM · 4d51c3d9

由 Parav Pandit 提交于 4月 18, 2021

A switchdev RDMA device do not support IB CM. When such device is added to
the RDMA CM's device list, when application invokes rdma_listen(), cma
attempts to listen to such device, however it has IB CM attribute
disabled.

Due to this, rdma_listen() call fails to listen for other non switchdev
devices as well.

A below error message can be seen.

infiniband mlx5_0: RDMA CMA: cma_listen_on_dev, error -38

A failing call flow is below.

  cma_listen_on_all()
    cma_listen_on_dev()
      _cma_attach_to_dev()
        rdma_listen() <- fails on a specific switchdev device

This is because rdma_listen() is hardwired to only work with iwarp or IB
CM compatible devices.

Hence, when a IB device doesn't support IB CM or IW CM, avoid adding such
device to the cma list so rdma_listen() can't even be called.

Link: https://lore.kernel.org/r/f9cac00d52864ea7c61295e43fb64cf4db4fdae6.1618753862.git.leonro@nvidia.comSigned-off-by: NParav Pandit <parav@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

4d51c3d9

19 4月, 2021 1 次提交

RDMA/core: Unify RoCE check and re-factor code · 65d4801a

由 Håkon Bugge 提交于 4月 06, 2021

In cm_req_handler(), unify the check for RoCE and re-factor to avoid
one test.

Link: https://lore.kernel.org/r/1617705423-15570-1-git-send-email-haakon.bugge@oracle.comSuggested-by: NJason Gunthorpe <jgg@nvidia.com>
Fixes: 8f974860 ("IB/cm: Reduce dependency on gid attribute ndev check")
Fixes: 194f64a3 ("RDMA/core: Fix corrupted SL on passive side")
Signed-off-by: NHåkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

65d4801a

13 4月, 2021 7 次提交

IB/cma: Introduce rdma_set_min_rnr_timer() · 3aeffc46

由 Håkon Bugge 提交于 3月 31, 2021

Introduce the ability for kernel ULPs to adjust the minimum RNR Retry
timer. The INIT -> RTR transition executed by RDMA CM will be used for
this adjustment. This avoids an additional ib_modify_qp() call.

rdma_set_min_rnr_timer() must be called before the call to rdma_connect()
on the active side and before the call to rdma_accept() on the passive
side.

The default value of RNR Retry timer is zero, which translates to 655
ms. When the receiver is not ready to accept a send messages, it encodes
the RNR Retry timer value in the NAK. The requestor will then wait at
least the specified time value before retrying the send.

The 5-bit value to be supplied to the rdma_set_min_rnr_timer() is
documented in IBTA Table 45: "Encoding for RNR NAK Timer Field".

Link: https://lore.kernel.org/r/1617216194-12890-2-git-send-email-haakon.bugge@oracle.comSigned-off-by: NHåkon Bugge <haakon.bugge@oracle.com>
Acked-by: NJason Gunthorpe <jgg@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

3aeffc46

RDMA/core: Correct format of block comments · 26caea5f

由 Wenpeng Liang 提交于 4月 07, 2021

Block comments should not use a trailing */ on a separate line and every
line of a block comment should start with an '*'.

Link: https://lore.kernel.org/r/1617783353-48249-7-git-send-email-liweihang@huawei.comSigned-off-by: NWenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

26caea5f

RDMA/core: Correct format of braces · b6eb7011

由 Wenpeng Liang 提交于 4月 07, 2021

Do following cleanups about braces:

- Add the necessary braces to maintain context alignment.
- Fix the open '{' that is not on the same line as "switch".
- Remove braces that are not necessary for single statement blocks.
- Fix "else" that doesn't follow close brace '}'.

Link: https://lore.kernel.org/r/1617783353-48249-6-git-send-email-liweihang@huawei.comSigned-off-by: NWenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

b6eb7011

RDMA/core: Remove redundant spaces · f681967a

由 Wenpeng Liang 提交于 4月 07, 2021

Space is not required after '(', before ')', before ',' and between '*'
and symbol name of a definition.

Link: https://lore.kernel.org/r/1617783353-48249-5-git-send-email-liweihang@huawei.comSigned-off-by: NWenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

f681967a

RDMA/core: Add necessary spaces · 9516b8f9

由 Wenpeng Liang 提交于 4月 07, 2021

Space is required before '(' of switch statements and around '='.

Link: https://lore.kernel.org/r/1617783353-48249-4-git-send-email-liweihang@huawei.comSigned-off-by: NWenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

9516b8f9

RDMA/core: Remove the redundant return statements · 9279c35b

由 Wenpeng Liang 提交于 4月 07, 2021

The return statements at the end of a void function is meaningless.

Link: https://lore.kernel.org/r/1617783353-48249-3-git-send-email-liweihang@huawei.comSigned-off-by: NWenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

9279c35b

RDMA/core: Print the function name by __func__ instead of an fixed string · ab27f45f

由 Wenpeng Liang 提交于 4月 07, 2021

It's better to use __func__ than a fixed string to print a function's
name.

Link: https://lore.kernel.org/r/1617783353-48249-2-git-send-email-liweihang@huawei.comSigned-off-by: NWenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

ab27f45f

08 4月, 2021 1 次提交

RDMA/core: Make the wc status prompt message clearer · 7d8f3465

由 Yixian Liu 提交于 4月 06, 2021

Local invalidate is also a kind of memory management operation, not only
memory bind operation. Furthermore, as invalidate operations include local
and remote, add prefix to the prompt message to make it clearer.

Link: https://lore.kernel.org/r/1617698772-13871-1-git-send-email-liweihang@huawei.comSigned-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

7d8f3465

02 4月, 2021 1 次提交

RDMA/core: Fix corrupted SL on passive side · 194f64a3

由 Håkon Bugge 提交于 3月 22, 2021

On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
Subnet Local is zero.

In cm_req_handler(), the cm_process_routed_req() function is called. Since
the Primary Subnet Local value is zero in the request, and since this is
RoCE (Primary Local LID is permissive), the following statement will be
executed:

IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);

This corrupts SL in req_msg if it was different from zero. In other words,
a request to setup a connection using an SL != zero, will not be honored,
and a connection using SL zero will be created instead.

Fixed by not calling cm_process_routed_req() on RoCE systems, the
cm_process_route_req() is only for IB anyhow.

Fixes: 3971c9f6 ("IB/cm: Add interim support for routed paths")
Link: https://lore.kernel.org/r/1616420132-31005-1-git-send-email-haakon.bugge@oracle.comSigned-off-by: NHåkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

194f64a3

26 3月, 2021 3 次提交

RDMA/core: Correct misspellings of two words in comments · 016b26af

由 Yangyang Li 提交于 3月 19, 2021

Correct the following spelling errors:
1. shold -> should
2. uncontext -> ucontext

Link: https://lore.kernel.org/r/1616147749-49106-1-git-send-email-liweihang@huawei.comSigned-off-by: NYangyang Li <liyangyang20@huawei.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

016b26af

RDMA/uverbs: Refactor rdma_counter_set_auto_mode and __counter_set_mode · 49695e95

由 Patrisious Haddad 提交于 3月 18, 2021

Success is returned in the following flows:
 * New mode is the same as the current one.
 * Switched to new mode and there are no bound counters yet.

Link: https://lore.kernel.org/r/20210318110502.673676-1-leon@kernel.orgSigned-off-by: NPatrisious Haddad <phaddad@nvidia.com>
Reviewed-by: NMark Zhang <markzhang@nvidia.com>
Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

49695e95

RDMA: Support more than 255 rdma ports · 1fb7f897

由 Mark Bloch 提交于 3月 01, 2021

Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs. HW/Spec defined values must also not be changed.

With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.

When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.

The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that relies in
verbs won't be able to cope with this change. At this stage, we are
extending the interfaces that are using vendor channel solely

Once the limitation is lifted mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device and
it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other
ULPs aren't effected by this change and their sysfs/interfaces that are
exposes to userspace can remain unchanged.

While here cleanup some alignment issues and remove unneeded sanity
checks (mainly in rdmavt),

Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.orgSigned-off-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

1fb7f897

22 3月, 2021 1 次提交

RDMA/cma: Remove unused leftovers in cma code · 87115951

由 Gal Pressman 提交于 3月 14, 2021

Commit ee1c60b1 ("IB/SA: Modify SA to implicitly cache Class Port
info") removed the class_port_info_context struct usage, remove a couple
of leftovers.

Link: https://lore.kernel.org/r/20210314143427.76101-1-galpress@amazon.comSigned-off-by: NGal Pressman <galpress@amazon.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

87115951

12 3月, 2021 4 次提交

IB/core: Split uverbs_get_const/default to consider target type · 2904bb37

由 Yishai Hadas 提交于 3月 04, 2021

Change uverbs_get_const/uverbs_get_const_default to work properly with
both signed/unsigned parameters.

Current APIs mix s64 and u64 which leads to incorrect check when u64
value was supplied and its upper bit was set. In that case
uverbs_get_const() / uverbs_get_const_default() lower bound check may
fail unexpectedly, target is unsigned (lower bound is 0) but value
became negative as of the s64 usage.

Split to have two different APIs, no change to callers as the required
API will be called internally according to the target type.

Link: https://lore.kernel.org/r/20210304130501.1102577-3-leon@kernel.orgSigned-off-by: NYishai Hadas <yishaih@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

2904bb37

IB/core: Drop WARN_ON() from ib_umem_find_best_pgsz() · 3f32dc0f

由 Yishai Hadas 提交于 3月 04, 2021

The WARN_ON() issued as part of ib_umem_find_best_pgsz() blocked cases
when only page sizes larger than PAGE_SIZE were set, drop it to enable
those cases.

In addition, there is no need to have a specific check for zero
pgsz_bitmap, the function will do its job and return 0 at the end if
nothing match will be found.

Link: https://lore.kernel.org/r/20210304130501.1102577-2-leon@kernel.orgSigned-off-by: NYishai Hadas <yishaih@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

3f32dc0f

RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr() · e6fb246c

由 Jason Gunthorpe 提交于 3月 04, 2021

Now that the SRCU stuff has been removed the entire MR destroy logic can
be made a lot simpler. Currently there are many different ways to destroy a
MR and it makes it really hard to do this task correctly. Route all
destruction through mlx5_ib_dereg_mr() and make it work for all
situations.

Since it turns out all the different MR types do basically the same thing
this removes a lot of knowledge of MR internals from ODP and leaves ODP
just exporting an operation to clean up children.

This fixes a few weird corner cases bugs and firmly uses the correct
ordering of the MR destruction:
 - Stop parallel access to the mkey via the ODP xarray
 - Stop DMA
 - Release the umem
 - Clean up ODP children
 - Free/Recycle the MR

Link: https://lore.kernel.org/r/20210304120745.1090751-4-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

e6fb246c

RDMA/core: Remove unused req_ncomp_notif device operation · f675ba12

由 Gal Pressman 提交于 3月 11, 2021

The request_ncomp_notif device operation and function are unused, remove
them.

Link: https://lore.kernel.org/r/20210311150921.23726-1-galpress@amazon.comSigned-off-by: NGal Pressman <galpress@amazon.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

f675ba12

11 3月, 2021 1 次提交

RDMA/iwcm: Allow AFONLY binding for IPv6 addresses · e35ecb46

由 Bernard Metzler 提交于 2月 19, 2021

Binding IPv6 address/port to AF_INET6 domain only is provided via
rdma_set_afonly(), but was not signalled to the provider. Applications
like NFS/RDMA bind the same port to both IPv4 and IPv6 addresses
simultaneously and thus rely on it working correctly.

Link: https://lore.kernel.org/r/20210219143441.1068-1-bmt@zurich.ibm.comTested-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NBernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

e35ecb46

04 3月, 2021 1 次提交

RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc · cca7f12b

由 Leon Romanovsky 提交于 3月 02, 2021

Fix the following W=1 compilation warning:

drivers/infiniband/core/uverbs_ioctl.c:108: warning: expecting prototype for uverbs_alloc(). Prototype was for _uverbs_alloc() instead

Fixes: 461bb2ee ("IB/uverbs: Add a simple allocator to uverbs_attr_bundle")
Link: https://lore.kernel.org/r/20210302074214.1054299-3-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

cca7f12b

02 3月, 2021 1 次提交

RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep · 221384df

由 Saeed Mahameed 提交于 3月 01, 2021

ib_send_cm_sidr_rep() {
	spin_lock_irqsave()
        cm_send_sidr_rep_locked() {
                ...
        	spin_lock_irq()
                ....
                spin_unlock_irq() <--- this will enable interrupts
        }
        spin_unlock_irqrestore()
}

spin_unlock_irqrestore() expects interrupts to be disabled but the
internal spin_unlock_irq() will always enable hard interrupts.

Fix this by replacing the internal spin_{lock,unlock}_irq() with
irqsave/restore variants.

It fixes the following kernel trace:

 raw_local_irq_restore() called with IRQs enabled
 WARNING: CPU: 2 PID: 20001 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20

 Call Trace:
  _raw_spin_unlock_irqrestore+0x4e/0x50
  ib_send_cm_sidr_rep+0x3a/0x50 [ib_cm]
  cma_send_sidr_rep+0xa1/0x160 [rdma_cm]
  rdma_accept+0x25e/0x350 [rdma_cm]
  ucma_accept+0x132/0x1cc [rdma_ucm]
  ucma_write+0xbf/0x140 [rdma_ucm]
  vfs_write+0xc1/0x340
  ksys_write+0xb3/0xe0
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 87c4c774 ("RDMA/cm: Protect access to remote_sidr_table")
Link: https://lore.kernel.org/r/20210301081844.445823-1-leon@kernel.orgSigned-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Reviewed-by: NJason Gunthorpe <jgg@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

221384df

17 2月, 2021 3 次提交

RDMA/ucma: Fix use-after-free bug in ucma_create_uevent · fe454dc3

由 Avihai Horon 提交于 2月 11, 2021

ucma_process_join() allocates struct ucma_multicast mc and frees it if an
error occurs during its run.  Specifically, if an error occurs in
copy_to_user(), a use-after-free might happen in the following scenario:

1. mc struct is allocated.
2. rdma_join_multicast() is called and succeeds. During its run,
   cma_iboe_join_multicast() enqueues a work that will later use the
   aforementioned mc struct.
3. copy_to_user() is called and fails.
4. mc struct is deallocated.
5. The work that was enqueued by cma_iboe_join_multicast() is run and
   calls ucma_create_uevent() which tries to access mc struct (which is
   freed by now).

Fix this bug by cancelling the work enqueued by cma_iboe_join_multicast().
Since cma_work_handler() frees struct cma_work, we don't use it in
cma_iboe_join_multicast() so we can safely cancel the work later.

The following syzkaller report revealed it:

   BUG: KASAN: use-after-free in ucma_create_uevent+0x2dd/0x;3f0 drivers/infiniband/core/ucma.c:272
   Read of size 8 at addr ffff88810b3ad110 by task kworker/u8:1/108

   CPU: 1 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS   rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
   Workqueue: rdma_cm cma_work_handler
   Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0xbe/0xf9 lib/dump_stack.c:118
    print_address_description.constprop.0+0x3e/0×60 mm/kasan/report.c:385
    __kasan_report mm/kasan/report.c:545 [inline]
    kasan_report.cold+0x1f/0×37 mm/kasan/report.c:562
    ucma_create_uevent+0x2dd/0×3f0 drivers/infiniband/core/ucma.c:272
    ucma_event_handler+0xb7/0×3c0 drivers/infiniband/core/ucma.c:349
    cma_cm_event_handler+0x5d/0×1c0 drivers/infiniband/core/cma.c:1977
    cma_work_handler+0xfa/0×190 drivers/infiniband/core/cma.c:2718
    process_one_work+0x54c/0×930 kernel/workqueue.c:2272
    worker_thread+0x82/0×830 kernel/workqueue.c:2418
    kthread+0x1ca/0×220 kernel/kthread.c:292
    ret_from_fork+0x1f/0×30 arch/x86/entry/entry_64.S:296

   Allocated by task 359:
     kasan_save_stack+0x1b/0×40 mm/kasan/common.c:48
     kasan_set_track mm/kasan/common.c:56 [inline]
     __kasan_kmalloc mm/kasan/common.c:461 [inline]
     __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:434
     kmalloc include/linux/slab.h:552 [inline]
     kzalloc include/linux/slab.h:664 [inline]
     ucma_process_join+0x16e/0×3f0 drivers/infiniband/core/ucma.c:1453
     ucma_join_multicast+0xda/0×140 drivers/infiniband/core/ucma.c:1538
     ucma_write+0x1f7/0×280 drivers/infiniband/core/ucma.c:1724
     vfs_write fs/read_write.c:603 [inline]
     vfs_write+0x191/0×4c0 fs/read_write.c:585
     ksys_write+0x1a1/0×1e0 fs/read_write.c:658
     do_syscall_64+0x2d/0×40 arch/x86/entry/common.c:46
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

   Freed by task 359:
     kasan_save_stack+0x1b/0×40 mm/kasan/common.c:48
     kasan_set_track+0x1c/0×30 mm/kasan/common.c:56
     kasan_set_free_info+0x1b/0×30 mm/kasan/generic.c:355
     __kasan_slab_free+0x112/0×160 mm/kasan/common.c:422
     slab_free_hook mm/slub.c:1544 [inline]
     slab_free_freelist_hook mm/slub.c:1577 [inline]
     slab_free mm/slub.c:3142 [inline]
     kfree+0xb3/0×3e0 mm/slub.c:4124
     ucma_process_join+0x22d/0×3f0 drivers/infiniband/core/ucma.c:1497
     ucma_join_multicast+0xda/0×140 drivers/infiniband/core/ucma.c:1538
     ucma_write+0x1f7/0×280 drivers/infiniband/core/ucma.c:1724
     vfs_write fs/read_write.c:603 [inline]
     vfs_write+0x191/0×4c0 fs/read_write.c:585
     ksys_write+0x1a1/0×1e0 fs/read_write.c:658
     do_syscall_64+0x2d/0×40 arch/x86/entry/common.c:46
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
     The buggy address belongs to the object at ffff88810b3ad100
     which belongs to the cache kmalloc-192 of size 192
     The buggy address is located 16 bytes inside of
     192-byte region [ffff88810b3ad100, ffff88810b3ad1c0)

Fixes: b5de0c60 ("RDMA/cma: Fix use after free race in roce multicast join")
Link: https://lore.kernel.org/r/20210211090517.1278415-1-leon@kernel.orgReported-by: NAmit Matityahu <mitm@nvidia.com>
Signed-off-by: NAvihai Horon <avihaih@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

fe454dc3

RDMA/core: Fix kernel doc warnings for ib_port_immutable_read() · 168e4cd9

由 Leon Romanovsky 提交于 2月 10, 2021

drivers/infiniband/core/device.c:859: warning: Function parameter or member 'dev' not described in 'ib_port_immutable_read'
drivers/infiniband/core/device.c:859: warning: Function parameter or member 'port' not described in 'ib_port_immutable_read'

Fixes: 7416790e ("RDMA/core: Introduce and use API to read port immutable data")
Link: https://lore.kernel.org/r/20210210151421.1108809-1-leon@kernel.orgReported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

168e4cd9

RDMA/ipoib: Remove racy Subnet Manager sendonly join checks · 633d6102

由 Christoph Lameter 提交于 1月 28, 2021

When a system receives a REREG event from the SM, then the SM information
in the kernel is marked as invalid and a request is sent to the SM to
update the information. The SM information is invalid in that time period.

However, receiving a REREG also occurs simultaneously in user space
applications that are now trying to rejoin the multicast groups. Some of
those may be sendonly multicast groups which are then failing.

If the SM information is invalid then ib_sa_sendonly_fullmem_support()
returns false. That is wrong because it just means that we do not know yet
if the potentially new SM supports sendonly joins.

Sendonly join was introduced in 2015 and all the Subnet managers have
supported it ever since. So there is no point in checking if a subnet
manager supports it.

Should an old opensm get a request for a sendonly join then the request
will fail. The code that is removed here accomodated that situation and
fell back to a full join.

Falling back to a full join is problematic in itself. The reason to use
the sendonly join was to reduce the traffic on the Infiniband fabric
otherwise one could have just stayed with the regular join. So this patch
may cause users of very old opensms to discover that lots of traffic
needlessly crosses their IB fabrics.

Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2101281845160.13303@www.lameter.comSigned-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

633d6102

06 2月, 2021 1 次提交

RDMA/core: Introduce and use API to read port immutable data · 7416790e

由 Parav Pandit 提交于 2月 03, 2021

Currently mlx5 driver caches port GID table length for 2 ports. It is
also cached by IB core as port immutable data.

When mlx5 representor ports are present, which are usually more than 2,
invalid access to port_caps array can happen while validating the GID
table length which is only for 2 ports.

To avoid this, take help of the IB cores port immutable data by exposing
an API to read the port immutable fields.

Remove mlx5 driver's internal cache, thereby reduce code and data.

Link: https://lore.kernel.org/r/20210203130133.4057329-5-leon@kernel.orgSigned-off-by: NParav Pandit <parav@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

7416790e

03 2月, 2021 2 次提交

IB/core: Use valid port number to check link layer · 904f4f64

由 Parav Pandit 提交于 1月 27, 2021

IB HCA port starts from 1. Use IB core provided port iterator API to avoid
any assumption with start port number.

Link: https://lore.kernel.org/r/20210127150010.1876121-11-leon@kernel.orgSigned-off-by: NParav Pandit <parav@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

904f4f64

IB/cm: Avoid a loop when device has 255 ports · 131be267

由 Parav Pandit 提交于 1月 27, 2021

When RDMA device has 255 ports, loop iterator i overflows. Due to which
cm_add_one() port iterator loops infinitely. Use core provided port
iterator to avoid the infinite loop.

Fixes: a977049d ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20210127150010.1876121-9-leon@kernel.orgSigned-off-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NParav Pandit <parav@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

131be267

29 1月, 2021 2 次提交

IB/umad: Return EPOLLERR in case of when device disassociated · def4cd43

由 Shay Drory 提交于 1月 25, 2021

Currently, polling a umad device will always works, even if the device was
disassociated. A disassociated device should immediately return EPOLLERR
from poll(). Otherwise userspace is endlessly hung on poll() with no idea
that the device has been removed from the system.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/20210125121339.837518-3-leon@kernel.orgSigned-off-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

def4cd43

IB/umad: Return EIO in case of when device disassociated · 4fc54618

由 Shay Drory 提交于 1月 25, 2021

MAD message received by the user has EINVAL error in all flows
including when the device is disassociated. That makes it impossible
for the applications to treat such flow differently.

Change it to return EIO, so the applications will be able to perform
disassociation recovery.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/20210125121339.837518-2-leon@kernel.orgSigned-off-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

4fc54618

21 1月, 2021 4 次提交

RDMA/uverbs: Don't set rcq for a QP if qp_type is IB_QPT_XRC_INI · efeb973f

由 Xiao Yang 提交于 12月 16, 2020

An INI QP doesn't require receive CQ, the creation flow sets the recv counts
to zero:

		if (cmd->qp_type == IB_QPT_XRC_INI) {
			cmd->max_recv_wr = 0;
			cmd->max_recv_sge = 0;

The new IOCTL path also does not set the rcq, so make things the same.

Link: https://lore.kernel.org/r/20201216071755.149449-1-yangx.jy@cn.fujitsu.comSigned-off-by: NXiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

efeb973f

RDMA/uverbs: Add uverbs command for dma-buf based MR registration · bfe0cc6e

由 Jianxin Xiong 提交于 12月 15, 2020

Implement a new uverbs ioctl method for memory registration with file
descriptor as an extra parameter.

Link: https://lore.kernel.org/r/1608067636-98073-4-git-send-email-jianxin.xiong@intel.comSigned-off-by: NJianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Acked-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: NChristian Koenig <christian.koenig@amd.com>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

bfe0cc6e

RDMA/core: Add device method for registering dma-buf based memory region · 3bc489e8

由 Jianxin Xiong 提交于 12月 15, 2020

Dma-buf based memory region requires one extra parameter and is processed
quite differently. Adding a separate method allows clean separation from
regular memory regions.

Link: https://lore.kernel.org/r/1608067636-98073-3-git-send-email-jianxin.xiong@intel.comSigned-off-by: NJianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Acked-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: NChristian Koenig <christian.koenig@amd.com>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

3bc489e8

RDMA/umem: Support importing dma-buf as user memory region · 368c0159

由 Jianxin Xiong 提交于 12月 15, 2020

Dma-buf is a standard cross-driver buffer sharing mechanism that can be
used to support peer-to-peer access from RDMA devices.

Device memory exported via dma-buf is associated with a file descriptor.
This is passed to the user space as a property associated with the buffer
allocation. When the buffer is registered as a memory region, the file
descriptor is passed to the RDMA driver along with other parameters.

Implement the common code for importing dma-buf object and mapping dma-buf
pages.

Link: https://lore.kernel.org/r/1608067636-98073-2-git-send-email-jianxin.xiong@intel.comSigned-off-by: NJianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Acked-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: NChristian Koenig <christian.koenig@amd.com>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

368c0159

20 1月, 2021 2 次提交

RDMA/core/iwpm_msg: Add proper descriptions for 'skb' param · abfa4565

由 Lee Jones 提交于 1月 18, 2021

Fixes the following W=1 kernel build warning(s):

drivers/infiniband/core/iwpm_msg.c:402: warning: Function parameter or member 'skb' not described in 'iwpm_register_pid_cb'
drivers/infiniband/core/iwpm_msg.c:475: warning: Function parameter or member 'skb' not described in 'iwpm_add_mapping_cb'
drivers/infiniband/core/iwpm_msg.c:553: warning: Function parameter or member 'skb' not described in 'iwpm_add_and_query_mapping_cb'
drivers/infiniband/core/iwpm_msg.c:636: warning: Function parameter or member 'skb' not described in 'iwpm_remote_info_cb'
drivers/infiniband/core/iwpm_msg.c:716: warning: Function parameter or member 'skb' not described in 'iwpm_mapping_info_cb'
drivers/infiniband/core/iwpm_msg.c:773: warning: Function parameter or member 'skb' not described in 'iwpm_ack_mapping_info_cb'
drivers/infiniband/core/iwpm_msg.c:803: warning: Function parameter or member 'skb' not described in 'iwpm_mapping_error_cb'
drivers/infiniband/core/iwpm_msg.c:851: warning: Function parameter or member 'skb' not described in 'iwpm_hello_cb'

Link: https://lore.kernel.org/r/20210118223929.512175-21-lee.jones@linaro.org
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: NLee Jones <lee.jones@linaro.org>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

abfa4565

RDMA/core/iwpm_util: Fix some param description misspellings · db038e70

由 Lee Jones 提交于 1月 18, 2021

Fixes the following W=1 kernel build warning(s):

drivers/infiniband/core/iwpm_util.c:138: warning: Function parameter or member 'local_sockaddr' not described in 'iwpm_create_mapinfo'
drivers/infiniband/core/iwpm_util.c:138: warning: Function parameter or member 'mapped_sockaddr' not described in 'iwpm_create_mapinfo'
drivers/infiniband/core/iwpm_util.c:138: warning: Excess function parameter 'local_addr' description in 'iwpm_create_mapinfo'
drivers/infiniband/core/iwpm_util.c:138: warning: Excess function parameter 'mapped_addr' description in 'iwpm_create_mapinfo'
drivers/infiniband/core/iwpm_util.c:185: warning: Function parameter or member 'local_sockaddr' not described in 'iwpm_remove_mapinfo'
drivers/infiniband/core/iwpm_util.c:185: warning: Excess function parameter 'local_addr' description in 'iwpm_remove_mapinfo'

Link: https://lore.kernel.org/r/20210118223929.512175-20-lee.jones@linaro.org
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: NLee Jones <lee.jones@linaro.org>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

db038e70

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功