Commit · ea660ad7c1c476fd6e5e3b17780d47159db71dea · openeuler / Kernel

28 Jan, 2020 · 1 commit

IB/mlx4: Fix leak in id_map_find_del · ea660ad7

Committed by Håkon Bugge on Jan 23, 2020

Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate namespaces for MAD Transaction IDs
(TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping in
a cache.

Following the RDMA Connection Manager (CM) protocol, it is clear when an
entry has to be evicted from the cache. When a DREP is sent from
mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similarly, when
a REJ is received by mlx4_ib_demux_cm_handler(), id_map_find_del() is
called.

This function wipes out the TID in use from the IDR or XArray and removes
the id_map_entry from the table.

In short, it does everything except the topping of the cake, which is to
remove the entry from the list and free it. In other words, for the REJ
case enumerated above, one id_map_entry will be leaked.

For the other case above, a DREQ has been received first. The reception of
the DREQ will trigger queuing of a delayed work to delete the
id_map_entry, for the case where the VM doesn't send back a DREP.

In the normal case, the VM _will_ send back a DREP, and id_map_find_del()
will be called.

But this scenario introduces a secondary leak. First, when the DREQ is
received, a delayed work is queued. The VM will then return a DREP, which
will call id_map_find_del(). As stated above, this will free the TID used
from the XArray or IDR. Now there is a window where that particular TID can
be re-allocated, let's say by an outgoing REQ. This TID will later be wiped
out by the delayed work, when the function id_map_ent_timeout() is
called. But the id_map_entry allocated by the outgoing REQ will not be
de-allocated, and we have a leak.

Both leaks are fixed by removing the id_map_find_del() function and only
using schedule_delayed(). A check has been added to schedule_delayed() to
see whether the work has already been queued.

Another benefit of always using the delayed version for deleting entries,
is that we do get a TimeWait effect; a TID no longer in use, will occupy
the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time, without any ability
of being re-used for that time period.
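
The "queue the delayed cleanup only once" rule described above can be modelled in user space. The following is a minimal sketch, not the driver code: the struct, the tick-based clock, and the names schedule_delayed_model(), tid_is_reusable() and CLEANUP_TIMEOUT_TICKS are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the driver's id_map_entry. */
struct id_map_entry_model {
	unsigned int tid;         /* re-mapped transaction id */
	bool scheduled_delete;    /* true once the delayed cleanup is queued */
	unsigned long expires_at; /* model clock tick at which cleanup may run */
};

/* Stands in for CM_CLEANUP_CACHE_TIMEOUT; the value is arbitrary. */
#define CLEANUP_TIMEOUT_TICKS 30UL

/*
 * Queue the delayed cleanup exactly once. A second request, for example
 * a DREP arriving after a DREQ already queued the work, is a no-op.
 * Returns true if this call queued the work.
 */
bool schedule_delayed_model(struct id_map_entry_model *ent, unsigned long now)
{
	assert(ent != NULL);
	if (ent->scheduled_delete)
		return false;
	ent->scheduled_delete = true;
	ent->expires_at = now + CLEANUP_TIMEOUT_TICKS;
	return true;
}

/* The TID stays reserved until the timeout expires (the TimeWait effect). */
bool tid_is_reusable(const struct id_map_entry_model *ent, unsigned long now)
{
	return ent->scheduled_delete && now >= ent->expires_at;
}
```

Because the entry and its TID are only released by the single delayed path, there is no window in which the TID is free while the entry is still pending deletion.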

Fixes: 3cf69cc8 ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200123155521.1212288-1-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

26 Jan, 2020 · 11 commits

IB/opa_vnic: Spelling correction of 'erorr' to 'error' · 7f04c71f

Committed by Dillon Brock on Jan 18, 2020

Correcting a minor spelling mistake in the comments.

Link: https://lore.kernel.org/r/20200118162542.15188-1-dab9861@gmail.com
Signed-off-by: Dillon Brock <dab9861@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Fix logical condition in msix_request_irq · 79ba4f93

Committed by Nathan Chancellor on Jan 16, 2020

Clang warns:

drivers/infiniband/hw/hfi1/msix.c:136:22: warning: overlapping
comparisons always evaluate to false [-Wtautological-overlap-compare]
        if (type < IRQ_SDMA && type >= IRQ_OTHER)
            ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
1 warning generated.

It is impossible for something to be less than 0 (IRQ_SDMA) and greater
than or equal to 3 (IRQ_OTHER) at the same time. A logical OR should
have been used to keep the same logic as before.
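
The corrected range check can be illustrated in isolation. This is a sketch rather than the driver source: only the values 0 (IRQ_SDMA) and 3 (IRQ_OTHER) come from the message above, while the two middle enumerator names and the function name are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Values 0 and 3 come from the commit message; the middle names are assumed. */
enum irq_type_model {
	IRQ_SDMA = 0,
	IRQ_RCVCTXT,
	IRQ_GENERAL,
	IRQ_OTHER = 3,
};

/*
 * Reject any type outside [IRQ_SDMA, IRQ_OTHER).
 * The buggy form joined the two comparisons with &&, which no value can
 * satisfy, so invalid types were never rejected.
 */
bool irq_type_is_invalid(int type)
{
	return type < IRQ_SDMA || type >= IRQ_OTHER;
}
```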

Link: https://lore.kernel.org/r/20200116222658.5285-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/841
Fixes: 13d2a838 ("IB/hfi1: Decouple IRQ name from type")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/cm: Remove CM message structs · 13e0af18

Committed by Jason Gunthorpe on Jan 16, 2020

All accesses now use the new IBA accessor scheme, so delete the structs
entirely and generate the structures from the schema file.

Link: https://lore.kernel.org/r/20200116170037.30109-8-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/cm: Use IBA functions for complex structure members · 4ca662a3

Committed by Jason Gunthorpe on Jan 16, 2020

Use a Coccinelle spatch to replace CM structure members used as
structures, arrays, or pointers with IBA_GET/SET versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression src;
expression len;
{struct} *msg;
@@
- memcpy(msg->{old_name}, src, len)
+ IBA_SET_MEM({new_name}, msg, src, len)
@@
{struct} *msg;
identifier x;
@@
- msg->{old_name}.x
+ IBA_GET_MEM_PTR({new_name}, msg)->x
@@
{struct} *msg;
@@
- &msg->{old_name}
+ IBA_GET_MEM_PTR({new_name}, msg)

For GIDs:
@@
{struct} *msg;
@@
- msg->{old_name}
+ *IBA_GET_MEM_PTR({new_name}, msg)

For non-GIDs:
@@
{struct} *msg;
@@
- msg->{old_name}
+ IBA_GET_MEM_PTR({new_name}, msg)

Iterated for every remaining IBA_CHECK_OFF()/IBA_CHECK_GET()
pairing. Touched up with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-7-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/cm: Use IBA functions for simple structure members · 91b60a71

Committed by Jason Gunthorpe on Jan 16, 2020

Use a Coccinelle spatch script to replace use of simple CM structure
members with IBA_GET/SET versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression val;
{struct} *msg;
@@
- msg->{old_name} = val
+ IBA_SET({new_name}, msg, be{bits}_to_cpu(val))
@@
{struct} *msg;
@@
- msg->{old_name}
+ cpu_to_be{bits}(IBA_GET({new_name}, msg))

Iterated for every IBA_CHECK_OFF that isn't a CM_FIELD_MLOC.

And the below iterated over all byte sizes to remove doubled byte swaps:

@@
expression val;
@@
-be{bits}_to_cpu(cpu_to_be{bits}(val))
+val

(and __be_to_cpu and ntoh variants)

Touched up with clang-format after.
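
The last rule is safe because a byte swap followed by its inverse is the identity. A small user-space sketch of that property, using the POSIX htonl()/ntohl() family as stand-ins for the kernel's cpu_to_be32()/be32_to_cpu() (the wrapper names are assumptions):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* ntohl(htonl(x)) mirrors be32_to_cpu(cpu_to_be32(x)) in user space. */
uint32_t doubled_swap32(uint32_t val)
{
	return ntohl(htonl(val));
}

/* Same round trip for 16-bit values. */
uint16_t doubled_swap16(uint16_t val)
{
	return ntohs(htons(val));
}
```

On any host endianness both helpers return their argument unchanged, which is why the spatch can replace the doubled conversion with the plain value.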

Link: https://lore.kernel.org/r/20200116170037.30109-6-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/cm: Use IBA functions for swapping get/set acessors · 01adb7f4

Committed by Jason Gunthorpe on Jan 16, 2020

Use a Coccinelle spatch script to replace CM helper functions that
return/accept BE values with IBA_GET/SET versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression val;
{struct} *msg;
@@
- {old_setter}(msg, val)
+ IBA_SET({new_name}, msg, be{bits}_to_cpu(val))
@@
{struct} *msg;
@@
- {old_getter}(msg)
+ cpu_to_be{bits}(IBA_GET({new_name}, msg))

Iterated for every IBA_CHECK_GET_BE()/IBA_CHECK_SET_BE() pairing.

And the below iterated over all byte sizes to remove doubled byte swaps:

@@
expression val;
@@
-be{bits}_to_cpu(cpu_to_be{bits}(val))
+val

(and __be_to_cpu and ntoh variants)

Touched up with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-5-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/cm: Use IBA functions for simple get/set acessors · b6bbee68

Committed by Jason Gunthorpe on Jan 16, 2020

Use a Coccinelle spatch to replace CM helper functions with IBA_GET/SET
versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression val;
{struct} *msg;
@@
- {old_setter}
+ IBA_SET({new_name}, msg, val)
@@
{struct} *msg;
@@
- {old_getter}
+ IBA_GET({new_name}, msg)

Iterated for every IBA_CHECK_GET()/IBA_CHECK_SET() pairing. Touched up
with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-4-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/cm: Add SET/GET implementations to hide IBA wire format · d05d4ac4

Committed by Leon Romanovsky on Jan 16, 2020

There is no separation between the RDMA-CM wire format as declared by
IBTA and the kernel logic that implements the needed support. This
causes many mistakes in conversion between big-endian (wire format)
and the CPU format used by the kernel. It also fills RDMA core code with
a combination of uXX and beXX variables.

The idea is that all accesses to IBA definitions go through special
GET/SET macros to ensure that no conversion mistakes are made. The
shifting and masking required to read the value is automatically deduced
using the field offset description from the tables in the IBA
specification.
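
The shift-and-mask idea can be sketched with plain functions. This is an illustration under stated assumptions, not the kernel's IBA_GET/IBA_SET macros: the descriptor struct, the restriction to fields inside one 32-bit big-endian word, and all names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field descriptor for a field inside one big-endian 32-bit word. */
struct be_field {
	size_t byte_off;       /* byte offset of the containing 32-bit word */
	unsigned int bit_off;  /* 0 = most significant bit of that word */
	unsigned int num_bits; /* 1..32, with bit_off + num_bits <= 32 */
};

static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Mask of the field in CPU order, derived only from offset and width. */
static uint32_t field_mask(const struct be_field *f)
{
	uint32_t ones = f->num_bits == 32 ? UINT32_MAX
					  : ((UINT32_C(1) << f->num_bits) - 1);
	return ones << (32 - f->bit_off - f->num_bits);
}

/* Read the field as a CPU-order value; callers never see wire order. */
uint32_t field_get(const uint8_t *msg, const struct be_field *f)
{
	assert(f->num_bits >= 1 && f->bit_off + f->num_bits <= 32);
	unsigned int shift = 32 - f->bit_off - f->num_bits;
	return (load_be32(msg + f->byte_off) & field_mask(f)) >> shift;
}

/* Write a CPU-order value into the field, leaving neighbouring bits intact. */
void field_set(uint8_t *msg, const struct be_field *f, uint32_t val)
{
	assert(f->num_bits >= 1 && f->bit_off + f->num_bits <= 32);
	unsigned int shift = 32 - f->bit_off - f->num_bits;
	uint32_t mask = field_mask(f);
	uint32_t word = load_be32(msg + f->byte_off);
	store_be32(msg + f->byte_off, (word & ~mask) | ((val << shift) & mask));
}
```

Writing an all-ones value and reading it back, as the self-test described in this message does for the real macros, is a quick way to check that the mask covers exactly the intended bits.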

This starts with the CM MADs described in IBTA release 1.3 volume 1.

To confirm that the new macros behave the same as the old accessors a
self-test is included in this patch.

Each macro replacing a straightforward struct field tests at compile time
that the new field has the same offsetof() and width as the old field.

For the fields with accessor functions there is a runtime test: the 'all
ones' value is placed in a dummy message and read back in several ways to
confirm that both approaches give identical results.

Later patches in this series delete the self test.

This creates a tested table of new field name, old field name(s) and some
meta information like BE coding for the functions which will be used in
the next patches.

Link: https://lore.kernel.org/r/20200116170037.30109-3-jgg@ziepe.ca
Link: https://lore.kernel.org/r/20191212093830.316934-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/cm: Add accessors for CM_REQ transport_type · 792a7c1f

Committed by Jason Gunthorpe on Jan 16, 2020

Access the two fields through wrappers, like all other fields, to make it
clearer what is happening.

Link: https://lore.kernel.org/r/20200116170037.30109-2-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/mlx5: Return the administrative GUID if exists · 4bbd4923

Committed by Danit Goldberg on Jan 16, 2020

A user can change the operational GUID (a.k.a. effective GUID) through
link/infiniband. Therefore it is preferable to return the currently set
GUID, if it exists, instead of the operational one.

This way the PF can query which VF GUID will be set in the next bind.  In
order to align with MAC address, zero is returned if administrative GUID
is not set.

For example, before setting administrative GUID:
 $ ip link show
 ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
 vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
 spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off

Then:

 $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00
 $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00

After setting administrative GUID:
 $ ip link show
 ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
 vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
 spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off

Fixes: 9c0015ef ("IB/mlx5: Implement callbacks for getting VFs GUID attributes")
Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence · 6b3712c0

Committed by Jason Gunthorpe on Jan 15, 2020

The write to entry->driver_removed is missing locking; protect it with
xa_lock(), which is held by the only reader.

Otherwise readers may continue to see driver_removed = false after
rdma_user_mmap_entry_remove() returns and may continue to try and
establish new mmaps.

Fixes: 3411f9f0 ("RDMA/core: Create mmap database and cookie helper functions")
Link: https://lore.kernel.org/r/20200115202041.GA17199@ziepe.ca
Reviewed-by: Gal Pressman <galpress@amazon.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

21 Jan, 2020 · 1 commit

Merge tag 'rds-odp-for-5.5' into rdma.git for-next · e8b3a426

Committed by Jason Gunthorpe on Jan 21, 2020

From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created, and such MRs were not possible to create prior to this
series.

The first part of this patchset extends RDMA to have a special verb,
ib_reg_user_mr(). The common use case for this function is a
userspace application that allocates memory for HCA access, while the
responsibility to register the memory at the HCA lies with a kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is an
integral part of ODP flows and is used to trigger page faults in advance
to prepare memory before running the working set.

The third part is the actual user of those in-kernel APIs.
====================

* tag 'rds-odp-for-5.5':
  net/rds: Use prefetch for On-Demand-Paging MR
  net/rds: Handle ODP mr registration/unregistration
  net/rds: Detect need of On-Demand-Paging memory registration
  RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
  IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
  RDMA/mlx5: Don't fake udata for kernel path
  IB/mlx5: Add ODP WQE handlers for kernel QPs
  IB/core: Add interface to advise_mr for kernel users
  IB/core: Introduce ib_reg_user_mr
  IB: Allow calls to ib_umem_get from kernel ULPs

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

18 Jan, 2020 · 2 commits

net/rds: Use prefetch for On-Demand-Paging MR · b2dfc676

Committed by Hans Westgaard Ry on Jan 15, 2020

Try prefetching pages when using On-Demand-Paging MR using
ib_advise_mr.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

net/rds: Handle ODP mr registration/unregistration · 2eafa174

Committed by Hans Westgaard Ry on Jan 15, 2020

On-Demand-Paging MRs are registered using ib_reg_user_mr and
unregistered with ib_dereg_mr.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

17 Jan, 2020 · 14 commits

IB/mlx4: Fix memory leak in add_gid error flow · eaad647e

Committed by Jack Morgenstein on Jan 15, 2020

In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW
gid table, there is a memory leak in the driver's copy of the gid table:
the gid entry's context buffer is not freed.

If such an error occurs, free the entry's context buffer, and mark the
entry as available (by setting its context pointer to NULL).
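
The shape of this error-path fix can be shown with a small user-space model. It is only a sketch: the struct, the function name add_gid_model(), and passing the firmware result in as a parameter are assumptions chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of one slot in the driver's copy of the GID table. */
struct gid_entry_model {
	void *ctx; /* NULL means the slot is available */
};

/*
 * Allocate the context, then "update the firmware table". fw_result stands
 * in for the firmware update status (0 on success). On failure the context
 * is freed and the slot is marked available again, so nothing is leaked.
 */
int add_gid_model(struct gid_entry_model *entry, int fw_result)
{
	entry->ctx = malloc(16);
	if (!entry->ctx)
		return -1;

	if (fw_result != 0) {
		free(entry->ctx);
		entry->ctx = NULL;
		return fw_result;
	}
	return 0;
}
```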

Fixes: e26be1bf ("IB/mlx4: Implement ib_device callbacks")
Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/mlx5: Expose RoCE accelerator counters · d7fab916

Committed by Avihai Horon on Jan 15, 2020

Introduce the following RoCE accelerator counters:
* roce_adp_retrans - number of adaptive retransmissions for RoCE traffic.
* roce_adp_retrans_to - number of times RoCE traffic reached timeout
  due to adaptive retransmission.
* roce_slow_restart - number of times RoCE slow restart was used.
* roce_slow_restart_cnps - number of times RoCE slow restart
  generated CNP packets.
* roce_slow_restart_trans - number of times RoCE slow restart changed
  state to slow restart.

Link: https://lore.kernel.org/r/20200115145459.83280-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/mlx5: Set relaxed ordering when requested · d6de0bb1

Committed by Michael Guralnik on Jan 08, 2020

Enable relaxed ordering in the mkey context when requested. As relaxed
ordering is not currently supported in UMR, disable UMR usage for relaxed
ordering MRs.

Link: https://lore.kernel.org/r/1578506740-22188-11-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/core: Add the core support field to METHOD_GET_CONTEXT · 81164699

Committed by Michael Guralnik on Jan 08, 2020

Add the core support field to METHOD_GET_CONTEXT. This field should
represent capabilities that are not device-specific.

Return support for optional access flags for memory regions. User space
will use this capability to mask the optional access flags for
unsupporting kernels.

Link: https://lore.kernel.org/r/1578506740-22188-10-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/uverbs: Add new relaxed ordering memory region access flag · 2233c660

Committed by Michael Guralnik on Jan 08, 2020

Add a new relaxed ordering access flag for memory regions. Using memory
regions with relaxed ordering set can enhance performance.

This access flag is handled in a best-effort manner; drivers should ignore
it if they don't support setting relaxed ordering.

Link: https://lore.kernel.org/r/1578506740-22188-9-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/efa: Allow passing of optional access flags for MR registration · 86dd738c

Committed by Michael Guralnik on Jan 08, 2020

As part of adding a range of optional access flags that drivers need to be
able to accept, mask this range inside the efa driver. This will prevent the
driver from failing when an access flag from that range is passed.

Link: https://lore.kernel.org/r/1578506740-22188-8-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/core: Add optional access flags range · 68d384b9

Committed by Michael Guralnik on Jan 08, 2020

Define a range of access flags that are optional. Both uverbs and
drivers should accept them and use them where they are applicable.

This will be used, for example, for the relaxed ordering access flag which
unsupporting drivers can ignore.
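
A rough model of how an optional flag range can coexist with strict checking of the remaining flags is given below. The bit positions, macro names, the -EINVAL return value and the function check_access_flags() are all assumptions for illustration and are not copied from the kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative bit layout, not taken from the kernel uapi headers. */
#define ACCESS_LOCAL_WRITE    (UINT32_C(1) << 0)
#define ACCESS_REMOTE_READ    (UINT32_C(1) << 2)
#define ACCESS_OPTIONAL_FIRST (UINT32_C(1) << 20)
#define ACCESS_OPTIONAL_LAST  (UINT32_C(1) << 29)
/* Every bit from OPTIONAL_FIRST up to and including OPTIONAL_LAST. */
#define ACCESS_OPTIONAL_RANGE \
	(((ACCESS_OPTIONAL_LAST << 1) - 1) & ~(ACCESS_OPTIONAL_FIRST - 1))
#define ACCESS_RELAXED_ORDERING ACCESS_OPTIONAL_FIRST
#define ACCESS_MANDATORY_SUPPORTED (ACCESS_LOCAL_WRITE | ACCESS_REMOTE_READ)

/*
 * Mandatory flags must all be supported, otherwise the request fails.
 * Optional flags the driver does not implement are silently dropped.
 * Returns 0 and stores the flags that will be honored, or -EINVAL.
 */
int check_access_flags(uint32_t requested, uint32_t driver_optional,
		       uint32_t *effective)
{
	uint32_t mandatory = requested & ~ACCESS_OPTIONAL_RANGE;

	if (mandatory & ~ACCESS_MANDATORY_SUPPORTED)
		return -EINVAL;

	*effective = mandatory |
		     (requested & ACCESS_OPTIONAL_RANGE & driver_optional);
	return 0;
}
```

A driver without relaxed ordering support would pass 0 for driver_optional and still accept the registration, which is the best-effort behaviour the series describes.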

Link: https://lore.kernel.org/r/1578506740-22188-7-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/uverbs: Verify MR access flags · ca95c141

Committed by Michael Guralnik on Jan 08, 2020

Verify that all MR access flags passed from the user are supported;
otherwise an error is returned.

Fixes: 4fca0377 ("IB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi")
Link: https://lore.kernel.org/r/1578506740-22188-6-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/uverbs: Add ioctl command to get a device context · a1123418

Committed by Jason Gunthorpe on Jan 08, 2020

Allow future extensions of the get context command through the uverbs
ioctl kabi.

Unlike the uverbs version, this does not also return an async_fd; that
has to be done with another command.

Link: https://lore.kernel.org/r/1578506740-22188-5-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/core: Remove ucontext_lock from the uverbs_destry_ufile_hw() path · da57db25

Committed by Jason Gunthorpe on Jan 08, 2020

This lock only serializes ucontext creation. Instead of checking the
ucontext_lock during destruction, hold the existing hw_destroy_rwsem during
creation, which is the standard pattern for object creation.

The simplification of locking is needed for the next patch.

Link: https://lore.kernel.org/r/1578506740-22188-4-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/core: Add UVERBS_METHOD_ASYNC_EVENT_ALLOC · d680e88e

Committed by Jason Gunthorpe on Jan 08, 2020

Allow the async FD to be allocated separately from the context.

This is necessary to introduce the ioctl to create a context, as an ioctl
should only ever create a single uobject at a time.

If multiple async FDs are created, the first one is used to deliver
affiliated events from any ib_uevent_object, while all subsequent ones
receive only unaffiliated events.

Link: https://lore.kernel.org/r/1578506740-22188-3-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Merge branch 'mlx5-next' into rdma.git for-next · f8623085

Committed by Jason Gunthorpe on Jan 16, 2020

From the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Merged due to dependencies in the next patches.

* branch 'mlx5-next':
  net/mlx5: Expose relaxed ordering bits
  net/mlx5: Add RoCE accelerator counters

net/mlx5: Expose relaxed ordering bits · a880a6dd

Committed by Michael Guralnik on Jan 08, 2020

Expose relaxed ordering bits in HCA capability and mkey context structs.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

net/mlx5: Add RoCE accelerator counters · 8fd5b75d

Committed by Leon Romanovsky on Jan 15, 2020

Add RoCE accelerator definitions.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

16 Jan, 2020 · 11 commits

net/rds: Detect need of On-Demand-Paging memory registration · c4c86abb

Committed by Hans Westgaard Ry on Jan 15, 2020

Add code to check if memory intended for RDMA is FS-DAX memory. RDS
will fail with error code EOPNOTSUPP if FS-DAX memory is detected.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths · 8ffc3248

Committed by Jason Gunthorpe on Jan 15, 2020

Till recently it was not possible for userspace to specify a different
IOVA, but with the new ibv_reg_mr_iova() library call this can be done.

To compute the user_va we must compute:
  user_va = (iova - iova_start) + user_va_start

while being cautious of overflow and other math problems.
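
One way to write that translation with explicit range and wrap-around checks is sketched below. The function name, the length parameter and the bool return convention are assumptions for the example and do not reflect the mlx5 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute user_va = (iova - iova_start) + user_va_start.
 * Returns false if iova lies outside [iova_start, iova_start + length)
 * or if the addition would wrap around 64 bits.
 */
bool iova_to_user_va(uint64_t iova, uint64_t iova_start,
		     uint64_t user_va_start, uint64_t length,
		     uint64_t *user_va)
{
	if (iova < iova_start)
		return false; /* subtraction would underflow */

	uint64_t offset = iova - iova_start;

	if (offset >= length)
		return false; /* past the end of the region */
	if (offset > UINT64_MAX - user_va_start)
		return false; /* addition would overflow */

	*user_va = user_va_start + offset;
	return true;
}
```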

The iova is not reliably stored in the mmkey when the MR is created. Only
the cached creation path (the common one) sets it, so it must also be set
when creating uncached MRs.

Fix the weird use of iova when computing the starting page index in the
MR. In the normal case, when iova == umem.address:
  iova & (~(BIT(page_shift) - 1)) ==
  ALIGN_DOWN(umem.address, odp->page_size) ==
  ib_umem_start(odp)

And when iova is different, using it in math with a user_va is wrong.
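
The first equality in the normal case is ordinary power-of-two arithmetic and can be checked in user space. The helper names here are assumptions, and align_down() is a plain re-implementation rather than the kernel's ALIGN_DOWN() macro.

```c
#include <assert.h>
#include <stdint.h>

/* addr & ~(BIT(page_shift) - 1): clear the low page_shift bits. */
uint64_t mask_down(uint64_t addr, unsigned int page_shift)
{
	assert(page_shift < 64);
	return addr & ~((UINT64_C(1) << page_shift) - 1);
}

/* Round addr down to a multiple of page_size (must be a power of two). */
uint64_t align_down(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr - (addr % page_size);
}
```

For any address, mask_down(addr, s) equals align_down(addr, 1 << s), so the old expression only gave the right starting page when iova happened to equal the user address.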

Finally, do not allow an implicit ODP to be created with a non-zero IOVA
as we have no support for that.

Fixes: 7bdf65d4 ("IB/mlx5: Handle page faults")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs · a73a8955

Committed by Moni Shoua on Jan 15, 2020

The ODP handler for WQEs in RQ or SRQ is not implemented for kernel QPs.
Therefore don't report support for these if the query comes from a kernel user.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

RDMA/mlx5: Don't fake udata for kernel path · 48357091

Committed by Leon Romanovsky on Jan 15, 2020

Kernel paths must not set udata; they must pass a NULL pointer
instead of faking a zeroed udata struct.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

IB/mlx5: Add ODP WQE handlers for kernel QPs · da9ee9d8

Committed by Moni Shoua on Jan 15, 2020

One of the steps in ODP page fault handler for WQEs is to read a WQE
from a QP send queue or receive queue buffer at a specific index.

Since the implementation of this buffer is different between kernel and
user QP the implementation of the handler needs to be aware of that and
handle it in a different way.

ODP for kernel MRs is currently supported only for RDMA_READ
and RDMA_WRITE operations so change the handler to
- read a WQE from a kernel QP send queue
- fail if access to receive queue or shared receive queue is
  required for a kernel QP

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

IB/core: Add interface to advise_mr for kernel users · 87d8069f

Committed by Moni Shoua on Jan 15, 2020

Allow ULPs to call advise_mr, so they can control ODP regions
in the same way as user space applications.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

IB/core: Introduce ib_reg_user_mr · 33006bd4

Committed by Moni Shoua on Jan 15, 2020

Add ib_reg_user_mr() for kernel ULPs to register user MRs.

The common use case for this function is a userspace application
that allocates memory for HCA access, while the responsibility to register
the memory at the HCA lies with a kernel ULP. This ULP acts as an agent
for the userspace application.

This function is intended to be used without a user context so vendor
drivers need to be aware of calling reg_user_mr() device operation with
udata equal to NULL.

Among all drivers, i40iw is the only driver which relies on presence
of udata, so check udata existence for that driver.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

IB: Allow calls to ib_umem_get from kernel ULPs · c320e527

Committed by Moni Shoua on Jan 15, 2020

So far the assumption was that ib_umem_get() and ib_umem_odp_get()
are called from flows that start in UVERBS and therefore have a user
context. This assumption restricts flows that are initiated by ULPs
and need the service that ib_umem_get() provides.

This patch changes ib_umem_get() and ib_umem_odp_get() to get the IB device
directly, relying on the fact that both UVERBS and ULPs set that
field correctly.

Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

IB/srp: Never use immediate data if it is disabled by a user · 0fbb37dd

Committed by Sergey Gorenko on Jan 15, 2020

Some SRP targets that do not support the SRP-2 specification put garbage
in the reserved bits of the SRP login response. The problem was not
detected for a long time because the SRP initiator ignored those bits. But
now one of them is used as SRP_LOGIN_RSP_IMMED_SUPP, and it causes a
critical error on the target when the initiator sends immediate data.

The ib_srp module has a use_imm_data parameter to enable or disable
immediate data manually. But it does not help in the above case, because
use_imm_data is ignored when handling the SRP login response. The problem is
definitely caused by a bug on the target side, but the initiator's
behavior also does not look correct. The initiator should not use
immediate data if use_imm_data is disabled by a user.

This commit adds an additional check of use_imm_data when handling the
SRP login response to avoid unexpected use of immediate data.
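
The resulting decision reduces to a conjunction of the user's setting and the target's advertised capability. A minimal sketch follows; the flag value 0x80 and both identifiers are assumptions for illustration and are not taken from the SRP headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit value for the "immediate data supported" response flag. */
#define LOGIN_RSP_IMMED_SUPP_MODEL 0x80

/*
 * Immediate data is used only when the user has left it enabled AND the
 * login response advertises support. A target that leaves garbage in the
 * flag bits can no longer override a user who disabled the feature.
 */
bool should_use_imm_data(bool use_imm_data_param, uint8_t rsp_flags)
{
	return use_imm_data_param &&
	       (rsp_flags & LOGIN_RSP_IMMED_SUPP_MODEL) != 0;
}
```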

Fixes: 882981f4 ("RDMA/srp: Add support for immediate data")
Link: https://lore.kernel.org/r/20200115133055.30232-1-sergeygo@mellanox.com
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/rxe: Compute the maximum sges and inline size based on the WQE size · 363824f9

Committed by Rao Shoaib on Jan 13, 2020

The SGE buffer size and max_inline data should be derived from the size of
the WQE. Each value individually sets the WQE size, so compute the actual
sizes based on the actual WQE size and configure the QP with the maximums.

Also fix the missing return of the actual maximum capability to the caller.
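
The idea of deriving both limits from one WQE payload size can be sketched as follows. Everything here is hypothetical: the 512-byte cap, the SGE layout, and the names compute_caps() and qp_caps_model are assumptions and are not taken from the rxe driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter/gather element layout. */
struct sge_model {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

struct qp_caps_model {
	uint32_t max_sge;
	uint32_t max_inline;
};

/* Hypothetical upper bound on the payload area of one WQE, in bytes. */
#define MAX_WQE_PAYLOAD_MODEL 512u

/*
 * SGEs and inline data share the same payload area, so size it for the
 * larger request, reject it if it exceeds the cap, and report back the
 * limits that the chosen size actually allows.
 */
int compute_caps(uint32_t req_sge, uint32_t req_inline,
		 struct qp_caps_model *out)
{
	size_t sge_bytes = (size_t)req_sge * sizeof(struct sge_model);
	size_t payload = sge_bytes > req_inline ? sge_bytes : req_inline;

	if (payload > MAX_WQE_PAYLOAD_MODEL)
		return -1;

	out->max_sge = (uint32_t)(payload / sizeof(struct sge_model));
	out->max_inline = (uint32_t)payload;
	return 0;
}
```

A caller asking for 2 SGEs and 100 bytes of inline data gets back a max_sge larger than requested, because the inline request already forced a bigger payload area.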

Link: https://lore.kernel.org/r/1578962480-17814-3-git-send-email-rao.shoaib@oracle.com
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Introduce maximum WQE size to check limits · 4e8d683f

Committed by Rao Shoaib on Jan 13, 2020

Introduce a maximum WQE size to impose limits on max SGEs and inline data.

Link: https://lore.kernel.org/r/1578962480-17814-2-git-send-email-rao.shoaib@oracle.com
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
