提交 · 296e3a2aad09d328f22e54655c3d736033fe1ae8 · openeuler / Kernel

25 7月, 2019 5 次提交

IB/mlx5: Prevent concurrent MR updates during invalidation · 296e3a2a

由 Moni Shoua 提交于 7月 23, 2019

The device requires that memory registration work requests that update the
address translation table of a MR will be fenced if posted together. This
scenario can happen when address ranges are invalidated by the mmu in
separate concurrent calls to the invalidation callback.

We prefer to block concurrent address updates for a single MR over fencing
since making the decision if a WQE needs fencing will be more expensive
and fencing all WQEs is a too radical choice.

Further, it isn't clear that this code can even run safely concurrently,
so a lock is a safer choice.

Fixes: b4cfe447 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
Link: https://lore.kernel.org/r/20190723065733.4899-8-leon@kernel.orgSigned-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

296e3a2a

IB/mlx5: Fix clean_mr() to work in the expected order · b9332dad

由 Yishai Hadas 提交于 7月 23, 2019

Any dma map underlying the MR should only be freed once the MR is fenced
at the hardware.

As of the above we first destroy the MKEY and just after that can safely
call to dma_unmap_single().

Link: https://lore.kernel.org/r/20190723065733.4899-6-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.3
Fixes: 8a187ee5 ("IB/mlx5: Support the new memory registration API")
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b9332dad

IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache · 9ec4483a

由 Yishai Hadas 提交于 7月 23, 2019

Fix unreg_umr to move the MR to a kernel owned PD (i.e. the UMR PD) which
can't be accessed by userspace.

This ensures that nothing can continue to access the MR once it has been
placed in the kernels cache for reuse.

MRs in the cache continue to have their HW state, including DMA tables,
present. Even though the MR has been invalidated, changing the PD provides
an additional layer of protection against use of the MR.

Link: https://lore.kernel.org/r/20190723065733.4899-5-leon@kernel.org
Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9ec4483a

IB/mlx5: Use direct mkey destroy command upon UMR unreg failure · afd14174

由 Yishai Hadas 提交于 7月 23, 2019

Use a direct firmware command to destroy the mkey in case the unreg UMR
operation has failed.

This prevents a case that a mkey will leak out from the cache post a
failure to be destroyed by a UMR WR.

In case the MR cache limit didn't reach a call to add another entry to the
cache instead of the destroyed one is issued.

In addition, replaced a warn message to WARN_ON() as this flow is fatal
and can't happen unless some bug around.

Link: https://lore.kernel.org/r/20190723065733.4899-4-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.10
Fixes: 49780d42 ("IB/mlx5: Expose MR cache for mlx5_ib")
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

afd14174

IB/mlx5: Fix unreg_umr to ignore the mkey state · 6a053953

由 Yishai Hadas 提交于 7月 23, 2019

Fix unreg_umr to ignore the mkey state and do not fail if was freed.  This
prevents a case that a user space application already changed the mkey
state to free and then the UMR operation will fail leaving the mkey in an
inappropriate state.

Link: https://lore.kernel.org/r/20190723065733.4899-3-leon@kernel.org
Cc: <stable@vger.kernel.org> # 3.19
Fixes: 968e78dd ("IB/mlx5: Enhance UMR support to allow partial page table update")
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6a053953

23 7月, 2019 9 次提交

RDMA/siw: Remove set but not used variables 'rv' · af0653d5

由 Mao Wenan 提交于 7月 19, 2019

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/sw/siw/siw_cm.c: In function siw_cep_set_inuse:
drivers/infiniband/sw/siw/siw_cm.c:223:6: warning: variable rv set but not used [-Wunused-but-set-variable]

Fixes: 6c52fdc2 ("rdma/siw: connection management")
Link: https://lore.kernel.org/r/20190719012938.100628-1-maowenan@huawei.comSigned-off-by: NMao Wenan <maowenan@huawei.com>
Reviewed-by: NBernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

af0653d5

IB/mlx5: Replace kfree with kvfree · b7f406bb

由 Chuhong Yuan 提交于 7月 17, 2019

Memory allocated by kvzalloc should not be freed by kfree(), use kvfree()
instead.

Fixes: 813e90b1 ("IB/mlx5: Add advise_mr() support")
Link: https://lore.kernel.org/r/20190717082101.14196-1-hslester96@gmail.comSigned-off-by: NChuhong Yuan <hslester96@gmail.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b7f406bb

RDMA/bnxt_re: Honor vlan_id in GID entry comparison · c56b593d

由 Selvin Xavier 提交于 7月 15, 2019

A GID entry consists of GID, vlan, netdev and smac. Extend GID duplicate
check comparisons to consider vlan_id as well to support IPv6 VLAN based
link local addresses. Introduce a new structure (bnxt_qplib_gid_info) to
hold gid and vlan_id information.

The issue is discussed in the following thread
https://lore.kernel.org/r/AM0PR05MB4866CFEDCDF3CDA1D7D18AA5D1F20@AM0PR05MB4866.eurprd05.prod.outlook.com

Fixes: 823b23da ("IB/core: Allow vlan link local address based RoCE GIDs")
Cc: <stable@vger.kernel.org> # v5.2+
Link: https://lore.kernel.org/r/20190715091913.15726-1-selvin.xavier@broadcom.comReported-by: NYi Zhang <yi.zhang@redhat.com>
Co-developed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com>
Tested-by: NYi Zhang <yi.zhang@redhat.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c56b593d

IB/hfi1: Drop all TID RDMA READ RESP packets after r_next_psn · f4d46119

由 Kaike Wan 提交于 7月 15, 2019

When a TID sequence error occurs while receiving TID RDMA READ RESP
packets, all packets after flow->flow_state.r_next_psn should be dropped,
including those response packets for subsequent segments.

The current implementation will drop the subsequent response packets for
the segment to complete next, but may accept packets for subsequent
segments and therefore mistakenly advance the r_next_psn fields for the
corresponding software flows. This may result in failures to complete
subsequent segments after the current segment is completed.

The fix is to only use the flow pointed by req->clear_tail for checking
KDETH PSN instead of finding a flow from the request's flow array.

Fixes: b885d5be ("IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190715164540.74174.54702.stgit@awfm-01.aw.intel.comReviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f4d46119

IB/hfi1: Field not zero-ed when allocating TID flow memory · dc25b239

由 Kaike Wan 提交于 7月 15, 2019

The field flow->resync_npkts is added for TID RDMA WRITE request and
zero-ed when a TID RDMA WRITE RESP packet is received by the requester.
This field is used to rewind a request during retry in the function
hfi1_tid_rdma_restart_req() shared by both TID RDMA WRITE and TID RDMA
READ requests. Therefore, when a TID RDMA READ request is retried, this
field may not be initialized at all, which causes the retry to start at an
incorrect psn, leading to the drop of the retry request by the responder.

This patch fixes the problem by zeroing out the field when the flow memory
is allocated.

Fixes: 838b6fd2 ("IB/hfi1: TID RDMA RcvArray programming and TID allocation")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190715164534.74174.6177.stgit@awfm-01.aw.intel.comReviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

dc25b239

IB/hfi1: Unreserve a flushed OPFN request · 2b74c878

由 Kaike Wan 提交于 7月 15, 2019

When an OPFN request is flushed, the request is completed without
unreserving itself from the send queue. Subsequently, when a new
request is post sent, the following warning will be triggered:

WARNING: CPU: 4 PID: 8130 at rdmavt/qp.c:1761 rvt_post_send+0x72a/0x880 [rdmavt]
Call Trace:
[<ffffffffbbb61e41>] dump_stack+0x19/0x1b
[<ffffffffbb497688>] __warn+0xd8/0x100
[<ffffffffbb4977cd>] warn_slowpath_null+0x1d/0x20
[<ffffffffc01c941a>] rvt_post_send+0x72a/0x880 [rdmavt]
[<ffffffffbb4dcabe>] ? account_entity_dequeue+0xae/0xd0
[<ffffffffbb61d645>] ? __kmalloc+0x55/0x230
[<ffffffffc04e1a4c>] ib_uverbs_post_send+0x37c/0x5d0 [ib_uverbs]
[<ffffffffc04e5e36>] ? rdma_lookup_put_uobject+0x26/0x60 [ib_uverbs]
[<ffffffffc04dbce6>] ib_uverbs_write+0x286/0x460 [ib_uverbs]
[<ffffffffbb6f9457>] ? security_file_permission+0x27/0xa0
[<ffffffffbb641650>] vfs_write+0xc0/0x1f0
[<ffffffffbb64246f>] SyS_write+0x7f/0xf0
[<ffffffffbbb74ddb>] system_call_fastpath+0x22/0x27

This patch fixes the problem by moving rvt_qp_wqe_unreserve() into
rvt_qp_complete_swqe() to simplify the code and make it less
error-prone.

Fixes: ca95f802 ("IB/hfi1: Unreserve a reserved request when it is completed")
Link: https://lore.kernel.org/r/20190715164528.74174.31364.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

2b74c878

IB/hfi1: Check for error on call to alloc_rsm_map_table · cd48a820

由 John Fleck 提交于 7月 15, 2019

The call to alloc_rsm_map_table does not check if the kmalloc fails.
Check for a NULL on alloc, and bail if it fails.

Fixes: 372cc85a ("IB/hfi1: Extract RSM map table init from QOS")
Link: https://lore.kernel.org/r/20190715164521.74174.27047.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJohn Fleck <john.fleck@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

cd48a820

RDMA/hns: Fix sg offset non-zero issue · 60c3becf

由 Xi Wang 提交于 7月 11, 2019

When run perftest in many times, the system will report a BUG as follows:

BUG: Bad rss-counter state mm:(____ptrval____) idx:0 val:-1
BUG: Bad rss-counter state mm:(____ptrval____) idx:1 val:1

We tested with different kernel version and found it started from the the
following commit:

commit d10bcf94 ("RDMA/umem: Combine contiguous PAGE_SIZE regions in
SGEs")

In this commit, the sg->offset is always 0 when sg_set_page() is called in
ib_umem_get() and the drivers are not allowed to change the sgl, otherwise
it will get bad page descriptor when unfolding SGEs in __ib_umem_release()
as sg_page_count() will get wrong result while sgl->offset is not 0.

However, there is a weird sgl usage in the current hns driver, the driver
modified sg->offset after calling ib_umem_get(), which caused we iterate
past the wrong number of pages in for_each_sg_page iterator.

This patch fixes it by correcting the non-standard sgl usage found in the
hns_roce_db_map_user() function.

Fixes: d10bcf94 ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
Fixes: 0425e3e6 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Link: https://lore.kernel.org/r/1562808737-45723-1-git-send-email-oulijun@huawei.comSigned-off-by: NXi Wang <wangxi11@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

60c3becf

RDMA/siw: Fix error return code in siw_init_module() · d5121ffe

由 Wei Yongjun 提交于 7月 18, 2019

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: bdcf26bf ("rdma/siw: network and RDMA core interface")
Link: https://lore.kernel.org/r/20190718092710.85709-1-weiyongjun1@huawei.comSigned-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: NBernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d5121ffe

17 7月, 2019 2 次提交

scsi: IB/srp: set virt_boundary_mask in the scsi host · 8c175d31

由 Christoph Hellwig 提交于 6月 17, 2019

This ensures all proper DMA layer handling is taken care of by the SCSI
midlayer.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Acked-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

8c175d31

scsi: IB/iser: set virt_boundary_mask in the scsi host · 09a4460b

由 Christoph Hellwig 提交于 6月 17, 2019

This ensures all proper DMA layer handling is taken care of by the SCSI
midlayer.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

09a4460b

13 7月, 2019 1 次提交

mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options · 6471384a

由 Alexander Potapenko 提交于 7月 11, 2019

Patch series "add init_on_alloc/init_on_free boot options", v10.

Provide init_on_alloc and init_on_free boot options.

These are aimed at preventing possible information leaks and making the
control-flow bugs that depend on uninitialized values more deterministic.

Enabling either of the options guarantees that the memory returned by the
page allocator and SL[AU]B is initialized with zeroes.  SLOB allocator
isn't supported at the moment, as its emulation of kmem caches complicates
handling of SLAB_TYPESAFE_BY_RCU caches correctly.

Enabling init_on_free also guarantees that pages and heap objects are
initialized right after they're freed, so it won't be possible to access
stale data by using a dangling pointer.

As suggested by Michal Hocko, right now we don't let the heap users to
disable initialization for certain allocations.  There's not enough
evidence that doing so can speed up real-life cases, and introducing ways
to opt-out may result in things going out of control.

This patch (of 2):

The new options are needed to prevent possible information leaks and make
control-flow bugs that depend on uninitialized values more deterministic.

This is expected to be on-by-default on Android and Chrome OS.  And it
gives the opportunity for anyone else to use it under distros too via the
boot args.  (The init_on_free feature is regularly requested by folks
where memory forensics is included in their threat models.)

init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
objects with zeroes.  Initialization is done at allocation time at the
places where checks for __GFP_ZERO are performed.

init_on_free=1 makes the kernel initialize freed pages and heap objects
with zeroes upon their deletion.  This helps to ensure sensitive data
doesn't leak via use-after-free accesses.

Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
returns zeroed memory.  The two exceptions are slab caches with
constructors and SLAB_TYPESAFE_BY_RCU flag.  Those are never
zero-initialized to preserve their semantics.

Both init_on_alloc and init_on_free default to zero, but those defaults
can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
CONFIG_INIT_ON_FREE_DEFAULT_ON.

If either SLUB poisoning or page poisoning is enabled, those options take
precedence over init_on_alloc and init_on_free: initialization is only
applied to unpoisoned allocations.

Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:

hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)

Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline
is within the standard error.

The new features are also going to pave the way for hardware memory
tagging (e.g.  arm64's MTE), which will require both on_alloc and on_free
hooks to set the tags for heap objects.  With MTE, tagging will have the
same cost as memory initialization.

Although init_on_free is rather costly, there are paranoid use-cases where
in-memory data lifetime is desired to be minimized.  There are various
arguments for/against the realism of the associated threat models, but
given that we'll need the infrastructure for MTE anyway, and there are
people who want wipe-on-free behavior no matter what the performance cost,
it seems reasonable to include it in this series.

[glider@google.com: v8]
  Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com
[glider@google.com: v9]
  Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com
[glider@google.com: v10]
  Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com
Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.comSigned-off-by: NAlexander Potapenko <glider@google.com>
Acked-by: NKees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.cz>		[page and dmapool parts
Acked-by: James Morris <jamorris@linux.microsoft.com>]
Cc: Christoph Lameter <cl@linux.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6471384a

12 7月, 2019 2 次提交

RMDA/siw: Require a 64 bit arch · 0b043644

由 Jason Gunthorpe 提交于 7月 12, 2019

The new siw driver fails to build on i386 with

drivers/infiniband/sw/siw/siw_qp.c:1025:3: error: invalid output size for constraint '+q'
                smp_store_mb(*cq->notify, SIW_NOTIFY_NOT);

As it is using 64 bit values with the smp_store_mb.

Since the entire scheme here seems questionable, and we are in the merge
window, fix the compile failures by disabling 32 bit support on this
driver.

A proper fix will be reviewed post merge window.

Fixes: c0cf5bdd ("rdma/siw: addition to kernel build environment")
Reported-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0b043644

RDMA/siw: Mark expected switch fall-throughs · cea743f2

由 Gustavo A. R. Silva 提交于 7月 11, 2019

In preparation to enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.

This patch fixes the following warnings:

drivers/infiniband/sw/siw/siw_qp_rx.c: In function ‘siw_rdmap_complete’:
drivers/infiniband/sw/siw/siw_qp_rx.c:1214:18: warning: this statement may fall through [-Wimplicit-fallthrough=]
   wqe->rqe.flags |= SIW_WQE_SOLICITED;
   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
drivers/infiniband/sw/siw/siw_qp_rx.c:1215:2: note: here
  case RDMAP_SEND:
  ^~~~

drivers/infiniband/sw/siw/siw_qp_tx.c: In function ‘siw_qp_sq_process’:
drivers/infiniband/sw/siw/siw_qp_tx.c:1044:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
    siw_wqe_put_mem(wqe, tx_type);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/sw/siw/siw_qp_tx.c:1045:3: note: here
   case SIW_OP_INVAL_STAG:
   ^~~~
drivers/infiniband/sw/siw/siw_qp_tx.c:1128:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
    siw_wqe_put_mem(wqe, tx_type);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/sw/siw/siw_qp_tx.c:1129:3: note: here
   case SIW_OP_INVAL_STAG:
   ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: NBernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

cea743f2

11 7月, 2019 7 次提交

RDMA/core: Fix -Wunused-const-variable warnings · bedc0fd0

由 Qian Cai 提交于 7月 11, 2019

The commit below introduced a few compilation warnings.

In file included from ./include/rdma/ib_verbs.h:64,
                 from ./include/linux/mlx5/device.h:37,
                 from ./include/linux/mlx5/driver.h:51,
                 from drivers/net/ethernet/mellanox/mlx5/core/uar.c:36:
./include/linux/dim.h:378:1: warning: 'rdma_dim_prof' defined but not
used [-Wunused-const-variable=]
 rdma_dim_prof[RDMA_DIM_PARAMS_NUM_PROFILES] = {
 ^~~~~~~~~~~~~
In file included from ./include/rdma/ib_verbs.h:64,
                 from ./include/linux/mlx5/device.h:37,
                 from ./include/linux/mlx5/driver.h:51,
                 from
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:37:
./include/linux/dim.h:378:1: warning: 'rdma_dim_prof' defined but not
used [-Wunused-const-variable=]
 rdma_dim_prof[RDMA_DIM_PARAMS_NUM_PROFILES] = {
 ^~~~~~~~~~~~~

Since only ib_cq_rdma_dim_work() in drivers/infiniband/core/cq.c uses it,
just move the definition over there.

Fixes: f4915455 ("linux/dim: Implement RDMA adaptive moderation (DIM)")
Signed-off-by: NQian Cai <cai@lca.pw>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

bedc0fd0

rdma/siw: Remove set but not used variable 's' · 855085d9

由 YueHaibing 提交于 7月 11, 2019

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/sw/siw/siw_cm.c: In function siw_cm_llp_state_change:
drivers/infiniband/sw/siw/siw_cm.c:1278:17: warning: variable s set but not used [-Wunused-but-set-variable]

Fixes: 6c52fdc2 ("rdma/siw: connection management")
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Reviewed-by: NBernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

855085d9

rdma/siw: Add missing dependencies on LIBCRC32C and DMA_VIRT_OPS · b45305d7

由 Geert Uytterhoeven 提交于 7月 10, 2019

If LIBCRC32C and DMA_VIRT_OPS are not enabled:

    drivers/infiniband/sw/siw/siw_main.o: In function `siw_newlink':
    siw_main.c:(.text+0x35c): undefined reference to `dma_virt_ops'
    drivers/infiniband/sw/siw/siw_qp_rx.o: In function `siw_csum_update':
    siw_qp_rx.c:(.text+0x16): undefined reference to `crc32c'

Fix the first issue by adding a select of DMA_VIRT_OPS.  Fix the second
issue by replacing the unneeded dependency on CRYPTO_CRC32 by a dependency
on LIBCRC32C.

Reported-by: noreply@ellerman.id.au (first issue)
Fixes: c0cf5bdd ("rdma/siw: addition to kernel build environment")
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b45305d7

RDMA/siw: Add missing rtnl_lock around access to ifa · c421651f

由 Jason Gunthorpe 提交于 7月 11, 2019

ifa is protected by rcu or rtnl, add the missing locking. In this case we
have to use rtnl since siw_listen_address() is sleeping.

Fixes: 6c52fdc2 ("rdma/siw: connection management")
Reviewed-by: NBernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c421651f

rdma/siw: Use proper enumerated type in map_cqe_status · 775a41e2

由 Nathan Chancellor 提交于 7月 10, 2019

clang warns several times:

drivers/infiniband/sw/siw/siw_cq.c:31:4: warning: implicit conversion
from enumeration type 'enum siw_wc_status' to different enumeration type
'enum siw_opcode' [-Wenum-conversion]
        { SIW_WC_SUCCESS, IB_WC_SUCCESS },
        ~ ^~~~~~~~~~~~~~

Fixes: b0fff731 ("rdma/siw: completion queue methods")
Link: https://github.com/ClangBuiltLinux/linux/issues/596Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

775a41e2

RDMA/siw: Remove unnecessary kthread create/destroy printouts · 85de5d53

由 Bernard Metzler 提交于 7月 10, 2019

There is already a warning if we cannot start any thread, and stopping
those threads is not worth spamming the console.

This also corrects a warning from gcc:

 drivers/infiniband/sw/siw/siw_main.c: In function 'siw_create_tx_threads':
 drivers/infiniband/sw/siw/siw_main.c:91:11: warning:
  variable 'rv' set but not used [-Wunused-but-set-variable]
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NBernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

85de5d53

IB/rdmavt: Fix variable shadowing issue in rvt_create_cq · 4d2b8517

由 Nathan Chancellor 提交于 7月 09, 2019

clang warns:

drivers/infiniband/sw/rdmavt/cq.c:260:7: warning: variable 'err' is used
uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
                if (err)
                    ^~~
drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs
here
        return err;
               ^~~
drivers/infiniband/sw/rdmavt/cq.c:260:3: note: remove the 'if' if its
condition is always false
                if (err)
                ^~~~~~~~
drivers/infiniband/sw/rdmavt/cq.c:253:7: warning: variable 'err' is used
uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
                if (!cq->ip) {
                    ^~~~~~~
drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs
here
        return err;
               ^~~
drivers/infiniband/sw/rdmavt/cq.c:253:3: note: remove the 'if' if its
condition is always false
                if (!cq->ip) {
                ^~~~~~~~~~~~~~
drivers/infiniband/sw/rdmavt/cq.c:211:9: note: initialize the variable
'err' to silence this warning
        int err;
               ^
                = 0
2 warnings generated.

The function scoped err variable is uninitialized when the flow jumps into
the if statement. The if scoped err variable shadows the function scoped
err variable, preventing the err assignments within the if statement to be
reflected at the function level, which will cause uninitialized use when
the goto statements are taken.

Just remove the if scoped err declaration so that there is only one copy
of the err variable for this function.

Fixes: 239b0e52 ("IB/hfi1: Move rvt_cq_wc struct into uapi directory")
Link: https://github.com/ClangBuiltLinux/linux/issues/594Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
Acked-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

4d2b8517

10 7月, 2019 1 次提交

RDMA/core: Fix race when resolving IP address · d8d9ec7d

由 Dag Moxnes 提交于 7月 09, 2019

Use the neighbour lock when copying the MAC address from the neighbour
data struct in dst_fetch_ha.

When not using the lock, it is possible for the function to race with
neigh_update(), causing it to copy an torn MAC address:

rdma_resolve_addr()
  rdma_resolve_ip()
    addr_resolve()
      addr_resolve_neigh()
        fetch_ha()
          dst_fetch_ha()
	     memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN)

and

net_ioctl()
  arp_ioctl()
    arp_rec_delete()
      arp_invalidate()
        neigh_update()
          __neigh_update()
	    memcpy(&neigh->ha, lladdr, dev->addr_len)

It is possible to provoke this error by calling rdma_resolve_addr() in a
tight loop, while deleting the corresponding ARP entry in another tight
loop.

Fixes: 51d45974 ("infiniband: addr: Consolidate code to fetch neighbour hardware address from dst.")
Signed-off-by: NDag Moxnes <dag.moxnes@oracle.com>
Signed-off-by: NHåkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d8d9ec7d

09 7月, 2019 8 次提交

IB/core: Work on the caller socket net namespace in nldev_newlink() · 7a54f78d

由 Parav Pandit 提交于 7月 04, 2019

While creating new RDMA devices based on netdevice name, consider the net
namespace of the caller skb's socket similar to rest of the doit()
callbacks and nldev_dellink() which deletes the RDMA device created using
nldev_newlink().

Fixes: 3856ec4b ("RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support")
Signed-off-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

7a54f78d

RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM · bdce1290

由 Konstantin Taranov 提交于 6月 27, 2019

Calculate the correct byte_len on the receiving side when a work
completion is generated with IB_WC_RECV_RDMA_WITH_IMM opcode.

According to the IBA byte_len must indicate the number of written bytes,
whereas it was always equal to zero for the IB_WC_RECV_RDMA_WITH_IMM
opcode, even though data was transferred.

Fixes: 8700e3e7 ("Soft RoCE driver")
Signed-off-by: NKonstantin Taranov <konstantin.taranov@inf.ethz.ch>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

bdce1290

RDMA/mlx5: Set RDMA DIM to be enabled by default · 96e2fd73

由 Leon Romanovsky 提交于 7月 08, 2019

Enable RDMA DIM by default for better user experience.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

96e2fd73

RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink · f8fc8cd9

由 Yamin Friedman 提交于 7月 08, 2019

Added parameter in ib_device for enabling dynamic interrupt moderation so
that it can be configured in userspace using rdma tool.

In order to set adaptive-moderation for an ib device the command is:
rdma dev set [DEV] adaptive-moderation [on|off]
Please set on/off.

rdma dev show
0: mlx5_0: node_type ca fw 16.26.0055 node_guid 248a:0703:00a5:29d0
sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on

rdma resource show cq
dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE
adaptive-moderation off comm [ib_core]
Signed-off-by: NYamin Friedman <yaminf@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f8fc8cd9

RDMA/core: Provide RDMA DIM support for ULPs · da662979

由 Yamin Friedman 提交于 7月 08, 2019

Added the interface in the infiniband driver that applies the rdma_dim
adaptive moderation. There is now a special function for allocating an
ib_cq that uses rdma_dim.

Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
NVMf between two equal end-hosts with 56 cores across a Mellanox switch
using null_blk device:

READS without DIM:
blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
512B     | 3.8GiB/s | 7.7M | 1401  usec               | 2442  usec
4k       | 7.0GiB/s | 1.8M | 4817  usec               | 6587  usec
64k      | 10.7GiB/s| 175k | 9896  usec               | 10028 usec

IO WRITES without DIM:
blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
512B     | 3.6GiB/s | 7.5M | 1434  usec               | 2474  usec
4k       | 6.3GiB/s | 1.6M | 938   usec               | 1221  usec
64k      | 10.7GiB/s| 175k | 8979  usec               | 12780 usec

IO READS with DIM:
blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
512B     | 4GiB/s   | 8.2M | 816    usec              | 889   usec
4k       | 10.1GiB/s| 2.65M| 3359   usec              | 5080  usec
64k      | 10.7GiB/s| 175k | 9896   usec              | 10028 usec

IO WRITES with DIM:
blk size | BW       | IOPS  | 99th percentile latency | 99.99th latency
512B     | 3.9GiB/s | 8.1M  | 799   usec              | 922   usec
4k       | 9.6GiB/s | 2.5M  | 717   usec              | 1004  usec
64k      | 10.7GiB/s| 176k  | 8586  usec              | 12256 usec

The rdma_dim algorithm was designed to measure the effectiveness of
moderation on the flow in a general way and thus should be appropriate
for all RDMA storage protocols.

rdma_dim is configured to be the default option based on performance
improvement seen after extensive tests.
Signed-off-by: NYamin Friedman <yaminf@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

da662979

IB/mlx5: Report correctly tag matching rendezvous capability · 89705e92

由 Danit Goldberg 提交于 7月 05, 2019

Userspace expects the IB_TM_CAP_RC bit to indicate that the device
supports RC transport tag matching with rendezvous offload. However the
firmware splits this into two capabilities for eager and rendezvous tag
matching.

Only if the FW supports both modes should userspace be told the tag
matching capability is available.

Cc: <stable@vger.kernel.org> # 4.13
Fixes: eb761894 ("IB/mlx5: Fill XRQ capabilities")
Signed-off-by: NDanit Goldberg <danitg@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

89705e92

IB/mlx5: Implement VHCA tunnel mechanism in DEVX · b6142608

由 Max Gurtovoy 提交于 7月 01, 2019

This mechanism will allow function-A to perform operations "on behalf" of
function-B via tunnel object. Function-A will have privileges for creating
and using this tunnel object.

For example, in the device emulation feature presented in Bluefield-1 SoC,
using device emulation capability, one can present NVMe function to the
host OS.

Since the NVMe function doesn't have a normal command interface to the HCA
HW, here is a need to create a channel that will be able to issue commands
"on behalf" of this function.

This channel is the VHCA_TUNNEL general object. The emulation software
will create this tunnel for every managed function and issue commands via
devx general cmd interface using the appropriate tunnel ID. When devX
context will receive a command with non-zero vhca_tunnel_id, it will pass
the command as-is down to the HCA.

All the validation, security and resource tracking of the commands and the
created tunneled objects is in the responsibility of the HCA FW. When a
VHCA_TUNNEL object destroyed, the device will issue an internal
FLR (function level reset) to the emulated function associated with this
tunnel. This will destroy all the created resources using the tunnel
mechanism.
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b6142608

RDMA/rvt: Do not use a kernel header in the ABI · f10ff380

由 Jason Gunthorpe 提交于 7月 08, 2019

rvt was using ib_sge as part of it's ABI, which is not allowed. Introduce
a new struct with the same layout and use it instead.

Fixes: dabac6e4 ("IB/hfi1: Move receive work queue struct into uapi directory")
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f10ff380

08 7月, 2019 1 次提交

RDMA/siw: Fix DEFINE_PER_CPU compilation when ARCH_NEEDS_WEAK_PER_CPU · 4c7d6dcd

由 Jason Gunthorpe 提交于 7月 08, 2019

The initializer for the variable cannot be inside the macro (and zero
initialization isn't needed anyhow).

include/linux/percpu-defs.h:92:33: warning: '__pcpu_unique_use_cnt' initialized and declared 'extern'
  extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;  \
                                 ^~~~~~~~~~~~~~
include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
  DEFINE_PER_CPU_SECTION(type, name, "")
  ^~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/sw/siw/siw_main.c:129:8: note: in expansion of macro 'DEFINE_PER_CPU'
 static DEFINE_PER_CPU(atomic_t, use_cnt = ATOMIC_INIT(0));
        ^~~~~~~~~~~~~~

Also the rules for PER_CPU require the variable names to be globally
unique, so prefix them with siw_

Fixes: b9be6f18 ("rdma/siw: transmit path")
Fixes: bdcf26bf ("rdma/siw: network and RDMA core interface")
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

4c7d6dcd

07 7月, 2019 4 次提交

ibverbs/rxe: Remove variable self-initialization · d3e53971

由 Maksym Planeta 提交于 7月 02, 2019

In some cases (not in this particular one) variable self-initialization
can lead to undefined behavior. In this case, it is just obscure code.
Signed-off-by: NMaksym Planeta <mplaneta@os.inf.tu-dresden.de>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d3e53971

RDMA/hns: Clean up unnecessary variable initialization · 617cf24f

由 Lang Cheng 提交于 6月 24, 2019

Here Clean up unnecessary initial value for some variable.
Signed-off-by: NLang Cheng <chenglang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

617cf24f

RDMA/hns: Fixs hw access invalid dma memory error · ec5bc2cc

由 Xi Wang 提交于 6月 24, 2019

When smmu is enable, if execute the perftest command and then use 'kill
-9' to exit, follow this operation repeatedly, the kernel will have a high
probability to print the following smmu event:

  arm-smmu-v3 arm-smmu-v3.1.auto: event 0x10 received:
  arm-smmu-v3 arm-smmu-v3.1.auto:  0x00007d0000000010
  arm-smmu-v3 arm-smmu-v3.1.auto:  0x0000020900000080
  arm-smmu-v3 arm-smmu-v3.1.auto:  0x00000000f47cf000
  arm-smmu-v3 arm-smmu-v3.1.auto:  0x00000000f47cf000

This is because the hw will periodically refresh the qpc cache until the
next reset.

This patch fixed it by removing the action that release qpc memory in the
'hns_roce_qp_free' function.

Fixes: 9a443537 ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: NXi Wang <wangxi11@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

ec5bc2cc

RDMA/hns: Use %pK format pointer print · fd7dd8bc

由 Lang Cheng 提交于 6月 24, 2019

The format specifier \"%p\" can leak kernel addresses.  Use \"%pK\"
instead.
Signed-off-by: NLang Cheng <chenglang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

fd7dd8bc

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功