提交 · d757c60eca9b22f4d108929a24401e0fdecda0b1 · openeuler / Kernel

05 3月, 2019 3 次提交

IB/rdmavt: Fix concurrency panics in QP post_send and modify to error · d757c60e

由 Michael J. Ruhl 提交于 2月 26, 2019

The RC/UC code path can go through a software loopback. In this code path
the receive side QP is manipulated.

If two threads are working on the QP receive side (i.e. post_send, and
modify_qp to an error state), QP information can be corrupted.

(post_send via loopback)
  set r_sge
  loop
     update r_sge
(modify_qp)
     take r_lock
     update r_sge <---- r_sge is now incorrect
(post_send)
     update r_sge <---- crash, etc.
     ...

This can lead to one of the two following crashes:

 BUG: unable to handle kernel NULL pointer dereference at (null)
  IP:  hfi1_copy_sge+0xf1/0x2e0 [hfi1]
  PGD 8000001fe6a57067 PUD 1fd9e0c067 PMD 0
 Call Trace:
  ruc_loopback+0x49b/0xbc0 [hfi1]
  hfi1_do_send+0x38e/0x3e0 [hfi1]
  _hfi1_do_send+0x1e/0x20 [hfi1]
  process_one_work+0x17f/0x440
  worker_thread+0x126/0x3c0
  kthread+0xd1/0xe0
  ret_from_fork_nospec_begin+0x21/0x21

or:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
  IP:  rvt_clear_mr_refs+0x45/0x370 [rdmavt]
  PGD 80000006ae5eb067 PUD ef15d0067 PMD 0
 Call Trace:
  rvt_error_qp+0xaa/0x240 [rdmavt]
  rvt_modify_qp+0x47f/0xaa0 [rdmavt]
  ib_security_modify_qp+0x8f/0x400 [ib_core]
  ib_modify_qp_with_udata+0x44/0x70 [ib_core]
  modify_qp.isra.23+0x1eb/0x2b0 [ib_uverbs]
  ib_uverbs_modify_qp+0xaa/0xf0 [ib_uverbs]
  ib_uverbs_write+0x272/0x430 [ib_uverbs]
  vfs_write+0xc0/0x1f0
  SyS_write+0x7f/0xf0
  system_call_fastpath+0x1c/0x21

Fix by using the appropriate locking on the receiving QP.

Fixes: 15703461 ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt")
Cc: <stable@vger.kernel.org> #v4.9+
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d757c60e

IB/rdmavt: Fix loopback send with invalidate ordering · 38bbc9f0

由 Mike Marciniszyn 提交于 2月 26, 2019

The IBTA spec notes:

o9-5.2.1: For any HCA which supports SEND with Invalidate, upon receiving
an IETH, the Invalidate operation must not take place until after the
normal transport header validation checks have been successfully
completed.

The rdmavt loopback code does the validation after the invalidate.

Fix by relocating the operation specific logic for all SEND variants until
after the validity checks.

Cc: <stable@vger.kernel.org> #v4.20+
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

38bbc9f0

IB/iser: Fix dma_nents type definition · c1545f1a

由 Max Gurtovoy 提交于 2月 26, 2019

The retured value from ib_dma_map_sg saved in dma_nents variable. To avoid
future mismatch between types, define dma_nents as an integer instead of
unsigned.

Fixes: 57b26497 ("IB/iser: Pass the correct number of entries for dma mapped SGL")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NIsrael Rukshin <israelr@mellanox.com>
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c1545f1a

04 3月, 2019 2 次提交

IB/mlx5: Set correct write permissions for implicit ODP MR · 7095ec3c

由 Moni Shoua 提交于 2月 25, 2019

The write access of an implicit MR is inherited to all of its children.
Therefore we must set the correct write access to the parent MR.

Pass full access_flags when creating umem to let it calculate write access
correctly.

Fixes: da6a496a ("IB/mlx5: Ranges in implicit ODP MR inherit its write access")
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

7095ec3c

bnxt_re: Clean cq for kernel consumers only · 0fca467e

由 Devesh Sharma 提交于 2月 25, 2019

Kernel space provider driver should clean the CQs belonging to kernel
space consumers only. The current implementation is doing reverse of it.

Fixing the same by avoiding the call to __clean_cq on a kernel qp during
destroy.

Fixes: c50866e2 ("bnxt_re: fix the regression due to changes in alloc_pbl")
Signed-off-by: NDevesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0fca467e

26 2月, 2019 1 次提交

RDMA/uverbs: Don't do double free of allocated PD · bb618451

由 Leon Romanovsky 提交于 2月 13, 2019

There is no need to call kfree(pd) because ib_dealloc_pd() internally
frees PD.

Fixes: 21a428a0 ("RDMA: Handle PD allocations by IB/core")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

bb618451

23 2月, 2019 4 次提交

RDMA: Handle ucontext allocations by IB/core · a2a074ef

由 Leon Romanovsky 提交于 2月 12, 2019

Following the PD conversion patch, do the same for ucontext allocations.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

a2a074ef

RDMA/core: Fix a WARN() message · afc1990e

由 Dan Carpenter 提交于 2月 22, 2019

The first parameter of WARN_ONCE() is a condition, then following
parameters are the message.  In this case, we left out the condition so it
will just print the ops->type string.

Fixes: 3856ec4b ("RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NMajd Dibbiny <majd@mellanox.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

afc1990e

bnxt_re: fix the regression due to changes in alloc_pbl · c50866e2

由 Devesh Sharma 提交于 2月 22, 2019

While adding the use of for_each_sg_dma_page iterator for Brodcom's rdma
driver, there was a regression added in the __alloc_pbl path. The change
left bnxt_re in DOA state in for-next branch.

Fixing the regression to avoid the host crash when a user space object is
created. Restricting the unconditional access to hwq.pg_arr when hwq is
initialized for user space objects.

Fixes: 161ebe24 ("RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL")
Reported-by: NGal Pressman <galpress@amazon.com>
Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: NDevesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c50866e2

IB/mlx4: Increase the timeout for CM cache · 2612d723

由 Håkon Bugge 提交于 2月 17, 2019

Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the book keeping
in a cache.

Following the RDMA Connection Manager (CM) protocol, it is clear when
an entry has to evicted form the cache. But life is not perfect,
remote peers may die or be rebooted. Hence, it's a timeout to wipe out
a cache entry, when the PF driver assumes the remote peer has gone.

During workloads where a high number of QPs are destroyed concurrently,
excessive amount of CM DREQ retries has been observed

The problem can be demonstrated in a bare-metal environment, where two
nodes have instantiated 8 VFs each. This using dual ported HCAs, so we
have 16 vPorts per physical server.

64 processes are associated with each vPort and creates and destroys
one QP for each of the remote 64 processes. That is, 1024 QPs per
vPort, all in all 16K QPs. The QPs are created/destroyed using the
CM.

When tearing down these 16K QPs, excessive CM DREQ retries (and
duplicates) are observed. With some cat/paste/awk wizardry on the
infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the
nodes:

cm_rx_duplicates:
      dreq  2102
cm_rx_msgs:
      drep  1989
      dreq  6195
       rep  3968
       req  4224
       rtu  4224
cm_tx_msgs:
      drep  4093
      dreq 27568
       rep  4224
       req  3968
       rtu  3968
cm_tx_retries:
      dreq 23469

Note that the active/passive side is equally distributed between the
two nodes.

Enabling pr_debug in cm.c gives tons of:

[171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave:
1,sl_cm_id: 0xd393089f} is NULL!

By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
tear-down phase of the application is reduced from approximately 90 to
50 seconds. Retries/duplicates are also significantly reduced:

cm_rx_duplicates:
      dreq  2460
[]
cm_tx_retries:
      dreq  3010
       req    47

Increasing the timeout further didn't help, as these duplicates and
retries stems from a too short CMA timeout, which was 20 (~4 seconds)
on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
numbers fell down to about 10 for both of them.

Adjustment of the CMA timeout is not part of this commit.
Signed-off-by: NHåkon Bugge <haakon.bugge@oracle.com>
Acked-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

2612d723

22 2月, 2019 6 次提交

IB/core: Abort page fault handler silently during owning process exit · 4438ee3f

由 Moni Shoua 提交于 2月 17, 2019

It is possible that during a page fault handling, the process that owns
the MR is terminating. The indication for it is failure to get the
task_struct or take reference on the mm_struct. In this case just abort
the page-fault handler with error but without a warning to the kernel log.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

4438ee3f

IB/mlx5: Validate correct PD before prefetch MR · 81dd4c4b

由 Moni Shoua 提交于 2月 17, 2019

When prefetching odp mr it is required to verify that pd of the mr is
identical to the pd for which the advise_mr request arrived with.

This check was missing from synchronous flow and is added now.

Fixes: 813e90b1 ("IB/mlx5: Add advise_mr() support")
Reported-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

81dd4c4b

IB/mlx5: Protect against prefetch of invalid MR · a6bc3875

由 Moni Shoua 提交于 2月 17, 2019

When deferring a prefetch request we need to protect against MR or PD
being destroyed while the request is still enqueued.

The first step is to validate that PD owns the lkey that describes the MR
and that the MR that the lkey refers to is owned by that PD.

The second step is to dequeue all requests when MR is destroyed.

Since PD can't be destroyed while it owns MRs it is guaranteed that when a
worker wakes up the request it refers to is still valid.

Now, it is possible to refrain from taking a reference on the device since
it is assured to be present as pd.

While that, replace the dedicated ordered workqueue with the system
unbound workqueue to reuse an existing resource and improve
performance. This will also fix a bug of queueing to the wrong workqueue.

Fixes: 813e90b1 ("IB/mlx5: Add advise_mr() support")
Reported-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

a6bc3875

RDMA/uverbs: Store PR pointer before it is overwritten · 25fd08eb

由 Leon Romanovsky 提交于 2月 21, 2019

The IB_MR_REREG_PD command rewrites mr->pd after successful
rereg_user_mr(), such change causes to lost usecnt information and
produces the following warning:

 WARNING: CPU: 1 PID: 1771 at drivers/infiniband/core/verbs.c:336 ib_dealloc_pd+0x4e/0x60 [ib_core]
 CPU: 1 PID: 1771 Comm: rereg_mr Tainted: G        W  OE 5.0.0-rc7-for-upstream-perf-2019-02-20_14-03-40-34 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 RIP: 0010:ib_dealloc_pd+0x4e/0x60 [ib_core]
 RSP: 0018:ffffc90003923dc0 EFLAGS: 00010286
 RAX: 00000000ffffffff RBX: ffff88821f7f0400 RCX: ffff888236a40c00
 RDX: ffff88821f7f0400 RSI: 0000000000000001 RDI: 0000000000000000
 RBP: 0000000000000001 R08: ffff88835f665d80 R09: ffff8882209c90d8
 R10: ffff88835ec003e0 R11: 0000000000000000 R12: ffff888221680ba0
 R13: ffff888221680b00 R14: 00000000ffffffea R15: ffff88821f53c318
 FS:  00007f70db11e740(0000) GS:ffff88835f640000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000001dfd030 CR3: 000000029d9d8000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  uverbs_free_pd+0x2d/0x30 [ib_uverbs]
  destroy_hw_idr_uobject+0x16/0x40 [ib_uverbs]
  uverbs_destroy_uobject+0x28/0x170 [ib_uverbs]
  __uverbs_cleanup_ufile+0x6b/0x90 [ib_uverbs]
  uverbs_destroy_ufile_hw+0x8b/0x110 [ib_uverbs]
  ib_uverbs_close+0x1f/0x80 [ib_uverbs]
  __fput+0xb1/0x220
  task_work_run+0x7f/0xa0
  exit_to_usermode_loop+0x6b/0xb2
  do_syscall_64+0xc5/0x100
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f70dad00664

Fixes: e278173f ("RDMA/core: Cosmetic change - move member initialization to correct block")
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NMajd Dibbiny <majd@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

25fd08eb

IB/hfi1: Add missing break in switch statement · 7264235e

由 Gustavo A. R. Silva 提交于 2月 20, 2019

Fix the following warning by adding a missing break:

drivers/infiniband/hw/hfi1/tid_rdma.c: In function ‘hfi1_tid_rdma_wqe_interlock’:
drivers/infiniband/hw/hfi1/tid_rdma.c:3251:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
   switch (prev->wr.opcode) {
   ^~~~~~
drivers/infiniband/hw/hfi1/tid_rdma.c:3259:2: note: here
  case IB_WR_RDMA_READ:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Fixes: c6c23117 ("IB/hfi1: Add interlock between TID RDMA WRITE and other requests")
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: NKaike Wan <Kaike.wan@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

7264235e

Merge branch 'mlx5-next' into rdma.git for-next · 815f7480

由 Jason Gunthorpe 提交于 2月 21, 2019

From
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

To resolve conflicts with net-next and pick up the first patch.

* branch 'mlx5-next':
  net/mlx5: Factor out HCA capabilities functions
  IB/mlx5: Add support for 50Gbps per lane link modes
  net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register
  net/mlx5: Add new fields to Port Type and Speed register
  net/mlx5: Refactor queries to speed fields in Port Type and Speed register
  net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode
  net/mlx5: Relocate vport macros to the vport header file
  net/mlx5: E-Switch, Normalize the name of uplink vport number
  net/mlx5: Provide an alternative VF upper bound for ECPF
  net/mlx5: Add host params change event
  net/mlx5: Add query host params command
  net/mlx5: Update enable HCA dependency
  net/mlx5: Introduce Mellanox SmartNIC and modify page management logic
  IB/mlx5: Use unified register/load function for uplink and VF vports
  net/mlx5: Use consistent vport num argument type
  net/mlx5: Use void pointer as the type in address_of macro
  net/mlx5: Align ODP capability function with netdev coding style
  mlx5: use RCU lock in mlx5_eq_cq_get()
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

815f7480

21 2月, 2019 1 次提交

drivers/IB,qib: Fix pinned/locked limit check in qib_get_user_pages() · ec95e0fa

由 Davidlohr Bueso 提交于 2月 16, 2019

The current check does not take into account the previous value of
pinned_vm; thus it is quite bogus as is. Fix this by checking the
new value after the (optimistic) atomic inc.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

ec95e0fa

20 2月, 2019 23 次提交

RDMA/core: Verify that memory window type is legal · d0e02bf6

由 Noa Osherovich 提交于 2月 19, 2019

Before calling the provider's alloc_mw function, verify that the
given memory type is either IB_MW_TYPE_1 or IB_MW_TYPE_2.
Signed-off-by: NNoa Osherovich <noaos@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d0e02bf6

RDMA/iwcm: Fix string truncation error · 1882ab86

由 Leon Romanovsky 提交于 2月 19, 2019

The strlen() check at the beginning of iw_cm_map() ensures that devname
and ifname strings are less than destinations to which they are supposed
to be copied. Change strncpy() call to be strcpy(), because we are
protected from overflow. Zero the entire string buffer to avoid copying
uninitialized kernel stack memory to userspace.

This fixes the compilation warning below:

In file included from ./include/linux/dma-mapping.h:6,
                 from drivers/infiniband/core/iwcm.c:38:
In function _strncpy_,
    inlined from _iw_cm_map_ at drivers/infiniband/core/iwcm.c:519:2:
./include/linux/string.h:253:9: warning: ___builtin_strncpy_ specified
bound 32 equals destination size [-Wstringop-truncation]
  return __builtin_strncpy(p, q, size);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: d53ec8af ("RDMA/iwcm: Don't copy past the end of dev_name() string")
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

1882ab86

RDMA/core: Cosmetic change - move member initialization to correct block · e278173f

由 Yuval Shaia 提交于 2月 18, 2019

old_pd is used only if IB_MR_REREG_PD flags is set.
For readability move it's initialization to where it is used.

While there rewrite the whole 'if-else' block so on error jump directly
to label and no need for 'else'
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

e278173f

iw_cxgb4: Make function read_tcb() static · 3b8f8b95

由 Wei Yongjun 提交于 2月 18, 2019

Fixes the following sparse warning:

drivers/infiniband/hw/cxgb4/cm.c:658:6: warning:
 symbol 'read_tcb' was not declared. Should it be static?

Fixes: 11a27e21 ("iw_cxgb4: complete the cached SRQ buffers")
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Acked-by: NRaju Rangoju <rajur@chelsio.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3b8f8b95

RDMA/hns: Bugfix for set hem of SCC · 6ac16e40

由 Yangyang Li 提交于 2月 16, 2019

The method of set hem for scc context is different from other contexts. It
should notify the hardware with the detailed idx in bt0 for scc, while for
other contexts, it only need to notify the bt step and the hardware will
calculate the idx.

Here fixes the following error when unloading the hip08 driver:

[  123.570768] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  123.579023] {1}[Hardware Error]: event severity: recoverable
[  123.584670] {1}[Hardware Error]:  Error 0, type: recoverable
[  123.590317] {1}[Hardware Error]:   section_type: PCIe error
[  123.595877] {1}[Hardware Error]:   version: 4.0
[  123.600395] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
[  123.606562] {1}[Hardware Error]:   device_id: 0000:7d:00.0
[  123.612034] {1}[Hardware Error]:   slot: 0
[  123.616120] {1}[Hardware Error]:   secondary_bus: 0x00
[  123.621245] {1}[Hardware Error]:   vendor_id: 0x19e5, device_id: 0xa222
[  123.627847] {1}[Hardware Error]:   class_code: 000002
[  123.632977] hns3 0000:7d:00.0: aer_status: 0x00000000, aer_mask: 0x00000000
[  123.639928] hns3 0000:7d:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID
[  123.647400] hns3 0000:7d:00.0: aer_uncor_severity: 0x00000000
[  123.653136] hns3 0000:7d:00.0: PCI error detected, state(=1)!!
[  123.658959] hns3 0000:7d:00.0: ROCEE uncorrected RAS error identified
[  123.665395] hns3 0000:7d:00.0: ROCEE RAS AXI rresp error
[  123.670713] hns3 0000:7d:00.0: requesting reset due to PCI error
[  123.676715] hns3 0000:7d:00.0: received reset event , reset type is 5
[  123.683147] hns3 0000:7d:00.0: AER: Device recovery successful
[  123.688978] hns3 0000:7d:00.0: PF Reset requested
[  123.693684] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[  123.700633] hns3 0000:7d:00.0: inform reset to vf(1) failded -5!

Fixes: 6a157f7d ("RDMA/hns: Add SCC context allocation support for hip08")
Signed-off-by: NYangyang Li <liyangyang20@huawei.com>
Reviewed-by: NYixian Liu <liuyixian@huawei.com>
Reviewed-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6ac16e40

RDMA/hns: Modify qp&cq&pd specification according to UM · 3e394f94

由 Lijun Ou 提交于 2月 16, 2019

Accroding to hip08's limitation, qp&cq specification is 1M, mtpt
specification 1M in kernel space.
Signed-off-by: NYangyang Li <liyangyang20@huawei.com>
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3e394f94

lib/irq_poll: Support schedules in non-interrupt contexts · 4133b013

由 Steve Wise 提交于 2月 06, 2019

Do not assume irq_poll_sched() is called from an interrupt context only.
So use raise_softirq_irqoff() instead of __raise_softirq_irqoff() so it
will kick the ksoftirqd if the schedule is from a non-interrupt context.

This is required for RDMA drivers, like soft iwarp, that generate cq
completion notifications in a workqueue or kthread context. Without this
change, siw completion notifications to the ULP can take several hundred
usecs, depending on the system load.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

4133b013

rdma_rxe: Use netlink messages to add/delete links · 66920e1b

由 Steve Wise 提交于 2月 15, 2019

Add support for the RDMA_NLDEV_CMD_NEWLINK/DELLINK messages which allow
dynamically adding new RXE links.  Deprecate the old module options for
now.

Cc: Moni Shoua <monis@mellanox.com>
Reviewed-by: NYanjun Zhu <yanjun.zhu@oracle.com>
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

66920e1b

RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support · 3856ec4b

由 Steve Wise 提交于 2月 15, 2019

Add support for new LINK messages to allow adding and deleting rdma
interfaces.  This will be used initially for soft rdma drivers which
instantiate device instances dynamically by the admin specifying a netdev
device to use.  The rdma_rxe module will be the first user of these
messages.

The design is modeled after RTNL_NEWLINK/DELLINK: rdma drivers register
with the rdma core if they provide link add/delete functions.  Each driver
registers with a unique "type" string, that is used to dispatch messages
coming from user space.  A new RDMA_NLDEV_ATTR is defined for the "type"
string.  User mode will pass 3 attributes in a NEWLINK message:
RDMA_NLDEV_ATTR_DEV_NAME for the desired rdma device name to be created,
RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link being added, and
RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface to use for this
link.  The DELLINK message will contain the RDMA_NLDEV_ATTR_DEV_INDEX of
the device to delete.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3856ec4b

IB/usnic: Fix deadlock · 5bb3c1e9

由 Parvi Kaustubhi 提交于 2月 09, 2019

There is a dead lock in usnic ib_register and netdev_notify path.

	usnic_ib_discover_pf()
	| mutex_lock(&usnic_ib_ibdev_list_lock);
	 | usnic_ib_device_add();
	  | ib_register_device()
	   | usnic_ib_query_port()
	    | mutex_lock(&us_ibdev->usdev_lock);
	     | ib_get_eth_speed()
	      | rtnl_lock()

order of lock: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock

	rtnl_lock()
	 | usnic_ib_netdevice_event()
	  | mutex_lock(&usnic_ib_ibdev_list_lock);

order of lock: rtnl_lock -> &usnic_ib_ibdev_list_lock

Solution is to use the core's lock-free ib_device_get_by_netdev() scheme
to lookup ib_dev while handling netdev & inet events.
Signed-off-by: NParvi Kaustubhi <pkaustub@cisco.com>
Reviewed-by: NGovindarajulu Varadarajan <gvaradar@cisco.com>
Reviewed-by: NTanmay Inamdar <tinamdar@cisco.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

5bb3c1e9

RDMA/rxe: Close a race after ib_register_device · ca22354b

由 Jason Gunthorpe 提交于 2月 12, 2019

Since rxe allows unregistration from other threads the rxe pointer can
become invalid any moment after ib_register_driver returns. This could
cause a user triggered use after free.

Add another driver callback to be called right after the device becomes
registered to complete any device setup required post-registration.  This
callback has enough core locking to prevent the device from becoming
unregistered.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

ca22354b

RDMA/rxe: Add ib_device_get_by_name() and use it in rxe · 6cc2c8e5

由 Jason Gunthorpe 提交于 2月 12, 2019

rxe has an open coded version of this that is not as safe as the core
version. This lets us eliminate the internal device list entirely from
rxe.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6cc2c8e5

RDMA/rxe: Use driver_unregister and new unregistration API · c367074b

由 Jason Gunthorpe 提交于 1月 22, 2019

rxe does not have correct locking for its registration/unregistration
paths, use the core code to handle it instead. In this mode
ib_unregister_device will also do the dealloc, so rxe is required to do
clean up from a callback.

The core code ensures that unregistration is done only once, and generally
takes care of locking and concurrency problems for rxe.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c367074b

RDMA/device: Provide APIs from the core code to help unregistration · d0899892

由 Jason Gunthorpe 提交于 2月 12, 2019

These APIs are intended to support drivers that exist outside the usual
driver core probe()/remove() callbacks. Normally the driver core will
prevent remove() from running concurrently with probe(), once this safety
is lost drivers need more support to get the locking and lifetimes right.

ib_unregister_driver() is intended to be used during module_exit of a
driver using these APIs. It unregisters all the associated ib_devices.

ib_unregister_device_and_put() is to be used by a driver-specific removal
function (ie removal by name, removal from a netdev notifier, removal from
netlink)

ib_unregister_queued() is to be used from netdev notifier chains where
RTNL is held.

The locking is tricky here since once things become async it is possible
to race unregister with registration. This is largely solved by relying on
the registration refcount, unregistration will only ever work on something
that has a positive registration refcount - and then an unregistration
mutex serializes all competing unregistrations of the same device.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d0899892

RDMA/rxe: Use ib_device_get_by_netdev() instead of open coding · 4c173f59

由 Jason Gunthorpe 提交于 2月 12, 2019

The core API handles the locking correctly and is faster if there are
multiple devices.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

4c173f59

RDMA/device: Add ib_device_get_by_netdev() · 324e227e

由 Jason Gunthorpe 提交于 2月 12, 2019

Several drivers need to find the ib_device from a given netdev. rxe needs
this at speed in an unsleepable context, so choose to implement the
translation using a RCU safe hash table.

The hash table can have a many to one mapping. This is intended to support
some future case where multiple IB drivers (ie iWarp and RoCE) connect to
the same netdevs. driver_ids will need to be different to support this.

In the process this makes the struct ib_device and ib_port_data RCU safe
by deferring their kfrees.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

324e227e

RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev · c2261dd7

由 Jason Gunthorpe 提交于 2月 12, 2019

The associated netdev should not actually be very dynamic, so for most
drivers there is no reason for a callback like this. Provide an API to
inform the core code about the net dev affiliation and use a core
maintained data structure instead.

This allows the core code to be more aware of the ndev relationship which
will allow some new APIs based around this.

This also uses locking that makes some kind of sense, many drivers had a
confusing RCU lock, or missing locking which isn't right.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c2261dd7

RDMA/cache: Move the cache per-port data into the main ib_port_data · 8faea9fd

由 Jason Gunthorpe 提交于 2月 12, 2019

Like the other cases there no real reason to have another array just for
the cache. This larger conversion gets its own patch.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

8faea9fd

RDMA/device: Consolidate ib_device per_port data into one place · 8ceb1357

由 Jason Gunthorpe 提交于 2月 12, 2019

There is no reason to have three allocations of per-port data. Combine
them together and make the lifetime for all the per-port data match the
struct ib_device.

Following patches will require more port-specific data, now there is a
good place to put it.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

8ceb1357

RDMA: Add and use rdma_for_each_port · ea1075ed

由 Jason Gunthorpe 提交于 2月 12, 2019

We have many loops iterating over all of the end port numbers on a struct
ib_device, simplify them with a for_each helper.
Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

ea1075ed

RDMA/nldev: Don't expose number of not-visible entries · f2a0e45f

由 Leon Romanovsky 提交于 2月 18, 2019

Netlink dumpit handshake exchanges the index from which kernel should
start to return its value, in current code, this index included
not-visible in this PID items too and indirectly revealed the number of
entries.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f2a0e45f

RDMA/nldev: Connect QP number to .doit callback · 1b8b7788

由 Leon Romanovsky 提交于 2月 18, 2019

This patch adds ability to query specific QP based on its LQPN (local
QPN), which is assigned by HW and needs special treatment while inserting
into restrack DB.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

1b8b7788

RDMA/nldev: Provide parent IDs for PD, MR and QP objects · c3d02788

由 Leon Romanovsky 提交于 2月 18, 2019

PD, MR and QP objects have parents objects: contexts and PDs.  The exposed
parent IDs allow to correlate various objects and simplify debug
investigation.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c3d02788

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功