提交 · e32d2d7144efca745f6d182ac9abd16ee6c7598e · openeuler / Kernel

14 11月, 2017 1 次提交

RDMA/bnxt_re: Remove unused vlan_tag variable · e32d2d71

由 Leon Romanovsky 提交于 10月 29, 2017

The Broadcom driver produces the following compilation warning

drivers/infiniband/hw/bnxt_re/ib_verbs.c:
	In function ‘bnxt_re_create_ah’:
drivers/infiniband/hw/bnxt_re/ib_verbs.c:668:6:
	warning: variable ‘vlan_tag’ set but not used [-Wunused-but-set-variable]
	u16 vlan_tag;

Let's remove it till vlan_tag will be implemented properly.

Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e32d2d71

11 11月, 2017 22 次提交

IB/mlx5: Add PCI write end padding support · b1383aa6

由 Noa Osherovich 提交于 10月 29, 2017

Add the PCI write end padding flag to device_cap_flags enum and set it
during mlx5_ib_query_device so it will be reported to user-space.

During WQ/QP creation, set that capability for WQ/QP if user requested
it and HW supports it.

PCI write end padding modification is not supported for now. There's no
such flag for a QP but for a WQ, create and modify use the same flag.
Return an error if PCI write end padding flag is set during modify_wq.
Signed-off-by: NNoa Osherovich <noaos@mellanox.com>
Reviewed-by: NMajd Dibbiny <majd@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b1383aa6

IB/core: Add PCI write end padding flags for WQ and QP · e1d2e887

由 Noa Osherovich 提交于 10月 29, 2017

There are root complexes that are able to optimize their
performance when incoming data is multiple full cache lines.

PCI write end padding is the device's ability to pad the ending of
incoming packets (scatter) to full cache line such that the last
upstream write generated by an incoming packet will be a full cache
line.

Add a relevant entry to ib_device_cap_flags to report such capability
of an RDMA device.

Add the QP and WQ create flags:
 * A QP/WQ created with a scatter end padding flag will cause
   HW to pad the last upstream write generated by a packet to cache line.

User should consider several factors before activating this feature:
- In case of high CPU memory load (which may cause PCI back pressure in
  turn), if a large percent of the writes are partial cache line, this
  feature should be checked as an optional solution.
- This feature might reduce performance if most packets are between one
  and two cache lines and PCIe throughput has reached its maximum
  capacity. E.g. 65B packet from the network port will lead to 128B
  write on PCIe, which may cause traffic on PCIe to reach high
  throughput.
Signed-off-by: NNoa Osherovich <noaos@mellanox.com>
Reviewed-by: NMajd Dibbiny <majd@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e1d2e887

IB/rxe: don't crash, if allocation of crc algorithm failed · 3192c53e

由 Thomas Bogendoerfer 提交于 10月 31, 2017

Following crash happens, if crc algorithm couldn't be allocated:

[ 1087.989072] rdma_rxe: loaded
[ 1097.855397] PCLMULQDQ-NI instructions are not detected.
[ 1097.901220] rdma_rxe: failed to allocate crc algorithmi err:-2
[ 1097.901248] BUG: unable to handle kernel
[ 1097.901249] NULL pointer dereference
[ 1097.901250]  at 0000000000000046
[...]

Reason is that rxe->tfm is assigned the error return, which will then
be used for crypto_free_shash() in rxe_cleanup. Fix by using a
temporary variable and assigning it rxe->tfm after allocation succeeded.

Fixes: cee2688e ("IB/rxe: Offload CRC calculation when possible")
Signed-off-by: NThomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Acked-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

3192c53e

IB/core: Avoid crash on pkey enforcement failed in received MADs · 89548bca

由 Parav Pandit 提交于 10月 31, 2017

Below kernel crash is observed when Pkey security enforcement fails on
received MADs. This issue is reported in [1].

ib_free_recv_mad() accesses the rmpp_list, whose initialization is
needed before accessing it.
When security enformcent fails on received MADs, MAD processing avoided
due to security checks failed.

OpenSM[3770]: SM port is down
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: PGD 0
kernel: P4D 0
kernel:
kernel: Oops: 0002 [#1] SMP
kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P          IO    4.13.4-1-pve #1
kernel: Hardware name: Dell       XS23-TY3        /9CMP63, BIOS 1.71 09/17/2013
kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000
kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286
kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000
kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20
kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0
kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38
kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880
kernel: FS:  0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0
kernel: Call Trace:
kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]
kernel:  __ib_process_cq+0x5c/0xb0 [ib_core]
kernel:  ib_cq_poll_work+0x20/0x60 [ib_core]
kernel:  process_one_work+0x1e9/0x410
kernel:  worker_thread+0x4b/0x410
kernel:  kthread+0x109/0x140
kernel:  ? process_one_work+0x410/0x410
kernel:  ? kthread_create_on_node+0x70/0x70
kernel:  ? SyS_exit_group+0x14/0x20
kernel:  ret_from_fork+0x25/0x30
kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38
kernel: CR2: 0000000000000008

[1] : https://www.spinics.net/lists/linux-rdma/msg56190.html

Fixes: 47a2b338 ("IB/core: Enforce security on management datagrams")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reported-by: NChris Blake <chrisrblake93@gmail.com>
Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
Reviewed-by: NHal Rosenstock <hal@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

89548bca

RDMA/cxgb4: Annotate r2 and stag as __be32 · 7d7d065a

由 Leon Romanovsky 提交于 10月 25, 2017

Chelsio cxgb4 HW is big-endian, hence there is need to properly
annotate r2 and stag fields as __be32 and not __u32 to fix the
following sparse warnings.

  drivers/infiniband/hw/cxgb4/qp.c:614:16:
    warning: incorrect type in assignment (different base types)
      expected unsigned int [unsigned] [usertype] r2
      got restricted __be32 [usertype] <noident>
  drivers/infiniband/hw/cxgb4/qp.c:615:18:
    warning: incorrect type in assignment (different base types)
      expected unsigned int [unsigned] [usertype] stag
      got restricted __be32 [usertype] <noident>

Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

7d7d065a

IB/mlx4: Fix RSS's QPC attributes assignments · 108809a0

由 Guy Levi 提交于 10月 25, 2017

In the modify QP handler the base_qpn_udp field in the RSS QPC is
overwrite later by irrelevant value assignment. Hence, ingress packets
which gets to the RSS QP will be steered then to a garbage QPN.

The patch fixes this by skipping the above assignment when a RSS QP is
modified, also, the RSS context's attributes assignments are relocated
just before the context is posted to avoid future issues like this.

Additionally, this patch takes the opportunity to change the code to be
disciplined to the device's manual and assigns the RSS QP context just at
RESET to INIT transition.

Fixes:3078f5f1 ("IB/mlx4: Add support for RSS QP")
Signed-off-by: NGuy Levi <guyle@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

108809a0

IB/mlx4: Add report for RSS capabilities by vendor channel · 09d208b2

由 Guy Levi 提交于 10月 25, 2017

The mlx4's RSS patches submission missed a report of RSS capabilities
which should be reported by the vendor channel in query_device.
Signed-off-by: NGuy Levi <guyle@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

09d208b2

RDMA/umem: Avoid partial declaration of non-static function · fec99ede

由 Leon Romanovsky 提交于 10月 25, 2017

The RDMA/umem uses generic RB-trees macros to generate various ib_umem
access functions. The generation is performed with INTERVAL_TREE_DEFINE
macro, which allows one of two modes: declare all functions as static or
declare none of the function to be static.

The second mode of operation produces the following sparse errors:
 drivers/infiniband/core/umem_rbtree.c:69:1:
	warning: symbol 'rbt_ib_umem_iter_first' was not declared.
	Should it be static?
 drivers/infiniband/core/umem_rbtree.c:69:1:
	warning: symbol 'rbt_ib_umem_iter_next' was not declared.
	Should it be static?

Code relocation together with declaration of such functions to be
"static" solves the issue.

Because there is no need to have separate file for two functions,
let's consolidate umem_rtree.c and umem_odp.c into one file.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

fec99ede

RDMA/hns: Modify the usage of cmd_sn in hip08 · 26beb85f

由 oulijun 提交于 11月 10, 2017

The cmd_sn field of CQ doorbell inits for 0. It should be
increment on each first db rung after a completion Event.
if the cmd_sn of notify doorbell Adjacent two times is the
same, the hardware will distinguish it for the same notify
request and update its type according to the priority level
of next event and solicited event.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

26beb85f

RDMA/hns: Unify the calculation for hem index in hip08 · 0203b14c

由 oulijun 提交于 11月 10, 2017

The calculation of hem index are different between hns_roce_table_get
and hns_roce_table_find. When the table chunk size of TRRL is not
divisible by object size, it will faile to find the trrl table.

This patch is to update the calculation of the hem index in the
hns_roce_table_find to the same as which in the hns_roce_table_get.
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0203b14c

RDMA/hns: Set the owner field of SQWQE in hip08 RoCE · e8d18533

由 oulijun 提交于 11月 10, 2017

the owner need to be set when posting sqwqe in hip08 RoCE.
The owner be used according to the below algorithm:
The value of owner should be 1 in the first lap, it
should be 0 in the second lap and in turn.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e8d18533

RDMA/hns: Add sq_invld_flg field in QP context · b5fddb7c

由 oulijun 提交于 11月 10, 2017

In hip08 RoCE, it need to add the sq_invld_flg field
in QP context for RoCE hardware.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b5fddb7c

RDMA/hns: Update the usage of ack timeout in hip08 · 28726461

由 oulijun 提交于 11月 10, 2017

The ack timeout's value in qp context shall be a 5-bit value
and be assgined by users. When at of qpc is set zero, the
timer is disabled.

When attr_mask set for IB_QP_TIMEOUT, The ack timeout field
is effective.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

28726461

RDMA/hns: Set sq_cur_sge_blk_addr field in QPC in hip08 · befb63b4

由 oulijun 提交于 11月 10, 2017

If the extend sges exist, the sq_cur_sge_blk_addr field in QPC
(qp context) should be configured.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

befb63b4

RDMA/hns: Enable the cqe field of sqwqe of RC · a49d761f

由 oulijun 提交于 11月 10, 2017

When sig_type of qpc is non-selectable, all sq's wqes will produce
cqe and not depend on the cqe attribute of wqe. When sig_type of
qpc is selectable, The cqe attribute of wqe will decide whether to
produce the cqe.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a49d761f

RDMA/hns: Set se attribute of sqwqe in hip08 · 492b2bd0

由 oulijun 提交于 11月 10, 2017

When send flags is IB_SEND_SOLICITED, the se(solicated event)
field of sqwqe will be set.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

492b2bd0

RDMA/hns: Configure fence attribute in hip08 RoCE · 651487c2

由 oulijun 提交于 11月 10, 2017

When post wr for mixed rdma operation, we need to use fence
mechanism to keep the correct execute order.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

651487c2

RDMA/hns: Configure TRRL field in hip08 RoCE device · e92f2c18

由 oulijun 提交于 11月 10, 2017

The TRRL(Target RDMA Read/aTOMIC List) record the information
of receiving RDMA READ or ATOMIC operation in hip08. It will
be used the hardware. The driver need to assign a continuous
physical address for trrl_ba field of qp context.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e92f2c18

RDMA/hns: Update calculation of irrl_ba field for hip08 · d5514246

由 oulijun 提交于 11月 10, 2017

The irrl(initiator RDMA Read/Atomic list) base address of qp
context is assigned for addr[63:6]. This patch mainly fixed
it.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d5514246

RDMA/hns: Configure sgid type for hip08 RoCE · b5ff0f61

由 Wei Hu(Xavier) 提交于 10月 26, 2017

The hardware vendors need to generate RoCEv1 or RoCEv2
packet according to the sgid type configured.

Besides, update the gid table size for hip08 RoCE
device.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b5ff0f61

RDMA/hns: Generate gid type of RoCEv2 · 023c1477

由 Wei Hu(Xavier) 提交于 10月 26, 2017

HNS_ROCE_CAP_FALG_ROCE_V1_V2 is added for selecting capability of
RoCE in hns driver. When HNS_ROCE_CAP_FALG_ROCE_V1_V2 is set,
driver will inform ib core that the related hns device can support
RoCEv2, and ib core can generate the gid of the related type.
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

023c1477

RDMA/hns: Add rereg mr support for hip08 · a2c80b7b

由 Wei Hu(Xavier) 提交于 10月 26, 2017

This patch adds rereg mr support for hip08.
Signed-off-by: NShaobo Xu <xushaobo2@huawei.com>
Signed-off-by: NWei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: NLijun Ou <oulijun@huawei.com>
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a2c80b7b

02 11月, 2017 1 次提交
- D
  Merge branch 'k.o/for-rc' into k.o/for-next · 5c08681b
  由 Doug Ledford 提交于 11月 01, 2017
```
Pick up the missing netlink oops fix
Signed-off-by: NDoug Ledford <dledford@redhat.com>
```
  5c08681b
31 10月, 2017 7 次提交

IB/hfi1: Take advantage of kvzalloc_node in sdma initialization · 31acd18b

由 Mike Marciniszyn 提交于 10月 23, 2017

The code that allocates the tx ring in the sdma code fails to take
advantage of kvzalloc variations.

Fix by converting to use kvzalloc_node.
Reported-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

31acd18b

IB/hfi1: Don't modify num_user_contexts module parameter · 45a041cc

由 Kamenee Arumugam 提交于 10月 23, 2017

The driver parameter num_user_contexts controls global behavior and
should not be modified by the driver.
This patch eliminates modification of num_user_contexts by using a
local variable to keep track of the value.
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NKamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

45a041cc

IB/hfi1: Insure int mask for in-kernel receive contexts is clear · 2d9544aa

由 Mike Marciniszyn 提交于 10月 23, 2017

The only use for the urg interrupt is for priority PSM packets.

There is no reason for this interrupt to be enabled for kernel
contexts.
Reviewed-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

2d9544aa

IB/hfi1: Add tx_opcode_stats like the opcode_stats · 1b311f89

由 Mike Marciniszyn 提交于 10月 23, 2017

This patch adds tx_opcode_stats to parallel the
(rx)opcode_stats in the debugfs.
Reviewed-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

1b311f89

IB/hfi1: Validate PKEY for incoming GSI MAD packets · 406310c6

由 Sebastian Sanchez 提交于 10月 25, 2017

These are the use-cases where the pkey needs to be tested to see
if a packet needs to be dropped.

a) Check if pkey is not FULL_MGMT_P_KEY or LIM_MGMT_P_KEY,
   drop the packet as it's not part of the management partition.
   Self-originated packets are an exception.

b) If pkey index points to FULL_MGMT_P_KEY and LIM_MGMT_P_KEY is
   in the table, the packet is coming from a management node,
   and the receiving node is also a management node, so it is safe
   for the packet to go through.

c) If pkey index points to FULL_MGMT_P_KEY and LIM_MGMT_P_KEY is
   NOT in the table, drop the packet as LIM_MGMT_P_KEY should
   always be in the pkey table. It could be a misconfiguration.

d) If pkey index points to LIM_MGMT_P_KEY and FULL_MGMT_P_KEY is
   NOT in the table, it is safe for the packet to go through
   since a non-management node is talking to another non-managment
   node.

e) If pkey index points to LIM_MGMT_P_KEY and FULL_MGMT_P_KEY is in
   the table, drop the packet because a non-management node is
   talking to a management node, and it could be an attack.

For the implementation, these rules can be simplied to only checking
for (a) and (e). There's no need to check for rule (b) as
the packet doesn't need to be dropped. Rule (c) is not possible in
the driver as LIM_MGMT_P_KEY is always in the pkey table.
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

406310c6

Ib/hfi1: Return actual operational VLs in port info query · 00f92031

由 Patel Jay P 提交于 10月 23, 2017

__subn_get_opa_portinfo stores value returned by hfi1_get_ib_cfg() as
operational vls. hfi1_get_ib_cfg() returns vls_operational field in
hfi1_pportdata. The problem with this is that the value is always equal
to vls_supported field in hfi1_pportdata.

The logic to calculate operational_vls is to set value passed by FM
(in  __subn_set_opa_portinfo routine). If no value is passed then
default value is stored in operational_vls.

Field actual_vls_operational is calculated on the basis of buffer
control table. Hence, modifying hfi1_get_ib_cfg() to return
actual_operational_vls when used with HFI1_IB_CFG_OP_VLS parameter
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NPatel Jay P <jay.p.patel@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

00f92031

IB/hfi1: Race condition between user notification and driver state · 4061f3a4

由 Michael J. Ruhl 提交于 10月 23, 2017

The handler for link init state (HLS_UP_INIT) notifies userspace
(update_statusp()) before enabling the device
(RCV_CTRL_RCV_PORT_ENABLE_SMASK) or setting the device state
(ppd->host_link_state).  This causes a race condition where the
userspace thinks the interface is in the INIT state before the driver
has set that state.

Rework the code path to eliminate the race.

Delay setting the init state until after a HW settling period.
Reviewed-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4061f3a4

26 10月, 2017 9 次提交

bnxt_re: Implement the shutdown hook of the L2-RoCE driver interface · 5455e73a

由 Somnath Kotur 提交于 10月 25, 2017

When host is shutting down, it invokes the shutdown hook of the
L2 driver where it would attempt to free the MSI-X vectors, but would fail
because some vectors are held by the RoCE driver.
Implement the new hook in the L2 -> RoCE interface which will be invoked so that
the RoCE driver can unregister the device and free up the MSI-X vectors it had
claimed so that L2 can proceed with it's shutdown without failure.
Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5455e73a

RDMA/cxgb4: Declare stag as __be32 · 35fb2a88

由 Leon Romanovsky 提交于 10月 25, 2017

The scqe.stag is actually __b32, fix it.

  drivers/infiniband/hw/cxgb4/cq.c:754:52: warning: cast to restricted __be32

Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

35fb2a88

IB/rxe: Convert timers to use timer_setup() · 3bfbea74

由 Kees Cook 提交于 10月 24, 2017

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Moni Shoua <monis@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

3bfbea74

RDMA/netlink: OOPs in rdma_nl_rcv_msg() from misinterpreted flag · b4d91aeb

由 Michael J. Ruhl 提交于 10月 24, 2017

rdma_nl_rcv_msg() checks to see if it should use the .dump() callback
or the .doit() callback.  The check is done with this check:

if (flags & NLM_F_DUMP) ...

The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH).

When an RDMA_NL_LS message (response) is received, the bit used for
indicating an error is the same bit as NLM_F_ROOT.

NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR.

ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error
occurs in the service.  The current code then misinterprets the
NLM_F_DUMP bit and trys to call the .dump() callback.

If the .dump() callback for the specified request is not available
(which is true for the RDMA_NL_LS messages) the following Oops occurs:

[ 4555.960256] BUG: unable to handle kernel NULL pointer dereference at
   (null)
[ 4555.969046] IP:           (null)
[ 4555.972664] PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0
[ 4555.979287] Oops: 0010 [#1] SMP
[ 4555.982809] Modules linked in: rpcrdma ib_isert iscsi_target_mod
target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod
dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd
glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core mei_me
lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si ipmi_devintf
ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace
sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci
pps_core drm dca libata i2c_algo_bit i2c_core
[ 4556.061190] CPU: 54 PID: 9841 Comm: ibacm Tainted: G          I
4.14.0-rc2+ #6
[ 4556.069667] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS
SE5C610.86B.01.01.0008.021120151325 02/11/2015
[ 4556.081339] task: ffff880855f42d00 task.stack: ffffc900246b4000
[ 4556.087967] RIP: 0010:          (null)
[ 4556.092166] RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246
[ 4556.098018] RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX:
0000000000000000
[ 4556.105997] RDX: 0000000000001100 RSI: ffff881058bb1320 RDI:
ffff881056362000
[ 4556.113984] RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09:
0000000000001100
[ 4556.121971] R10: ffff8810573a5000 R11: 0000000000000000 R12:
ffff881056362000
[ 4556.129957] R13: 0000000000000ec0 R14: ffff881058bb1320 R15:
0000000000000ec0
[ 4556.137945] FS:  00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000)
knlGS:0000000000000000
[ 4556.147000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4556.153433] CR2: 0000000000000000 CR3: 0000001056f5d003 CR4:
00000000001606e0
[ 4556.161419] Call Trace:
[ 4556.164167]  ? netlink_dump+0x12c/0x290
[ 4556.168468]  __netlink_dump_start+0x186/0x1f0
[ 4556.173357]  rdma_nl_rcv_msg+0x193/0x1b0 [ib_core]
[ 4556.178724]  rdma_nl_rcv+0xdc/0x130 [ib_core]
[ 4556.183604]  netlink_unicast+0x181/0x240
[ 4556.187998]  netlink_sendmsg+0x2c2/0x3b0
[ 4556.192392]  sock_sendmsg+0x38/0x50
[ 4556.196299]  SYSC_sendto+0x102/0x190
[ 4556.200308]  ? __audit_syscall_entry+0xaf/0x100
[ 4556.205387]  ? syscall_trace_enter+0x1d0/0x2b0
[ 4556.210366]  ? __audit_syscall_exit+0x209/0x290
[ 4556.215442]  SyS_sendto+0xe/0x10
[ 4556.219060]  do_syscall_64+0x67/0x1b0
[ 4556.223165]  entry_SYSCALL64_slow_path+0x25/0x25
[ 4556.228328] RIP: 0033:0x7fe0b9db2a63
[ 4556.232333] RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 ORIG_RAX:
000000000000002c
[ 4556.240808] RAX: ffffffffffffffda RBX: 0000000000000010 RCX:
00007fe0b9db2a63
[ 4556.248796] RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI:
000000000000000d
[ 4556.256782] RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09:
000000000000000c
[ 4556.265321] R10: 0000000000000000 R11: 0000000000000293 R12:
00007ffc55edc280
[ 4556.273846] R13: 000000000260b400 R14: 000000000000000d R15:
0000000000000001
[ 4556.282368] Code:  Bad RIP value.
[ 4556.286629] RIP:           (null) RSP: ffffc900246b7bc8
[ 4556.293013] CR2: 0000000000000000
[ 4556.297292] ---[ end trace 8d67abcfd10ec209 ]---
[ 4556.305465] Kernel panic - not syncing: Fatal exception
[ 4556.313786] Kernel Offset: disabled
[ 4556.321563] ---[ end Kernel panic - not syncing: Fatal exception
[ 4556.328960] ------------[ cut here ]------------

Special case RDMA_NL_LS response messages to call the appropriate
callback.

Additionally, make sure that the .dump() callback is not NULL
before calling it.

Fixes: 647c75ac ("RDMA/netlink: Convert LS to doit callback")
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NKaike Wan <kaike.wan@intel.com>
Reviewed-by: NAlex Estrin <alex.estrin@intel.com>
Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: NShiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b4d91aeb

IB/cm: Fix memory corruption in handling CM request · 5a3dc323

由 Parav Pandit 提交于 10月 19, 2017

In recent code, two path record entries are alwasy cleared while
allocated could be either one or two path record entries.
This leads to zero out of unallocated memory.

This fix initializes alternative path record only when alternative path
is set.

While we are at it, path record allocation doesn't check for OPA
alternative path, but rest of the code checks for OPA alternative path.
Path record allocation code doesn't check for OPA alternative LID.
This can further lead to memory corruption when only one path record is
allocated, but there is actually alternative OPA path record present in CM
request.

Cc: <stable@vger.kernel.org> # v4.12+
Fixes: 9fdca4da ("IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields")
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5a3dc323

IB/mlx5: Add support for RSS on the inner packet · 309fa347

由 Maor Gottlieb 提交于 10月 19, 2017

Some user space application would like to do RSS on the inner
packet fields instead on the outer.
When MLX5_RX_HASH_INNER is set with one or more of the other
hash fields, then the RSS will be done using the inner packet.
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Reviewed-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

309fa347

IB/mlx5: Add tunneling offloads support · f95ef6cb

由 Maor Gottlieb 提交于 10月 19, 2017

The device can support receive Stateless Offloads for the inner
packet's fields only when the packet is processed by TIR which is
enabled to support tunneling. Otherwise, the device treats the
packet as an ordinary non-tunneling packet and receive offloads
can be done only for the outer packet's field.
In order to enable receive Stateless Offloading support for incoming
tunneling traffic the TIR should be created with tunneled_offload_en.
Tunneling offloads is supported only be raw ethernet QP.

This patch includes:
* New QP creation flag for tunneling offloads.
* Reports device capabilities.
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Reviewed-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

f95ef6cb

IB/mlx5: Update tunnel offloads bits · 4d350f1f

由 Maor Gottlieb 提交于 10月 19, 2017

This patch updates the mlx5_ifc with the following:
- Fix tunnel_stateless_gre typo.
- max_geneve_opt_len - Maximum geneve options length.
- tunnel_stateless_geneve_rx - If set, receive Stateless
  Offloads for Geneve tunneled (inner) packets are supported.
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Reviewed-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4d350f1f

IB/mlx5: Support padded 128B CQE feature · 7a0c8f42

由 Guy Levi 提交于 10月 19, 2017

In some benchmarks and some CPU architectures, writing the CQE on a full
cache line size improves performance by saving memory access operations
(read-modify-write) relative to partial cache line change. This patch
lets the user to configure the device to pad the CQE up to 128B in case
its content is less than 128B. Currently the driver supports only padding
for a CQE size of 128B.
Signed-off-by: NGuy Levi <guyle@mellanox.com>
Reviewed-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

7a0c8f42

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功