Commits · 9003ac2aa412104cc4f8124db49c6bf89a3eec7b · openeuler / Kernel

27 Apr, 2023 6 commits

RDMA/hns: Remove the struct member 'bond_grp' from hns_roce_dev · 9003ac2a

Committed by Junxian Huang on Apr 27, 2023

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6Z4E9

---------------------------------------------------------------

Currently, bond_grp is attached to only one hr_dev corresponding to a
slave in RoCE bonding, which is called main_hr_dev in the driver. When
a non-main hr_dev tries to obtain its bond_grp, the driver first has to
find the main_hr_dev and then obtain the bond_grp from it, which
complicates the code.

With this patch, bond_grp is removed from struct hns_roce_dev. An hr_dev
obtains its bond_grp from an XArray, where die_info and bond_grp are
stored according to bus number. With this change, an hr_dev can get its
bond_grp directly without depending on main_hr_dev, and the code logic
is simplified.
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>

9003ac2a

RDMA/hns: Initial value assignment cleanup for RoCE Bonding variables · 96064446

Committed by Junxian Huang on Apr 27, 2023

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6Z4E9

---------------------------------------------------------------

This patch assigns initial values to variables where they are defined
in the HNS RoCE Bonding driver, instead of doing so on a separate line.
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>

96064446

RDMA/hns: Delete a useless assignment to bond_state · 41adb38e

Committed by Junxian Huang on Apr 27, 2023

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6Z4E9

---------------------------------------------------------------

In hns_roce_slave_dec(), bond_state is changed to
HNS_ROCE_BOND_REGISTERING right before the current main_hr_dev
is removed from the bond group. When the slave decrease
operation is over, bond_state is changed to
HNS_ROCE_BOND_IS_BONDED at the end of this function. So the
assignment to bond_state at the beginning of the function is
useless and should be deleted.
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>

41adb38e

RDMA/hns: Apply XArray for Bond ID allocation · 82ee5d30

Committed by Junxian Huang on Apr 27, 2023

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6Z4E9

---------------------------------------------------------------

This patch provides the ability to map an integer ID to a bond
group pointer by:
	1. adding a new struct hns_roce_die_info to store the
	   pointers and IDs of bond groups on a specific I/O die.
	2. applying XArray to map the bus number to the die info struct.
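
The two steps above can be sketched in userspace C. This is only an illustration under stated assumptions: a fixed-size table stands in for the kernel XArray, and every name here (die_table, die_info_store, bond_group_add, bond_group_lookup, MAX_BOND_GROUPS, the struct fields) is hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUS_NUM     256 /* hypothetical: one slot per possible bus number */
#define MAX_BOND_GROUPS 2   /* hypothetical per-die limit */

struct bond_group {
	uint8_t bus_num;
	uint8_t bond_id;
};

/* Hypothetical layout: bond group pointers and IDs for one I/O die. */
struct die_info {
	uint8_t bond_id_mask;                      /* bit i set: bond ID i is in use */
	struct bond_group *bgrps[MAX_BOND_GROUPS]; /* indexed by bond ID */
};

/* Stand-in for the XArray: index is the bus number, value is the die info. */
static struct die_info *die_table[MAX_BUS_NUM];

/* Register the die info for a bus number; fails if one is already stored. */
static int die_info_store(uint8_t bus_num, struct die_info *info)
{
	assert(info != NULL);
	if (die_table[bus_num])
		return -1;
	die_table[bus_num] = info;
	return 0;
}

/* Allocate the lowest free bond ID on the die and record the group under it. */
static int bond_group_add(uint8_t bus_num, struct bond_group *grp)
{
	struct die_info *info = die_table[bus_num];

	if (!info)
		return -1;
	for (int id = 0; id < MAX_BOND_GROUPS; id++) {
		if (info->bond_id_mask & (1u << id))
			continue;
		info->bond_id_mask |= (uint8_t)(1u << id);
		info->bgrps[id] = grp;
		grp->bus_num = bus_num;
		grp->bond_id = (uint8_t)id;
		return id;
	}
	return -1; /* no free bond ID on this die */
}

/* Map (bus number, bond ID) back to the bond group pointer. */
static struct bond_group *bond_group_lookup(uint8_t bus_num, uint8_t bond_id)
{
	struct die_info *info = die_table[bus_num];

	if (!info || bond_id >= MAX_BOND_GROUPS)
		return NULL;
	return info->bgrps[bond_id];
}
```

With a table like this, any device that knows its bus number and bond ID can reach its bond group directly, without going through another device first.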
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>

82ee5d30

RDMA/hns: Move bond_work from hns_roce_dev to hns_roce_bond_group · 8aeaa671

Committed by Junxian Huang on Apr 27, 2023

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6Z4E9

---------------------------------------------------------------

Currently, bond_work, the delayed work struct for RoCE bonding,
is attached to hns_roce_dev. While a bond is being set, hns_roce_dev
is uninitialized and the pending work is cancelled.

This patch moves bond_work from hns_roce_dev to hns_roce_bond_group so
that the pending work can be executed after the bond is set rather than
being cancelled.
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>

8aeaa671

RDMA/hns: Support getting xrcd num from firmware · bbfeb5d8

Committed by Luoyouming on Apr 24, 2023

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6WAZI

---------------------------------------------------------------

Support getting num_xrcds and reserved_xrcds from firmware in the driver.
Signed-off-by: Luoyouming <luoyouming@huawei.com>
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>

bbfeb5d8

17 Apr, 2023 1 commit

RDMA/hns: Add SVE DIRECT WQE flag to support libhns · 2c61dcac

Committed by Yixing Liu on Apr 17, 2023

driver inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/I6VLLM

---------------------------------------------------------------

Add an SVE DWQE flag to control the SVE DWQE function of libhns.
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>

2c61dcac

13 Apr, 2023 2 commits

RDMA/hns: Support congestion control algorithm configuration at QP granularity · 09f1b7cb

Committed by Yixing Liu on Apr 12, 2023

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6N1G4

---------------------------------------------------------------

This patch supports configuring the congestion control algorithm
at QP granularity. The configuration is sent to the driver from
user space, and the driver then writes the selected algorithm
into the QPC.

XRC type QPs currently cannot deliver the configured algorithm
to kernel space, so the driver sets the default algorithm for
XRC type QPs. The default algorithm type is controlled by the
firmware.
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>

09f1b7cb

RDMA/hns: Modify congestion abbreviation · 87d0ab38

Committed by Yixing Liu on Apr 12, 2023

driver inclusion
category: cleanup
bugzilla: https://gitee.com/openeuler/kernel/issues/I6N1G4

---------------------------------------------------------------

The abbreviation "cong" currently in use does not clearly convey
its meaning, so the fuller name "congest" is used instead.
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>

87d0ab38

31 Mar, 2023 3 commits

RDMA/hns: Fix error code of CMD · be89d155

Committed by Chengchang Tang on Nov 26, 2022

mainline inclusion
from mainline-v6.2-rc1
commit 667d6164
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6RO6S
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=667d6164b84884c64de3fc18670cd5a98b0b10cf

----------------------------------------------------------------------

The error code is always EIO when a CMD fails to execute. This patch
converts the error status reported by firmware to a Linux errno.
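
A status-to-errno conversion of this kind can be sketched as a lookup table. The firmware status names and values below are hypothetical placeholders, not the driver's real definitions. Only the idea is taken from the text above: known statuses get a specific errno, and anything unknown falls back to -EIO.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical firmware status codes; the real ones are defined by the driver. */
enum fw_cmd_status {
	FW_CMD_OK = 0,
	FW_CMD_NOT_SUPPORTED,
	FW_CMD_BAD_PARAM,
	FW_CMD_NO_MEM,
	FW_CMD_BUSY,
	FW_CMD_TIMEOUT,
};

/* Translate a firmware status into a negative Linux errno (0 on success). */
static int fw_status_to_errno(unsigned int status)
{
	static const int map[] = {
		[FW_CMD_OK]            = 0,
		[FW_CMD_NOT_SUPPORTED] = -EOPNOTSUPP,
		[FW_CMD_BAD_PARAM]     = -EINVAL,
		[FW_CMD_NO_MEM]        = -ENOMEM,
		[FW_CMD_BUSY]          = -EBUSY,
		[FW_CMD_TIMEOUT]       = -ETIMEDOUT,
	};

	/* Unknown statuses keep the old behaviour and report -EIO. */
	if (status >= sizeof(map) / sizeof(map[0]))
		return -EIO;
	return map[status];
}
```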

Fixes: a04ff739 ("RDMA/hns: Add command queue support for hip08 RoCE driver")
Link: https://lore.kernel.org/r/20221126102911.2921820-6-xuhaoyue1@hisilicon.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhou Juan <nnuzj07170227@163.com>

be89d155

RDMA/hns: fix memory leak in hns_roce_alloc_mr() · 064d39fb

Committed by Zhengchao Shao on Nov 19, 2022

mainline inclusion
from mainline-v6.2-rc1
commit a115aa00
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6RP11
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a115aa00b18f7b8982b8f458149632caf64a862a

----------------------------------------------------------------------

When hns_roce_mr_enable() fails in hns_roce_alloc_mr(), mr_key is not
released. Compile tested only.
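
The general shape of such a fix is an error path that releases everything acquired so far, in reverse order. The sketch below is a userspace illustration only: demo_mr, demo_alloc_key, demo_free_key, demo_mr_enable and live_keys are hypothetical stand-ins, not the driver's functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_mr {
	int key;
};

static int live_keys; /* number of keys currently allocated */

static int demo_alloc_key(int *key)
{
	*key = 42; /* arbitrary placeholder key */
	live_keys++;
	return 0;
}

static void demo_free_key(int key)
{
	(void)key;
	live_keys--;
}

/* Stand-in for the enable step; fails on request to exercise the error path. */
static int demo_mr_enable(bool fail)
{
	return fail ? -ENOMEM : 0;
}

static struct demo_mr *demo_alloc_mr(bool enable_fails, int *err)
{
	struct demo_mr *mr = calloc(1, sizeof(*mr));
	int ret;

	if (!mr) {
		*err = -ENOMEM;
		return NULL;
	}

	ret = demo_alloc_key(&mr->key);
	if (ret)
		goto err_free_mr;

	ret = demo_mr_enable(enable_fails);
	if (ret)
		goto err_free_key; /* the key must be released on this path too */

	*err = 0;
	return mr;

err_free_key:
	demo_free_key(mr->key);
err_free_mr:
	free(mr);
	*err = ret;
	return NULL;
}
```

Ordering the labels so that each one falls through to the cleanup of earlier steps keeps every failure point from leaking what was allocated before it.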

Fixes: 9b2cf76c ("RDMA/hns: Optimize PBL buffer allocation process")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221119070834.48502-1-shaozhengchao@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhou Juan <nnuzj07170227@163.com>

064d39fb

RDMA/hns: Disable local invalidate operation · 17b6c197

Committed by Yangyang Li on Oct 24, 2022

mainline inclusion
from mainline-v6.1-rc4
commit 9e272ed6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6ROBG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e272ed69ad6f6952fafd0599d6993575512408e

----------------------------------------------------------------------

When function reset and local invalidate are mixed, HNS RoCEE may hang.
Before introducing the cause of the problem, two hardware internal
concepts need to be introduced:

    1. Execution queue: The queue of instructions executed by hardware.
    Function reset and local invalidate are queued for execution in this
    queue.

    2. Local queue: A queue that stores local operation instructions. The
    instructions in the local queue will be sent to the execution queue
    for execution. The instructions in the local queue will not be removed
    until the execution is completed.

The reason for the problem is as follows:

    1. There is a function reset instruction in the execution queue, which
    is currently being executed. A necessary condition for the successful
    execution of function reset is: the hardware pipeline needs to empty
    the instructions that were not completed before;

    2. A local invalidate instruction at the head of the local queue is
    sent to the execution queue. Now there are two instructions in the
    execution queue, the first is the function reset instruction, and the
    second is the local invalidate instruction, which will be executed in
    sequence;

    3. The user has issued many local invalidate operations, causing the
    local queue to be filled up.

    4. The user still has a new local operation command queuing to enter
    the local queue. But the local queue is full and cannot receive new
    instructions, so this instruction is temporarily stored in the
    hardware pipeline.

    5. The function reset keeps waiting for the hardware pipeline stage to
    be drained of earlier instructions. The hardware pipeline stage still
    caches a local invalidate instruction, so the function reset cannot be
    completed, and the instructions after it cannot be executed.

These factors together cause the execution logic deadlock of the hardware,
and the consequence is that RoCEE will not have any response.  Considering
that the local operation command may potentially cause RoCEE to hang, this
feature is no longer supported.

Fixes: e93df010 ("RDMA/hns: Support local invalidate for hip08 in kernel space")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Link: https://lore.kernel.org/r/20221024083814.1089722-2-xuhaoyue1@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhou Juan <nnuzj07170227@163.com>

17b6c197

28 Mar, 2023 1 commit

RDMA/hns: Add new command to support query vf caps · 1890b7dd

Committed by Yixing Liu on Mar 23, 2023

maillist inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6K9B6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=faa63656fc36

---------------------------------------------------------------

The current resource query for vf caps is driven by the driver,
which is unreasonable.

This patch adds a new command HNS_ROCE_OPC_QUERY_VF_CAPS_NUM
to support obtaining vf caps information from firmware.
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Link: https://lore.kernel.org/r/20230304091555.2241298-2-xuhaoyue1@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Juan Zhou <zhoujuan51@h-partners.com>

1890b7dd

13 Mar, 2023 1 commit

RDMA/hns: Support congestion control algorithm parameter configuration · 523f34d8

Committed by Chengchang Tang on Mar 13, 2023

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6J5O7

---------------------------------------------------------------

hns roce supports 4 congestion control algorithms. Each algorithm
involves multiple parameters. This patch adds a port sysfs directory
for each algorithm, which allows users to modify the parameters
of these algorithms.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>

523f34d8

09 Mar, 2023 1 commit

RDMA/hns: Add dfx cnt stats · d5a4ca75

Committed by Chengchang Tang on Mar 01, 2023

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6GSZL

---------------------------------------------------------------

Add more dfx counters to help diagnosis. These stats can be obtained
through sysfs or rdmatool.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>

d5a4ca75

02 Mar, 2023 1 commit

RDMA/hns: Support hns HW stats · 05491dda

Committed by Chengchang Tang on Mar 01, 2023

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6GSZL

---------------------------------------------------------------

Support querying hns HW stats to help debug several issues.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>

05491dda

15 Feb, 2023 3 commits

RDMA/hns: fix the error of RoCE VF based on RoCE Bonding PF · a1598d86

Committed by Junxian Huang on Dec 15, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6F1IQ

---------------------------------------------------------------

In this patch, the following constraints are added:
1. RoCE Bonding cannot be set with a PF which enables VF;
2. A PF in RoCE Bonding cannot enable RoCE VF.

Fixes: 6ba084e0 ("RDMA/hns: add constraints for bonding-unsupported situations")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a1598d86

RDMA/hns: Fix AH attr queried by query_qp · 93b1c492

Committed by Chengchang Tang on Dec 15, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6F3ZA

---------------------------------------------------------------------------
The queried AH attr is invalid. This patch fixes it.

This problem was found by the rdma-core test test_mr_rereg_pd:

ERROR: test_mr_rereg_pd (tests.test_mr.MRTest)
Test that cover rereg MR's PD with this flow:
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests/test_mr.py", line 157, in test_mr_rereg_pd
    self.restate_qps()
  File "./tests/test_mr.py", line 113, in restate_qps
    self.server.qp.to_rts(self.server_qp_attr)
  File "qp.pyx", line 1137, in pyverbs.qp.QP.to_rts
  File "qp.pyx", line 1123, in pyverbs.qp.QP.to_rtr
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to modify QP state to RTR.
Errno: 22, Invalid argument

Fixes: 926a01dc ("RDMA/hns: Add QP operations support for hip08 SoC")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

93b1c492

RDMA/hns: Kernel notify usr space to stop ring db · e8b1fec4

Committed by Guofeng Yue on Dec 15, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6F3ZU

---------------------------------------------------------------

In the reset scenario, if the kernel receives the reset signal,
it needs to notify user space to stop ringing the doorbell.
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e8b1fec4

13 Feb, 2023 5 commits

RDMA/mlx5: Set local port to one when accessing counters · e007189e

Committed by Chris Mi on Feb 13, 2023

stable inclusion
from stable-v5.10.143
commit a00b1b10e0a60474c2a60ef84f4d4736f7051535
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a00b1b10e0a60474c2a60ef84f4d4736f7051535

--------------------------------

[ Upstream commit 74b30b3a ]

When accessing the Ports Performance Counters Register (PPCNT),
the local port must be one if the HCA is a Function-Per-Port HCA
whose HCA_CAP.num_ports is 1.

The offending patch can change the local port to other values
when accessing PPCNT after enabling switchdev mode. The following
syndrome will be printed:

 # cat /sys/class/infiniband/rdmap4s0f0/ports/2/counters/*
 # dmesg
 mlx5_core 0000:04:00.0: mlx5_cmd_check:756:(pid 12450): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x1e5585)

Fix it by setting local port to one for Function-Per-Port HCA.

Fixes: 210b1f78 ("IB/mlx5: When not in dual port RoCE mode, use provided port as native")
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Link: https://lore.kernel.org/r/6c5086c295c76211169e58dbd610fb0402360bab.1661763459.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

e007189e

IB/core: Fix a nested dead lock as part of ODP flow · 1686ee21

Committed by Yishai Hadas on Feb 13, 2023

stable inclusion
from stable-v5.10.143
commit e8de6cb5755eae7b793d8c00c8696c8667d44a7f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e8de6cb5755eae7b793d8c00c8696c8667d44a7f

--------------------------------

[ Upstream commit 85eaeb50 ]

Fix a nested dead lock as part of ODP flow by using mmput_async().

From the call trace below [1], we can see that calling mmput() while
holding umem_odp->umem_mutex, as required by
ib_umem_odp_map_dma_and_lock(), might trigger, in the same task,
exit_mmap()->__mmu_notifier_release()->mlx5_ib_invalidate_range(), which
may deadlock when trying to lock the same mutex.

Switching to mmput_async() solves the problem, as the above
exit_mmap() flow will be called in another task and executed once
the lock is available.

[1]
[64843.077665] task:kworker/u133:2  state:D stack:    0 pid:80906 ppid:     2 flags:0x00004000
[64843.077672] Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
[64843.077719] Call Trace:
[64843.077722]  <TASK>
[64843.077724]  __schedule+0x23d/0x590
[64843.077729]  schedule+0x4e/0xb0
[64843.077735]  schedule_preempt_disabled+0xe/0x10
[64843.077740]  __mutex_lock.constprop.0+0x263/0x490
[64843.077747]  __mutex_lock_slowpath+0x13/0x20
[64843.077752]  mutex_lock+0x34/0x40
[64843.077758]  mlx5_ib_invalidate_range+0x48/0x270 [mlx5_ib]
[64843.077808]  __mmu_notifier_release+0x1a4/0x200
[64843.077816]  exit_mmap+0x1bc/0x200
[64843.077822]  ? walk_page_range+0x9c/0x120
[64843.077828]  ? __cond_resched+0x1a/0x50
[64843.077833]  ? mutex_lock+0x13/0x40
[64843.077839]  ? uprobe_clear_state+0xac/0x120
[64843.077860]  mmput+0x5f/0x140
[64843.077867]  ib_umem_odp_map_dma_and_lock+0x21b/0x580 [ib_core]
[64843.077931]  pagefault_real_mr+0x9a/0x140 [mlx5_ib]
[64843.077962]  pagefault_mr+0xb4/0x550 [mlx5_ib]
[64843.077992]  pagefault_single_data_segment.constprop.0+0x2ac/0x560 [mlx5_ib]
[64843.078022]  mlx5_ib_eqe_pf_action+0x528/0x780 [mlx5_ib]
[64843.078051]  process_one_work+0x22b/0x3d0
[64843.078059]  worker_thread+0x53/0x410
[64843.078065]  ? process_one_work+0x3d0/0x3d0
[64843.078073]  kthread+0x12a/0x150
[64843.078079]  ? set_kthread_struct+0x50/0x50
[64843.078085]  ret_from_fork+0x22/0x30
[64843.078093]  </TASK>

Fixes: 36f30e48 ("IB/core: Improve ODP to use hmm_range_fault()")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/74d93541ea533ef7daec6f126deb1072500aeb16.1661251841.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

1686ee21

RDMA/siw: Pass a pointer to virt_to_page() · b2a0ddc6

Committed by Linus Walleij on Feb 13, 2023

stable inclusion
from stable-v5.10.143
commit 047e66867eb6ffc4dcbf145b13f9943990f1ca88
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=047e66867eb6ffc4dcbf145b13f9943990f1ca88

--------------------------------

[ Upstream commit 0d1b756a ]

Functions that work on a pointer to virtual memory such as
virt_to_pfn() and users of that function such as
virt_to_page() are supposed to be passed a pointer to virtual
memory, ideally a (void *) or other pointer. However since
many architectures implement virt_to_pfn() as a macro,
this function becomes polymorphic and accepts both a
(unsigned long) and a (void *).

If we instead implement a proper virt_to_pfn(void *addr)
function the following happens (occurred on arch/arm):

drivers/infiniband/sw/siw/siw_qp_tx.c:32:23: warning: incompatible
  integer to pointer conversion passing 'dma_addr_t' (aka 'unsigned int')
  to parameter of type 'const void *' [-Wint-conversion]
drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: warning: passing argument
  1 of 'virt_to_pfn' makes pointer from integer without a cast
  [-Wint-conversion]
drivers/infiniband/sw/siw/siw_qp_tx.c:538:36: warning: incompatible
  integer to pointer conversion passing 'unsigned long long'
  to parameter of type 'const void *' [-Wint-conversion]

Fix this with an explicit cast. In one case where the SIW
SGE uses an unaligned u64 we need a double cast modifying the
virtual address (va) to a platform-specific uintptr_t before
casting to a (void *).
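
The double cast described above can be shown in plain C. The helper name u64_to_ptr is hypothetical and used only for illustration. The point is that a 64-bit integer is first narrowed to uintptr_t, the platform's pointer-sized integer, before being converted to a pointer, which avoids an integer-to-pointer size mismatch on 32-bit platforms.

```c
#include <stdint.h>

/* Convert a 64-bit address value to a pointer via uintptr_t. */
static inline void *u64_to_ptr(uint64_t va)
{
	/* Step 1: u64 -> uintptr_t (pointer-sized integer on this platform).
	 * Step 2: uintptr_t -> void *. */
	return (void *)(uintptr_t)va;
}
```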

Fixes: b9be6f18 ("rdma/siw: transmit path")
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220902215918.603761-1-linus.walleij@linaro.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

b2a0ddc6

RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shift · b64a3e46

Committed by Wenpeng Liang on Feb 13, 2023

stable inclusion
from stable-v5.10.143
commit e198c0857032ceac643f2c58a114cf1ded672b28
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e198c0857032ceac643f2c58a114cf1ded672b28

--------------------------------

[ Upstream commit 0c8b5d62 ]

The value of qp->rq.wqe_shift of HIP08 is always determined by the number
of sge. So delete the wrong branch.

Fixes: cfc85f3e ("RDMA/hns: Add profile support for hip08 driver")
Fixes: 926a01dc ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/20220829105021.1427804-3-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

b64a3e46

RDMA/cma: Fix arguments order in net device validation · 900af2b5

Committed by Michael Guralnik on Feb 13, 2023

stable inclusion
from stable-v5.10.143
commit e9ea271c2e43af02c9d9c876519c078b09beeb5c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e9ea271c2e43af02c9d9c876519c078b09beeb5c

--------------------------------

[ Upstream commit 27cfde79 ]

Fix the order of source and destination addresses when resolving the
route between server and client to validate use of correct net device.

The reverse order we had so far didn't actually validate the net device
as the server would try to resolve the route to itself, thus always
getting the server's net device.

The issue was discovered when running cm applications on a single host
between 2 interfaces with same subnet and source based routing rules.
When resolving the reverse route the source based route rules were
ignored.

Fixes: f887f2ac ("IB/cma: Validate routing of incoming requests")
Link: https://lore.kernel.org/r/1c1ec2277a131d277ebcceec987fd338d35b775f.1661251872.git.leonro@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

900af2b5

04 Jan, 2023 2 commits

scsi: iscsi: remove .unbind_conn from iscsi_transport · 2eabc50d

Committed by Li Nan on Jan 04, 2023

hulk inclusion
category: bugfix
bugzilla: 188176, https://gitee.com/openeuler/kernel/issues/I67294
CVE: NA

-------------------------------

Commit 891e2639 ("scsi: iscsi: Stop queueing during ep_disconnect")
introduces .unbind_conn to fix the race between __iscsi_conn_send_pdu()
and .ep_disconnect, but it also introduces a KABI problem.

The issue only affects offload iscsi drivers, not iscsi_tcp, so we tried
to revert the commit. However, it is just one patch in a patchset: the
following patches depend on it, and those patches fix problems related
to iscsi_tcp.

So just revert it manually by removing .unbind_conn from
iscsi_transport.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2eabc50d

Revert "scsi: iscsi: fix kabi broken in struct iscsi_transport" · a372f6e3

Committed by Li Nan on Jan 04, 2023

hulk inclusion
category: bugfix
bugzilla: 188176, https://gitee.com/openeuler/kernel/issues/I67294
CVE: NA

--------------------------------

This reverts commit 230035ef.

Drivers that use tgt_dscvr fail to compile because the API has changed.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a372f6e3

08 Dec, 2022 4 commits

RDMA/hns: adjust the structure of RoCE bonding driver · 646b97db

Committed by Junxian Huang on Dec 08, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63IM5

---------------------------------------------------------------------------

This patch deletes some used variables, encapsulates repeated code in a
new function get_upper_dev_from_ndev() and adjusts the structure of
hns_roce_bond_event() to make the logic clearer.

Fixes: e62a2027 ("RDMA/hns: support RoCE bonding")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

646b97db

RDMA/hns: add constraints for bonding-unsupported situations · 6ba084e0

Committed by Junxian Huang on Dec 08, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63IM5

---------------------------------------------------------------------------

With this patch, the RoCE driver will not set a bond when the NIC sets
a bond in a mode that RoCE bonding does not support, with VF slaves, or
with slaves from different I/O dies.

Fixes: e62a2027 ("RDMA/hns: support RoCE bonding")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6ba084e0

RDMA/hns: fix the error of missing GID in RoCE bonding mode 1 · 4920275a

Committed by Junxian Huang on Dec 08, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63IM5

---------------------------------------------------------------------------

In the existing hns RoCE code, ib_dev->ops.get_netdev is not set, which
causes only one slave to be assigned an IP-based GID in RoCE bonding
mode 1.

This patch adds hns_roce_get_netdev() and sets it as
ib_dev->ops.get_netdev so that IB-Core can assign GIDs to different
net devices according to the active slave in mode 1.

Fixes: e62a2027 ("RDMA/hns: support RoCE bonding")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

4920275a

RDMA/hns: fix possible dead lock when setting RoCE Bonding · b6623fd2

Committed by Junxian Huang on Dec 08, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63IM5

---------------------------------------------------------------------------

When setting RoCE Bonding, a new hr_dev will be registered as
"hns_bond_xx". In the process, the bonding thread will try to acquire
rtnl_lock() while holding roce_bond_mutex. However, it's possible that
another thread running bond_netdev_notify_work() grabs rtnl_lock() before
the bonding thread and calls the bonding notifier function, in which the
thread will try to acquire roce_bond_mutex, finally leading to a deadlock.

As the event informer notifier_call_chain() will not call the next notifier
function until the current one returns, there is no need to use a mutex in
the bonding notifier function. Thus, remove roce_bond_mutex in
hns_roce_bond_event() so that the deadlock is avoided.

Fixes: e62a2027 ("RDMA/hns: support RoCE bonding")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

b6623fd2

30 Nov, 2022 10 commits

RDMA/hns: Fixes concurrent ressetting and post_recv in DCA mode · 21a0d4fe

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

read_poll_timeout() in MBX may sleep, and the probability is higher
during reset. In other words, it is not safe to use MBX in an atomic
context.

To ensure the atomicity of QPC setup, DCA uses locks to protect the QPC
setup operation in the DCA ATTACH_MEM phase (i.e.
post_send/post_recv). This results in the above-mentioned problem at
reset.

Replace read_poll_timeout() with read_poll_timeout_atomic() to avoid
sleeping in an MBX operation in an atomic context.
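
The difference between the two polling styles is that the atomic variant only busy-waits and never yields the CPU. The userspace sketch below illustrates that idea with a bounded spin loop. It is a simplification, not the kernel macro, and all names in it (poll_reg_atomic, read_reg_fn, fake_reg, fake_read) are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll a status register until any bit in done_mask is set, spinning at
 * most max_spins times. Nothing in this loop sleeps, so it would be
 * safe to call while holding a spinlock.
 */
static int poll_reg_atomic(read_reg_fn read_reg, void *ctx,
			   uint32_t done_mask, unsigned int max_spins)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		if (read_reg(ctx) & done_mask)
			return 0;
		/* Kernel code would insert a short busy-wait delay here. */
	}
	return -ETIMEDOUT;
}

/* Fake register for demonstration: reports "done" after N reads. */
struct fake_reg {
	unsigned int reads_left;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *reg = ctx;

	if (reg->reads_left == 0)
		return 1; /* done bit set */
	reg->reads_left--;
	return 0;
}
```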

Fixes: 306b8c76 ("RDMA/hns: Do not destroy QP resources in the hw resetting phase")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

21a0d4fe

RDMA/hns: Optimize user DCA perfermance by sharing DCA status · d3caaebd

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

Use the shared memory to store the DCA status by getting the max qp num
from uctx alloc param.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d3caaebd

RDMA/hns: Add debugfs support for DCA · a2178118

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

This patch synchronizes the DCA code from CI and is based on RFC v2 from
the community, including DCA kernel support and debugfs support.

Add a group of debugfs files for DCA memory pool statistics.

The debugfs entries for DCA memory statistics include:
hns_roce/<ibdev_name>/dca/qp : show all DCA QPs for each device.
hns_roce/<ibdev_name>/dca/pool : show all DCA mem for each device.
hns_roce/<ibdev_name>/<pid>/qp : show all active DCA QPs for one process.
hns_roce/<ibdev_name>/<pid>/mstats : show DCA mem info for one process.
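
As a rough illustration of the layout listed above, the following sketch builds the four entry paths for one device. It assumes debugfs is mounted at its usual location, /sys/kernel/debug; the helper name `dca_debugfs_paths` and the dictionary keys are hypothetical, and the device name used in the usage note is only an example.

```python
from pathlib import PurePosixPath

# Assumption: debugfs is mounted at the conventional location.
DEBUGFS_ROOT = PurePosixPath("/sys/kernel/debug")


def dca_debugfs_paths(ibdev_name, pid=None):
    """Return the DCA debugfs entry paths for one RoCE device.

    The per-process entries are included only when a pid is given.
    """
    base = DEBUGFS_ROOT / "hns_roce" / ibdev_name
    paths = {
        "qp": base / "dca" / "qp",      # all DCA QPs of the device
        "pool": base / "dca" / "pool",  # all DCA mem of the device
    }
    if pid is not None:
        paths["proc_qp"] = base / str(pid) / "qp"          # active DCA QPs of one process
        paths["proc_mstats"] = base / str(pid) / "mstats"  # DCA mem info of one process
    return {name: str(path) for name, path in paths.items()}
```

For example, `dca_debugfs_paths("hns_0", pid=1234)["proc_mstats"]` yields the mstats path for process 1234 on a device named hns_0.
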
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a2178118

RDMA/hns: Add DCA support for kernel space · 12aa71f8

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

This patch adds DCA support for kernel space.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

12aa71f8

RDMA/hns: Add method to query WQE buffer's address · f0384ddc

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

If a uQP works in DCA mode, the userspace driver needs to get the buffer's
address in the DCA memory pool by calling the 'HNS_IB_METHOD_DCA_MEM_QUERY'
method after the QP has been attached by calling the
'HNS_IB_METHOD_DCA_MEM_ATTACH' method.

This method returns the DCA mem object's key and the offset, which let the
userspace driver compute the WQE's virtual address in the DCA memory pool.
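
How a key and an offset might be turned into a virtual address can be sketched as follows. This is an assumption about the userspace side, not code from the patch: the table `registered_bufs`, mapping each DCA mem object's key to the base address and size of the buffer the process registered, and the function name are hypothetical.

```python
def wqe_virtual_address(registered_bufs, key, offset):
    """Resolve a (key, offset) pair to a virtual address.

    registered_bufs is a hypothetical table kept by the userspace
    driver: key -> (base_address, size_in_bytes).
    """
    base, size = registered_bufs[key]
    if not 0 <= offset < size:
        raise ValueError("offset lies outside the DCA mem object")
    return base + offset
```

With a table such as `{7: (0x1000, 0x4000)}`, key 7 and offset 0x200 resolve to address 0x1200.
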
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

f0384ddc

RDMA/hns: Add method to detach WQE buffer · 0273952c

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

If a uQP works in DCA mode, the userspace driver needs to drop the WQE
buffer by calling the 'HNS_IB_METHOD_DCA_MEM_DETACH' method when the QP's
CI is equal to PI. This means the hns ROCEE will not access the WQE
buffer at this time, so the userspace driver can free it.

This method starts a worker queue to recycle the WQE buffer in kernel
space. If the WQE buffer is indeed not being accessed by the hns ROCEE,
the worker marks the pages as free in the DCA memory pool.
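
The two checks described above (CI equal to PI on the userspace side, then a kernel-side worker that frees pages only when the hardware is idle) can be modelled in a few lines. All names here are hypothetical, and the pool is reduced to a dictionary mapping page index to owning QP number, with None meaning free.

```python
from dataclasses import dataclass


@dataclass
class WorkQueue:
    pi: int = 0  # producer index: advanced when a WQE is posted
    ci: int = 0  # consumer index: advanced when a WQE completes


def may_detach(wq):
    """Userspace check: nothing is outstanding when CI has caught up with PI."""
    return wq.ci == wq.pi


def recycle_pages(page_owner, qpn, hw_idle):
    """Worker model: mark the QP's pages free only if the hardware is idle.

    page_owner maps page index -> owning QP number, or None when free.
    Returns the number of pages released.
    """
    if not hw_idle:
        return 0  # buffer may still be accessed, leave it attached
    freed = 0
    for page, owner in page_owner.items():
        if owner == qpn:
            page_owner[page] = None
            freed += 1
    return freed
```

Keeping the second check in the worker means a stale or wrong detach request from userspace cannot free pages that are still in use.
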
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0273952c

RDMA/hns: Setup the configuration of WQE addressing to QPC · 0cf17392

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

Add a new command to update the configuration of WQE buffer addressing to
QPC in DCA mode.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0cf17392

RDMA/hns: Add method for attaching WQE buffer · d8cca476

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

If a uQP works in DCA mode, the userspace driver needs to configure the WQE
buffer by calling the 'HNS_IB_METHOD_DCA_MEM_ATTACH' method before filling
the WQE. This method allocates a group of pages from the DCA memory pool
and writes the addressing configuration to the QPC.
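
The allocation step can be pictured with a toy pool. This sketch is an assumption about the general shape of the operation, not the driver's algorithm: `attach_pages` is a hypothetical name, the pool is a dictionary from page index to owning QP number (None when free), and the returned page list stands in for the addressing configuration that would be written to the QPC.

```python
def attach_pages(page_owner, qpn, count):
    """Claim `count` free pages from the pool for QP `qpn`.

    page_owner maps page index -> owning QP number, or None when free.
    Returns the claimed page indices in ascending order; in this model
    that list plays the role of the addressing configuration.
    """
    free = sorted(page for page, owner in page_owner.items() if owner is None)
    if len(free) < count:
        # Nothing is claimed on failure, so the pool stays consistent.
        raise MemoryError("not enough free pages in the DCA pool")
    chosen = free[:count]
    for page in chosen:
        page_owner[page] = qpn
    return chosen
```

Because pages are claimed only after the size check passes, a failed attach leaves the pool unchanged and the caller can retry after the pool grows.
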
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d8cca476

RDMA/hns: Configure DCA mode for the userspace QP · 40e4b148

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

If the userspace driver assigns NULL to the 'buf_addr' field of
'struct hns_roce_ib_create_qp' when creating a QP, the kernel driver
needs to set up the QP in DCA mode. So add a QP capability bit to the
response to indicate to the userspace driver that DCA mode has been
enabled.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

40e4b148

RDMA/hns: Add method for shrinking DCA memory pool · bca9ff27

Committed by Chengchang Tang on Nov 30, 2022

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I63KVU

----------------------------------------------------------

If no QP is using a DCA mem object, the userspace driver can destroy it.
So add a new method 'HNS_IB_METHOD_DCA_MEM_SHRINK' to allow the userspace
driver to remove an object from the DCA memory pool.

Once a DCA mem object has been shrunk, the userspace driver can destroy it
with the 'HNS_IB_METHOD_DCA_MEM_DEREG' method and free the buffer that was
allocated in userspace.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

bca9ff27

openeuler / Kernel synced successfully nearly 2 years ago