!538 Disable local invalidate operation, fix memory leak and error code of CMD
Merge Pull Request from: @stinft
Bugfix information:
1. RDMA/hns: Disable local invalidate operation
When a function reset and local invalidate operations are in flight at
the same time, HNS RoCEE may hang. Two hardware-internal concepts are
needed to explain the cause:
1. Execution queue: The queue of instructions being executed by the
hardware. Function reset and local invalidate instructions are both
queued here for execution.
2. Local queue: A queue that stores local operation instructions. The
instructions in the local queue are sent to the execution queue for
execution. An instruction is not removed from the local queue until
its execution has completed.
The problem arises as follows:
1. A function reset instruction is in the execution queue and is
currently being executed. A function reset can complete only after
the hardware pipeline has drained all instructions issued before it;
2. A local invalidate instruction at the head of the local queue is
sent to the execution queue. The execution queue now holds two
instructions, the function reset followed by the local invalidate,
which will be executed in sequence;
3. The user has issued many local invalidate operations, filling up
the local queue;
4. The user issues another local operation, which must enter the local
queue. Because the local queue is full and cannot accept new
instructions, this instruction is held in the hardware pipeline;
5. The function reset is waiting for the hardware pipeline to drain,
but the pipeline is still holding a local invalidate instruction. The
function reset therefore cannot complete, and the instructions queued
behind it cannot be executed.
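The circular wait in the steps above can be sketched as a toy model in
user-space C. This is only an illustration of the reasoning: the struct,
field and function names are hypothetical and do not correspond to real
hardware registers or driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the hardware state described above (all names hypothetical). */
struct rocee_state {
    bool reset_executing;   /* a function reset heads the execution queue */
    int local_queue_used;   /* entries currently held in the local queue */
    int local_queue_depth;  /* capacity of the local queue */
    int pipeline_pending;   /* local ops parked in the hardware pipeline */
};

/* The reset completes only once the pipeline holds no earlier instruction. */
static bool reset_can_complete(const struct rocee_state *s)
{
    return s->pipeline_pending == 0;
}

/* A parked instruction leaves the pipeline only if the local queue has room. */
static bool pipeline_can_drain(const struct rocee_state *s)
{
    return s->local_queue_used < s->local_queue_depth;
}

/*
 * A local queue entry is removed only after it has executed, and it cannot
 * execute while an unfinished reset sits ahead of it in the execution queue.
 */
static bool local_queue_can_retire(const struct rocee_state *s)
{
    return !s->reset_executing;
}

/* Deadlock: each of the three stages is waiting on one of the others. */
static bool is_deadlocked(const struct rocee_state *s)
{
    assert(s->local_queue_used <= s->local_queue_depth);

    return s->reset_executing &&
           !reset_can_complete(s) &&
           !pipeline_can_drain(s) &&
           !local_queue_can_retire(s);
}
```

With a reset executing, a full local queue and one instruction parked in
the pipeline, is_deadlocked() returns true. Freeing a single local queue
slot breaks the cycle in this model.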
Together, these factors deadlock the hardware execution logic, and as a
result RoCEE stops responding. Because local operation commands can
cause RoCEE to hang, this feature is no longer supported.
bugzilla:#I6ROBG
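One way to stop supporting an operation is to reject it when a work
request is validated. The sketch below shows that idea only. The enum,
the function name and the choice of -EOPNOTSUPP are assumptions made for
illustration and are not taken from the hns driver.

```c
#include <errno.h>

/* Hypothetical work request opcodes; not the real verbs enum. */
enum mock_wr_opcode {
    MOCK_WR_SEND,
    MOCK_WR_RDMA_WRITE,
    MOCK_WR_RDMA_READ,
    MOCK_WR_LOCAL_INV,
};

/* Returns 0 if the opcode may be posted, or a negative errno otherwise. */
static int mock_check_opcode(enum mock_wr_opcode op)
{
    switch (op) {
    case MOCK_WR_SEND:
    case MOCK_WR_RDMA_WRITE:
    case MOCK_WR_RDMA_READ:
        return 0;
    case MOCK_WR_LOCAL_INV:
        /* Disabled: may hang the hardware when mixed with a reset. */
        return -EOPNOTSUPP;
    default:
        return -EINVAL;
    }
}
```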
2. RDMA/hns: fix memory leak in hns_roce_alloc_mr()
When hns_roce_mr_enable() fails in hns_roce_alloc_mr(), mr_key is not
released.
bugzilla:#I6RP11
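This class of leak is usually avoided with goto-based unwinding, where
each failure jumps to a label that releases everything acquired so far.
The following is a minimal user-space sketch of that pattern. All mock_*
names, the key counter and the failure knob are hypothetical stand-ins,
not the driver's actual functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory region; not the real hns structure. */
struct mock_mr {
    int key;
};

static int keys_in_use;          /* outstanding keys, so a leak is visible */
static bool enable_should_fail;  /* knob to force the failure path */

static int mock_alloc_key(struct mock_mr *mr)
{
    mr->key = ++keys_in_use;
    return 0;
}

static void mock_free_key(struct mock_mr *mr)
{
    mr->key = 0;
    keys_in_use--;
}

static int mock_mr_enable(struct mock_mr *mr)
{
    (void)mr;
    return enable_should_fail ? -EIO : 0;
}

/* Returns 0 and stores the MR in *out, or a negative errno with nothing leaked. */
static int mock_alloc_mr(struct mock_mr **out)
{
    struct mock_mr *mr;
    int ret;

    assert(out != NULL);

    mr = calloc(1, sizeof(*mr));
    if (!mr)
        return -ENOMEM;

    ret = mock_alloc_key(mr);
    if (ret)
        goto err_free_mr;

    ret = mock_mr_enable(mr);
    if (ret)
        goto err_free_key;  /* skipping this unwind step would leak the key */

    *out = mr;
    return 0;

err_free_key:
    mock_free_key(mr);
err_free_mr:
    free(mr);
    return ret;
}
```

The labels are ordered in reverse of acquisition, so a failure at any
step falls through exactly the releases it needs.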
3. RDMA/hns: Fix error code of CMD
The error code is always EIO when a CMD fails to execute. This patch
converts the error status reported by firmware to the corresponding
Linux errno.
bugzilla: #I6RO6S
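A status-to-errno conversion of this kind can be written as a small
lookup table. The sketch below is illustrative only: the firmware status
values and the mapping are hypothetical, since the real codes are not
listed in this message. Unknown statuses fall back to -EIO.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical firmware status values; the real codes are not given here. */
enum mock_fw_status {
    MOCK_FW_SUCCESS = 0,
    MOCK_FW_NO_AUTH,
    MOCK_FW_NOT_SUPPORTED,
    MOCK_FW_BAD_PARAM,
    MOCK_FW_TIMEOUT,
};

struct status_errno_pair {
    enum mock_fw_status status;
    int err;  /* 0 or a negative errno */
};

/* Assumed mapping, chosen for illustration only. */
static const struct status_errno_pair status_map[] = {
    { MOCK_FW_SUCCESS,       0 },
    { MOCK_FW_NO_AUTH,       -EPERM },
    { MOCK_FW_NOT_SUPPORTED, -EOPNOTSUPP },
    { MOCK_FW_BAD_PARAM,     -EINVAL },
    { MOCK_FW_TIMEOUT,       -ETIMEDOUT },
};

/* Converts a firmware status to an errno; unknown values become -EIO. */
static int mock_fw_status_to_errno(unsigned int status)
{
    size_t i;

    for (i = 0; i < sizeof(status_map) / sizeof(status_map[0]); i++) {
        if ((unsigned int)status_map[i].status == status)
            return status_map[i].err;
    }

    return -EIO;
}
```

Keeping -EIO as the fallback means callers that only test for a nonzero
return see the same behavior as before for statuses the table does not
cover.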
Link:https://gitee.com/openeuler/kernel/pulls/538
Reviewed-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>