    RDMA/hns: Do not destroy QP resources in the hw resetting phase · b0969f83
    Committed by Yangyang Li
    When hns_roce_v2_destroy_qp() is called, the brief calling process of the
    driver is as follows:
    
     ......
     hns_roce_v2_destroy_qp
       hns_roce_v2_qp_modify
         hns_roce_cmd_mbox
       hns_roce_qp_destroy
    
    If hns_roce_cmd_mbox() detects during its execution that the hardware is
    being reset, the driver cannot get the return value from the hardware
    (the firmware cannot respond to the driver's mailbox during the hardware
    reset phase).
    
    The driver needs to wait for the hardware reset to complete before
    continuing to execute hns_roce_qp_destroy(); otherwise the driver may
    release the resources while the hardware is still accessing them. To fix
    this problem, HNS RoCE adds code that waits for the hardware reset to
    complete.
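
The ordering described above (wait for the reset to finish, then release) can be illustrated with a minimal user-space sketch. This is not the driver's code: `reset_done_fn`, `wait_for_reset_done`, `struct toy_hw`, and `toy_reset_done` are hypothetical names, and the loop uses a plain poll budget where a real driver would presumably sleep between polls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate type: returns true once the hardware reset is done. */
typedef bool (*reset_done_fn)(void *ctx);

/*
 * Poll the predicate up to max_polls times.
 * Returns true if the reset was observed to complete within the budget,
 * false otherwise. A real driver would likely sleep between polls.
 */
static bool wait_for_reset_done(reset_done_fn done, void *ctx,
                                unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (done(ctx))
            return true;
    }
    return false;
}

/* Toy hardware model (assumption): reports "done" after polls_left polls. */
struct toy_hw {
    unsigned int polls_left;
};

static bool toy_reset_done(void *ctx)
{
    struct toy_hw *hw = ctx;

    if (hw->polls_left == 0)
        return true;
    hw->polls_left--;
    return false;
}
```

A caller would release QP resources only after `wait_for_reset_done()` returns true. What to do when the budget runs out is a policy decision this sketch leaves open.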
    
    The original interface get_hw_reset_stat() returns only the instantaneous
    state of the hardware reset, which cannot accurately reflect whether the
    hardware reset has completed, so it is replaced with the ae_dev_reset_cnt
    interface.
    
    The hardware reset is complete once the value returned by the
    ae_dev_reset_cnt interface is greater than the reset_cnt value originally
    recorded by the driver.
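
The completion check can be written as a single comparison. The sketch below is a simplified illustration under stated assumptions: `struct fake_dev` and `hw_reset_completed` are hypothetical and do not mirror the driver's real structures; `hw_reset_cnt` stands in for whatever the ae_dev_reset_cnt interface returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state, not the driver's real struct. */
struct fake_dev {
    unsigned long hw_reset_cnt;    /* value an ae_dev_reset_cnt-style query would return */
    unsigned long saved_reset_cnt; /* reset_cnt recorded earlier by the driver */
};

/*
 * The reset counts as complete only when the counter has moved past the
 * recorded value. An equal value means no reset has finished since the
 * driver recorded reset_cnt.
 */
static bool hw_reset_completed(const struct fake_dev *dev)
{
    assert(dev != NULL);
    return dev->hw_reset_cnt > dev->saved_reset_cnt;
}
```

Comparing a counter against a recorded value avoids the race of sampling an instantaneous state flag, which can read "not resetting" both before a reset starts and after it ends.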
    
    Fixes: 6a04aed6 ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset")
    Link: https://lore.kernel.org/r/20211123142402.26936-1-liangwenpeng@huawei.com
    Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
    Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
hns_roce_hw_v2.c 183.6 KB