1. 25 Jun, 2019 · 1 commit
  2. 21 Jun, 2019 · 3 commits
    • RDMA: Check umem pointer validity prior to release · 836a0fbb
      Committed by Leon Romanovsky
      Update ib_umem_release() to behave similarly to kfree() and allow
      submitting NULL pointer as safe input to this function.
      
      Fixes: a52c8e24 ("RDMA: Clean destroy CQ in drivers do not return errors")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
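The kfree()-like behavior described in this commit can be illustrated with a minimal userspace sketch. The names `struct umem`, `umem_alloc`, `umem_release`, and `release_count` are hypothetical stand-ins chosen for illustration; this is not the kernel's `ib_umem_release()` implementation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's umem object. */
struct umem {
    void *pages;
};

/* Counts real releases; exists only so the behavior can be observed. */
int release_count;

/* Allocate a umem with a backing buffer of `size` bytes; NULL on failure. */
struct umem *umem_alloc(size_t size)
{
    struct umem *umem = malloc(sizeof(*umem));

    if (!umem)
        return NULL;
    umem->pages = malloc(size);
    if (!umem->pages) {
        free(umem);
        return NULL;
    }
    return umem;
}

/* Release a umem. A NULL argument is a safe no-op, like kfree(NULL). */
void umem_release(struct umem *umem)
{
    if (!umem)
        return;
    free(umem->pages);
    free(umem);
    release_count++;
}
```

With a release function of this shape, error paths can call `umem_release(ptr)` unconditionally instead of guarding every call with an `if (ptr)` check.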
    • RDMA/hns: reset function when removing module · 89a6da3c
      Committed by Lang Cheng
      When removing the driver, we need to notify the RoCE engine to
      stop working immediately, and symmetrically recycle the hardware
      resources requested during initialization.
      
      The hardware provides a command called function clear that packages
      these operations, so that the driver only needs to focus on releasing
      the resources it obtained from the operating system.
      This patch implements the call of this command.
      
      Signed-off-by: Lang Cheng <chenglang@huawei.com>
      Signed-off-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/hns: Fix bug when wqe num is larger than 16K · 8d18ad83
      Committed by Lijun Ou
      hip08 supports up to 32768 wqes in one qp. Currently, if the wqe num
      is larger than 16384, the driver triggers a call trace as follows.
      
      [21361.393725] Call trace:
      [21361.398605]  hns_roce_v2_modify_qp+0xbcc/0x1360 [hns_roce_hw_v2]
      [21361.410627]  hns_roce_modify_qp+0x1d8/0x2f8 [hns_roce]
      [21361.420906]  _ib_modify_qp+0x70/0x118
      [21361.428222]  ib_modify_qp+0x14/0x1c
      [21361.435193]  rt_ktest_modify_qp+0xb8/0x650 [rdma_test]
      [21361.445472]  exec_modify_qp_cmd+0x110/0x4d8 [rdma_test]
      [21361.455924]  rt_ktest_dispatch_cmd_3+0xa94/0x2edc [rdma_test]
      [21361.467422]  rt_ktest_dispatch_cmd_2+0x9c/0x108 [rdma_test]
      [21361.478570]  rt_ktest_dispatch_cmd+0x138/0x904 [rdma_test]
      [21361.489545]  rt_ktest_dev_write+0x328/0x4b0 [rdma_test]
      [21361.499998]  __vfs_write+0x38/0x15c
      [21361.506966]  vfs_write+0xa8/0x1a0
      [21361.513586]  ksys_write+0x50/0xb0
      [21361.520206]  sys_write+0xc/0x14
      [21361.526479]  el0_svc_naked+0x30/0x34
      [21361.533622] Code: 1ac10841 d37d7c22 0b000021 d37df021 (f86268c0)
      [21361.545815] ---[ end trace e2a1feb2c3d7f13c ]---
      
      When the wqe num is larger than 16384, hns_roce_table_find returns an
      invalid mtt, which leads to a kernel paging request error if the driver
      tries to access it. This is a design defect of mtt, which cannot support
      the max wqe num of hip08.
      
      This patch fixes it by replacing mtt with mtr for wqe.
      
      Fixes: 926a01dc ("RDMA/hns: Add QP operations support for hip08 SoC")
      Signed-off-by: Xi Wang <wangxi11@huawei.com>
      Signed-off-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  3. 08 Jun, 2019 · 1 commit
    • RDMA/hns: Bugfix for filling the sge of srq · 4f18904c
      Committed by Lijun Ou
      When a user posts a receive to an srq with multiple sges, the hardware
      fetches sges up to the last valid one and counts them according to a
      specific identifier carried in the lkey. For example, when the driver
      fills a wr with fewer sges than the max sge that the user configured
      when creating the srq, the hardware stops fetching sges when it sees
      that specific lkey. However, in the current post srq recv interface
      implementation, the list always ends at the first sge.
      
      Fixes: c7bcb134 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
      Signed-off-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
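One way to picture the sge-filling issue is the sketch below: the end marker has to be written after the last valid sge, not over the first one. The struct layout, the function name `fill_srq_wqe`, and the sentinel value `END_LKEY` are assumptions made for illustration and are not taken from the driver source quoted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel lkey that tells the hardware to stop fetching sges. */
#define END_LKEY 0x100u

/* Hypothetical data segment layout for one scatter/gather entry. */
struct sge {
    uint64_t addr;
    uint32_t len;
    uint32_t lkey;
};

/*
 * Copy num_sge entries into a WQE that has room for max_gs entries.
 * If the WQE is not full, write the end marker at index num_sge
 * (right after the last valid sge). Writing it at index 0 would
 * make the list end at the first sge.
 * Returns 0 on success, -1 if num_sge exceeds max_gs.
 */
int fill_srq_wqe(struct sge *dseg, size_t max_gs,
                 const struct sge *sg_list, size_t num_sge)
{
    size_t i;

    if (num_sge > max_gs)
        return -1;

    for (i = 0; i < num_sge; i++)
        dseg[i] = sg_list[i];

    if (i < max_gs) {
        dseg[i].addr = 0;
        dseg[i].len = 0;
        dseg[i].lkey = END_LKEY;
    }
    return 0;
}
```

For a WQE with room for four sges and a wr that carries two, entries 0 and 1 hold the caller's sges and entry 2 holds the end marker.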
  4. 01 Jun, 2019 · 1 commit
  5. 28 May, 2019 · 2 commits
  6. 03 May, 2019 · 1 commit
  7. 09 Apr, 2019 · 2 commits
  8. 02 Apr, 2019 · 2 commits
  9. 28 Mar, 2019 · 1 commit
  10. 26 Mar, 2019 · 6 commits
  11. 05 Mar, 2019 · 1 commit
  12. 20 Feb, 2019 · 1 commit
    • RDMA/hns: Bugfix for set hem of SCC · 6ac16e40
      Committed by Yangyang Li
      The method of setting hem for the scc context differs from other
      contexts. For scc, the driver should notify the hardware of the detailed
      idx in bt0, while for other contexts it only needs to notify the bt step
      and the hardware calculates the idx.
      
      This fixes the following error when unloading the hip08 driver:
      
      [  123.570768] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
      [  123.579023] {1}[Hardware Error]: event severity: recoverable
      [  123.584670] {1}[Hardware Error]:  Error 0, type: recoverable
      [  123.590317] {1}[Hardware Error]:   section_type: PCIe error
      [  123.595877] {1}[Hardware Error]:   version: 4.0
      [  123.600395] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
      [  123.606562] {1}[Hardware Error]:   device_id: 0000:7d:00.0
      [  123.612034] {1}[Hardware Error]:   slot: 0
      [  123.616120] {1}[Hardware Error]:   secondary_bus: 0x00
      [  123.621245] {1}[Hardware Error]:   vendor_id: 0x19e5, device_id: 0xa222
      [  123.627847] {1}[Hardware Error]:   class_code: 000002
      [  123.632977] hns3 0000:7d:00.0: aer_status: 0x00000000, aer_mask: 0x00000000
      [  123.639928] hns3 0000:7d:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID
      [  123.647400] hns3 0000:7d:00.0: aer_uncor_severity: 0x00000000
      [  123.653136] hns3 0000:7d:00.0: PCI error detected, state(=1)!!
      [  123.658959] hns3 0000:7d:00.0: ROCEE uncorrected RAS error identified
      [  123.665395] hns3 0000:7d:00.0: ROCEE RAS AXI rresp error
      [  123.670713] hns3 0000:7d:00.0: requesting reset due to PCI error
      [  123.676715] hns3 0000:7d:00.0: received reset event , reset type is 5
      [  123.683147] hns3 0000:7d:00.0: AER: Device recovery successful
      [  123.688978] hns3 0000:7d:00.0: PF Reset requested
      [  123.693684] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
      [  123.700633] hns3 0000:7d:00.0: inform reset to vf(1) failded -5!
      
      Fixes: 6a157f7d ("RDMA/hns: Add SCC context allocation support for hip08")
      Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
      Reviewed-by: Yixian Liu <liuyixian@huawei.com>
      Reviewed-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  13. 15 Feb, 2019 · 3 commits
  14. 12 Feb, 2019 · 1 commit
  15. 05 Feb, 2019 · 4 commits
    • RDMA/hns: Fix the chip hanging caused by sending doorbell during reset · d3743fa9
      Committed by Wei Hu (Xavier)
      On the hip08 chip, the chip may hang if a doorbell is sent during
      reset. We can fix it by prohibiting doorbells during reset.
      
      Fixes: 2d407888 ("RDMA/hns: Add support for processing send wr and receive wr")
      Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
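The idea of prohibiting doorbells during reset can be sketched as a guard in front of the doorbell write. Everything here (`struct dev`, the `resetting` flag, `ring_doorbell`) is a hypothetical, single-threaded illustration; the actual driver has to deal with concurrency and with how the reset state is obtained, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device state for illustration. */
struct dev {
    bool resetting;          /* true while a hardware reset is in progress */
    uint32_t db_reg;         /* stands in for the memory-mapped doorbell register */
    unsigned int db_writes;  /* number of doorbells actually written */
};

/*
 * Ring the doorbell unless a reset is in progress.
 * Returns true if the doorbell was written, false if it was suppressed.
 */
bool ring_doorbell(struct dev *dev, uint32_t val)
{
    if (dev->resetting)
        return false;

    dev->db_reg = val;
    dev->db_writes++;
    return true;
}
```

A caller that gets `false` back knows the request never reached the hardware and can fail the operation or retry once the reset has finished.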
    • RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset · 6a04aed6
      Committed by Wei Hu (Xavier)
      On the hip08 chip, the chip may hang and some errors may occur when
      mailbox & doorbell are sent during reset. We can fix it by prohibiting
      mailbox and doorbell while a reset is in progress or has occurred, to
      ensure that the hardware can work normally.
      
      Fixes: a04ff739 ("RDMA/hns: Add command queue support for hip08 RoCE driver")
      Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/hns: Fix the Oops during rmmod or insmod ko when reset occurs · d061effc
      Committed by Wei Hu (Xavier)
      In the reset process, the hns3 NIC driver notifies the RoCE driver to
      perform reset related processing by calling the .reset_notify() interface
      registered by the RoCE driver in hip08 SoC.
      
      In the current version, if a reset occurs while rmmod or insmod ko
      is executing, an Oops error may occur as below:
      
       Internal error: Oops: 86000007 [#1] PREEMPT SMP
       Modules linked in: hns_roce(O) hns3(O) hclge(O) hnae3(O) [last unloaded: hns_roce_hw_v2]
       CPU: 0 PID: 14 Comm: kworker/0:1 Tainted: G           O      4.19.0-ge00d540 #1
       Hardware name: Huawei Technologies Co., Ltd.
       Workqueue: events hclge_reset_service_task [hclge]
       pstate: 60c00009 (nZCv daif +PAN +UAO)
       pc : 0xffff00000100b0b8
       lr : 0xffff00000100aea0
       sp : ffff000009afbab0
       x29: ffff000009afbab0 x28: 0000000000000800
       x27: 0000000000007ff0 x26: ffff80002f90c004
       x25: 00000000000007ff x24: ffff000008f97000
       x23: ffff80003efee0a8 x22: 0000000000001000
       x21: ffff80002f917ff0 x20: ffff8000286ea070
       x19: 0000000000000800 x18: 0000000000000400
       x17: 00000000c4d3225d x16: 00000000000021b8
       x15: 0000000000000400 x14: 0000000000000400
       x13: 0000000000000000 x12: ffff80003fac6e30
       x11: 0000800036303000 x10: 0000000000000001
       x9 : 0000000000000000 x8 : ffff80003016d000
       x7 : 0000000000000000 x6 : 000000000000003f
       x5 : 0000000000000040 x4 : 0000000000000000
       x3 : 0000000000000004 x2 : 00000000000007ff
       x1 : 0000000000000000 x0 : 0000000000000000
       Process kworker/0:1 (pid: 14, stack limit = 0x00000000af8f0ad9)
       Call trace:
        0xffff00000100b0b8
        0xffff00000100b3a0
        hns_roce_init+0x624/0xc88 [hns_roce]
        0xffff000001002df8
        0xffff000001006960
        hclge_notify_roce_client+0x74/0xe0 [hclge]
        hclge_reset_service_task+0xa58/0xbc0 [hclge]
        process_one_work+0x1e4/0x458
        worker_thread+0x40/0x450
        kthread+0x12c/0x130
        ret_from_fork+0x10/0x18
       Code: bad PC value
      
      In the reset process, we first release the resources, and after the
      hardware reset is completed, we reapply for resources and reconfigure
      the hardware.
      
      We can solve this problem by modifying both the NIC and the RoCE
      driver. We can modify the concurrent processing in the NIC driver to avoid
      calling the .reset_notify and .uninit_instance ops at the same time. And
      we need to modify the RoCE driver to record the reset stage and the
      driver's init/uninit state, and check the state in the .reset_notify,
      .init_instance and .uninit_instance functions to avoid NULL pointer
      operation.
      
      Fixes: cb7a94c9 ("RDMA/hns: Add reset process for RoCE in hip08")
      Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
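The state-tracking approach described in this commit can be sketched as follows: the handle records both the reset stage and the init/uninit state, and each callback checks them before touching the device pointer. All type, enum, and function names below are hypothetical, the sketch is single-threaded, and it does not reproduce the driver's actual states or locking.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical init/uninit state of the RoCE instance. */
enum inst_state { INST_NON_INIT, INST_INITED };

/* Hypothetical reset stage as seen by the RoCE driver. */
enum reset_stage { RST_IDLE, RST_ACTIVE };

/* Hypothetical reset events delivered by the NIC driver. */
enum reset_event { EV_DOWN, EV_DONE };

struct roce_dev {
    int resets_seen;
};

struct handle {
    enum inst_state state;
    enum reset_stage stage;
    struct roce_dev *dev;   /* NULL until the instance is initialized */
};

/* Refuse to initialize while a reset is in progress. */
int init_instance(struct handle *h, struct roce_dev *dev)
{
    if (h->stage != RST_IDLE)
        return -EBUSY;
    h->dev = dev;
    h->state = INST_INITED;
    return 0;
}

/* Tear down only if the instance was actually initialized. */
void uninit_instance(struct handle *h)
{
    if (h->state != INST_INITED)
        return;
    h->dev = NULL;
    h->state = INST_NON_INIT;
}

/* Record the reset stage; touch the device only if it is initialized. */
int reset_notify(struct handle *h, enum reset_event ev)
{
    h->stage = (ev == EV_DOWN) ? RST_ACTIVE : RST_IDLE;

    if (h->state != INST_INITED)
        return 0;   /* nothing to do, and h->dev may be NULL */

    if (ev == EV_DOWN)
        h->dev->resets_seen++;
    return 0;
}
```

In this sketch a reset notification that arrives before initialization is recorded but never dereferences `h->dev`, which is the kind of NULL pointer access the commit message sets out to avoid.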
    • RDMA/hns: Make some function static · c3c668e7
      Committed by YueHaibing
      Fixes the following sparse warnings:
      
      drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5822:5: warning:
       symbol 'hns_roce_v2_query_srq' was not declared. Should it be static?
      drivers/infiniband/hw/hns/hns_roce_srq.c:158:6: warning:
       symbol 'hns_roce_srq_free' was not declared. Should it be static?
      drivers/infiniband/hw/hns/hns_roce_srq.c:81:5: warning:
       symbol 'hns_roce_srq_alloc' was not declared. Should it be static?
      
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
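Sparse emits this warning for a non-static symbol that has no prior declaration. When a function is used only inside its own file, marking it `static` resolves the warning. The sketch below shows the pattern with hypothetical names (`srq_alloc`, `srq_free`, `srq_create_destroy`); it is not the driver code.

```c
/* Next number to hand out; file-local state. */
static int next_srqn = 1;

/* Used only in this file, so it is static and needs no header declaration. */
static int srq_alloc(void)
{
    return next_srqn++;
}

/* Also file-local; a real implementation would release resources here. */
static void srq_free(int srqn)
{
    (void)srqn;
}

/*
 * The only function meant to be called from other files. It would be
 * declared in a header, so it stays non-static.
 */
int srq_create_destroy(void)
{
    int srqn = srq_alloc();

    srq_free(srqn);
    return srqn;
}
```

Keeping helpers `static` also prevents name clashes with other translation units and lets the compiler warn about helpers that are no longer used.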
  16. 01 Feb, 2019 · 1 commit
  17. 31 Jan, 2019 · 1 commit
  18. 25 Jan, 2019 · 3 commits
  19. 22 Jan, 2019 · 1 commit
  20. 19 Jan, 2019 · 1 commit
  21. 08 Jan, 2019 · 3 commits