1. 09 Apr, 2019 1 commit
  2. 26 Mar, 2019 1 commit
  3. 20 Feb, 2019 1 commit
  4. 05 Feb, 2019 3 commits
    • RDMA/hns: Fix the chip hanging caused by sending doorbell during reset · d3743fa9
      Committed by Wei Hu (Xavier)
      On the hip08 chip, the chip may hang if a doorbell is sent during
      reset. We can fix this by prohibiting doorbells during reset.
      
      Fixes: 2d407888 ("RDMA/hns: Add support for processing send wr and receive wr")
      Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
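The idea of prohibiting doorbells during reset can be sketched in plain C as follows. This is a minimal userspace model, not the driver's code: `struct fake_dev`, its fields, and `ring_doorbell()` are hypothetical names introduced only for illustration, and a real driver would also need proper synchronization with the reset path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model; not the hns driver's real structures. */
struct fake_dev {
    bool in_reset;      /* set while a hardware reset is in progress */
    uint32_t doorbell;  /* stands in for the memory-mapped doorbell register */
    unsigned int rings; /* number of doorbell writes that reached "hardware" */
};

/*
 * Write the doorbell only when no reset is in progress.
 * Returns true if the doorbell was written, false if it was suppressed.
 */
static bool ring_doorbell(struct fake_dev *dev, uint32_t value)
{
    if (dev->in_reset)
        return false; /* touching the hardware now could hang the chip */

    dev->doorbell = value;
    dev->rings++;
    return true;
}
```

A caller would set `in_reset` before the reset begins and clear it once the hardware is usable again; any doorbell attempted in between is dropped instead of reaching the chip.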
    • RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset · 6a04aed6
      Committed by Wei Hu (Xavier)
      On the hip08 chip, the chip may hang and report errors when a
      mailbox & doorbell is sent during reset. We can fix this by prohibiting
      mailbox and doorbell operations while a reset is in progress and after
      a reset has occurred, to ensure that the hardware can work normally.
      
      Fixes: a04ff739 ("RDMA/hns: Add command queue support for hip08 RoCE driver")
      Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/hns: Fix the Oops during rmmod or insmod ko when reset occurs · d061effc
      Committed by Wei Hu (Xavier)
      In the reset process, the hns3 NIC driver notifies the RoCE driver to
      perform reset-related processing by calling the .reset_notify() interface
      that the RoCE driver registers on the hip08 SoC.
      
      In the current version, if a reset occurs while rmmod or insmod of the
      ko is executing, an Oops like the following may occur:
      
       Internal error: Oops: 86000007 [#1] PREEMPT SMP
       Modules linked in: hns_roce(O) hns3(O) hclge(O) hnae3(O) [last unloaded: hns_roce_hw_v2]
       CPU: 0 PID: 14 Comm: kworker/0:1 Tainted: G           O      4.19.0-ge00d540 #1
       Hardware name: Huawei Technologies Co., Ltd.
       Workqueue: events hclge_reset_service_task [hclge]
       pstate: 60c00009 (nZCv daif +PAN +UAO)
       pc : 0xffff00000100b0b8
       lr : 0xffff00000100aea0
       sp : ffff000009afbab0
       x29: ffff000009afbab0 x28: 0000000000000800
       x27: 0000000000007ff0 x26: ffff80002f90c004
       x25: 00000000000007ff x24: ffff000008f97000
       x23: ffff80003efee0a8 x22: 0000000000001000
       x21: ffff80002f917ff0 x20: ffff8000286ea070
       x19: 0000000000000800 x18: 0000000000000400
       x17: 00000000c4d3225d x16: 00000000000021b8
       x15: 0000000000000400 x14: 0000000000000400
       x13: 0000000000000000 x12: ffff80003fac6e30
       x11: 0000800036303000 x10: 0000000000000001
       x9 : 0000000000000000 x8 : ffff80003016d000
       x7 : 0000000000000000 x6 : 000000000000003f
       x5 : 0000000000000040 x4 : 0000000000000000
       x3 : 0000000000000004 x2 : 00000000000007ff
       x1 : 0000000000000000 x0 : 0000000000000000
       Process kworker/0:1 (pid: 14, stack limit = 0x00000000af8f0ad9)
       Call trace:
        0xffff00000100b0b8
        0xffff00000100b3a0
        hns_roce_init+0x624/0xc88 [hns_roce]
        0xffff000001002df8
        0xffff000001006960
        hclge_notify_roce_client+0x74/0xe0 [hclge]
        hclge_reset_service_task+0xa58/0xbc0 [hclge]
        process_one_work+0x1e4/0x458
        worker_thread+0x40/0x450
        kthread+0x12c/0x130
        ret_from_fork+0x10/0x18
       Code: bad PC value
      
      During reset, we first release the resources; after the hardware reset
      completes, we reallocate the resources and reconfigure the hardware.
      
      We can solve this problem by modifying both the NIC and the RoCE
      drivers. In the NIC driver, we modify the concurrent processing to avoid
      calling the .reset_notify and .uninit_instance ops at the same time. In
      the RoCE driver, we record the reset stage and the driver's init/uninit
      state, and check that state in the .reset_notify, .init_instance, and
      .uninit_instance functions to avoid NULL pointer dereferences.
      
      Fixes: cb7a94c9 ("RDMA/hns: Add reset process for RoCE in hip08")
      Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
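The state checks described in this commit message can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: the enums, `struct fake_handle`, and the three functions below are hypothetical and do not reproduce the hns structures or the hnae3 ops signatures, and the locking a real driver needs is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; the real driver's names and values may differ. */
enum inst_state { INST_UNINIT, INST_INITED };
enum reset_stage { RST_NONE, RST_DOWN, RST_UNINIT_DONE };
enum notify_type { NOTIFY_DOWN, NOTIFY_UNINIT, NOTIFY_INIT };

struct fake_handle {
    enum inst_state state;  /* recorded init/uninit state */
    enum reset_stage stage; /* recorded reset stage */
    void *priv;             /* NULL unless the instance is initialised */
};

static int dummy_priv; /* stands in for the allocated driver resources */

/* Refuse to initialise twice or while a reset is being processed. */
static int init_instance(struct fake_handle *h)
{
    if (h->state != INST_UNINIT || h->stage != RST_NONE)
        return -1;
    h->priv = &dummy_priv;
    h->state = INST_INITED;
    return 0;
}

/* Do nothing if the instance was already torn down, e.g. by a reset. */
static void uninit_instance(struct fake_handle *h)
{
    if (h->state != INST_INITED)
        return;
    h->priv = NULL;
    h->state = INST_UNINIT;
}

/* Each notification is ignored unless the recorded state expects it. */
static int reset_notify(struct fake_handle *h, enum notify_type type)
{
    switch (type) {
    case NOTIFY_DOWN:
        if (h->state == INST_INITED)
            h->stage = RST_DOWN;
        return 0;
    case NOTIFY_UNINIT:
        if (h->stage == RST_DOWN) {
            h->priv = NULL; /* release resources first */
            h->state = INST_UNINIT;
            h->stage = RST_UNINIT_DONE;
        }
        return 0;
    case NOTIFY_INIT:
        if (h->stage == RST_UNINIT_DONE) {
            h->priv = &dummy_priv; /* reallocate after hardware reset */
            h->state = INST_INITED;
            h->stage = RST_NONE;
        }
        return 0;
    }
    return -1;
}
```

Because every entry point consults the recorded state before touching `priv`, a notification that arrives for an instance that is not initialised, or an uninit that races with a reset teardown, becomes a no-op rather than a NULL pointer access.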
  5. 25 Jan, 2019 3 commits
  6. 12 Dec, 2018 2 commits
  7. 05 Dec, 2018 3 commits
  8. 16 Oct, 2018 1 commit
  9. 04 Oct, 2018 5 commits
  10. 27 Sep, 2018 4 commits
  11. 31 Jul, 2018 2 commits
  12. 27 Jul, 2018 1 commit
  13. 12 Jul, 2018 4 commits
  14. 31 May, 2018 1 commit
    • RDMA/hns: Fix the illegal memory operation when cross page · 0b25c9cc
      Committed by Wei Hu (Xavier)
      This patch fixes a potential illegal memory operation when the
      extended sge buffer crosses a page boundary in the post send
      operation. The bug causes the call trace below.
      
      [ 3302.922107] Unable to handle kernel paging request at virtual address ffff00003b3a0004
      [ 3302.930009] Mem abort info:
      [ 3302.932790]   Exception class = DABT (current EL), IL = 32 bits
      [ 3302.938695]   SET = 0, FnV = 0
      [ 3302.941735]   EA = 0, S1PTW = 0
      [ 3302.944863] Data abort info:
      [ 3302.947729]   ISV = 0, ISS = 0x00000047
      [ 3302.951551]   CM = 0, WnR = 1
      [ 3302.954506] swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff000009ea5000
      [ 3302.961279] [ffff00003b3a0004] *pgd=00000023dfffe003, *pud=00000023dfffd003, *pmd=00000022dc84c003, *pte=0000000000000000
      [ 3302.972224] Internal error: Oops: 96000047 [#1] SMP
      [ 3302.999509] CPU: 9 PID: 19628 Comm: roce_test_main Tainted: G           OE   4.14.10 #1
      [ 3303.007498] task: ffff80234df78000 task.stack: ffff00000f640000
      [ 3303.013412] PC is at hns_roce_v2_post_send+0x690/0xe20 [hns_roce_pci]
      [ 3303.019843] LR is at hns_roce_v2_post_send+0x658/0xe20 [hns_roce_pci]
      [ 3303.026269] pc : [<ffff0000020694f8>] lr : [<ffff0000020694c0>] pstate: 804001c9
      [ 3303.033649] sp : ffff00000f643870
      [ 3303.036951] x29: ffff00000f643870 x28: ffff80232bfa9c00
      [ 3303.042250] x27: ffff80234d909380 x26: ffff00003b37f0c0
      [ 3303.047549] x25: 0000000000000000 x24: 0000000000000003
      [ 3303.052848] x23: 0000000000000000 x22: 0000000000000000
      [ 3303.058148] x21: 0000000000000101 x20: 0000000000000001
      [ 3303.063447] x19: ffff80236163f800 x18: 0000000000000000
      [ 3303.068746] x17: 0000ffff86b76fc8 x16: ffff000008301600
      [ 3303.074045] x15: 000020a51c000000 x14: 3128726464615f65
      [ 3303.079344] x13: 746f6d6572202c29 x12: 303035312879656b
      [ 3303.084643] x11: 723a6f666e692072 x10: 573a6f666e693a5d
      [ 3303.089943] x9 : 0000000000000004 x8 : ffff8023ce38b000
      [ 3303.095242] x7 : ffff8023ce38b320 x6 : 0000000000000418
      [ 3303.100541] x5 : ffff80232bfa9cc8 x4 : 0000000000000030
      [ 3303.105839] x3 : 0000000000000100 x2 : 0000000000000200
      [ 3303.111138] x1 : 0000000000000320 x0 : ffff00003b3a0000
      [ 3303.116438] Process roce_test_main (pid: 19628, stack limit = 0xffff00000f640000)
      [ 3303.123906] Call trace:
      [ 3303.126339] Exception stack(0xffff00000f643730 to 0xffff00000f643870)
      [ 3303.215790] [<ffff0000020694f8>] hns_roce_v2_post_send+0x690/0xe20 [hns_roce_pci]
      [ 3303.223293] [<ffff0000021c3750>] rt_ktest_post_send+0x5d0/0x8b8 [rdma_test]
      [ 3303.230261] [<ffff0000021b3234>] exec_send_cmd+0x664/0x1350 [rdma_test]
      [ 3303.236881] [<ffff0000021b8b30>] rt_ktest_dispatch_cmd_3+0x1510/0x3790 [rdma_test]
      [ 3303.244455] [<ffff0000021bae54>] rt_ktest_dispatch_cmd_2+0xa4/0x118 [rdma_test]
      [ 3303.251770] [<ffff0000021bafec>] rt_ktest_dispatch_cmd+0x124/0xaa8 [rdma_test]
      [ 3303.258997] [<ffff0000021bbc3c>] rt_ktest_dev_write+0x2cc/0x568 [rdma_test]
      [ 3303.265947] [<ffff0000082ad688>] __vfs_write+0x60/0x18c
      [ 3303.271158] [<ffff0000082ad998>] vfs_write+0xa8/0x198
      [ 3303.276196] [<ffff0000082adc7c>] SyS_write+0x6c/0xd4
      [ 3303.281147] Exception stack(0xffff00000f643ec0 to 0xffff00000f644000)
      [ 3303.287573] 3ec0: 0000000000000003 0000fffffc85faa8 0000000000004e60 0000000000000000
      [ 3303.295388] 3ee0: 0000000021fb2000 000000000000ffff eff0e3efe4e58080 0000fffffcc724fe
      [ 3303.303204] 3f00: 0000000000000040 1999999999999999 0101010101010101 0000000000000038
      [ 3303.311019] 3f20: 0000000000000005 ffffffffffffffff 0d73757461747320 ffffffffffffffff
      [ 3303.318835] 3f40: 0000000000000000 0000000000459b00 0000fffffc85e360 000000000043d788
      [ 3303.326650] 3f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [ 3303.334465] 3f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [ 3303.342281] 3fa0: 0000000000000000 0000fffffc85e570 0000000000438804 0000fffffc85e570
      [ 3303.350096] 3fc0: 0000ffff8553f618 0000000080000000 0000000000000003 0000000000000040
      [ 3303.357911] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [ 3303.365729] [<ffff000008083808>] __sys_trace_return+0x0/0x4
      [ 3303.371288] Code: b94008e9 34000129 b9400ce2 110006b5 (b9000402)
      [ 3303.377377] ---[ end trace fd5ab98b3325cf9a ]---
      Reported-by: Jie Chen <chenjie103@huawei.com>
      Reported-by: Xiping Zhang (Francis) <zhangxiping3@huawei.com>
      Fixes: b1c15835 ("RDMA/hns: Get rid of virt_to_page and vmap calls after dma_alloc_coherent")
      Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
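The cross-page problem can be illustrated with a buffer made of separately allocated pages, where each entry's address has to be computed per page instead of assuming the pages are contiguous. The sketch below is a userspace model with hypothetical names (`struct paged_buf`, `entry_addr()`, `write_sges()`) and a deliberately tiny page size; it is not the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 64u /* tiny page size, for illustration only */

/* Hypothetical scatter/gather entry, 16 bytes on common ABIs. */
struct sge {
    uint64_t addr;
    uint32_t len;
    uint32_t lkey;
};

/* A buffer built from pages that are NOT contiguous in memory. */
struct paged_buf {
    uint8_t *pages[4];
    unsigned int npages;
};

/* Locate entry idx by page number and in-page offset. */
static void *entry_addr(struct paged_buf *b, unsigned int idx)
{
    unsigned int off = idx * (unsigned int)sizeof(struct sge);
    unsigned int page = off / PAGE_SZ;

    if (page >= b->npages)
        return NULL;
    return b->pages[page] + off % PAGE_SZ;
}

/*
 * Copy n entries starting at index start. The destination is looked up
 * again for every entry, so a run that crosses a page boundary continues
 * in the next page rather than past the end of the current one.
 */
static int write_sges(struct paged_buf *b, unsigned int start,
                      const struct sge *src, unsigned int n)
{
    unsigned int i;

    /* Reject runs that would not fit, before writing anything. */
    if ((start + n) * sizeof(struct sge) > (size_t)b->npages * PAGE_SZ)
        return -1;

    for (i = 0; i < n; i++)
        memcpy(entry_addr(b, start + i), &src[i], sizeof(src[i]));
    return 0;
}
```

With 64-byte pages and 16-byte entries, writing three entries starting at index 3 places the first in the last slot of page 0 and the other two at the start of page 1. Computing a single base pointer for index 3 and incrementing it would instead run past the end of page 0.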
  15. 25 May, 2018 1 commit
  16. 24 May, 2018 1 commit
  17. 14 Mar, 2018 1 commit
  18. 05 Feb, 2018 1 commit
    • RDMA/hns: Fix the endian problem for hns · 8b9b8d14
      Committed by oulijun
      The hip06 and hip08 run on little-endian ARM. This patch revises
      the annotations to indicate that the HW uses little-endian data
      in the various DMA buffers, and flows the necessary swaps
      throughout.
      
      The imm_data field uses big-endian mode. The cpu_to_le32/le32_to_cpu
      swaps are no-ops on this platform, so the only substantive
      change is the handling of imm_data, which is now always swapped.
      
      This also keeps the kernel driver consistent with the userspace hns
      driver and resolves the warnings reported by sparse.
      Signed-off-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
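The imm_data handling can be pictured as a conversion from a big-endian value to a little-endian descriptor field. The helpers below are a portable userspace sketch that works on raw bytes; `load_be32()`, `store_le32()`, and `fill_imm_field()` are hypothetical names, and the assumption that the swap is a plain big-endian to little-endian conversion is ours, not taken from the patch text.

```c
#include <assert.h>
#include <stdint.h>

/* Read a big-endian 32-bit value from bytes, whatever the host byte order. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store a 32-bit value as little-endian bytes, whatever the host byte order. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/*
 * Place big-endian immediate data into a little-endian hardware field.
 * On a little-endian CPU this is the one step that really reorders bytes;
 * plain CPU-to-little-endian stores leave the bytes unchanged there.
 */
static void fill_imm_field(uint8_t field[4], const uint8_t imm_be[4])
{
    store_le32(field, load_be32(imm_be));
}
```

Working on byte arrays keeps the sketch independent of the host's byte order, which is the same property the kernel's endian annotations let sparse check statically.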
  19. 17 Jan, 2018 2 commits
  20. 16 Jan, 2018 1 commit
  21. 23 Dec, 2017 1 commit