1. 19 Apr, 2020 1 commit
  2. 27 Mar, 2020 2 commits
  3. 25 Mar, 2020 1 commit
    •
      RDMA/mlx5: Fix access to wrong pointer while performing flush due to error · 950bf4f1
      Leon Romanovsky committed
      The main difference between send and receive SW completions is the
      separate treatment of the WQ. For receive completions, the initial index
      to be flushed is stored in "tail", while for send completions, it is in
      the deleted "last_poll".
      
        CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G           OE    --------- -t - 4.18.0-147.el8.ppc64le #1
        Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
        NIP:  c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000
        REGS: c0000036cc9db940 TRAP: 0400   Tainted: G           OE    --------- -t -  (4.18.0-147.el8.ppc64le)
        MSR:  9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004488  XER: 20040000
        CFAR: c00800000e586af0 IRQMASK: 0
        GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800
        GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011
        GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438
        GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010
        GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000
        GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800
        NIP [c000003c7c00a000] 0xc000003c7c00a000
        LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core]
        Call Trace:
        [c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable)
        [c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core]
        [c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0
        [c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760
        [c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0
        [c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80
      
      Fixes: 8e3b6883 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion")
      Link: https://lore.kernel.org/r/20200318091640.44069-1-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      950bf4f1
  4. 16 Jan, 2020 1 commit
  5. 17 Nov, 2019 1 commit
  6. 29 Oct, 2019 1 commit
  7. 04 Jul, 2019 2 commits
  8. 25 Jun, 2019 1 commit
  9. 21 Jun, 2019 1 commit
  10. 12 Jun, 2019 2 commits
  11. 02 Apr, 2019 3 commits
  12. 05 Feb, 2019 1 commit
  13. 11 Jan, 2019 1 commit
  14. 19 Dec, 2018 1 commit
  15. 10 Dec, 2018 1 commit
  16. 04 Dec, 2018 2 commits
  17. 19 Oct, 2018 1 commit
    •
      net/mlx5: Refactor fragmented buffer struct fields and init flow · 4972e6fa
      Tariq Toukan committed
      Take struct mlx5_frag_buf out of mlx5_frag_buf_ctrl, as it is not
      needed to manage and control the datapath of the fragmented buffers API.
      
      struct mlx5_frag_buf contains control info to manage the allocation
      and de-allocation of the fragmented buffer.
      Its fields are not relevant for datapath, so here I take them out of the
      struct mlx5_frag_buf_ctrl, except for the fragments array itself.
      
      In addition, modified mlx5_fill_fbc to initialise the frags pointers
      as well. This implies that the buffer must be allocated before the
      function is called.
      
      A set of type-specific *_get_byte_size() functions are replaced by
      a generic one.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      4972e6fa
  18. 17 Oct, 2018 1 commit
  19. 26 Sep, 2018 1 commit
  20. 01 Aug, 2018 1 commit
  21. 30 May, 2018 1 commit
  22. 24 May, 2018 1 commit
    •
      IB/mlx5: Fetch soft WQE's on fatal error state · 7b74a83c
      Erez Shitrit committed
      On fatal error, the driver simulates CQEs for ULPs that rely on
      completion of all their posted work requests.
      
      For the GSI traffic, the mlx5 has its own mechanism that sends the
      completions via software CQE's directly to the relevant CQ.
      
      This should be kept in fatal error too, so the driver should simulate
      such CQE's with the specified error state in order to complete GSI QP
      work requests.
      
      Without the fix, the following deadlock might appear:
              schedule_timeout+0x274/0x350
              wait_for_common+0xec/0x240
              mcast_remove_one+0xd0/0x120 [ib_core]
              ib_unregister_device+0x12c/0x230 [ib_core]
              mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
              mlx5_detach_device+0x184/0x1a0 [mlx5_core]
              mlx5_unload_one+0x308/0x340 [mlx5_core]
              mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]
      
      Cc: <stable@vger.kernel.org> # 4.7
      Fixes: 89ea94a7 ("IB/mlx5: Reset flow support for IB kernel ULPs")
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      7b74a83c
  23. 17 May, 2018 1 commit
  24. 28 Mar, 2018 1 commit
  25. 10 Mar, 2018 2 commits
    •
      RDMA/mlx5: Fix integer overflow while resizing CQ · 28e9091e
      Leon Romanovsky committed
      The user can provide a very large cqe_size, which causes an integer
      overflow, as seen in the following UBSAN warning:
      
      =======================================================================
      UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/cq.c:1192:53
      signed integer overflow:
      64870 * 65536 cannot be represented in type 'int'
      CPU: 0 PID: 267 Comm: syzkaller605279 Not tainted 4.15.0+ #90 Hardware
      name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      Call Trace:
       dump_stack+0xde/0x164
       ? dma_virt_map_sg+0x22c/0x22c
       ubsan_epilogue+0xe/0x81
       handle_overflow+0x1f3/0x251
       ? __ubsan_handle_negate_overflow+0x19b/0x19b
       ? lock_acquire+0x440/0x440
       mlx5_ib_resize_cq+0x17e7/0x1e40
       ? cyc2ns_read_end+0x10/0x10
       ? native_read_msr_safe+0x6c/0x9b
       ? cyc2ns_read_end+0x10/0x10
       ? mlx5_ib_modify_cq+0x220/0x220
       ? sched_clock_cpu+0x18/0x200
       ? lookup_get_idr_uobject+0x200/0x200
       ? rdma_lookup_get_uobject+0x145/0x2f0
       ib_uverbs_resize_cq+0x207/0x3e0
       ? ib_uverbs_ex_create_cq+0x250/0x250
       ib_uverbs_write+0x7f9/0xef0
       ? cyc2ns_read_end+0x10/0x10
       ? print_irqtrace_events+0x280/0x280
       ? ib_uverbs_ex_create_cq+0x250/0x250
       ? uverbs_devnode+0x110/0x110
       ? sched_clock_cpu+0x18/0x200
       ? do_raw_spin_trylock+0x100/0x100
       ? __lru_cache_add+0x16e/0x290
       __vfs_write+0x10d/0x700
       ? uverbs_devnode+0x110/0x110
       ? kernel_read+0x170/0x170
       ? sched_clock_cpu+0x18/0x200
       ? security_file_permission+0x93/0x260
       vfs_write+0x1b0/0x550
       SyS_write+0xc7/0x1a0
       ? SyS_read+0x1a0/0x1a0
       ? trace_hardirqs_on_thunk+0x1a/0x1c
       entry_SYSCALL_64_fastpath+0x1e/0x8b
      RIP: 0033:0x433549
      RSP: 002b:00007ffe63bd1ea8 EFLAGS: 00000217
      =======================================================================
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: <stable@vger.kernel.org> # 3.13
      Fixes: bde51583 ("IB/mlx5: Add support for resize CQ")
      Reported-by: Noa Osherovich <noaos@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      28e9091e
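The UBSAN report above shows a product of two user-influenced values (64870 * 65536) that does not fit in an `int`. As a minimal sketch, and not the actual mlx5 patch, one way to guard such a product is to widen it to 64 bits before comparing it against `INT_MAX`. The helper name `cq_buf_size_checked` and its signature are hypothetical and exist only for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the mlx5 driver: compute
 * entries * cqe_size without overflowing a signed int.
 * Returns true and stores the product in *out when it fits in an int,
 * false otherwise (*out is left untouched in that case).
 */
static bool cq_buf_size_checked(uint32_t entries, uint32_t cqe_size, int *out)
{
    /* Widen first: a 32-bit by 32-bit product always fits in 64 bits. */
    uint64_t bytes = (uint64_t)entries * (uint64_t)cqe_size;

    if (bytes > (uint64_t)INT_MAX)
        return false;

    *out = (int)bytes;
    return true;
}
```

With the values from the warning, `cq_buf_size_checked(64870, 65536, &size)` rejects the request instead of computing a wrapped size.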
    •
      Revert "RDMA/mlx5: Fix integer overflow while resizing CQ" · 212a0cbc
      Doug Ledford committed
      The original commit of this patch has a munged log message that is
      missing several of the tags the original author intended to be on the
      patch.  This was due to patchworks misinterpreting a cut-n-paste
      separator line as an end of message line and munging the mbox that was
      used to import the patch:
      
      https://patchwork.kernel.org/patch/10264089/
      
      The original patch will be reapplied with a fixed commit message so the
      proper tags are applied.
      
      This reverts commit aa0de36a.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      212a0cbc
  26. 08 Mar, 2018 1 commit
  27. 01 Mar, 2018 1 commit
  28. 15 Feb, 2018 1 commit
    •
      IB/mlx5: Implement fragmented completion queue (CQ) · 388ca8be
      Yonatan Cohen committed
      The current implementation of create CQ requires contiguous
      memory. This requirement is problematic once the memory is
      fragmented or the system is low on memory, because it causes
      failures in dma_zalloc_coherent().
      
      This patch implements a new scheme of fragmented CQ to overcome
      this issue by introducing a new type, 'struct mlx5_frag_buf_ctrl',
      to allocate fragmented buffers rather than contiguous ones.
      
      Base the Completion Queues (CQs) on this new fragmented buffer.
      
      It fixes the following crash:
      kworker/29:0: page allocation failure: order:6, mode:0x80d0
      CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
      Workqueue: ib_cm cm_work_handler [ib_cm]
      Call Trace:
      [<>] dump_stack+0x19/0x1b
      [<>] warn_alloc_failed+0x110/0x180
      [<>] __alloc_pages_slowpath+0x6b7/0x725
      [<>] __alloc_pages_nodemask+0x405/0x420
      [<>] dma_generic_alloc_coherent+0x8f/0x140
      [<>] x86_swiotlb_alloc_coherent+0x21/0x50
      [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
      [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
      [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
      [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
      [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
      [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
      Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      388ca8be
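The idea behind a fragmented buffer can be illustrated in userspace: instead of one large contiguous allocation (the order-6 request that failed in the trace above), the queue is backed by several small fragments, and an element index is split into a fragment number and an offset within that fragment. The following is a rough sketch only. `struct frag_buf`, `frag_buf_alloc`, `frag_buf_get`, `frag_buf_free` and the 4 KiB fragment size are assumptions made for illustration; they are not the driver's `struct mlx5_frag_buf_ctrl` API.

```c
#include <stddef.h>
#include <stdlib.h>

#define FRAG_SHIFT 12                        /* assumed 4 KiB fragments */
#define FRAG_SIZE  ((size_t)1 << FRAG_SHIFT)

/* Hypothetical userspace stand-in for a fragmented buffer. */
struct frag_buf {
    void   **frags;       /* one pointer per fragment */
    size_t   nfrags;      /* number of fragments allocated */
    unsigned log_stride;  /* log2 of element size; assumed <= FRAG_SHIFT */
};

/* Release every fragment and the pointer array. */
static void frag_buf_free(struct frag_buf *fb)
{
    for (size_t i = 0; i < fb->nfrags; i++)
        free(fb->frags[i]);
    free(fb->frags);
    fb->frags = NULL;
    fb->nfrags = 0;
}

/* Allocate nelems elements of (1 << log_stride) bytes each.
 * Returns 0 on success, -1 on allocation failure. */
static int frag_buf_alloc(struct frag_buf *fb, size_t nelems, unsigned log_stride)
{
    size_t bytes = nelems << log_stride;
    size_t n = (bytes + FRAG_SIZE - 1) >> FRAG_SHIFT;  /* round up */

    fb->frags = calloc(n ? n : 1, sizeof(*fb->frags));
    if (!fb->frags)
        return -1;
    fb->nfrags = 0;
    fb->log_stride = log_stride;

    for (size_t i = 0; i < n; i++) {
        fb->frags[i] = calloc(1, FRAG_SIZE);  /* small, zeroed fragment */
        if (!fb->frags[i]) {
            frag_buf_free(fb);
            return -1;
        }
        fb->nfrags++;
    }
    return 0;
}

/* Map element index ix to its address: the high bits select the
 * fragment, the low bits select the slot inside that fragment. */
static void *frag_buf_get(const struct frag_buf *fb, size_t ix)
{
    unsigned log_per_frag = FRAG_SHIFT - fb->log_stride;
    size_t frag = ix >> log_per_frag;
    size_t slot = ix & (((size_t)1 << log_per_frag) - 1);

    return (char *)fb->frags[frag] + (slot << fb->log_stride);
}
```

For example, 256 elements of 64 bytes need four 4 KiB fragments, and element 64 is the first slot of the second fragment. Each fragment is a small allocation, so a fragmented or low-memory system is less likely to fail the request than with a single 16 KiB (or larger) contiguous block.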
  29. 30 Jan, 2018 1 commit
  30. 14 Nov, 2017 1 commit
  31. 26 Oct, 2017 2 commits
  32. 15 Oct, 2017 1 commit