1. 06 9月, 2018 3 次提交
  2. 05 9月, 2018 1 次提交
    • M
      IB/mlx5: Change TX affinity assignment in RoCE LAG mode · c6a21c38
      Majd Dibbiny 提交于
      In the current code, the TX affinity is per RoCE device, which can cause
      unfairness between different contexts. e.g. if we open two contexts, and
      each open 10 QPs concurrently, all of the QPs of the first context might
      end up on the first port instead of distributed on the two ports as
      expected
      
      To overcome this unfairness between processes, we maintain per device TX
      affinity, and per process TX affinity.
      
      The allocation algorithm is as follow:
      
      1. Hold two tx_port_affinity atomic variables, one per RoCE device and one
         per ucontext. Both initialized to 0.
      
      2. In mlx5_ib_alloc_ucontext do:
       2.1. ucontext.tx_port_affinity = device.tx_port_affinity
       2.2. device.tx_port_affinity += 1
      
      3. In modify QP INIT2RST:
       3.1. qp.tx_port_affinity = ucontext.tx_port_affinity % MLX5_PORT_NUM
       3.2. ucontext.tx_port_affinity += 1
      Signed-off-by: NMajd Dibbiny <majd@mellanox.com>
      Reviewed-by: NMoni Shoua <monis@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      c6a21c38
  3. 11 8月, 2018 1 次提交
  4. 31 7月, 2018 2 次提交
  5. 25 7月, 2018 4 次提交
  6. 14 7月, 2018 2 次提交
    • L
      RDMA/mlx5: Check that supplied blue flame index doesn't overflow · 05f58ceb
      Leon Romanovsky 提交于
      User's supplied index is checked again total number of system pages, but
      this number already includes num_static_sys_pages, so addition of that
      value to supplied index causes to below error while trying to access
      sys_pages[].
      
      BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400
      Read of size 4 at addr ffff880065561904 by task syz-executor446/314
      
      CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ #256
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0xef/0x17e
       print_address_description+0x83/0x3b0
       kasan_report+0x18d/0x4d0
       bfregn_to_uar_index+0x34f/0x400
       create_user_qp+0x272/0x227d
       create_qp_common+0x32eb/0x43e0
       mlx5_ib_create_qp+0x379/0x1ca0
       create_qp.isra.5+0xc94/0x22d0
       ib_uverbs_create_qp+0x21b/0x2a0
       ib_uverbs_write+0xc2c/0x1010
       vfs_write+0x1b0/0x550
       ksys_write+0xc6/0x1a0
       do_syscall_64+0xa7/0x590
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x433679
      Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679
      RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003
      RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8
      R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
      R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006
      
      Allocated by task 314:
       kasan_kmalloc+0xa0/0xd0
       __kmalloc+0x1a9/0x510
       mlx5_ib_alloc_ucontext+0x966/0x2620
       ib_uverbs_get_context+0x23f/0xa60
       ib_uverbs_write+0xc2c/0x1010
       __vfs_write+0x10d/0x720
       vfs_write+0x1b0/0x550
       ksys_write+0xc6/0x1a0
       do_syscall_64+0xa7/0x590
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 1:
       __kasan_slab_free+0x12e/0x180
       kfree+0x159/0x630
       kvfree+0x37/0x50
       single_release+0x8e/0xf0
       __fput+0x2d8/0x900
       task_work_run+0x102/0x1f0
       exit_to_usermode_loop+0x159/0x1c0
       do_syscall_64+0x408/0x590
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff880065561100
       which belongs to the cache kmalloc-4096 of size 4096
      The buggy address is located 2052 bytes inside of
       4096-byte region [ffff880065561100, ffff880065562100)
      The buggy address belongs to the page:
      page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0
      flags: 0x4000000000008100(slab|head)
      raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480
      raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
       ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      Cc: <stable@vger.kernel.org> # 4.15
      Fixes: 1ee47ab3 ("IB/mlx5: Enable QP creation with a given blue flame index")
      Reported-by: NNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      05f58ceb
    • L
      RDMA/mlx5: Melt consecutive calls to alloc_bfreg() in one call · ffaf58de
      Leon Romanovsky 提交于
      There is no need for three consecutive calls to alloc_bfreg(). It can be
      implemented with one function.
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      ffaf58de
  7. 26 6月, 2018 1 次提交
    • Y
      IB/mlx5: Add support for drain SQ & RQ · d0e84c0a
      Yishai Hadas 提交于
      This patch follows the logic from ib_core but considers the internal
      device state upon executing the involved commands.
      
      Specifically,
      Upon internal error state modify QP to an error state can be assumed to
      be success as each in-progress WR going to be flushed in error in any
      case as expected by that modify command.
      
      In addition,
      As the drain should never fail the driver makes sure that post_send/recv
      will succeed even if the device is already in an internal error state.
      As such once the driver will supply the simulated/SW CQEs the CQE for
      the drain WR will be handled as well.
      
      In case of an internal error state the CQE for the drain WR may be
      completed as part of the main task that handled the error state or by
      the task that issued the drain WR.
      
      As the above depends on scheduling the code takes the relevant locks and
      actions to make sure that the completion handler for that WR will always
      be called after that the post_send/recv were issued but not in parallel
      to the other task that handles the error flow.
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      d0e84c0a
  8. 22 6月, 2018 1 次提交
  9. 20 6月, 2018 3 次提交
    • Y
      IB/mlx5: Expose DEVX tree · c59450c4
      Yishai Hadas 提交于
      Expose DEVX tree to be used by upper layers.
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      c59450c4
    • Y
      IB/mlx5: Add support for DEVX query UAR · 7c043e90
      Yishai Hadas 提交于
      Return a device UAR index for a given user index via the DEVX interface.
      
      Security note:
      The hardware protection mechanism works like this: Each device object that
      is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in
      the device specification manual) upon its creation. Then upon doorbell,
      hardware fetches the object context for which the doorbell was rang, and
      validates that the UAR through which the DB was rang matches the UAR ID
      of the object.
      
      If no match the doorbell is silently ignored by the hardware.  Of
      course, the user cannot ring a doorbell on a UAR that was not mapped to
      it.
      
      Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command
      mailboxes (except tagging them with UID), we expose to the user its UAR
      ID, so it can embed it in these objects in the expected specification
      format. So the only thing the user can do is hurt itself by creating a
      QP/SQ/CQ with a UAR ID other than his, and then in this case other users
      may ring a doorbell on its objects.
      
      The consequence of that will be that another user can schedule a QP/SQ
      of the buggy user for execution (just insert it to the hardware schedule
      queue or arm its CQ for event generation), no further harm is expected.
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      7c043e90
    • Y
      IB/mlx5: Introduce DEVX · a8b92ca1
      Yishai Hadas 提交于
      Introduce DEVX to enable direct device commands in downstream patches
      from this series.
      
      In that mode of work the firmware manages the isolation between
      processes' resources and as such a DEVX user id is created and assigned
      to the given user context upon allocation request.
      
      A capability check is done to make sure that this feature is really
      supported by the firmware prior to creating the DEVX user id.
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      a8b92ca1
  10. 19 6月, 2018 1 次提交
  11. 02 6月, 2018 3 次提交
  12. 06 4月, 2018 2 次提交
  13. 05 4月, 2018 3 次提交
  14. 20 3月, 2018 1 次提交
  15. 15 3月, 2018 2 次提交
    • M
      IB/mlx5: Fix cleanup order on unload · 42cea83f
      Mark Bloch 提交于
      On load we create private CQ/QP/PD in order to be used by UMR, we create
      those resources after we register ourself as an IB device, and we destroy
      them after we unregister as an IB device. This was changed by commit
      16c1975f ("IB/mlx5: Create profile infrastructure to add and remove
      stages") which moved the destruction before we unregistration. This
      allowed to trigger an invalid memory access when unloading mlx5_ib while
      there are open resources:
      
      BUG: unable to handle kernel paging request at 00000001002c012c
      ...
      Call Trace:
       mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib]
       __slab_free+0x9a/0x2d0
       delay_time_func+0x10/0x10 [mlx5_ib]
       unreg_umr.isra.15+0x4b/0x50 [mlx5_ib]
       mlx5_mr_cache_free+0x46/0x150 [mlx5_ib]
       clean_mr+0xc9/0x190 [mlx5_ib]
       dereg_mr+0xba/0xf0 [mlx5_ib]
       ib_dereg_mr+0x13/0x20 [ib_core]
       remove_commit_idr_uobject+0x16/0x70 [ib_uverbs]
       uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs]
       ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs]
       ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs]
       ib_unregister_device+0xd4/0x190 [ib_core]
       __mlx5_ib_remove+0x2e/0x40 [mlx5_ib]
       mlx5_remove_device+0xf5/0x120 [mlx5_core]
       mlx5_unregister_interface+0x37/0x90 [mlx5_core]
       mlx5_ib_cleanup+0xc/0x225 [mlx5_ib]
       SyS_delete_module+0x153/0x230
       do_syscall_64+0x62/0x110
       entry_SYSCALL_64_after_hwframe+0x21/0x86
      ...
      
      We restore the original behavior by breaking the UMR stage into two parts,
      pre and post IB registration stages, this way we can restore the original
      functionality and maintain clean separation of logic between stages.
      
      Fixes: 16c1975f ("IB/mlx5: Create profile infrastructure to add and remove stages")
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      42cea83f
    • I
      IB/mlx5: Maintain a single emergency page · c44ef998
      Ilya Lesokhin 提交于
      The mlx5 driver needs to be able to issue invalidation to ODP MRs
      even if it cannot allocate memory. To this end it preallocates
      emergency pages to use when the situation arises.
      
      This flow should be extremely rare enough, that we don't need
      to worry about contention and therefore a single emergency page
      is good enough.
      Signed-off-by: NIlya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      c44ef998
  16. 24 2月, 2018 5 次提交
  17. 15 2月, 2018 1 次提交
    • Y
      IB/mlx5: Implement fragmented completion queue (CQ) · 388ca8be
      Yonatan Cohen 提交于
      The current implementation of create CQ requires contiguous
      memory, such requirement is problematic once the memory is
      fragmented or the system is low in memory, it causes for
      failures in dma_zalloc_coherent().
      
      This patch implements new scheme of fragmented CQ to overcome
      this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
      to allocate fragmented buffers, rather than contiguous ones.
      
      Base the Completion Queues (CQs) on this new fragmented buffer.
      
      It fixes following crashes:
      kworker/29:0: page allocation failure: order:6, mode:0x80d0
      CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
      Workqueue: ib_cm cm_work_handler [ib_cm]
      Call Trace:
      [<>] dump_stack+0x19/0x1b
      [<>] warn_alloc_failed+0x110/0x180
      [<>] __alloc_pages_slowpath+0x6b7/0x725
      [<>] __alloc_pages_nodemask+0x405/0x420
      [<>] dma_generic_alloc_coherent+0x8f/0x140
      [<>] x86_swiotlb_alloc_coherent+0x21/0x50
      [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
      [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
      [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
      [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
      [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
      [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
      Signed-off-by: NYonatan Cohen <yonatanc@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      388ca8be
  18. 30 1月, 2018 1 次提交
  19. 19 1月, 2018 1 次提交
  20. 09 1月, 2018 2 次提交