1. 22 Nov, 2018 1 commit
  2. 17 Oct, 2018 4 commits
  3. 16 Oct, 2018 1 commit
  4. 04 Oct, 2018 1 commit
  5. 28 Sep, 2018 1 commit
  6. 27 Sep, 2018 1 commit
  7. 26 Sep, 2018 8 commits
  8. 22 Sep, 2018 3 commits
  9. 13 Sep, 2018 1 commit
  10. 07 Sep, 2018 1 commit
  11. 05 Sep, 2018 1 commit
    • IB/mlx5: Change TX affinity assignment in RoCE LAG mode · c6a21c38
      Majd Dibbiny committed
      In the current code, the TX affinity is per RoCE device, which can cause
      unfairness between different contexts. For example, if we open two
      contexts and each opens 10 QPs concurrently, all of the QPs of the first
      context might end up on the first port instead of being distributed
      across the two ports as expected.
      
      To overcome this unfairness between processes, we maintain both a
      per-device TX affinity and a per-process TX affinity.
      
      The allocation algorithm is as follows:
      
      1. Hold two tx_port_affinity atomic variables, one per RoCE device and one
         per ucontext. Both initialized to 0.
      
      2. In mlx5_ib_alloc_ucontext do:
       2.1. ucontext.tx_port_affinity = device.tx_port_affinity
       2.2. device.tx_port_affinity += 1
      
      3. In modify QP INIT2RST:
       3.1. qp.tx_port_affinity = ucontext.tx_port_affinity % MLX5_PORT_NUM
       3.2. ucontext.tx_port_affinity += 1
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Reviewed-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
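
The allocation steps listed in the c6a21c38 message can be sketched in userspace C as follows. This is a minimal model, not the driver code: the struct and function names (`roce_device`, `user_context`, `alloc_ucontext`, `assign_qp_port`) are hypothetical, the value 2 for `MLX5_PORT_NUM` is assumed from the two-port example in the message, and each read-then-increment pair (steps 2.1/2.2 and 3.1/3.2) is folded into a single C11 `atomic_fetch_add`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MLX5_PORT_NUM 2u /* assumed: the two-port LAG case from the message */

/* Hypothetical stand-ins for the per-device and per-ucontext state (step 1). */
struct roce_device  { atomic_uint tx_port_affinity; };
struct user_context { atomic_uint tx_port_affinity; };

static void roce_device_init(struct roce_device *dev)
{
	assert(dev != NULL);
	atomic_init(&dev->tx_port_affinity, 0);
}

/* Step 2: a new context starts from the device counter, which then advances. */
static void alloc_ucontext(struct roce_device *dev, struct user_context *uctx)
{
	assert(dev != NULL && uctx != NULL);
	unsigned int start = atomic_fetch_add(&dev->tx_port_affinity, 1);
	atomic_init(&uctx->tx_port_affinity, start);
}

/* Step 3: each QP takes the context counter modulo the port count. */
static unsigned int assign_qp_port(struct user_context *uctx)
{
	assert(uctx != NULL);
	return atomic_fetch_add(&uctx->tx_port_affinity, 1) % MLX5_PORT_NUM;
}
```

In this model, two contexts created back to back start on different ports, and each context then alternates ports for its own QPs, which is the fairness property the message describes.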
  12. 15 Aug, 2018 1 commit
  13. 08 Aug, 2018 1 commit
    • RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq · 0dfe4522
      Leon Romanovsky committed
      [   61.182439] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:5366:34
      [   61.183673] shift exponent 4294967288 is too large for 32-bit type 'unsigned int'
      [   61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96
      [   61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
      [   61.188315] Call Trace:
      [   61.188661]  dump_stack+0xc7/0x13b
      [   61.190427]  ubsan_epilogue+0x9/0x49
      [   61.190899]  __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f
      [   61.197040]  mlx5_ib_create_wq+0x1c99/0x1d50
      [   61.206632]  ib_uverbs_ex_create_wq+0x499/0x820
      [   61.213892]  ib_uverbs_write+0x77e/0xae0
      [   61.248018]  vfs_write+0x121/0x3b0
      [   61.249831]  ksys_write+0xa1/0x120
      [   61.254024]  do_syscall_64+0x7c/0x2a0
      [   61.256178]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   61.259211] RIP: 0033:0x7f54bab70e99
      [   61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89
      [   61.268678] RSP: 002b:00007ffe1541c318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [   61.271076] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54bab70e99
      [   61.273795] RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
      [   61.276982] RBP: 00007ffe1541c330 R08: 00000000200078e0 R09: 0000000000000002
      [   61.280035] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004005c0
      [   61.283279] R13: 00007ffe1541c420 R14: 0000000000000000 R15: 0000000000000000
      
      Cc: <stable@vger.kernel.org> # 4.7
      Fixes: 79b20a6c ("IB/mlx5: Add receive Work Queue verbs")
      Cc: syzkaller <syzkaller@googlegroups.com>
      Reported-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  14. 31 Jul, 2018 3 commits
  15. 14 Jul, 2018 2 commits
    • RDMA/mlx5: Check that supplied blue flame index doesn't overflow · 05f58ceb
      Leon Romanovsky committed
      The user-supplied index is checked against the total number of system
      pages, but this number already includes num_static_sys_pages, so adding
      that value to the supplied index causes the error below when accessing
      sys_pages[].
      
      BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400
      Read of size 4 at addr ffff880065561904 by task syz-executor446/314
      
      CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ #256
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0xef/0x17e
       print_address_description+0x83/0x3b0
       kasan_report+0x18d/0x4d0
       bfregn_to_uar_index+0x34f/0x400
       create_user_qp+0x272/0x227d
       create_qp_common+0x32eb/0x43e0
       mlx5_ib_create_qp+0x379/0x1ca0
       create_qp.isra.5+0xc94/0x22d0
       ib_uverbs_create_qp+0x21b/0x2a0
       ib_uverbs_write+0xc2c/0x1010
       vfs_write+0x1b0/0x550
       ksys_write+0xc6/0x1a0
       do_syscall_64+0xa7/0x590
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x433679
      Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679
      RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003
      RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8
      R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
      R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006
      
      Allocated by task 314:
       kasan_kmalloc+0xa0/0xd0
       __kmalloc+0x1a9/0x510
       mlx5_ib_alloc_ucontext+0x966/0x2620
       ib_uverbs_get_context+0x23f/0xa60
       ib_uverbs_write+0xc2c/0x1010
       __vfs_write+0x10d/0x720
       vfs_write+0x1b0/0x550
       ksys_write+0xc6/0x1a0
       do_syscall_64+0xa7/0x590
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 1:
       __kasan_slab_free+0x12e/0x180
       kfree+0x159/0x630
       kvfree+0x37/0x50
       single_release+0x8e/0xf0
       __fput+0x2d8/0x900
       task_work_run+0x102/0x1f0
       exit_to_usermode_loop+0x159/0x1c0
       do_syscall_64+0x408/0x590
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff880065561100
       which belongs to the cache kmalloc-4096 of size 4096
      The buggy address is located 2052 bytes inside of
       4096-byte region [ffff880065561100, ffff880065562100)
      The buggy address belongs to the page:
      page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0
      flags: 0x4000000000008100(slab|head)
      raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480
      raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
       ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      Cc: <stable@vger.kernel.org> # 4.15
      Fixes: 1ee47ab3 ("IB/mlx5: Enable QP creation with a given blue flame index")
      Reported-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
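
The bounds problem described in the 05f58ceb message can be illustrated with a small userspace sketch. It assumes a simplified model in which the user supplies a page index directly; `bfreg_info_model` and `dyn_index_to_sys_page` are hypothetical names, and only `num_static_sys_pages` and `sys_pages[]` come from the message. The point is that the supplied index has to be compared against the number of dynamic pages (total minus static), not against the total.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the bookkeeping named in the commit message. */
struct bfreg_info_model {
	const unsigned int *sys_pages;     /* num_sys_pages entries */
	unsigned int num_sys_pages;        /* total; already includes the static pages */
	unsigned int num_static_sys_pages;
};

/* Map a user-supplied dynamic index to a sys_pages[] entry, or fail with -EINVAL. */
static int dyn_index_to_sys_page(const struct bfreg_info_model *bfregi,
				 unsigned int user_index, unsigned int *out)
{
	assert(bfregi != NULL && out != NULL);

	if (bfregi->num_static_sys_pages > bfregi->num_sys_pages)
		return -EINVAL;

	/* Checking user_index against num_sys_pages alone would be too weak,
	 * because the static offset is added to the index afterwards. */
	if (user_index >= bfregi->num_sys_pages - bfregi->num_static_sys_pages)
		return -EINVAL;

	*out = bfregi->sys_pages[bfregi->num_static_sys_pages + user_index];
	return 0;
}
```

With four pages of which two are static, index 3 passes a naive `index < total` test but would read `sys_pages[5]`; the check above rejects it.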
    • RDMA/mlx5: Melt consecutive calls to alloc_bfreg() in one call · ffaf58de
      Leon Romanovsky committed
      There is no need for three consecutive calls to alloc_bfreg(); they can
      be folded into a single call.
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  16. 26 Jun, 2018 1 commit
    • IB/mlx5: Add support for drain SQ & RQ · d0e84c0a
      Yishai Hadas committed
      This patch follows the logic from ib_core but takes the internal
      device state into account when executing the commands involved.
      
      Specifically, when the device is in an internal error state, modifying
      the QP to an error state can be assumed to succeed, because every
      in-progress WR is going to be flushed in error anyway, as that modify
      command expects.
      
      In addition, because the drain should never fail, the driver makes sure
      that post_send/recv succeeds even if the device is already in an
      internal error state. Once the driver supplies the simulated/SW CQEs,
      the CQE for the drain WR is handled as well.
      
      In case of an internal error state, the CQE for the drain WR may be
      completed either by the main task that handles the error state or by
      the task that issued the drain WR.
      
      Because the above depends on scheduling, the code takes the relevant
      locks and actions to make sure that the completion handler for that WR
      is always called after the post_send/recv was issued, but not in
      parallel with the other task that handles the error flow.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  17. 20 Jun, 2018 1 commit
    • IB/mlx5: Add support for DEVX query UAR · 7c043e90
      Yishai Hadas committed
      Return a device UAR index for a given user index via the DEVX interface.
      
      Security note:
      The hardware protection mechanism works like this: each device object that
      is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in
      the device specification manual) upon its creation. Then, upon a doorbell,
      the hardware fetches the object context for which the doorbell was rung and
      validates that the UAR through which the DB was rung matches the UAR ID
      of the object.
      
      If there is no match, the doorbell is silently ignored by the hardware. Of
      course, the user cannot ring a doorbell on a UAR that was not mapped to
      it.
      
      Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command
      mailboxes (except tagging them with UID), we expose to the user its UAR
      ID, so it can embed it in these objects in the expected specification
      format. So the only thing the user can do is hurt itself by creating a
      QP/SQ/CQ with a UAR ID other than its own, in which case other users
      may ring a doorbell on its objects.
      
      The consequence is that another user can schedule a QP/SQ
      of the buggy user for execution (just insert it into the hardware schedule
      queue or arm its CQ for event generation); no further harm is expected.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
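
The doorbell check described in the security note of 7c043e90 happens in hardware, so it cannot be shown as driver code. The sketch below only models the rule in software, under the assumption that an object stores one UAR ID at creation; `hw_object_model` and `ring_doorbell` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a QP/SQ/CQ context as seen by the device. */
struct hw_object_model {
	uint32_t uar_id;             /* UAR ID recorded when the object was created */
	unsigned int doorbells_seen; /* doorbells that were actually accepted */
};

/* Accept the doorbell only if it arrives through the UAR bound to the object.
 * A mismatch is ignored without reporting an error, as the note describes. */
static bool ring_doorbell(struct hw_object_model *obj, uint32_t ringing_uar_id)
{
	assert(obj != NULL);
	if (obj->uar_id != ringing_uar_id)
		return false;
	obj->doorbells_seen++;
	return true;
}
```

In this model, an object created with someone else's UAR ID accepts doorbells from that other UAR, which is the self-inflicted exposure the note refers to.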
  18. 19 Jun, 2018 1 commit
  19. 10 May, 2018 1 commit
    • IB/mlx5: posting klm/mtt list inline in the send queue for reg_wr · 064e5262
      Idan Burstein committed
      Most kernel RDMA ULPs (e.g. NVMe over Fabrics in its default
      "register_always=Y" mode) register and invalidate the user buffer
      upon each IO.
      
      Today the mlx5 driver posts the registration work request using a
      scatter/gather entry for the MTT/KLM list. The fetch of the MTT/KLM
      list becomes the bottleneck for the number of IO operations that the
      NVMe over Fabrics host driver can perform on a single adapter, as
      shown below.
      
      This patch adds support for an inline registration work request
      when the MTT/KLM list is <=64B in size.
      
      The result for NVMe over Fabrics is an increase of > x3.5 for small
      IOs as shown below. I expect the performance of other ULPs (e.g. iSER,
      SRP, NFS over RDMA) to be enhanced as well.
      
      The following results were taken against a single NVMe-oF (RoCE link layer)
      subsystem with a single namespace backed by null_blk using the fio benchmark
      (with rw=randread, numjobs=48, iodepth={16,64}, ioengine=libaio, direct=1):
      
      ConnectX-5 (pci Width x16)
      ---------------------------
      
      Block Size       s/g reg_wr            inline reg_wr
      ++++++++++     +++++++++++++++        ++++++++++++++++
      512B            1302.8K/34.82%         4951.9K/99.02%
      1KB             1284.3K/33.86%         4232.7K/98.09%
      2KB             1238.6K/34.1%          2797.5K/80.04%
      4KB             1169.3K/32.46%         1941.3K/61.35%
      8KB             1013.4K/30.08%         1236.6K/39.47%
      16KB            695.7K/20.19%          696.9K/20.59%
      32KB            350.3K/9.64%           350.6K/10.3%
      64KB            175.86K/5.27%          175.9K/5.28%
      
      ConnectX-4 (pci Width x8)
      ---------------------------
      
      Block Size       s/g reg_wr            inline reg_wr
      ++++++++++     +++++++++++++++        ++++++++++++++++
      512B            1285.8K/42.66%          4242.7K/98.18%
      1KB             1254.1K/41.74%          3569.2K/96.00%
      2KB             1185.9K/39.83%          2173.9K/75.58%
      4KB             1069.4K/36.46%          1343.3K/47.47%
      8KB             755.1K/27.77%           748.7K/29.14%
      Tested-by: Nitzan Carmi <nitzanc@mellanox.com>
      Signed-off-by: Idan Burstein <idanb@mellanox.com>
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
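
The size rule stated in the 064e5262 message (inline only when the MTT/KLM list is at most 64B) can be sketched as a small predicate. The names `INLINE_THRESHOLD_BYTES` and `reg_wr_fits_inline` are hypothetical, and the entry sizes used in the usage note (8 bytes per MTT entry, 16 bytes per KLM entry) are assumptions that the message itself does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INLINE_THRESHOLD_BYTES 64u /* the <=64B limit from the commit message */

/* Return true when ndescs entries of desc_size bytes each fit in the inline
 * limit. Division is used instead of multiplication so that a huge ndescs
 * cannot overflow the size computation. */
static bool reg_wr_fits_inline(size_t ndescs, size_t desc_size)
{
	assert(desc_size > 0);
	return ndescs <= INLINE_THRESHOLD_BYTES / desc_size;
}
```

Under the assumed entry sizes, up to 8 MTT entries or 4 KLM entries would be posted inline, and longer lists would keep using the scatter/gather path.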
  20. 09 May, 2018 1 commit
  21. 27 Apr, 2018 2 commits
    • IB/mlx5: Use unlimited rate when static rate is not supported · 4f32ac2e
      Danit Goldberg committed
      Before this change, if the user passed a non-zero static rate value
      and the FW didn't support static rate, the driver would end up
      configuring a rate of 2.5 Gbps.
      
      Fix this by using rate 0 (unlimited) in cases where the FW
      doesn't support static rate configuration.
      
      Cc: <stable@vger.kernel.org> # 3.10
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Reviewed-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Danit Goldberg <danitg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
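
The behaviour described in the 4f32ac2e message can be modelled roughly as below. This is a sketch under assumptions, not the driver's rate conversion: rates are represented by hypothetical small integer codes, `support_mask` is a hypothetical bitmask of the codes the firmware supports, and falling back to the nearest lower supported code is an assumption of this model. Only the end result comes from the message: when nothing is supported, the rate is 0, meaning unlimited.

```c
#include <assert.h>
#include <stdint.h>

#define RATE_UNLIMITED 0u /* rate 0 means "unlimited" per the commit message */

/* Pick the static rate code to program. Bit i of support_mask set means the
 * hypothetical rate code i is supported by the firmware. */
static unsigned int pick_static_rate(unsigned int requested, uint32_t support_mask)
{
	assert(requested < 32);
	unsigned int rate = requested;

	/* Walk down until a supported code is found; stopping at 0 rather than
	 * at the lowest real rate is what yields "unlimited" when the firmware
	 * supports no static rate at all. */
	while (rate != RATE_UNLIMITED && !(support_mask & (UINT32_C(1) << rate)))
		--rate;
	return rate;
}
```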
    • RDMA/mlx5: Protect from shift operand overflow · 002bf228
      Leon Romanovsky committed
      Ensure that the user didn't supply values large enough to cause an overflow.
      
      UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:263:23
      shift exponent -2147483648 is negative
      CPU: 0 PID: 292 Comm: syzkaller612609 Not tainted 4.16.0-rc1+ #131
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
      dump_stack+0xde/0x164
      ubsan_epilogue+0xe/0x81
      set_rq_size+0x7c2/0xa90
      create_qp_common+0xc18/0x43c0
      mlx5_ib_create_qp+0x379/0x1ca0
      create_qp.isra.5+0xc94/0x2260
      ib_uverbs_create_qp+0x21b/0x2a0
      ib_uverbs_write+0xc2c/0x1010
      vfs_write+0x1b0/0x550
      SyS_write+0xc7/0x1a0
      do_syscall_64+0x1aa/0x740
      entry_SYSCALL_64_after_hwframe+0x26/0x9b
      RIP: 0033:0x433569
      RSP: 002b:00007ffc6e62f448 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433569
      RDX: 0000000000000070 RSI: 00000000200042c0 RDI: 0000000000000003
      RBP: 00000000006d5018 R08: 00000000004002f8 R09: 00000000004002f8
      R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
      R13: 000000000040c9f0 R14: 000000000040ca80 R15: 0000000000000006
      
      Cc: <stable@vger.kernel.org> # 3.10
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Cc: syzkaller <syzkaller@googlegroups.com>
      Reported-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
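
The class of bug that UBSAN reports in 002bf228 (a user-controlled value used as a shift exponent) is usually avoided by validating the exponent before shifting. The sketch below shows that general pattern in userspace C; `checked_shift_u32` is a hypothetical helper, and the actual check added to qp.c is not shown in this log.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Compute 1 << user_log for a user-supplied exponent, rejecting any exponent
 * that is not smaller than the width of the 32-bit result type. */
static int checked_shift_u32(uint32_t user_log, uint32_t *out)
{
	assert(out != NULL);
	if (user_log >= sizeof(uint32_t) * CHAR_BIT)
		return -EINVAL;
	*out = UINT32_C(1) << user_log;
	return 0;
}
```

An exponent such as 0x80000000, which appears as -2147483648 when interpreted as a signed int in the report above, is rejected before any shift takes place.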
  22. 05 Apr, 2018 1 commit
  23. 04 Apr, 2018 1 commit
  24. 28 Mar, 2018 1 commit