1. 31 Jul, 2018 5 commits
    • J
      IB/mlx4: Use 4K pages for kernel QP's WQE buffer · f95ccffc
      Jack Morgenstein committed
      In the current implementation, the driver tries to allocate contiguous
      memory, and if it fails, it falls back to 4K fragmented allocation.
      
      Once the memory is fragmented, the first allocation might take a lot
      of time, and even fail, which can cause connection failures.
      
      This patch changes the logic to always allocate with 4K granularity,
      since it's more robust and more likely to succeed.
      
      This patch was tested with Lustre and no performance degradation
      was observed.
      
      Note: This commit eliminates the "shrinking WQE" feature. This feature
      depended on using vmap to create a virtually contiguous send WQ.
      vmap use was abandoned due to problems with several processors (see the
      commit cited below). As a result, shrinking WQE was available only with
      physically contiguous send WQs. Allocating such send WQs caused the
      problems described above.
      Therefore, as a side effect of eliminating the use of large physically
      contiguous send WQs, the shrinking WQE feature became unavailable.
      
      Warning example:
      worker/20:1: page allocation failure: order:8, mode:0x80d0
      CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------
      Workqueue: ib_cm cm_work_handler [ib_cm]
      Call Trace:
      [<ffffffff81686d81>] dump_stack+0x19/0x1b
      [<ffffffff81186160>] warn_alloc_failed+0x110/0x180
      [<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
      [<ffffffff811ce868>] alloc_pages_current+0x98/0x110
      [<ffffffff81184fae>] __get_free_pages+0xe/0x50
      [<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
      [<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
      [<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
      [<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
      [<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
      [<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
      [<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
      [<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
      [<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
      [<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
      [<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
      [<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
      [<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
      [<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]
      
      Fixes: 73898db0 ("net/mlx4: Avoid wrong virtual mappings")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
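The warning above reports `order:8`, that is, a request for 2^8 = 256 physically contiguous pages (1 MiB with 4K pages). The sketch below shows, in user space, the alternative the commit message describes: building the buffer out of independent 4K pages. It is an illustration only, not the driver code; `frag_buf`, `frag_pages_needed()`, `frag_buf_alloc()`, `frag_buf_free()` and `frag_buf_offset()` are hypothetical names, and `aligned_alloc()` stands in for the kernel's page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FRAG_PAGE_SIZE 4096u /* 4K granularity, as in the commit message */

/* Hypothetical user-space stand-in for a fragmented buffer. */
struct frag_buf {
    size_t npages;
    void **pages; /* each entry is one independently allocated 4K page */
};

/* Number of 4K pages needed to hold `size` bytes (rounds up). */
static size_t frag_pages_needed(size_t size)
{
    return (size + FRAG_PAGE_SIZE - 1) / FRAG_PAGE_SIZE;
}

/* Release every page and the page table itself. */
static void frag_buf_free(struct frag_buf *buf)
{
    size_t i;

    for (i = 0; i < buf->npages; i++)
        free(buf->pages[i]);
    free(buf->pages);
    buf->pages = NULL;
    buf->npages = 0;
}

/* Allocate page by page; returns 0 on success, -1 on failure. */
static int frag_buf_alloc(struct frag_buf *buf, size_t size)
{
    size_t i, n = frag_pages_needed(size);

    assert(buf != NULL);
    buf->npages = 0;
    buf->pages = calloc(n ? n : 1, sizeof(*buf->pages));
    if (!buf->pages)
        return -1;
    for (i = 0; i < n; i++) {
        buf->pages[i] = aligned_alloc(FRAG_PAGE_SIZE, FRAG_PAGE_SIZE);
        if (!buf->pages[i]) {
            frag_buf_free(buf); /* frees only the pages obtained so far */
            return -1;
        }
        buf->npages = i + 1;
    }
    return 0;
}

/* Translate a byte offset into an address inside the right page. */
static void *frag_buf_offset(const struct frag_buf *buf, size_t offset)
{
    return (char *)buf->pages[offset / FRAG_PAGE_SIZE] +
           offset % FRAG_PAGE_SIZE;
}
```

No single allocation in this sketch is larger than one page, so success does not depend on finding a large contiguous region. The cost is that every access goes through an offset-to-page translation such as `frag_buf_offset()`.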
    • J
      IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language · bccd0622
      Jason Gunthorpe committed
      This clearly indicates that the input is a bitwise combination of values
      in an enum, and identifies which enum contains the definition of the bits.
      
      Special accessors are provided that handle the mandatory validation of the
      allowed bits and enforce the correct type for bitwise flags.
      
      If we had introduced this at the start then the kabi would have uniformly
      used u64 data to pass flags, however today there is a mixture of u64 and
      u32 flags. All places are converted to accept both sizes and the accessor
      fixes it. This allows all existing flags to grow to u64 in future without
      any hassle.
      
      Finally all flags are, by definition, optional. If flags are not passed
      the accessor does not fail, but provides a value of zero.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
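The accessor behaviour described in this commit message (accept either a u32 or a u64, reject bits outside the allowed set, yield zero when the attribute is absent) can be sketched in user space as follows. This is not the kernel's implementation: `flags_attr` and `get_flags64()` are hypothetical names, and returning `-EINVAL` for an unexpected size or a disallowed bit is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute: raw bytes from the caller; len == 0 means absent. */
struct flags_attr {
    const void *data;
    size_t len;
};

/*
 * Read a flags attribute into a 64-bit value.
 * - A missing attribute is not an error; the result is 0.
 * - Both 32-bit and 64-bit payloads are accepted and widened to 64 bits.
 * - Any bit outside allowed_bits is rejected.
 */
static int get_flags64(uint64_t *to, const struct flags_attr *attr,
                       uint64_t allowed_bits)
{
    uint64_t flags;

    assert(to != NULL);

    if (attr == NULL || attr->len == 0) {
        *to = 0;
        return 0;
    }

    if (attr->len == sizeof(uint32_t)) {
        uint32_t v32;

        memcpy(&v32, attr->data, sizeof(v32));
        flags = v32;
    } else if (attr->len == sizeof(uint64_t)) {
        memcpy(&flags, attr->data, sizeof(flags));
    } else {
        return -EINVAL;
    }

    if (flags & ~allowed_bits)
        return -EINVAL;

    *to = flags;
    return 0;
}
```

Because the result is always 64 bits wide, a caller written against this accessor does not change if an interface later grows its flags from u32 to u64.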
    • B
      RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const · d34ac5cd
      Bart Van Assche committed
      Since neither ib_post_send() nor ib_post_recv() modifies the data structure
      their second argument points at, declare that argument const. This change
      makes it necessary to declare the 'bad_wr' argument const too and also to
      modify all ULPs that call ib_post_send(), ib_post_recv() or
      ib_post_srq_recv(). This patch does not change any functionality but makes
      it possible for the compiler to verify whether the
      ib_post_(send|recv|srq_recv) functions really do not modify the posted work
      request.
      
      To make this possible, only one cast had to be introduced that casts away
      constness, namely in rpcrdma_post_recvs(). The only way I can think of to
      avoid that cast is to introduce an additional loop in that function or to
      change the data type of bad_wr from struct ib_recv_wr ** into int
      (an index that refers to an element in the work request list). However,
      both approaches would require even more extensive changes than this
      patch.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
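Why making the work request const forces 'bad_wr' to become const as well can be seen in a small stand-alone sketch. The types and the function below are simplified, hypothetical stand-ins (`send_wr`, `post_send()`, `MAX_SGE`), not the real verbs API: on failure the function stores the failing request, which it only holds as a pointer to const, into `*bad_wr`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_SGE 4 /* hypothetical per-request limit used to trigger a failure */

/* Simplified, hypothetical work request: a singly linked list node. */
struct send_wr {
    const struct send_wr *next;
    int id;
    int num_sge;
};

/*
 * Walk the list without modifying it. On error, report the first
 * offending request through *bad_wr. Since `wr` is a pointer to const,
 * *bad_wr must be a pointer to const too, otherwise the assignment
 * below would need a cast that discards the qualifier.
 */
static int post_send(const struct send_wr *wr, const struct send_wr **bad_wr)
{
    assert(bad_wr != NULL);

    for (; wr != NULL; wr = wr->next) {
        if (wr->num_sge > MAX_SGE) {
            *bad_wr = wr;
            return -EINVAL;
        }
        /* wr->num_sge = 0; would not compile: wr points to const data */
    }
    return 0;
}
```

With this signature the compiler rejects any write through `wr` inside the function, which is the check the commit message refers to.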
    • B
      IB/mlx5, ib_post_send(), IB_WR_REG_SIG_MR: Do not modify the 'wr' argument · 7bb1fafc
      Bart Van Assche committed
      Since the next patch will constify the wr pointer, do not modify the data
      that pointer points at.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • B
      RDMA: Constify the argument of the work request conversion functions · f696bf6d
      Bart Van Assche committed
      When posting a send work request, the work request that is posted is not
      modified by any of the RDMA drivers. Make this explicit by constifying
      most ib_send_wr pointers in RDMA transport drivers.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  2. 27 Jul, 2018 9 commits
  3. 26 Jul, 2018 6 commits
  4. 25 Jul, 2018 5 commits
  5. 24 Jul, 2018 2 commits
  6. 19 Jul, 2018 1 commit
  7. 14 Jul, 2018 3 commits
    • L
      RDMA/mlx5: Check that supplied blue flame index doesn't overflow · 05f58ceb
      Leon Romanovsky committed
      The user-supplied index is checked against the total number of system
      pages, but this number already includes num_static_sys_pages, so adding
      that value to the supplied index causes the error below when accessing
      sys_pages[].
      
      BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400
      Read of size 4 at addr ffff880065561904 by task syz-executor446/314
      
      CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ #256
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0xef/0x17e
       print_address_description+0x83/0x3b0
       kasan_report+0x18d/0x4d0
       bfregn_to_uar_index+0x34f/0x400
       create_user_qp+0x272/0x227d
       create_qp_common+0x32eb/0x43e0
       mlx5_ib_create_qp+0x379/0x1ca0
       create_qp.isra.5+0xc94/0x22d0
       ib_uverbs_create_qp+0x21b/0x2a0
       ib_uverbs_write+0xc2c/0x1010
       vfs_write+0x1b0/0x550
       ksys_write+0xc6/0x1a0
       do_syscall_64+0xa7/0x590
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x433679
      Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679
      RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003
      RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8
      R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
      R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006
      
      Allocated by task 314:
       kasan_kmalloc+0xa0/0xd0
       __kmalloc+0x1a9/0x510
       mlx5_ib_alloc_ucontext+0x966/0x2620
       ib_uverbs_get_context+0x23f/0xa60
       ib_uverbs_write+0xc2c/0x1010
       __vfs_write+0x10d/0x720
       vfs_write+0x1b0/0x550
       ksys_write+0xc6/0x1a0
       do_syscall_64+0xa7/0x590
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 1:
       __kasan_slab_free+0x12e/0x180
       kfree+0x159/0x630
       kvfree+0x37/0x50
       single_release+0x8e/0xf0
       __fput+0x2d8/0x900
       task_work_run+0x102/0x1f0
       exit_to_usermode_loop+0x159/0x1c0
       do_syscall_64+0x408/0x590
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff880065561100
       which belongs to the cache kmalloc-4096 of size 4096
      The buggy address is located 2052 bytes inside of
       4096-byte region [ffff880065561100, ffff880065562100)
      The buggy address belongs to the page:
      page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0
      flags: 0x4000000000008100(slab|head)
      raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480
      raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
       ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      Cc: <stable@vger.kernel.org> # 4.15
      Fixes: 1ee47ab3 ("IB/mlx5: Enable QP creation with a given blue flame index")
      Reported-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
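The class of bug described in this commit message, checking a user index against a total that already includes a static prefix and then adding the prefix again, can be illustrated with a small user-space sketch. It is not the mlx5 code: `sys_page_table` and `dyn_index_to_page()` are hypothetical names, and the layout (static entries first, dynamic entries after them) is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table: the first num_static_pages entries are static. */
struct sys_page_table {
    unsigned int num_static_pages;
    unsigned int num_pages; /* total entries, static ones included */
    const int *pages;
};

/*
 * Look up a dynamic entry by a user-supplied index.
 *
 * A check of the form `user_index >= num_pages` is not enough, because
 * the real array index is num_static_pages + user_index. The check
 * below compares against the number of dynamic entries instead, which
 * also avoids any overflow in the addition.
 */
static int dyn_index_to_page(const struct sys_page_table *t,
                             unsigned int user_index, int *out)
{
    assert(t != NULL && out != NULL);
    assert(t->num_static_pages <= t->num_pages);

    if (user_index >= t->num_pages - t->num_static_pages)
        return -EINVAL;

    *out = t->pages[t->num_static_pages + user_index];
    return 0;
}
```

With 8 entries of which 3 are static, user index 7 would pass a naive `< 8` test yet address entry 10; the sketch rejects it along with every index above 4.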
    • L
      RDMA/mlx5: Melt consecutive calls to alloc_bfreg() in one call · ffaf58de
      Leon Romanovsky committed
      There is no need for three consecutive calls to alloc_bfreg(). It can be
      implemented with one function.
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • R
      rdma/cxgb4: Add support for 64Byte cqes · 65ca8d96
      Raju Rangoju committed
      This patch adds support for iw_cxgb4 to extend cqes from the existing
      32Byte size to 64Byte.
      
      It also adds backward compatibility support (for 32Byte) to work
      with older libraries.
      Signed-off-by: Raju Rangoju <rajur@chelsio.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  8. 12 Jul, 2018 9 commits