1. 16 9月, 2019 4 次提交
    • B
      RDMA/srp: Document srp_parse_in() arguments · 95416047
      Bart Van Assche 提交于
      [ Upstream commit e37df2d5b569390e3b80ebed9a73fd5b9dcda010 ]
      
      This patch avoids that a warning is reported when building with W=1.
      
      Cc: Sergey Gorenko <sergeygo@mellanox.com>
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      95416047
    • M
      IB/hfi1: Avoid hardlockup with flushlist_lock · 90ca4912
      Mike Marciniszyn 提交于
      [ Upstream commit cf131a81967583ae737df6383a0893b9fee75b4e ]
      
      Heavy contention of the sde flushlist_lock can cause hard lockups at
      extreme scale when the flushing logic is under stress.
      
      Mitigate by replacing the item at a time copy to the local list with
      an O(1) list_splice_init() and using the high priority work queue to
      do the flushes.
      
      Fixes: 77241056 ("IB/hfi1: add driver files")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      90ca4912
    • M
      IB/mlx5: Reset access mask when looping inside page fault handler · feced628
      Moni Shoua 提交于
      [ Upstream commit 1abe186ed8a6593069bc122da55fc684383fdc1c ]
      
      If page-fault handler spans multiple MRs then the access mask needs to
      be reset before each MR handling or otherwise write access will be
      granted to mapped pages instead of read-only.
      
      Cc: <stable@vger.kernel.org> # 3.19
      Fixes: 7bdf65d4 ("IB/mlx5: Handle page faults")
      Reported-by: NJerome Glisse <jglisse@redhat.com>
      Signed-off-by: NMoni Shoua <monis@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      feced628
    • Y
      IB/uverbs: Fix OOPs upon device disassociation · f0e28655
      Yishai Hadas 提交于
      [ Upstream commit 425784aa5b029eeb80498c73a68f62c3ad1d3b3f ]
      
      The async_file might be freed before the disassociation has been ended,
      causing qp shutdown to use after free on it.
      
      Since uverbs_destroy_ufile_hw is not a fence, it returns if a
      disassociation is ongoing in another thread. It has to be written this way
      to avoid deadlock. However this means that the ufile FD close cannot
      destroy anything that may still be used by an active kref, such as the the
      async_file.
      
      To fix that move the kref_put() to be in ib_uverbs_release_file().
      
       BUG: unable to handle kernel paging request at ffffffffba682787
       PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061
       Oops: 0003 [#1] SMP PTI
       CPU: 1 PID: 32410 Comm: bash Tainted: G           OE 4.20.0-rc6+ #3
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0
       Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d
      		ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85
      		d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83
       RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006
       RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001
       RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787
       RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000
       R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294
       R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00
       FS:  00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0
       Call Trace:
        _raw_spin_lock_irq+0x27/0x30
        ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs]
        uverbs_free_qp+0x7e/0x90 [ib_uverbs]
        destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs]
        uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs]
        __uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs]
        uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs]
        ib_uverbs_remove_one+0xea/0x240 [ib_uverbs]
        ib_unregister_device+0xfb/0x200 [ib_core]
        mlx5_ib_remove+0x51/0xe0 [mlx5_ib]
        mlx5_remove_device+0xc1/0xd0 [mlx5_core]
        mlx5_unregister_device+0x3d/0xb0 [mlx5_core]
        remove_one+0x2a/0x90 [mlx5_core]
        pci_device_remove+0x3b/0xc0
        device_release_driver_internal+0x16d/0x240
        unbind_store+0xb2/0x100
        kernfs_fop_write+0x102/0x180
        __vfs_write+0x36/0x1a0
        ? __alloc_fd+0xa9/0x170
        ? set_close_on_exec+0x49/0x70
        vfs_write+0xad/0x1a0
        ksys_write+0x52/0xc0
        do_syscall_64+0x5b/0x180
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fac551aac60
      
      Cc: <stable@vger.kernel.org> # 4.2
      Fixes: 036b1063 ("IB/uverbs: Enable device removal when there are active user space applications")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f0e28655
  2. 10 9月, 2019 3 次提交
  3. 25 8月, 2019 3 次提交
    • J
      IB/mad: Fix use-after-free in ib mad completion handling · b4f0fee7
      Jack Morgenstein 提交于
      [ Upstream commit 770b7d96cfff6a8bf6c9f261ba6f135dc9edf484 ]
      
      We encountered a use-after-free bug when unloading the driver:
      
      [ 3562.116059] BUG: KASAN: use-after-free in ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
      [ 3562.117233] Read of size 4 at addr ffff8882ca5aa868 by task kworker/u13:2/23862
      [ 3562.118385]
      [ 3562.119519] CPU: 2 PID: 23862 Comm: kworker/u13:2 Tainted: G           OE     5.1.0-for-upstream-dbg-2019-05-19_16-44-30-13 #1
      [ 3562.121806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
      [ 3562.123075] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
      [ 3562.124383] Call Trace:
      [ 3562.125640]  dump_stack+0x9a/0xeb
      [ 3562.126911]  print_address_description+0xe3/0x2e0
      [ 3562.128223]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
      [ 3562.129545]  __kasan_report+0x15c/0x1df
      [ 3562.130866]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
      [ 3562.132174]  kasan_report+0xe/0x20
      [ 3562.133514]  ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
      [ 3562.134835]  ? find_mad_agent+0xa00/0xa00 [ib_core]
      [ 3562.136158]  ? qlist_free_all+0x51/0xb0
      [ 3562.137498]  ? mlx4_ib_sqp_comp_worker+0x1970/0x1970 [mlx4_ib]
      [ 3562.138833]  ? quarantine_reduce+0x1fa/0x270
      [ 3562.140171]  ? kasan_unpoison_shadow+0x30/0x40
      [ 3562.141522]  ib_mad_recv_done+0xdf6/0x3000 [ib_core]
      [ 3562.142880]  ? _raw_spin_unlock_irqrestore+0x46/0x70
      [ 3562.144277]  ? ib_mad_send_done+0x1810/0x1810 [ib_core]
      [ 3562.145649]  ? mlx4_ib_destroy_cq+0x2a0/0x2a0 [mlx4_ib]
      [ 3562.147008]  ? _raw_spin_unlock_irqrestore+0x46/0x70
      [ 3562.148380]  ? debug_object_deactivate+0x2b9/0x4a0
      [ 3562.149814]  __ib_process_cq+0xe2/0x1d0 [ib_core]
      [ 3562.151195]  ib_cq_poll_work+0x45/0xf0 [ib_core]
      [ 3562.152577]  process_one_work+0x90c/0x1860
      [ 3562.153959]  ? pwq_dec_nr_in_flight+0x320/0x320
      [ 3562.155320]  worker_thread+0x87/0xbb0
      [ 3562.156687]  ? __kthread_parkme+0xb6/0x180
      [ 3562.158058]  ? process_one_work+0x1860/0x1860
      [ 3562.159429]  kthread+0x320/0x3e0
      [ 3562.161391]  ? kthread_park+0x120/0x120
      [ 3562.162744]  ret_from_fork+0x24/0x30
      ...
      [ 3562.187615] Freed by task 31682:
      [ 3562.188602]  save_stack+0x19/0x80
      [ 3562.189586]  __kasan_slab_free+0x11d/0x160
      [ 3562.190571]  kfree+0xf5/0x2f0
      [ 3562.191552]  ib_mad_port_close+0x200/0x380 [ib_core]
      [ 3562.192538]  ib_mad_remove_device+0xf0/0x230 [ib_core]
      [ 3562.193538]  remove_client_context+0xa6/0xe0 [ib_core]
      [ 3562.194514]  disable_device+0x14e/0x260 [ib_core]
      [ 3562.195488]  __ib_unregister_device+0x79/0x150 [ib_core]
      [ 3562.196462]  ib_unregister_device+0x21/0x30 [ib_core]
      [ 3562.197439]  mlx4_ib_remove+0x162/0x690 [mlx4_ib]
      [ 3562.198408]  mlx4_remove_device+0x204/0x2c0 [mlx4_core]
      [ 3562.199381]  mlx4_unregister_interface+0x49/0x1d0 [mlx4_core]
      [ 3562.200356]  mlx4_ib_cleanup+0xc/0x1d [mlx4_ib]
      [ 3562.201329]  __x64_sys_delete_module+0x2d2/0x400
      [ 3562.202288]  do_syscall_64+0x95/0x470
      [ 3562.203277]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The problem was that the MAD PD was deallocated before the MAD CQ.
      There was completion work pending for the CQ when the PD got deallocated.
      When the mad completion handling reached procedure
      ib_mad_post_receive_mads(), we got a use-after-free bug in the following
      line of code in that procedure:
         sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
      (the pd pointer in the above line is no longer valid, because the
      pd has been deallocated).
      
      We fix this by allocating the PD before the CQ in procedure
      ib_mad_port_open(), and deallocating the PD after freeing the CQ
      in procedure ib_mad_port_close().
      
      Since the CQ completion work queue is flushed during ib_free_cq(),
      no completions will be pending for that CQ when the PD is later
      deallocated.
      
      Note that freeing the CQ before deallocating the PD is the practice
      in the ULPs.
      
      Fixes: 4be90bc6 ("IB/mad: Remove ib_get_dma_mr calls")
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190801121449.24973-1-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b4f0fee7
    • G
      IB/mlx5: Fix MR registration flow to use UMR properly · a0258ff4
      Guy Levi 提交于
      [ Upstream commit e5366d309a772fef264ec85e858f9ea46f939848 ]
      
      Driver shouldn't allow to use UMR to register a MR when
      umr_modify_atomic_disabled is set. Otherwise it will always end up with a
      failure in the post send flow which sets the UMR WQE to modify atomic access
      right.
      
      Fixes: c8d75a98 ("IB/mlx5: Respect new UMR capabilities")
      Signed-off-by: NGuy Levi <guyle@mellanox.com>
      Reviewed-by: NMoni Shoua <monis@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081929.32559-1-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a0258ff4
    • L
      IB/core: Add mitigation for Spectre V1 · efb742ce
      Luck, Tony 提交于
      [ Upstream commit 61f259821dd3306e49b7d42a3f90fb5a4ff3351b ]
      
      Some processors may mispredict an array bounds check and
      speculatively access memory that they should not. With
      a user supplied array index we like to play things safe
      by masking the value with the array size before it is
      used as an index.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Link: https://lore.kernel.org/r/20190731043957.GA1600@agluck-desk2.amr.corp.intel.comSigned-off-by: NDoug Ledford <dledford@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      efb742ce
  4. 09 8月, 2019 1 次提交
  5. 07 8月, 2019 7 次提交
  6. 31 7月, 2019 4 次提交
  7. 26 7月, 2019 2 次提交
  8. 03 7月, 2019 2 次提交
  9. 25 6月, 2019 6 次提交
  10. 31 5月, 2019 4 次提交
  11. 17 5月, 2019 1 次提交
    • L
      RDMA/hns: Bugfix for mapping user db · fb67c97c
      Lijun Ou 提交于
      [ Upstream commit 2557fabd6e29f349bfa0ac13f38ac98aa5eafc74 ]
      
      When the maximum send wr delivered by the user is zero, the qp does not
      have a sq.
      
      When allocating the sq db buffer to store the user sq pi pointer and map
      it to the kernel mode, max_send_wr is used as the trigger condition, while
      the kernel does not consider the max_send_wr trigger condition when
      mapmping db. It will cause sq record doorbell map fail and create qp fail.
      
      The failed print information as follows:
      
       hns3 0000:7d:00.1: Send cmd: tail - 418, opcode - 0x8504, flag - 0x0011, retval - 0x0000
       hns3 0000:7d:00.1: Send cmd: 0xe59dc000 0x00000000 0x00000000 0x00000000 0x00000116 0x0000ffff
       hns3 0000:7d:00.1: sq record doorbell map failed!
       hns3 0000:7d:00.1: Create RC QP failed
      
      Fixes: 0425e3e6 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
      Signed-off-by: NLijun Ou <oulijun@huawei.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      fb67c97c
  12. 10 5月, 2019 3 次提交