1. 10 Oct, 2017 1 commit
  2. 24 Jul, 2017 1 commit
  3. 02 Jun, 2017 2 commits
    • RDMA/SA: Fix kernel panic in CMA request handler flow · d3957b86
      Majd Dibbiny committed
      Commit 9fdca4da (IB/SA: Split struct sa_path_rec based on IB and
      ROCE specific fields) made service_id an attribute specific to the
      IB and OPA SA Path Records, so it was no longer assigned for RoCE.
      
      This caused the following kernel panic in the CMA request handler flow:
      
      [   27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [   27.074731] IP: __radix_tree_lookup+0x1d/0xe0
      ...
      [   27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
      [   27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
      [   27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
      ...
      [   27.075979] Call Trace:
      [   27.076015]  radix_tree_lookup+0xd/0x10
      [   27.076055]  cma_ps_find+0x59/0x70 [rdma_cm]
      [   27.076097]  cma_id_from_event+0xd2/0x470 [rdma_cm]
      [   27.076144]  ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
      [   27.076193]  cma_req_handler+0x25/0x480 [rdma_cm]
      [   27.076237]  cm_process_work+0x25/0x120 [ib_cm]
      [   27.076280]  ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
      [   27.076350]  cm_req_handler+0xb03/0xd40 [ib_cm]
      [   27.076430]  ? sched_clock_cpu+0x11/0xb0
      [   27.076478]  cm_work_handler+0x194/0x1588 [ib_cm]
      [   27.076525]  process_one_work+0x160/0x410
      [   27.076565]  worker_thread+0x137/0x4a0
      [   27.076614]  kthread+0x112/0x150
      [   27.076684]  ? max_active_store+0x60/0x60
      [   27.077642]  ? kthread_park+0x90/0x90
      [   27.078530]  ret_from_fork+0x2c/0x40
      
      This patch moves it back to the common SA Path Record structure
      and removes the redundant setter and getter.
      
      Tested on Connect-IB and Connect-X4 with InfiniBand and RoCE, respectively.
      
      Fixes: 9fdca4da (IB/SA: Split struct sa_path_rec based on IB and
      ROCE specific fields)
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
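The layout change described in the commit above can be pictured with a simplified user-space model. Every name below (`path_rec`, `path_rec_ib`, `path_rec_roce`, `path_service_id`) is a hypothetical stand-in and not the kernel's definition. The sketch only illustrates the idea that a field needed by every transport belongs outside the transport-specific union, so it is populated for RoCE as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's struct sa_path_rec. */
enum path_rec_type { PATH_REC_IB, PATH_REC_ROCE };

struct path_rec_ib   { uint16_t dlid; uint16_t slid; };  /* IB-only fields */
struct path_rec_roce { uint8_t dmac[6]; };               /* RoCE-only fields */

struct path_rec {
    uint64_t service_id;          /* common: valid for every record type */
    enum path_rec_type rec_type;
    union {                       /* transport-specific fields only */
        struct path_rec_ib ib;
        struct path_rec_roce roce;
    } u;
};

/* With service_id in the common part, no per-type getter is needed. */
static uint64_t path_service_id(const struct path_rec *rec)
{
    assert(rec != NULL);
    return rec->service_id;
}
```

In this model a lookup keyed on `service_id` behaves the same for both record types, because the value never lives inside the union.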
    • RDMA/srp: Fix NULL deref at srp_destroy_qp() · 95c2ef50
      Israel Rukshin committed
      If srp_init_qp() fails in srp_create_ch_ib(), ch->send_cq
      may be NULL.
      Calling ib_destroy_qp() directly is sufficient because
      no work requests were posted on the newly created QP.
      
      Fixes: 9294000d ("IB/srp: Drain the send queue before destroying a QP")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  4. 02 May, 2017 4 commits
  5. 19 Feb, 2017 7 commits
    • IB/srp: Drain the send queue before destroying a QP · 9294000d
      Bart Van Assche committed
      A quote from the IB spec:
      
      However, if the Consumer does not wait for the Affiliated Asynchronous
      Last WQE Reached Event, then WQE and Data Segment leakage may occur.
      Therefore, it is good programming practice to tear down a QP that is
      associated with an SRQ by using the following process:
      * Put the QP in the Error State;
      * wait for the Affiliated Asynchronous Last WQE Reached Event;
      * either:
        * drain the CQ by invoking the Poll CQ verb and either wait for CQ
          to be empty or the number of Poll CQ operations has exceeded CQ
          capacity size; or
        * post another WR that completes on the same CQ and wait for this WR to return as a WC;
      * and then invoke a Destroy QP or Reset QP.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Israel Rukshin <israelr@mellanox.com>
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
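The teardown order quoted from the IB spec in the commit above can be sketched as a small user-space state model. This is not verbs code: `model_qp`, `model_qp_to_error()`, `model_drain()` and `model_destroy()` are hypothetical names, and the Last WQE Reached event is folded into the drain step for brevity. The sketch only shows the ordering constraint, namely that destroy is refused until the QP is in the error state and every outstanding work request has been reaped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a queue pair; not a verbs API. */
enum model_qp_state { MODEL_QP_RTS, MODEL_QP_ERROR, MODEL_QP_DESTROYED };

struct model_qp {
    enum model_qp_state state;
    int outstanding_wqes;   /* work requests whose completions are not yet reaped */
};

/* Step 1: put the QP in the error state so outstanding WQEs get flushed. */
static void model_qp_to_error(struct model_qp *qp)
{
    qp->state = MODEL_QP_ERROR;
}

/* Step 2: stand-in for polling the CQ until all flushed completions are seen. */
static void model_drain(struct model_qp *qp)
{
    assert(qp->state == MODEL_QP_ERROR);  /* draining only makes sense after step 1 */
    while (qp->outstanding_wqes > 0)
        qp->outstanding_wqes--;           /* reap one completion */
}

/* Step 3: destroy is only allowed once the QP has been fully drained. */
static bool model_destroy(struct model_qp *qp)
{
    if (qp->state != MODEL_QP_ERROR || qp->outstanding_wqes != 0)
        return false;
    qp->state = MODEL_QP_DESTROYED;
    return true;
}
```

Calling `model_destroy()` before the other two steps fails in this model, which mirrors the WQE leakage the quoted text warns about.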
    • IB/srp: Improve an error path · b02c1536
      Bart Van Assche committed
      Prevent the following message from being printed if login fails:
      
      scsi host0: ib_srp: Sending CM DREQ failed
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Israel Rukshin <israelr@mellanox.com>
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/srp: Make a diagnostic message more informative · a7139ca8
      Bart Van Assche committed
      Report the destination port GID if connecting fails.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Israel Rukshin <israelr@mellanox.com>
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/srp: Document locking conventions · 93c76dbb
      Bart Van Assche committed
      Use lockdep_assert_held() statements to verify at run-time
      whether the proper locks are held.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Israel Rukshin <israelr@mellanox.com>
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/srp: Fix race conditions related to task management · 0a6fdbde
      Bart Van Assche committed
      Prevent srp_process_rsp() from overwriting the status information
      in ch if the SRP target response timed out and processing of
      another task management function has already started. Also prevent
      list corruption when multiple task management functions are issued
      concurrently. This patch keeps the following stack trace from
      appearing in the system log:
      
      WARNING: CPU: 8 PID: 9269 at lib/list_debug.c:52 __list_del_entry_valid+0xbc/0xc0
      list_del corruption. prev->next should be ffffc90004bb7b00, but was ffff8804052ecc68
      CPU: 8 PID: 9269 Comm: sg_reset Tainted: G        W       4.10.0-rc7-dbg+ #3
      Call Trace:
       dump_stack+0x68/0x93
       __warn+0xc6/0xe0
       warn_slowpath_fmt+0x4a/0x50
       __list_del_entry_valid+0xbc/0xc0
       wait_for_completion_timeout+0x12e/0x170
       srp_send_tsk_mgmt+0x1ef/0x2d0 [ib_srp]
       srp_reset_device+0x5b/0x110 [ib_srp]
       scsi_ioctl_reset+0x1c7/0x290
       scsi_ioctl+0x12a/0x420
       sd_ioctl+0x9d/0x100
       blkdev_ioctl+0x51e/0x9f0
       block_ioctl+0x38/0x40
       do_vfs_ioctl+0x8f/0x700
       SyS_ioctl+0x3c/0x70
       entry_SYSCALL_64_fastpath+0x18/0xad
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Israel Rukshin <israelr@mellanox.com>
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Steve Feeley <Steve.Feeley@sandisk.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/srp: Avoid that duplicate responses trigger a kernel bug · 6cb72bc1
      Bart Van Assche committed
      After srp_process_rsp() returns there is a short time during which
      the scsi_host_find_tag() call will return a pointer to the SCSI
      command that is being completed. If a duplicate response is received
      during that time, prevent the following call stack from appearing:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: srp_recv_done+0x450/0x6b0 [ib_srp]
      Oops: 0000 [#1] SMP
      CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ #1
      Call Trace:
       <IRQ>
       __ib_process_cq+0x4b/0xd0 [ib_core]
       ib_poll_handler+0x1d/0x70 [ib_core]
       irq_poll_softirq+0xba/0x120
       __do_softirq+0xba/0x4c0
       irq_exit+0xbe/0xd0
       smp_apic_timer_interrupt+0x38/0x50
       apic_timer_interrupt+0x90/0xa0
       </IRQ>
      RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Israel Rukshin <israelr@mellanox.com>
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Steve Feeley <Steve.Feeley@sandisk.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
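The duplicate-response hazard described in the commit above can be illustrated with a user-space model. It is a sketch under stated assumptions and not the driver's code: `model_cmd`, `model_req`, `model_ring` and `model_process_rsp()` are hypothetical names, and the tag lookup is reduced to an array index. The idea shown is that the handler claims the in-flight command before completing it, so a second response for the same tag finds nothing to dereference and is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_TAGS 4

/* Hypothetical stand-ins for a SCSI command and a per-tag request slot. */
struct model_cmd { int result; int completions; };
struct model_req { struct model_cmd *cmd; };   /* cmd == NULL: nothing in flight */

static struct model_req model_ring[MODEL_MAX_TAGS];

/* Returns 1 if the response completed a command, 0 if it was ignored. */
static int model_process_rsp(unsigned int tag, int status)
{
    struct model_cmd *cmd;

    if (tag >= MODEL_MAX_TAGS)
        return 0;                      /* unknown tag */

    cmd = model_ring[tag].cmd;
    if (cmd == NULL)
        return 0;                      /* duplicate or stale response */

    model_ring[tag].cmd = NULL;        /* claim the command before completing it */
    assert(cmd->completions == 0);     /* a command is completed at most once */
    cmd->result = status;
    cmd->completions++;
    return 1;
}
```

In this model the first response for a tag completes the command and any later response for the same tag returns 0 without touching it.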
    • IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS · d6c58dc4
      Bart Van Assche committed
      Tests have shown that the following error message is reported when
      using SG-GAPS registration with an mlx5 adapter:
      
      scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd4270eb0
      00000000 00000000 00000000 00000000
      00000000 00000000 00000000 00000000
      00000000 00000000 00000000 00000000
      00000000 0f007806 2500002a ad9fafd1
      scsi host1: ib_srp: reconnect succeeded
      mlx5_0:dump_cqe:262:(pid 7369): dump error cqe
      00000000 00000000 00000000 00000000
      00000000 00000000 00000000 00000000
      00000000 00000000 00000000 00000000
      00000000 0f007806 25000032 00105dd0
      scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880b92860138
      
      Hence avoid using SG-GAPS memory registrations. Additionally,
      always configure blk_queue_virt_boundary() to avoid triggering
      a mapping failure when using adapters that support SG-GAPS (e.g.
      mlx5).
      
      Fixes: commit ad8e66b4 ("IB/srp: fix mr allocation when the device supports sg gaps")
      Fixes: commit 509c5f33 ("IB/srp: Prevent mapping failures")
      Reported-by: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Israel Rukshin <israelr@mellanox.com>
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Leon Romanovsky <leonro@mellanox.com>
      Cc: Mark Bloch <markb@mellanox.com>
      Cc: Yuval Shaia <yuval.shaia@oracle.com>
      Cc: <stable@vger.kernel.org> # 4.7+
      Signed-off-by: Doug Ledford <dledford@redhat.com>
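What a virt boundary constrains can be shown with a small user-space check. This is a sketch and not block layer code: `model_seg` and `model_has_gap()` are hypothetical names, and the rule assumed here is that every element except the first must start on the boundary and every element except the last must end on it. A scatter/gather list that passes this check has no gaps, so it can be mapped without SG-GAPS registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scatter/gather element: start address and length in bytes. */
struct model_seg { unsigned long addr; unsigned long len; };

/*
 * Returns true if the list has a gap with respect to the boundary mask
 * (for example 0xfff for a 4 KiB boundary).
 */
static bool model_has_gap(const struct model_seg *sg, size_t n,
                          unsigned long mask)
{
    assert(sg != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        unsigned long prev_end = sg[i - 1].addr + sg[i - 1].len;

        /* Previous element must end, and this one must start, on the boundary. */
        if ((prev_end & mask) || (sg[i].addr & mask))
            return true;
    }
    return false;
}
```

With a 4 KiB mask, two elements where the first ends mid-page are reported as having a gap, while two page-aligned elements are not, even if their pages are not adjacent.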
  6. 07 Feb, 2017 1 commit
  7. 25 Jan, 2017 3 commits
  8. 15 Dec, 2016 5 commits
  9. 08 Oct, 2016 2 commits
  10. 24 Sep, 2016 2 commits
  11. 07 Jun, 2016 2 commits
  12. 14 May, 2016 9 commits
  13. 13 May, 2016 1 commit