1. 29 Aug, 2017 1 commit
  2. 23 Aug, 2017 1 commit
  3. 01 Aug, 2017 1 commit
    • K
      IB/hfi1: Serve the most starved iowait entry first · bcad2913
      Kaike Wan committed
      When an egress resource (SDMA descriptors, PIO credits) is not available,
      a sending thread will be put on the resource's wait queue. When the
      resource becomes available again, up to a fixed number of sending threads
      can be awakened sequentially and removed from the wait queue, depending
      on the number of waiting threads and the number of free resources. Since
      each awakened sending thread will send as many packets as possible, it
      is highly likely that the first sending thread will consume all the
      egress resources. Subsequently, it will be put back to the end of the wait
      queue. Depending on the timing when the later sending threads wake up,
      they may not be able to send any packet and be again put back to the end
      of the wait queue sequentially, right behind the first sending thread.
      This starvation cycle continues until some sending threads exceed their
      retry limit and consequently fail.
      
      This patch fixes the issue with two simple changes:
      (1) Any starved sending thread will be put to the head of the wait queue
      while a served sending thread will be put to the tail;
      (2) The most starved sending thread will be served first.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
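
The two rules in this commit message can be illustrated with a small user-space model. This is only a sketch in Python, not the driver code: `Waiter`, `starved_cnt`, `requeue`, and `wake_batch` are hypothetical names chosen for illustration, and the real patch operates on kernel wait-queue lists rather than a `deque`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Waiter:
    """Hypothetical stand-in for a sending thread parked on a wait queue."""
    name: str
    starved_cnt: int = 0  # consecutive wakeups in which no packet was sent


def requeue(queue, waiter, pkts_sent):
    """Rule (1): a starved waiter goes to the head, a served one to the tail."""
    if pkts_sent == 0:
        waiter.starved_cnt += 1
        queue.appendleft(waiter)
    else:
        waiter.starved_cnt = 0
        queue.append(waiter)


def wake_batch(queue, max_wake):
    """Rule (2): dequeue up to max_wake waiters and serve the most starved first."""
    batch = [queue.popleft() for _ in range(min(max_wake, len(queue)))]
    if batch:
        # max() keeps the earliest index on ties, so equal waiters stay in queue order.
        top = max(range(len(batch)), key=lambda i: batch[i].starved_cnt)
        batch.insert(0, batch.pop(top))
    return batch
```

In this model, if waiter A drains every resource and is requeued with `pkts_sent > 0` while B and C are requeued with `pkts_sent == 0`, the next `wake_batch` call returns B and C ahead of A, which breaks the starvation cycle the message describes.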
  4. 24 Jul, 2017 1 commit
  5. 18 Jul, 2017 1 commit
  6. 28 Jun, 2017 1 commit
  7. 05 May, 2017 1 commit
  8. 02 May, 2017 1 commit
  9. 19 Feb, 2017 5 commits
  10. 16 Nov, 2016 1 commit
  11. 02 Oct, 2016 2 commits
  12. 17 Sep, 2016 1 commit
  13. 23 Aug, 2016 1 commit
    • M
      IB/hfi1,IB/qib: Fix qp_stats sleep with rcu read lock held · c62fb260
      Mike Marciniszyn committed
      The qp init function calls kzalloc() while holding the RCU read
      lock, which triggers the following warning on a debug kernel
      when qp_stats is read with cat:
      
      [  231.723948] rcu_scheduler_active = 1, debug_locks = 0
      [  231.731939] 3 locks held by cat/11355:
      [  231.736492]  #0:  (debugfs_srcu){......}, at: [<ffffffff813001a5>] debugfs_use_file_start+0x5/0x90
      [  231.746955]  #1:  (&p->lock){+.+.+.}, at: [<ffffffff81289a6c>] seq_read+0x4c/0x3c0
      [  231.755873]  #2:  (rcu_read_lock){......}, at: [<ffffffffa0a0c535>] _qp_stats_seq_start+0x5/0xd0 [hfi1]
      [  231.766862]
      
      The init functions do an implicit next, which requires the RCU read lock
      to be taken before the kzalloc().
      
      The fix for both drivers is to narrow the scope of the init function so
      that it only allocates and initializes the newly allocated iter.
      
      The implicit next is moved back into the respective start functions to fix
      the issue.
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      CC: <stable@vger.kernel.org> # 4.6.x-
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
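
The ordering problem described in this commit message can be modeled outside the kernel. The following is a toy sketch in Python, not the driver code: `ToyRcu`, `sleeping_alloc`, `iter_init`, `iter_next`, `seq_start_buggy`, and `seq_start_fixed` are hypothetical names, and the "lock" is only a flag that lets the model detect a sleeping allocation made while the read lock is held.

```python
class ToyRcu:
    """Toy stand-in for RCU read-side state; not a real RCU implementation."""

    def __init__(self):
        self.read_locked = False

    def read_lock(self):
        self.read_locked = True

    def read_unlock(self):
        self.read_locked = False


def sleeping_alloc(rcu):
    """Models an allocation that may sleep, which is not allowed under the read lock."""
    if rcu.read_locked:
        raise RuntimeError("sleeping allocation inside RCU read-side section")
    return {"index": -1, "qp": None}


def iter_init(rcu):
    """After the fix: allocate and initialize the iterator, nothing else."""
    return sleeping_alloc(rcu)


def iter_next(rcu, it, qps):
    """Advance to the next QP; returns False when the table is exhausted."""
    assert rcu.read_locked, "walking the QP table needs the read lock"
    it["index"] += 1
    if it["index"] >= len(qps):
        return False
    it["qp"] = qps[it["index"]]
    return True


def seq_start_buggy(rcu, qps, pos):
    """Before the fix: the lock is taken first, so the allocation happens under it."""
    rcu.read_lock()
    return iter_init(rcu)  # raises in this model


def seq_start_fixed(rcu, qps, pos):
    """After the fix: allocate first, then lock, then do the implicit next."""
    it = iter_init(rcu)        # allocation happens before the lock is taken
    rcu.read_lock()            # released later by the matching stop step
    for _ in range(pos + 1):   # the implicit next now lives in start
        if not iter_next(rcu, it, qps):
            return None
    return it
```

In this model `seq_start_buggy` raises because the allocation runs with the flag set, while `seq_start_fixed` allocates first and only then walks the table under the lock, leaving the lock held for a later stop step to release.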
  14. 03 Aug, 2016 3 commits
  15. 27 May, 2016 1 commit
  16. 26 May, 2016 2 commits
  17. 14 May, 2016 1 commit
  18. 29 Apr, 2016 2 commits
  19. 18 Mar, 2016 3 commits
    • M
      IB/hfi1: Fix panic in adaptive pio · cef504c5
      Mike Marciniszyn committed
      The following panic occurs while running ib_send_bw -a with
      adaptive pio turned on:
      
      [ 8551.143596] BUG: unable to handle kernel NULL pointer dereference at (null)
      [ 8551.152986] IP: [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
      [ 8551.160926] PGD 80db21067 PUD 80bb45067 PMD 0
      [ 8551.166431] Oops: 0000 [#1] SMP
      [ 8551.276725] task: ffff880816bf15c0 ti: ffff880812ac0000 task.ti: ffff880812ac0000
      [ 8551.285705] RIP: 0010:[<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
      [ 8551.296462] RSP: 0018:ffff880812ac3b58  EFLAGS: 00010282
      [ 8551.303029] RAX: 000000000000002d RBX: 0000000000000000 RCX: 0000000000000800
      [ 8551.311633] RDX: ffff880812ac3c08 RSI: 0000000000000000 RDI: ffff8800b6665e40
      [ 8551.320228] RBP: ffff880812ac3ba0 R08: 0000000000001000 R09: ffffffffa09039a0
      [ 8551.328820] R10: ffff880817a0c000 R11: 0000000000000000 R12: ffff8800b6665e40
      [ 8551.337406] R13: ffff880817a0c000 R14: ffff8800b6665800 R15: ffff8800b6665e40
      [ 8551.355640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 8551.362674] CR2: 0000000000000000 CR3: 000000080abe8000 CR4: 00000000001406e0
      [ 8551.371262] Stack:
      [ 8551.374119]  ffff880812ac3bf0 ffff88080cf54010 ffff880800000800 ffff880812ac3c08
      [ 8551.383036]  ffff8800b6665800 ffff8800b6665e40 0000000000000202 ffffffffa08e7b80
      [ 8551.391941]  00000001007de431 ffff880812ac3bc8 ffffffffa0904645 ffff8800b6665800
      [ 8551.400859] Call Trace:
      [ 8551.404214]  [<ffffffffa08e7b80>] ? hfi1_del_timers_sync+0x30/0x30 [hfi1]
      [ 8551.412417]  [<ffffffffa0904645>] hfi1_verbs_send+0x215/0x330 [hfi1]
      [ 8551.420154]  [<ffffffffa08ec126>] hfi1_do_send+0x166/0x350 [hfi1]
      [ 8551.427618]  [<ffffffffa055a533>] rvt_post_send+0x533/0x6a0 [rdmavt]
      [ 8551.435367]  [<ffffffffa050760f>] ib_uverbs_post_send+0x30f/0x530 [ib_uverbs]
      [ 8551.443999]  [<ffffffffa0501367>] ib_uverbs_write+0x117/0x380 [ib_uverbs]
      [ 8551.452269]  [<ffffffff815810ab>] ? sock_recvmsg+0x3b/0x50
      [ 8551.459071]  [<ffffffff81581152>] ? sock_read_iter+0x92/0xe0
      [ 8551.466068]  [<ffffffff81212857>] __vfs_write+0x37/0x100
      [ 8551.472692]  [<ffffffff81213532>] ? rw_verify_area+0x52/0xd0
      [ 8551.479682]  [<ffffffff81213782>] vfs_write+0xa2/0x1a0
      [ 8551.486089]  [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
      [ 8551.493891]  [<ffffffff812146c5>] SyS_write+0x55/0xc0
      [ 8551.500220]  [<ffffffff816ae0ee>] entry_SYSCALL_64_fastpath+0x12/0x71
      [ 8551.531284] RIP  [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
      [ 8551.539508]  RSP <ffff880812ac3b58>
      [ 8551.544110] CR2: 0000000000000000
      
      The priv s_sendcontext pointer was not set up properly. This patch
      fixes that by using s_sendcontext and eliminating its send engine use.
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • M
      IB/hfi1: Fix issues with qp_stats print · ef6d8c4e
      Mike Marciniszyn committed
      These changes help correlate QPs between the trace output
      and the qp_stats information.
      
      They add a space after QP and clarify that the second
      QP is actually the remote QP.
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • M
      IB/hfi1: Report pid in qp_stats to aid debug · ef086c0d
      Mike Marciniszyn committed
      Tracking user/QP ownership is needed to debug issues with
      user ULPs like OpenMPI.
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  20. 11 Mar, 2016 10 commits