1. 20 Dec, 2011 2 commits
  2. 07 Dec, 2011 1 commit
  3. 30 Nov, 2011 2 commits
    • IB: Fix RCU lockdep splats · 580da35a
      Eric Dumazet committed
      Commit f2c31e32 ("net: fix NULL dereferences in check_peer_redir()")
      forgot to take care of infiniband uses of dst neighbours.
      
      Many thanks to Marc Aurele who provided a nice bug report and feedback.
      Reported-by: Marc Aurele La France <tsi@ualberta.ca>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/ipoib: Prevent hung task or softlockup processing multicast response · 3874397c
      Mike Marciniszyn committed
      The following can occur with ipoib when processing a multicast response:
      
          BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
          Modules linked in: ...
          CPU 0:
          Modules linked in: ...
          Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
          RIP: 0010:[<ffffffff814ddb27>]  [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20
          RSP: 0018:ffff8802119ed860  EFLAGS: 00000246
          0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
          RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
          RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
          R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
          R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
          FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
          CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Call Trace:
          [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
          [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
          [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
          [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
          [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
          [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
          [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
          [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
          [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
          [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
          [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
          [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
          [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
          [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa]
          [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
          [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa]
          [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
          [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
          [<ffffffff81088830>] ? worker_thread+0x170/0x2a0
          [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
          [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
          [<ffffffff8108ddf6>] ? kthread+0x96/0xa0
          [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
      
      Coinciding with the stack trace is the following message:
      
          ib0: ib_address_create failed
      
      The code below in ipoib_mcast_join_finish() will note the above
      failure in the address handle but otherwise continue:
      
                      ah = ipoib_create_ah(dev, priv->pd, &av);
                      if (!ah) {
                              ipoib_warn(priv, "ib_address_create failed\n");
                      } else {
      
      The while loop at the bottom of ipoib_mcast_join_finish() will attempt
      to send queued multicast packets in mcast->pkt_queue and eventually
      end up in ipoib_mcast_send():
      
              if (!mcast->ah) {
                      if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE)
                              skb_queue_tail(&mcast->pkt_queue, skb);
                      else {
                              ++dev->stats.tx_dropped;
                              dev_kfree_skb_any(skb);
                      }
      
      My read is that the code requeues the packet and returns to the
      ipoib_mcast_join_finish() while loop. That sets the stage for the
      "hung" task diagnostic: the while loop never sees a non-NULL ah and
      does nothing to resolve the condition.
      
      There are GFP_ATOMIC allocations in the provider routines, so this
      failure is possible and should be dealt with.
      
      The test that induced the failure is associated with a host SM on the
      same server during a shutdown.
      
      This patch causes ipoib_mcast_join_finish() to exit with an error
      which will flush the queued mcast packets.  Nothing is done to unwind
      the QP attached state so that subsequent sends from above will retry
      the join.
      Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
      Reviewed-by: Gary Leshner <gary.leshner@qlogic.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
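
The requeue cycle described in the IB/ipoib commit above can be sketched as a small user-space simulation. This is only an illustration under stated assumptions, not the driver code: `struct mcast_sim`, `sim_send()`, `drain_unpatched()`, `join_finish_patched()`, and `MAX_QUEUE` are hypothetical stand-ins, the queue is reduced to a counter, and the flush that the commit attributes to the error path is folded into the patched function for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUE 3 /* hypothetical stand-in for IPOIB_MAX_MCAST_QUEUE */

/* Hypothetical, simplified model of a multicast group's send state. */
struct mcast_sim {
    void *ah;    /* address handle; NULL when creation failed */
    int queued;  /* packets waiting in the packet queue */
    int dropped; /* packets dropped or flushed */
    int sent;    /* packets actually transmitted */
};

/* Models the quoted ipoib_mcast_send() branch: with no address handle
 * the packet goes back on the queue (or is dropped if the queue is full). */
static void sim_send(struct mcast_sim *m)
{
    assert(m != NULL);
    if (!m->ah) {
        if (m->queued < MAX_QUEUE)
            m->queued++;
        else
            m->dropped++;
        return;
    }
    m->sent++;
}

/* Unpatched drain loop: dequeue, send, repeat. With ah == NULL every
 * packet is requeued, so the loop makes no progress. max_iters exists
 * only so that the simulation terminates; it returns the iteration count. */
static int drain_unpatched(struct mcast_sim *m, int max_iters)
{
    int iters = 0;

    while (m->queued > 0 && iters < max_iters) {
        m->queued--; /* dequeue one packet */
        sim_send(m); /* requeues it when ah == NULL */
        iters++;
    }
    return iters;
}

/* Patched behaviour as described in the commit text: report an error
 * when the address handle is missing, which flushes the queued packets.
 * The return value -1 is a placeholder, not the driver's error code. */
static int join_finish_patched(struct mcast_sim *m)
{
    if (!m->ah) {
        m->dropped += m->queued; /* flush instead of spinning */
        m->queued = 0;
        return -1;
    }
    while (m->queued > 0) {
        m->queued--;
        sim_send(m);
    }
    return 0;
}
```

Calling `drain_unpatched()` on a state with a NULL `ah` and two queued packets runs until the iteration cap with the queue length unchanged, which is the user-space analogue of the soft lockup. `join_finish_patched()` on the same state returns immediately with the queue emptied.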
  4. 29 Nov, 2011 3 commits
  5. 09 Nov, 2011 1 commit
  6. 05 Nov, 2011 3 commits
    • IB/qib: Fix panic in RC error flushing logic · 30ab7e23
      Mike Marciniszyn committed
      The following panic can occur when flushing a QP:
      
          RIP: 0010:[<ffffffffa0168e8b>]  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]
          RSP: 0018:ffff8803cdc6fc90  EFLAGS: 00010046
          RAX: 0000000000000000 RBX: ffff8803d84ba000 RCX: 0000000000000000
          RDX: 0000000000000005 RSI: ffffc90015a53430 RDI: ffff8803d84ba000
          RBP: ffff8803cdc6fce0 R08: ffff8803cdc6fc90 R09: 0000000000000001
          R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8803d84ba0c0
          R13: ffff8803d84ba5cc R14: 0000000000000800 R15: 0000000000000246
          FS:  0000000000000000(0000) GS:ffff880036600000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
          CR2: 0000000000000034 CR3: 00000003e44f9000 CR4: 00000000000406f0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Process qib/0 (pid: 1350, threadinfo ffff8803cdc6e000, task ffff88042728a100)
          Stack:
           53544c5553455201 0000000100000005 0000000000000000 ffff8803d84ba000
           0000000000000000 0000000000000000 0000000000000000 0000000000000000
           0000000000000000 0000000000000001 ffff8803cdc6fd30 ffffffffa0165d7a
          Call Trace:
           [<ffffffffa0165d7a>] qib_make_rc_req+0x36a/0xe80 [ib_qib]
           [<ffffffffa0165a10>] ?  qib_make_rc_req+0x0/0xe80 [ib_qib]
           [<ffffffffa01698b3>] qib_do_send+0xf3/0xb60 [ib_qib]
           [<ffffffff814db757>] ? thread_return+0x4e/0x777
           [<ffffffffa01697c0>] ? qib_do_send+0x0/0xb60 [ib_qib]
           [<ffffffff81088bf0>] worker_thread+0x170/0x2a0
           [<ffffffff8108e530>] ?  autoremove_wake_function+0x0/0x40
           [<ffffffff81088a80>] ? worker_thread+0x0/0x2a0
           [<ffffffff8108e1c6>] kthread+0x96/0xa0
           [<ffffffff8100c1ca>] child_rip+0xa/0x20
           [<ffffffff8108e130>] ? kthread+0x0/0xa0
           [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
          RIP  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]
      
      The RC error state flush logic in qib_make_rc_req() could return all
      of the acked wqes and potentially have emptied the queue.  It would
      then unconditionally try to return a flush completion via
      qib_send_complete() for an invalid wqe, or worse a valid one that is
      not queued. The panic results when the completion code tries to
      maintain an MR reference count for a NULL MR.
      
      This fix modifies the logic to send only one completion per
      qib_make_rc_req() call and changes the completion status from
      IB_WC_SUCCESS to IB_WC_WR_FLUSH_ERR as the completions progress.
      
      The outer loop will call as many times as necessary to flush the queue.
      Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
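
The "one completion per call" approach described in the IB/qib commit above can be sketched in user-space C. This is a minimal model under assumptions, not the driver's code: `struct sq_sim`, `flush_one()`, `flush_all()`, and the `WC_*` constants are hypothetical stand-ins, the send queue is modelled as plain indices with no ring wraparound, and the rule for choosing the status (success for entries already acked, flush error for the rest) is inferred from the commit text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for IB_WC_SUCCESS and IB_WC_WR_FLUSH_ERR. */
enum wc_status { WC_SUCCESS, WC_WR_FLUSH_ERR };

/* Simplified send queue: entries [last, acked) were acked by the peer,
 * entries [acked, head) were posted but never acked. */
struct sq_sim {
    unsigned head;  /* one past the newest posted entry */
    unsigned last;  /* oldest entry that has not been completed */
    unsigned acked; /* first entry that was not acked */
};

/* Produce at most one completion per call. Returns 1 and stores the
 * status in *out if an entry was completed, or 0 if the queue is empty,
 * so a completion is never generated for an entry that is not queued. */
static int flush_one(struct sq_sim *q, enum wc_status *out)
{
    assert(q != NULL && out != NULL);

    if (q->last == q->head)
        return 0; /* queue is empty: nothing to complete */

    if (q->last != q->acked) {
        *out = WC_SUCCESS; /* entry had been acked before the error */
    } else {
        *out = WC_WR_FLUSH_ERR; /* entry is flushed */
        q->acked++; /* keep acked in step with last */
    }
    q->last++;
    return 1;
}

/* Outer loop: call flush_one() as often as needed to empty the queue.
 * Returns the number of calls that produced a completion. */
static unsigned flush_all(struct sq_sim *q, unsigned *n_success,
                          unsigned *n_flushed)
{
    enum wc_status st;
    unsigned calls = 0;

    *n_success = 0;
    *n_flushed = 0;
    while (flush_one(q, &st)) {
        if (st == WC_SUCCESS)
            ++*n_success;
        else
            ++*n_flushed;
        calls++;
    }
    return calls;
}
```

With five posted entries of which three were acked, `flush_all()` reports three successful completions followed by two flush errors. On an empty queue it reports none, which is the case the unconditional completion in the old logic got wrong.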
    • IB/iser: DMA unmap TX bufs used for iSCSI/iSER headers · 52439540
      Or Gerlitz committed
      The current driver never does DMA unmapping on these buffers.  Fix that
      by adding DMA unmapping to the task cleanup callback, and DMA mapping to
      the task init function (drop the headers_initialized micro-optimization).
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/iser: Use separate buffers for the login request/response · 2c4ce609
      Or Gerlitz committed
      The driver counted on the transactional nature of iSCSI login/text
      flows and used the same buffer for both the request and the response.
      We also went further and did DMA mapping only once, with
      DMA_FROM_DEVICE, which violates the DMA mapping API.  Fix that by
      using different buffers, one for requests and one for responses, and
      use the correct DMA mapping direction for each.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  7. 04 Nov, 2011 1 commit
    • IB/mthca: Fix buddy->num_free allocation size · e4221314
      Roland Dreier committed
      The num_free field of mthca_buddy is an array of unsigned int, but
      it was allocated as an array of pointers.  On 64-bit platforms
      this allocates twice as much memory as required.  Fix this by
      allocating the correct size for the type.
      
      This is the same bug just fixed in mlx4 by Eli Cohen <eli@mellanox.co.il>.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
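
The sizing mistake described in the IB/mthca commit above is easy to reproduce in plain C. The sketch below is illustrative only: `struct buddy_sim` and the helper functions are hypothetical, the element count of `max_order + 1` is an assumption, and `calloc()` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the buddy allocator state. */
struct buddy_sim {
    unsigned int *num_free; /* one counter per order */
};

/* Bytes requested when the element size is taken from a pointer type,
 * which is the mistake the commit describes. */
static size_t num_free_bytes_buggy(int max_order)
{
    assert(max_order >= 0);
    return ((size_t)max_order + 1) * sizeof(int *);
}

/* Bytes requested when the element size is derived from the field
 * itself, so it stays correct even if the field's type changes. */
static size_t num_free_bytes_fixed(const struct buddy_sim *b, int max_order)
{
    assert(max_order >= 0);
    return ((size_t)max_order + 1) * sizeof(*b->num_free);
}

/* Allocate zeroed counters using the element size of the field.
 * Returns 0 on success and -1 on allocation failure. */
static int buddy_sim_init(struct buddy_sim *b, int max_order)
{
    assert(b != NULL && max_order >= 0);
    b->num_free = calloc((size_t)max_order + 1, sizeof(*b->num_free));
    return b->num_free ? 0 : -1;
}
```

On a typical 64-bit platform, where a pointer is 8 bytes and an unsigned int is 4, `num_free_bytes_buggy()` returns twice the value of `num_free_bytes_fixed()`. Writing `sizeof(*b->num_free)` rather than naming a type is a common way to keep the allocation size tied to the field.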
  8. 01 Nov, 2011 11 commits
  9. 29 Oct, 2011 1 commit
  10. 22 Oct, 2011 8 commits
  11. 19 Oct, 2011 2 commits
  12. 15 Oct, 2011 2 commits
  13. 14 Oct, 2011 3 commits