1. 13 9月, 2012 2 次提交
  2. 15 8月, 2012 1 次提交
  3. 30 7月, 2012 1 次提交
    • S
      IPoIB: Use a private hash table for path lookup in xmit path · b63b70d8
      Shlomo Pongratz 提交于
      Dave Miller <davem@davemloft.net> provided a detailed description of
      why the way IPoIB is using neighbours for its own ipoib_neigh struct
      is buggy:
      
          Any time an ipoib_neigh is changed, a sequence like the following is made:
      
          			spin_lock_irqsave(&priv->lock, flags);
          			/*
          			 * It's safe to call ipoib_put_ah() inside
          			 * priv->lock here, because we know that
          			 * path->ah will always hold one more reference,
          			 * so ipoib_put_ah() will never do more than
          			 * decrement the ref count.
          			 */
          			if (neigh->ah)
          				ipoib_put_ah(neigh->ah);
          			list_del(&neigh->list);
          			ipoib_neigh_free(dev, neigh);
          			spin_unlock_irqrestore(&priv->lock, flags);
          			ipoib_path_lookup(skb, n, dev);
      
          This doesn't work, because you're leaving a stale pointer to the freed up
          ipoib_neigh in the special neigh->ha pointer cookie.  Yes, it even fails
          with all the locking done to protect _changes_ to *ipoib_neigh(n), and
          with the code in ipoib_neigh_free() that NULLs out the pointer.
      
          The core issue is that read side calls to *to_ipoib_neigh(n) are not
          being synchronized at all, they are performed without any locking.  So
          whether we hold the lock or not when making changes to *ipoib_neigh(n)
          you still can have threads see references to freed up ipoib_neigh
          objects.
      
          	cpu 1			cpu 2
          	n = *ipoib_neigh()
          				*ipoib_neigh() = NULL
          				kfree(n)
          	n->foo == OOPS
      
          [..]
      
          Perhaps the ipoib code can have a private path database it manages
          entirely itself, which holds all the necessary information and is
          looked up by some generic key which is available easily at transmit
          time and does not involve generic neighbour entries.
      
      See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and
      <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b>
      for the full discussion.
      
      This patch aims to solve the race conditions found in the IPoIB driver.
      
      The patch removes the connection between the core networking neighbour
      structure and the ipoib_neigh structure.  In addition to avoiding the
      race described above, it allows us to handle SKBs carrying IP packets
      that don't have any associated neighbour.
      
      We add an ipoib_neigh hash table with N buckets where the key is the
      destination hardware address.  The ipoib_neigh is fetched from the
      hash table and instead of the stashed location in the neighbour
      structure. The hash table uses both RCU and reference counting to
      guarantee that no ipoib_neigh instance is ever deleted while in use.
      
      Fetching the ipoib_neigh structure instance from the hash also makes
      the special code in ipoib_start_xmit that handles remote and local
      bonding failover redundant.
      
      Aged ipoib_neigh instances are deleted by a garbage collection task
      that runs every M seconds and deletes every ipoib_neigh instance that
      was idle for at least 2*M seconds. The deletion is safe since the
      ipoib_neigh instances are protected using RCU and reference count
      mechanisms.
      
      The number of buckets (N) and frequency of running the GC thread (M),
      are taken from the exported arb_tbl.
      Signed-off-by: NShlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      b63b70d8
  4. 05 7月, 2012 1 次提交
  5. 09 2月, 2012 2 次提交
  6. 06 12月, 2011 2 次提交
  7. 01 12月, 2011 1 次提交
  8. 30 11月, 2011 2 次提交
    • E
      IB: Fix RCU lockdep splats · 580da35a
      Eric Dumazet 提交于
      Commit f2c31e32 ("net: fix NULL dereferences in check_peer_redir()")
      forgot to take care of infiniband uses of dst neighbours.
      
      Many thanks to Marc Aurele who provided a nice bug report and feedback.
      Reported-by: NMarc Aurele La France <tsi@ualberta.ca>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: <stable@kernel.org>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      580da35a
    • M
      IB/ipoib: Prevent hung task or softlockup processing multicast response · 3874397c
      Mike Marciniszyn 提交于
      This following can occur with ipoib when processing a multicast reponse:
      
          BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
          Modules linked in: ...
          CPU 0:
          Modules linked in: ...
          Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
          RIP: 0010:[<ffffffff814ddb27>]  [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20
          RSP: 0018:ffff8802119ed860  EFLAGS: 00000246
          0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
          RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
          RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
          R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
          R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
          FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
          CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Call Trace:
          [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
          [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
          [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
          [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
          [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
          [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
          [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
          [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
          [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
          [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
          [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
          [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
          [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
          [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa]
          [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
          [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa]
          [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
          [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
          [<ffffffff81088830>] ? worker_thread+0x170/0x2a0
          [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
          [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
          [<ffffffff8108ddf6>] ? kthread+0x96/0xa0
          [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
      
      Coinciding with stack trace is the following message:
      
          ib0: ib_address_create failed
      
      The code below in ipoib_mcast_join_finish() will note the above
      failure in the address handle but otherwise continue:
      
                      ah = ipoib_create_ah(dev, priv->pd, &av);
                      if (!ah) {
                              ipoib_warn(priv, "ib_address_create failed\n");
                      } else {
      
      The while loop at the bottom of ipoib_mcast_join_finish() will attempt
      to send queued multicast packets in mcast->pkt_queue and eventually
      end up in ipoib_mcast_send():
      
              if (!mcast->ah) {
                      if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE)
                              skb_queue_tail(&mcast->pkt_queue, skb);
                      else {
                              ++dev->stats.tx_dropped;
                              dev_kfree_skb_any(skb);
                      }
      
      My read is that the code will requeue the packet and return to the
      ipoib_mcast_join_finish() while loop and the stage is set for the
      "hung" task diagnostic as the while loop never sees a non-NULL ah, and
      will do nothing to resolve.
      
      There are GFP_ATOMIC allocates in the provider routines, so this is
      possible and should be dealt with.
      
      The test that induced the failure is associated with a host SM on the
      same server during a shutdown.
      
      This patch causes ipoib_mcast_join_finish() to exit with an error
      which will flush the queued mcast packets.  Nothing is done to unwind
      the QP attached state so that subsequent sends from above will retry
      the join.
      Reviewed-by: NRam Vepa <ram.vepa@qlogic.com>
      Reviewed-by: NGary Leshner <gary.leshner@qlogic.com>
      Signed-off-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      3874397c
  9. 17 11月, 2011 1 次提交
  10. 18 8月, 2011 1 次提交
  11. 17 8月, 2011 1 次提交
  12. 18 7月, 2011 1 次提交
  13. 20 4月, 2011 1 次提交
  14. 13 1月, 2011 1 次提交
  15. 11 1月, 2011 2 次提交
  16. 27 10月, 2010 1 次提交
  17. 24 10月, 2010 1 次提交
  18. 14 10月, 2010 1 次提交
  19. 07 7月, 2010 1 次提交
  20. 10 12月, 2009 1 次提交
    • D
      IPoIB: Clear ipoib_neigh.dgid in ipoib_neigh_alloc() · 0cd4d0fd
      David J. Wilder 提交于
      IPoIB can miss a change in destination GID under some conditions.  The
      problem is caused when ipoib_neigh->dgid contains a stale address.
      The fix is to set ipoib_neigh->dgid to zero in ipoib_neigh_alloc().
      
      This can happen when a system using bonding on its IPoIB interfaces
      has switched its active interface from interface A to B and back to A.
      The system that fails over will not correctly processes the 2nd
      address change, as described below.
      
      When an address has changed neighbor->ha is updated with the new
      address.  Each neighbor has an associated ipoib_neigh.
      ipoib_neigh->dgid also holds a copy of the remote node's hardware
      address.  When an address changes neighbor->ha is updated by the
      network layer (arp code) with the new address.  IPoIB detects this
      change in ipoib_start_xmit() by comparing neighbor->ha with
      ipoib_neigh->dgid.  The bug is that ipoib_neigh->dgid may already
      contain the new address (A) thus the change from B to A is missed by
      ipoib.  Here is the sequence of events:
      
          ipoib_neigh->dgid = A  and  neighbor->ha = A
      
      The address is switched to B (the first switch)
      
          neighbor->ha = B
      
      The change is seen in ipoib_start_xmit() -- neighbor->ha !=
      ipoib_neigh->dgid so ipoib_neigh is released, and a new one is
      allocated.
      
      The allocator may return the same chunk of memory that was just
      released, therefore ipoib_neigh->dgid still contains A at this point.
      
      ipoib_neigh->dgid should be updated in neigh_add_path(), but if the
      following conditions are true dgid is not updated:
      
              1) __path_find() returns a path
              2) path->ah is NULL
      
      The remote system now switches from address B to A, neighbor->ha is
      updated to A.
      
      Now we have again : ipoib_neigh->dgid = A  and  neighbor->ha = A
      
      Since the addresses are the same ipoib won't process the change in
      address.  Fix this by zeroing out the dgid field when allocating a new
      struct ipoib_neigh.
      Signed-off-by: NDavid Wilder <dwilder@us.ibm.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      0cd4d0fd
  21. 06 9月, 2009 1 次提交
  22. 03 6月, 2009 1 次提交
  23. 31 5月, 2009 1 次提交
  24. 21 4月, 2009 1 次提交
  25. 22 3月, 2009 1 次提交
  26. 18 2月, 2009 1 次提交
  27. 15 1月, 2009 1 次提交
  28. 10 1月, 2009 1 次提交
    • Y
      IPoIB: Fix loss of connectivity after bonding failover on both sides · a50df398
      Yossi Etigin 提交于
      Fix bonding failover in the case both peers failover and the
      gratuitous ARP is lost.  In that case, the sender side will create an
      ipoib_neigh and issue a path request with the old GID first.  When
      skb->dst->neighbour->ha changes due to ARP refresh, this ipoib_neigh
      will not be added to the path->list of the path of the new GID,
      because the ipoib_neigh already exists.  It will not have an AH
      either, because of sender-side failover.  Therefore, it will not get
      an AH when the path is resolved.
      
      The solution here is to compare GIDs in ipoib_start_xmit() even if
      neigh->ah is invalid.  Comparing with an uninitialized value of
      neigh->dgid should be fine, since a spurious match is harmless (and
      astronomically unlikely too).
      Signed-off-by: NMoni Shoua <monis@voltaire.com>
      Signed-off-by: NYossi Etigin <yosefe@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      a50df398
  29. 13 11月, 2008 3 次提交
  30. 30 10月, 2008 1 次提交
  31. 29 10月, 2008 1 次提交
  32. 23 10月, 2008 1 次提交
  33. 01 10月, 2008 1 次提交
    • R
      IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX · 943c246e
      Roland Dreier 提交于
      Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
      tx_lock.  Not only do we want to get rid of LLTX, this actually causes
      problems because of the skb_orphan() done with this tx_lock held: some
      skb destructors expect to be run with interrupts enabled.
      
      The simplest fix for this is to get rid of the driver-private tx_lock
      and stop using LLTX.  We kill off priv->tx_lock and use
      netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
      tricky because we need to update places that take priv->lock inside
      the tx_lock to disable IRQs, rather than relying on tx_lock having
      already disabled IRQs.
      
      Also, there are a couple of places where we need to disable BHs to
      make sure we have a consistent context to call netif_tx_lock() (since
      we no longer can use _irqsave() variants), and we also have to change
      ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
      than directly, because ipoib_send_comp_handler() runs in interrupt
      context and drain_tx_cq() must run in BH context so it can call
      netif_tx_lock().
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      943c246e