1. 28 2月, 2010 1 次提交
  2. 26 2月, 2010 1 次提交
  3. 25 9月, 2009 1 次提交
    • M
      IPoIB: Don't turn on carrier for a non-active port · 5ee95120
      Moni Shoua 提交于
      Multicast joins can succeed even if the IB port is down.  This happens
      when the SM runs on the same port with the requesting port.  However,
      IPoIB calls netif_carrier_on() when the join of the broadcast group
      succeeds, without caring about the state of the IB port.  The result
      is an IPoIB interface in RUNNING state but without an active IB port
      to support it.
      
      If a bonding interface uses this IPoIB interface as a slave it might
      not detect that this slave is almost useless and failover
      functionality will be damaged.  The fix checks the state of the IB
      port in the carrier_task before calling netif_carrier_on().
      
      Adresses: https://bugs.openfabrics.org/show_bug.cgi?id=1726Signed-off-by: NMoni Shoua <monis@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      5ee95120
  4. 06 9月, 2009 2 次提交
  5. 03 6月, 2009 1 次提交
  6. 17 1月, 2009 1 次提交
  7. 13 1月, 2009 1 次提交
  8. 30 10月, 2008 1 次提交
  9. 29 10月, 2008 1 次提交
  10. 01 10月, 2008 1 次提交
    • R
      IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX · 943c246e
      Roland Dreier 提交于
      Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
      tx_lock.  Not only do we want to get rid of LLTX, this actually causes
      problems because of the skb_orphan() done with this tx_lock held: some
      skb destructors expect to be run with interrupts enabled.
      
      The simplest fix for this is to get rid of the driver-private tx_lock
      and stop using LLTX.  We kill off priv->tx_lock and use
      netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
      tricky because we need to update places that take priv->lock inside
      the tx_lock to disable IRQs, rather than relying on tx_lock having
      already disabled IRQs.
      
      Also, there are a couple of places where we need to disable BHs to
      make sure we have a consistent context to call netif_tx_lock() (since
      we no longer can use _irqsave() variants), and we also have to change
      ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
      than directly, because ipoib_send_comp_handler() runs in interrupt
      context and drain_tx_cq() must run in BH context so it can call
      netif_tx_lock().
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      943c246e
  11. 17 9月, 2008 1 次提交
    • Y
      IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() · e8224e4b
      Yossi Etigin 提交于
      Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
      ipoib_stop().  We avoid it by scheduling the piece of code that takes
      the lock on ipoib_workqueue instead of executing it directly.  This
      works because we only flush the ipoib_workqueue with the RTNL not held.
      
      The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
      which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
      which calls ipoib_mcast_leave(). The latter calls
      ib_sa_free_multicast(), and this waits until the multicast completion
      handler finishes.  This handler is ipoib_mcast_join_complete(), which
      waits for the rtnl_lock(), which was already taken by ipoib_stop().
      
      This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
      RTNL in ipoib_stop()").
      Signed-off-by: NYossi Etigin <yosefe@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      e8224e4b
  12. 20 8月, 2008 1 次提交
    • R
      IPoIB: Fix deadlock on RTNL in ipoib_stop() · a77a57a1
      Roland Dreier 提交于
      Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
      flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
      is run from the ipoib_workqueue.  However, ipoib_stop() (which is run
      inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
      if the join task is pending.
      
      Fix this by simply not flushing the workqueue from ipoib_stop().  It
      turns out that we really don't care about workqueue tasks running
      during or after ipoib_stop(), as long as we make sure to flush the
      workqueue before unregistering a netdev.
      
      This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      a77a57a1
  13. 15 7月, 2008 8 次提交
  14. 21 5月, 2008 1 次提交
  15. 24 4月, 2008 1 次提交
  16. 12 3月, 2008 1 次提交
    • O
      IPoIB: Don't drop multicast sends when they can be queued · b3e2749b
      Or Gerlitz 提交于
      When set_multicast_list() is called the multicast task is restarted
      and the IPOIB_MCAST_STARTED bit is cleared.  As a result for some
      window of time, multicast packets are not transmitted nor queued but
      rather dropped by ipoib_mcast_send().  These dropped packets are
      painful in two cases:
      
       - bonding fail-over which both calls set_multicast_list() on the new
         active slave and sends Gratuitous ARP through that slave.
      
       - IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
         device and issues IGMP leave.
      
      In both these cases, depending on the scheduling of the IPoIB
      multicast task, the packets would be dropped.  As a result, in the
      bonding case, the failover would not be detected by the peers until
      their neighbour is renewed the neighbour (which takes a few tens of
      seconds).  In the IGMP case, the IP router doesn't get an IGMP leave
      and would only learn on that from further probes on the group (also a
      delay of at least a few tens of seconds).
      
      Fix this by allowing transmission (or queuing) depending on the
      IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.
      Signed-off-by: NOlga Shern <olgas@voltaire.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      b3e2749b
  17. 26 1月, 2008 2 次提交
  18. 16 10月, 2007 1 次提交
    • M
      IB/ipoib: Bound the net device to the ipoib_neigh structue · 732a2170
      Moni Shoua 提交于
      IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
      whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
      created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
      call.
      
      When using the bonding driver, neighbours are created by the net stack on behalf
      of the bonding (master) device. On the tx flow the bonding code gets an skb such
      that skb->dev points to the master device, it changes this skb to point on the
      slave device and calls the slave hard_start_xmit function.
      
      Under this scheme, ipoib_neigh_destructor assumption that for each struct
      neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
      can be casted to struct ipoib_dev_priv is buggy.
      
      To fix it, this patch adds a dev field to struct ipoib_neigh which is used
      instead of the struct neighbour dev one, when n->dev->flags has the
      IFF_MASTER bit set.
      
      Signed-off-by: Moni Shoua <monis at voltaire.com>
      Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
      Acked-by: NRoland Dreier <rdreier@cisco.com>
      Signed-off-by: NJeff Garzik <jeff@garzik.org>
      732a2170
  19. 11 10月, 2007 2 次提交
    • R
      [IPoIB]: Convert to netdevice internal stats · de903512
      Roland Dreier 提交于
      Use the stats member of struct netdevice in IPoIB, so we can save
      memory by deleting the stats member of struct ipoib_dev_priv, and save
      code by deleting ipoib_get_stats().
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de903512
    • O
      IPoIB: Allow setting policy to ignore multicast groups · 335a64a5
      Or Gerlitz 提交于
      The kernel IB stack allows (through the RDMA CM) userspace
      applications to join and use multicast groups from the IPoIB MGID
      range.  This allows multicast traffic to be handled directly from
      userspace QPs, without going through the kernel stack, which gives
      better performance for some applications.
      
      However, to fully interoperate with IP multicast, such userspace
      applications need to participate in IGMP reports and queries, or else
      routers may not forward the multicast traffic to the system where the
      application is running.  The simplest way to do this is to share the
      kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to
      join multicast groups that are being handled directly in userspace.
      
      However, in such cases, the actual multicast traffic should not also
      be handled by the IPoIB interface, because that would burn resources
      handling multicast packets that will just be discarded in the kernel.
      
      To handle this, this patch adds lookup on the database used for IB
      multicast group reference counting when IPoIB is joining multicast
      groups, and if a multicast group is already handled by user space,
      then the IPoIB kernel driver ignores the group.  This is controlled by
      a per-interface policy flag.  When the flag is set, IPoIB will not
      join and attach its QP to a multicast group which already has an entry
      in the database; when the flag is cleared, IPoIB will behave as before
      this change.
      
      For each IPoIB interface, the /sys/class/net/$intf/umcast attribute
      controls the policy flag.  The default value is off/0.
      Signed-off-by: NOr Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      335a64a5
  20. 10 10月, 2007 1 次提交
  21. 22 5月, 2007 1 次提交
  22. 23 3月, 2007 1 次提交
  23. 09 3月, 2007 1 次提交
  24. 22 2月, 2007 1 次提交
  25. 17 2月, 2007 1 次提交
    • S
      IB/sa: Track multicast join/leave requests · faec2f7b
      Sean Hefty 提交于
      The IB SA tracks multicast join/leave requests on a per port basis and
      does not do any reference counting: if two users of the same port join
      the same group, and one leaves that group, then the SA will remove the
      port from the group even though there is one user who wants to stay a
      member left.  Therefore, in order to support multiple users of the
      same multicast group from the same port, we need to perform reference
      counting locally.
      
      To do this, add an multicast submodule to ib_sa to perform reference
      counting of multicast join/leave operations.  Modify ib_ipoib (the
      only in-kernel user of multicast) to use the new interface.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      faec2f7b
  26. 11 2月, 2007 1 次提交
    • M
      IPoIB: Connected mode experimental support · 839fcaba
      Michael S. Tsirkin 提交于
      The following patch adds experimental support for IPoIB connected
      mode, as defined by the draft from the IETF ipoib working group.  The
      idea is to increase performance by increasing the MTU from the maximum
      of 2K (theoretically 4K) supported by IPoIB on top of UD.  With this
      code, I'm able to get 800MByte/sec or more with netperf without
      options on a Mellanox 4x back-to-back DDR system.
      
      Some notes on code:
      1. SRQ is used for scalability to large cluster sizes
      2. Only RC connections are used (UC does not support SRQ now)
      3. Retry count is set to 0 since spec draft warns against retries
      4. Each connection is used for data transfers in only 1 direction, so
         each connection is either active(TX) or passive (RX).  2 sides that
         want to communicate create 2 connections.
      5. Each active (TX) connection has a separate CQ for send completions -
         this keeps the code simple without CQ resize and other tricks
      6. To detect stale passive side connections (where the remote side is
         down), we keep an LRU list of passive connections (updated once per
         second per connection) and destroy a connection after it has been
         unused for several seconds. The LRU rule makes it possible to avoid
         scanning connections that have recently been active.
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      839fcaba
  27. 30 11月, 2006 1 次提交
  28. 22 11月, 2006 1 次提交
  29. 23 9月, 2006 2 次提交
    • R
      IPoIB: Create MCGs with all attributes required by RFC · d0df6d6d
      Roland Dreier 提交于
      RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says:
      
        If the IB multicast group does not already exist, one must be
        created first with the IPoIB link MTU.  The MGID MUST use the same
        P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
        broadcast-GID.  The rest of attributes SHOULD follow the values used
        in the broadcast-GID as well.
      
      However, the current IPoIB driver is only setting the attributes
      required by the InfiniBand spec to create a multicast group, so in
      particular the MTU and HopLimit are not being set.  Add these
      attributes when creating MCGs, and also set the Rate attribute, since
      IPoIB pays attention to that attribute as well.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      d0df6d6d
    • M
      IB/sa: Require SA registration · c1a0b23b
      Michael S. Tsirkin 提交于
      Require users to register with SA module, to prevent the sa_query
      module text from going away while an SA query callback is still
      running.  Update all in-tree users for the new interface.
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NSean Hefty <sean.hefty@intel.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      c1a0b23b