1. 15 7月, 2008 5 次提交
  2. 21 5月, 2008 1 次提交
  3. 24 4月, 2008 1 次提交
  4. 12 3月, 2008 1 次提交
    • O
      IPoIB: Don't drop multicast sends when they can be queued · b3e2749b
      Or Gerlitz 提交于
      When set_multicast_list() is called the multicast task is restarted
      and the IPOIB_MCAST_STARTED bit is cleared.  As a result for some
      window of time, multicast packets are not transmitted nor queued but
      rather dropped by ipoib_mcast_send().  These dropped packets are
      painful in two cases:
      
       - bonding fail-over which both calls set_multicast_list() on the new
         active slave and sends Gratuitous ARP through that slave.
      
       - IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
         device and issues IGMP leave.
      
      In both these cases, depending on the scheduling of the IPoIB
      multicast task, the packets would be dropped.  As a result, in the
      bonding case, the failover would not be detected by the peers until
      their neighbour is renewed the neighbour (which takes a few tens of
      seconds).  In the IGMP case, the IP router doesn't get an IGMP leave
      and would only learn on that from further probes on the group (also a
      delay of at least a few tens of seconds).
      
      Fix this by allowing transmission (or queuing) depending on the
      IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.
      Signed-off-by: NOlga Shern <olgas@voltaire.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      b3e2749b
  5. 26 1月, 2008 2 次提交
  6. 16 10月, 2007 1 次提交
    • M
      IB/ipoib: Bound the net device to the ipoib_neigh structue · 732a2170
      Moni Shoua 提交于
      IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
      whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
      created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
      call.
      
      When using the bonding driver, neighbours are created by the net stack on behalf
      of the bonding (master) device. On the tx flow the bonding code gets an skb such
      that skb->dev points to the master device, it changes this skb to point on the
      slave device and calls the slave hard_start_xmit function.
      
      Under this scheme, ipoib_neigh_destructor assumption that for each struct
      neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
      can be casted to struct ipoib_dev_priv is buggy.
      
      To fix it, this patch adds a dev field to struct ipoib_neigh which is used
      instead of the struct neighbour dev one, when n->dev->flags has the
      IFF_MASTER bit set.
      
      Signed-off-by: Moni Shoua <monis at voltaire.com>
      Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
      Acked-by: NRoland Dreier <rdreier@cisco.com>
      Signed-off-by: NJeff Garzik <jeff@garzik.org>
      732a2170
  7. 11 10月, 2007 2 次提交
    • R
      [IPoIB]: Convert to netdevice internal stats · de903512
      Roland Dreier 提交于
      Use the stats member of struct netdevice in IPoIB, so we can save
      memory by deleting the stats member of struct ipoib_dev_priv, and save
      code by deleting ipoib_get_stats().
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de903512
    • O
      IPoIB: Allow setting policy to ignore multicast groups · 335a64a5
      Or Gerlitz 提交于
      The kernel IB stack allows (through the RDMA CM) userspace
      applications to join and use multicast groups from the IPoIB MGID
      range.  This allows multicast traffic to be handled directly from
      userspace QPs, without going through the kernel stack, which gives
      better performance for some applications.
      
      However, to fully interoperate with IP multicast, such userspace
      applications need to participate in IGMP reports and queries, or else
      routers may not forward the multicast traffic to the system where the
      application is running.  The simplest way to do this is to share the
      kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to
      join multicast groups that are being handled directly in userspace.
      
      However, in such cases, the actual multicast traffic should not also
      be handled by the IPoIB interface, because that would burn resources
      handling multicast packets that will just be discarded in the kernel.
      
      To handle this, this patch adds lookup on the database used for IB
      multicast group reference counting when IPoIB is joining multicast
      groups, and if a multicast group is already handled by user space,
      then the IPoIB kernel driver ignores the group.  This is controlled by
      a per-interface policy flag.  When the flag is set, IPoIB will not
      join and attach its QP to a multicast group which already has an entry
      in the database; when the flag is cleared, IPoIB will behave as before
      this change.
      
      For each IPoIB interface, the /sys/class/net/$intf/umcast attribute
      controls the policy flag.  The default value is off/0.
      Signed-off-by: NOr Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      335a64a5
  8. 10 10月, 2007 1 次提交
  9. 22 5月, 2007 1 次提交
  10. 23 3月, 2007 1 次提交
  11. 09 3月, 2007 1 次提交
  12. 22 2月, 2007 1 次提交
  13. 17 2月, 2007 1 次提交
    • S
      IB/sa: Track multicast join/leave requests · faec2f7b
      Sean Hefty 提交于
      The IB SA tracks multicast join/leave requests on a per port basis and
      does not do any reference counting: if two users of the same port join
      the same group, and one leaves that group, then the SA will remove the
      port from the group even though there is one user who wants to stay a
      member left.  Therefore, in order to support multiple users of the
      same multicast group from the same port, we need to perform reference
      counting locally.
      
      To do this, add an multicast submodule to ib_sa to perform reference
      counting of multicast join/leave operations.  Modify ib_ipoib (the
      only in-kernel user of multicast) to use the new interface.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      faec2f7b
  14. 11 2月, 2007 1 次提交
    • M
      IPoIB: Connected mode experimental support · 839fcaba
      Michael S. Tsirkin 提交于
      The following patch adds experimental support for IPoIB connected
      mode, as defined by the draft from the IETF ipoib working group.  The
      idea is to increase performance by increasing the MTU from the maximum
      of 2K (theoretically 4K) supported by IPoIB on top of UD.  With this
      code, I'm able to get 800MByte/sec or more with netperf without
      options on a Mellanox 4x back-to-back DDR system.
      
      Some notes on code:
      1. SRQ is used for scalability to large cluster sizes
      2. Only RC connections are used (UC does not support SRQ now)
      3. Retry count is set to 0 since spec draft warns against retries
      4. Each connection is used for data transfers in only 1 direction, so
         each connection is either active(TX) or passive (RX).  2 sides that
         want to communicate create 2 connections.
      5. Each active (TX) connection has a separate CQ for send completions -
         this keeps the code simple without CQ resize and other tricks
      6. To detect stale passive side connections (where the remote side is
         down), we keep an LRU list of passive connections (updated once per
         second per connection) and destroy a connection after it has been
         unused for several seconds. The LRU rule makes it possible to avoid
         scanning connections that have recently been active.
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      839fcaba
  15. 30 11月, 2006 1 次提交
  16. 22 11月, 2006 1 次提交
  17. 23 9月, 2006 3 次提交
  18. 15 9月, 2006 1 次提交
  19. 25 7月, 2006 1 次提交
  20. 27 6月, 2006 1 次提交
  21. 18 6月, 2006 2 次提交
    • H
      [NET]: Add netif_tx_lock · 932ff279
      Herbert Xu 提交于
      Various drivers use xmit_lock internally to synchronise with their
      transmission routines.  They do so without setting xmit_lock_owner.
      This is fine as long as netpoll is not in use.
      
      With netpoll it is possible for deadlocks to occur if xmit_lock_owner
      isn't set.  This is because if a printk occurs while xmit_lock is held
      and xmit_lock_owner is not set can cause netpoll to attempt to take
      xmit_lock recursively.
      
      While it is possible to resolve this by getting netpoll to use
      trylock, it is suboptimal because netpoll's sole objective is to
      maximise the chance of getting the printk out on the wire.  So
      delaying or dropping the message is to be avoided as much as possible.
      
      So the only alternative is to always set xmit_lock_owner.  The
      following patch does this by introducing the netif_tx_lock family of
      functions that take care of setting/unsetting xmit_lock_owner.
      
      I renamed xmit_lock to _xmit_lock to indicate that it should not be
      used directly.  I didn't provide irq versions of the netif_tx_lock
      functions since xmit_lock is meant to be a BH-disabling lock.
      
      This is pretty much a straight text substitution except for a small
      bug fix in winbond.  It currently uses
      netif_stop_queue/spin_unlock_wait to stop transmission.  This is
      unsafe as an IRQ can potentially wake up the queue.  So it is safer to
      use netif_tx_disable.
      
      The hamradio bits used spin_lock_irq but it is unnecessary as
      xmit_lock must never be taken in an IRQ handler.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      932ff279
    • J
      IPoIB: Fix kernel unaligned access on ia64 · 37c22a77
      Jack Morgenstein 提交于
      Fix misaligned access faults on ia64: never cast a misaligned
      neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer
      instead.  The memcpy was being optimized to use full word accesses
      because the compiler thought that union ib_gid is always aligned.
      
      The cast in IPOIB_GID_ARG is safe, since it is fixed to access each
      byte separately.
      Signed-off-by: NJack Morgenstein <jackm@mellanox.co.il>
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      37c22a77
  22. 11 4月, 2006 2 次提交
    • E
      IPoIB: Wait for join to finish before freeing mcast struct · f2de3b06
      Eli Cohen 提交于
      ipoib_mcast_restart_task() might free an mcast object while a join
      request is still outstanding, leading to an oops when the query
      completes.  Fix this by waiting for query to complete, similar to what
      ipoib_stop_thread() is doing.  The wait for mcast completion code is
      consolidated in wait_for_mcast_join().
      Signed-off-by: NEli Cohen <eli@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      f2de3b06
    • J
      IB: simplify static rate encoding · bf6a9e31
      Jack Morgenstein 提交于
      Push translation of static rate to HCA format into low-level drivers,
      where it belongs.  For static rate encoding, use encoding of rate
      field from IB standard PathRecord, with addition of value 0, for
      backwards compatibility with current usage.  The changes are:
      
       - Add enum ib_rate to midlayer includes.
       - Get rid of static rate translation in IPoIB; just use static rate
         directly from Path and MulticastGroup records.
       - Update mthca driver to translate absolute static rate into the
         format used by hardware.  This also fixes mthca's static rate
         handling for HCAs that are capable of 4X DDR.
      Signed-off-by: NJack Morgenstein <jackm@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      bf6a9e31
  23. 05 4月, 2006 1 次提交
    • M
      IPoIB: Consolidate private neighbour data handling · d2e0655e
      Michael S. Tsirkin 提交于
      Consolidate IPoIB's private neighbour data handling into
      ipoib_neigh_alloc() and ipoib_neigh_free().  This will make it easier
      to keep track of the neighbour structures that IPoIB is handling, and
      is a nice cleanup of the code:
      
      add/remove: 2/1 grow/shrink: 1/8 up/down: 100/-178 (-78)
      function                                     old     new   delta
      ipoib_neigh_alloc                              -      61     +61
      ipoib_neigh_free                               -      36     +36
      ipoib_mcast_join_finish                     1288    1291      +3
      path_rec_completion                          575     573      -2
      ipoib_mcast_join_task                        664     660      -4
      ipoib_neigh_destructor                       101      92      -9
      ipoib_neigh_setup_dev                         14       3     -11
      ipoib_neigh_setup                             17       -     -17
      path_free                                    238     215     -23
      ipoib_mcast_free                             329     306     -23
      ipoib_mcast_send                             718     684     -34
      neigh_add_path                               705     650     -55
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      d2e0655e
  24. 21 3月, 2006 4 次提交
  25. 12 2月, 2006 1 次提交
    • R
      IPoIB: Yet another fix for send-only joins · 20b83382
      Roland Dreier 提交于
      Even after the last fix, it's still possible for a send-only join to
      start before the join for the broadcast group has finished.  This
      could cause us to create a multicast group using attributes from the
      broadcast group that haven't been initialized yet, so we would use
      garbage for the Q_Key, etc.  Fix this by waiting until the broadcast
      group's attached flag is set before starting send-only joins.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      20b83382
  26. 08 2月, 2006 2 次提交