1. 17 4月, 2008 4 次提交
  2. 15 2月, 2008 1 次提交
  3. 09 2月, 2008 1 次提交
  4. 26 1月, 2008 3 次提交
    • P
      IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries · 586a6934
      Pradeep Satyanarayana 提交于
      Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
      entries for SRQs.  Currently IPoIB/CM implicitly assumes all HCAs will
      support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
      patch removes that restriction by limiting the maximum MTU in connected
      mode to what the maximum number of SRQ SG entries allows.
      
      This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>
      Signed-off-by: NPradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      586a6934
    • P
      IPoIB/cm: Add connected mode support for devices without SRQs · 68e995a2
      Pradeep Satyanarayana 提交于
      Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared
      receive queues).  The current IPoIB connected mode support only works
      on devices that support SRQs.
      
      Fix this by adding support for using the receive queue of each
      connected mode receive QP.  The disadvantage of this compared to using
      an SRQ is that it means a full queue of receives must be posted for
      each remote connected mode peer, which means that total memory usage
      is potentially much higher than when using SRQs.  To manage this, add
      a new module parameter "max_nonsrq_conn_qp" that limits the number of
      connections allowed per interface.
      
      The rest of the changes are fairly straightforward: we use a table of
      struct ipoib_cm_rx to hold all the active connections, and put the
      table index of the connection in the high bits of receive WR IDs.
      This is needed because we cannot rely on the struct ib_wc.qp field for
      non-SRQ receive completions.  Most of the rest of the changes just
      test whether or not an SRQ is available, and post receives or find
      received packets in the right place depending on the answer.
      
      Cleaning up dead connections actually becomes simpler, because we do
      not have to do the "last WQE reached" dance that is required to
      destroy QPs attached to an SRQ.  We just move the QP to the error
      state and wait for all pending receives to be flushed.
      Signed-off-by: NPradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
      
      [ Completely rewritten and split up, based on Pradeep's work.  Several
        bugs fixed and no doubt several bugs introduced.  - Roland ]
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      68e995a2
    • R
      IPoIB: Trivial formatting cleanups · 2337f809
      Roland Dreier 提交于
      Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      2337f809
  5. 20 10月, 2007 1 次提交
  6. 16 10月, 2007 1 次提交
    • M
      IB/ipoib: Bound the net device to the ipoib_neigh structue · 732a2170
      Moni Shoua 提交于
      IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
      whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
      created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
      call.
      
      When using the bonding driver, neighbours are created by the net stack on behalf
      of the bonding (master) device. On the tx flow the bonding code gets an skb such
      that skb->dev points to the master device, it changes this skb to point on the
      slave device and calls the slave hard_start_xmit function.
      
      Under this scheme, ipoib_neigh_destructor assumption that for each struct
      neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
      can be casted to struct ipoib_dev_priv is buggy.
      
      To fix it, this patch adds a dev field to struct ipoib_neigh which is used
      instead of the struct neighbour dev one, when n->dev->flags has the
      IFF_MASTER bit set.
      
      Signed-off-by: Moni Shoua <monis at voltaire.com>
      Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
      Acked-by: NRoland Dreier <rdreier@cisco.com>
      Signed-off-by: NJeff Garzik <jeff@garzik.org>
      732a2170
  7. 11 10月, 2007 3 次提交
    • R
      [IPoIB]: Convert to netdevice internal stats · de903512
      Roland Dreier 提交于
      Use the stats member of struct netdevice in IPoIB, so we can save
      memory by deleting the stats member of struct ipoib_dev_priv, and save
      code by deleting ipoib_get_stats().
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de903512
    • S
      [NET]: Make NAPI polling independent of struct net_device objects. · bea3348e
      Stephen Hemminger 提交于
      Several devices have multiple independant RX queues per net
      device, and some have a single interrupt doorbell for several
      queues.
      
      In either case, it's easier to support layouts like that if the
      structure representing the poll is independant from the net
      device itself.
      
      The signature of the ->poll() call back goes from:
      
      	int foo_poll(struct net_device *dev, int *budget)
      
      to
      
      	int foo_poll(struct napi_struct *napi, int budget)
      
      The caller is returned the number of RX packets processed (or
      the number of "NAPI credits" consumed if you want to get
      abstract).  The callee no longer messes around bumping
      dev->quota, *budget, etc. because that is all handled in the
      caller upon return.
      
      The napi_struct is to be embedded in the device driver private data
      structures.
      
      Furthermore, it is the driver's responsibility to disable all NAPI
      instances in it's ->stop() device close handler.  Since the
      napi_struct is privatized into the driver's private data structures,
      only the driver knows how to get at all of the napi_struct instances
      it may have per-device.
      
      With lots of help and suggestions from Rusty Russell, Roland Dreier,
      Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
      
      Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
      Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
      
      [ Ported to current tree and all drivers converted.  Integrated
        Stephen's follow-on kerneldoc additions, and restored poll_list
        handling to the old style to fix mutual exclusion issues.  -DaveM ]
      Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bea3348e
    • O
      IPoIB: Allow setting policy to ignore multicast groups · 335a64a5
      Or Gerlitz 提交于
      The kernel IB stack allows (through the RDMA CM) userspace
      applications to join and use multicast groups from the IPoIB MGID
      range.  This allows multicast traffic to be handled directly from
      userspace QPs, without going through the kernel stack, which gives
      better performance for some applications.
      
      However, to fully interoperate with IP multicast, such userspace
      applications need to participate in IGMP reports and queries, or else
      routers may not forward the multicast traffic to the system where the
      application is running.  The simplest way to do this is to share the
      kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to
      join multicast groups that are being handled directly in userspace.
      
      However, in such cases, the actual multicast traffic should not also
      be handled by the IPoIB interface, because that would burn resources
      handling multicast packets that will just be discarded in the kernel.
      
      To handle this, this patch adds lookup on the database used for IB
      multicast group reference counting when IPoIB is joining multicast
      groups, and if a multicast group is already handled by user space,
      then the IPoIB kernel driver ignores the group.  This is controlled by
      a per-interface policy flag.  When the flag is set, IPoIB will not
      join and attach its QP to a multicast group which already has an entry
      in the database; when the flag is cleared, IPoIB will behave as before
      this change.
      
      For each IPoIB interface, the /sys/class/net/$intf/umcast attribute
      controls the policy flag.  The default value is off/0.
      Signed-off-by: NOr Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      335a64a5
  8. 10 10月, 2007 1 次提交
  9. 30 5月, 2007 1 次提交
  10. 25 5月, 2007 1 次提交
  11. 22 5月, 2007 1 次提交
    • M
      IPoIB/cm: Fix SRQ WR leak · 518b1646
      Michael S. Tsirkin 提交于
      SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on
      and off will, with time, leak out all WRs and then all connections
      will start getting RNR NAKs.  Fix this in the way suggested by spec:
      move the QP being destroyed to the error state, wait for "Last WQE
      Reached" event and then post WR on a "drain QP" connected to the same
      CQ.  Once we observe a completion on the drain QP, it's safe to call
      ib_destroy_qp.
      Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      518b1646
  12. 19 5月, 2007 1 次提交
  13. 07 5月, 2007 1 次提交
    • R
      IPoIB: Convert to NAPI · 8d1cc86a
      Roland Dreier 提交于
      Convert the IP-over-InfiniBand network device driver over to using
      NAPI to handle completions for the main CQ.  This covers all receives
      as well as datagram mode sends; send completions for connected mode
      connections are still handled from interrupt context.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      8d1cc86a
  14. 03 5月, 2007 1 次提交
    • J
      PCI: Cleanup the includes of <linux/pci.h> · 6473d160
      Jean Delvare 提交于
      I noticed that many source files include <linux/pci.h> while they do
      not appear to need it. Here is an attempt to clean it all up.
      
      In order to find all possibly affected files, I searched for all
      files including <linux/pci.h> but without any other occurence of "pci"
      or "PCI". I removed the include statement from all of these, then I
      compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
      false positives manually.
      
      My tests covered 66% of the affected files, so there could be false
      positives remaining. Untested files are:
      
      arch/alpha/kernel/err_common.c
      arch/alpha/kernel/err_ev6.c
      arch/alpha/kernel/err_ev7.c
      arch/ia64/sn/kernel/huberror.c
      arch/ia64/sn/kernel/xpnet.c
      arch/m68knommu/kernel/dma.c
      arch/mips/lib/iomap.c
      arch/powerpc/platforms/pseries/ras.c
      arch/ppc/8260_io/enet.c
      arch/ppc/8260_io/fcc_enet.c
      arch/ppc/8xx_io/enet.c
      arch/ppc/syslib/ppc4xx_sgdma.c
      arch/sh64/mach-cayman/iomap.c
      arch/xtensa/kernel/xtensa_ksyms.c
      arch/xtensa/platform-iss/setup.c
      drivers/i2c/busses/i2c-at91.c
      drivers/i2c/busses/i2c-mpc.c
      drivers/media/video/saa711x.c
      drivers/misc/hdpuftrs/hdpu_cpustate.c
      drivers/misc/hdpuftrs/hdpu_nexus.c
      drivers/net/au1000_eth.c
      drivers/net/fec_8xx/fec_main.c
      drivers/net/fec_8xx/fec_mii.c
      drivers/net/fs_enet/fs_enet-main.c
      drivers/net/fs_enet/mac-fcc.c
      drivers/net/fs_enet/mac-fec.c
      drivers/net/fs_enet/mac-scc.c
      drivers/net/fs_enet/mii-bitbang.c
      drivers/net/fs_enet/mii-fec.c
      drivers/net/ibm_emac/ibm_emac_core.c
      drivers/net/lasi_82596.c
      drivers/parisc/hppb.c
      drivers/sbus/sbus.c
      drivers/video/g364fb.c
      drivers/video/platinumfb.c
      drivers/video/stifb.c
      drivers/video/valkyriefb.c
      include/asm-arm/arch-ixp4xx/dma.h
      sound/oss/au1550_ac97.c
      
      I would welcome test reports for these files. I am fine with removing
      the untested files from the patch if the general opinion is that these
      changes aren't safe. The tested part would still be nice to have.
      
      Note that this patch depends on another header fixup patch I submitted
      to LKML yesterday:
        [PATCH] scatterlist.h needs types.h
        http://lkml.org/lkml/2007/3/01/141Signed-off-by: NJean Delvare <khali@linux-fr.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      6473d160
  15. 22 2月, 2007 1 次提交
  16. 11 2月, 2007 1 次提交
    • M
      IPoIB: Connected mode experimental support · 839fcaba
      Michael S. Tsirkin 提交于
      The following patch adds experimental support for IPoIB connected
      mode, as defined by the draft from the IETF ipoib working group.  The
      idea is to increase performance by increasing the MTU from the maximum
      of 2K (theoretically 4K) supported by IPoIB on top of UD.  With this
      code, I'm able to get 800MByte/sec or more with netperf without
      options on a Mellanox 4x back-to-back DDR system.
      
      Some notes on code:
      1. SRQ is used for scalability to large cluster sizes
      2. Only RC connections are used (UC does not support SRQ now)
      3. Retry count is set to 0 since spec draft warns against retries
      4. Each connection is used for data transfers in only 1 direction, so
         each connection is either active(TX) or passive (RX).  2 sides that
         want to communicate create 2 connections.
      5. Each active (TX) connection has a separate CQ for send completions -
         this keeps the code simple without CQ resize and other tricks
      6. To detect stale passive side connections (where the remote side is
         down), we keep an LRU list of passive connections (updated once per
         second per connection) and destroy a connection after it has been
         unused for several seconds. The LRU rule makes it possible to avoid
         scanning connections that have recently been active.
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      839fcaba
  17. 13 12月, 2006 1 次提交
  18. 30 11月, 2006 1 次提交
  19. 22 11月, 2006 1 次提交
  20. 23 9月, 2006 1 次提交
  21. 25 7月, 2006 1 次提交
  22. 01 7月, 2006 1 次提交
  23. 18 6月, 2006 1 次提交
  24. 11 4月, 2006 1 次提交
  25. 05 4月, 2006 1 次提交
    • M
      IPoIB: Consolidate private neighbour data handling · d2e0655e
      Michael S. Tsirkin 提交于
      Consolidate IPoIB's private neighbour data handling into
      ipoib_neigh_alloc() and ipoib_neigh_free().  This will make it easier
      to keep track of the neighbour structures that IPoIB is handling, and
      is a nice cleanup of the code:
      
      add/remove: 2/1 grow/shrink: 1/8 up/down: 100/-178 (-78)
      function                                     old     new   delta
      ipoib_neigh_alloc                              -      61     +61
      ipoib_neigh_free                               -      36     +36
      ipoib_mcast_join_finish                     1288    1291      +3
      path_rec_completion                          575     573      -2
      ipoib_mcast_join_task                        664     660      -4
      ipoib_neigh_destructor                       101      92      -9
      ipoib_neigh_setup_dev                         14       3     -11
      ipoib_neigh_setup                             17       -     -17
      path_free                                    238     215     -23
      ipoib_mcast_free                             329     306     -23
      ipoib_mcast_send                             718     684     -34
      neigh_add_path                               705     650     -55
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      d2e0655e
  26. 25 3月, 2006 1 次提交
    • L
      IPoIB: P_Key change event handling · 7a343d4c
      Leonid Arsh 提交于
      This patch causes the network interface to respond to P_Key change
      events correctly.  As a result, you'll see a child interface in the
      "RUNNING" state (netif_carrier_on()) only when the corresponding P_Key
      is configured by the SM.  When SM removes a P_Key, the "RUNNING" state
      will be disabled for the corresponding network interface.  To
      implement this, I added IB_EVENT_PKEY_CHANGE event handling.  To
      prevent flushing the device before the device is open by the "delay
      open" mechanism, I added an additional device flag called
      IPOIB_FLAG_INITIALIZED.
      
      This also prevents the child network interface from trying to join to
      multicast groups until the PKEY is configured.  We used to get error
      messages like:
      
          ib0.f2f2: couldn't attach QP to multicast group ff12:401b:f2f2:0:0:0:ffff:ffff
      
      in this case.  To fix this, I just check IPOIB_FLAG_OPER_UP flag in
      ipoib_set_mcast_list().
      Signed-off-by: NLeonid Arsh <leonida@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      7a343d4c
  27. 21 3月, 2006 2 次提交
  28. 08 2月, 2006 1 次提交
  29. 14 1月, 2006 1 次提交
  30. 11 11月, 2005 1 次提交
    • R
      [IPoIB] add path record information in debugfs · 1732b0ef
      Roland Dreier 提交于
      Add ibX_path files to debugfs that contain information about the IPoIB
      path cache.  IPoIB ARP only gives GIDs, which the IPoIB driver must
      resolve to real IB paths through the ib_sa module.  For debugging,
      when the ARP table looks OK but traffic isn't flowing, it's useful to
      be able to see if the resolution from GID to path worked.
      
      Also clean up the formatting of the existing _mcg debugfs files.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      1732b0ef
  31. 03 11月, 2005 1 次提交
  32. 29 10月, 2005 1 次提交
    • R
      [IPoIB] Drop RX packets when out of memory · 1993d683
      Roland Dreier 提交于
      Change the way IPoIB handles RX packets when it can't allocate a new
      receive skbuff.  If the allocation of a new receive skb fails, we now
      drop the packet we just received and repost the original receive skb.
      This means that the receive ring always stays full and we don't have
      to monkey around with trying to schedule a refill task for later.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      1993d683