1. 20 Apr, 2011 1 commit
  2. 13 Jan, 2011 1 commit
  3. 11 Jan, 2011 1 commit
  4. 30 Mar, 2010 1 commit
    •
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
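The first two bullets can be illustrated with a minimal sketch. This is not the slabh-sweep.py script referenced above, whose heuristics are not reproduced in this message; the regular expressions and both function names are assumptions made for illustration, and only the alphabetical ordering case is covered.

```python
import re

# Hypothetical usage patterns; the real script is certainly more thorough.
SLAB_RE = re.compile(r"\b(k[mzc]alloc|kfree|kmem_cache_\w+)\s*\(")
GFP_RE = re.compile(r"\b(GFP_[A-Z_]+|__GFP_[A-Z_]+|gfp_t)\b")


def needed_include(source):
    """Return the one header a file needs, or None if it uses neither API."""
    if SLAB_RE.search(source):
        # slab.h itself includes gfp.h, so it covers files that use both.
        return "linux/slab.h"
    if GFP_RE.search(source):
        return "linux/gfp.h"
    return None


def insert_include(includes, header):
    """Insert header into an include block.

    If the block is already alphabetical, keep it alphabetical; otherwise
    fall back to appending at the end, as the commit message describes for
    blocks with no recognizable order.
    """
    if header in includes:
        return list(includes)
    if includes == sorted(includes):
        return sorted(includes + [header])
    return list(includes) + [header]
```

For example, `needed_include("p = kmalloc(sz, GFP_KERNEL);")` selects slab.h alone, because slab.h already brings in gfp.h.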
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  5. 12 Mar, 2010 2 commits
    •
      IPoIB: Include return code in trace message for ib_post_send() failures · a48f509b
      Or Gerlitz committed
      Print the return code of ib_post_send() if it fails to make these
      debugging messages more useful.
      Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      a48f509b
    •
      IPoIB: Fix TX queue lockup with mixed UD/CM traffic · f0dc117a
      Eli Cohen committed
      The IPoIB UD QP reports send completions to priv->send_cq, which is
      usually left unarmed; it only gets armed when the number of
      outstanding send requests reaches the size of the TX queue. This
      arming is done only in the send path for the UD QP.  However, when
      sending CM packets, the net queue may be stopped for the same reasons
      but no measures are taken to recover the UD path from a lockup.
      
      Consider this scenario: a host sends a high rate of both CM and UD
      packets, with a TX queue length of N.  If at some time the number of
      outstanding UD packets is more than N/2 and the overall number of
      outstanding packets is N-1, and CM sends a packet (making the number of
      outstanding sends equal N), the TX queue will be stopped.  When all
      the CM packets complete, the number of outstanding packets will still
      be higher than N/2 so the TX queue will not be restarted.
      
      Fix this by calling ib_req_notify_cq() when the queue is stopped in
      the CM path.
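The scenario can be replayed with a toy model. This is a sketch under simplifying assumptions, not driver code: the class, its fields and the wake threshold of N/2 are illustrative, and UD completions are assumed to reach the driver only while the completion queue is armed.

```python
class TxQueueModel:
    """Toy model of one TX queue shared by UD and CM sends."""

    def __init__(self, n, arm_cq_on_cm_stop):
        self.n = n                              # TX queue length N
        self.arm_cq_on_cm_stop = arm_cq_on_cm_stop
        self.ud = 0                             # outstanding UD sends
        self.cm = 0                             # outstanding CM sends
        self.stopped = False
        self.cq_armed = False                   # UD send CQ, normally unarmed

    def _outstanding(self):
        return self.ud + self.cm

    def _wake_if_drained(self):
        if self.stopped and self._outstanding() <= self.n // 2:
            self.stopped = False

    def send(self, kind):
        assert not self.stopped, "nothing is transmitted on a stopped queue"
        if kind == "ud":
            self.ud += 1
        else:
            self.cm += 1
        if self._outstanding() == self.n:
            self.stopped = True
            # Before the fix only the UD send path armed the CQ.
            if kind == "ud" or self.arm_cq_on_cm_stop:
                self.cq_armed = True

    def cm_completions(self):
        """All CM sends complete; these are always reported."""
        self.cm = 0
        self._wake_if_drained()

    def ud_hardware_done(self):
        """Hardware finished every UD send; seen only if the CQ is armed."""
        if self.cq_armed:
            self.cq_armed = False
            self.ud = 0
            self._wake_if_drained()


def run_scenario(arm_cq_on_cm_stop, n=8):
    q = TxQueueModel(n, arm_cq_on_cm_stop)
    for _ in range(n // 2 + 1):       # more than N/2 UD sends outstanding
        q.send("ud")
    for _ in range(n - 1 - q.ud):     # CM sends bring the total to N-1
        q.send("cm")
    q.send("cm")                      # the N-th send stops the queue
    q.cm_completions()
    q.ud_hardware_done()
    return q.stopped
```

In this model `run_scenario(False)` leaves the queue stopped, while `run_scenario(True)`, which arms the CQ when the CM path stops the queue, lets the UD completions wake it again.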
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      f0dc117a
  6. 19 Feb, 2010 1 commit
  7. 06 Sep, 2009 1 commit
  8. 03 Sep, 2009 1 commit
  9. 03 Jun, 2009 1 commit
  10. 19 May, 2009 1 commit
  11. 30 Oct, 2008 1 commit
  12. 29 Oct, 2008 1 commit
  13. 01 Oct, 2008 1 commit
    •
      IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX · 943c246e
      Roland Dreier committed
      Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
      tx_lock.  Not only do we want to get rid of LLTX, this actually causes
      problems because of the skb_orphan() done with this tx_lock held: some
      skb destructors expect to be run with interrupts enabled.
      
      The simplest fix for this is to get rid of the driver-private tx_lock
      and stop using LLTX.  We kill off priv->tx_lock and use
      netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
      tricky because we need to update places that take priv->lock inside
      the tx_lock to disable IRQs, rather than relying on tx_lock having
      already disabled IRQs.
      
      Also, there are a couple of places where we need to disable BHs to
      make sure we have a consistent context to call netif_tx_lock() (since
      we no longer can use _irqsave() variants), and we also have to change
      ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
      than directly, because ipoib_send_comp_handler() runs in interrupt
      context and drain_tx_cq() must run in BH context so it can call
      netif_tx_lock().
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      943c246e
  14. 09 Aug, 2008 1 commit
  15. 30 Jul, 2008 1 commit
  16. 15 Jul, 2008 6 commits
    •
      IPoIB/cm: Reduce connected mode TX object size · e112373f
      Eli Cohen committed
      Since IPoIB connected mode does not set NETIF_F_SG, we only have one DMA
      mapping per send, so we don't need a mapping[] array.  Define a new
      struct with a single u64 mapping member and use it for the CM tx_ring.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      e112373f
    •
      IPoIB: Use dev_set_mtu() to change mtu · bd360671
      Eli Cohen committed
      When the driver sets the MTU of the net device outside of its
      change_mtu method, it should make use of dev_set_mtu() instead of
      directly setting the mtu field of struct net_device.  Otherwise
      functions registered to be called upon MTU change will not get called
      (this is done through call_netdevice_notifiers() in dev_set_mtu()).
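The difference between the two ways of changing the MTU can be shown with a toy model. This is not kernel code: the `NetDevice` class and the Python `dev_set_mtu()` below are hypothetical stand-ins that only mimic the notification behaviour described above.

```python
class NetDevice:
    """Hypothetical stand-in for a network device with MTU listeners."""

    def __init__(self, mtu):
        self.mtu = mtu
        self.mtu_listeners = []   # callbacks registered for MTU changes


def dev_set_mtu(dev, new_mtu):
    """Change the MTU and tell every registered listener about it."""
    dev.mtu = new_mtu
    for listener in dev.mtu_listeners:
        listener(dev)


dev = NetDevice(mtu=2044)
seen = []
dev.mtu_listeners.append(lambda d: seen.append(d.mtu))

dev.mtu = 65520            # direct field write: listeners are never called
after_direct_write = list(seen)

dev_set_mtu(dev, 2044)     # helper: listeners run with the new value
after_helper = list(seen)
```

After the direct write `seen` is still empty; only the call through the helper records the new MTU.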
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      bd360671
    •
      IPoIB: Use rtnl lock/unlock when changing device flags · c8c2afe3
      Eli Cohen committed
      Use of this lock is required to synchronize changes to the netdevice's
      data structs.  Also move the call to ipoib_flush_paths() after the
      modification of the netdevice flags in set_mode().
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      c8c2afe3
    •
      IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq() · a7d834c4
      Roland Dreier committed
      For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
      called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
      and these two callers are not synchronized against each other.
      However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
      work request and scatter list structures, so multiple callers can end
      up stepping on each other, which leads to posting garbled work
      requests.
      
      Fix this by having the caller pass in the ib_recv_wr and ib_sge
      structures to use, and allocating new local structures in
      ipoib_cm_nonsrq_init_rx().
      
      Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
      David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
      Nguyen <hnguyen@de.ibm.com>.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      a7d834c4
    •
      IPoIB: Copy small received SKBs in connected mode · f89271da
      Eli Cohen committed
      The connected mode implementation in the IPoIB driver has a large
      overhead in the way SKBs are handled in the receive flow.  It usually
      allocates an SKB as big as the one used for the currently received SKB
      and moves unused fragments from the old SKB to the new one. This
      involves a loop on all the remaining fragments and incurs overhead on
      the CPU.  This patch, for small SKBs, allocates an SKB just large
      enough to contain the received data and copies to it the data from the
      received SKB.  The newly allocated SKB is passed to the stack and the
      old SKB is reposted.
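The receive-side decision described above can be sketched as follows. This is a simplified model, not the driver's code: the 256-byte threshold and the function name are assumptions, and a `bytearray` stands in for the large receive SKB.

```python
COPYBREAK = 256  # assumed threshold in bytes; the message does not state it


def receive(ring_buf, length, copybreak=COPYBREAK):
    """Return (data handed to the stack, whether ring_buf can be reposted).

    Small packets are copied into a right-sized buffer so the large ring
    buffer can be reposted unchanged.  Larger packets hand the ring buffer
    itself up the stack, so the ring slot needs a replacement buffer.
    """
    if length < copybreak:
        small = bytes(ring_buf[:length])   # copy only the received bytes
        return small, True
    return ring_buf, False
```

With a 4096-byte ring buffer, a 128-byte packet takes the copy path and the ring buffer is reused, while a 2048-byte packet takes the original path.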
      
      When running netperf, UDP small messages, without this patch I get:
      
          UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
          14.4.3.178 (14.4.3.178) port 0 AF_INET
          Socket  Message  Elapsed      Messages
          Size    Size     Time         Okay Errors   Throughput
          bytes   bytes    secs            #      #   10^6bits/sec
      
          114688     128   10.00     5142034      0     526.31
          114688           10.00     1130489            115.71
      
      With this patch I get both send and receive at ~315 Mbps.
      
      The reason that send performance actually slows down is as follows:
      When using this patch, the overhead of the CPU for handling RX packets
      is dramatically reduced.  As a result, we do not experience RNR NAK
      messages from the receiver which cause the connection to be closed and
      reopened again; when the patch is not used, the receiver cannot handle
      the packets fast enough so there is less time to post new buffers and
      hence the mentioned RNR NAKs.  So what happens is that the
      application *thinks* it posted a certain number of packets for
      transmission but these packets are flushed and do not really get
      transmitted.  Since the connection gets opened and closed many times,
      each time netperf gets the CPU time that otherwise would have been
      given to IPoIB to actually transmit the packets.  This can be verified
      by comparing the port counters, the output of ifconfig and the
      output of netperf (this is for the case without the patch):
      
          tx packets
          ==========
          port counter:   1,543,996
          ifconfig:       1,581,426
          netperf:        5,142,034
      
          rx packets
          ==========
          netperf:        1,130,489
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      f89271da
    •
      RDMA: Remove subversion $Id tags · f3781d2e
      Roland Dreier committed
      They don't get updated by git and so they're worse than useless.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      f3781d2e
  17. 30 Apr, 2008 1 commit
  18. 17 Apr, 2008 3 commits
    •
      IPoIB: Handle case when P_Key is deleted and re-added at same index · 9fdd5e5b
      Roland Dreier committed
      If a P_Key is deleted and then re-added at the same index, then IPoIB
      gets confused because __ipoib_ib_dev_flush() only checks whether the
      index is the same without checking whether the P_Key was present, so
      the interface is stopped when the P_Key is deleted, but the event when
      the P_Key is re-added gets ignored and the interface never gets
      restarted.
      
      Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
      everywhere in IPoIB, since none of the places that look for P_Keys are
      in a fast path or in non-sleeping context, and in general we want to
      kill off the whole caching infrastructure eventually.  This also fixes
      consistency problems caused because some IPoIB queries were cached and
      some were uncached during the window where the cache was not updated.
      
      Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this
      problem and testing this fix.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      9fdd5e5b
    •
      IPoIB: Add LSO support · 40ca1988
      Eli Cohen committed
      For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set
      NETIF_F_TSO and use HW LSO to offload TCP segmentation.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      40ca1988
    •
      IPoIB: Use checksum offload support if available · 6046136c
      Eli Cohen committed
      For HCAs that support checksum offload (i.e. that set IB_DEVICE_UD_IP_CSUM
      in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and
      use the HCA to generate and verify IP checksums.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      6046136c
  19. 12 Mar, 2008 2 commits
    •
      IPoIB: Allocate priv->tx_ring with vmalloc() · 10313cbb
      Roland Dreier committed
      Commit 7143740d ("IPoIB: Add send gather support") made struct
      ipoib_tx_buf significantly larger, since the mapping member changed
      from a single u64 to an array with MAX_SKB_FRAGS + 1 entries.  This
      means that allocating tx_rings with kzalloc() may fail because there
      is not enough contiguous memory for the new, much bigger size.  Fix
      this regression by allocating the rings with vmalloc() instead.
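A rough size calculation shows why the larger entry matters for contiguous allocation. The figures below are assumptions, not values taken from this commit: 4K pages, MAX_SKB_FRAGS defined as 65536 / PAGE_SIZE + 2, a hypothetical ring of 64 entries, and an entry layout of one pointer plus the mapping member.

```python
import ctypes

PAGE_SIZE = 4096                         # assumption: 4K pages
MAX_SKB_FRAGS = 65536 // PAGE_SIZE + 2   # assumption: definition of that era
RING_ENTRIES = 64                        # hypothetical ring length


class TxBufSingle(ctypes.Structure):
    """Assumed layout before send gather: one u64 mapping per entry."""
    _fields_ = [("skb", ctypes.c_void_p),
                ("mapping", ctypes.c_uint64)]


class TxBufGather(ctypes.Structure):
    """Assumed layout after send gather: MAX_SKB_FRAGS + 1 mappings."""
    _fields_ = [("skb", ctypes.c_void_p),
                ("mapping", ctypes.c_uint64 * (MAX_SKB_FRAGS + 1))]


def ring_bytes(entry_type, entries=RING_ENTRIES):
    """Total size of a ring made of `entries` elements of entry_type."""
    return ctypes.sizeof(entry_type) * entries


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Pages a physically contiguous allocation of nbytes must span."""
    return -(-nbytes // page_size)       # ceiling division


single_pages = pages_needed(ring_bytes(TxBufSingle))
gather_pages = pages_needed(ring_bytes(TxBufGather))
```

Under these assumptions the gather ring spans several pages where the old ring fit in one. A multi-page physically contiguous allocation is more likely to fail on a fragmented system, whereas vmalloc() only needs the pages to be virtually contiguous.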
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      10313cbb
    •
      IPoIB/cm: Set tx_wr.num_sge in connected mode post_send() · 4200406b
      Roland Dreier committed
      Commit 7143740d ("IPoIB: Add send gather support") made it possible
      for tx_wr.num_sge to be != 1 -- this happens if send gather support is
      enabled.  However, the code in the connected mode post_send() function
      assumes the old invariant, namely that tx_wr.num_sge is always 1.  Fix
      this by explicitly setting tx_wr.num_sge to 1 in the CM post_send().
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      4200406b
  20. 20 Feb, 2008 1 commit
  21. 09 Feb, 2008 1 commit
  22. 26 Jan, 2008 6 commits
    •
      IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries · 586a6934
      Pradeep Satyanarayana committed
      Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
      entries for SRQs.  Currently IPoIB/CM implicitly assumes all HCAs will
      support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
      patch removes that restriction by limiting the maximum MTU in connected
      mode to what the maximum number of SRQ SG entries allows.
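The relationship between SG entries and MTU works out as below. This is only the arithmetic implied by the message (one page per SG entry); the function names and the `overhead` parameter are hypothetical, and any header bytes the driver reserves are not modelled.

```python
PAGE_SIZE = 4096  # 4K pages, as in the example in the message


def sg_entries_for(mtu, page_size=PAGE_SIZE):
    """SG entries needed to receive `mtu` bytes, one page per entry."""
    return -(-mtu // page_size)          # ceiling division


def max_cm_mtu(max_srq_sge, page_size=PAGE_SIZE, overhead=0):
    """Largest MTU that fits in the SG entries the SRQ supports.

    `overhead` is a hypothetical knob for bytes reserved per packet.
    """
    return max_srq_sge * page_size - overhead
```

A 64K MTU needs `sg_entries_for(64 * 1024)` = 16 entries, which is the figure the old code assumed; an HCA limited to 3 SRQ SG entries would be capped at `max_cm_mtu(3)` = 12288 bytes in this model.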
      
      This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>
      Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      586a6934
    •
      IPoIB/cm: Add connected mode support for devices without SRQs · 68e995a2
      Pradeep Satyanarayana committed
      Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared
      receive queues).  The current IPoIB connected mode support only works
      on devices that support SRQs.
      
      Fix this by adding support for using the receive queue of each
      connected mode receive QP.  The disadvantage of this compared to using
      an SRQ is that it means a full queue of receives must be posted for
      each remote connected mode peer, which means that total memory usage
      is potentially much higher than when using SRQs.  To manage this, add
      a new module parameter "max_nonsrq_conn_qp" that limits the number of
      connections allowed per interface.
      
      The rest of the changes are fairly straightforward: we use a table of
      struct ipoib_cm_rx to hold all the active connections, and put the
      table index of the connection in the high bits of receive WR IDs.
      This is needed because we cannot rely on the struct ib_wc.qp field for
      non-SRQ receive completions.  Most of the rest of the changes just
      test whether or not an SRQ is available, and post receives or find
      received packets in the right place depending on the answer.
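Putting a table index in the high bits of a work request ID amounts to simple bit packing. The sketch below is illustrative only: the 16-bit split and the function names are assumptions, and the driver's real layout also reserves flag bits that are not modelled here.

```python
INDEX_SHIFT = 16                      # hypothetical split point
SLOT_MASK = (1 << INDEX_SHIFT) - 1    # low bits carry the ring slot


def pack_wr_id(conn_index, ring_slot):
    """Combine a connection table index and a ring slot into one WR ID."""
    assert 0 <= ring_slot <= SLOT_MASK
    return (conn_index << INDEX_SHIFT) | ring_slot


def unpack_wr_id(wr_id):
    """Recover (connection table index, ring slot) from a WR ID."""
    return wr_id >> INDEX_SHIFT, wr_id & SLOT_MASK
```

On a receive completion the handler can then look up the connection from the WR ID alone, without relying on the qp field of the completion.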
      
      Cleaning up dead connections actually becomes simpler, because we do
      not have to do the "last WQE reached" dance that is required to
      destroy QPs attached to an SRQ.  We just move the QP to the error
      state and wait for all pending receives to be flushed.
      Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
      
      [ Completely rewritten and split up, based on Pradeep's work.  Several
        bugs fixed and no doubt several bugs introduced.  - Roland ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      68e995a2
    •
      IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list() · efcd9971
      Roland Dreier committed
      Factor out the code for going through the rx_reap list of struct
      ipoib_cm_rx and freeing each one.  This consolidates the code
      duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and
      reduces the risk of error when adding additional accounting.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      efcd9971
    •
      IPoIB/cm: Factor out ipoib_cm_create_srq() · 7b3687df
      Roland Dreier committed
      Factor out the code to create an SRQ and allocate the receive ring in
      ipoib_cm_dev_init() into a new function ipoib_cm_create_srq().  This
      will make the code neater when support for devices that don't implement
      SRQs is added.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      7b3687df
    •
      IPoIB/cm: Factor out ipoib_cm_free_rx_ring() · 1efb6144
      Roland Dreier committed
      Factor out the code to unmap/free skbs and free the receive ring in
      ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring().
      This function will be called from a couple of other places when
      support for devices that don't implement SRQs is added.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      1efb6144
    •
      IPoIB: Trivial formatting cleanups · 2337f809
      Roland Dreier committed
      Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      2337f809
  23. 27 Oct, 2007 1 commit
    •
      IPoIB/cm: Fix receive QP cleanup · 09f60f8f
      Roland Dreier committed
      Commit 1b524963 ("IPoIB/cm: Use common CQ for CM send completions")
      changed how the high-order bits of work request IDs were used, which
      had the effect that IPOIB_CM_RX_DRAIN_WRID was no longer handled as a
      connected mode receive completion.  This leads to the messages
      
          ib1: cm send completion event with wrid 1073741823 (> 64)
          ib1: RX drain timing out
      
      when an interface with connected mode QPs is brought down.  Fix this
      by making sure that both IPOIB_OP_CM and IPOIB_OP_RECV are set in
      IPOIB_CM_RX_DRAIN_WRID.
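The misclassification can be reproduced with plain bit arithmetic. The bit positions and both drain values below are assumptions; they were chosen because they reproduce the wrid 1073741823 seen in the log message, not because they are quoted from the source.

```python
IPOIB_OP_RECV = 1 << 31   # assumed bit position
IPOIB_OP_CM = 1 << 30     # assumed bit position

OLD_DRAIN_WRID = 0x7FFFFFFF   # assumed old value: OP_CM set, OP_RECV clear
NEW_DRAIN_WRID = 0xFFFFFFFF   # assumed fixed value: both bits set


def classify(wr_id):
    """Pick a completion handler from the high-order bits of a WR ID."""
    is_cm = bool(wr_id & IPOIB_OP_CM)
    is_recv = bool(wr_id & IPOIB_OP_RECV)
    if is_cm and is_recv:
        return "cm-recv"
    if is_cm:
        return "cm-send"
    if is_recv:
        return "ud-recv"
    return "ud-send"


# With OP_CM masked off, the old drain ID is the number in the log message.
masked_old = OLD_DRAIN_WRID & ~IPOIB_OP_CM
```

Under these assumptions the old drain ID is classified as a CM send completion and `masked_old` equals 1073741823, far above the ring size of 64 reported in the message. With both bits set the drain ID is routed to the CM receive path.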
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      09f60f8f
  24. 20 Oct, 2007 1 commit
  25. 18 Oct, 2007 1 commit
  26. 11 Oct, 2007 1 commit