- 30 5月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe performance regression on Mellanox cards, because keeping a QP in the error state for extended periods of time moves hardware to the slow path (until the QP is destroyed). For example, MPI latency goes from ~3 usecs to ~7 usecs. Fix this by posting a send WR on one of the QPs that are being flushed, instead of using a separate drain QP that is kept in the error state. This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=636>, reported and bisected by Scott Weitzenkamp at Cisco and debugged by Sasha Mikheev at Voltaire. Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 25 5月, 2007 2 次提交
-
-
由 Michael S. Tsirkin 提交于
Since NAPI polling is disabled while ipoib_cm_dev_stop() is running, ipoib_cm_dev_stop() must poll the CQ itself in order to see the packets draining. Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Michael S. Tsirkin 提交于
time_after() was used backwards, so the timeout occurred immediately. Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 22 5月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on and off will, with time, leak out all WRs and then all connections will start getting RNR NAKs. Fix this in the way suggested by spec: move the QP being destroyed to the error state, wait for "Last WQE Reached" event and then post WR on a "drain QP" connected to the same CQ. Once we observe a completion on the drain QP, it's safe to call ib_destroy_qp. Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 15 5月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
In the presence of some running RX connections, we repeat queue_delayed_work calls each 4 RX WRs, which is a waste. It's enough to start stale task when a first passive connection is added, and rerun it every IPOIB_CM_RX_DELAY as long as there are outstanding passive connections. This removes some code from RX data path. Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 07 5月, 2007 3 次提交
-
-
由 Roland Dreier 提交于
Convert the IP-over-InfiniBand network device driver over to using NAPI to handle completions for the main CQ. This covers all receives as well as datagram mode sends; send completions for connected mode connections are still handled from interrupt context. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Michael S. Tsirkin 提交于
Add a num_comp_vectors member to struct ib_device and extend ib_create_cq() to pass in a comp_vector parameter -- this parallels the userspace libibverbs API. Update all hardware drivers to set num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector value. Pass the value of num_comp_vectors to userspace rather than hard-coding a value of 1. We want multiple CQ event vector support (via MSI-X or similar for adapters that can generate multiple interrupts), but it's not clear how many vectors we want, or how we want to deal with policy issues such as how to decide which vector to use or how to set up interrupt affinity. This patch is useful for experimenting, since no core changes will be necessary when updating a driver to support multiple vectors, and we know that we want to make at least these changes anyway. Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Michael S. Tsirkin 提交于
The IPoIB CM spec allows the use of a single connection in both active->passive and passive->active directions. The current Linux code uses one connection for both directions, but if another node only uses one connection for both directions, we oops when we try to look up the passive connection. Fix by checking that qp_context is non-NULL before dereferencing it. Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il>
-
- 01 5月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
If skb allocation fails when we start the device, we call ipoib_cm_dev_stop() even though ipoib_cm_dev_open() did not run to completion, so we pass an invalid pointer to ib_destroy_cm_id and get an oops. Fix by clearing cm.id on error, and testing it in cm_dev_stop(). This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=561> Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 26 4月, 2007 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 4月, 2007 1 次提交
-
-
由 Roland Dreier 提交于
There are quite a few places in ipoib_cm.c where we know IRQs are enabled because we do something that sleeps in the same function, so we can convert several occurrences of spin_lock_irqsave() to a plain spin_lock_irq(). This cleans up the source a little and makes the code smaller too: add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48) function old new delta ipoib_cm_tx_reap 403 406 +3 ipoib_cm_stale_task 146 145 -1 ipoib_cm_dev_stop 173 172 -1 ipoib_cm_tx_handler 964 956 -8 ipoib_cm_rx_handler 956 937 -19 ipoib_cm_skb_reap 212 190 -22 Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 19 4月, 2007 1 次提交
-
-
由 Roland Dreier 提交于
There's no point in printing the opcode field in the completion handling debugging output, since the type of completion is already printed at the beginning of the line. In fact the opcode field is not even defined for completions with a status other than success. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 10 4月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
Receive buffers need to be mapped with DMA_FROM_DEVICE. Incorrectly mapping with DMA_TO_DEVICE causes a hard lock on ppc64 machines with an IOMMU. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=431> Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 23 3月, 2007 2 次提交
-
-
由 Michael S. Tsirkin 提交于
The packet length checks in ipoib are broken: we add 4 bytes (IPoIB encapsulation header) when sending a packet, not 20 bytes (hardware address length) to each packet. Therefore, if connected mode is enabled so that the interface MTU is larger than the multicast MTU, IPoIB may end up trying to send too-long multicast packets. For example, multicast is broken if a message of size 2048 bytes is sent on an interface with UD MTU 2048, because 2048 is bigger than the real limit of 2044 but the code tests against the wrong limit of 2060. This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>, submitted by Scott Weitzenkamp <sweitzen@cisco.com>. Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Michael S. Tsirkin 提交于
The sense of the time_after_eq() test in ipoib_cm_stale_task() is reversed so that only non-stale connections are reaped. Fix this by changing to time_before_eq(). Noticed by Pradeep Satyanarayana <pradeep@us.ibm.com>. Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 21 2月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
Avoid the overhead of freeing/reallocating and mapping/unmapping for DMA pages that have not been written to by hardware. Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 17 2月, 2007 2 次提交
-
-
由 Michael S. Tsirkin 提交于
ipoib_cm_alloc_rx_skb() might be called from IRQ context, so it must use dev_kfree_skb_any(), not kfree_skb(). Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Change the permissions of the "mode" sysfs attribute to be S_IWUSR instead of S_IWUGO. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 11 2月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
The following patch adds experimental support for IPoIB connected mode, as defined by the draft from the IETF ipoib working group. The idea is to increase performance by increasing the MTU from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD. With this code, I'm able to get 800MByte/sec or more with netperf without options on a Mellanox 4x back-to-back DDR system. Some notes on code: 1. SRQ is used for scalability to large cluster sizes 2. Only RC connections are used (UC does not support SRQ now) 3. Retry count is set to 0 since spec draft warns against retries 4. Each connection is used for data transfers in only 1 direction, so each connection is either active(TX) or passive (RX). 2 sides that want to communicate create 2 connections. 5. Each active (TX) connection has a separate CQ for send completions - this keeps the code simple without CQ resize and other tricks 6. To detect stale passive side connections (where the remote side is down), we keep an LRU list of passive connections (updated once per second per connection) and destroy a connection after it has been unused for several seconds. The LRU rule makes it possible to avoid scanning connections that have recently been active. Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-