Commits · ec56dc0b7f6c3fec20bbc2e98ff1a06edf2fc9b9 · openeuler / Kernel

30 May, 2007 · 1 commit

IPoIB/cm: Fix performance regression on Mellanox · ec56dc0b

Committed by Michael S. Tsirkin on May 28, 2007

commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe
performance regression on Mellanox cards, because keeping a QP in the
error state for extended periods of time moves hardware to the slow
path (until the QP is destroyed).  For example, MPI latency goes from
~3 usecs to ~7 usecs.

Fix this by posting a send WR on one of the QPs that are being
flushed, instead of using a separate drain QP that is kept in the
error state.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=636>,
reported and bisected by Scott Weitzenkamp at Cisco and debugged by
Sasha Mikheev at Voltaire.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

25 May, 2007 · 2 commits

IPoIB/cm: Drain cq in ipoib_cm_dev_stop() · 2dfbfc37

Committed by Michael S. Tsirkin on May 24, 2007

Since NAPI polling is disabled while ipoib_cm_dev_stop() is running,
ipoib_cm_dev_stop() must poll the CQ itself in order to see the
packets draining.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IPoIB/cm: Fix timeout check in ipoib_cm_dev_stop() · 8fd357a6

Committed by Michael S. Tsirkin on May 24, 2007

time_after() was used backwards, so the timeout occurred immediately.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
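
The failure mode described in this message can be reproduced outside the kernel. The following is a minimal userspace sketch, not the driver code: the macro is a simplified re-implementation of the kernel's wrap-safe time_after() (without its type checks), and the function names timed_out_correct() and timed_out_backwards() are hypothetical. The exact expression used in ipoib_cm_dev_stop() is not shown on this page, so the argument order below is an assumption made for illustration.

```c
#include <stdbool.h>

/* Simplified userspace version of the kernel macro: true when a is after b.
 * The signed view of the difference keeps this correct across counter wrap. */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Correct order: the deadline has passed once "now" is after begin + timeout. */
static bool timed_out_correct(unsigned long now, unsigned long begin,
                              unsigned long timeout)
{
	return time_after(now, begin + timeout);
}

/* Swapped arguments (hypothetical): true while the deadline is still in the
 * future, so a wait loop using this check gives up on its first iteration. */
static bool timed_out_backwards(unsigned long now, unsigned long begin,
                                unsigned long timeout)
{
	return time_after(begin + timeout, now);
}
```

At the moment the wait begins (now equal to begin), the correct check reports no timeout, while the swapped check already reports one, which matches the "timeout occurred immediately" symptom.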

22 May, 2007 · 1 commit

IPoIB/cm: Fix SRQ WR leak · 518b1646

Committed by Michael S. Tsirkin on May 21, 2007

SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on
and off will, with time, leak out all WRs and then all connections
will start getting RNR NAKs.  Fix this in the way suggested by the spec:
move the QP being destroyed to the error state, wait for the "Last WQE
Reached" event and then post a WR on a "drain QP" connected to the same
CQ.  Once we observe a completion on the drain QP, it's safe to call
ib_destroy_qp().
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

15 May, 2007 · 1 commit

IPoIB/cm: Optimize stale connection detection · 7c5b9ef8

Committed by Michael S. Tsirkin on May 14, 2007

In the presence of some running RX connections, we repeat
queue_delayed_work calls every 4 RX WRs, which is a waste.  It's enough
to start the stale task when the first passive connection is added, and
rerun it every IPOIB_CM_RX_DELAY as long as there are outstanding
passive connections.

This removes some code from the RX data path.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

07 May, 2007 · 3 commits

IPoIB: Convert to NAPI · 8d1cc86a

Committed by Roland Dreier on May 06, 2007

Convert the IP-over-InfiniBand network device driver over to using
NAPI to handle completions for the main CQ.  This covers all receives
as well as datagram mode sends; send completions for connected mode
connections are still handled from interrupt context.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB: Add CQ comp_vector support · f4fd0b22

Committed by Michael S. Tsirkin on May 03, 2007

Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IPoIB/cm: Don't crash if remote side uses one QP for both directions · d6ef7d68

Committed by Michael S. Tsirkin on May 02, 2007

The IPoIB CM spec allows the use of a single connection in both
active->passive and passive->active directions.  The current Linux
code uses a separate connection for each direction, but if another node
uses only one connection for both directions, we oops when we try to look
up the passive connection.  Fix by checking that qp_context is
non-NULL before dereferencing it.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>

01 May, 2007 · 1 commit

IPoIB/cm: Fix error handling in ipoib_cm_dev_open() · 347fcfbe

Committed by Michael S. Tsirkin on April 30, 2007

If skb allocation fails when we start the device, we call
ipoib_cm_dev_stop() even though ipoib_cm_dev_open() did not run to
completion, so we pass an invalid pointer to ib_destroy_cm_id and get
an oops.

Fix by clearing cm.id on error and testing it in ipoib_cm_dev_stop().
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=561>.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
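
The pattern in this fix (clear the handle on the error path, test it before teardown) can be sketched in plain C. This is an illustrative userspace model only: struct mock_cm_dev, mock_dev_open() and mock_dev_stop() are hypothetical stand-ins, and the destroy_calls counter takes the place of the real ib_destroy_cm_id() call.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_cm_dev {
	void *id;           /* stands in for the CM id created at open time */
	int destroy_calls;  /* counts how often teardown destroyed an id */
};

/* Open: take the id, then simulate a later allocation that may fail. */
static int mock_dev_open(struct mock_cm_dev *dev, void *id, bool alloc_fails)
{
	dev->id = id;
	if (alloc_fails) {
		/* Error path: clear the pointer so that a later stop can
		 * tell that open never ran to completion. */
		dev->id = NULL;
		return -1;
	}
	return 0;
}

/* Stop: destroy the id only if open left a valid one behind. */
static void mock_dev_stop(struct mock_cm_dev *dev)
{
	if (!dev->id)
		return;     /* open failed or stop already ran */
	dev->destroy_calls++;
	dev->id = NULL;
}
```

Calling mock_dev_stop() after a failed mock_dev_open() is then a no-op instead of a teardown of an invalid pointer.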

26 April, 2007 · 1 commit

[SK_BUFF]: Introduce skb_reset_mac_header(skb) · 459a98ed

Committed by Arnaldo Carvalho de Melo on March 19, 2007

Introduce skb_reset_mac_header(skb) for the common, open-coded
'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw
into an offset, reducing the size of struct sk_buff on 64-bit while possibly
keeping it as a pointer on 32-bit.

This patch touches only the simplest case; the next will handle the slightly
more "complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
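
To illustrate why funnelling the assignment through one helper makes a later representation change cheap, here is a small userspace sketch. It models the offset-based layout the message anticipates, not the code in this commit: struct mock_skb and both mock_ functions are hypothetical, and the real struct sk_buff has many more fields.

```c
#include <stddef.h>

struct mock_skb {
	unsigned char *head;      /* start of the buffer */
	unsigned char *data;      /* start of the current packet data */
	unsigned int mac_header;  /* offset from head instead of a pointer */
};

/* The one place that knows how the MAC header position is stored. */
static void mock_skb_reset_mac_header(struct mock_skb *skb)
{
	skb->mac_header = (unsigned int)(skb->data - skb->head);
}

/* Callers that need a pointer rebuild it from head plus offset. */
static unsigned char *mock_skb_mac_header(const struct mock_skb *skb)
{
	return skb->head + skb->mac_header;
}
```

Because callers only use the two accessors, switching the field between a pointer and an offset would not require touching them.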

25 April, 2007 · 1 commit

IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements · 37aebbde

Committed by Roland Dreier on April 24, 2007

There are quite a few places in ipoib_cm.c where we know IRQs are
enabled because we do something that sleeps in the same function, so
we can convert several occurrences of spin_lock_irqsave() to a plain
spin_lock_irq().  This cleans up the source a little and makes the
code smaller too:

add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48)
function                                     old     new   delta
ipoib_cm_tx_reap                             403     406      +3
ipoib_cm_stale_task                          146     145      -1
ipoib_cm_dev_stop                            173     172      -1
ipoib_cm_tx_handler                          964     956      -8
ipoib_cm_rx_handler                          956     937     -19
ipoib_cm_skb_reap                            212     190     -22
Signed-off-by: Roland Dreier <rolandd@cisco.com>

19 April, 2007 · 1 commit

IPoIB: Remove pointless opcode field from debugging output · a89875fc

Committed by Roland Dreier on April 18, 2007

There's no point in printing the opcode field in the completion
handling debugging output, since the type of completion is already
printed at the beginning of the line. In fact the opcode field is not
even defined for completions with a status other than success.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

10 April, 2007 · 1 commit

IPoIB/cm: Fix DMA direction typo · 6371ea3d

Committed by Michael S. Tsirkin on April 10, 2007

Receive buffers need to be mapped with DMA_FROM_DEVICE.  Incorrectly
mapping with DMA_TO_DEVICE causes a hard lock on ppc64 machines with
an IOMMU.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=431>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

23 March, 2007 · 2 commits

IB/ipoib: Fix thinko in packet length checks · 77d8e1ef

Committed by Michael S. Tsirkin on March 21, 2007

The packet length checks in ipoib are broken: we add 4 bytes (the IPoIB
encapsulation header), not 20 bytes (the hardware address length), to
each packet we send. Therefore, if connected mode is
enabled so that the interface MTU is larger than the multicast MTU,
IPoIB may end up trying to send too-long multicast packets. For
example, multicast is broken if a message of size 2048 bytes is sent
on an interface with UD MTU 2048, because 2048 is bigger than the real
limit of 2044 but the code tests against the wrong limit of 2060.

This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>,
submitted by Scott Weitzenkamp <sweitzen@cisco.com>.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
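
The arithmetic in this message can be checked with a few lines of C. This is a sketch of the size rule only, assuming the 4-byte encapsulation header stated above; ud_payload_limit() and fits_in_ud_packet() are hypothetical helper names and do not appear in the driver.

```c
#include <stdbool.h>

#define IPOIB_ENCAP_LEN 4  /* IPoIB encapsulation header size, in bytes */

/* Largest payload that still fits in one UD packet once the
 * encapsulation header has been added. */
static int ud_payload_limit(int ud_mtu)
{
	return ud_mtu - IPOIB_ENCAP_LEN;
}

/* True when a payload of payload_len bytes can be sent as one UD packet. */
static bool fits_in_ud_packet(int payload_len, int ud_mtu)
{
	return payload_len <= ud_payload_limit(ud_mtu);
}
```

With a UD MTU of 2048, ud_payload_limit() gives 2044, so a 2048-byte message is rejected, whereas a comparison against the wrong limit of 2060 would let it through.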

IPoIB/cm: Fix reaping of stale connections · 60a596da

Committed by Michael S. Tsirkin on March 22, 2007

The sense of the time_after_eq() test in ipoib_cm_stale_task() is
reversed so that only non-stale connections are reaped.  Fix this by
changing to time_before_eq().

Noticed by Pradeep Satyanarayana <pradeep@us.ibm.com>.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

21 February, 2007 · 1 commit

IPoIB/cm: Improve small message bandwidth · 1812063b

Committed by Michael S. Tsirkin on February 20, 2007

Avoid the overhead of freeing/reallocating and mapping/unmapping for
DMA pages that have not been written to by hardware.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

17 February, 2007 · 2 commits

IPoIB: CM error handling thinko fix · 8a2e65f8

Committed by Michael S. Tsirkin on February 16, 2007

ipoib_cm_alloc_rx_skb() might be called from IRQ context, so it must
use dev_kfree_skb_any(), not kfree_skb().
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IPoIB: Only allow root to change between datagram and connected mode · 551fd612

Committed by Roland Dreier on February 16, 2007

Change the permissions of the "mode" sysfs attribute to be S_IWUSR
instead of S_IWUGO.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

11 February, 2007 · 1 commit

IPoIB: Connected mode experimental support · 839fcaba

Committed by Michael S. Tsirkin on February 05, 2007

The following patch adds experimental support for IPoIB connected
mode, as defined by the draft from the IETF ipoib working group.  The
idea is to increase performance by increasing the MTU from the maximum
of 2K (theoretically 4K) supported by IPoIB on top of UD.  With this
code, I'm able to get 800MByte/sec or more with netperf without
options on a Mellanox 4x back-to-back DDR system.

Some notes on the code:
1. SRQ is used for scalability to large cluster sizes
2. Only RC connections are used (UC does not support SRQ now)
3. Retry count is set to 0 since spec draft warns against retries
4. Each connection is used for data transfers in only 1 direction, so
   each connection is either active (TX) or passive (RX).  Two sides
   that want to communicate create 2 connections.
5. Each active (TX) connection has a separate CQ for send completions -
   this keeps the code simple without CQ resize and other tricks
6. To detect stale passive side connections (where the remote side is
   down), we keep an LRU list of passive connections (updated once per
   second per connection) and destroy a connection after it has been
   unused for several seconds. The LRU rule makes it possible to avoid
   scanning connections that have recently been active.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
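
The LRU idea in note 6 can be sketched as a small userspace list. This is a simplified model under stated assumptions, not the driver's data structure: struct conn and the lru_ functions are hypothetical, time is a plain tick counter, every touch updates the list (the once-per-second rate limit is not modelled), and a connection must be zero-initialised before its first lru_touch().

```c
#include <stddef.h>

struct conn {
	unsigned long last_used;   /* tick of the most recent activity */
	struct conn *prev, *next;  /* LRU links; the list head is a sentinel */
};

static void lru_init(struct conn *head)
{
	head->prev = head->next = head;
}

static void lru_unlink(struct conn *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
}

/* Record activity: move the connection to the front (most recent end). */
static void lru_touch(struct conn *head, struct conn *c, unsigned long now)
{
	if (c->next)               /* already on the list */
		lru_unlink(c);
	c->last_used = now;
	c->next = head->next;
	c->prev = head;
	head->next->prev = c;
	head->next = c;
}

/* Reap from the back (least recent end) and stop at the first connection
 * that is still fresh; everything in front of it is newer. */
static int lru_reap(struct conn *head, unsigned long now,
                    unsigned long max_idle)
{
	int reaped = 0;

	while (head->prev != head &&
	       now - head->prev->last_used > max_idle) {
		struct conn *victim = head->prev;

		lru_unlink(victim);
		victim->prev = victim->next = NULL;
		reaped++;
	}
	return reaped;
}
```

Because the list is ordered by last use, lru_reap() never visits connections that were recently active, which is the saving the note describes.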

openeuler / Kernel · synced successfully more than 1 year ago
