Commits · f6bccf695431da0e9bd773550ae91b8cb9ffb227 · openeuler / raspberrypi-kernel

01 Oct, 2008 1 commit

IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX · 943c246e

Authored by Roland Dreier on Sep 30, 2008

Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
tx_lock.  Not only do we want to get rid of LLTX, this actually causes
problems because of the skb_orphan() done with this tx_lock held: some
skb destructors expect to be run with interrupts enabled.

The simplest fix for this is to get rid of the driver-private tx_lock
and stop using LLTX.  We kill off priv->tx_lock and use
netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
tricky because we need to update places that take priv->lock inside
the tx_lock to disable IRQs, rather than relying on tx_lock having
already disabled IRQs.

Also, there are a couple of places where we need to disable BHs to
make sure we have a consistent context to call netif_tx_lock() (since
we no longer can use _irqsave() variants), and we also have to change
ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
than directly, because ipoib_send_comp_handler() runs in interrupt
context and drain_tx_cq() must run in BH context so it can call
netif_tx_lock().

Signed-off-by: Roland Dreier <rolandd@cisco.com>

943c246e

26 Sep, 2008 2 commits

IPoIB: Fix crash when path record fails after path flush · 6ef190cc

Authored by Roland Dreier on Sep 25, 2008

Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event.  This
change introduces a problem if the path record query triggered by
fails, causing path->ah to become NULL.  A later successful path query
will then trigger WARN_ON() in path_rec_completion(), and crash
because path->ah has already been freed, so the ipoib_put_ah() inside
the lock in path_rec_completion() may actually drop the last reference
(contrary to the comment that claims this is safe).

Fix this by updating path->ah and freeing old_ah only when the path
record query is successful.  This prevents the neighbour AH and the
path AH from getting out of sync.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194>

Reported-by: Rabah Salem <ravah@mellanox.com>
Debugged-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6ef190cc

IPoIB: Fix crash when path record fails after path flush · c9da4bad

Authored by Roland Dreier on Sep 25, 2008

Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event.  This
change introduces a problem if the path record query triggered by
fails, causing path->ah to become NULL.  A later successful path query
will then trigger WARN_ON() in path_rec_completion(), and crash
because path->ah has already been freed, so the ipoib_put_ah() inside
the lock in path_rec_completion() may actually drop the last reference
(contrary to the comment that claims this is safe).

Fix this by updating path->ah and freeing old_ah only when the path
record query is successful.  This prevents the neighbour AH and the
path AH from getting out of sync.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194>

Reported-by: Rabah Salem <ravah@mellanox.com>
Debugged-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

c9da4bad

17 Sep, 2008 1 commit

IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() · e8224e4b

Authored by Yossi Etigin on Sep 16, 2008

Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop(). We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly. This
works because we only flush the ipoib_workqueue with the RTNL not held.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes. This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

e8224e4b

20 Aug, 2008 1 commit

IPoIB: Fix deadlock on RTNL in ipoib_stop() · a77a57a1

Authored by Roland Dreier on Aug 19, 2008

Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue.  However, ipoib_stop() (which is run
inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
if the join task is pending.

Fix this by simply not flushing the workqueue from ipoib_stop().  It
turns out that we really don't care about workqueue tasks running
during or after ipoib_stop(), as long as we make sure to flush the
workqueue before unregistering a netdev.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

a77a57a1

09 Aug, 2008 1 commit

IPoIB/cm: Use vmalloc() to allocate rx_rings · b1404069

Authored by David J. Wilder on Aug 08, 2008

There are users that are running UDP applications that require a large
receive queue size in order to get good performance.  To prevent
allocation failures for rx_rings when using non-SRQ mode and large
recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to
allocate rx_rings.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

b1404069

30 Jul, 2008 1 commit

IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr() · e0819816

Authored by Roland Dreier on Jul 30, 2008

wr->sg_list should be set to the sge pointer passed in, not
priv->cm.rx_sge.

Reported-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

e0819816

25 Jul, 2008 2 commits

IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG · 99059224

Authored by Roland Dreier on Jul 24, 2008

The help text for INFINIBAND_IPOIB_DEBUG refers to "ipoib_debugfs,"
which no longer exists.  Correct this to talk about the files under
debugfs that are really created.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

99059224

IPoIB/cm: Connected mode is no longer EXPERIMENTAL · 99c3a5a9

Authored by Roland Dreier on Jul 24, 2008

Connected mode is now tested and used by lots of people.  No need to
hide it under CONFIG_EXPERIMENTAL.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

99c3a5a9

23 Jul, 2008 1 commit

IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures · 01b3fc8b

Authored by Or Gerlitz on Jul 22, 2008

Print the return code of ib_sa_path_rec_get() if it fails to help
debug errors.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

01b3fc8b

15 Jul, 2008 16 commits

netdev: Do not use TX lock to protect address lists. · b9e40857

Authored by David S. Miller on Jul 15, 2008

Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>

b9e40857

netdev: Add netdev->addr_list_lock protection. · e308a5d8

Authored by David S. Miller on Jul 15, 2008

Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>

e308a5d8

IPoIB: Double default RX/TX ring sizes · bc3a290b

Authored by Eli Cohen on Jul 14, 2008

Increase IPoIB ring sizes to twice their original sizes (RX: 128->256,
TX: 64->128) to act as a shock absorber for high traffic peaks. With
the current settings, we have seen cases that there are many calls to
netif_stop_queue(), which causes degradation in throughput. Also,
larger receive buffer sizes help IPoIB in CM mode to avoid experiencing
RNR NAK conditions due to insufficient receive buffers at the SRQ.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

bc3a290b

IPoIB/cm: Reduce connected mode TX object size · e112373f

Authored by Eli Cohen on Jul 14, 2008

Since IPoIB connected mode does not use NETIF_F_SG, we only have one DMA
mapping per send, so we don't need a mapping[] array.  Define a new
struct with a single u64 mapping member and use it for the CM tx_ring.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

e112373f

IPoIB: Use dev_set_mtu() to change mtu · bd360671

Authored by Eli Cohen on Jul 14, 2008

When the driver sets the MTU of the net device outside of its
change_mtu method, it should make use of dev_set_mtu() instead of
directly setting the mtu field of struct netdevice.  Otherwise
functions registered to be called upon MTU change will not get called
(this is done through call_netdevice_notifiers() in dev_set_mtu()).

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

bd360671

IPoIB: Use rtnl lock/unlock when changing device flags · c8c2afe3

Authored by Eli Cohen on Jul 14, 2008

Use of this lock is required to synchronize changes to the netdevice's
data structs.  Also move the call to ipoib_flush_paths() after the
modification of the netdevice flags in set_mode().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

c8c2afe3

IPoIB: Get rid of ipoib_mcast_detach() wrapper · 9eae554c

Authored by Roland Dreier on Jul 14, 2008

ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just
use the core API in the one place that does a multicast group detach.

    add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105)
    function                  old     new   delta
    ipoib_mcast_leave         357     319     -38
    ipoib_mcast_detach         67       -     -67

Signed-off-by: Roland Dreier <rolandd@cisco.com>

9eae554c

IPoIB: Only set Q_Key once: after joining broadcast group · d0de1362

Authored by Eli Cohen on Jul 14, 2008

The current code will set the Q_Key for any join of a non-sendonly
multicast group.  The operation involves a modify QP operation, which
is fairly heavyweight, and is only really required after the join of
the broadcast group.  Fix this by adding a parameter to ipoib_mcast_attach()
to control when the Q_Key is set.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

d0de1362

IPoIB: Remove priv->mcast_mutex · 5892eff9

Authored by Eli Cohen on Jul 14, 2008

No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast
since these operations are synchronized at the HW driver layer.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

5892eff9

IPoIB: Remove unused IPOIB_MCAST_STARTED code · c03d4731

Authored by Eli Cohen on Jul 14, 2008

The IPOIB_MCAST_STARTED flag is not used at all since commit b3e2749b
("IPoIB: Don't drop multicast sends when they can be queued"), so
remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

c03d4731

IPoIB: Refresh paths instead of flushing them on SM change events · ee1e2c82

Authored by Moni Shoua on Jul 14, 2008

The patch tries to solve the problem of the device going down and paths
being flushed on an SM change event. The method is to mark the paths as
candidates for refresh (by setting the new valid flag to 0), and wait
for an ARP probe to trigger a new path record query.

The solution requires a different and less intrusive handling of SM
change event. For that, the second argument of the flush function
changes its meaning from a boolean flag to a level. In most cases, SM
failover doesn't cause LID change so traffic won't stop. In the rare
cases of LID change, the remote host (the one that hadn't changed its
LID) will lose connectivity until paths are refreshed. This is no
worse than the current state. In fact, preventing the device from
going down saves packets that otherwise would be lost.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

ee1e2c82

IPoIB: add LRO support · af40da89

Authored by Vladimir Sokolovsky on Jul 14, 2008

Add "ipoib_use_lro" module parameter to enable LRO and an
"ipoib_lro_max_aggr" module parameter to set the max number of packets
to be aggregated.  Make LRO controllable and LRO statistics accessible
through ethtool.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

af40da89

IPoIB: Use multicast loopback blocking if available · 12406734

Authored by Ron Livne on Jul 14, 2008

Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if
supported by the underlying device.  This creates an improvement of up
to 39% in bandwidth when sending multicast packets with IPoIB, and an
improvement of 12% in CPU usage.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

12406734

IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq() · a7d834c4

Authored by Roland Dreier on Jul 14, 2008

For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
and these two callers are not synchronized against each other.
However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
work request and scatter list structures, so multiple callers can end
up stepping on each other, which leads to posting garbled work
requests.

Fix this by having the caller pass in the ib_recv_wr and ib_sge
structures to use, and allocating new local structures in
ipoib_cm_nonsrq_init_rx().

Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
Nguyen <hnguyen@de.ibm.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

a7d834c4

IPoIB: Copy small received SKBs in connected mode · f89271da

Authored by Eli Cohen on Jul 14, 2008

The connected mode implementation in the IPoIB driver has a large
overhead in the way SKBs are handled in the receive flow.  It usually
allocates a new SKB as big as the one used for the currently received
SKB and moves unused fragments from the old SKB to the new one. This
involves a loop on all the remaining fragments and incurs overhead on
the CPU.  This patch, for small SKBs, allocates an SKB just large
enough to contain the received data and copies to it the data from the
received SKB.  The newly allocated SKB is passed to the stack and the
old SKB is reposted.

When running netperf, UDP small messages, without this patch I get:

    UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
    14.4.3.178 (14.4.3.178) port 0 AF_INET
    Socket  Message  Elapsed      Messages
    Size    Size     Time         Okay Errors   Throughput
    bytes   bytes    secs            #      #   10^6bits/sec

    114688     128   10.00     5142034      0     526.31
    114688           10.00     1130489            115.71

With this patch I get both send and receive at ~315 Mbps.

The reason that send performance actually slows down is as follows:
When using this patch, the overhead of the CPU for handling RX packets
is dramatically reduced.  As a result, we do not experience RNR NAK
messages from the receiver which cause the connection to be closed and
reopened again; when the patch is not used, the receiver cannot handle
the packets fast enough so there is less time to post new buffers and
hence the mentioned RNR NAKs.  So what happens is that the
application *thinks* it posted a certain number of packets for
transmission but these packets are flushed and do not really get
transmitted.  Since the connection gets opened and closed many times,
each time netperf gets the CPU time that otherwise would have been
given to IPoIB to actually transmit the packets.  This can be verified
when looking at the port counters -- the output of ifconfig and the
output of netperf (this is for the case without the patch):

    tx packets
    ==========
    port counter:   1,543,996
    ifconfig:       1,581,426
    netperf:        5,142,034

    rx packets
    ==========
    netperf:        1,130,489

Signed-off-by: Eli Cohen <eli@mellanox.co.il>

f89271da
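
The small-packet copy path described in f89271da can be sketched in userspace C. This is only an illustration of the idea, not the driver code: `struct rx_buf`, `copy_small_packet()` and the 256-byte `COPY_THRESHOLD` are hypothetical, and the commit text does not state the actual cut-off.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-off; the commit message does not give a number. */
#define COPY_THRESHOLD 256

/* Hypothetical stand-in for a large, pre-posted receive buffer. */
struct rx_buf {
    unsigned char *data;
    size_t capacity;
};

/*
 * For a small packet, allocate a buffer just large enough for the
 * received data and copy into it.  The large receive buffer is left
 * untouched so the caller can repost it as is.
 *
 * Returns NULL when the packet is not small (the caller would fall
 * back to the fragment-moving path) or when allocation fails.
 */
static unsigned char *copy_small_packet(const struct rx_buf *rx, size_t len)
{
    unsigned char *copy;

    if (len >= COPY_THRESHOLD || len > rx->capacity)
        return NULL;

    copy = malloc(len ? len : 1);
    if (!copy)
        return NULL;

    memcpy(copy, rx->data, len);
    return copy;
}
```

The trade-off is one extra copy of a few hundred bytes at most in exchange for skipping the per-fragment loop and keeping the original buffer ready for reposting.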

RDMA: Remove subversion $Id tags · f3781d2e

Authored by Roland Dreier on Jul 14, 2008

They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

f3781d2e

21 May, 2008 1 commit

IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish() · e1d50dce

Authored by Jack Morgenstein on May 20, 2008

We saw a kernel oops in our regression testing when a multicast "join
finish" occurred just after the interface was -- this is
<https://bugs.openfabrics.org/show_bug.cgi?id=1040>.  The test
randomly causes the HCA physical port to go down then up.

The cause of this is that ipoib_mcast_join_finish() processing happened
just after ipoib_mcast_dev_flush() was invoked (in which case the
broadcast pointer is NULL).  This patch tests for and handles the case
where priv->broadcast is NULL.

Cc: <stable@kernel.org>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

e1d50dce

01 May, 2008 1 commit

IB/ipoib: Fix transmit queue stalling forever · 57ce41d1

Authored by Eli Cohen on Apr 30, 2008

Commit f56bcd80 ("IPoIB: Use separate CQ for UD send completions")
introduced a bug where the transmit queue could get stopped and never
woken up. The problem is that send completions are only polled at the
end of the xmit function, so if the send queue fills up and the xmit
path stops the queue, then there is no way for send completions to
ever get polled, and so the transmit queue stays stopped forever.

Fix this by arming the send CQ just before posting the last send
request that fills the send queue. Then, when the completion event
handler is called, drain the send CQ. Since it is possible that not
enough send completions are in the CQ, verify that the net queue
has been woken up after draining the send CQ, and if not arm a timer
and drain again at the timer function.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

57ce41d1
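
The stall and the fix in 57ce41d1 can be modelled with a small userspace simulation. This is a sketch under simplifying assumptions, not the driver code: all names (`struct txq_sketch`, `xmit()`, `hw_complete()`, `drain_cq()`) and the tiny `SENDQ_SIZE` are hypothetical, and the timer fallback mentioned in the commit message is left out.

```c
#include <stdbool.h>

#define SENDQ_SIZE 4   /* illustrative; kept tiny to make the stall easy to hit */

struct txq_sketch {
    int  outstanding;  /* posted sends not yet reaped from the CQ */
    int  completed;    /* completions sitting in the CQ, not yet polled */
    bool stopped;      /* models the net queue having been stopped */
    bool cq_armed;     /* models a pending completion-event request */
    bool arm_on_last;  /* true = behaviour described by the fix */
};

/* Reap everything currently in the CQ; wake the queue if there is room. */
static void drain_cq(struct txq_sketch *q)
{
    q->outstanding -= q->completed;
    q->completed = 0;
    if (q->stopped && q->outstanding < SENDQ_SIZE)
        q->stopped = false;
}

/* Transmit path: returns false when the queue is stopped. */
static bool xmit(struct txq_sketch *q)
{
    if (q->stopped)
        return false;

    /* The fix: arm the CQ just before the send that fills the queue. */
    if (q->arm_on_last && q->outstanding == SENDQ_SIZE - 1)
        q->cq_armed = true;

    q->outstanding++;
    if (q->outstanding == SENDQ_SIZE)
        q->stopped = true;

    /* Send completions are polled only here, at the end of xmit. */
    drain_cq(q);
    return true;
}

/* Hardware finishes n sends; an event fires only if the CQ was armed. */
static void hw_complete(struct txq_sketch *q, int n)
{
    q->completed += n;
    if (q->cq_armed) {
        q->cq_armed = false;
        drain_cq(q);   /* completion event handler drains the CQ */
    }
}
```

With `arm_on_last` false, filling the queue and then completing every send leaves `stopped` set forever, because nothing calls `xmit()` again to poll. With it true, the completion event drains the CQ and the queue is woken.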

30 Apr, 2008 2 commits

IPoIB: Copy child MTU from parent · b4132efa

Authored by Eli Cohen on Apr 29, 2008

When creating a child interface, copy the MTU information from the
parent.  Otherwise when the child's multicast join completes, the MTU
will not be updated since the code does

	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);

and priv->admin_mtu will be set to 0.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

b4132efa
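
The effect of the min() expression quoted in b4132efa can be reproduced in a few lines of userspace C. This is a sketch only: `struct priv_sketch` models just the two fields named in the message, `copy_mtu_from_parent()` is a hypothetical helper rather than the driver's actual change, and the MTU value used to exercise it is arbitrary.

```c
/* Hypothetical stand-in for the driver's private data; only the two
 * MTU fields mentioned in the commit message are modelled. */
struct priv_sketch {
    unsigned int mcast_mtu;  /* MTU learned from the multicast group */
    unsigned int admin_mtu;  /* MTU configured for the interface */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Mirrors the expression quoted in the commit message. */
static unsigned int effective_mtu(const struct priv_sketch *p)
{
    return min_uint(p->mcast_mtu, p->admin_mtu);
}

/* Sketch of the fix: a child starts from the parent's MTU settings
 * instead of from zero. */
static void copy_mtu_from_parent(struct priv_sketch *child,
                                 const struct priv_sketch *parent)
{
    child->mcast_mtu = parent->mcast_mtu;
    child->admin_mtu = parent->admin_mtu;
}
```

A child whose `admin_mtu` is still 0 always gets 0 out of `effective_mtu()`, whatever the multicast join reports; once the parent's values are copied, the minimum becomes meaningful.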

IPoIB: Use separate CQ for UD send completions · f56bcd80

Authored by Eli Cohen on Apr 29, 2008

Use a dedicated CQ for UD send completions. Also, do not arm the UD
send CQ, which reduces the number of interrupts generated. This patch
further reduces overhead by not calling poll CQ for every posted send
WR -- it polls only when there are 16 or more outstanding work requests.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

f56bcd80
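
The deferred polling described in f56bcd80 amounts to a counter check on the send path. A minimal userspace sketch, assuming for simplicity that every outstanding request has completed by the time the CQ is polled; `struct ud_tx_sketch`, `post_send()` and `POLL_THRESHOLD` are hypothetical names, with the value 16 taken from the commit message.

```c
#define POLL_THRESHOLD 16  /* "16 or more outstanding work requests" */

struct ud_tx_sketch {
    unsigned int tx_head;     /* total work requests posted */
    unsigned int tx_tail;     /* total work requests reaped */
    unsigned int poll_calls;  /* how many times the CQ was polled */
};

/*
 * Post one send.  The CQ is polled only once enough work is
 * outstanding, rather than after every posted request.
 */
static void post_send(struct ud_tx_sketch *t)
{
    t->tx_head++;
    if (t->tx_head - t->tx_tail >= POLL_THRESHOLD) {
        t->poll_calls++;
        t->tx_tail = t->tx_head;  /* simplification: all completed */
    }
}
```

Under that simplification, 32 posted sends cost 2 polls instead of 32.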

24 Apr, 2008 1 commit

IPoIB: Handle 4K IB MTU for UD (datagram) mode · bc7b3a36

Authored by Shirley Ma on Apr 23, 2008

This patch enables IPoIB to use 4K UD messages (when the underlying
device and fabrics support a 4K MTU) by using two scatter buffers when
PAGE_SIZE is less than or equal to the HCA IB MTU size.  The first
buffer is for IPoIB header + GRH header, and the second buffer is the
IPoIB payload, which is 4K-4.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

bc7b3a36
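
The buffer split described in bc7b3a36 is simple arithmetic, sketched here in userspace C. The names `struct ud_rx_layout` and `compute_ud_rx_layout()` are hypothetical; the 40-byte GRH and 4-byte IPoIB header sizes are the standard values, and the two-buffer condition follows the wording of the commit message rather than the driver source.

```c
#include <stdbool.h>
#include <stddef.h>

#define GRH_BYTES          40  /* InfiniBand Global Route Header */
#define IPOIB_ENCAP_BYTES   4  /* IPoIB encapsulation header */

struct ud_rx_layout {
    bool   two_buffers;    /* true when two scatter buffers are used */
    size_t head_bytes;     /* first buffer */
    size_t payload_bytes;  /* second buffer, 0 when unused */
};

/* Decide how a UD receive is laid out for a given IB MTU and page size. */
static struct ud_rx_layout compute_ud_rx_layout(size_t ib_mtu, size_t page_size)
{
    struct ud_rx_layout l;

    /* Condition as worded in the commit message. */
    l.two_buffers = page_size <= ib_mtu;

    if (l.two_buffers) {
        l.head_bytes    = GRH_BYTES + IPOIB_ENCAP_BYTES;
        l.payload_bytes = ib_mtu - IPOIB_ENCAP_BYTES;  /* "4K-4" for a 4K MTU */
    } else {
        l.head_bytes    = ib_mtu + GRH_BYTES;  /* everything in one buffer */
        l.payload_bytes = 0;
    }
    return l;
}
```

For a 4096-byte MTU on 4096-byte pages this gives a 44-byte head buffer and a 4092-byte payload buffer; for a 2048-byte MTU a single 2088-byte buffer suffices.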

17 Apr, 2008 6 commits

IPoIB: Handle case when P_Key is deleted and re-added at same index · 9fdd5e5b

Authored by Roland Dreier on Apr 16, 2008

If a P_Key is deleted and then re-added at the same index, then IPoIB
gets confused because __ipoib_ib_dev_flush() only checks whether the
index is the same without checking whether the P_Key was present, so
the interface is stopped when the P_Key is deleted, but the event when
the P_Key is re-added gets ignored and the interface never gets
restarted.

Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
everywhere in IPoIB, since none of the places that look for P_Keys are
in a fast path or in non-sleeping context, and in general we want to
kill off the whole caching infrastructure eventually.  This also fixes
consistency problems caused because some IPoIB queries were cached and
some were uncached during the window where the cache was not updated.

Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this
problem and testing this fix.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

9fdd5e5b
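
The P_Key problem fixed in 9fdd5e5b comes down to comparing an index without remembering whether the key was present. A userspace sketch of both behaviours follows; `struct pkey_state`, `PKEY_NOT_FOUND` and the two flush functions are hypothetical and only model the logic described in the message, not the body of __ipoib_ib_dev_flush().

```c
#include <stdbool.h>

#define PKEY_NOT_FOUND (-1)  /* hypothetical marker: P_Key absent from the table */

struct pkey_state {
    int  index;     /* index at which the P_Key was last seen */
    bool assigned;  /* whether the P_Key was present at the last check */
    bool running;   /* whether the interface is up */
};

/* Old behaviour as described: only the index is compared. */
static void flush_index_only(struct pkey_state *s, int found_index)
{
    if (found_index == PKEY_NOT_FOUND) {
        s->running = false;  /* P_Key deleted: stop the interface */
        return;
    }
    if (found_index == s->index)
        return;              /* looks unchanged, so the event is ignored */
    s->index = found_index;
    s->running = true;
}

/* Fixed behaviour: also track whether the P_Key was present. */
static void flush_with_presence(struct pkey_state *s, int found_index)
{
    if (found_index == PKEY_NOT_FOUND) {
        s->assigned = false;
        s->running = false;
        return;
    }
    if (found_index == s->index && s->assigned)
        return;              /* genuinely unchanged */
    s->index = found_index;
    s->assigned = true;
    s->running = true;       /* restart after a delete and re-add */
}
```

Feeding each function a "deleted" event followed by a "found at the same index" event shows the difference: the first leaves the interface stopped, the second restarts it.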

IPoIB: Support modifying IPoIB CQ event moderation · 28d52b3c

Authored by Eli Cohen on Apr 16, 2008

This can be used to tune at run time the parameters controlling the
event (interrupt) generation rate and thus reduce the overhead
incurred by handling interrupts resulting in better throughput.  Since
IPoIB uses a single CQ for both RX and TX, RX is chosen to dictate
configuration for both RX and TX.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

28d52b3c

IPoIB: Add basic ethtool support · 82c24c18

Authored by Eli Cohen on Apr 16, 2008

Just add the infrastructure so we can add functionality later.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

82c24c18

IPoIB: Add LSO support · 40ca1988

Authored by Eli Cohen on Apr 16, 2008

For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set
NETIF_F_TSO and use HW LSO to offload TCP segmentation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

40ca1988

IB: Use shorter list_splice_init() for brevity · 157de229

Authored by Robert P. J. Day on Apr 16, 2008

Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

157de229

IPoIB: Use checksum offload support if available · 6046136c

Authored by Eli Cohen on Apr 16, 2008

For HCAs that support checksum offload (i.e., that set IB_DEVICE_UD_IP_CSUM
in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and
use the HCA to generate and verify IP checksums.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

6046136c

12 Mar, 2008 3 commits

IPoIB: Allocate priv->tx_ring with vmalloc() · 10313cbb

Authored by Roland Dreier on Mar 12, 2008

Commit 7143740d ("IPoIB: Add send gather support") made struct
ipoib_tx_buf significantly larger, since the mapping member changed
from a single u64 to an array with MAX_SKB_FRAGS + 1 entries. This
means that allocating tx_rings with kzalloc() may fail because there
is not enough contiguous memory for the new, much bigger size. Fix
this regression by allocating the rings with vmalloc() instead.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
10313cbb
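
The size growth behind 10313cbb can be checked with sizeof in userspace C. The two structs below are hypothetical models of the ring entry before and after the mapping member became an array; `SKETCH_MAX_SKB_FRAGS` is assumed to be 18, which is what kernels of that period would give for 4 KiB pages, and is not taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value of MAX_SKB_FRAGS for 4 KiB pages; not stated in the commit. */
#define SKETCH_MAX_SKB_FRAGS 18

/* Ring entry with a single DMA mapping. */
struct tx_buf_before {
    void    *skb;
    uint64_t mapping;
};

/* Ring entry with one mapping per possible fragment plus the linear part. */
struct tx_buf_after {
    void    *skb;
    uint64_t mapping[SKETCH_MAX_SKB_FRAGS + 1];
};

/* Bytes of contiguous memory a ring of the given length would need. */
static size_t ring_bytes(size_t entry_size, size_t entries)
{
    return entry_size * entries;
}
```

On a typical 64-bit build the entry grows from 16 to 160 bytes, so the contiguous allocation for a ring grows tenfold. Physically contiguous allocations of that size are more likely to fail on a fragmented system, which is the case vmalloc() avoids.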

IPoIB/cm: Set tx_wr.num_sge in connected mode post_send() · 4200406b

Authored by Roland Dreier on Mar 11, 2008

Commit 7143740d ("IPoIB: Add send gather support") made it possible
for tx_wr.num_sge to be != 1 -- this happens if send gather support is
enabled. However, the code in the connected mode post_send() function
assumes the old invariant, namely that tx_wr.num_sge is always 1. Fix
this by explicitly setting tx_wr.num_sge to 1 in the CM post_send().

Signed-off-by: Roland Dreier <rolandd@cisco.com>

4200406b

IPoIB: Don't drop multicast sends when they can be queued · b3e2749b

Authored by Or Gerlitz on Mar 11, 2008

When set_multicast_list() is called the multicast task is restarted
and the IPOIB_MCAST_STARTED bit is cleared.  As a result for some
window of time, multicast packets are not transmitted nor queued but
rather dropped by ipoib_mcast_send().  These dropped packets are
painful in two cases:

 - bonding fail-over which both calls set_multicast_list() on the new
   active slave and sends Gratuitous ARP through that slave.

 - IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
   device and issues IGMP leave.

In both these cases, depending on the scheduling of the IPoIB
multicast task, the packets would be dropped.  As a result, in the
bonding case, the failover would not be detected by the peers until
their neighbour entry is renewed (which takes a few tens of
seconds).  In the IGMP case, the IP router doesn't get an IGMP leave
and would only learn of that from further probes on the group (also a
delay of at least a few tens of seconds).

Fix this by allowing transmission (or queuing) depending on the
IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.

Signed-off-by: Olga Shern <olgas@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

b3e2749b