1. 06 Feb 2013, 1 commit
    • IPoIB: Fix crash due to skb double destruct · 7e5a90c2
      Shlomo Pongratz authored
      After commit b13912bb ("IPoIB: Call skb_dst_drop() once skb is
      enqueued for sending"), using connected mode and running multithreaded
      iperf for a long time, e.g.
      
          iperf -c <IP> -P 16 -t 3600
      
      results in a crash.
      
      After the above-mentioned patch, the driver is calling skb_orphan() and
      skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send()
      (also in ipoib_ib.c::ipoib_send())
      
      The problem with this is, as is written in a comment in both routines,
      "it's entirely possible that the completion handler will run before we
      execute anything after the post_send()."  This leads to running the
      skb cleanup routines simultaneously in two different contexts.
      
      The solution is to always perform the skb_orphan() and skb_dst_drop()
      before queueing the send work request.  If an error occurs, the result
      is no different from the regular case, where dev_kfree_skb_any() runs
      in the completion path, which is assumed to execute after these two
      routines.
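
      A minimal sketch of the reordering, assuming helper names close to
      those used in ipoib_cm.c (not the literal patch; DMA mapping and ring
      accounting are elided):

          /* Drop the socket and dst references while we still own the skb. */
          skb_orphan(skb);
          skb_dst_drop(skb);

          rc = post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1), tx_req);
          if (unlikely(rc)) {
                  ipoib_warn(priv, "post_send failed, error %d\n", rc);
                  ++dev->stats.tx_errors;
                  dev_kfree_skb_any(skb); /* same cleanup the completion path does */
          } else {
                  ++tx->tx_head;
          }
          /* On success, never touch the skb again: the completion handler
           * may already have run and freed it. */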
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  2. 20 Dec 2012, 1 commit
    • IPoIB: Call skb_dst_drop() once skb is enqueued for sending · b13912bb
      Roland Dreier authored
      Currently, IPoIB delays collecting send completions for TX packets in
      order to batch work more efficiently.  It does skb_orphan() right after
      queuing the packets so that destructors run early, to avoid problems
      like holding socket send buffers for too long (since we might not
      collect a send completion until a long time after the packet is
      actually sent).
      
      However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks
      at skb_dst() to update the PMTU when it gets a too-long packet.  This
      means that the packets sitting in the TX ring with uncollected send
      completions are holding a reference on the dst.  We've seen this lead
      to pathological behavior with respect to route and neighbour GC.  The
      easy fix for this is to call skb_dst_drop() when we call skb_orphan().
      
      Also, give packets sent via connected mode (CM) the same skb_orphan()
      / skb_dst_drop() treatment that packets sent via datagram mode get.
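
      A hedged sketch of the pairing as it now applies to both send paths
      (simplified; the real ipoib_send()/ipoib_cm_send() do this around
      queueing the work request):

          /* The skb sits in the TX ring until its (batched, possibly late)
           * send completion, so release both references up front. */
          skb_orphan(skb);   /* run the destructor early, free the socket send buffer */
          skb_dst_drop(skb); /* drop the dst so route/neighbour GC is not held up */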
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  3. 03 Oct 2012, 1 commit
  4. 02 Oct 2012, 1 commit
    • IB/ipoib: Add more rtnl_link_ops callbacks · 862096a8
      Or Gerlitz authored
      Add the rtnl_link_ops changelink and fill_info callbacks, through
      which the admin can now set/get the driver mode and related policies.
      Maintain the proprietary sysfs entries only for legacy child devices.
      
      For child devices, set dev->iflink to point to the parent
      device ifindex, such that user space tools can now correctly
      show the uplink relation as done for vlan, macvlan, etc
      devices. Pointed out by Patrick McHardy <kaber@trash.net>
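
      A hypothetical sketch of wiring up the two callbacks (the callback
      names here are illustrative, not necessarily the driver's):

          #include <net/rtnetlink.h>

          static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
                  .kind       = "ipoib",
                  .changelink = ipoib_changelink, /* "ip link set ... type ipoib mode ..." */
                  .fill_info  = ipoib_fill_info,  /* reported via "ip -d link show" */
                  /* existing newlink/dellink/policy members unchanged */
          };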
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 15 Aug 2012, 1 commit
  6. 30 Jul 2012, 1 commit
    • IPoIB: Use a private hash table for path lookup in xmit path · b63b70d8
      Shlomo Pongratz authored
      Dave Miller <davem@davemloft.net> provided a detailed description of
      why the way IPoIB is using neighbours for its own ipoib_neigh struct
      is buggy:
      
          Any time an ipoib_neigh is changed, a sequence like the following is made:
      
          			spin_lock_irqsave(&priv->lock, flags);
          			/*
          			 * It's safe to call ipoib_put_ah() inside
          			 * priv->lock here, because we know that
          			 * path->ah will always hold one more reference,
          			 * so ipoib_put_ah() will never do more than
          			 * decrement the ref count.
          			 */
          			if (neigh->ah)
          				ipoib_put_ah(neigh->ah);
          			list_del(&neigh->list);
          			ipoib_neigh_free(dev, neigh);
          			spin_unlock_irqrestore(&priv->lock, flags);
          			ipoib_path_lookup(skb, n, dev);
      
          This doesn't work, because you're leaving a stale pointer to the freed up
          ipoib_neigh in the special neigh->ha pointer cookie.  Yes, it even fails
          with all the locking done to protect _changes_ to *ipoib_neigh(n), and
          with the code in ipoib_neigh_free() that NULLs out the pointer.
      
          The core issue is that read side calls to *to_ipoib_neigh(n) are not
          being synchronized at all, they are performed without any locking.  So
          whether we hold the lock or not when making changes to *ipoib_neigh(n)
          you still can have threads see references to freed up ipoib_neigh
          objects.
      
          	cpu 1			cpu 2
          	n = *ipoib_neigh()
          				*ipoib_neigh() = NULL
          				kfree(n)
          	n->foo == OOPS
      
          [..]
      
          Perhaps the ipoib code can have a private path database it manages
          entirely itself, which holds all the necessary information and is
          looked up by some generic key which is available easily at transmit
          time and does not involve generic neighbour entries.
      
      See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and
      <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b>
      for the full discussion.
      
      This patch aims to solve the race conditions found in the IPoIB driver.
      
      The patch removes the connection between the core networking neighbour
      structure and the ipoib_neigh structure.  In addition to avoiding the
      race described above, it allows us to handle SKBs carrying IP packets
      that don't have any associated neighbour.
      
      We add an ipoib_neigh hash table with N buckets where the key is the
      destination hardware address.  The ipoib_neigh is fetched from the
      hash table instead of from the location stashed in the neighbour
      structure.  The hash table uses both RCU and reference counting to
      guarantee that no ipoib_neigh instance is ever deleted while in use.
      
      Fetching the ipoib_neigh structure instance from the hash also makes
      the special code in ipoib_start_xmit that handles remote and local
      bonding failover redundant.
      
      Aged ipoib_neigh instances are deleted by a garbage collection task
      that runs every M seconds and deletes every ipoib_neigh instance that
      was idle for at least 2*M seconds. The deletion is safe since the
      ipoib_neigh instances are protected using RCU and reference count
      mechanisms.
      
      The number of buckets (N) and the frequency of the GC task (M) are
      taken from the exported arp_tbl.
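
      A hedged sketch of the lookup discipline such a table implies (bucket
      layout, field and helper names are illustrative, using the modern
      three-argument hlist_for_each_entry_rcu() form):

          struct ipoib_neigh *neigh;

          rcu_read_lock();
          hlist_for_each_entry_rcu(neigh, &htbl->buckets[hash_val], hnode) {
                  if (memcmp(neigh->daddr, daddr, INFINIBAND_ALEN) == 0 &&
                      atomic_inc_not_zero(&neigh->refcnt)) {
                          /* we now hold a reference, safe to use after unlock */
                          rcu_read_unlock();
                          return neigh;
                  }
          }
          rcu_read_unlock();
          return NULL;    /* caller allocates and inserts a new entry */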
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  7. 17 Jul 2012, 1 commit
    • net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      David S. Miller authored
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future, routes will no longer be keyed on destination address, source
      address, etc.; a single ipv4 route will cover entire subnets.
      
      In this environment we need somewhere to keep persistent storage for
      redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
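
      For reference, the reworked callbacks end up looking roughly like this
      (sketch; include/net/dst_ops.h has the authoritative definitions):

          struct dst_ops {
                  /* ... */
                  void    (*update_pmtu)(struct dst_entry *dst, struct sock *sk,
                                         struct sk_buff *skb, u32 mtu);
                  void    (*redirect)(struct dst_entry *dst, struct sock *sk,
                                      struct sk_buff *skb);
                  /* ... */
          };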
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 09 Jul 2012, 1 commit
  9. 01 Nov 2011, 1 commit
  10. 19 Oct 2011, 2 commits
  11. 14 Oct 2011, 1 commit
  12. 27 Aug 2011, 1 commit
  13. 20 Apr 2011, 1 commit
  14. 13 Jan 2011, 1 commit
  15. 11 Jan 2011, 1 commit
  16. 30 Mar 2010, 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following (a minimal before/after illustration is
      sketched right after this list).
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.

      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order
        conforms to its surroundings.  It's put in the include block which
        contains core kernel includes, in the same order that the rest are
        ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end
        if there doesn't seem to be any matching order.

      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints
        out an error message indicating which .h file needs to be added to
        the file.
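
      A minimal before/after illustration of the kind of edit the sweep
      performs (hypothetical file; only the include line is the point):

          #include <linux/slab.h> /* added by the sweep: kzalloc()/kfree()
                                   * are no longer reachable via percpu.h */

          static void *example_buf_alloc(size_t n)
          {
                  return kzalloc(n, GFP_KERNEL);
          }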
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the
      arch headers, which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  17. 12 Mar 2010, 2 commits
    • IPoIB: Include return code in trace message for ib_post_send() failures · a48f509b
      Or Gerlitz authored
      Print the return code of ib_post_send() if it fails to make these
      debugging messages more useful.
      Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IPoIB: Fix TX queue lockup with mixed UD/CM traffic · f0dc117a
      Eli Cohen authored
      The IPoIB UD QP reports send completions to priv->send_cq, which is
      usually left unarmed; it only gets armed when the number of
      outstanding send requests reaches the size of the TX queue. This
      arming is done only in the send path for the UD QP.  However, when
      sending CM packets, the net queue may be stopped for the same reasons
      but no measures are taken to recover the UD path from a lockup.
      
      Consider this scenario: a host sends high rate of both CM and UD
      packets, with a TX queue length of N.  If at some time the number of
      outstanding UD packets is more than N/2 and the overall outstanding
      packets is N-1, and CM sends a packet (making the number of
      outstanding sends equal N), the TX queue will be stopped.  When all
      the CM packets complete, the number of outstanding packets will still
      be higher than N/2 so the TX queue will not be restarted.
      
      Fix this by calling ib_req_notify_cq() when the queue is stopped in
      the CM path.
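
      A hedged sketch of the fix in the connected-mode send path
      (simplified; the surrounding queue-stop bookkeeping is elided):

          if (++priv->tx_outstanding == ipoib_sendq_size) {
                  ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
                  netif_stop_queue(dev);
                  /* The UD send CQ is normally left unarmed; arm it here so
                   * a send completion will eventually restart the queue even
                   * when only CM traffic caused the stall. */
                  if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP))
                          ipoib_warn(priv, "request notify on send CQ failed\n");
          }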
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  18. 19 Feb 2010, 1 commit
  19. 06 Sep 2009, 1 commit
  20. 03 Sep 2009, 1 commit
  21. 03 Jun 2009, 1 commit
  22. 19 May 2009, 1 commit
  23. 30 Oct 2008, 1 commit
  24. 29 Oct 2008, 1 commit
  25. 01 Oct 2008, 1 commit
    • IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX · 943c246e
      Roland Dreier authored
      Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
      tx_lock.  Not only do we want to get rid of LLTX, this actually causes
      problems because of the skb_orphan() done with this tx_lock held: some
      skb destructors expect to be run with interrupts enabled.
      
      The simplest fix for this is to get rid of the driver-private tx_lock
      and stop using LLTX.  We kill off priv->tx_lock and use
      netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
      tricky because we need to update places that take priv->lock inside
      the tx_lock to disable IRQs, rather than relying on tx_lock having
      already disabled IRQs.
      
      Also, there are a couple of places where we need to disable BHs to
      make sure we have a consistent context to call netif_tx_lock() (since
      we no longer can use _irqsave() variants), and we also have to change
      ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
      than directly, because ipoib_send_comp_handler() runs in interrupt
      context and drain_tx_cq() must run in BH context so it can call
      netif_tx_lock().
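
      A hedged sketch of the completion-handler side of this change (names
      are approximate; the full patch also converts every priv->tx_lock
      user):

          static void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr)
          {
                  struct net_device *dev = dev_ptr;
                  struct ipoib_dev_priv *priv = netdev_priv(dev);

                  /* Hard-IRQ context: netif_tx_lock() may not be taken here,
                   * so defer draining the CQ to a timer, which runs in BH
                   * context where netif_tx_lock() is allowed. */
                  mod_timer(&priv->poll_timer, jiffies);
          }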
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  26. 09 Aug 2008, 1 commit
  27. 30 Jul 2008, 1 commit
  28. 15 Jul 2008, 6 commits
    • IPoIB/cm: Reduce connected mode TX object size · e112373f
      Eli Cohen authored
      Since IPoIB connected mode does not support NETIF_F_SG, we only have
      one DMA mapping per send, so we don't need a mapping[] array.  Define
      a new struct with a single u64 mapping member and use it for the CM
      tx_ring.
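
      The new structure amounts to something like this (sketch; compare
      with the UD struct ipoib_tx_buf, which keeps a mapping array of
      MAX_SKB_FRAGS + 1 entries):

          struct ipoib_cm_tx_buf {
                  struct sk_buff *skb;
                  u64             mapping; /* one DMA address: CM TX never scatter/gathers */
          };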
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IPoIB: Use dev_set_mtu() to change mtu · bd360671
      Eli Cohen authored
      When the driver sets the MTU of the net device outside of its
      change_mtu method, it should make use of dev_set_mtu() instead of
      directly setting the mtu field of struct net_device.  Otherwise
      functions registered to be called upon MTU change will not get called
      (this is done through call_netdevice_notifiers() in dev_set_mtu()).
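
      A minimal sketch of the difference (illustrative; the actual call
      sites are in the IPoIB multicast/connected-mode paths):

          /* Wrong: silent assignment, NETDEV_CHANGEMTU notifiers never run. */
          dev->mtu = new_mtu;

          /* Right: validates the value and fires
           * call_netdevice_notifiers(NETDEV_CHANGEMTU, dev). */
          rtnl_lock();
          dev_set_mtu(dev, new_mtu);
          rtnl_unlock();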
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IPoIB: Use rtnl lock/unlock when changing device flags · c8c2afe3
      Eli Cohen authored
      Use of this lock is required to synchronize changes to the net
      device's data structures.  Also move the call to ipoib_flush_paths()
      after the modification of the net device flags in set_mode().
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq() · a7d834c4
      Roland Dreier authored
      For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
      called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
      and these two callers are not synchronized against each other.
      However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
      work request and scatter list structures, so multiple callers can end
      up stepping on each other, which leads to posting garbled work
      requests.
      
      Fix this by having the caller pass in the ib_recv_wr and ib_sge
      structures to use, and allocating new local structures in
      ipoib_cm_nonsrq_init_rx().
      
      Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
      David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
      Nguyen <hnguyen@de.ibm.com>.
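
      The reworked helper ends up with a signature roughly like this (sketch
      only; the exact parameter order in the driver may differ):

          /* Each caller now supplies its own receive WR and scatter list, so
           * ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx() no longer
           * stomp on one shared set of structures. */
          static int ipoib_cm_post_receive_nonsrq(struct net_device *dev,
                                                  struct ipoib_cm_rx *rx,
                                                  struct ib_recv_wr *wr,
                                                  struct ib_sge *sge,
                                                  int id);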
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IPoIB: Copy small received SKBs in connected mode · f89271da
      Eli Cohen authored
      The connected mode implementation in the IPoIB driver has a large
      overhead in the way SKBs are handled in the receive flow.  It usually
      allocates an SKB as big as the one used for the currently received
      packet and moves unused fragments from the old SKB to the new one.  This
      involves a loop on all the remaining fragments and incurs overhead on
      the CPU.  This patch, for small SKBs, allocates an SKB just large
      enough to contain the received data and copies to it the data from the
      received SKB.  The newly allocated SKB is passed to the stack and the
      old SKB is reposted.
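
      A hedged sketch of the copy path for small packets (SMALL_PKT_COPYBREAK
      is a hypothetical threshold, and reposting the original buffer is
      elided):

          if (wc->byte_len < SMALL_PKT_COPYBREAK) {
                  struct sk_buff *small =
                          dev_alloc_skb(wc->byte_len + NET_IP_ALIGN);

                  if (small) {
                          skb_reserve(small, NET_IP_ALIGN);
                          skb_copy_from_linear_data(skb,
                                                    skb_put(small, wc->byte_len),
                                                    wc->byte_len);
                          /* hand "small" to the stack; the original receive
                           * buffer is reposted untouched */
                  }
          }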
      
      When running netperf with small UDP messages, without this patch I get:
      
          UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
          14.4.3.178 (14.4.3.178) port 0 AF_INET
          Socket  Message  Elapsed      Messages
          Size    Size     Time         Okay Errors   Throughput
          bytes   bytes    secs            #      #   10^6bits/sec
      
          114688     128   10.00     5142034      0     526.31
          114688           10.00     1130489            115.71
      
      With this patch I get both send and receive at ~315 Mbit/sec.
      
      The reason that send performance actually slows down is as follows:
      When using this patch, the overhead of the CPU for handling RX packets
      is dramatically reduced.  As a result, we do not experience RNR NAK
      messages from the receiver which cause the connection to be closed and
      reopened again; when the patch is not used, the receiver cannot handle
      the packets fast enough so there is less time to post new buffers and
      hence the aforementioned RNR NAKs.  So what happens is that the
      application *thinks* it posted a certain number of packets for
      transmission but these packets are flushed and do not really get
      transmitted.  Since the connection gets opened and closed many times,
      each time netperf gets the CPU time that otherwise would have been
      given to IPoIB to actually transmit the packets.  This can be verified
      when looking at the port counters, the output of ifconfig, and the
      output of netperf (this is for the case without the patch):
      
          tx packets
          ==========
          port counter:   1,543,996
          ifconfig:       1,581,426
          netperf:        5,142,034
      
          rx packets
          ==========
          netperf         1,130,489
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
    • RDMA: Remove subversion $Id tags · f3781d2e
      Roland Dreier authored
      They don't get updated by git and so they're worse than useless.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  29. 30 Apr 2008, 1 commit
  30. 17 Apr 2008, 3 commits
    • IPoIB: Handle case when P_Key is deleted and re-added at same index · 9fdd5e5b
      Roland Dreier authored
      If a P_Key is deleted and then re-added at the same index, then IPoIB
      gets confused because __ipoib_ib_dev_flush() only checks whether the
      index is the same without checking whether the P_Key was present, so
      the interface is stopped when the P_Key is deleted, but the event when
      the P_Key is re-added gets ignored and the interface never gets
      restarted.
      
      Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
      everywhere in IPoIB, since none of the places that look for P_Keys are
      in a fast path or in non-sleeping context, and in general we want to
      kill off the whole caching infrastructure eventually.  This also fixes
      consistency problems caused because some IPoIB queries were cached and
      some were uncached during the window where the cache was not updated.
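
      A hedged sketch of the uncached lookup (field names follow
      ipoib_dev_priv loosely):

          u16 pkey_index;

          /* Sleeping context, so query the device table directly instead of
           * the (possibly stale) cache. */
          if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) {
                  /* P_Key not currently in the table: leave the interface
                   * down and wait for the event that re-adds it. */
          }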
      
      Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this
      problem and testing this fix.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IPoIB: Add LSO support · 40ca1988
      Eli Cohen authored
      For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set
      NETIF_F_TSO and use HW LSO to offload TCP segmentation.
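
      A minimal sketch of the capability check (hedged; the per-packet LSO
      work request setup with IB_WR_LSO is elided):

          if (device_attr.device_cap_flags & IB_DEVICE_UD_TSO)
                  dev->features |= NETIF_F_TSO; /* device_attr from ib_query_device() */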
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IPoIB: Use checksum offload support if available · 6046136c
      Eli Cohen authored
      For HCAs that support checksum offload (i.e. that set IB_DEVICE_UD_IP_CSUM
      in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and
      use the HCA to generate and verify IP checksums.
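
      A hedged sketch of the three pieces involved (feature flag, TX
      request, RX verification); names are close to, but not necessarily
      identical to, the driver's:

          /* advertise the capability to the stack */
          if (device_attr.device_cap_flags & IB_DEVICE_UD_IP_CSUM)
                  dev->features |= NETIF_F_IP_CSUM;

          /* TX: ask the HCA to generate the checksum for this packet */
          if (skb->ip_summed == CHECKSUM_PARTIAL)
                  wr.send_flags |= IB_SEND_IP_CSUM;

          /* RX: trust the checksum the HCA already verified */
          if (wc->wc_flags & IB_WC_IP_CSUM_OK)
                  skb->ip_summed = CHECKSUM_UNNECESSARY;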
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  31. 12 Mar 2008, 1 commit
    • IPoIB: Allocate priv->tx_ring with vmalloc() · 10313cbb
      Roland Dreier authored
      Commit 7143740d ("IPoIB: Add send gather support") made struct
      ipoib_tx_buf significantly larger, since the mapping member changed
      from a single u64 to an array with MAX_SKB_FRAGS + 1 entries.  This
      means that allocating tx_rings with kzalloc() may fail because there
      is not enough contiguous memory for the new, much bigger size.  Fix
      this regression by allocating the rings with vmalloc() instead.
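
      A minimal sketch of the idea (hedged; error handling and the
      surrounding init/teardown context are elided):

          #include <linux/vmalloc.h>

          priv->tx_ring = vmalloc(ipoib_sendq_size * sizeof(*priv->tx_ring));
          if (!priv->tx_ring)
                  return -ENOMEM;
          memset(priv->tx_ring, 0, ipoib_sendq_size * sizeof(*priv->tx_ring));

          /* ... and vfree(priv->tx_ring) instead of kfree() on teardown */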
      Signed-off-by: Roland Dreier <rolandd@cisco.com>