提交 · 1df9916e46451533463f227e6be57cc2cfca4c5f · openeuler / Kernel

05 10月, 2010 2 次提交

fib: fib_rules_cleanup can be static · 1df9916e

由 stephen hemminger 提交于 10月 04, 2010

fib_rules_cleanup_ups is only defined and used in one place.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1df9916e

net: dynamic ingress_queue allocation · 24824a09

由 Eric Dumazet 提交于 10月 02, 2010

ingress being not used very much, and net_device->ingress_queue being
quite a big object (128 or 256 bytes), use a dynamic allocation if
needed (tc qdisc add dev eth0 ingress ...)

dev_ingress_queue(dev) helper should be used only with RTNL taken.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24824a09

04 10月, 2010 2 次提交

net: introduce DST_NOCACHE flag · c7d4426a

由 Eric Dumazet 提交于 10月 03, 2010

While doing stress tests with IP route cache disabled, and multi queue
devices, I noticed a very high contention on one rwlock used in
neighbour code.

When many cpus are trying to send frames (possibly using a high
performance multiqueue device) to the same neighbour, they fight for the
neigh->lock rwlock in order to call neigh_hh_init(), and fight on
hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test())

But we dont need to call neigh_hh_init() for dst that are used only
once. It costs four atomic operations at least, on two contended cache
lines, plus the high contention on neigh->lock rwlock.

Introduce a new dst flag, DST_NOCACHE, that is set when dst was not
inserted in route cache.

With the stress test bench, sending 160000000 frames on one neighbour,
results are :

Before patch:

real	2m28.406s
user	0m11.781s
sys	36m17.964s


After patch:

real	1m26.532s
user	0m12.185s
sys	20m3.903s
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c7d4426a

net: Fix the condition passed to sk_wait_event() · 482964e5

由 Nagendra Tomar 提交于 10月 02, 2010

This patch fixes the condition (3rd arg) passed to sk_wait_event() in
sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
causes the following soft lockup in tcp_sendmsg() when the global tcp
memory pool has exhausted.

>>> snip <<<

localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200]  [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
localhost kernel:
localhost kernel: Call Trace:
localhost kernel:  [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel:  [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel:  [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel:  [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel:  [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel:  [system_call+0x7e/0x83] system_call+0x7e/0x83

>>> snip <<<

What is happening is, that the sk_wait_event() condition passed from
sk_stream_wait_memory() evaluates to true for the case of tcp global memory
exhaustion. This is because both sk_stream_memory_free() and vm_wait are true
which causes sk_wait_event() to *not* call schedule_timeout().
Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping.
This causes the caller to again try allocation, which again fails and again
calls sk_stream_wait_memory(), and so on.

[ Bug introduced by commit c1cbe4b7
  ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ]
Signed-off-by: NNagendra Singh Tomar <tomer_iisc@yahoo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

482964e5

30 9月, 2010 2 次提交

net: rename netdev rx_queue to ingress_queue · bfa5ae63

由 Eric Dumazet 提交于 9月 28, 2010

There is some confusion with rx_queue name after RPS, and net drivers
private rx_queue fields.

I suggest to rename "struct net_device"->rx_queue to ingress_queue.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bfa5ae63

net: add a recursion limit in xmit path · 745e20f1

由 Eric Dumazet 提交于 9月 29, 2010

As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.

Add a percpu variable, and limit to three the number of stacked xmits.
Reported-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

745e20f1

29 9月, 2010 1 次提交

ipv4: Allow configuring subnets as local addresses · 4465b469

由 Tom Herbert 提交于 5月 23, 2010

This patch allows a host to be configured to respond to any address in
a specified range as if it were local, without actually needing to
configure the address on an interface.  This is done through routing
table configuration.  For instance, to configure a host to respond
to any address in 10.1/16 received on eth0 as a local address we can do:

ip rule add from all iif eth0 lookup 200
ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200

This host is now reachable by any 10.1/16 address (route lookup on
input for packets received on eth0 can find the route).  On output, the
rule will not be matched so that this host can still send packets to
10.1/16 (not sent on loopback).  Presumably, external routing can be
configured to make sense out of this.

To make this work, we needed to modify the logic in finding the
interface which is assigned a given source address for output
(dev_ip_find).  We perform a normal fib_lookup instead of just a
lookup on the local table, and in the lookup we ignore the input
interface for matching.

This patch is useful to implement IP-anycast for subnets of virtual
addresses.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4465b469

28 9月, 2010 4 次提交

net: Allow changing number of RX queues after device allocation · 62fe0b40

由 Ben Hutchings 提交于 9月 27, 2010

For RPS, we create a kobject for each RX queue based on the number of
queues passed to alloc_netdev_mq().  However, drivers generally do not
determine the numbers of hardware queues to use until much later, so
this usually represents the maximum number the driver may use and not
the actual number in use.

For TX queues, drivers can update the actual number using
netif_set_real_num_tx_queues().  Add a corresponding function for RX
queues, netif_set_real_num_rx_queues().
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

62fe0b40

net: sk_{detach|attach}_filter() rcu fixes · f91ff5b9

由 Eric Dumazet 提交于 9月 27, 2010

sk_attach_filter() and sk_detach_filter() are run with socket locked.

Use the appropriate rcu_dereference_protected() instead of blocking BH,
and rcu_dereference_bh().
There is no point adding BH prevention and memory barrier.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f91ff5b9

fib: use atomic_inc_not_zero() in fib_rules_lookup · 7fa7cb71

由 Eric Dumazet 提交于 9月 27, 2010

It seems we dont use appropriate refcount increment in an
rcu_read_lock() protected section.

fib_rule_get() might increment a null refcount and bad things could
happen.

While fib_nl_delrule() respects an rcu grace period before calling
fib_rule_put(), fib_rules_cleanup_ops() calls fib_rule_put() without a
grace period.

Note : after this patch, we might avoid the synchronize_rcu() call done
in fib_nl_delrule()
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7fa7cb71

tcp: Fix >4GB writes on 64-bit. · 01db403c

由 David S. Miller 提交于 9月 27, 2010

Fixes kernel bugzilla #16603

tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write
zero bytes, for example.

There is also the problem higher up of how verify_iovec() works.  It
wants to prevent the total length from looking like an error return
value.

However it does this using 'int', but syscalls return 'long' (and
thus signed 64-bit on 64-bit machines).  So it could trigger
false-positives on 64-bit as written.  So fix it to use 'long'.
Reported-by: NOlaf Bonorden <bono@onlinehome.de>
Reported-by: NDaniel Büse <dbuese@gmx.de>
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

01db403c

27 9月, 2010 2 次提交

rps: allocate rx queues in register_netdevice only · 1b4bf461

由 Eric Dumazet 提交于 9月 23, 2010

Instead of having two places were we allocate dev->_rx, introduce
netif_alloc_rx_queues() helper and call it only from
register_netdevice(), not from alloc_netdev_mq()

Goal is to let drivers change dev->num_rx_queues after allocating netdev
and before registering it.

This also removes a lot of ifdefs in net/core/dev.c
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b4bf461

net: propagate NETIF_F_HIGHDMA to vlans · c5256c51

由 Eric Dumazet 提交于 9月 23, 2010

Automatically allows vlans to get NETIF_F_HIGHDMA if underlying device
supports it.

On 32bit arches (and more precisely if CONFIG_HIGHMEM is enabled), it
can help to reduce cost of illegal_highdma() and __skb_linearize()
calls.

Tested on tg3 , bnx2, bonding, this worked very well.

This is a generalization of a patch provided by Yi Zou & Jeff Kirsher.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5256c51

25 9月, 2010 1 次提交

net: fix a lockdep splat · f064af1e

由 Eric Dumazet 提交于 9月 22, 2010

We have for each socket :

One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)

Possible scenarios are :

(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...

(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)

(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)

This (C) case conflicts with (A) :

CPU1 [A]                         CPU2 [C]
read_lock(callback_lock)
<BH>                             spin_lock_bh(slock)
<wait to spin_lock(slock)>
                                 <wait to write_lock_bh(callback_lock)>

We have one problematic (C) use case in inet_csk_listen_stop() :

local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)

lockdep is not happy with this, as reported by Tetsuo Handa

It seems only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.

Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.
Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f064af1e

24 9月, 2010 1 次提交

net: return operator cleanup · a02cec21

由 Eric Dumazet 提交于 9月 22, 2010

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a02cec21

22 9月, 2010 3 次提交

net: core: use kernel's converter from hex to bin · 82fd5b5d

由 Andy Shevchenko 提交于 9月 20, 2010

Signed-off-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82fd5b5d

ethtool: Fix build due to lack of ethtool.h include. · 73da16c2

由 David S. Miller 提交于 9月 21, 2010

net/core/ethtool.c: In function 'ethtool_get_regs':
net/core/ethtool.c:818:2: error: implicit declaration of function 'vmalloc'
net/core/ethtool.c:818:9: warning: assignment makes pointer from integer without a cast
net/core/ethtool.c:833:2: error: implicit declaration of function 'vfree'
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73da16c2

ethtool: Allocate register dump buffer with vmalloc() · a77f5db3

由 Ben Hutchings 提交于 9月 20, 2010

Some NICs have huge register files which exceed the maximum heap
allocation size.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a77f5db3

18 9月, 2010 3 次提交

ethtool, ixgbe: Move RX n-tuple mask fixup to ethtool · be2902da

由 Ben Hutchings 提交于 9月 16, 2010

The ethtool utility does not set masks for flow parameters that are
not specified, so if both value and mask are 0 then this must be
treated as equivalent to a mask with all bits set.  Currently that is
done in the only driver that implements RX n-tuple filtering, ixgbe.
Move it to the ethtool core.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be2902da

netns: keep vlan slaves on master netns move · 3b27e105

由 David Lamparter 提交于 9月 17, 2010

previously, if a vlan master device was moved from one network namespace
to another, all 802.1q and macvlan slaves were deleted.

we can use dev->reg_state to figure out whether dev_change_net_namespace
is happening, since that won't set dev->reg_state NETREG_UNREGISTERING.
so, this changes 8021q and macvlan to ignore NETDEV_UNREGISTER when
reg_state is not NETREG_UNREGISTERING.
Signed-off-by: NDavid Lamparter <equinox@diac24.net>
Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b27e105

ethtool: change ethtool_set_gro() to use ethtool_op_get_rx_csum · 67c96608

由 Eric Dumazet 提交于 9月 17, 2010

To be able to switch on GRO on a device, ethtool_set_gro() checks this
device provides a get_rx_csum() method.

Some devices dont provide this method, while they do support RX
checksumming.

This patch allows bonding to support GRO :

ethtool -K bond0 gro on
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67c96608

17 9月, 2010 1 次提交

net: include inetdevice.h for rcu_dereference_raw api change · caeda9b9

由 Stephen Rothwell 提交于 9月 16, 2010

rcu_dereference_raw() now needs to know the type of its argument.
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

caeda9b9

16 9月, 2010 3 次提交

net: enable GRO by default for vlan devices · 16c3ea78

由 Brandon Philips 提交于 9月 15, 2010

Currently vlan devices don't have GRO by default as none of the Ethernet
drivers add NETIF_F_GRO to their vlan_features.

As GRO is a software feature add GRO to dev->vlan_features in
register_netdevice() and let vlan_dev_init() take care that it gets
enabled only when dev->features has NETIF_F_GRO too.
Signed-off-by: NBrandon Philips <bphilips@suse.de>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16c3ea78

ipv4: ip_ptr cleanups · 95ae6b22

由 Eric Dumazet 提交于 9月 15, 2010

dev->ip_ptr is protected by rtnl and rcu.

Yet some places dont use appropriate primitives and/or locking rules.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95ae6b22

ethtool: Remove unimplemented flow specification types · e0de7c93

由 Ben Hutchings 提交于 9月 14, 2010

struct ethtool_rawip4_spec and struct ethtool_ether_spec are neither
commented nor used by any driver, so remove them.  Adjust padding in
the user-visible unions that included these structures.

Fix references to struct ethtool_rawip4_spec in
ethtool_get_rx_ntuple(), which should use struct ethtool_usrip4_spec.

struct ethtool_usrip4_spec cannot hold IPv6 host addresses and there
is no separate structure that can, so remove ETH_RX_NFC_IP6 and the
reference to it in niu.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0de7c93

15 9月, 2010 1 次提交

net: use rcu_barrier() in rollback_registered_many · ef885afb

由 Eric Dumazet 提交于 9月 13, 2010

netdev_wait_allrefs() waits that all references to a device vanishes.

It currently uses a _very_ pessimistic 250 ms delay between each probe.
Some users reported that no more than 4 devices can be dismantled per
second, this is a pretty serious problem for some setups.

Most of the time, a refcount is about to be released by an RCU callback,
that is still in flight because rollback_registered_many() uses a
synchronize_rcu() call instead of rcu_barrier(). Problem is visible if
number of online cpus is one, because synchronize_rcu() is then a no op.

time to remove 50 ipip tunnels on a UP machine :

before patch : real 11.910s
after patch : real 1.250s
Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: NOctavian Purdila <opurdila@ixiacom.com>
Reported-by: NBenjamin LaHaise <bcrl@kvack.org>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef885afb

14 9月, 2010 1 次提交

flow: better memory management · 83b6b1f5

由 Eric Dumazet 提交于 9月 10, 2010

Allocate hash tables for every online cpus, not every possible ones.

NUMA aware allocations.

Dont use a full page on arches where PAGE_SIZE > 1024*sizeof(void *)

misc:
  __percpu , __read_mostly, __cpuinit annotations
  flow_compare_t is just an "unsigned long"
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83b6b1f5

11 9月, 2010 1 次提交

pkt_sched: remov unnecessary bh_disable · 9ca7f876

由 stephen hemminger 提交于 9月 08, 2010

Now that est_tree_lock is acquired with BH protection, the other
call is unnecessary.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ca7f876

10 9月, 2010 2 次提交

net/core: add lock context change annotations in net/core/sock.c · f39234d6

由 Namhyung Kim 提交于 9月 08, 2010

__lock_sock() and __release_sock() releases and regrabs lock but
were missing proper annotations. Add it. This removes following
warning from sparse. (Currently __lock_sock() does not emit any
warning about it but I think it is better to add also.)

 net/core/sock.c:1580:17: warning: context imbalance in '__release_sock' - unexpected unlock
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f39234d6

net/core: remove address space warnings on verify_iovec() · a700d8be

由 Namhyung Kim 提交于 9月 08, 2010

move_addr_to_kernel() and copy_from_user() requires their argument
as __user pointer but were missing proper markups. Add it.
This removes following warnings from sparse.

net/core/iovec.c:44:52: warning: incorrect type in argument 1 (different address spaces)
net/core/iovec.c:44:52: expected void [noderef] <asn:1>*uaddr
net/core/iovec.c:44:52: got void *msg_name
net/core/iovec.c:55:34: warning: incorrect type in argument 2 (different address spaces)
net/core/iovec.c:55:34: expected void const [noderef] <asn:1>*from
net/core/iovec.c:55:34: got struct iovec *msg_iov
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a700d8be

09 9月, 2010 2 次提交

net: rps: add the shortcut for one rps_cpus · 6febfca9

由 Changli Gao 提交于 9月 03, 2010

When there is only one rps_cpus, skb_get_rxhash() can be eliminated.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6febfca9

gro: Re-fix different skb headrooms · 64289c8e

由 Jarek Poplawski 提交于 9月 04, 2010

The patch: "gro: fix different skb headrooms" in its part:
"2) allocate a minimal skb for head of frag_list" is buggy. The copied
skb has p->data set at the ip header at the moment, and skb_gro_offset
is the length of ip + tcp headers. So, after the change the length of
mac header is skipped. Later skb_set_mac_header() sets it into the
NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
original skb was wrongly allocated, so let's copy it as it was.

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
fixes commit: 3d3be433Reported-by: NPlamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Tested-by: NPlamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64289c8e

08 9月, 2010 1 次提交

net: fix tx queue selection for bridged devices implementing select_queue · deabc772

由 Helmut Schaa 提交于 9月 03, 2010

When a net device is implementing the select_queue callback and is part of
a bridge, frames coming from the bridge already have a tx queue associated
to the socket (introduced in commit a4ee3ce3,
"net: Use sk_tx_queue_mapping for connected sockets"). The call to
sk_tx_queue_get will then return the tx queue used by the bridge instead
of calling the select_queue callback.

In case of mac80211 this broke QoS which is implemented by using the
select_queue callback. Furthermore it introduced problems with rt2x00
because frames with the same TID and RA sometimes appeared on different
tx queues which the hw cannot handle correctly.

Fix this by always calling select_queue first if it is available and only
afterwards use the socket tx queue mapping.
Signed-off-by: NHelmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

deabc772

07 9月, 2010 2 次提交

net: poll() optimizations · db40980f

由 Eric Dumazet 提交于 9月 06, 2010

No need to test twice sk->sk_shutdown & RCV_SHUTDOWN
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db40980f

net: pskb_expand_head() optimization · 1fd63041

由 Eric Dumazet 提交于 9月 02, 2010

pskb_expand_head() blindly takes references on fragments before calling
skb_release_data(), potentially releasing these references.

We can add a fast path, avoiding these atomic operations, if we own the
last reference on skb->head.

Based on a previous patch from David
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fd63041

04 9月, 2010 1 次提交

net: remove two kmemcheck annotations · 52ee7a04

由 Eric Dumazet 提交于 9月 03, 2010

__alloc_skb() uses a memset() to clear all the beginning of skb,
including bitfields contained in 'flags1' & 'flags2'.

We dont need any more to use kmemcheck_annotate_bitfield() on these
fields. However, we still need it for the clone part, which is not
cleared.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52ee7a04

03 9月, 2010 2 次提交

pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator · 0b5d404e

由 Jarek Poplawski 提交于 9月 02, 2010

This patch fixes a lockdep warning:

[  516.287584] =========================================================
[  516.288386] [ INFO: possible irq lock inversion dependency detected ]
[  516.288386] 2.6.35b #7
[  516.288386] ---------------------------------------------------------
[  516.288386] swapper/0 just changed the state of lock:
[  516.288386]  (&qdisc_tx_lock){+.-...}, at: [<c12eacda>] est_timer+0x62/0x1b4
[  516.288386] but this lock took another, SOFTIRQ-unsafe lock in the past:
[  516.288386]  (est_tree_lock){+.+...}
[  516.288386] 
[  516.288386] and interrupts could create inverse lock ordering between them.
...

So, est_tree_lock needs BH protection because it's taken by
qdisc_tx_lock, which is used both in BH and process contexts.
(Full warning with this patch at netdev, 02 Sep 2010.)

Fixes commit: ae638c47
("pkt_sched: gen_estimator: add a new lock")
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b5d404e

net: dev_add_pack() & __dev_remove_pack() changes · c07b68e8

由 Eric Dumazet 提交于 9月 02, 2010

Add a small helper ptype_head() to get the head to manipulate

dev_add_pack() & __dev_remove_pack() can use a spinlock without
blocking BH, since softirq use RCU, and these functions are run from
process context only.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c07b68e8

02 9月, 2010 2 次提交

gro: fix different skb headrooms · 3d3be433

由 Eric Dumazet 提交于 9月 01, 2010

Packets entering GRO might have different headrooms, even for a given
flow (because of implementation details in drivers, like copybreak).
We cant force drivers to deliver packets with a fixed headroom.

1) fix skb_segment()

skb_segment() makes the false assumption headrooms of fragments are same
than the head. When CHECKSUM_PARTIAL is used, this can give csum_start
errors, and crash later in skb_copy_and_csum_dev()

2) allocate a minimal skb for head of frag_list

skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to
allocate a fresh skb. This adds NET_SKB_PAD to a padding already
provided by netdevice, depending on various things, like copybreak.

Use alloc_skb() to allocate an exact padding, to reduce cache line
needs:
NET_SKB_PAD + NET_IP_ALIGN

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626

Many thanks to Plamen Petrov, testing many debugging patches !
With help of Jarek Poplawski.
Reported-by: NPlamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d3be433

net: make rx_queue sysfs_ops const · fa50d645

由 stephen hemminger 提交于 8月 31, 2010

Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa50d645

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功