Commits · 234b27c3fd58fc0e15c04dd0fbf4337fac9c2a06 · OpenHarmony / kernel_linux

14 Nov, 2009 · 7 commits

ipv6: speedup inet6_dump_addr() · 234b27c3

Committed by Eric Dumazet on Nov 12, 2009

When handling large number of netdevices, inet6_dump_addr()
is very slow because it has O(N^2) complexity.

Instead of scanning one single list, we can use the NETDEV_HASHENTRIES
sub lists of the dev_index hash table, and RCU lookups.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
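
The idea of resuming a dump from a (bucket, index) cursor instead of rescanning one long list can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct dev`, `struct dump_cursor`, `table_add()`, `dump_some()` and the value of `HASHENTRIES` are hypothetical stand-ins, and RCU protection is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

#define HASHENTRIES 256u  /* stand-in for NETDEV_HASHENTRIES; value assumed */

/* Hypothetical, simplified device: an index and a bucket link. */
struct dev {
    int ifindex;
    struct dev *next;
};

/* Resume point saved between two dump calls. */
struct dump_cursor {
    unsigned int h;    /* bucket to resume in */
    unsigned int idx;  /* entries already emitted from that bucket */
};

/* Insert a device at the head of its bucket (low bits of ifindex). */
static void table_add(struct dev *table[HASHENTRIES], struct dev *d)
{
    unsigned int h = (unsigned int)d->ifindex & (HASHENTRIES - 1);

    d->next = table[h];
    table[h] = d;
}

/*
 * Copy up to `budget` ifindex values into `out`, starting at *cur, and
 * advance *cur. Returns the number of entries written; a return of 0
 * with a non-zero budget means the dump is complete.
 */
static size_t dump_some(struct dev *table[HASHENTRIES],
                        struct dump_cursor *cur, int *out, size_t budget)
{
    unsigned int h, idx, s_idx;
    size_t n = 0;
    struct dev *d;

    assert(table != NULL && cur != NULL && out != NULL);
    idx = s_idx = cur->idx;

    for (h = cur->h; h < HASHENTRIES; h++, s_idx = 0) {
        idx = 0;
        for (d = table[h]; d != NULL; d = d->next, idx++) {
            if (idx < s_idx)
                continue;   /* already emitted by an earlier call */
            if (n == budget)
                goto done;  /* output buffer full: remember position */
            out[n++] = d->ifindex;
        }
    }
done:
    cur->h = h;
    cur->idx = idx;
    return n;
}
```

A caller would zero-initialise a `struct dump_cursor` and call `dump_some()` until it returns 0. Each call only skips entries inside the single bucket it resumes in, so the skipping cost is bounded by the bucket length rather than by the total number of devices.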

ipv4: speedup inet_dump_ifaddr() · eec4df98

Committed by Eric Dumazet on Nov 12, 2009

Stephen Hemminger wrote:
> On Thu, 12 Nov 2009 15:11:36 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> When handling large number of netdevices, inet_dump_ifaddr()
>> is very slow because it has O(N^2) complexity.
>>
>> Instead of scanning one single list, we can use the NETDEV_HASHENTRIES
>> sub lists of the dev_index hash table, and RCU lookups.
>>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> You might be able to make RCU critical section smaller by moving
> it into loop.
>

Indeed. But we dump at most one skb (<= 8192 bytes ?), so rcu_read_lock
holding time is small, unless we meet many netdevices without
addresses. I wonder if it's really common...

Thanks

[PATCH net-next-2.6] ipv4: speedup inet_dump_ifaddr()

When handling large number of netdevices, inet_dump_ifaddr()
is very slow because it has O(N^2) complexity.

Instead of scanning one single list, we can use the NETDEV_HASHENTRIES
sub lists of the dev_index hash table, and RCU lookups.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: Use next_net_device_rcu() · 6baff150

Committed by Eric Dumazet on Nov 11, 2009

We need to use next_net_device_rcu() in RCU protected section.

We also can avoid in_dev_get()/in_dev_put() overhead (code size mainly)
in rcu_read_lock() sections.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: use RCU to walk list of network devices · ce81b76a

Committed by Eric Dumazet on Nov 11, 2009

No longer need read_lock(&dev_base_lock), use RCU instead.
We also can avoid taking references on inet6_dev structs.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED · bee7ca9e

Committed by William Allen Simpson on Nov 10, 2009

Define two symbols needed in both kernel and user space.

Remove old (somewhat incorrect) kernel variant that wasn't used in
most cases.  Default should apply to both RMSS and SMSS (RFC2581).

Replace numeric constants with defined symbols.

Stand-alone patch, originally developed for TCPCT.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan/macvlan: propagate transmission state to upper layers · cbbef5e1

Committed by Patrick McHardy on Nov 10, 2009

Both vlan and macvlan devices usually don't use a qdisc and immediately
queue packets to the underlying device. Propagate transmission state of
the underlying device to the upper layers so they can react on congestion
and/or inform the sending process.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow to propagate errors through ->ndo_hard_start_xmit() · 572a9d7b

Committed by Patrick McHardy on Nov 10, 2009

Currently the ->ndo_hard_start_xmit() callbacks are only permitted to return
one of the NETDEV_TX codes. This prevents any kind of error propagation for
virtual devices, like queue congestion of the underlying device in case of
layered devices, or unreachability in case of tunnels.

This patch changes the NET_XMIT codes to avoid clashes with the NETDEV_TX
codes and changes the two callers of dev_hard_start_xmit() to expect either
errno codes, NET_XMIT codes or NETDEV_TX codes as return value.

In case of qdisc_restart(), all non NETDEV_TX codes are mapped to NETDEV_TX_OK
since no error propagation is possible when using qdiscs. In case of
dev_queue_xmit(), the error is propagated upwards.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
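
How two callers might treat a mixed return value can be sketched as follows. The `EX_*` constants and `ex_*` helpers are hypothetical and their numeric values are assumed purely for illustration; the only property the sketch relies on is the one the message describes, namely that errno codes, NET_XMIT codes and NETDEV_TX codes occupy disjoint ranges.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative values only, chosen so that the code spaces do not clash. */
enum {
    EX_NETDEV_TX_OK   = 0x00,
    EX_NETDEV_TX_BUSY = 0x10,
    EX_NETDEV_TX_MASK = 0xf0,
    EX_NET_XMIT_DROP  = 0x01,
    EX_NET_XMIT_CN    = 0x02,
    EX_NET_XMIT_MASK  = 0x0f
};

/* True if rc is a driver-level (NETDEV_TX-style) code, including OK. */
static bool ex_is_netdev_tx(int rc)
{
    /* The two code spaces must not overlap for this test to work. */
    assert((EX_NETDEV_TX_MASK & EX_NET_XMIT_MASK) == 0);
    return rc >= 0 && (rc & EX_NET_XMIT_MASK) == 0;
}

/* qdisc-style caller: errors cannot be reported upward, so fold them to OK. */
static int ex_qdisc_view(int rc)
{
    return ex_is_netdev_tx(rc) ? rc : EX_NETDEV_TX_OK;
}

/* Direct-transmit caller: hand the code back to the upper layer unchanged. */
static int ex_direct_view(int rc)
{
    return rc;
}
```

With these assumed values, a tunnel returning `-EHOSTUNREACH` would look like a plain success to the qdisc-style caller, while the direct caller would see the errno and could pass it on to the sender.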

12 Nov, 2009 · 7 commits

net/atm: move all compat_ioctl handling to atm/ioctl.c · 805003a4

Committed by Arnd Bergmann on Nov 11, 2009

We have two implementations of the compat_ioctl handling for ATM, the
one that we have had for ages in fs/compat_ioctl.c and the one added to
net/atm/ioctl.c by David Woodhouse. Unfortunately, both versions are
incomplete, and in practice we use a very confusing combination of the
two.

For ioctl numbers that have the same identifier on 32 and 64 bit systems,
we go directly through the compat_ioctl socket operation, for those that
differ, we do a conversion in fs/compat_ioctl.c.

This patch moves both variants into the vcc_compat_ioctl() function,
while preserving the current behaviour. It also kills off the COMPATIBLE_IOCTL
definitions that we never use here.
Doing it this way is clearly not a good solution, but I hope it is a
step into the right direction, so that someone is able to clean up this
mess for real.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/compat: fix dev_ifsioc emulation corner cases · a2116ed2

Committed by Arnd Bergmann on Nov 11, 2009

Handling for SIOCSHWTSTAMP is broken on architectures
with a split user/kernel address space like s390,
because it passes a real user pointer while using
set_fs(KERNEL_DS).
A similar problem might arise the next time somebody
adds code to dev_ifsioc.

Split up dev_ifsioc into three separate functions for
SIOCSHWTSTAMP, SIOC*IFMAP and all other numbers so
we can get rid of set_fs in all potentially affected
cases.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Patrick Ohly <patrick.ohly@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: convert dndev_lock to spinlock · e5c140a3

Committed by stephen hemminger on Nov 11, 2009

There is no reason for this lock to be reader/writer since
the reader only has lock held for a very brief period.
The overhead of read_lock is more expensive than spinlock.

Compile tested only, I am not a decnet user.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: add RTNL lock when reading address list · 41bdecf1

Committed by stephen hemminger on Nov 11, 2009

Add missing locking in the case of auto binding to the
default device. The address list might change while this code is looking
at the list.

Compile tested only, I am not a decnet user.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: fold name hash properly (v3) · 08e9897d

Committed by stephen hemminger on Nov 10, 2009

The full_name_hash function does not produce well distributed values in
the lower bits, so most code uses hash_32() to fold it.  This is really
a bug introduced when name hashing was added, back in 2.5 when I added
name hashing.

hash_32 is all that is needed since full_name_hash returns unsigned int
which is only 32 bits on 64 bit platforms.

Also, there is no point in using hash_32 on ifindex, because ifindex is naturally
sequential and usually well distributed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
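
The difference between folding a 32-bit hash and simply masking a sequential index can be sketched as below. This is a minimal illustration, not the kernel's code: `ex_fold32()`, `ex_name_bucket()`, `ex_index_bucket()`, the multiplier and the bucket count are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define EX_HASH_BITS 8u                 /* 256 buckets; assumed */
#define EX_GOLDEN_RATIO_32 0x61C88647u  /* odd multiplier; assumed */

/* Multiplicative fold: mix with a large odd constant, keep the top bits. */
static uint32_t ex_fold32(uint32_t val, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)(val * EX_GOLDEN_RATIO_32) >> (32 - bits);
}

/* Bucket for a name: fold the full 32-bit string hash. */
static unsigned int ex_name_bucket(uint32_t name_hash)
{
    return ex_fold32(name_hash, EX_HASH_BITS);
}

/* Bucket for an ifindex: sequential values spread evenly with a plain mask. */
static unsigned int ex_index_bucket(int ifindex)
{
    return (unsigned int)ifindex & ((1u << EX_HASH_BITS) - 1);
}
```

Two name hashes that differ only in their upper bits, such as 0x01000000 and 0x02000000, would collide under a low-bit mask but land in different buckets after the fold. Consecutive ifindex values already fall into consecutive buckets, which is why the extra mixing step adds nothing there.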

skbuff: Do not allow skb recycling with disabled IRQs · e84af6dd

Committed by Anton Vorontsov on Nov 10, 2009

NAPI drivers try to recycle SKBs in their polling routine, but we
generally don't know the context in which the polling will be called,
and the skb recycling itself may require IRQs to be enabled.

This patch adds irqs_disabled() test to the skb_recycle_check()
routine, so that we'll not let the drivers hit the skb recycling
path with IRQs disabled.

As a side effect, this patch actually disables skb recycling for some
[broken] drivers. E.g. gianfar driver grabs an irqsave spinlock during
TX ring processing, and then tries to recycle an skb, and that caused
the following badness:

nf_conntrack version 0.5.0 (1008 buckets, 4032 max)
------------[ cut here ]------------
Badness at kernel/softirq.c:143
NIP: c003e3c4 LR: c423a528 CTR: c003e344
...
NIP [c003e3c4] local_bh_enable+0x80/0xc4
LR [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack]
Call Trace:
[c15d1b60] [c003e32c] local_bh_disable+0x1c/0x34 (unreliable)
[c15d1b70] [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack]
[c15d1b80] [c02c6370] nf_conntrack_destroy+0x3c/0x70
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Remove unused var in inet6_dump_ifinfo() · 434a8a58

Committed by David S. Miller on Nov 11, 2009

Reported by Stephen Rothwell:

--------------------
Today's linux-next build (x86_64 allmodconfig) produced this warning:

net/ipv6/addrconf.c: In function 'inet6_dump_ifinfo':
net/ipv6/addrconf.c:3833: warning: unused variable 'err'

Introduced by commit 84d2697d ("ipv6:
speedup inet6_dump_ifinfo()").
--------------------
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Nov, 2009 · 13 commits

CAN: use dev_get_by_index_rcu · ff879eb6

Committed by stephen hemminger on Nov 10, 2009

Use new function to avoid doing read_lock().
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

IPV4: use rcu to walk list of devices in IGMP · 61fbab77

Committed by stephen hemminger on Nov 10, 2009

This also needs to be optimized for large number of devices.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: use RCU to find network devices · fa918602

Committed by stephen hemminger on Nov 10, 2009

When showing device statistics use RCU rather than read_lock(&dev_base_lock)
Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use rcu for network scheduler API · f1e9016d

Committed by stephen hemminger on Nov 10, 2009

Use RCU to walk list of network devices in qdisc dump.
This could be optimized for large number of devices.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: eliminate use of dev_base_lock · 9e067597

Committed by stephen hemminger on Nov 10, 2009

Do not need to use read_lock(&dev_base_lock), use RCU instead.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IPv6: use ipv6_addr_v4mapped() · 856540ee

Committed by Brian Haley on Nov 09, 2009

Change udp6_portaddr_hash() to use ipv6_addr_v4mapped()
inline instead of ipv6_addr_type().
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sit: Clean up DF code by copying from IPIP · 292f4f3c

Committed by Herbert Xu on Nov 09, 2009

This patch rearranges the SIT DF bit handling using the new IPIP DF
code.  The only externally visible effect should be the case where
PMTU is enabled and the MTU is exactly 1280 bytes.  In this case the
previous code would send packets out with DF off while the new code
would set the DF bit.  This is in line with RFC 4213.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Allow inet6_dump_addr() to handle more than 64 addresses · bcd32326

Committed by Eric Dumazet on Nov 09, 2009

Apparently, inet6_dump_addr() is not able to handle more than
64 ipv6 addresses per device. We must break from inner loops
in case skb is full, or else cursor is put at the end of list.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: speedup inet6_dump_ifinfo() · 84d2697d

Committed by Eric Dumazet on Nov 09, 2009

When handling large number of netdevices, inet6_dump_ifinfo()
is very slow because it has O(N^2) complexity.

Instead of scanning one single list, we can use the 256 sub lists
of the dev_index hash table, and RCU lookups.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netlink_getname, packet_getname -- use DECLARE_SOCKADDR guard · 13cfa97b

Committed by Cyrill Gorcunov on Nov 08, 2009

Use guard DECLARE_SOCKADDR in a few more places which allow
us to catch if the structure copied back is too big.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: bind() optimisation · 30fff923

Committed by Eric Dumazet on Nov 09, 2009

UDP bind() can be O(N^2) in some pathological cases.

Thanks to secondary hash tables, we can make it O(N)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: allocate and copy for pipe TX without sock lock · b1704374

Committed by Rémi Denis-Courmont on Nov 09, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: put sockets in a hash table · 6b0d07ba

Committed by Rémi Denis-Courmont on Nov 09, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Nov, 2009 · 12 commits

xfrm: SAD entries do not expire correctly after suspend-resume · 9e0d57fd

Committed by Yury Polyanskiy on Nov 08, 2009

  This fixes the following bug in the current implementation of
net/xfrm: SAD entries timeouts do not count the time spent by the machine 
in the suspended state. This leads to the connectivity problems because 
after resuming local machine thinks that the SAD entry is still valid, while 
it has already been expired on the remote server.

  The cause of this is very simple: the timeouts in the net/xfrm are bound to 
the old mod_timer() timers. This patch reassigns them to the
CLOCK_REALTIME hrtimer.

  I have been using this version of the patch for a few months on my
machines without any problems. Also run a few stress tests w/o any
issues.

  This version of the patch uses tasklet_hrtimer by Peter Zijlstra
(commit 9ba5f0).

  This patch is against 2.6.31.4. Please CC me.
Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/compat_ioctl: support SIOCWANDEV · 7a50a240

Committed by Arnd Bergmann on Nov 08, 2009

This adds compat_ioctl support for SIOCWANDEV, which has
always been missing.

The definition of struct compat_ifreq was missing an
ifru_settings field that is needed to support SIOCWANDEV,
so add that and clean up the whitespace damage in the
struct definition.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net, compat_ioctl: fix SIOCGMII ioctls · fab2532b

Committed by Arnd Bergmann on Nov 08, 2009

SIOCGMIIPHY and SIOCGMIIREG return data through ifreq,
so it needs to be converted on the way out as well.

SIOCGIFPFLAGS is unused, but has the same problem in theory.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: multicast RX should increment SNMP/sk_drops counter in allocation failures · f6b8f32c