Commits · 4d75313ce9b832efc4efb487f080b5ed72beae2c · openeuler / Kernel

07 Feb, 2012 · 3 commits

tipc: Prevent broadcast link stalling in dual LAN environments · 4d75313c

Committed by Allan Stephens on Oct 25, 2011

Ensure that sequence number information about incoming broadcast link
messages is initialized only by the activation of the first link to a
given cluster node. Previously, a race condition allowed reset and/or
activation messages for a second link to re-initialize this sequence
number information with obsolete values. This could trigger TIPC to
request the retransmission of previously acknowledged broadcast link
messages from that node, resulting in broadcast link processing becoming
stalled if the node had already released one or more of those messages
and was unable to perform the required retransmission.

Thanks to Laser <gotolaser@gmail.com> for identifying this problem
and assisting in the development of this fix.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
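
The first-link-only rule in this fix can be sketched in a few lines of C. This is a simplified model, not TIPC code. `struct peer_node`, `link_up()`, `link_down()` and their fields are hypothetical, and the real fix operates on TIPC's own node and broadcast-link structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-node state; not TIPC's real struct tipc_node. */
struct peer_node {
	int working_links;       /* active links to this node */
	uint32_t bclink_last_in; /* last in-sequence broadcast seqno */
	int bclink_inited;       /* broadcast receive state set up? */
};

/* A link to 'n' came up; 'peer_bcast_seq' is the broadcast sequence
 * number carried by that link's activation message. */
static void link_up(struct peer_node *n, uint32_t peer_bcast_seq)
{
	n->working_links++;
	/* Only the first link may initialize broadcast state: by the time a
	 * second link activates, its value may already be obsolete. */
	if (!n->bclink_inited) {
		n->bclink_last_in = peer_bcast_seq;
		n->bclink_inited = 1;
	}
}

/* A link to 'n' went down; broadcast state is forgotten with the last one. */
static void link_down(struct peer_node *n)
{
	assert(n->working_links > 0);
	if (--n->working_links == 0)
		n->bclink_inited = 0;
}
```

In this model, a second link that activates with a stale value of 90 leaves the already-advanced state of 150 untouched. That is the behaviour the fix is after.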

tipc: Prevent transmission of outdated link protocol messages · 92d2c905

Committed by Allan Stephens on Oct 25, 2011

Ensure that a link endpoint discards any previously deferred link
protocol message whenever it attempts to send a new one.

Previously, it was possible for a link protocol message that was unsent
due to congestion to be transmitted after newer protocol messages had
been sent. The stale link protocol message might then cause the receiving
link endpoint to malfunction because of its outdated content.

Thanks to Osamu Kaminuma [okaminum@avaya.com] for diagnosing the problem
and contributing a prototype patch.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
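
Here is a minimal model of the rule "discard any deferred protocol message before sending a new one". It assumes a link endpoint holds at most one unsent protocol message. `struct link_endpoint`, `send_proto_msg()`, `link_uncongested()` and the `gen` counter are hypothetical. The counter exists only so the test can tell a stale message from a new one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical protocol message; 'gen' grows with each new message. */
struct proto_msg {
	unsigned int gen;
};

/* Hypothetical link endpoint holding at most one deferred message. */
struct link_endpoint {
	int congested;
	struct proto_msg *deferred;  /* built but not yet sent, or NULL */
	unsigned int last_sent_gen;  /* generation last put on the wire */
};

static void wire_send(struct link_endpoint *l, struct proto_msg *m)
{
	l->last_sent_gen = m->gen;
	free(m);
}

static void send_proto_msg(struct link_endpoint *l, unsigned int gen)
{
	struct proto_msg *m;

	/* Drop any older message still waiting, so it can never be sent
	 * after the one being built now. */
	free(l->deferred);
	l->deferred = NULL;

	m = malloc(sizeof(*m));
	if (!m)
		return;
	m->gen = gen;
	if (l->congested)
		l->deferred = m;  /* try again when congestion clears */
	else
		wire_send(l, m);
}

/* Congestion has cleared: flush the single, newest deferred message. */
static void link_uncongested(struct link_endpoint *l)
{
	l->congested = 0;
	if (l->deferred) {
		wire_send(l, l->deferred);
		l->deferred = NULL;
	}
}
```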

tipc: improve the link deferred queue insertion algorithm · 8809b255

Committed by Allan Stephens on Oct 25, 2011

Re-code the algorithm for inserting an out-of-sequence message into
a unicast or broadcast link's deferred message queue. It remains
functionally equivalent but should be easier to understand/maintain.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

06 Feb, 2012 · 2 commits

caif: caifdev is never used in net/caif/caif_dev.c::transmit() - remove it. · 1f0b6702

Committed by Jesper Juhl on Feb 05, 2012

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: remove unused variable from dn_output() · 22b6a2eb

Committed by Jesper Juhl on Feb 05, 2012

The variable 'neigh' is assigned to, but otherwise completely
unused. So let's remove it.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Feb, 2012 · 3 commits

netprio_cgroup: Fix obo in get_prioidx · 5962b35c

Committed by Neil Horman on Feb 03, 2012

It was recently pointed out to me that the get_prioidx function sets a bit in
the prioidx map prior to checking to see if the index being set is out of
bounds. This patch corrects that, avoiding the possibility of us writing beyond
the end of the array.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
CC: Stanislaw Gruszka <sgruszka@redhat.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
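
The corrected ordering is to check the bound first and only then set the bit. Here it is in a self-contained userspace sketch. `MAP_BITS`, `first_zero_bit()` and this `get_prioidx()` are simplified stand-ins for the kernel's `prioidx_map`, `find_first_zero_bit()` and `set_bit()`. The spinlock that protects the real map is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define MAP_BITS  128  /* hypothetical map size */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

static unsigned long prio_map[(MAP_BITS + WORD_BITS - 1) / WORD_BITS];

/* Index of the first clear bit, or MAP_BITS (one past the end) if full. */
static unsigned int first_zero_bit(void)
{
	unsigned int i;

	for (i = 0; i < MAP_BITS; i++)
		if (!(prio_map[i / WORD_BITS] & (1UL << (i % WORD_BITS))))
			return i;
	return MAP_BITS;
}

static int get_prioidx(unsigned int *prio)
{
	unsigned int idx = first_zero_bit();

	/* Check the bound BEFORE touching the map: setting bit MAP_BITS
	 * would write past the end of prio_map. */
	if (idx >= MAP_BITS)
		return -ENOSPC;
	prio_map[idx / WORD_BITS] |= 1UL << (idx % WORD_BITS);
	*prio = idx;
	return 0;
}
```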

caif: Add drop count for caif_net device. · 576f3cc7

Committed by sjur.brandeland@stericsson.com on Feb 03, 2012

Count dropped packets in CAIF Netdevice.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Kill debugfs vars for caif socket · 4a695823

Committed by sjur.brandeland@stericsson.com on Feb 03, 2012

Kill off the debugfs-exposed variables from caif_socket.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Feb, 2012 · 3 commits

caif: Bugfix double kfree_skb upon xmit failure · ba760574

Committed by Dmitry Tarnyagin on Feb 02, 2012

The SKB is freed twice upon a send error. The network stack consumes the
SKB even when it returns an error code.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
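
The ownership rule behind this fix can be shown with a toy model. Once a buffer is handed to the transmit function, it belongs to that function, whether or not an error is returned. `struct buf`, `queue_xmit()` and `caif_send()` are hypothetical stand-ins for an `sk_buff`, `dev_queue_xmit()` and the CAIF send path. The `frees` counter exists only so the test can detect a double free.

```c
#include <assert.h>
#include <stdlib.h>

struct buf {
	int len;
};

static int frees;  /* number of buffers released so far */

static void buf_free(struct buf *b)
{
	frees++;
	free(b);
}

/* Like dev_queue_xmit(): always consumes 'b', even on failure. */
static int queue_xmit(struct buf *b, int fail)
{
	buf_free(b);  /* sent or dropped: either way, no longer ours */
	return fail ? -1 : 0;
}

static int caif_send(struct buf *b, int fail)
{
	int err = queue_xmit(b, fail);

	/* The buggy pattern was "if (err) buf_free(b);", which frees twice.
	 * 'b' must not be touched after queue_xmit(), whatever it returned. */
	return err;
}
```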

caif: Bugfix list_del_rcu race in cfmuxl_ctrlcmd. · b01377a4

Committed by sjur.brandeland@stericsson.com on Feb 02, 2012

Always use cfmuxl_remove_uplayer when removing an up-layer.
cfmuxl_ctrlcmd() can be called independently and in parallel with
cfmuxl_remove_uplayer(). The race between them could cause list_del_rcu
to be called on a node which has already been taken out of the list.
That led to a (rare) crash on accessing the poisoned node->prev inside
list_del_rcu.

This fix ensures that deletions are done while holding the same lock.
Reported-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: properly initialize tcp memory limits · c43b874d

Committed by Jason Wang on Feb 02, 2012

Commit 4acb4190 tried to fix the use of an uninitialized value
introduced by commit 3dc43e3e, but it made the
per-socket memory limits too small.

This patch fixes this and also removes the redundant code
introduced in 4acb4190.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Feb, 2012 · 10 commits

atm: clip: Convert over to dst_neigh_lookup(). · 7161c76f

Committed by David S. Miller on Feb 01, 2012

CLIP only supports IPv4, as evidenced by the fact that it
is a device-specific extension of arp_tbl, so this conversion is
pretty straightforward.
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: Add missing neigh->ha locking to dn_neigh_output_packet() · 3329bdfc

Committed by David S. Miller on Feb 01, 2012

Basically, mirror the logic in neigh_connected_output().
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Remove never used function inet6_ac_check(). · f79d52c2

Committed by David S. Miller on Feb 01, 2012

It went from unused, to commented out, and never changed after
that.

Just get rid of it; if someone wants it, they can unearth it from
the history.
Signed-off-by: David S. Miller <davem@davemloft.net>

PATCH V2 net-next] net: dev: Convert printks to pr_<level> · 7b6cd1ce

Committed by Joe Perches on Feb 01, 2012

Use the current logging style.
Coalesce formats where appropriate.
Update grammar where appropriate.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac80211: timeout a single frame in the rx reorder buffer · 07ae2dfc

Committed by Eliad Peller on Feb 01, 2012

The current code checks for stored_mpdu_num > 1, causing
the reorder_timer to be triggered indefinitely, but the
frame is never timed out (until the next packet is received).
Signed-off-by: Eliad Peller <eliad@wizery.com>
Cc: <stable@vger.kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ethtool: Null-terminate filename passed to ethtool_ops::flash_device · 786f5281

Committed by Ben Hutchings on Feb 01, 2012

The parameters for ETHTOOL_FLASHDEV include a filename, which ought to
be null-terminated. Currently the only driver that implements
ethtool_ops::flash_device attempts to add a null terminator if
necessary, but does it wrongly. Do it in the ethtool core instead.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
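
This sketch terminates the filename once in the core rather than in each driver. The struct mirrors the layout of `struct ethtool_flash`. The 128-byte size is assumed to match `ETHTOOL_FLASH_MAX_FILENAME`. `core_flash_device()` is a hypothetical stand-in for the ethtool core handler, and `memcpy()` stands in for `copy_from_user()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLASH_MAX_FILENAME 128  /* assumed value of ETHTOOL_FLASH_MAX_FILENAME */

/* Mirrors the layout of struct ethtool_flash. */
struct flash_req {
	uint32_t cmd;
	uint32_t region;
	char data[FLASH_MAX_FILENAME];
};

/* Core-side handling: 'user' stands in for the copy_from_user() source. */
static void core_flash_device(struct flash_req *efl,
			      const struct flash_req *user)
{
	memcpy(efl, user, sizeof(*efl));
	/* Terminate unconditionally, so every driver's flash_device()
	 * can treat efl->data as a C string. */
	efl->data[FLASH_MAX_FILENAME - 1] = '\0';
}
```

Even if userspace fills every byte of `data`, the string the driver sees is at most 127 characters long.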

net: Disambiguate kernel message · efcdbf24

Committed by Arun Sharma on Jan 30, 2012

Some of our machines were reporting:

TCP: too many of orphaned sockets

even when the number of orphaned sockets was well below the
limit.

With this patch, we print a different message depending on whether we're out
of TCP memory or there are too many orphaned sockets.

Also move the check out of line and clean up the messages
that were printed.
Signed-off-by: Arun Sharma <asharma@fb.com>
Suggested-by: Mohan Srinivasan <mohan@fb.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
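
This sketch shows the disambiguated check. Each condition is tested and reported separately, so an out-of-memory condition no longer produces the orphan message. `tcp_check_oom_model()`, its parameters and the `OOM_*` flags are hypothetical. The orphan message text comes from the commit; the wording of the memory message is an assumption.

```c
#include <assert.h>
#include <stdio.h>

#define OOM_ORPHANS 0x1  /* hypothetical flag: orphan limit exceeded */
#define OOM_MEMORY  0x2  /* hypothetical flag: TCP memory limit exceeded */

/* Check both limits independently and log each one that was hit, so the
 * orphan message only appears when orphans really are the problem. */
static int tcp_check_oom_model(long orphans, long max_orphans,
			       long mem_pages, long mem_limit)
{
	int why = 0;

	if (orphans > max_orphans) {
		why |= OOM_ORPHANS;
		fprintf(stderr, "TCP: too many orphaned sockets\n");
	}
	if (mem_pages > mem_limit) {
		why |= OOM_MEMORY;
		fprintf(stderr, "TCP: out of memory -- consider tuning tcp_mem\n");
	}
	return why;
}
```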

netpoll: Neaten MAX_SKB_SIZE macro · 6f706245

Committed by Joe Perches on Jan 29, 2012

Add the types in the packet layout order.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
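
The reordered macro lists header sizes in the order they appear on the wire: Ethernet, then IPv4, then UDP, then the payload. This userspace sketch uses the Linux uapi headers that define `struct ethhdr`, `struct iphdr` and `struct udphdr`. The 1460-byte `MAX_UDP_CHUNK` is an assumption about the value used in `net/core/netpoll.c`.

```c
#include <assert.h>
#include <linux/if_ether.h>  /* struct ethhdr */
#include <linux/ip.h>        /* struct iphdr */
#include <linux/udp.h>       /* struct udphdr */

#define MAX_UDP_CHUNK 1460  /* assumed maximum UDP payload per packet */

/* Header sizes in packet layout order, followed by the payload. */
#define MAX_SKB_SIZE			\
	(sizeof(struct ethhdr) +	\
	 sizeof(struct iphdr) +		\
	 sizeof(struct udphdr) +	\
	 MAX_UDP_CHUNK)
```

The reordering does not change the sum (14 + 20 + 8 + 1460 bytes here). It only makes the macro read like the packet it sizes.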

netpoll: Convert printks to np_<level> and add pr_fmt · e6ec2693

Committed by Joe Perches on Jan 29, 2012

Use a more current message logging style.
Add pr_fmt to prefix dmesg output with "netpoll: ".
Add macros to print np->name.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: RST: getting md5 key from listener · 658ddaaf

Committed by Shawn Lu on Jan 31, 2012

The TCP RST mechanism is broken with TCP MD5 (RFC 2385). When a
connection is gone, its MD5 key is lost, and a RST sent
without an MD5 hash is bound to be ignored by the peer. This can
be a problem, since RST helps protocols like BGP recover quickly
from a peer crash.

In most cases, users of TCP MD5, such as BGP and LDP,
have a listener on both sides to accept connections from the peer.
MD5 keys for peers are saved in the listening socket.

There are two cases in finding the MD5 key when the connection is
lost:
1. Passive receive RST: the message is sent to the well-known port,
so TCP will associate it with the listener. The MD5 key is taken
from the listener.

2. Active receive RST (no sock): the message is sent to the active
side, and there is no socket associated with the message. In this
case, find the listener from the source port, then find the MD5 key
from the listener.

We are not losing security here: the
packet is checked with the MD5 hash. No RST is generated
if the MD5 hash doesn't match or no MD5 key can be found.
Signed-off-by: Shawn Lu <shawn.lu@ericsson.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
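
The two lookup cases can be modeled with a small listener table keyed by local port, where each listener holds MD5 keys per peer address. `struct listener`, `struct md5_key` and `rst_md5_key()` are hypothetical. The sketch only picks the key. It does not model verifying the incoming segment's MD5 hash, which the commit says must succeed before any RST is sent. The test uses port 179 (BGP) as an example well-known port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct md5_key {
	uint32_t peer;       /* peer IPv4 address */
	const char *secret;
};

struct listener {
	uint16_t port;       /* local listening port */
	const struct md5_key *keys;
	size_t nkeys;
};

static const struct listener *listener_on(const struct listener *tbl,
					  size_t n, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].port == port)
			return &tbl[i];
	return NULL;
}

static const char *key_for_peer(const struct listener *l, uint32_t peer)
{
	for (size_t i = 0; l && i < l->nkeys; i++)
		if (l->keys[i].peer == peer)
			return l->keys[i].secret;
	return NULL;
}

/* Key for answering a segment from 'peer' (sport -> dport) with a RST
 * when no connection exists any more. NULL means: send no RST. */
static const char *rst_md5_key(const struct listener *tbl, size_t n,
			       uint32_t peer, uint16_t sport, uint16_t dport)
{
	/* Case 1, passive side: the segment targets our well-known port. */
	const struct listener *l = listener_on(tbl, n, dport);

	/* Case 2, active side: dport is our ephemeral port, but the peer's
	 * source port is the well-known port we also listen on. */
	if (!l)
		l = listener_on(tbl, n, sport);
	return key_for_peer(l, peer);
}
```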

01 Feb, 2012 · 5 commits

xfrm6: remove unneeded NULL check in __xfrm6_output() · 5b11b2e4

Committed by Dan Carpenter on Jan 31, 2012

We don't check for NULL consistently in __xfrm6_output().  If "x" were
NULL here it would lead to an OOPs later.  I asked Steffen Klassert
about this and he suggested that we remove the NULL check.

On 10/29/11, Steffen Klassert <steffen.klassert@secunet.com> wrote:
>> net/ipv6/xfrm6_output.c
>>    148
>>    149		if ((x && x->props.mode == XFRM_MODE_TUNNEL) &&
>>                           ^
>
> x can't be null here. It would be a bug if __xfrm6_output() is called
> without a xfrm_state attached to the skb. I think we can just remove
> this null check.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: protects md5sig_info with RCU · a8afca03

Committed by Eric Dumazet on Jan 31, 2012

This patch makes sure we use appropriate memory barriers before
publishing tp->md5sig_info, allowing tcp_md5_do_lookup() to be used from
tcp_v4_send_reset() without holding the socket lock (upcoming patch from
Shawn Lu).

Note that we also need to respect an RCU grace period before freeing it,
since a socket can be freed without this grace period, thanks to
SLAB_DESTROY_BY_RCU.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shawn Lu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
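
The publish/lookup pairing described in this commit can be approximated in portable C11 with a release store and an acquire load. These give roughly the guarantees of `rcu_assign_pointer()` and `rcu_dereference()`; the kernel's `rcu_dereference()` relies on weaker dependency ordering. `struct tcp_sock_model`, `md5_info_publish()` and `md5_nkeys()` are hypothetical. The sketch does not model the RCU grace period that must elapse before the old structure is freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct md5_info {
	int nkeys;  /* stand-in for the key list */
};

struct tcp_sock_model {
	_Atomic(struct md5_info *) md5sig_info;
};

/* Writer: fully initialize first, then publish with release semantics. */
static int md5_info_publish(struct tcp_sock_model *tp)
{
	struct md5_info *info = malloc(sizeof(*info));

	if (!info)
		return -1;
	info->nkeys = 0;
	atomic_store_explicit(&tp->md5sig_info, info, memory_order_release);
	return 0;
}

/* Lockless reader: the acquire load pairs with the release store, so a
 * non-NULL pointer guarantees the fields are seen initialized. */
static int md5_nkeys(struct tcp_sock_model *tp)
{
	struct md5_info *info =
		atomic_load_explicit(&tp->md5sig_info, memory_order_acquire);

	return info ? info->nkeys : -1;
}
```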

tcp: md5: use sock_kmalloc() to limit md5 keys · 5f3d9cb2

Committed by Eric Dumazet on Jan 31, 2012

There is no limit on the number of MD5 keys an application can attach
to a TCP socket.

This patch adds a per-TCP-socket limit based
on /proc/sys/net/core/optmem_max.

With current default optmem_max values, this allows about 150 keys on
64-bit arches, and 88 keys on 32-bit arches.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
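
Here is a sketch of per-socket option-memory accounting in the spirit of `sock_kmalloc()`. Each allocation is charged to the socket and refused once the running total would reach `optmem_max`. The strict comparison is modeled on `sock_kmalloc()` and should be checked against the kernel source. `struct sock_model`, both functions, and the 20480-byte starting value (taken here as the 64-bit default of `net.core.optmem_max`) are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical socket: only the option-memory charge is modeled. */
struct sock_model {
	size_t omem_alloc;  /* bytes currently charged to this socket */
};

static size_t optmem_max = 20480;  /* assumed net.core.optmem_max */

static void *sock_kmalloc_model(struct sock_model *sk, size_t size)
{
	void *p;

	/* Refuse once the per-socket budget would be reached. */
	if (size > optmem_max || sk->omem_alloc + size >= optmem_max)
		return NULL;
	p = malloc(size);
	if (p)
		sk->omem_alloc += size;
	return p;
}

static void sock_kfree_model(struct sock_model *sk, void *p, size_t size)
{
	free(p);
	sk->omem_alloc -= size;  /* give the budget back */
}
```

With a 1000-byte budget and 100-byte keys, nine allocations succeed and the tenth is refused. In the same way, the number of MD5 keys per socket falls out of `optmem_max` divided by the per-key allocation size.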

tcp: md5: rcu conversion · a915da9b

Committed by Eric Dumazet on Jan 31, 2012

In order to be able to support proper RST messages for TCP MD5 flows, we
need to allow access to MD5 keys without locking the listener socket.

This conversion is a nice cleanup, and shrinks the size of timewait
sockets by 80 bytes.

IPv6 code reuses generic code found in IPv4 instead of duplicating it.

The control path uses GFP_KERNEL allocations instead of GFP_ATOMIC.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shawn Lu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: remove obsolete md5_add() method · a2d91241

Committed by Eric Dumazet on Jan 31, 2012

We no longer use the md5_add() method from struct tcp_sock_af_ops.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Jan, 2012 · 6 commits

net: Deinline __nlmsg_put and genlmsg_put. -7k code on i386 defconfig. · a46621a3

Committed by Denys Vlasenko on Jan 30, 2012

```
   text    data     bss      dec    hex filename
8455963  532732 1810804 10799499 a4c98b vmlinux.o.before
8448899  532732 1810804 10792435 a4adf3 vmlinux.o
```

This change also removes a commented-out copy of __nlmsg_put
which was last touched in 2005 with an "Enable once all users
have been converted" comment on top.

Changes in v2: rediffed against net-next.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix RFC5722 comment · 5de658f8

Committed by Eric Dumazet on Jan 30, 2012

RFC5722 Section 4 was amended by Errata 3089.

Our implementation did the right thing anyway...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Allow ipv6 proxies and arp proxies be shown with iproute2 · 84920c14

Committed by Tony Zelenoff on Jan 26, 2012

Add the ability to return the list of neighbour proxies to the caller if
it sent a full ndmsg structure with the NTF_PROXY flag set.

Before this patch (and before iproute2 patches):
$ ip neigh add proxy 2001::1 dev eth0
$ ip -6 neigh show
$

After it and with applied iproute2 patches:
$ ip neigh add proxy 2001::1 dev eth0
$ ip -6 neigh show
2001::1 dev eth0  proxy
$

Compatibility with old versions of iproute2 is not broken: the
kernel checks the size of the incoming structure and works
properly if the old structure is received.

[v2]
* changed comments style.
* removed useless line with continue and curly bracket.
* changed incoming message size check from equal to more or
  equal.

CC: davem@davemloft.net
CC: kuznet@ms2.inr.ac.ru
CC: netdev@vger.kernel.org
CC: xemul@parallels.com
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix: fix EPOLLET regression for stream sockets · 6f01fd6e

Committed by Eric Dumazet on Jan 28, 2012

Commit 0884d7aa (AF_UNIX: Fix poll blocking problem when reading from
a stream socket) added a regression for epoll() in Edge Triggered mode
(EPOLLET)

Appropriate fix is to use skb_peek()/skb_unlink() instead of
skb_dequeue(), and only call skb_unlink() when skb is fully consumed.

This removes the need to requeue a partial skb at the head of
sk_receive_queue, and the extra sk->sk_data_ready() calls that
introduced the regression.

This is safe because once an skb is given to sk_receive_queue, it is not
modified by a writer, and readers are serialized by the u->readlock mutex.

This also reduces the number of spinlock acquisitions for small reads or
MSG_PEEK users, so it should improve overall performance.
Reported-by: Nick Mathewson <nickm@freehaven.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexey Moiseytsev <himeraster@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
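
Here is the peek-then-unlink read pattern, reduced to a single-reader model. For simplicity, this sketch tracks a per-segment `consumed` offset rather than modifying the buffer itself. `struct seg`, `struct rxq` and `stream_read()` are hypothetical. The point is that a partially read segment stays at the head of the queue, so nothing is requeued and no extra wakeup is generated.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical receive-queue segment. */
struct seg {
	const char *data;
	size_t len;
	size_t consumed;  /* bytes already handed to the reader */
	struct seg *next;
};

struct rxq {
	struct seg *head;
};

/* Copy up to 'n' bytes into 'out'. The head segment is peeked, not
 * dequeued, and is unlinked only once it has been fully consumed. */
static size_t stream_read(struct rxq *q, char *out, size_t n)
{
	size_t copied = 0;

	while (copied < n && q->head) {
		struct seg *s = q->head;            /* like skb_peek() */
		size_t chunk = s->len - s->consumed;

		if (chunk > n - copied)
			chunk = n - copied;
		memcpy(out + copied, s->data + s->consumed, chunk);
		s->consumed += chunk;
		copied += chunk;
		if (s->consumed == s->len)
			q->head = s->next;              /* like skb_unlink() */
	}
	return copied;
}
```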

tcp: fix tcp_trim_head() to adjust segment count with skb MSS · 5b35e1e6

Committed by Neal Cardwell on Jan 28, 2012

This commit fixes tcp_trim_head() to recalculate the number of
segments in the skb with the skb's existing MSS, so trimming the head
causes the skb segment count to be monotonically non-increasing - it
should stay the same or go down, but not increase.

Previously tcp_trim_head() used the current MSS of the connection. But
if there was a decrease in MSS between original transmission and ACK
(e.g. due to PMTUD), this could cause tcp_trim_head() to
counter-intuitively increase the segment count when trimming bytes off
the head of an skb. This violated assumptions in tcp_tso_acked() that
tcp_trim_head() only decreases the packet count, so that packets_acked
in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to
pass u32 pkts_acked values as large as 0xffffffff to
ca_ops->pkts_acked().

As an aside, if tcp_trim_head() had really wanted the skb to reflect
the current MSS, it should have called tcp_set_skb_tso_segs()
unconditionally, since a decrease in MSS would mean that a
single-packet skb should now be sliced into multiple segments.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
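
The underflow is plain unsigned arithmetic. This sketch computes the segment count as `ceil(len / mss)`, a simplification of what `tcp_set_skb_tso_segs()` does. It uses a hypothetical 3000-byte skb sent at an MSS of 1500, whose first 1500 bytes are acknowledged after the MSS has dropped to 500. `pcount()` and `packets_acked()` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Segments needed to carry 'len' bytes at 'mss' bytes each, rounded up. */
static uint32_t pcount(uint32_t len, uint32_t mss)
{
	assert(mss > 0);
	return (len + mss - 1) / mss;
}

/* Packets reported as newly acked after trimming 'acked' bytes from the
 * head of an skb of 'len' bytes that was sent with 'mss_orig', when the
 * remaining count is recomputed with 'mss_now'. */
static uint32_t packets_acked(uint32_t len, uint32_t acked,
			      uint32_t mss_orig, uint32_t mss_now)
{
	uint32_t before = pcount(len, mss_orig);
	uint32_t after = pcount(len - acked, mss_now);

	return before - after;  /* u32 subtraction: wraps if after > before */
}
```

Recomputing with the skb's original MSS gives 2 - 1 = 1 packet acked. Recomputing with the smaller current MSS gives 2 - 3, which wraps to 0xffffffff. That is the value the commit says could reach `ca_ops->pkts_acked()`.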

net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL · 4acb4190

Committed by Glauber Costa on Jan 30, 2012

sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c
in commit 3dc43e3e, since it
became a per-ns value.

That code, however, will never run when CONFIG_SYSCTL is
disabled, leading to bogus values on those fields - causing hung
TCP sockets.

This patch fixes it by keeping initialization code in
tcp_init(). It will be overwritten by the first net namespace
init if CONFIG_SYSCTL is compiled in, and does the right thing if
it is compiled out.

It is also named properly as tcp_init_mem(), to properly signal
its non-sysctl side effect on TCP limits.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com
[ renamed the function, tidied up the changelog a bit ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Jan, 2012 · 6 commits

net caif: Register properly as a pernet subsystem. · 8a8ee9af

Committed by Eric W. Biederman on Jan 26, 2012

caif is a subsystem and as such it needs to register with
register_pernet_subsys instead of register_pernet_device.

Among other problems, using register_pernet_device was resulting in
net_generic being called before the caif_net structure was allocated,
which has been causing net_generic to fail either with BUG_ONs or by
returning NULL pointers.

An uglier problem that could be caused is packets in flight while the
subsystem is shutting down.

To remove confusion, also remove the cruft caused by inappropriate
attempts to fix this bug.

With the aid of the previous patch I have tested this patch and
confirmed that using register_pernet_subsys makes the failure go away as
it should.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Eliminate dst_get_neighbour_noref() usage in ip6_forward(). · c45a3dfb

Committed by David S. Miller on Jan 27, 2012

It's only used to get at neigh->primary_key, which in this context is
always going to be the same as rt->rt6i_gateway.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Remove neigh argument from ndisc_send_redirect() · 4991969a

Committed by David S. Miller on Jan 27, 2012

Instead, compute it as needed inside that function using
dst_neigh_lookup().
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fib: Convert fib6_age() to dst_neigh_lookup(). · 5339ab8b

Committed by David S. Miller on Jan 27, 2012

In this specific situation we know we are dealing with a gatewayed route
and therefore rt6i_gateway is not going to be in6addr_any even in future
interpretations.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: ndisc: Convert to dst_neigh_lookup() · eb857186

Committed by David S. Miller on Jan 27, 2012

Now all code paths grab a local reference to the neigh, so if neigh
is not NULL we unconditionally release it at the end. The old logic
would only release if we didn't have a non-NULL 'rt'.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: ip_gre: Convert to dst_neigh_lookup() · 0ec88662

Committed by David S. Miller on Jan 27, 2012

The conversion is very similar to that made to ipv6's SIT code.
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Jan, 2012 · 2 commits

net: RTNETLINK adjusting values of min_ifinfo_dump_size · f18da145

Committed by Stefan Gula on Jan 26, 2012

Setting link parameters on a netdevice changes the value
of if_nlmsg_size(), therefore it is necessary to recalculate
min_ifinfo_dump_size.
Signed-off-by: Stefan Gula <steweg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Fix ip_gre lockless xmits. · f2b3ee9e

Committed by Willem de Bruijn on Jan 26, 2012

Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK. Sit and
ipip set this unconditionally in ops->setup, but gre enables it
conditionally after parameter passing in ops->newlink. This is
not called during tunnel setup as below, however, so GRE tunnels are
still taking the lock.

modprobe ip_gre
ip tunnel add test0 mode gre remote 10.5.1.1 dev lo
ip link set test0 up
ip addr add 10.6.0.1 dev test0
# cat /sys/class/net/test0/features
# $DIR/test_tunnel_xmit 10 10.5.2.1
ip route add 10.5.2.0/24 dev test0
ip tunnel del test0

The newlink callback is only called in rtnl_newlink, and only if
the device is new, as it calls register_netdevice internally. GRE
tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL,
which calls ipgre_tunnel_locate, which calls register_netdev.
rtnl_newlink is called at 'ip link set', but skips ops->newlink,
and the device comes up with locking still enabled. The equivalent
ipip tunnel works fine, btw (just substitute 'mode ipip' for
'mode gre').

On kernels before /sys/class/net/*/features was removed [1],
the first commented-out line returns 0x6000 with mode gre,
which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip,
it reports 0x7000. This test cannot be used on recent kernels where
the sysfs file has been removed (and ETHTOOL_GFEATURES does not currently
work for tunnel devices, because they lack dev->ethtool_ops).

The second commented out line calls a simple transmission test [2]
that sends on 24 cores at maximum rate. Results of a single run:

ipip: 19,372,306
gre before patch: 4,839,753
gre after patch: 19,133,873

This patch replicates the condition check in ipgre_newlink to
ipgre_tunnel_locate. It works for me, both with oseq on and off.
This is the first time I looked at rtnetlink and iproute2 code,
though, so someone more knowledgeable should probably check the
patch. Thanks.

The tail of both functions is now identical, by the way. To avoid
code duplication, I'll be happy to rework this and merge the two.

[1] http://patchwork.ozlabs.org/patch/104610/
[2] http://kernel.googlecode.com/files/xmit_udp_parallel.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
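
The patch copies the condition from `ipgre_newlink()` into `ipgre_tunnel_locate()`, and the author offers to merge the now-identical tails. A hypothetical shared helper could look like the sketch below. `OSEQ` stands in for the GRE output-sequence flag, and its value here is made up. `F_LLTX` uses the 0x1000 value implied by the sysfs output quoted in the commit. The rule that output sequence numbering needs the TX lock is an assumption, consistent with the patch being tested with oseq both on and off.

```c
#include <assert.h>
#include <stdint.h>

#define F_LLTX 0x1000u  /* NETIF_F_LLTX, per the sysfs values quoted */
#define OSEQ   0x1u     /* hypothetical "generate output sequence numbers" flag */

struct gre_parms {
	uint32_t o_flags;
};

/* Hypothetical helper called from both creation paths (rtnetlink newlink
 * and the SIOCADDTUNNEL ioctl), so neither one can forget the flag. */
static uint32_t gre_tunnel_features(uint32_t features,
				    const struct gre_parms *p)
{
	/* Output sequence numbers are assumed to need the TX lock;
	 * otherwise the tunnel can transmit locklessly. */
	if (!(p->o_flags & OSEQ))
		features |= F_LLTX;
	return features;
}
```

Applied to the 0x6000 feature word reported for a GRE tunnel, the helper yields the 0x7000 seen for ipip when sequence numbers are off.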

openeuler / Kernel · synced successfully more than 1 year ago
