提交 · d4d576f5ab7edcb757bb33e6a5600666a0b1232d · openanolis / cloud-kernel

19 10月, 2018 1 次提交

ip6_tunnel: Fix encapsulation layout · d4d576f5

由 Stefano Brivio 提交于 10月 18, 2018

Commit 058214a4 ("ip6_tun: Add infrastructure for doing
encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.

As long as the option didn't actually end up in generated packets, this
wasn't an issue. Then commit 89a23c8b ("ip6_tunnel: Fix missing tunnel
encapsulation limit option") fixed sending of this option, and the
resulting layout, e.g. for FoU, is:

.-------------------.------------.----------.-------------------.----- - -
| Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
'-------------------'------------'----------'-------------------'----- - -

Needless to say, FoU and GUE (at least) won't work over IPv6. The option
is appended by default, and I couldn't find a way to disable it with the
current iproute2.

Turn this into a more reasonable:

.-------------------.----------.------------.-------------------.----- - -
| Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
'-------------------'----------'------------'-------------------'----- - -

With this, and with 84dad559 ("udp6: fix encap return code for
resubmitting"), FoU and GUE work again over IPv6.

Fixes: 058214a4 ("ip6_tun: Add infrastructure for doing encapsulation")
Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4d576f5

20 9月, 2018 1 次提交

ip6_tunnel: be careful when accessing the inner header · 76c0ddd8

由 Paolo Abeni 提交于 9月 19, 2018

the ip6 tunnel xmit ndo assumes that the processed skb always
contains an ip[v6] header, but syzbot has found a way to send
frames that fall short of this assumption, leading to the following splat:

BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307
[inline]
BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0
net/ipv6/ip6_tunnel.c:1390
CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
  ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
  ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
  __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
  netdev_start_xmit include/linux/netdevice.h:4075 [inline]
  xmit_one net/core/dev.c:3026 [inline]
  dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
  __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
  packet_snd net/packet/af_packet.c:2944 [inline]
  packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
  __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
  SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
  SyS_sendmmsg+0x63/0x90 net/socket.c:2162
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x441819
RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510
R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
  slab_post_alloc_hook mm/slab.h:445 [inline]
  slab_alloc_node mm/slub.c:2737 [inline]
  __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
  __kmalloc_reserve net/core/skbuff.c:138 [inline]
  __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
  alloc_skb include/linux/skbuff.h:984 [inline]
  alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
  sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
  packet_alloc_skb net/packet/af_packet.c:2803 [inline]
  packet_snd net/packet/af_packet.c:2894 [inline]
  packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
  __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
  SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
  SyS_sendmmsg+0x63/0x90 net/socket.c:2162
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

This change addresses the issue adding the needed check before
accessing the inner header.

The ipv4 side of the issue is apparently there since the ipv4 over ipv6
initial support, and the ipv6 side predates git history.

Fixes: c4d3efaf ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com
Tested-by: NAlexander Potapenko <glider@google.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76c0ddd8

04 9月, 2018 1 次提交

ip6_tunnel: respect ttl inherit for ip6tnl · 36feaac3

由 Hangbin Liu 提交于 8月 31, 2018

man ip-tunnel ttl section says:
0 is a special value meaning that packets inherit the TTL value.

IPv4 tunnel respect this in ip_tunnel_xmit(), but IPv6 tunnel has not
implement it yet. To make IPv6 behave consistently with IP tunnel,
add ipv6 tunnel inherit support.
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

36feaac3

08 8月, 2018 1 次提交

ip6_tunnel: collect_md xmit: Use ip_tunnel_key's provided src address · 3789caba

由 Shmulik Ladkani 提交于 8月 06, 2018

When using an ip6tnl device in collect_md mode, the xmit methods ignore
the ipv6.src field present in skb_tunnel_info's key, both for route
calculation purposes (flowi6 construction) and for assigning the
packet's final ipv6h->saddr.

This makes it impossible specifying a desired ipv6 local address in the
encapsulating header (for example, when using tc action tunnel_key).

This is also not aligned with behavior of ipip (ipv4) in collect_md
mode, where the key->u.ipv4.src gets used.

Fix, by assigning fl6.saddr with given key->u.ipv6.src.
In case ipv6.src is not specified, ip6_tnl_xmit uses existing saddr
selection code.

Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
Reviewed-by: NEyal Birger <eyal.birger@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3789caba

06 8月, 2018 1 次提交

ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit · 82a40777

由 Xin Long 提交于 8月 05, 2018

According to RFC791, 68 bytes is the minimum size of IPv4 datagram every
device must be able to forward without further fragmentation while 576
bytes is the minimum size of IPv4 datagram every device has to be able
to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right
value for the ipv4 min mtu check in ip6_tnl_xmit.

While at it, change to use max() instead of if statement.

Fixes: c9fefa08 ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit")
Reported-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82a40777

02 6月, 2018 1 次提交

ip6_tunnel: remove magic mtu value 0xFFF8 · f7ff1fde

由 Nicolas Dichtel 提交于 5月 31, 2018

I don't know where this value comes from (probably a copy and paste and
paste and paste ...).
Let's use standard values which are a bit greater.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411aSigned-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7ff1fde

06 4月, 2018 1 次提交

ip6_tunnel: better validate user provided tunnel names · db7a65e3

由 Eric Dumazet 提交于 4月 05, 2018

Use valid_name() to make sure user does not provide illegal
device name.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db7a65e3

28 3月, 2018 1 次提交

net: Drop pernet_operations::async · 2f635cee

由 Kirill Tkhai 提交于 3月 27, 2018

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f635cee

16 3月, 2018 1 次提交

net/ipv6: Change address check to always take a device argument · 232378e8

由 David Ahern 提交于 3月 13, 2018

ipv6_chk_addr_and_flags determines if an address is a local address and
optionally if it is an address on a specific device. For example, it is
called by ip6_route_info_create to determine if a given gateway address
is a local address. The address check currently does not consider L3
domains and as a result does not allow a route to be added in one VRF
if the nexthop points to an address in a second VRF. e.g.,

$ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23
Error: Invalid gateway address.

where 2001:db8:102::23 is an address on an interface in vrf r1.

ipv6_chk_addr_and_flags needs to allow callers to always pass in a device
with a separate argument to not limit the address to the specific device.
The device is used used to determine the L3 domain of interest.

To that end add an argument to skip the device check and update callers
to always pass a device where possible and use the new argument to mean
any address in the domain.

Update a handful of users of ipv6_chk_addr with a NULL dev argument. This
patch handles the change to these callers without adding the domain check.

ip6_validate_gw needs to handle 2 cases - one where the device is given
as part of the nexthop spec and the other where the device is resolved.
There is at least 1 VRF case where deferring the check to only after
the route lookup has resolved the device fails with an unintuitive error
"RTNETLINK answers: No route to host" as opposed to the preferred
"Error: Gateway can not be a local address." The 'no route to host'
error is because of the fallback to a full lookup. The check is done
twice to avoid this error.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

232378e8

10 3月, 2018 1 次提交

net: do not create fallback tunnels for non-default namespaces · 79134e6c

由 Eric Dumazet 提交于 3月 08, 2018

fallback tunnels (like tunl0, gre0, gretap0, erspan0, sit0,
ip6tnl0, ip6gre0) are automatically created when the corresponding
module is loaded.

These tunnels are also automatically created when a new network
namespace is created, at a great cost.

In many cases, netns are used for isolation purposes, and these
extra network devices are a waste of resources. We are using
thousands of netns per host, and hit the netns creation/delete
bottleneck a lot. (Many thanks to Kirill for recent work on this)

Add a new sysctl so that we can opt-out from this automatic creation.

Note that these tunnels are still created for the initial namespace,
to be the least intrusive for typical setups.

Tested:
lpk43:~# cat add_del_unshare.sh
for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do  unshare -n /bin/true >/dev/null ; done) &
done
wait

lpk43:~# echo 0 >/proc/sys/net/core/fb_tunnels_only_for_init_net
lpk43:~# time ./add_del_unshare.sh

real	0m37.521s
user	0m0.886s
sys	7m7.084s
lpk43:~# echo 1 >/proc/sys/net/core/fb_tunnels_only_for_init_net
lpk43:~# time ./add_del_unshare.sh

real	0m4.761s
user	0m0.851s
sys	1m8.343s
lpk43:~#
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79134e6c

05 3月, 2018 1 次提交

net/ipv6: Pass skb to route lookup · b75cc8f9

由 David Ahern 提交于 3月 02, 2018

IPv6 does path selection for multipath routes deep in the lookup
functions. The next patch adds L4 hash option and needs the skb
for the forward path. To get the skb to the relevant FIB lookup
functions it needs to go through the fib rules layer, so add a
lookup_data argument to the fib_lookup_arg struct.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b75cc8f9

28 2月, 2018 2 次提交

ip6_tunnel: fix IFLA_MTU ignored on NEWLINK · a6aa8044

由 Xin Long 提交于 2月 27, 2018

Commit 128bb975 ("ip6_gre: init dev->mtu and dev->hard_header_len
correctly") fixed IFLA_MTU ignored on NEWLINK for ip6_gre. The same
mtu fix is also needed for ip6_tunnel.

Note that dev->hard_header_len setting for ip6_tunnel works fine,
no need to fix it.
Reported-by: NJianlin Shi <jishi@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6aa8044

net: Convert ip6_tnl_net_ops · 66997ba0

由 Kirill Tkhai 提交于 2月 26, 2018

These pernet_operations are similar to ip6gre_net_ops. Exit method
unregisters all net ip6_tnl tunnels, and it looks like another
pernet_operations are not interested in foreign net tunnels list.
So, it's possible to mark them async.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66997ba0

26 1月, 2018 1 次提交

net: don't call update_pmtu unconditionally · f15ca723

由 Nicolas Dichtel 提交于 1月 25, 2018

Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at           (null)"

Let's add a helper to check if update_pmtu is available before calling it.

Fixes: 52a589d5 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f15ca723

03 1月, 2018 2 次提交

ip6_tunnel: allow ip6gre dev mtu to be set below 1280 · 2fa771be

由 Xin Long 提交于 12月 25, 2017

Commit 582442d6 ("ipv6: Allow the MTU of ipip6 tunnel to be set
below 1280") fixed a mtu setting issue. It works for ipip6 tunnel.

But ip6gre dev updates the mtu also with ip6_tnl_change_mtu. Since
the inner packet over ip6gre can be ipv4 and it's mtu should also
be allowed to set below 1280, the same issue also exists on ip6gre.

This patch is to fix it by simply changing to check if parms.proto
is IPPROTO_IPV6 in ip6_tnl_change_mtu instead, to make ip6gre to
go to 'else' branch.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2fa771be

ip6_tunnel: disable dst caching if tunnel is dual-stack · 23263ec8

由 Eli Cooper 提交于 12月 25, 2017

When an ip6_tunnel is in mode 'any', where the transport layer
protocol can be either 4 or 41, dst_cache must be disabled.

This is because xfrm policies might apply to only one of the two
protocols. Caching dst would cause xfrm policies for one protocol
incorrectly used for the other.
Signed-off-by: NEli Cooper <elicooper@gmx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23263ec8

20 12月, 2017 1 次提交

ip6_tunnel: get the min mtu properly in ip6_tnl_xmit · c9fefa08

由 Xin Long 提交于 12月 18, 2017

Now it's using IPV6_MIN_MTU as the min mtu in ip6_tnl_xmit, but
IPV6_MIN_MTU actually only works when the inner packet is ipv6.

With IPV6_MIN_MTU for ipv4 packets, the new pmtu for inner dst
couldn't be set less than 1280. It would cause tx_err and the
packet to be dropped when the outer dst pmtu is close to 1280.

Jianlin found it by running ipv4 traffic with the topo:

  (client) gre6 <---> eth1 (route) eth2 <---> gre6 (server)

After changing eth2 mtu to 1300, the performance became very
low, or the connection was even broken. The issue also affects
ip4ip6 and ip6ip6 tunnels.

So if the inner packet is ipv4, 576 should be considered as the
min mtu.

Note that for ip4ip6 and ip6ip6 tunnels, the inner packet can
only be ipv4 or ipv6, but for gre6 tunnel, it may also be ARP.
This patch using 576 as the min mtu for non-ipv6 packet works
for all those cases.
Reported-by: NJianlin Shi <jishi@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9fefa08

08 12月, 2017 1 次提交

adding missing rcu_read_unlock in ipxip6_rcv · 74c4b656

由 Nikita V. Shirokov 提交于 12月 06, 2017

commit 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
introduced new exit point in  ipxip6_rcv. however rcu_read_unlock is
missing there. this diff is fixing this

v1->v2:
 instead of doing rcu_read_unlock in place, we are going to "drop"
 section (to prevent skb leakage)

Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: NNikita V. Shirokov <tehnerd@fb.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74c4b656

05 12月, 2017 1 次提交

ip6_gre: add ip6 gre and gretap collect_md mode · 6712abc1

由 William Tu 提交于 12月 01, 2017

Similar to gre, vxlan, geneve, ipip tunnels, allow ip6 gre and gretap
tunnels to operate in collect metadata mode.  bpf_skb_[gs]et_tunnel_key()
helpers can make use of it right away.  OVS can use it as well in the
future.
Signed-off-by: NWilliam Tu <u9012063@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6712abc1

13 11月, 2017 3 次提交

ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlers · 77552cfa

由 Xin Long 提交于 11月 11, 2017

This patch is to remove some useless codes of redirect and fix some
indents on ip4ip6 and ip6ip6's err_handlers.

Note that redirect icmp packet is already processed in ip6_tnl_err,
the old redirect codes in ip4ip6_err actually never worked even
before this patch. Besides, there's no need to send redirect to
user's sk, it's for lower dst, so just remove it in this patch.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77552cfa

ip6_tunnel: process toobig in a better way · b00f5432

由 Xin Long 提交于 11月 11, 2017

The same improvement in "ip6_gre: process toobig in a better way"
is needed by ip4ip6 and ip6ip6 as well.

Note that ip4ip6 and ip6ip6 will also update sk dst pmtu in their
err_handlers. Like I said before, gre6 could not do this as it's
inner proto is not certain. But for all of them, sk dst pmtu will
be updated in tx path if in need.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b00f5432

ip6_tunnel: add the process for redirect in ip6_tnl_err · 383c1f88

由 Xin Long 提交于 11月 11, 2017

The same process for redirect in "ip6_gre: add the process for redirect
in ip6gre_err" is needed by ip4ip6 and ip6ip6 as well.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

383c1f88

25 10月, 2017 2 次提交

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05

由 Mark Rutland 提交于 10月 23, 2017

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6aa7de05

ip6_tunnel: Allow rcv/xmit even if remote address is a local address · 908d140a

由 Shmulik Ladkani 提交于 10月 21, 2017

Currently, ip6_tnl_xmit_ctl drops tunneled packets if the remote
address (outer v6 destination) is one of host's locally configured
addresses.
Same applies to ip6_tnl_rcv_ctl: it drops packets if the remote address
(outer v6 source) is a local address.

This prevents using ipxip6 (and ip6_gre) tunnels whose local/remote
endpoints are on same host; OTOH v4 tunnels (ipip or gre) allow such
configurations.

An example where this proves useful is a system where entities are
identified by their unique v6 addresses, and use tunnels to encapsulate
traffic between them. The limitation prevents placing several entities
on same host.

Introduce IP6_TNL_F_ALLOW_LOCAL_REMOTE which allows to bypass this
restriction.
Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

908d140a

18 10月, 2017 1 次提交

ipv6: mark expected switch fall-throughs · 275757e6

由 Gustavo A. R. Silva 提交于 10月 16, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in some cases I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

275757e6

01 10月, 2017 1 次提交

ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx path · d41bb33b

由 Xin Long 提交于 9月 28, 2017

Now when updating mtu in tx path, it doesn't consider ARPHRD_ETHER tunnel
device, like ip6gre_tap tunnel, for which it should also subtract ether
header to get the correct mtu.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d41bb33b

20 9月, 2017 1 次提交

ipv6: speedup ipv6 tunnels dismantle · bb401cae

由 Eric Dumazet 提交于 9月 19, 2017

Implement exit_batch() method to dismantle more devices
per round.

(rtnl_lock() ...
 unregister_netdevice_many() ...
 rtnl_unlock())

Tested:
$ cat add_del_unshare.sh
for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfo

Before patch :
$ time ./add_del_unshare.sh
net_namespace        110    267   5504    1    2 : tunables    8    4    0 : slabdata    110    267      0

real    3m25.292s
user    0m0.644s
sys     0m40.153s

After patch:

$ time ./add_del_unshare.sh
net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0

real	1m38.965s
user	0m0.688s
sys	0m37.017s
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb401cae

19 9月, 2017 1 次提交

ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline · 8c22dab0

由 Xin Long 提交于 9月 15, 2017

If ipv6 has been disabled from cmdline since kernel started, it makes
no sense to allow users to create any ip6 tunnel. Otherwise, it could
some potential problem.

Jianlin found a kernel crash caused by this in ip6_gre when he set
ipv6.disable=1 in grub:

[  209.588865] Unable to handle kernel paging request for data at address 0x00000080
[  209.588872] Faulting instruction address: 0xc000000000a3aa6c
[  209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
[  209.589062] NIP [c000000000a3aa6c] fib_rules_lookup+0x4c/0x260
[  209.589071] LR [c000000000b9ad90] fib6_rule_lookup+0x50/0xb0
[  209.589076] Call Trace:
[  209.589097] fib6_rule_lookup+0x50/0xb0
[  209.589106] rt6_lookup+0xc4/0x110
[  209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
[  209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
[  209.589134] rtnl_newlink+0x798/0xb80
[  209.589142] rtnetlink_rcv_msg+0xec/0x390
[  209.589151] netlink_rcv_skb+0x138/0x150
[  209.589159] rtnetlink_rcv+0x48/0x70
[  209.589169] netlink_unicast+0x538/0x640
[  209.589175] netlink_sendmsg+0x40c/0x480
[  209.589184] ___sys_sendmsg+0x384/0x4e0
[  209.589194] SyS_sendmsg+0xd4/0x140
[  209.589201] SyS_socketcall+0x3e0/0x4f0
[  209.589209] system_call+0x38/0xe0

This patch is to return -EOPNOTSUPP in ip6_tunnel_init if ipv6 has been
disabled from cmdline.
Reported-by: NJianlin Shi <jishi@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c22dab0

13 9月, 2017 1 次提交

ip6_tunnel: fix ip6 tunnel lookup in collect_md mode · 6c1cb439

由 Haishuang Yan 提交于 9月 12, 2017

In collect_md mode, if the tun dev is down, it still can call
__ip6_tnl_rcv to receive on packets, and the rx statistics increase
improperly.

When the md tunnel is down, it's not neccessary to increase RX drops
for the tunnel device, packets would be recieved on fallback tunnel,
and the RX drops on fallback device will be increased as expected.

Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c1cb439

09 9月, 2017 1 次提交

ip6_tunnel: fix setting hop_limit value for ipv6 tunnel · 18e1173d

由 Haishuang Yan 提交于 9月 07, 2017

Similar to vxlan/geneve tunnel, if hop_limit is zero, it should fall
back to ip6_dst_hoplimt().
Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

18e1173d

27 6月, 2017 3 次提交

net: add netlink_ext_ack argument to rtnl_link_ops.validate · a8b8a889

由 Matthias Schiffer 提交于 6月 25, 2017

Add support for extended error reporting.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8b8a889

net: add netlink_ext_ack argument to rtnl_link_ops.changelink · ad744b22

由 Matthias Schiffer 提交于 6月 25, 2017

Add support for extended error reporting.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad744b22

net: add netlink_ext_ack argument to rtnl_link_ops.newlink · 7a3f4a18

由 Matthias Schiffer 提交于 6月 25, 2017

Add support for extended error reporting.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a3f4a18

19 6月, 2017 1 次提交

ip6_tunnel: Correct tos value in collect_md mode · 46f8cd9d

由 Haishuang Yan 提交于 6月 17, 2017

Same as ip_gre, geneve and vxlan, use key->tos as traffic class value.

CC: Peter Dawson <petedaws@gmail.com>
Fixes: 0e9a7095 ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets”)
Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: NPeter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46f8cd9d

17 6月, 2017 1 次提交

ip6_tunnel: fix potential issue in __ip6_tnl_rcv · f1925ca5

由 Haishuang Yan 提交于 6月 15, 2017

When __ip6_tnl_rcv fails, the tun_dst won't be freed, so call
dst_release to free it in error code path.

Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
CC: Alexei Starovoitov <ast@fb.com>
Tested-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f1925ca5

08 6月, 2017 1 次提交

net: Fix inconsistent teardown and release of private netdev state. · cf124db5

由 David S. Miller 提交于 5月 08, 2017

Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf124db5

05 6月, 2017 1 次提交

ip6_tunnel: fix traffic class routing for tunnels · 5f733ee6

由 Liam McBirnie 提交于 6月 01, 2017

ip6_route_output() requires that the flowlabel contains the traffic
class for policy routing.

Commit 0e9a7095 ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets") removed the code which previously added the
traffic class to the flowlabel.

The traffic class is added here because only route lookup needs the
flowlabel to contain the traffic class.

Fixes: 0e9a7095 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: NLiam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: NPeter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f733ee6

27 5月, 2017 1 次提交

ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets · 0e9a7095

由 Peter Dawson 提交于 5月 26, 2017

This fix addresses two problems in the way the DSCP field is formulated
 on the encapsulating header of IPv6 tunnels.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661

1) The IPv6 tunneling code was manipulating the DSCP field of the
 encapsulating packet using the 32b flowlabel. Since the flowlabel is
 only the lower 20b it was incorrect to assume that the upper 12b
 containing the DSCP and ECN fields would remain intact when formulating
 the encapsulating header. This fix handles the 'inherit' and
 'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.

2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
 incorrect and resulted in the DSCP value always being set to 0.

Commit 90427ef5 ("ipv6: fix flow labels when the traffic class
 is non-0") caused the regression by masking out the flowlabel
 which exposed the incorrect handling of the DSCP portion of the
 flowlabel in ip6_tunnel and ip6_gre.

Fixes: 90427ef5 ("ipv6: fix flow labels when the traffic class is non-0")
Signed-off-by: NPeter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e9a7095

02 5月, 2017 1 次提交

ip6_tunnel: Fix missing tunnel encapsulation limit option · 89a23c8b

由 Craig Gallek 提交于 4月 26, 2017

The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4).  An MTU adjustment is done to account for
these options as well.  However, the options are never present in the
generated packets.

The issue appears to be a subtlety between IPV6_DSTOPTS and
IPV6_RTHDRDSTOPTS defined in RFC 3542.  When the IPIP tunnel driver was
written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
dst0opt of struct ipv6_txoptions.  Later, ipv6_push_nfrags_opts was
(correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
are to be used.  This caused the options to no longer be included in v6
encapsulated packets.

The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
instead.  IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.

Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.")
Fixes: 333fad53: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
Signed-off-by: NCraig Gallek <kraig@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89a23c8b

27 4月, 2017 1 次提交

ipv6: check skb->protocol before lookup for nexthop · 199ab00f

由 WANG Cong 提交于 4月 25, 2017

Andrey reported a out-of-bound access in ip6_tnl_xmit(), this
is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
neigh key as an IPv6 address:

        neigh = dst_neigh_lookup(skb_dst(skb),
                                 &ipv6_hdr(skb)->daddr);
        if (!neigh)
                goto tx_err_link_failure;

        addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
        addr_type = ipv6_addr_type(addr6);

        if (addr_type == IPV6_ADDR_ANY)
                addr6 = &ipv6_hdr(skb)->daddr;

        memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));

Also the network header of the skb at this point should be still IPv4
for 4in6 tunnels, we shold not just use it as IPv6 header.

This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
is, we are safe to do the nexthop lookup using skb_dst() and
ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
dest address we can pick here, we have to rely on callers to fill it
from tunnel config, so just fall to ip6_route_output() to make the
decision.

Fixes: ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Tested-by: NAndrey Konovalov <andreyknvl@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

199ab00f

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功