- 13 9月, 2018 1 次提交
-
-
由 Xin Long 提交于
In inet6_rtm_getroute, since Commit 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes"), it has used rt->from to dump route info instead of rt. However for some route like cache, some of its information like flags or gateway is not the same as that of the 'from' one. It caused 'ip route get' to dump the wrong route information. In Jianlin's testing, the output information even lost the expiration time for a pmtu route cache due to the wrong fib6_flags. So change to use rt6_info members for dst addr, src addr, flags and gateway when it tries to dump a route entry without fibmatch set. v1->v2: - not use rt6i_prefsrc. - also fix the gw dump issue. Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: NJianlin Shi <jishi@redhat.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 9月, 2018 1 次提交
-
-
由 Taehee Yoo 提交于
A kernel crash occurrs when defragmented packet is fragmented in ip_do_fragment(). In defragment routine, skb_orphan() is called and skb->ip_defrag_offset is set. but skb->sk and skb->ip_defrag_offset are same union member. so that frag->sk is not NULL. Hence crash occurrs in skb->sk check routine in ip_do_fragment() when defragmented packet is fragmented. test commands: %iptables -t nat -I POSTROUTING -j MASQUERADE %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000 splat looks like: [ 261.069429] kernel BUG at net/ipv4/ip_output.c:636! [ 261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3 [ 261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600 [ 261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c [ 261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202 [ 261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004 [ 261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8 [ 261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395 [ 261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4 [ 261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000 [ 261.174169] FS: 00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000 [ 261.183012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0 [ 261.198158] Call Trace: [ 261.199018] ? dst_output+0x180/0x180 [ 261.205011] ? save_trace+0x300/0x300 [ 261.209018] ? ip_copy_metadata+0xb00/0xb00 [ 261.213034] ? sched_clock_local+0xd4/0x140 [ 261.218158] ? kill_l4proto+0x120/0x120 [nf_conntrack] [ 261.223014] ? rt_cpu_seq_stop+0x10/0x10 [ 261.227014] ? find_held_lock+0x39/0x1c0 [ 261.233008] ip_finish_output+0x51d/0xb50 [ 261.237006] ? ip_fragment.constprop.56+0x220/0x220 [ 261.243011] ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack] [ 261.250152] ? rcu_is_watching+0x77/0x120 [ 261.255010] ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4] [ 261.261033] ? nf_hook_slow+0xb1/0x160 [ 261.265007] ip_output+0x1c7/0x710 [ 261.269005] ? ip_mc_output+0x13f0/0x13f0 [ 261.273002] ? __local_bh_enable_ip+0xe9/0x1b0 [ 261.278152] ? ip_fragment.constprop.56+0x220/0x220 [ 261.282996] ? nf_hook_slow+0xb1/0x160 [ 261.287007] raw_sendmsg+0x21f9/0x4420 [ 261.291008] ? dst_output+0x180/0x180 [ 261.297003] ? sched_clock_cpu+0x126/0x170 [ 261.301003] ? find_held_lock+0x39/0x1c0 [ 261.306155] ? stop_critical_timings+0x420/0x420 [ 261.311004] ? check_flags.part.36+0x450/0x450 [ 261.315005] ? _raw_spin_unlock_irq+0x29/0x40 [ 261.320995] ? _raw_spin_unlock_irq+0x29/0x40 [ 261.326142] ? cyc2ns_read_end+0x10/0x10 [ 261.330139] ? raw_bind+0x280/0x280 [ 261.334138] ? sched_clock_cpu+0x126/0x170 [ 261.338995] ? check_flags.part.36+0x450/0x450 [ 261.342991] ? __lock_acquire+0x4500/0x4500 [ 261.348994] ? inet_sendmsg+0x11c/0x500 [ 261.352989] ? dst_output+0x180/0x180 [ 261.357012] inet_sendmsg+0x11c/0x500 [ ... ] v2: - clear skb->sk at reassembly routine.(Eric Dumarzet) Fixes: fa0f5273 ("ip: use rb trees for IP frag queue.") Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 9月, 2018 1 次提交
-
-
由 Hangbin Liu 提交于
man ip-tunnel ttl section says: 0 is a special value meaning that packets inherit the TTL value. IPv4 tunnel respect this in ip_tunnel_xmit(), but IPv6 tunnel has not implement it yet. To make IPv6 behave consistently with IP tunnel, add ipv6 tunnel inherit support. Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 9月, 2018 1 次提交
-
-
由 David Ahern 提交于
Jan reported a regression after an update to 4.18.5. In this case ipv6 default route is setup by systemd-networkd based on data from an RA. The RA contains an MTU of 1492 which is used when the route is first inserted but then systemd-networkd pushes down updates to the default route without the mtu set. Prior to the change to fib6_info, metrics such as MTU were held in the dst_entry and rt6i_pmtu in rt6_info contained an update to the mtu if any. ip6_mtu would look at rt6i_pmtu first and use it if set. If not, the value from the metrics is used if it is set and finally falling back to the idev value. After the fib6_info change metrics are contained in the fib6_info struct and there is no equivalent to rt6i_pmtu. To maintain consistency with the old behavior the new code should only reset the MTU in the metrics if the route update has it set. Fixes: d4ead6b3 ("net/ipv6: move metrics from dst to rt6_info") Reported-by: NJan Janssen <medhefgo@web.de> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 9月, 2018 1 次提交
-
-
由 Alexey Kodanev 提交于
Commit 80f1a0f4 ("net/ipv6: Put lwtstate when destroying fib6_info") partially fixed the kmemleak [1], lwtstate can be copied from fib6_info, with ip6_rt_copy_init(), and it should be done only once there. rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function ip6_rt_copy_init(), so there is no need to get it again at the end. With this patch, lwtstate also isn't copied from RTF_REJECT routes. [1]: unreferenced object 0xffff880b6aaa14e0 (size 64): comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s) hex dump (first 32 bytes): 01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420 [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0 [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70 [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0 [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0 [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0 [<000000004c893c76>] netlink_unicast+0x417/0x5a0 [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30 [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0 [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950 [<000000001e7426c8>] __sys_sendmsg+0xde/0x170 [<00000000fe411443>] do_syscall_64+0x9f/0x4a0 [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<000000006d21f353>] 0xffffffffffffffff Fixes: 6edb3c96 ("net/ipv6: Defer initialization of dst to data path") Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 8月, 2018 3 次提交
-
-
由 Sabrina Dubroca 提交于
Commit 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.") contains an error in the cleanup path of inet6_init(): when proto_register(&pingv6_prot, 1) fails, we try to unregister &pingv6_prot. When rawv6_init() fails, we skip unregistering &pingv6_prot. Example of panic (triggered by faking a failure of proto_register(&pingv6_prot, 1)): general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI [...] RIP: 0010:__list_del_entry_valid+0x79/0x160 [...] Call Trace: proto_unregister+0xbb/0x550 ? trace_preempt_on+0x6f0/0x6f0 ? sock_no_shutdown+0x10/0x10 inet6_init+0x153/0x1b8 Fixes: 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
Commit 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()") moved the cleanup label for ipmr_fail, but should have changed the contents of the cleanup labels as well. Now we can end up cleaning up icmpv6 even though it hasn't been initialized (jump to icmp_fail or ipmr_fail). Simply undo things in the reverse order of their initialization. Example of panic (triggered by faking a failure of icmpv6_init): kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI [...] RIP: 0010:__list_del_entry_valid+0x79/0x160 [...] Call Trace: ? lock_release+0x8a0/0x8a0 unregister_pernet_operations+0xd4/0x560 ? ops_free_list+0x480/0x480 ? down_write+0x91/0x130 ? unregister_pernet_subsys+0x15/0x30 ? down_read+0x1b0/0x1b0 ? up_read+0x110/0x110 ? kmem_cache_create_usercopy+0x1b4/0x240 unregister_pernet_subsys+0x1d/0x30 icmpv6_cleanup+0x1d/0x30 inet6_init+0x1b5/0x23f Fixes: 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexey Kodanev 提交于
Before the commit d6990976 ("vti6: fix PMTU caching and reporting on xmit") '!skb->ignore_df' check was always true because the function skb_scrub_packet() was called before it, resetting ignore_df to zero. In the commit, skb_scrub_packet() was moved below, and now this check can be false for the packet, e.g. when sending it in the two fragments, this prevents successful PMTU updates in such case. The next attempts to send the packet lead to the same tx error. Moreover, vti6 initial MTU value relies on PMTU adjustments. This issue can be reproduced with the following LTP test script: udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000 Fixes: ccd740cb ("vti6: Add pmtu handling to vti6_xmit.") Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 8月, 2018 1 次提交
-
-
由 Xin Long 提交于
After erspan_ver is introudced, if erspan_ver is not set in iproute, its value will be left 0 by default. Since Commit 02f99df1 ("erspan: fix invalid erspan version."), it has broken the traffic due to the version check in erspan_xmit if users are not aware of 'erspan_ver' param, like using an old version of iproute. To fix this compatibility problem, it sets erspan_ver to 1 by default when adding an erspan dev in erspan_setup. Note that we can't do it in ipgre_netlink_parms, as this function is also used by ipgre_changelink. Fixes: 02f99df1 ("erspan: fix invalid erspan version.") Reported-by: NJianlin Shi <jishi@redhat.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 8月, 2018 2 次提交
-
-
由 Hangbin Liu 提交于
Commit 6edb3c96 ("net/ipv6: Defer initialization of dst to data path") forgot to handle anycast route and init anycast rt->dst.input to ip6_forward. Fix it by setting anycast rt->dst.input back to ip6_input. Fixes: 6edb3c96 ("net/ipv6: Defer initialization of dst to data path") Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Cong Wang 提交于
All the 3 callers of addrconf_add_mroute() assert RTNL lock, they don't take any additional lock either, so it is safe to convert it to GFP_KERNEL. Same for sit_add_v4_addrs(). Cc: David Ahern <dsahern@gmail.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 8月, 2018 1 次提交
-
-
由 David Ahern 提交于
Prior to the introduction of fib6_info lwtstate was managed by the dst code. With fib6_info releasing lwtstate needs to be done when the struct is freed. Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes") Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 8月, 2018 2 次提交
-
-
由 Haishuang Yan 提交于
If load ip6_vti module and create a network namespace when set fb_tunnels_only_for_init_net to 1, then exit the namespace will cause following crash: [ 6601.677036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 6601.679057] PGD 8000000425eca067 P4D 8000000425eca067 PUD 424292067 PMD 0 [ 6601.680483] Oops: 0000 [#1] SMP PTI [ 6601.681223] CPU: 7 PID: 93 Comm: kworker/u16:1 Kdump: loaded Tainted: G E 4.18.0+ #3 [ 6601.683153] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014 [ 6601.684919] Workqueue: netns cleanup_net [ 6601.685742] RIP: 0010:vti6_exit_batch_net+0x87/0xd0 [ip6_vti] [ 6601.686932] Code: 7b 08 48 89 e6 e8 b9 ea d3 dd 48 8b 1b 48 85 db 75 ec 48 83 c5 08 48 81 fd 00 01 00 00 75 d5 49 8b 84 24 08 01 00 00 48 89 e6 <48> 8b 78 08 e8 90 ea d3 dd 49 8b 45 28 49 39 c6 4c 8d 68 d8 75 a1 [ 6601.690735] RSP: 0018:ffffa897c2737de0 EFLAGS: 00010246 [ 6601.691846] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200 [ 6601.693324] RDX: 0000000000000015 RSI: ffffa897c2737de0 RDI: ffffffff9f2ea9e0 [ 6601.694824] RBP: 0000000000000100 R08: 0000000000000000 R09: 0000000000000000 [ 6601.696314] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8dc323c07e00 [ 6601.697812] R13: ffff8dc324a63100 R14: ffffa897c2737e30 R15: ffffa897c2737e30 [ 6601.699345] FS: 0000000000000000(0000) GS:ffff8dc33fdc0000(0000) knlGS:0000000000000000 [ 6601.701068] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6601.702282] CR2: 0000000000000008 CR3: 0000000424966002 CR4: 00000000001606e0 [ 6601.703791] Call Trace: [ 6601.704329] cleanup_net+0x1b4/0x2c0 [ 6601.705268] process_one_work+0x16c/0x370 [ 6601.706145] worker_thread+0x49/0x3e0 [ 6601.706942] kthread+0xf8/0x130 [ 6601.707626] ? rescuer_thread+0x340/0x340 [ 6601.708476] ? kthread_bind+0x10/0x10 [ 6601.709266] ret_from_fork+0x35/0x40 Reproduce: modprobe ip6_vti echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net unshare -n exit This because ip6n->tnls_wc[0] point to fallback device in default, but in non-default namespace, ip6n->tnls_wc[0] will be NULL, so add the NULL check comparatively. Fixes: e2948e5a ("ip6_vti: fix creating fallback tunnel device for vti6") Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Haishuang Yan 提交于
When set fb_tunnels_only_for_init_net to 1, don't create fallback tunnel device for vti6 when a new namespace is created. Tested: [root@builder2 ~]# modprobe ip6_tunnel [root@builder2 ~]# modprobe ip6_vti [root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net [root@builder2 ~]# unshare -n [root@builder2 ~]# ip link 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 8月, 2018 1 次提交
-
-
由 Haishuang Yan 提交于
Same as ip_vti, use iptunnel_xmit_stats to updates stats in tunnel xmit code path. Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 8月, 2018 1 次提交
-
-
由 Florian Westphal 提交于
Roman reports that DHCPv6 client no longer sees replies from server due to ip6tables -t raw -A PREROUTING -m rpfilter --invert -j DROP rule. We need to set the F_IFACE flag for linklocal addresses, they are scoped per-device. Fixes: 47b7e7f8 ("netfilter: don't set F_IFACE on ipv6 fib lookups") Reported-by: NRoman Mamedov <rm@romanrm.net> Tested-by: NRoman Mamedov <rm@romanrm.net> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 13 8月, 2018 1 次提交
-
-
由 Virgile Jarry 提交于
Preventing the kernel from responding to ICMP Echo Requests messages can be useful in several ways. The sysctl parameter 'icmp_echo_ignore_all' can be used to prevent the kernel from responding to IPv4 ICMP echo requests. For IPv6 pings, such a sysctl kernel parameter did not exist. Add the ability to prevent the kernel from responding to IPv6 ICMP echo requests through the use of the following sysctl parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all. Update the documentation to reflect this change. Signed-off-by: NVirgile Jarry <virgile@acceis.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 8月, 2018 1 次提交
-
-
由 Martin KaFai Lau 提交于
This patch allows a BPF_PROG_TYPE_SK_REUSEPORT bpf prog to select a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY introduced in the earlier patch. "bpf_run_sk_reuseport()" will return -ECONNREFUSED when the BPF_PROG_TYPE_SK_REUSEPORT prog returns SK_DROP. The callers, in inet[6]_hashtable.c and ipv[46]/udp.c, are modified to handle this case and return NULL immediately instead of continuing the sk search from its hashtable. It re-uses the existing SO_ATTACH_REUSEPORT_EBPF setsockopt to attach BPF_PROG_TYPE_SK_REUSEPORT. The "sk_reuseport_attach_bpf()" will check if the attaching bpf prog is in the new SK_REUSEPORT or the existing SOCKET_FILTER type and then check different things accordingly. One level of "__reuseport_attach_prog()" call is removed. The "sk_unhashed() && ..." and "sk->sk_reuseport_cb" tests are pushed back to "reuseport_attach_prog()" in sock_reuseport.c. sock_reuseport.c seems to have more knowledge on those test requirements than filter.c. In "reuseport_attach_prog()", after new_prog is attached to reuse->prog, the old_prog (if any) is also directly freed instead of returning the old_prog to the caller and asking the caller to free. The sysctl_optmem_max check is moved back to the "sk_reuseport_attach_filter()" and "sk_reuseport_attach_bpf()". As of other bpf prog types, the new BPF_PROG_TYPE_SK_REUSEPORT is only bounded by the usual "bpf_prog_charge_memlock()" during load time instead of bounded by both bpf_prog_charge_memlock and sysctl_optmem_max. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 10 8月, 2018 1 次提交
-
-
由 Maria Pasechnik 提交于
IPv6 GRO over GRE tap is not working while GRO is not set over the native interface. gro_list_prepare function updates the same_flow variable of existing sessions to 1 if their mac headers match the one of the incoming packet. same_flow is used to filter out non-matching sessions and keep potential ones for aggregation. The number of bytes to compare should be the number of bytes in the mac headers. In gro_list_prepare this number is set to be skb->dev->hard_header_len. For GRE interfaces this hard_header_len should be as it is set in the initialization process (when GRE is created), it should not be overridden. But currently it is being overridden by the value that is actually supposed to represent the needed_headroom. Therefore, the number of bytes compared in order to decide whether the the mac headers are the same is greater than the length of the headers. As it's documented in netdevice.h, hard_header_len is the maximum hardware header length, and needed_headroom is the extra headroom the hardware may need. hard_header_len is basically all the bytes received by the physical till layer 3 header of the packet received by the interface. For example, if the interface is a GRE tap then the needed_headroom should be the total length of the following headers: IP header of the physical, GRE header, mac header of GRE. It is often used to calculate the MTU of the created interface. This patch removes the override of the hard_header_len, and assigns the calculated value to needed_headroom. This way, the comparison in gro_list_prepare is really of the mac headers, and if the packets have the same mac headers the same_flow will be set to 1. Performance testing: 45% higher bandwidth. Measuring bandwidth of single-stream IPv4 TCP traffic over IPv6 GRE tap while GRO is not set on the native. NIC: ConnectX-4LX Before (GRO not working) : 7.2 Gbits/sec After (GRO working): 10.5 Gbits/sec Signed-off-by: NMaria Pasechnik <mariap@mellanox.com> Signed-off-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 8月, 2018 1 次提交
-
-
由 Shmulik Ladkani 提交于
When using an ip6tnl device in collect_md mode, the xmit methods ignore the ipv6.src field present in skb_tunnel_info's key, both for route calculation purposes (flowi6 construction) and for assigning the packet's final ipv6h->saddr. This makes it impossible specifying a desired ipv6 local address in the encapsulating header (for example, when using tc action tunnel_key). This is also not aligned with behavior of ipip (ipv4) in collect_md mode, where the key->u.ipv4.src gets used. Fix, by assigning fl6.saddr with given key->u.ipv6.src. In case ipv6.src is not specified, ip6_tnl_xmit uses existing saddr selection code. Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: NEyal Birger <eyal.birger@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2018 5 次提交
-
-
由 Xin Long 提交于
According to RFC791, 68 bytes is the minimum size of IPv4 datagram every device must be able to forward without further fragmentation while 576 bytes is the minimum size of IPv4 datagram every device has to be able to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right value for the ipv4 min mtu check in ip6_tnl_xmit. While at it, change to use max() instead of if statement. Fixes: c9fefa08 ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit") Reported-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Cong Wang 提交于
All the callers of ip6_rt_copy_init()/rt6_set_from() hold refcnt of the "from" fib6_info, so there is no need to hold fib6_metrics refcnt again, because fib6_metrics refcnt is only released when fib6_info is gone, that is, they have the same life time, so the whole fib6_metrics refcnt can be removed actually. This fixes a kmemleak warning reported by Sabrina. Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: NSabrina Dubroca <sd@queasysnail.net> Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
don't bother with pathological cases, they only waste cycles. IPv6 requires a minimum MTU of 1280 so we should never see fragments smaller than this (except last frag). v3: don't use awkward "-offset + len" v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68). There were concerns that there could be even smaller frags generated by intermediate nodes, e.g. on radio networks. Cc: Peter Oskolkov <posk@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Peter Oskolkov 提交于
Similar to TCP OOO RX queue, it makes sense to use rb trees to store IP fragments, so that OOO fragments are inserted faster. Tested: - a follow-up patch contains a rather comprehensive ip defrag self-test (functional) - ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`: netstat --statistics Ip: 282078937 total packets received 0 forwarded 0 incoming packets discarded 946760 incoming packets delivered 18743456 requests sent out 101 fragments dropped after timeout 282077129 reassemblies required 944952 packets reassembled ok 262734239 packet reassembles failed (The numbers/stats above are somewhat better re: reassemblies vs a kernel without this patchset. More comprehensive performance testing TBD). Reported-by: NJann Horn <jannh@google.com> Reported-by: NJuha-Matti Tilli <juha-matti.tilli@iki.fi> Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPeter Oskolkov <posk@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Georg Kohmann 提交于
When a ICMPV6_PKT_TOOBIG is received from a link local address the pmtu will be updated on a route with an arbitrary interface index. Subsequent packets sent back to the same link local address may therefore end up not considering the updated pmtu. Current behavior breaks TAHI v6LC4.1.4 Reduce PMTU On-link. Referring to RFC 1981: Section 3: "Note that Path MTU Discovery must be performed even in cases where a node "thinks" a destination is attached to the same link as itself. In a situation such as when a neighboring router acts as proxy [ND] for some destination, the destination can to appear to be directly connected but is in fact more than one hop away." Using the interface index from the incoming ICMPV6_PKT_TOOBIG when updating the pmtu. Signed-off-by: NGeorg Kohmann <geokohma@cisco.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 8月, 2018 2 次提交
-
-
由 David S. Miller 提交于
This reverts commit df18b504. This change causes other problems and use-after-free situations as found by syzbot. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vincent Bernat 提交于
The construction "net->ipv4.sysctl_ip_nonlocal_bind || inet->freebind || inet->transparent" is present three times and its IPv6 counterpart is also present three times. We introduce two small helpers to characterize these tests uniformly. Signed-off-by: NVincent Bernat <vincent@bernat.im> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 7月, 2018 2 次提交
-
-
由 Mathieu Xhonneux 提交于
The seg6local LWT provides the End.DT6 action, which allows to decapsulate an outer IPv6 header containing a Segment Routing Header (SRH), full specification is available here: https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-05 This patch adds this action now to the seg6local BPF interface. Since it is not mandatory that the inner IPv6 header also contains a SRH, seg6_bpf_srh_state has been extended with a pointer to a possible SRH of the outermost IPv6 header. This helps assessing if the validation must be triggered or not, and avoids some calls to ipv6_find_hdr. v3: s/1/true, s/0/false for boolean values v2: - changed true/false -> 1/0 - preempt_enable no longer called in first conditional block Signed-off-by: NMathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Sabrina Dubroca 提交于
Since commit d4ead6b3 ("net/ipv6: move metrics from dst to rt6_info"), ipv6 metrics are shared and refcounted. rt6_set_from() assigns the rt->from pointer and increases the refcount on from's metrics. This reference is never released. Introduce the fib6_metrics_release() helper and use it to release the metrics. Fixes: d4ead6b3 ("net/ipv6: move metrics from dst to rt6_info") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 7月, 2018 1 次提交
-
-
由 Vincent Bernat 提交于
When freebind feature is set of an IPv6 socket, any source address can be used when sending UDP datagrams using IPv6 PKTINFO ancillary message. Global non-local bind feature was added in commit 35a256fe ("ipv6: Nonlocal bind") for IPv6. This commit also allows IPv6 source address spoofing when non-local bind feature is enabled. Signed-off-by: NVincent Bernat <vincent@bernat.im> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 7月, 2018 3 次提交
-
-
由 Willem de Bruijn 提交于
Syzbot reported a read beyond the end of the skb head when returning IPV6_ORIGDSTADDR: BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242 CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125 kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219 kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261 copy_to_user include/linux/uaccess.h:184 [inline] put_cmsg+0x5ef/0x860 net/core/scm.c:242 ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719 ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733 rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521 [..] This logic and its ipv4 counterpart read the destination port from the packet at skb_transport_offset(skb) + 4. With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a packet that stores headers exactly up to skb_transport_offset(skb) in the head and the remainder in a frag. Call pskb_may_pull before accessing the pointer to ensure that it lies in skb head. Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
Several files have extra line at end of file. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 7月, 2018 3 次提交
-
-
由 Paolo Abeni 提交于
The skb hash for locally generated ip[v6] fragments belonging to the same datagram can vary in several circumstances: * for connected UDP[v6] sockets, the first fragment get its hash via set_owner_w()/skb_set_hash_from_sk() * for unconnected IPv6 UDPv6 sockets, the first fragment can get its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if auto_flowlabel is enabled For the following frags the hash is usually computed via skb_get_hash(). The above can cause OoO for unconnected IPv6 UDPv6 socket: in that scenario the egress tx queue can be selected on a per packet basis via the skb hash. It may also fool flow-oriented schedulers to place fragments belonging to the same datagram in different flows. Fix the issue by copying the skb hash from the head frag into the others at fragmentation time. Before this commit: perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8" netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n & perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1 perf script probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0 probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0 After this commit: probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0 probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0 Fixes: b73c3d0e ("net: Save TX flow hash in sock and set in skbuf on xmit") Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Wang 提交于
In the code path where only rcu read lock is held, e.g. in the route lookup code path, it is not safe to directly call fib6_info_hold() because the fib6_info may already have been deleted but still exists in the rcu grace period. Holding reference to it could cause double free and crash the kernel. This patch adds a new function fib6_info_hold_safe() and replace fib6_info_hold() in all necessary places. Syzbot reported 3 crash traces because of this. One of them is: 8021q: adding VLAN 0 to HW filter on device team0 IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready dst_release: dst:(____ptrval____) refcnt:-1 dst_release: dst:(____ptrval____) refcnt:-2 WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 dst_hold include/net/dst.h:239 [inline] WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204 dst_release: dst:(____ptrval____) refcnt:-1 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 4845 Comm: syz-executor493 Not tainted 4.18.0-rc3+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 panic+0x238/0x4e7 kernel/panic.c:184 dst_release: dst:(____ptrval____) refcnt:-2 dst_release: dst:(____ptrval____) refcnt:-3 __warn.cold.8+0x163/0x1ba kernel/panic.c:536 dst_release: dst:(____ptrval____) refcnt:-4 report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296 dst_release: dst:(____ptrval____) refcnt:-5 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992 RIP: 0010:dst_hold include/net/dst.h:239 [inline] RIP: 0010:ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204 Code: c1 ed 03 89 9d 18 ff ff ff 48 b8 00 00 00 00 00 fc ff df 41 c6 44 05 00 f8 e9 2d 01 00 00 4c 8b a5 c8 fe ff ff e8 1a f6 e6 fa <0f> 0b e9 6a fc ff ff e8 0e f6 e6 fa 48 8b 85 d0 fe ff ff 48 8d 78 RSP: 0018:ffff8801a8fcf178 EFLAGS: 00010293 RAX: ffff8801a8eba5c0 RBX: 0000000000000000 RCX: ffffffff869511e6 RDX: 0000000000000000 RSI: ffffffff869515b6 RDI: 0000000000000005 RBP: ffff8801a8fcf2c8 R08: ffff8801a8eba5c0 R09: ffffed0035ac8338 R10: ffffed0035ac8338 R11: ffff8801ad6419c3 R12: ffff8801a8fcf720 R13: ffff8801a8fcf6a0 R14: ffff8801ad6419c0 R15: ffff8801ad641980 ip6_make_skb+0x2c8/0x600 net/ipv6/ip6_output.c:1768 udpv6_sendmsg+0x2c90/0x35f0 net/ipv6/udp.c:1376 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:641 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:651 ___sys_sendmsg+0x51d/0x930 net/socket.c:2125 __sys_sendmmsg+0x240/0x6f0 net/socket.c:2220 __do_sys_sendmmsg net/socket.c:2249 [inline] __se_sys_sendmmsg net/socket.c:2246 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2246 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x446ba9 Code: e8 cc bb 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fb39a469da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000006dcc54 RCX: 0000000000446ba9 RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003 RBP: 00000000006dcc50 R08: 00007fb39a46a700 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 45c828efc7a64843 R13: e6eeb815b9d8a477 R14: 5068caf6f713c6fc R15: 0000000000000001 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: syzbot+902e2a1bcd4f7808cef5@syzkaller.appspotmail.com Reported-by: syzbot+8ae62d67f647abeeceb9@syzkaller.appspotmail.com Reported-by: syzbot+3f08feb14086930677d0@syzkaller.appspotmail.com Signed-off-by: NWei Wang <weiwan@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 YueHaibing 提交于
Replace calls to kmalloc followed by a memcpy with a direct call to kmemdup. Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 7月, 2018 3 次提交
-
-
由 Hangbin Liu 提交于
There are two scenarios that we will restore deleted records. The first is when device down and up(or unmap/remap). In this scenario the new filter mode is same with previous one. Because we get it from in_dev->mc_list and we do not touch it during device down and up. The other scenario is when a new socket join a group which was just delete and not finish sending status reports. In this scenario, we should use the current filter mode instead of restore old one. Here are 4 cases in total. old_socket new_socket before_fix after_fix IN(A) IN(A) ALLOW(A) ALLOW(A) IN(A) EX( ) TO_IN( ) TO_EX( ) EX( ) IN(A) TO_EX( ) ALLOW(A) EX( ) EX( ) TO_EX( ) TO_EX( ) Fixes: 24803f38 (igmp: do not remove igmp souce list info when set link down) Fixes: 1666d49e (mld: do not remove mld souce list info when set link down) Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hangbin Liu 提交于
Remove the mode parameter for igmp/igmp6_group_added as we can get it from first parameter. Fixes: 6e2059b5 (ipv4/igmp: init group mode as INCLUDE when join source group) Fixes: c7ea20c9 (ipv6/mcast: init as INCLUDE when join SSM INCLUDE group) Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Example setup: host: ip -6 addr add dev eth1 2001:db8:104::4 where eth1 is enslaved to a VRF switch: ip -6 ro add 2001:db8:104::4/128 dev br1 where br1 only has an LLA ping6 2001:db8:104::4 ssh 2001:db8:104::4 (NOTE: UDP works fine if the PKTINFO has the address set to the global address and ifindex is set to the index of eth1 with a destination an LLA). For ICMP, icmp6_iif needs to be updated to check if skb->dev is an L3 master. If it is then return the ifindex from rt6i_idev similar to what is done for loopback. For TCP, restore the original tcp_v6_iif definition which is needed in most places and add a new tcp_v6_iif_l3_slave that considers the l3_slave variability. This latter check is only needed for socket lookups. Fixes: 9ff74384 ("net: vrf: Handle ipv6 multicast and link-local addresses") Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 7月, 2018 1 次提交
-
-
由 Shannon Nelson 提交于
The offload_handle should be an opaque data cookie for the driver to use, much like the data cookie for a timer or alarm callback. Thus, the XFRM stack should not be checking for non-zero, because the driver might use that to store an array reference, which could be zero, or some other zero but meaningful value. We can remove the checks for non-zero because there are plenty other attributes also being checked to see if there is an offload in place for the SA in question. Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-