- 23 5月, 2020 1 次提交
-
-
由 Roopa Prabhu 提交于
This patch introduces ecmp nexthops and nexthop groups for mac fdb entries. In subsequent patches this is used by the vxlan driver fdb entries. The use case is E-VPN multihoming [1,2,3] which requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (This is analogous to a multi-homed LAG but over vxlan). Changes include new nexthop flag NHA_FDB for nexthops referenced by fdb entries. These nexthops only have ip. This patch includes appropriate checks to avoid routes referencing such nexthops. example: $ip nexthop add id 12 via 172.16.1.2 fdb $ip nexthop add id 13 via 172.16.1.3 fdb $ip nexthop add id 102 group 12/13 fdb $bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self [1] E-VPN https://tools.ietf.org/html/rfc7432 [2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365 [3] LPC talk with mention of nexthop groups for L2 ecmp http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf v4 - fixed uninitialized variable reported by kernel test robot Reported-by: Nkernel test robot <rong.a.chen@intel.com> Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 5月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
Prepare for better compat ioctl handling by moving the user copy out of ipv6_route_ioctl. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 5月, 2020 3 次提交
-
-
由 Yonghong Song 提交于
Commit b121b341 ("bpf: Add PTR_TO_BTF_ID_OR_NULL support") adds a field btf_id_or_null_non0_off to bpf_prog->aux structure to indicate that the first ctx argument is PTR_TO_BTF_ID reg_type and all others are PTR_TO_BTF_ID_OR_NULL. This approach does not really scale if we have other different reg types in the future, e.g., a pointer to a buffer. This patch enables bpf_iter targets registering ctx argument reg types which may be different from the default one. For example, for pointers to structures, the default reg_type is PTR_TO_BTF_ID for tracing program. The target can register a particular pointer type as PTR_TO_BTF_ID_OR_NULL which can be used by the verifier to enforce accesses. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180221.2949882-1-yhs@fb.com
-
由 Yonghong Song 提交于
Change func bpf_iter_unreg_target() parameter from target name to target reg_info, similar to bpf_iter_reg_target(). Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180220.2949737-1-yhs@fb.com
-
由 Yonghong Song 提交于
Currently bpf_iter_reg_target takes parameters from target and allocates memory to save them. This is really not necessary, esp. in the future we may grow information passed from targets to bpf_iter manager. The patch refactors the code so target reg_info becomes static and bpf_iter manager can just take a reference to it. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200513180219.2949605-1-yhs@fb.com
-
- 10 5月, 2020 1 次提交
-
-
由 Yonghong Song 提交于
This patch added netlink and ipv6_route targets, using the same seq_ops (except show() and minor changes for stop()) for /proc/net/{netlink,ipv6_route}. The net namespace for these targets are the current net namespace at file open stage, similar to /proc/net/{netlink,ipv6_route} reference counting the net namespace at seq_file open stage. Since module is not supported for now, ipv6_route is supported only if the IPV6 is built-in, i.e., not compiled as a module. The restriction can be lifted once module is properly supported for bpf_iter. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com
-
- 09 5月, 2020 2 次提交
-
-
由 Eric Dumazet 提交于
We currently have to adjust ipv6 route gc_thresh/max_size depending on number of cpus on a server, this makes very little sense. If the kernels sets /proc/sys/net/ipv6/route/gc_thresh to 1024 and /proc/sys/net/ipv6/route/max_size to 4096, then we better not track the percpu dst that our implementation uses. Only routes not added (directly or indirectly) by the admin should be tracked and limited. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Cc: Maciej Żenczykowski <maze@google.com> Acked-by: NWei Wang <weiwan@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Eric Dumazet 提交于
percpu_counter_add() uses a default batch size which is quite big on platforms with 256 cpus. (2*256 -> 512) This means dst_entries_get_fast() can be off by +/- 2*(nr_cpus^2) (131072 on servers with 256 cpus) Reduce the batch size to something more reasonable, and add logic to ip6_dst_gc() to call dst_entries_get_slow() before calling the _very_ expensive fib6_run_gc() function. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 08 5月, 2020 1 次提交
-
-
由 Maciej Żenczykowski 提交于
This reverts commit 19bda36c: | ipv6: add mtu lock check in __ip6_rt_update_pmtu | | Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu. | It leaded to that mtu lock doesn't really work when receiving the pkt | of ICMPV6_PKT_TOOBIG. | | This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4 | did in __ip_rt_update_pmtu. The above reasoning is incorrect. IPv6 *requires* icmp based pmtu to work. There's already a comment to this effect elsewhere in the kernel: $ git grep -p -B1 -A3 'RTAX_MTU lock' net/ipv6/route.c=4813= static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg) ... /* In IPv6 pmtu discovery is not optional, so that RTAX_MTU lock cannot disable it. We still use this lock to block changes caused by addrconf/ndisc. */ This reverts to the pre-4.9 behaviour. Cc: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NMaciej Żenczykowski <maze@google.com> Fixes: 19bda36c ("ipv6: add mtu lock check in __ip6_rt_update_pmtu") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 5月, 2020 1 次提交
-
-
由 David Ahern 提交于
Nik reported a bug with pcpu dst cache when nexthop objects are used illustrated by the following: $ ip netns add foo $ ip -netns foo li set lo up $ ip -netns foo addr add 2001:db8:11::1/128 dev lo $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1 $ ip li add veth1 type veth peer name veth2 $ ip li set veth1 up $ ip addr add 2001:db8:10::1/64 dev veth1 $ ip li set dev veth2 netns foo $ ip -netns foo li set veth2 up $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2 $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1 $ ip -6 route add 2001:db8:11::1/128 nhid 100 Create a pcpu entry on cpu 0: $ taskset -a -c 0 ip -6 route get 2001:db8:11::1 Re-add the route entry: $ ip -6 ro del 2001:db8:11::1 $ ip -6 route add 2001:db8:11::1/128 nhid 100 Route get on cpu 0 returns the stale pcpu: $ taskset -a -c 0 ip -6 route get 2001:db8:11::1 RTNETLINK answers: Network is unreachable While cpu 1 works: $ taskset -a -c 1 ip -6 route get 2001:db8:11::1 2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium Conversion of FIB entries to work with external nexthop objects missed an important difference between IPv4 and IPv6 - how dst entries are invalidated when the FIB changes. IPv4 has a per-network namespace generation id (rt_genid) that is bumped on changes to the FIB. Checking if a dst_entry is still valid means comparing rt_genid in the rtable to the current value of rt_genid for the namespace. IPv6 also has a per network namespace counter, fib6_sernum, but the count is saved per fib6_node. With the per-node counter only dst_entries based on fib entries under the node are invalidated when changes are made to the routes - limiting the scope of invalidations. IPv6 uses a reference in the rt6_info, 'from', to track the corresponding fib entry used to create the dst_entry. When validating a dst_entry, the 'from' is used to backtrack to the fib6_node and check the sernum of it to the cookie passed to the dst_check operation. With the inline format (nexthop definition inline with the fib6_info), dst_entries cached in the fib6_nh have a 1:1 correlation between fib entries, nexthop data and dst_entries. With external nexthops, IPv6 looks more like IPv4 which means multiple fib entries across disparate fib6_nodes can all reference the same fib6_nh. That means validation of dst_entries based on external nexthops needs to use the IPv4 format - the per-network namespace counter. Add sernum to rt6_info and set it when creating a pcpu dst entry. Update rt6_get_cookie to return sernum if it is set and update dst_check for IPv6 to look for sernum set and based the check on it if so. Finally, rt6_get_pcpu_route needs to validate the cached entry before returning a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and __mkroute_output for IPv4). This problem only affects routes using the new, external nexthops. Thanks to the kbuild test robot for catching the IS_ENABLED needed around rt_genid_ipv6 before I sent this out. Fixes: 5b98324e ("ipv6: Allow routes to use nexthop objects") Reported-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid Ahern <dsahern@kernel.org> Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 4月, 2020 2 次提交
-
-
由 Roopa Prabhu 提交于
Current route nexthop API maintains user space compatibility with old route API by default. Dumps and netlink notifications support both new and old API format. In systems which have moved to the new API, this compatibility mode cancels some of the performance benefits provided by the new nexthop API. This patch adds new sysctl nexthop_compat_mode which is on by default but provides the ability to turn off compatibility mode allowing systems to run entirely with the new routing API. Old route API behaviour and support is not modified by this sysctl. Uses a single sysctl to cover both ipv4 and ipv6 following other sysctls. Covers dumps and delete notifications as suggested by David Ahern. Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roopa Prabhu 提交于
Used in subsequent work to skip route delete notifications on nexthop deletes. Suggested-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 4月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 3月, 2020 1 次提交
-
-
由 Alexander Aring 提交于
The build_state callback of lwtunnel doesn't contain the net namespace structure yet. This patch will add it so we can check on specific address configuration at creation time of rpl source routes. Signed-off-by: NAlexander Aring <alex.aring@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 3月, 2020 1 次提交
-
-
由 David Laight 提交于
Previous changes to the IP routing code have removed all the tests for the DS_HOST route flag. Remove the flags and all the code that sets it. Signed-off-by: NDavid Laight <david.laight@aculab.com> Acked-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 3月, 2020 1 次提交
-
-
由 Joe Perches 提交于
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/ And by hand: net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block that causes gcc to emit a warning if converted in-place. So move the new fallthrough; inside the containing #ifdef/#endif too. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 2月, 2020 1 次提交
-
-
由 Benjamin Poirier 提交于
When splitting an RTA_MULTIPATH request into multiple routes and adding the second and later components, we must not simply remove NLM_F_REPLACE but instead replace it by NLM_F_CREATE. Otherwise, it may look like the netlink message was malformed. For example, ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 \ nexthop via fe80::30:2 dev dummy0 results in the following warnings: [ 1035.057019] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE [ 1035.057517] IPv6: NLM_F_CREATE should be set when creating new route This patch makes the nlmsg sequence look equivalent for __ip6_ins_rt() to what it would get if the multipath route had been added in multiple netlink operations: ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 ip route append 2001:db8::1/128 nexthop via fe80::30:2 dev dummy0 Fixes: 27596472 ("ipv6: fix ECMP route replacement") Signed-off-by: NBenjamin Poirier <bpoirier@cumulusnetworks.com> Reviewed-by: NMichal Kubecek <mkubecek@suse.cz> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 1月, 2020 1 次提交
-
-
由 Ido Schimmel 提交于
In a similar fashion to previous patch, add "offload" and "trap" indication to IPv6 routes. This is done by using two unused bits in 'struct fib6_info' to hold these indications. Capable drivers are expected to set these when processing the various in-kernel route notifications. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 12月, 2019 4 次提交
-
-
由 Ido Schimmel 提交于
Now that mlxsw is converted to use the new FIB notifications it is possible to delete the old ones and use the new replace / append / delete notifications. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
When an entire multipath route is deleted, only emit a notification if it is the first route in the node. Emit a replace notification in case the last sibling is followed by another route. Otherwise, emit a delete notification. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
In a similar fashion to previous patches, only notify the new multipath route if it is the first route in the node or if it was appended to such route. The type of the notification (replace vs. append) is determined based on the number of routes added ('nhn') and the number of sibling routes. If the two do not match, then an append notification should be sent. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hangbin Liu 提交于
The MTU update code is supposed to be invoked in response to real networking events that update the PMTU. In IPv6 PMTU update function __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor confirmed time. But for tunnel code, it will call pmtu before xmit, like: - tnl_update_pmtu() - skb_dst_update_pmtu() - ip6_rt_update_pmtu() - __ip6_rt_update_pmtu() - dst_confirm_neigh() If the tunnel remote dst mac address changed and we still do the neigh confirm, we will not be able to update neigh cache and ping6 remote will failed. So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we should not be invoking dst_confirm_neigh() as we have no evidence of successful two-way communication at this point. On the other hand it is also important to keep the neigh reachability fresh for TCP flows, so we cannot remove this dst_confirm_neigh() call. To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu to choose whether we should do neigh update or not. I will add the parameter in this patch and set all the callers to true to comply with the previous way, and fix the tunnel code one by one on later patches. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Suggested-by: NDavid Miller <davem@davemloft.net> Reviewed-by: NGuillaume Nault <gnault@redhat.com> Acked-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 11月, 2019 1 次提交
-
-
由 Paolo Abeni 提交于
Use a per namespace counter, increment it on successful creation of any route using the source address, decrement it on deletion of such routes. This allows us to check easily if the routing decision in the current namespace depends on the packet source. Will be used by the next patch. Suggested-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 11月, 2019 1 次提交
-
-
由 Hangbin Liu 提交于
Previously we will return directly if (!rt || !rt->fib6_nh.fib_nh_gw_family) in function rt6_probe(), but after commit cc3a86c8 ("ipv6: Change rt6_probe to take a fib6_nh"), the logic changed to return if there is fib_nh_gw_family. Fixes: cc3a86c8 ("ipv6: Change rt6_probe to take a fib6_nh") Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 11月, 2019 1 次提交
-
-
由 Eric Dumazet 提交于
While looking at a syzbot KCSAN report [1], I found multiple issues in this code : 1) fib6_nh->last_probe has an initial value of 0. While probably okay on 64bit kernels, this causes an issue on 32bit kernels since the time_after(jiffies, 0 + interval) might be false ~24 days after boot (for HZ=1000) 2) The data-race found by KCSAN I could use READ_ONCE() and WRITE_ONCE(), but we also can take the opportunity of not piling-up too many rt6_probe_deferred() works by using instead cmpxchg() so that only one cpu wins the race. [1] BUG: KCSAN: data-race in find_match / find_match write to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 1: rt6_probe net/ipv6/route.c:663 [inline] find_match net/ipv6/route.c:757 [inline] find_match+0x5bd/0x790 net/ipv6/route.c:733 __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831 find_rr_leaf net/ipv6/route.c:852 [inline] rt6_select net/ipv6/route.c:896 [inline] fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164 ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200 ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452 fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117 ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484 ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497 ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049 ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150 inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106 inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121 __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169 tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline] tcp_xmit_probe_skb+0x19b/0x1d0 net/ipv4/tcp_output.c:3735 read to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 0: rt6_probe net/ipv6/route.c:657 [inline] find_match net/ipv6/route.c:757 [inline] find_match+0x521/0x790 net/ipv6/route.c:733 __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831 find_rr_leaf net/ipv6/route.c:852 [inline] rt6_select net/ipv6/route.c:896 [inline] fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164 ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200 ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452 fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117 ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484 ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497 ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049 ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150 inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106 inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121 __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 18894 Comm: udevd Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: cc3a86c8 ("ipv6: Change rt6_probe to take a fib6_nh") Fixes: f547fac6 ("ipv6: rate-limit probes for neighbourless routes") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 11月, 2019 1 次提交
-
-
由 Matteo Croce 提交于
The same code which recognizes ICMP error packets is duplicated several times. Use the icmp_is_err() and icmpv6_is_err() helpers instead, which do the same thing. ip_multipath_l3_keys() and tcf_nat_act() didn't check for all the error types, assume that they should instead. Signed-off-by: NMatteo Croce <mcroce@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 11月, 2019 1 次提交
-
-
由 Eric Dumazet 提交于
Faster jhash2() can be used instead of jhash(), since IPv6 addresses have the needed alignment requirement. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 10月, 2019 1 次提交
-
-
由 Mahesh Bandewar 提交于
This reverts commit b0818f80. Started seeing weird behavior after this patch especially in the IPv6 code path. Haven't root caused it, but since this was applied to net branch, taking a precautionary measure to revert it and look / analyze those failures Revert this now and I'll send a better fix after analysing / fixing the weirdness observed. CC: Eric Dumazet <edumazet@google.com> CC: Wei Wang <weiwan@google.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: NMahesh Bandewar <maheshb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 10月, 2019 1 次提交
-
-
由 Mahesh Bandewar 提交于
While invalidating the dst, we assign backhole_netdev instead of loopback device. However, this device does not have idev pointer and hence no ip6_ptr even if IPv6 is enabled. Possibly this has triggered the syzbot reported crash. The syzbot report does not have reproducer, however, this is the only device that doesn't have matching idev created. Crash instruction is : static inline bool ip6_ignore_linkdown(const struct net_device *dev) { const struct inet6_dev *idev = __in6_dev_get(dev); return !!idev->cnf.ignore_routes_with_linkdown; <= crash } Also ipv6 always assumes presence of idev and never checks for it being NULL (as does the above referenced code). So adding a idev for the blackhole_netdev to avoid this class of crashes in the future. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 9月, 2019 1 次提交
-
-
由 Stefano Brivio 提交于
This is the equivalent of commit 2c6b55f4 ("ipv6: fix neighbour resolution with raw socket") for ip6_confirm_neigh(): we can send a packet with MSG_CONFIRM on a raw socket for a connected route, so the gateway would be :: here, and we should pick the next hop using rt6_nexthop() instead. This was found by code review and, to the best of my knowledge, doesn't actually fix a practical issue: the destination address from the packet is not considered while confirming a neighbour, as ip6_confirm_neigh() calls choose_neigh_daddr() without passing the packet, so there are no similar issues as the one fixed by said commit. A possible source of issues with the existing implementation might come from the fact that, if we have a cached dst, we won't consider it, while rt6_nexthop() takes care of that. I might just not be creative enough to find a practical problem here: the only way to affect this with cached routes is to have one coming from an ICMPv6 redirect, but if the next hop is a directly connected host, there should be no topology for which a redirect applies here, and tests with redirected routes show no differences for MSG_CONFIRM (and MSG_PROBE) packets on raw sockets destined to a directly connected host. However, directly using the dst gateway here is not consistent anymore with neighbour resolution, and, in general, as we want the next hop, using rt6_nexthop() looks like the only sane way to fetch it. Reported-by: NGuillaume Nault <gnault@redhat.com> Signed-off-by: NStefano Brivio <sbrivio@redhat.com> Acked-by: NGuillaume Nault <gnault@redhat.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 9月, 2019 1 次提交
-
-
由 Maciej Żenczykowski 提交于
Fixes a stupid bug I recently introduced... ip6_route_info_create() returns an ERR_PTR(err) and not a NULL on error. Fixes: d55a2e37 ("net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)'") Cc: David Ahern <dsahern@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NMaciej Żenczykowski <maze@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 9月, 2019 3 次提交
-
-
由 Donald Sharp 提交于
When creating a v4 route that uses a v6 nexthop from a nexthop group. Allow the kernel to properly send the nexthop as v6 via the RTA_VIA attribute. Broken behavior: $ ip nexthop add via fe80::9 dev eth0 $ ip nexthop show id 1 via fe80::9 dev eth0 scope link $ ip route add 4.5.6.7/32 nhid 1 $ ip route show default via 10.0.2.2 dev eth0 4.5.6.7 nhid 1 via 254.128.0.0 dev eth0 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 $ Fixed behavior: $ ip nexthop add via fe80::9 dev eth0 $ ip nexthop show id 1 via fe80::9 dev eth0 scope link $ ip route add 4.5.6.7/32 nhid 1 $ ip route show default via 10.0.2.2 dev eth0 4.5.6.7 nhid 1 via inet6 fe80::9 dev eth0 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 $ v2, v3: Addresses code review comments from David Ahern Fixes: dcb1ecb5 (“ipv4: Prepare for fib6_nh from a nexthop object”) Signed-off-by: NDonald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
A change to the core nla helpers was missed during the push of the nexthop changes. rt6_fill_node_nexthop should be calling nla_nest_start_noflag not nla_nest_start. Currently, iproute2 does not print multipath data because of parsing issues with the attribute. Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Maciej Żenczykowski 提交于
There is a subtle change in behaviour introduced by: commit c7a1ce39 'ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create' Before that patch /proc/net/ipv6_route includes: 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo Afterwards /proc/net/ipv6_route includes: 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80240001 lo ie. the above commit causes the ::1/128 local (automatic) route to be flagged with RTF_ADDRCONF (0x040000). AFAICT, this is incorrect since these routes are *not* coming from RA's. As such, this patch restores the old behaviour. Fixes: c7a1ce39 ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create") Cc: David Ahern <dsahern@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: NMaciej Żenczykowski <maze@google.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2019 2 次提交
-
-
由 David Ahern 提交于
Simplify the unlock path in __ip6_rt_update_pmtu by using a single point where rcu_read_unlock is called. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
The nexthop path in rt6_update_exception_stamp_rt needs to call rcu_read_unlock if it fails to find a fib6_nh match rather than just returning. Fixes: e659ba31 ("ipv6: Handle all fib6_nh in a nexthop in exception handling") Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 7月, 2019 1 次提交
-
-
由 Matteo Croce 提交于
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.comSigned-off-by: NMatteo Croce <mcroce@redhat.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NKees Cook <keescook@chromium.org> Reviewed-by: NAaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 7月, 2019 1 次提交
-
-
由 David Ahern 提交于
Paul reported that l2tp sessions were broken after the commit referenced in the Fixes tag. Prior to this commit rt6_check returned NULL if the rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB entry. Restore that behavior. Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: NPaul Donohue <linux-kernel@PaulSD.com> Tested-by: NPaul Donohue <linux-kernel@PaulSD.com> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 7月, 2019 1 次提交
-
-
由 Stephen Suryaputra 提交于
Make the same support as commit 363887a2 ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") for outer IPv6. The hashing considers both IPv4 and IPv6 pkts when they are tunneled by IPv6 GRE. Signed-off-by: NStephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 7月, 2019 1 次提交
-
-
由 Mahesh Bandewar 提交于
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead". Signed-off-by: NMahesh Bandewar <maheshb@google.com> Tested-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-