- 17 8月, 2017 1 次提交
-
-
由 John Fastabend 提交于
To complete the sendmsg_locked and sendpage_locked implementation add the hooks for af_inet6 as well. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 8月, 2017 4 次提交
-
-
由 Florian Westphal 提交于
Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
__dev_get_by_index assumes RTNL is held, use _rcu version instead. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Based on a syzkaller report [1], I found that a per cpu allocation failure in snmp6_alloc_dev() would then lead to NULL dereference in ip6_route_dev_notify(). It seems this is a very old bug, thus no Fixes tag in this submission. Let's add in6_dev_put_clear() helper, as we will probably use it elsewhere (once available/present in net-next) [1] kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff88019f456680 task.stack: ffff8801c6e58000 RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline] RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline] RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202 RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000 RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001 RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37 R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8 FS: 00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0 DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211 in6_dev_put include/net/addrconf.h:335 [inline] ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678 call_netdevice_notifiers net/core/dev.c:1694 [inline] rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107 rollback_registered+0x1be/0x3c0 net/core/dev.c:7149 register_netdevice+0xbcd/0xee0 net/core/dev.c:7587 register_netdev+0x1a/0x30 net/core/dev.c:7669 loopback_net_init+0x76/0x160 drivers/net/loopback.c:214 ops_init+0x10a/0x570 net/core/net_namespace.c:118 setup_net+0x313/0x710 net/core/net_namespace.c:294 copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418 create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206 SYSC_unshare kernel/fork.c:2347 [inline] SyS_unshare+0x653/0xfa0 kernel/fork.c:2297 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4512c9 RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3 f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6 0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85 RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP: ffff8801c6e5f1b0 RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP: ffff8801c6e5f1b0 RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP: ffff8801c6e5f1b0 ---[ end trace e441d046c6410d31 ]--- Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
IPv6 routes currently lack nexthop flags as in IPv4. This has several implications. In the forwarding path, it requires us to check the carrier state of the nexthop device and potentially ignore a linkdown route, instead of checking for RTNH_F_LINKDOWN. It also requires capable drivers to use the user facing IPv6-specific route flags to provide offload indication, instead of using the nexthop flags as in IPv4. Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to provide offload indication instead of the RTF_OFFLOAD flag, which is removed while it's still not part of any official kernel release. In the near future we would like to use the field for the RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might not be ready in time for the current cycle. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 8月, 2017 2 次提交
-
-
由 Eric Dumazet 提交于
Filtering the ACK packet was not put at the right place. At this place, we already allocated a child and put it into accept queue. We absolutely need to call tcp_child_process() to release its spinlock, or we will deadlock at accept() or close() time. Found by syzkaller team (Thanks a lot !) Fixes: 8fac365f ("tcp: Add a tcp_filter hook before handle ack packet") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: Chenbo Feng <fengc@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Wang 提交于
When a dst is created by addrconf_dst_alloc() for a host route or an anycast route, dst->dev points to loopback dev while rt6->rt6i_idev points to a real device. When the real device goes down, the current cleanup code only checks for dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the refcount leak on the real device in the above situation. This patch makes sure to always release the refcount taken on rt6->rt6i_idev during dst_dev_put(). Fixes: 587fea74 ("ipv6: mark DST_NOGC and remove the operation of dst_free()") Reported-by: NJohn Stultz <john.stultz@linaro.org> Tested-by: NJohn Stultz <john.stultz@linaro.org> Tested-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NWei Wang <weiwan@google.com> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 8月, 2017 1 次提交
-
-
由 Willem de Bruijn 提交于
When iteratively building a UDP datagram with MSG_MORE and that datagram exceeds MTU, consistently choose UFO or fragmentation. Once skb_is_gso, always apply ufo. Conversely, once a datagram is split across multiple skbs, do not consider ufo. Sendpage already maintains the first invariant, only add the second. IPv6 does not have a sendpage implementation to modify. A gso skb must have a partial checksum, do not follow sk_no_check_tx in udp_send_skb. Found by syzkaller. Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach") Reported-by: NAndrey Konovalov <andreyknvl@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 8月, 2017 2 次提交
-
-
由 David Ahern 提交于
ndisc_notify is used to send unsolicited neighbor advertisements (e.g., on a link up). Currently, the ndisc notifier is run before the addrconf notifer which means NA's are not sent for link-local addresses which are added by the addrconf notifier. Fix by lowering the priority of the ndisc notifier. Setting the priority to ADDRCONF_NOTIFY_PRIORITY - 5 means it runs after addrconf and before the route notifier which is ADDRCONF_NOTIFY_PRIORITY - 10. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
This change allows us to later indicate to rtnetlink core that certain doit functions should be called without acquiring rtnl_mutex. This change should have no effect, we simply replace the last (now unused) calcit argument with the new flag. Signed-off-by: NFlorian Westphal <fw@strlen.de> Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 8月, 2017 2 次提交
-
-
由 Vincent Bernat 提交于
If the user hasn't installed any custom rules, don't go through the whole FIB rules layer. This is pretty similar to f4530fa5 (ipv4: Avoid overhead when no custom FIB rules are installed). Using a micro-benchmark module [1], timing ip6_route_output() with get_cycles(), with 40,000 routes in the main routing table, before this patch: min=606 max=12911 count=627 average=1959 95th=4903 90th=3747 50th=1602 mad=821 table=254 avgdepth=21.8 maxdepth=39 value │ ┊ count 600 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 199 880 │▒▒▒░░░░░░░░░░░░░░░░ 43 1160 │▒▒▒░░░░░░░░░░░░░░░░░░░░ 48 1440 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░ 43 1720 │▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░ 59 2000 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 50 2280 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 26 2560 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 31 2840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 28 3120 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 17 3400 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 17 3680 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 3960 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 11 4240 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 6 4520 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 6 4800 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9 After: min=544 max=11687 count=627 average=1776 95th=4546 90th=3585 50th=1227 mad=565 table=254 avgdepth=21.8 maxdepth=39 value │ ┊ count 540 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 201 800 │▒▒▒▒▒░░░░░░░░░░░░░░░░ 63 1060 │▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░ 68 1320 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░ 39 1580 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 32 1840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 32 2100 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 34 2360 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 33 2620 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 26 2880 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 22 3140 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9 3400 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 3660 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9 3920 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 4180 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 4440 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 At the frequency of the host during the bench (~ 3.7 GHz), this is about a 100 ns difference on the median value. A next step would be to collapse local and main tables, as in 0ddcf43d (ipv4: FIB Local/MAIN table collapse). [1]: https://github.com/vincentbernat/network-lab/blob/master/lab-routes-ipv6/kbench_mod.cSigned-off-by: NVincent Bernat <vincent@bernat.im> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
skb_warn_bad_offload triggers a warning when an skb enters the GSO stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL checksum offload set. Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise") observed that SKB_GSO_DODGY producers can trigger the check and that passing those packets through the GSO handlers will fix it up. But, the software UFO handler will set ip_summed to CHECKSUM_NONE. When __skb_gso_segment is called from the receive path, this triggers the warning again. Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On Tx these two are equivalent. On Rx, this better matches the skb state (checksum computed), as CHECKSUM_NONE here means no checksum computed. See also this thread for context: http://patchwork.ozlabs.org/patch/799015/ Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise") Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 8月, 2017 8 次提交
-
-
由 David Lebrun 提交于
This patch implements the following seg6local actions. - SEG6_LOCAL_ACTION_END: regular SRH processing. The DA of the packet is updated to the next segment and forwarded accordingly. - SEG6_LOCAL_ACTION_END_X: same as above, except that the packet is forwarded to the specified IPv6 next-hop. - SEG6_LOCAL_ACTION_END_DX6: decapsulate the packet and forward to inner IPv6 packet to the specified IPv6 next-hop. - SEG6_LOCAL_ACTION_END_B6: insert the specified SRH directly after the IPv6 header of the packet. - SEG6_LOCAL_ACTION_END_B6_ENCAP: encapsulate the packet within an outer IPv6 header, containing the specified SRH. Signed-off-by: NDavid Lebrun <david.lebrun@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Lebrun 提交于
This patch adds the necessary functions to parse, fill, and compare seg6local rtnetlink attributes, for all defined action parameters. - The SRH parameter defines an SRH to be inserted or encapsulated. - The TABLE parameter defines the table to use for the route lookup of the next segment or the inner decapsulated packet. - The NH4 parameter defines the IPv4 next-hop for an inner decapsulated IPv4 packet. - The NH6 parameter defines the IPv6 next-hop for the next segment or for an inner decapsulated IPv6 packet - The IIF parameter defines an ingress interface index. - The OIF parameter defines an egress interface index. Signed-off-by: NDavid Lebrun <david.lebrun@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Lebrun 提交于
This patch implements a new type of lightweight tunnel named seg6local. A seg6local lwt is defined by a type of action and a set of parameters. The action represents the operation to perform on the packets matching the lwt's route, and is not necessarily an encapsulation. The set of parameters are arguments for the processing function. Each action is defined in a struct seg6_action_desc within seg6_action_table[]. This structure contains the action, mandatory attributes, the processing function, and a static headroom size required by the action. The mandatory attributes are encoded as a bitmask field. The static headroom is set to a non-zero value when the processing function always add a constant number of bytes to the skb (e.g. the header size for encapsulations). To facilitate rtnetlink-related operations such as parsing, fill_encap, and cmp_encap, each type of action parameter is associated to three function pointers, in seg6_action_params[]. All actions defined in seg6_local.h are detailed in [1]. [1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01Signed-off-by: NDavid Lebrun <david.lebrun@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Lebrun 提交于
This patch exports the seg6_do_srh_encap() and seg6_do_srh_inline() functions. It also removes the CONFIG_IPV6_SEG6_INLINE knob that enabled the compilation of seg6_do_srh_inline(). This function is now built-in. Signed-off-by: NDavid Lebrun <david.lebrun@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Lebrun 提交于
The seg6_validate_srh() function only allows SRHs whose active segment is the first segment of the path. However, an application may insert an SRH whose active segment is not the first one. Such an application might be for example an SR-aware Virtual Network Function. This patch enables to insert SRHs with an arbitrary active segment. Signed-off-by: NDavid Lebrun <david.lebrun@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Add a second device index, sdif, to raw socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Add a second device index, sdif, to inet6 socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the ingress index is obtained from IPCB using inet_sdif and after tcp_v4_rcv tcp_v4_sdif is used. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Add a second device index, sdif, to udp socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Early demux lookups are handled in the next patch as part of INET_MATCH changes. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 8月, 2017 12 次提交
-
-
由 Ido Schimmel 提交于
Similar to commit 1c677b3d ("ipv4: fib: Add fib_info_hold() helper") and commit b423cb10 ("ipv4: fib: Export free_fib_info()") add an helper to hold a reference on rt6_info and export rt6_release() to drop it and potentially release the route. This is needed so that drivers capable of FIB offload could hold a reference on the route before queueing it for offload and drop it after the route has been programmed to the device's tables. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
When an interface is brought back up, the kernel tries to restore the host routes tied to its permanent addresses. However, if the host route was removed from the FIB, then we need to reinsert it. This is done by releasing the current dst and allocating a new, so as to not reuse a dst with obsolete values. Since this function is called under RTNL and using the same explanation from the previous patch, we can test if the route is in the FIB by checking its node pointer instead of its reference count. Tested using the following script and Andrey's reproducer mentioned in commit 8048ced9 ("net: ipv6: regenerate host route if moved to gc list") and linked below: $ ip link set dev lo up $ ip link add dummy1 type dummy $ ip -6 address add cafe::1/64 dev dummy1 $ ip link set dev lo down # cafe::1/128 is removed $ ip link set dev dummy1 up $ ip link set dev lo up The host route is correctly regenerated. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Link: http://lkml.kernel.org/r/CAAeHK+zSe82vc5gCRgr_EoUwiALPnWVdWJBPwJZBpbxYz=kGJw@mail.gmail.comSigned-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
When the loopback device is brought back up we need to check if the host route attached to the address is still in the FIB and regenerate one in case it's not. Host routes using the loopback device are always inserted into and removed from the FIB under RTNL (under which this function is called), so we can test their node pointer instead of the reference count in order to check if the route is in the FIB or not. Tested using the following script from Nicolas mentioned in commit a220445f ("ipv6: correctly add local routes when lo goes up"): $ ip link add dummy1 type dummy $ ip link set dummy1 up $ ip link set lo down ; ip link set lo up The host route is correctly regenerated. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
When a route is deleted its node pointer is set to NULL to indicate it's no longer linked to its node. Do the same for routes that are replaced. This will later allow us to test if a route is still in the FIB by checking its node pointer instead of its reference count. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
The code currently assumes that only FIB nodes can hold a reference on routes. Therefore, after fib6_purge_rt() has run and the route is no longer present in any intermediate nodes, it's assumed that its reference count would be 1 - taken by the node where it's currently stored. However, we're going to allow users other than the FIB to take a reference on a route, so this assumption is no longer valid and the BUG_ON() needs to be removed. Note that purging only takes place if the initial reference count is different than 1. I've left that check intact, as in the majority of systems (where routes are only referenced by the FIB), it does actually mean the route is present in intermediate nodes. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
Allow user space applications to see which routes are offloaded and which aren't by setting the RTNH_F_OFFLOAD flag when dumping them. To be consistent with IPv4, offload indication is provided on a per-nexthop basis. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
Dump all the FIB tables in each net namespace upon registration to the FIB notification chain so that the callee will have a complete view of the tables. The integrity of the dump is ensured by a per-table sequence counter that is incremented (under write lock) whenever a route is added or deleted from the table. All the sequence counters are read (under each table's read lock) and summed, prior and after the dump. In case the counters differ, then the dump is either restarted or the registration fails. While it's possible for a table to be modified after its counter has been read, this isn't really a problem. In case it happened before it was read the second time, then the comparison at the end will fail. If it happened afterwards, then we're guaranteed to be notified about the change, as the notification block is registered prior to the second read. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
Allow users of the FIB notification chain to receive a complete view of the IPv6 FIB rules upon registration to the chain. The integrity of the dump is ensured by a per-family sequence counter that is incremented (under RTNL) whenever a rule is added or deleted. All the sequence counters are read (under RTNL) and summed, prior and after the dump. In case the counters differ, then the dump is either restarted or the registration fails. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
As with IPv4, allow listeners of the FIB notification chain to receive notifications whenever a route is added, replaced or deleted. This is done by placing calls to the FIB notification chain in the two lowest level functions that end up performing these operations - namely, fib6_add_rt2node() and fib6_del_route(). Unlike IPv4, APPEND notifications aren't sent as the kernel doesn't distinguish between "append" (NLM_F_CREATE|NLM_F_APPEND) and "prepend" (NLM_F_CREATE). If NLM_F_EXCL isn't set, duplicate routes are always added after the existing duplicate routes. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
We're about to add IPv6 FIB offload support, so implement the necessary callbacks in IPv6 code, which will later allow us to add routes and rules notifications. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
As explained in commit 3c71006d ("ipv4: fib_rules: Check if rule is a default rule"), drivers supporting IPv6 FIB offload need to be able to sanitize the rules they don't support and potentially flush their tables. Add an IPv6 helper to check if a FIB rule is a default rule. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
After commit c2ed1880 ("net: ipv6: check route protocol when deleting routes"), ipv6 route checks rt protocol when trying to remove a rt entry. It introduced a side effect causing 'ip -6 route flush cache' not to work well. When flushing caches with iproute, all route caches get dumped from kernel then removed one by one by sending DELROUTE requests to kernel for each cache. The thing is iproute sends the request with the cache whose proto is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps it. But in kernel the rt_cache protocol is still 0, which causes the cache not to be matched and removed. So the real reason is rt6i_protocol in the route is not set when it is allocated. As David Ahern's suggestion, this patch is to set rt6i_protocol properly in the route when it is installed and remove the codes setting rtm_protocol according to rt6i_flags in rt6_fill_node. This is also an improvement to keep rt6i_protocol consistent with rtm_protocol. Fixes: c2ed1880 ("net: ipv6: check route protocol when deleting routes") Reported-by: NJianlin Shi <jishi@redhat.com> Suggested-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 8月, 2017 1 次提交
-
-
由 Julia Lawall 提交于
This reverts commit 3a3a4e30. inet6_add_protocol and inet6_del_protocol include casts that remove the effect of the const annotation on their parameter, leading to possible runtime crashes. Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 8月, 2017 3 次提交
-
-
由 Paolo Abeni 提交于
Since commit 67a51780 ("ipv6: udp: leverage scratch area helpers") udp6_recvmsg() read the skb len from the scratch area, to avoid a cache miss. But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their length exceeds the 16 bits available in the scratch area. As a side effect the length returned by recvmsg() is: <ingress datagram len> % (1<<16) This commit addresses the issue allocating one more bit in the IP6CB flags field and setting it for incoming jumbograms. Such field is still in the first cacheline, so at recvmsg() time we can check it and fallback to access skb->len if required, without a measurable overhead. Fixes: 67a51780 ("ipv6: udp: leverage scratch area helpers") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Sitnicki 提交于
There is no need to go through sk->sk_net to access the net namespace and its sysctl variables because we allocate the sock and initialize sk_net just a few lines earlier in the same routine. Signed-off-by: NJakub Sitnicki <jkbs@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
prequeue is a tcp receive optimization that moves part of rx processing from bh to process context. This only works if the socket being processed belongs to a process that is blocked in recv on that socket. In practice, this doesn't happen anymore that often because nowadays servers tend to use an event driven (epoll) model. Even normal client applications (web browsers) commonly use many tcp connections in parallel. This has measureable impact only in netperf (which uses plain recv and thus allows prequeue use) from host to locally running vm (~4%), however, there were no changes when using netperf between two physical hosts with ixgbe interfaces. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 7月, 2017 1 次提交
-
-
由 Paolo Abeni 提交于
When an early demuxed packet reaches __udp6_lib_lookup_skb(), the sk reference is retrieved and used, but the relevant reference count is leaked and the socket destructor is never called. Beyond leaking the sk memory, if there are pending UDP packets in the receive queue, even the related accounted memory is leaked. In the long run, this will cause persistent forward allocation errors and no UDP skbs (both ipv4 and ipv6) will be able to reach the user-space. Fix this by explicitly accessing the early demux reference before the lookup, and properly decreasing the socket reference count after usage. Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and the now obsoleted comment about "socket cache". The newly added code is derived from the current ipv4 code for the similar path. v1 -> v2: fixed the __udp6_lib_rcv() return code for resubmission, as suggested by Eric Reported-by: NSam Edwards <CFSworks@gmail.com> Reported-by: NMarc Haber <mh+netdev@zugschlus.de> Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 7月, 2017 1 次提交
-
-
由 Julia Lawall 提交于
The inet6_protocol structure is only passed as the first argument to inet6_add_protocol or inet6_del_protocol, both of which are declared as const. Thus the inet6_protocol structure itself can be const. Also drop __read_mostly where present on the newly const structures. Done with the help of Coccinelle. Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 7月, 2017 1 次提交
-
-
由 Stefano Brivio 提交于
RFC 2465 defines ipv6IfStatsOutFragFails as: "The number of IPv6 datagrams that have been discarded because they needed to be fragmented at this output interface but could not be." The existing implementation, instead, would increase the counter twice in case we fail to allocate room for single fragments: once for the fragment, once for the datagram. This didn't look intentional though. In one of the two affected affected failure paths, the double increase was simply a result of a new 'goto fail' statement, introduced to avoid a skb leak. The other path appears to be affected since at least 2.6.12-rc2. Reported-by: NSabrina Dubroca <sdubroca@redhat.com> Fixes: 1d325d21 ("ipv6: ip6_fragment: fix headroom tests and skb leak") Signed-off-by: NStefano Brivio <sbrivio@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 7月, 2017 1 次提交
-
-
由 Matvejchikov Ilya 提交于
The last (4th) argument of tcp_rcv_established() is redundant as it always equals to skb->len and the skb itself is always passed as 2th agrument. There is no reason to have it. Signed-off-by: NIlya V. Matveychikov <matvejchikov@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-