- 05 6月, 2019 1 次提交
-
-
由 Florian Westphal 提交于
syzbot triggered following splat when strict netlink validation is enabled: net/ipv4/devinet.c:1766 suspicious rcu_dereference_check() usage! This occurs because we hold RTNL mutex, but no rcu read lock. The second call site holds both, so just switch to the _rtnl variant. Reported-by: syzbot+bad6e32808a3a97b1515@syzkaller.appspotmail.com Fixes: 2638eb8b ("net: ipv4: provide __rcu annotation for ifa_list") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 6月, 2019 3 次提交
-
-
由 Eric Dumazet 提交于
syzbot reported nasty use-after-free [1] Lets remove frag_list field from structs ip_fraglist_iter and ip6_fraglist_iter. This seens not needed anyway. [1] : BUG: KASAN: use-after-free in kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706 Read of size 8 at addr ffff888085a3cbc0 by task syz-executor303/8947 CPU: 0 PID: 8947 Comm: syz-executor303 Not tainted 5.2.0-rc2+ #12 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706 ip6_fragment+0x1ef4/0x2680 net/ipv6/ip6_output.c:882 __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144 ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179 dst_output include/net/dst.h:433 [inline] ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179 ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796 ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816 rawv6_push_pending_frames net/ipv6/raw.c:617 [inline] rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947 inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:671 ___sys_sendmsg+0x803/0x920 net/socket.c:2292 __sys_sendmsg+0x105/0x1d0 net/socket.c:2330 __do_sys_sendmsg net/socket.c:2339 [inline] __se_sys_sendmsg net/socket.c:2337 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x44add9 Code: e8 7c e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 05 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f826f33bce8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000006e7a18 RCX: 000000000044add9 RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005 RBP: 00000000006e7a10 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e7a1c R13: 00007ffcec4f7ebf R14: 00007f826f33c9c0 R15: 20c49ba5e353f7cf Allocated by task 8947: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497 slab_post_alloc_hook mm/slab.h:437 [inline] slab_alloc_node mm/slab.c:3269 [inline] kmem_cache_alloc_node+0x131/0x710 mm/slab.c:3579 __alloc_skb+0xd5/0x5e0 net/core/skbuff.c:199 alloc_skb include/linux/skbuff.h:1058 [inline] __ip6_append_data.isra.0+0x2a24/0x3640 net/ipv6/ip6_output.c:1519 ip6_append_data+0x1e5/0x320 net/ipv6/ip6_output.c:1688 rawv6_sendmsg+0x1467/0x35e0 net/ipv6/raw.c:940 inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:671 ___sys_sendmsg+0x803/0x920 net/socket.c:2292 __sys_sendmsg+0x105/0x1d0 net/socket.c:2330 __do_sys_sendmsg net/socket.c:2339 [inline] __se_sys_sendmsg net/socket.c:2337 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 8947: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kmem_cache_free+0x86/0x260 mm/slab.c:3698 kfree_skbmem net/core/skbuff.c:625 [inline] kfree_skbmem+0xc5/0x150 net/core/skbuff.c:619 __kfree_skb net/core/skbuff.c:682 [inline] kfree_skb net/core/skbuff.c:699 [inline] kfree_skb+0xf0/0x390 net/core/skbuff.c:693 kfree_skb_list+0x44/0x60 net/core/skbuff.c:708 __dev_xmit_skb net/core/dev.c:3551 [inline] __dev_queue_xmit+0x3034/0x36b0 net/core/dev.c:3850 dev_queue_xmit+0x18/0x20 net/core/dev.c:3914 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1532 neigh_output include/net/neighbour.h:511 [inline] ip6_finish_output2+0x1034/0x2550 net/ipv6/ip6_output.c:120 ip6_fragment+0x1ebb/0x2680 net/ipv6/ip6_output.c:863 __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144 ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179 dst_output include/net/dst.h:433 [inline] ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179 ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796 ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816 rawv6_push_pending_frames net/ipv6/raw.c:617 [inline] rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947 inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:671 ___sys_sendmsg+0x803/0x920 net/socket.c:2292 __sys_sendmsg+0x105/0x1d0 net/socket.c:2330 __do_sys_sendmsg net/socket.c:2339 [inline] __se_sys_sendmsg net/socket.c:2337 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff888085a3cbc0 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 0 bytes inside of 224-byte region [ffff888085a3cbc0, ffff888085a3cca0) The buggy address belongs to the page: page:ffffea0002168f00 refcount:1 mapcount:0 mapping:ffff88821b6f63c0 index:0x0 flags: 0x1fffc0000000200(slab) raw: 01fffc0000000200 ffffea00027bbf88 ffffea0002105b88 ffff88821b6f63c0 raw: 0000000000000000 ffff888085a3c080 000000010000000c 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888085a3ca80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888085a3cb00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc >ffff888085a3cb80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ ffff888085a3cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888085a3cc80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc Fixes: 0feca619 ("net: ipv6: add skbuff fraglist splitter") Fixes: c8b17be0 ("net: ipv4: add skbuff fraglist splitter") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
this_cpu_read(*X) is slightly faster than *this_cpu_ptr(X) Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
this_cpu_read(*X) is faster than *this_cpu_ptr(X) Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 6月, 2019 5 次提交
-
-
由 Florian Westphal 提交于
ifa_list is protected by rcu, yet code doesn't reflect this. Add the __rcu annotations and fix up all places that are now reported by sparse. I've done this in the same commit to not add intermediate patches that result in new warnings. Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Use in_dev_for_each_ifa_rcu/rtnl instead. This prevents sparse warnings once proper __rcu annotations are added. Signed-off-by: NFlorian Westphal <fw@strlen.de> t di# Last commands done (6 commands done): Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Netfilter hooks are always running under rcu read lock, use the new iterator macro so sparse won't complain once we add proper __rcu annotations. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
This also replaces spots that used for_primary_ifa(). for_primary_ifa() aborts the loop on the first secondary address seen. Replace it with either the rcu or rtnl variant of in_dev_for_each_ifa(), but two places will now also consider secondary addresses too: inet_addr_onlink() and inet_ifa_byprefix(). I do not understand why they should ignore secondary addresses. Why would a secondary address not be considered 'on link'? When matching a prefix, why ignore a matching secondary address? Other places get converted as well, but gain "->flags & SECONDARY" check. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
The ifa_list is protected either by rcu or rtnl lock, but the current iterators do not account for this. This adds two iterators as replacement, a later patch in the series will update them with the needed rcu/rtnl_dereference calls. Its not done in this patch yet to avoid sparse warnings -- the fields lack the proper __rcu annotation. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 6月, 2019 3 次提交
-
-
由 brakmo 提交于
Update BPF_CGROUP_RUN_PROG_INET_EGRESS() callers to support returning congestion notifications from the BPF programs. Signed-off-by: NLawrence Brakmo <brakmo@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Colin Ian King 提交于
The variable err is initialized with a value that is never read and err is reassigned a few statements later. This initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
.. so skb_make_writable can be removed soon. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 31 5月, 2019 11 次提交
-
-
由 Willem de Bruijn 提交于
TCP zerocopy takes a uarg reference for every skb, plus one for the tcp_sendmsg_locked datapath temporarily, to avoid reaching refcnt zero as it builds, sends and frees skbs inside its inner loop. UDP and RAW zerocopy do not send inside the inner loop so do not need the extra sock_zerocopy_get + sock_zerocopy_put pair. Commit 52900d22288ed ("udp: elide zerocopy operation in hot path") introduced extra_uref to pass the initial reference taken in sock_zerocopy_alloc to the first generated skb. But, sock_zerocopy_realloc takes this extra reference at the start of every call. With MSG_MORE, no new skb may be generated to attach the extra_uref to, so refcnt is incorrectly 2 with only one skb. Do not take the extra ref if uarg && !tcp, which implies MSG_MORE. Update extra_uref accordingly. This conditional assignment triggers a false positive may be used uninitialized warning, so have to initialize extra_uref at define. Changes v1->v2: fix typo in Fixes SHA1 Fixes: 52900d22 ("udp: elide zerocopy operation in hot path") Reported-by: Nsyzbot <syzkaller@googlegroups.com> Diagnosed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
af_inet sets sock->sk to NULL which trips strparser over: BUG: kernel NULL pointer dereference, address: 0000000000000012 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.2.0-rc1-00139-g14629453a6d3 #21 RIP: 0010:tcp_peek_len+0x10/0x60 RSP: 0018:ffffc02e41c54b98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9cf924c4e030 RCX: 0000000000000051 RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff9cf97128f480 RBP: ffff9cf9365e0300 R08: ffff9cf94fe7d2c0 R09: 0000000000000000 R10: 000000000000036b R11: ffff9cf939735e00 R12: ffff9cf91ad9ae40 R13: ffff9cf924c4e000 R14: ffff9cf9a8fcbaae R15: 0000000000000020 FS: 0000000000000000(0000) GS:ffff9cf9af7c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000012 CR3: 000000013920a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> strp_data_ready+0x48/0x90 tls_data_ready+0x22/0xd0 [tls] tcp_rcv_established+0x569/0x620 tcp_v4_do_rcv+0x127/0x1e0 tcp_v4_rcv+0xad7/0xbf0 ip_protocol_deliver_rcu+0x2c/0x1c0 ip_local_deliver_finish+0x41/0x50 ip_local_deliver+0x6b/0xe0 ? ip_protocol_deliver_rcu+0x1c0/0x1c0 ip_rcv+0x52/0xd0 ? ip_rcv_finish_core.isra.20+0x380/0x380 __netif_receive_skb_one_core+0x7e/0x90 netif_receive_skb_internal+0x42/0xf0 napi_gro_receive+0xed/0x150 nfp_net_poll+0x7a2/0xd30 [nfp] ? kmem_cache_free_bulk+0x286/0x310 net_rx_action+0x149/0x3b0 __do_softirq+0xe3/0x30a ? handle_irq_event_percpu+0x6a/0x80 irq_exit+0xe8/0xf0 do_IRQ+0x85/0xd0 common_interrupt+0xf/0xf </IRQ> RIP: 0010:cpuidle_enter_state+0xbc/0x450 To avoid this issue set sock->sk after sk_prot->close. My grepping and testing did not discover any code which would depend on the current behaviour. Fixes: c46234eb ("tls: RX path for ktls") Reported-by: NDavid Beckett <david.beckett@netronome.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pablo Neira Ayuso 提交于
Deal with the IPCB() area away from the iterators. The bridge codebase has its own control buffer layout, move specific IP control buffer handling into the IPv4 codepath. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pablo Neira Ayuso 提交于
This patch exposes a new API to refragment a skbuff. This allows you to split either a linear skbuff or to force the refragmentation of an existing fraglist using a different mtu. The API consists of: * ip_frag_init(), that initializes the internal state of the transformer. * ip_frag_next(), that allows you to fetch the next fragment. This function internally allocates the skbuff that represents the fragment, it pushes the IPv4 header, and it also copies the payload for each fragment. The ip_frag_state object stores the internal state of the splitter. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pablo Neira Ayuso 提交于
This patch adds the skbuff fraglist splitter. This API provides an iterator to transform the fraglist into single skbuff objects, it consists of: * ip_fraglist_init(), that initializes the internal state of the fraglist splitter. * ip_fraglist_prepare(), that restores the IPv4 header on the fragments. * ip_fraglist_next(), that retrieves the fragment from the fraglist and it updates the internal state of the splitter to point to the next fragment skbuff in the fraglist. The ip_fraglist_iter object stores the internal state of the iterator. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Baron 提交于
Add the ability to add a backup TFO key as: # echo "x-x-x-x,x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key The key before the comma acks as the primary TFO key and the key after the comma is the backup TFO key. This change is intended to be backwards compatible since if only one key is set, userspace will simply read back that single key as follows: # echo "x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key # cat /proc/sys/net/ipv4/tcp_fastopen_key x-x-x-x Signed-off-by: NJason Baron <jbaron@akamai.com> Signed-off-by: NChristoph Paasch <cpaasch@apple.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Baron 提交于
Add support for get/set of an optional backup key via TCP_FASTOPEN_KEY, in addition to the current 'primary' key. The primary key is used to encrypt and decrypt TFO cookies, while the backup is only used to decrypt TFO cookies. The backup key is used to maximize successful TFO connections when TFO keys are rotated. Currently, TCP_FASTOPEN_KEY allows a single 16-byte primary key to be set. This patch now allows a 32-byte value to be set, where the first 16 bytes are used as the primary key and the second 16 bytes are used for the backup key. Similarly, for getsockopt(), we can receive a 32-byte value as output if requested. If a 16-byte value is used to set the primary key via TCP_FASTOPEN_KEY, then any previously set backup key will be removed. Signed-off-by: NJason Baron <jbaron@akamai.com> Signed-off-by: NChristoph Paasch <cpaasch@apple.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Baron 提交于
We would like to be able to rotate TFO keys while minimizing the number of client cookies that are rejected. Currently, we have only one key which can be used to generate and validate cookies, thus if we simply replace this key clients can easily have cookies rejected upon rotation. We propose having the ability to have both a primary key and a backup key. The primary key is used to generate as well as to validate cookies. The backup is only used to validate cookies. Thus, keys can be rotated as: 1) generate new key 2) add new key as the backup key 3) swap the primary and backup key, thus setting the new key as the primary We don't simply set the new key as the primary key and move the old key to the backup slot because the ip may be behind a load balancer and we further allow for the fact that all machines behind the load balancer will not be updated simultaneously. We make use of this infrastructure in subsequent patches. Suggested-by: NIgor Lubashev <ilubashe@akamai.com> Signed-off-by: NJason Baron <jbaron@akamai.com> Signed-off-by: NChristoph Paasch <cpaasch@apple.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Paasch 提交于
Restructure __tcp_fastopen_cookie_gen() to take a 'struct crypto_cipher' argument and rename it as __tcp_fastopen_cookie_gen_cipher(). Subsequent patches will provide different ciphers based on which key is being used for the cookie generation. Signed-off-by: NChristoph Paasch <cpaasch@apple.com> Signed-off-by: NJason Baron <jbaron@akamai.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Young Xiao 提交于
The TCP option parsing routines in tcp_parse_options function could read one byte out of the buffer of the TCP options. 1 while (length > 0) { 2 int opcode = *ptr++; 3 int opsize; 4 5 switch (opcode) { 6 case TCPOPT_EOL: 7 return; 8 case TCPOPT_NOP: /* Ref: RFC 793 section 3.1 */ 9 length--; 10 continue; 11 default: 12 opsize = *ptr++; //out of bound access If length = 1, then there is an access in line2. And another access is occurred in line 12. This would lead to out-of-bound access. Therefore, in the patch we check that the available data length is larger enough to pase both TCP option code and size. Signed-off-by: NYoung Xiao <92siuyang@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
The smp_store_release call in fqdir_exit cannot protect the setting of fqdir->dead as claimed because its memory barrier is only guaranteed to be one-way and the barrier precedes the setting of fqdir->dead. IOW it doesn't provide any barriers between fq->dir and the following hash table destruction. In fact, the code is safe anyway because call_rcu does provide both the memory barrier as well as a guarantee that when the destruction work starts executing all RCU readers will see the updated value for fqdir->dead. Therefore this patch removes the unnecessary smp_store_release call as well as the corresponding READ_ONCE on the read-side in order to not confuse future readers of this code. Comments have been added in their places. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 5月, 2019 7 次提交
-
-
由 David Ahern 提交于
Allow the creation of nexthop groups which reference other nexthop objects to create multipath routes: +--------------+ +------------+ +--------------+ | | nh nh_grp --->| nh_grp_entry |-+ +------------+ +---------|----+ ^ | | +------------+ +----------------+ +--->| nh, weight | nh_parent +------------+ A group entry points to a nexthop with a weight for that hop within the group. The nexthop has a list_head, grp_list, for tracking which groups it is a member of and the group entry has a reference back to the parent. The grp_list is used when a nexthop is deleted - to efficiently remove it from groups using it. If a nexthop group spec is given, no other attributes can be set. Each nexthop id in a group spec must already exist. Similar to single nexthops, the specification of a nexthop group can be updated so that data is managed with rcu locking. Add path selection function to account for multiple paths and add ipv{4,6}_good_nh helpers to know that if a neighbor entry exists it is in a good state. Update NETDEV event handling to rebalance multipath nexthop groups if a nexthop is deleted due to a link event (down or unregister). When a nexthop is removed any groups using it are updated. Groups using a nexthop a tracked via a grp_list. Nexthop dumps can be limited to groups only by adding NHA_GROUPS to the request. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Add support for NHA_ENCAP and NHA_ENCAP_TYPE. Leverages the existing code for lwtunnel within fib_nh_common, so the only change needed is handling the attributes in the nexthop code. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Handle IPv6 gateway in a nexthop spec. If nh_family is set to AF_INET6, NHA_GATEWAY is expected to be an IPv6 address. Add ipv6 option to gw in nh_config to hold the address, add fib6_nh to nh_info to leverage the ipv6 initialization and cleanup code. Update nh_fill_node to dump the v6 address. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Add support for IPv4 nexthops. If nh_family is set to AF_INET, then NHA_GATEWAY is expected to be an IPv4 address. Register for netdev events to be notified of admin up/down changes as well as deletes. A hash table is used to track nexthop per devices to quickly convert device events to the affected nexthops. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Barebones start point for nexthops. Implementation for RTM commands, notifications, management of rbtree for holding nexthops by id, and kernel side data structures for nexthops and nexthop config. Nexthops are maintained in an rbtree sorted by id. Similar to routes, nexthops are configured per namespace using netns_nexthop struct added to struct net. Nexthop notifications are sent when a nexthop is added or deleted, but NOT if the delete is due to a device event or network namespace teardown (which also involves device events). Applications are expected to use the device down event to flush nexthops and any routes used by the nexthops. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
As caught by syzbot [1], the rcu grace period that is respected before fqdir_rwork_fn() proceeds and frees fqdir is not enough to prevent inet_frag_destroy_rcu() being run after the freeing. We need a proper rcu_barrier() synchronization to replace the one we had in inet_frags_fini() We also have to fix a potential problem at module removal : inet_frags_fini() needs to make sure that all queued work queues (fqdir_rwork_fn) have completed, otherwise we might call kmem_cache_destroy() too soon and get another use-after-free. [1] BUG: KASAN: use-after-free in inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201 Read of size 8 at addr ffff88806ed47a18 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0-rc1+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201 __rcu_reclaim kernel/rcu/rcu.h:222 [inline] rcu_do_batch kernel/rcu/tree.c:2092 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2310 [inline] rcu_core+0xba5/0x1500 kernel/rcu/tree.c:2291 __do_softirq+0x25c/0x94c kernel/softirq.c:293 invoke_softirq kernel/softirq.c:374 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:414 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: ff ff 48 89 df e8 f2 95 8c fa eb 82 e9 07 00 00 00 0f 00 2d e4 45 4b 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d d4 45 4b 00 fb f4 <c3> 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 8e 18 42 fa e8 99 RSP: 0018:ffff8880a98e7d78 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff1164e11 RBX: ffff8880a98d4340 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a98d4bbc RBP: ffff8880a98e7da8 R08: ffff8880a98d4340 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: ffffffff88b27078 R14: 0000000000000001 R15: 0000000000000000 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571 default_idle_call+0x36/0x90 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x377/0x560 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354 start_secondary+0x34e/0x4c0 arch/x86/kernel/smpboot.c:267 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 8877: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 kmem_cache_alloc_trace+0x151/0x750 mm/slab.c:3555 kmalloc include/linux/slab.h:547 [inline] kzalloc include/linux/slab.h:742 [inline] fqdir_init include/net/inet_frag.h:115 [inline] ipv6_frags_init_net+0x48/0x460 net/ipv6/reassembly.c:513 ops_init+0xb3/0x410 net/core/net_namespace.c:130 setup_net+0x2d3/0x740 net/core/net_namespace.c:316 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206 ksys_unshare+0x440/0x980 kernel/fork.c:2692 __do_sys_unshare kernel/fork.c:2760 [inline] __se_sys_unshare kernel/fork.c:2758 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 17: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kfree+0xcf/0x220 mm/slab.c:3755 fqdir_rwork_fn+0x33/0x40 net/ipv4/inet_fragment.c:154 process_one_work+0x989/0x1790 kernel/workqueue.c:2269 worker_thread+0x98/0xe40 kernel/workqueue.c:2415 kthread+0x354/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 The buggy address belongs to the object at ffff88806ed47a00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 24 bytes inside of 512-byte region [ffff88806ed47a00, ffff88806ed47c00) The buggy address belongs to the page: page:ffffea0001bb51c0 refcount:1 mapcount:0 mapping:ffff8880aa400940 index:0x0 flags: 0x1fffc0000000200(slab) raw: 01fffc0000000200 ffffea000282a788 ffffea0001bb53c8 ffff8880aa400940 raw: 0000000000000000 ffff88806ed47000 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88806ed47900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88806ed47980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88806ed47a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88806ed47a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88806ed47b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 3c8fc878 ("inet: frags: rework rhashtable dismantle") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
fqdir_init() is not fast path and is getting bigger. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 5月, 2019 9 次提交
-
-
由 Colin Ian King 提交于
The pointer n is being assigned a value however this value is never read in the code block and the end of the code block continues to the next loop iteration. Clean up the code by removing the redundant assignment. Fixes: 1bff1a0c ("ipv4: Add function to send route updates") Addresses-Coverity: ("Unused value") Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
syszbot found an interesting use-after-free [1] happening while IPv4 fragment rhashtable was destroyed at netns dismantle. While no insertions can possibly happen at the time a dismantling netns is destroying this rhashtable, timers can still fire and attempt to remove elements from this rhashtable. This is forbidden, since rhashtable_free_and_destroy() has no synchronization against concurrent inserts and deletes. Add a new fqdir->dead flag so that timers do not attempt a rhashtable_remove_fast() operation. We also have to respect an RCU grace period before starting the rhashtable_free_and_destroy() from process context, thus we use rcu_work infrastructure. This is a refinement of a prior rough attempt to fix this bug : https://marc.info/?l=linux-netdev&m=153845936820900&w=2 Since the rhashtable cleanup is now deferred to a work queue, netns dismantles should be slightly faster. [1] BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:194 [inline] BUG: KASAN: use-after-free in rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212 Read of size 8 at addr ffff8880a6497b70 by task kworker/0:0/5 CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.2.0-rc1+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events rht_deferred_worker Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 __read_once_size include/linux/compiler.h:194 [inline] rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212 rht_deferred_worker+0x111/0x2030 lib/rhashtable.c:411 process_one_work+0x989/0x1790 kernel/workqueue.c:2269 worker_thread+0x98/0xe40 kernel/workqueue.c:2415 kthread+0x354/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Allocated by task 32687: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 __do_kmalloc_node mm/slab.c:3620 [inline] __kmalloc_node+0x4e/0x70 mm/slab.c:3627 kmalloc_node include/linux/slab.h:590 [inline] kvmalloc_node+0x68/0x100 mm/util.c:431 kvmalloc include/linux/mm.h:637 [inline] kvzalloc include/linux/mm.h:645 [inline] bucket_table_alloc+0x90/0x480 lib/rhashtable.c:178 rhashtable_init+0x3f4/0x7b0 lib/rhashtable.c:1057 inet_frags_init_net include/net/inet_frag.h:109 [inline] ipv4_frags_init_net+0x182/0x410 net/ipv4/ip_fragment.c:683 ops_init+0xb3/0x410 net/core/net_namespace.c:130 setup_net+0x2d3/0x740 net/core/net_namespace.c:316 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206 ksys_unshare+0x440/0x980 kernel/fork.c:2692 __do_sys_unshare kernel/fork.c:2760 [inline] __se_sys_unshare kernel/fork.c:2758 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 7: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kfree+0xcf/0x220 mm/slab.c:3755 kvfree+0x61/0x70 mm/util.c:460 bucket_table_free+0x69/0x150 lib/rhashtable.c:108 rhashtable_free_and_destroy+0x165/0x8b0 lib/rhashtable.c:1155 inet_frags_exit_net+0x3d/0x50 net/ipv4/inet_fragment.c:152 ipv4_frags_exit_net+0x73/0x90 net/ipv4/ip_fragment.c:695 ops_exit_list.isra.0+0xaa/0x150 net/core/net_namespace.c:154 cleanup_net+0x3fb/0x960 net/core/net_namespace.c:553 process_one_work+0x989/0x1790 kernel/workqueue.c:2269 worker_thread+0x98/0xe40 kernel/workqueue.c:2415 kthread+0x354/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 The buggy address belongs to the object at ffff8880a6497b40 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 48 bytes inside of 1024-byte region [ffff8880a6497b40, ffff8880a6497f40) The buggy address belongs to the page: page:ffffea0002992580 refcount:1 mapcount:0 mapping:ffff8880aa400ac0 index:0xffff8880a64964c0 compound_mapcount: 0 flags: 0x1fffc0000010200(slab|head) raw: 01fffc0000010200 ffffea0002916e88 ffffea000218fe08 ffff8880aa400ac0 raw: ffff8880a64964c0 ffff8880a6496040 0000000100000005 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880a6497a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a6497a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff8880a6497b00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ ffff8880a6497b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a6497c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Following patch will add rcu grace period before fqdir rhashtable destruction, so we need to dynamically allocate fqdir structures to not force expensive synchronize_rcu() calls in netns dismantle path. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
fqdir will soon be dynamically allocated. We need to reach the struct net pointer from fqdir, so add it, and replace the various container_of() constructs by direct access to the new field. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
And pass an extra parameter, since we will soon dynamically allocate fqdir structures. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
(struct net *)->ipv4.fqdir will soon be a pointer, so make sure ip4_frags_ns_ctl_table[] does not reference init_net. ip4_frags_ns_ctl_register() can perform the needed initialization for all netns. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename the @frags fields from structs netns_ipv4, netns_ipv6, netns_nf_frag and netns_ieee802154_lowpan to @fqdir Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
1) struct netns_frags is renamed to struct fqdir This structure is really holding many frag queues in a hash table. 2) (struct inet_frag_queue)->net field is renamed to fqdir since net is generally associated to a 'struct net' pointer in networking stack. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 5月, 2019 1 次提交
-
-
由 Gen Zhang 提交于
In function ip_ra_control(), the pointer new_ra is allocated a memory space via kmalloc(). And it is used in the following codes. However, when there is a memory allocation error, kmalloc() fails. Thus null pointer dereference may happen. And it will cause the kernel to crash. Therefore, we should check the return value and handle the error. Signed-off-by: NGen Zhang <blackgod016574@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-