- 04 9月, 2017 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
This reverts commit 6d7b857d. There is a bug in fragmentation codes use of the percpu_counter API, that can cause issues on systems with many CPUs. The frag_mem_limit() just reads the global counter (fbc->count), without considering other CPUs can have upto batch size (130K) that haven't been subtracted yet. Due to the 3MBytes lower thresh limit, this become dangerous at >=24 CPUs (3*1024*1024/130000=24). The correct API usage would be to use __percpu_counter_compare() which does the right thing, and takes into account the number of (online) CPUs and batch size, to account for this and call __percpu_counter_sum() when needed. We choose to revert the use of the lib/percpu_counter API for frag memory accounting for several reasons: 1) On systems with CPUs > 24, the heavier fully locked __percpu_counter_sum() is always invoked, which will be more expensive than the atomic_t that is reverted to. Given systems with more than 24 CPUs are becoming common this doesn't seem like a good option. To mitigate this, the batch size could be decreased and thresh be increased. 2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX CPU, before SKBs are pushed into sockets on remote CPUs. Given NICs can only hash on L2 part of the IP-header, the NIC-RXq's will likely be limited. Thus, a fair chance that atomic add+dec happen on the same CPU. Revert note that commit 1d6119ba ("net: fix percpu memory leaks") removed init_frag_mem_limit() and instead use inet_frags_init_net(). After this revert, inet_frags_uninit_net() becomes empty. Fixes: 6d7b857d ("net: use lib/percpu_counter API for fragmentation mem accounting") Fixes: 1d6119ba ("net: fix percpu memory leaks") Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 9月, 2017 1 次提交
-
-
由 Yossi Kuperman 提交于
After commit dce4551c ("udp: preserve head state for IP_CMSG_PASSSEC") we preserve the secpath for the whole skb lifecycle, but we also end up leaking a reference to it. We must clear the head state on skb reception, if secpath is present. Fixes: dce4551c ("udp: preserve head state for IP_CMSG_PASSSEC") Signed-off-by: NYossi Kuperman <yossiku@mellanox.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 8月, 2017 2 次提交
-
-
由 Paolo Abeni 提交于
Currently, in the udp6 code, the dst cookie is not initialized/updated concurrently with the RX dst used by early demux. As a result, the dst_check() in the early_demux path always fails, the rx dst cache is always invalidated, and we can't really leverage significant gain from the demux lookup. Fix it adding udp6 specific variant of sk_rx_dst_set() and use it to set the dst cookie when the dst entry is really changed. The issue is there since the introduction of early demux for ipv6. Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast") Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
There are a few bugs around refcnt handling in the new BPF congestion control setsockopt: - The new ca is assigned to icsk->icsk_ca_ops even in the case where we cannot get a reference on it. This would lead to a use after free, since that ca is going away soon. - Changing the congestion control case doesn't release the refcnt on the previous ca. - In the reinit case, we first leak a reference on the old ca, then we call tcp_reinit_congestion_control on the ca that we have just assigned, leading to deinitializing the wrong ca (->release of the new ca on the old ca's data) and releasing the refcount on the ca that we actually want to use. This is visible by building (for example) BIC as a module and setting net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c from samples/bpf. This patch fixes the refcount issues, and moves reinit back into tcp core to avoid passing a ca pointer back to BPF. Fixes: 91b5b21c ("bpf: Add support for changing congestion control") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Acked-by: NLawrence Brakmo <brakmo@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 8月, 2017 2 次提交
-
-
由 Steffen Klassert 提交于
We use skb_availroom to calculate the skb tailroom for the ESP trailer. skb_availroom calculates the tailroom and subtracts this value by reserved_tailroom. However reserved_tailroom is a union with the skb mark. This means that we subtract the tailroom by the skb mark if set. Fix this by using skb_tailroom instead. Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
We allocate the page fragment for the ESP trailer inside a spinlock, but consume it outside of the lock. This is racy as some other cou could get the same page fragment then. Fix this by consuming the page fragment inside the lock too. Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 19 8月, 2017 3 次提交
-
-
由 Neal Cardwell 提交于
In some situations tcp_send_loss_probe() can realize that it's unable to send a loss probe (TLP), and falls back to calling tcp_rearm_rto() to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto() realizes that the RTO was eligible to fire immediately or at some point in the past (delta_us <= 0). Previously in such cases tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now + icsk_rto, which caused needless delays of hundreds of milliseconds (and non-linear behavior that made reproducible testing difficult). This commit changes the logic to schedule "overdue" RTOs ASAP, rather than at now + icsk_rto. Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)") Suggested-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roopa Prabhu 提交于
Syzkaller hit 'general protection fault in fib_dump_info' bug on commit 4.13-rc5.. Guilty file: net/ipv4/fib_semantics.c kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 2808 Comm: syz-executor0 Not tainted 4.13.0-rc5 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff880078562700 task.stack: ffff880078110000 RIP: 0010:fib_dump_info+0x388/0x1170 net/ipv4/fib_semantics.c:1314 RSP: 0018:ffff880078117010 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 00000000000000fe RCX: 0000000000000002 RDX: 0000000000000006 RSI: ffff880078117084 RDI: 0000000000000030 RBP: ffff880078117268 R08: 000000000000000c R09: ffff8800780d80c8 R10: 0000000058d629b4 R11: 0000000067fce681 R12: 0000000000000000 R13: ffff8800784bd540 R14: ffff8800780d80b5 R15: ffff8800780d80a4 FS: 00000000022fa940(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004387d0 CR3: 0000000079135000 CR4: 00000000000006f0 Call Trace: inet_rtm_getroute+0xc89/0x1f50 net/ipv4/route.c:2766 rtnetlink_rcv_msg+0x288/0x680 net/core/rtnetlink.c:4217 netlink_rcv_skb+0x340/0x470 net/netlink/af_netlink.c:2397 rtnetlink_rcv+0x28/0x30 net/core/rtnetlink.c:4223 netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline] netlink_unicast+0x4c4/0x6e0 net/netlink/af_netlink.c:1291 netlink_sendmsg+0x8c4/0xca0 net/netlink/af_netlink.c:1854 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 ___sys_sendmsg+0x779/0x8d0 net/socket.c:2035 __sys_sendmsg+0xd1/0x170 net/socket.c:2069 SYSC_sendmsg net/socket.c:2080 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2076 entry_SYSCALL_64_fastpath+0x1a/0xa5 RIP: 0033:0x4512e9 RSP: 002b:00007ffc75584cc8 EFLAGS: 00000216 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00000000004512e9 RDX: 0000000000000000 RSI: 0000000020f2cfc8 RDI: 0000000000000003 RBP: 000000000000000e R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000216 R12: fffffffffffffffe R13: 0000000000718000 R14: 0000000020c44ff0 R15: 0000000000000000 Code: 00 0f b6 8d ec fd ff ff 48 8b 85 f0 fd ff ff 88 48 17 48 8b 45 28 48 8d 78 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e cb 0c 00 00 48 8b 45 28 44 RIP: fib_dump_info+0x388/0x1170 net/ipv4/fib_semantics.c:1314 RSP: ffff880078117010 ---[ end trace 254a7af28348f88b ]--- This patch adds a res->fi NULL check. example run: $ip route get 0.0.0.0 iif virt1-0 broadcast 0.0.0.0 dev lo cache <local,brd> iif virt1-0 $ip route get 0.0.0.0 iif virt1-0 fibmatch RTNETLINK answers: No route to host Reported-by: Nidaifish <idaifish@gmail.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Fixes: b6179813 ("net: ipv4: RTM_GETROUTE: return matched fib result when requested") Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Matthew Dawson 提交于
Due to commit e6afc8ac ("udp: remove headers from UDP packets before queueing"), when udp packets are being peeked the requested extra offset is always 0 as there is no need to skip the udp header. However, when the offset is 0 and the next skb is of length 0, it is only returned once. The behaviour can be seen with the following python script: from socket import *; f=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0); g=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0); f.bind(('::', 0)); addr=('::1', f.getsockname()[1]); g.sendto(b'', addr) g.sendto(b'b', addr) print(f.recvfrom(10, MSG_PEEK)); print(f.recvfrom(10, MSG_PEEK)); Where the expected output should be the empty string twice. Instead, make sk_peek_offset return negative values, and pass those values to __skb_try_recv_datagram/__skb_try_recv_from_queue. If the passed offset to __skb_try_recv_from_queue is negative, the checked skb is never skipped. __skb_try_recv_from_queue will then ensure the offset is reset back to 0 if a peek is requested without an offset, unless no packets are found. Also simplify the if condition in __skb_try_recv_from_queue. If _off is greater then 0, and off is greater then or equal to skb->len, then (_off || skb->len) must always be true assuming skb->len >= 0 is always true. Also remove a redundant check around a call to sk_peek_offset in af_unix.c, as it double checked if MSG_PEEK was set in the flags. V2: - Moved the negative fixup into __skb_try_recv_from_queue, and remove now redundant checks - Fix peeking in udp{,v6}_recvmsg to report the right value when the offset is 0 V3: - Marked new branch in __skb_try_recv_from_queue as unlikely. Signed-off-by: NMatthew Dawson <matthew@mjdsystems.ca> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 8月, 2017 2 次提交
-
-
由 Eric Dumazet 提交于
While working on yet another syzkaller report, I found that our IP_MAX_MTU enforcements were not properly done. gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and final result can be bigger than IP_MAX_MTU :/ This is a problem because device mtu can be changed on other cpus or threads. While this patch does not fix the issue I am working on, it is probably worth addressing it. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Anuradha reported that statically added groups for interfaces enslaved to a VRF device were not persisting. The problem is that igmp queries and reports need to use the data in the in_dev for the real ingress device rather than the VRF device. Update igmp_rcv accordingly. Fixes: e58e4159 ("net: Enable support for VRF with ipv4 multicast") Reported-by: NAnuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 8月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
If fi->fib_metrics could not be allocated in fib_create_info() we attempt to dereference a NULL pointer in free_fib_info_rcu() : m = fi->fib_metrics; if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt)) kfree(m); Before my recent patch, we used to call kfree(NULL) and nothing wrong happened. Instead of using RCU to defer freeing while we are under memory stress, it seems better to take immediate action. This was reported by syzkaller team. Fixes: 3fb07daf ("ipv4: add reference counting to metrics") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 8月, 2017 3 次提交
-
-
由 Eric Dumazet 提交于
Filtering the ACK packet was not put at the right place. At this place, we already allocated a child and put it into accept queue. We absolutely need to call tcp_child_process() to release its spinlock, or we will deadlock at accept() or close() time. Found by syzkaller team (Thanks a lot !) Fixes: 8fac365f ("tcp: Add a tcp_filter hook before handle ack packet") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: Chenbo Feng <fengc@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
__tcp_ulp_find_autoload returns tcp_ulp_ops after taking a reference on the module. Then, if ->init fails, tcp_set_ulp propagates the error but nothing releases that reference. Fixes: 734942cc ("tcp: ULP infrastructure") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
"ip route get $daddr iif eth0 from $saddr" causes: BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50 Call Trace: ip_route_input_rcu+0x1535/0x1b50 ip_route_input_noref+0xf9/0x190 tcp_v4_early_demux+0x1a4/0x2b0 ip_rcv+0xbcb/0xc05 __netif_receive_skb+0x9c/0xd0 netif_receive_skb_internal+0x5a8/0x890 Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an iif was provided) or ip_route_output_key_hash_rcu. But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already associates the dst_entry with the skb. This clears the SKB_DST_NOREF bit (i.e. skb_dst_drop will release/free the entry while it should not). Thus only set the dst if we called ip_route_output_key_hash_rcu(). I tested this patch by running: while true;do ip r get 10.0.1.2;done > /dev/null & while true;do ip r get 10.0.1.2 iif eth0 from 10.0.1.1;done > /dev/null & ... and saw no crash or memory leak. Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: David Ahern <dsahern@gmail.com> Fixes: ba52d61e ("ipv4: route: restore skb_dst_set in inet_rtm_getroute") Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 8月, 2017 1 次提交
-
-
由 Willem de Bruijn 提交于
When iteratively building a UDP datagram with MSG_MORE and that datagram exceeds MTU, consistently choose UFO or fragmentation. Once skb_is_gso, always apply ufo. Conversely, once a datagram is split across multiple skbs, do not consider ufo. Sendpage already maintains the first invariant, only add the second. IPv6 does not have a sendpage implementation to modify. A gso skb must have a partial checksum, do not follow sk_no_check_tx in udp_send_skb. Found by syzkaller. Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach") Reported-by: NAndrey Konovalov <andreyknvl@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 8月, 2017 1 次提交
-
-
由 Nikolay Borisov 提交于
Commit dcd87999 ("igmp: net: Move igmp namespace init to correct file") moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This function is only called as part of per-namespace initialization, only if CONFIG_IP_MULTICAST is defined, otherwise igmp_mc_init() call in ip_init is compiled out, casuing the igmp pernet ops to not be registerd and those sysctl being left initialized with 0. However, there are certain functions, such as ip_mc_join_group which are always compiled and make use of some of those sysctls. Let's do a partial revert of the aforementioned commit and move the sysctl initialization into inet_init_net, that way they will always have sane values. Fixes: dcd87999 ("igmp: net: Move igmp namespace init to correct file") Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595Reported-by: NGerardo Exequiel Pozzi <vmlinuz386@gmail.com> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 8月, 2017 2 次提交
-
-
由 Willem de Bruijn 提交于
skb_warn_bad_offload triggers a warning when an skb enters the GSO stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL checksum offload set. Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise") observed that SKB_GSO_DODGY producers can trigger the check and that passing those packets through the GSO handlers will fix it up. But, the software UFO handler will set ip_summed to CHECKSUM_NONE. When __skb_gso_segment is called from the receive path, this triggers the warning again. Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On Tx these two are equivalent. On Rx, this better matches the skb state (checksum computed), as CHECKSUM_NONE here means no checksum computed. See also this thread for context: http://patchwork.ozlabs.org/patch/799015/ Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise") Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
With new TCP_FASTOPEN_CONNECT socket option, there is a possibility to call tcp_connect() while socket sk_dst_cache is either NULL or invalid. +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 connect(4, ..., ...) = 0 << sk->sk_dst_cache becomes obsolete, or even set to NULL >> +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000 We need to refresh the route otherwise bad things can happen, especially when syzkaller is running on the host :/ Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support") Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: NWei Wang <weiwan@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 8月, 2017 1 次提交
-
-
由 Steffen Klassert 提交于
esp_output_tail() and esp6_output_tail() can return negative and positive error values. We currently treat only negative values as errors, fix this to treat both cases as error. Fixes: fca11ebd ("esp4: Reorganize esp_output") Fixes: 383d0350 ("esp6: Reorganize esp_output") Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 04 8月, 2017 4 次提交
-
-
由 Neal Cardwell 提交于
Fix a TCP loss recovery performance bug raised recently on the netdev list, in two threads: (i) July 26, 2017: netdev thread "TCP fast retransmit issues" (ii) July 26, 2017: netdev thread: "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one outstanding TLP retransmission" The basic problem is that incoming TCP packets that did not indicate forward progress could cause the xmit timer (TLP or RTO) to be rearmed and pushed back in time. In certain corner cases this could result in the following problems noted in these threads: - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes could cause TCP to repeatedly schedule TLPs forever. We kept sending TLPs after every ~200ms, which elicited bogus SACKs, which caused more TLPs, ad infinitum; we never fired an RTO to fill in the holes. - Incoming data segments could, in some cases, cause us to reschedule our RTO or TLP timer further out in time, for no good reason. This could cause repeated inbound data to result in stalls in outbound data, in the presence of packet loss. This commit fixes these bugs by changing the TLP and RTO ACK processing to: (a) Only reschedule the xmit timer once per ACK. (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the ACK indicates sufficient forward progress (a packet was cumulatively ACKed, or we got a SACK for a packet that was sent before the most recent retransmit of the write queue head). This brings us back into closer compliance with the RFCs, since, as the comment for tcp_rearm_rto() notes, we should only restart the RTO timer after forward progress on the connection. Previously we were restarting the xmit timer even in these cases where there was no forward progress. As a side benefit, this commit simplifies and speeds up the TCP timer arming logic. We had been calling inet_csk_reset_xmit_timer() three times on normal ACKs that cumulatively acknowledged some data: 1) Once near the top of tcp_ack() to switch from TLP timer to RTO: if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) tcp_rearm_rto(sk); 2) Once in tcp_clean_rtx_queue(), to update the RTO: if (flag & FLAG_ACKED) { tcp_rearm_rto(sk); 3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO to TLP: if (icsk->icsk_pending == ICSK_TIME_RETRANS) tcp_schedule_loss_probe(sk); This commit, by only rescheduling the xmit timer once per ACK, simplifies the code and reduces CPU overhead. This commit was tested in an A/B test with Google web server traffic. SNMP stats and request latency metrics were within noise levels, substantiating that for normal web traffic patterns this is a rare issue. This commit was also tested with packetdrill tests to verify that it fixes the timer behavior in the corner cases discussed in the netdev threads mentioned above. This patch is a bug fix patch intended to be queued for -stable relases. Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)") Reported-by: NKlavs Klavsen <kl@vsen.dk> Reported-by: NMao Wenan <maowenan@huawei.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNandita Dukkipati <nanditad@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
Have tcp_schedule_loss_probe() base the TLP scheduling decision based on when the RTO *should* fire. This is to enable the upcoming xmit timer fix in this series, where tcp_schedule_loss_probe() cannot assume that the last timer installed was an RTO timer (because we are no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every ACK). So tcp_schedule_loss_probe() must independently figure out when an RTO would want to fire. In the new TLP implementation following in this series, we cannot assume that icsk_timeout was set based on an RTO; after processing a cumulative ACK the icsk_timeout we see can be from a previous TLP or RTO. So we need to independently recalculate the RTO time (instead of reading it out of icsk_timeout). Removing this dependency on the nature of icsk_timeout makes things a little easier to reason about anyway. Note that the old and new code should be equivalent, since they are both saying: "if the RTO is in the future, but at an earlier time than the normal TLP time, then set the TLP timer to fire when the RTO would have fired". Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)") Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNandita Dukkipati <nanditad@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
Pure refactor. This helper will be required in the xmit timer fix later in the patch series. (Because the TLP logic will want to make this calculation.) Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)") Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNandita Dukkipati <nanditad@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
syzkaller was able to trigger a divide by 0 in TCP stack [1] Issue here is that keepalive timer needs to be updated to not attempt to send a probe if the connection setup was deferred using TCP_FASTOPEN_CONNECT socket option added in linux-4.11 [1] divide error: 0000 [#1] SMP CPU: 18 PID: 0 Comm: swapper/18 Not tainted task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000 RIP: 0010:[<ffffffff8409cc0d>] [<ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160 Call Trace: <IRQ> [<ffffffff8409d951>] tcp_transmit_skb+0x11/0x20 [<ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0 [<ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160 [<ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230 [<ffffffff83b3f799>] call_timer_fn+0x39/0xf0 [<ffffffff83b40797>] run_timer_softirq+0x1d7/0x280 [<ffffffff83a04ddb>] __do_softirq+0xcb/0x257 [<ffffffff83ae03ac>] irq_exit+0x9c/0xb0 [<ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80 [<ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90 <EOI> [<ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0 [<ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0 Tested: Following packetdrill no longer crashes the kernel `echo 0 >/proc/sys/net/ipv4/tcp_timestamps` // Cache warmup: send a Fast Open cookie request 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress) +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop> +.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop> +0 > . 1:1(0) ack 1 +0 close(3) = 0 +0 > F. 1:1(0) ack 1 +0 < F. 1:1(0) ack 2 win 92 +0 > . 2:2(0) ack 2 +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 +.01 connect(4, ..., ...) = 0 +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0 +10 close(4) = 0 `echo 1 >/proc/sys/net/ipv4/tcp_timestamps` Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 8月, 2017 1 次提交
-
-
由 Yuchung Cheng 提交于
If the sender switches the congestion control during ECN-triggered cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to the ssthresh value calculated by the previous congestion control. If the previous congestion control is BBR that always keep ssthresh to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe step is to avoid assigning invalid ssthresh value when recovery ends. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 8月, 2017 2 次提交
-
-
由 K. Den 提交于
In the case that GRO is turned on and the original received packet is CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last csum-unnecessary point, which for instance could occur if the packet comes from another Linux guest on the same Linux host, we have to do either remcsum_adjust or set up CHECKSUM_PARTIAL again with its csum_start properly reset considering RCO. However, since b7fe10e5 ("gro: Fix remcsum offload to deal with frags in GRO") that barrier in such case could be skipped if GRO turned on, hence we pass over it and the inner L4 validation mistakenly reckons it as a bad csum. This patch makes remcsum_offload being reset at the same time of GRO remcsum cleanup, so as to make it work in such case as before. Fixes: b7fe10e5 ("gro: Fix remcsum offload to deal with frags in GRO") Signed-off-by: NKoichiro Den <den@klaipeden.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 yujuan.qi 提交于
in for(),if((optlen > 0) && (optptr[1] == 0)), enter infinite loop. Test: receive a packet which the ip length > 20 and the first byte of ip option is 0, produce this issue Signed-off-by: Nyujuan.qi <yujuan.qi@mediatek.com> Acked-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 8月, 2017 2 次提交
-
-
由 Ido Schimmel 提交于
Michał reported a NULL pointer deref during fib_sync_down_dev() when unregistering a netdevice. The problem is that we don't check for 'in_dev' being NULL, which can happen in very specific cases. Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or the inetaddr notification chains. However, if an interface isn't configured with any IP address, then it's possible for host routes to be flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in inetdev_destroy(). To reproduce: $ ip link add type dummy $ ip route add local 1.1.1.0/24 dev dummy0 $ ip link del dev dummy0 Fix this by checking for the presence of 'in_dev' before referencing it. Fixes: 982acb97 ("ipv4: fib: Notify about nexthop status changes") Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reported-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Tested-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taehee Yoo 提交于
If verdict is NF_STOLEN in the SYNPROXY target, the skb is consumed. However, ipt_do_table() always tries to get ip header from the skb. So that, KASAN triggers the use-after-free message. We can reproduce this message using below command. # iptables -I INPUT -p tcp -j SYNPROXY --mss 1460 [ 193.542265] BUG: KASAN: use-after-free in ipt_do_table+0x1405/0x1c10 [ ... ] [ 193.578603] Call Trace: [ 193.581590] <IRQ> [ 193.584107] dump_stack+0x68/0xa0 [ 193.588168] print_address_description+0x78/0x290 [ 193.593828] ? ipt_do_table+0x1405/0x1c10 [ 193.598690] kasan_report+0x230/0x340 [ 193.603194] __asan_report_load2_noabort+0x19/0x20 [ 193.608950] ipt_do_table+0x1405/0x1c10 [ 193.613591] ? rcu_read_lock_held+0xae/0xd0 [ 193.618631] ? ip_route_input_rcu+0x27d7/0x4270 [ 193.624348] ? ipt_do_table+0xb68/0x1c10 [ 193.629124] ? do_add_counters+0x620/0x620 [ 193.634234] ? iptable_filter_net_init+0x60/0x60 [ ... ] After this patch, only when verdict is XT_CONTINUE, ipt_do_table() tries to get ip header. Also arpt_do_table() is modified because it has same bug. Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 30 7月, 2017 2 次提交
-
-
由 Arnd Bergmann 提交于
When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a false-positive warning: net/ipv4/tcp_output.c: In function 'tcp_connect': net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds] tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start; ^~ net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds] tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ I have opened a gcc bug for this, but distros have already shipped compilers with this problem, and it's not clear yet whether there is a way for gcc to avoid the warning. As the problem is related to the bitfield access, this introduces a temporary variable to store the old enum value. I did not notice this warning earlier, since UBSAN is disabled when building with COMPILE_TEST, and that was always turned on in both allmodconfig and randconfig tests. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
When an early demuxed packet reaches __udp6_lib_lookup_skb(), the sk reference is retrieved and used, but the relevant reference count is leaked and the socket destructor is never called. Beyond leaking the sk memory, if there are pending UDP packets in the receive queue, even the related accounted memory is leaked. In the long run, this will cause persistent forward allocation errors and no UDP skbs (both ipv4 and ipv6) will be able to reach the user-space. Fix this by explicitly accessing the early demux reference before the lookup, and properly decreasing the socket reference count after usage. Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and the now obsoleted comment about "socket cache". The newly added code is derived from the current ipv4 code for the similar path. v1 -> v2: fixed the __udp6_lib_rcv() return code for resubmission, as suggested by Eric Reported-by: NSam Edwards <CFSworks@gmail.com> Reported-by: NMarc Haber <mh+netdev@zugschlus.de> Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 7月, 2017 1 次提交
-
-
由 Paolo Abeni 提交于
We must use pre-processor conditional block or suitable accessors to manipulate skb->sp elsewhere builds lacking the CONFIG_XFRM will break. Fixes: dce4551c ("udp: preserve head state for IP_CMSG_PASSSEC") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 7月, 2017 1 次提交
-
-
由 Paolo Abeni 提交于
Paul Moore reported a SELinux/IP_PASSSEC regression caused by missing skb->sp at recvmsg() time. We need to preserve the skb head state to process the IP_CMSG_PASSSEC cmsg. With this commit we avoid releasing the skb head state in the BH even if a secpath is attached to the current skb, and stores the skb status (with/without head states) in the scratch area, so that we can access it at skb deallocation time, without incurring in cache-miss penalties. This also avoids misusing the skb CB for ipv6 packets, as introduced by the commit 0ddf3fb2 ("udp: preserve skb->dst if required for IP options processing"). Clean a bit the scratch area helpers implementation, to reduce the code differences between 32 and 64 bits build. Reported-by: NPaul Moore <paul@paul-moore.com> Fixes: 0a463c78 ("udp: avoid a cache miss on dequeue") Fixes: 0ddf3fb2 ("udp: preserve skb->dst if required for IP options processing") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Tested-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 7月, 2017 1 次提交
-
-
由 Mahesh Bandewar 提交于
Net stack initialization currently initializes fib-trie after the first call to netdevice_notifier() call. In fact fib_trie initialization needs to happen before first rtnl_register(). It does not cause any problem since there are no devices UP at this moment, but trying to bring 'lo' UP at initialization would make this assumption wrong and exposes the issue. Fixes following crash Call Trace: ? alternate_node_alloc+0x76/0xa0 fib_table_insert+0x1b7/0x4b0 fib_magic.isra.17+0xea/0x120 fib_add_ifaddr+0x7b/0x190 fib_netdev_event+0xc0/0x130 register_netdevice_notifier+0x1c1/0x1d0 ip_fib_init+0x72/0x85 ip_rt_init+0x187/0x1e9 ip_init+0xe/0x1a inet_init+0x171/0x26c ? ipv4_offload_init+0x66/0x66 do_one_initcall+0x43/0x160 kernel_init_freeable+0x191/0x219 ? rest_init+0x80/0x80 kernel_init+0xe/0x150 ret_from_fork+0x22/0x30 Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08 RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28 CR2: 0000000000000014 Fixes: 7b1a74fd ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.") Fixes: 7f9b8052 ("[IPV4]: fib hash|trie initialization") Signed-off-by: NMahesh Bandewar <maheshb@google.com> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 7月, 2017 3 次提交
-
-
由 Sabrina Dubroca 提交于
When we delete a netns with a CLUSTERIP rule, clusterip_net_exit() is called first, removing /proc/net/ipt_CLUSTERIP. Then clusterip_config_entry_put() is called from clusterip_tg_destroy(), and tries to remove its entry under /proc/net/ipt_CLUSTERIP/. Fix this by checking that the parent directory of the entry to remove hasn't already been deleted. The following triggers a KASAN splat (stealing the reproducer from 202f59af, thanks to Jianlin Shi and Xin Long): ip netns add test ip link add veth0_in type veth peer name veth0_out ip link set veth0_in netns test ip netns exec test ip link set lo up ip netns exec test ip link set veth0_in up ip netns exec test iptables -I INPUT -d 1.2.3.4 -i veth0_in -j \ CLUSTERIP --new --clustermac 89:d4:47:eb:9a:fa --total-nodes 3 \ --local-node 1 --hashmode sourceip-sourceport ip netns del test Fixes: ce4ff76c ("netfilter: ipt_CLUSTERIP: make proc directory per net namespace") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Reviewed-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Paolo Abeni 提交于
Eric noticed that in udp_recvmsg() we still need to access skb->dst while processing the IP options. Since commit 0a463c78 ("udp: avoid a cache miss on dequeue") skb->dst is no more available at recvmsg() time and bad things will happen if we enter the relevant code path. This commit address the issue, avoid clearing skb->dst if any IP options are present into the relevant skb. Since the IP CB is contained in the first skb cacheline, we can test it to decide to leverage the consume_stateless_skb() optimization, without measurable additional cost in the faster path. v1 -> v2: updated commit message tags Fixes: 0a463c78 ("udp: avoid a cache miss on dequeue") Reported-by: NAndrey Konovalov <andreyknvl@google.com> Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Potapenko 提交于
KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(), which originated from the TCP request socket created in cookie_v6_check(): ================================================================== BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0 CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters. Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 dump_stack+0x172/0x1c0 lib/dump_stack.c:52 kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927 __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469 skb_set_hash_from_sk ./include/net/sock.h:2011 tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983 tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493 tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284 tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309 call_timer_fn+0x240/0x520 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 __run_timers+0xc13/0xf10 kernel/time/timer.c:1601 run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614 __do_softirq+0x485/0x942 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 irq_exit+0x1fa/0x230 kernel/softirq.c:405 exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657 smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966 apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489 RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36 RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77 RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440 RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005 RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770 RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004 R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810 R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4 </IRQ> poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293 SYSC_select+0x4b4/0x4e0 fs/select.c:653 SyS_select+0x76/0xa0 fs/select.c:634 entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204 RIP: 0033:0x4597e7 RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20 R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003 chained origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_save_stack mm/kmsan/kmsan.c:317 kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547 __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259 tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472 tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103 tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212 cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198 kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337 kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766 reqsk_alloc ./include/net/request_sock.h:87 inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200 cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 ================================================================== Similar error is reported for cookie_v4_check(). Fixes: 58d607d3 ("tcp: provide skb->hash to synack packets") Signed-off-by: NAlexander Potapenko <glider@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 7月, 2017 1 次提交
-
-
由 Florian Westphal 提交于
arp packets cannot be forwarded. They can be bridged, but then they can be filtered using either ebtables or nftables bridge family. The bridge netfilter exposes a "call-arptables" switch which pushes packets into arptables, but lets not expose this for nftables, so better close this asap. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 16 7月, 2017 2 次提交
-
-
由 Neal Cardwell 提交于
Fixes the following behavior: for connections that had no RTT sample at the time of initializing congestion control, BBR was initializing the pacing rate to a high nominal rate (based an a guess of RTT=1ms, in case this is LAN traffic). Then BBR never adjusted the pacing rate downward upon obtaining an actual RTT sample, if the connection never filled the pipe (e.g. all sends were small app-limited writes()). This fix adjusts the pacing rate upon obtaining the first RTT sample. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
Fix a corner case noticed by Eric Dumazet, where BBR's setting sk->sk_pacing_rate to 0 during initialization could theoretically cause packets in the sending host to hang if there were packets "in flight" in the pacing infrastructure at the time the BBR congestion control state is initialized. This could occur if the pacing infrastructure happened to race with bbr_init() in a way such that the pacer read the 0 rather than the immediately following non-zero pacing rate. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-