- 06 12月, 2016 2 次提交
-
-
由 Alexander Duyck 提交于
It has been reported that update_suffix can be expensive when it is called on a large node in which most of the suffix lengths are the same. The time required to add 200K entries had increased from around 3 seconds to almost 49 seconds. In order to address this we need to move the code for updating the suffix out of resize and instead just have it handled in the cases where we are pushing a node that increases the suffix length, or will decrease the suffix length. Fixes: 5405afd1 ("fib_trie: Add tracking value for suffix length") Reported-by: NRobert Shearman <rshearma@brocade.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Reviewed-by: NRobert Shearman <rshearma@brocade.com> Tested-by: NRobert Shearman <rshearma@brocade.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
It wasn't necessary to pass a leaf in when doing the suffix updates so just drop it. Instead just pass the suffix and work with that. Since we dropped the leaf there is no need to include that in the name so the names are updated to node_push_suffix and node_pull_suffix. Finally I noticed that the logic for pulling the suffix length back actually had some issues. Specifically it would stop prematurely if there was a longer suffix, but it was not as long as the original suffix. I updated the code to address that in node_pull_suffix. Fixes: 5405afd1 ("fib_trie: Add tracking value for suffix length") Suggested-by: NRobert Shearman <rshearma@brocade.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Reviewed-by: NRobert Shearman <rshearma@brocade.com> Tested-by: NRobert Shearman <rshearma@brocade.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 12月, 2016 1 次提交
-
-
由 Eli Cooper 提交于
When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IP before dst_output() is called, fixing a bug where GSO packets sent through a sit tunnel are dropped when xfrm is involved. Cc: stable@vger.kernel.org Signed-off-by: NEli Cooper <elicooper@gmx.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 12月, 2016 1 次提交
-
-
由 Hongxu Jia 提交于
Since 09d96860 ("netfilter: x_tables: do compat validation via translate_table"), it used compatr structure to assign newinfo structure. In translate_compat_table of ip_tables.c and ip6_tables.c, it used compatr->hook_entry to replace info->hook_entry and compatr->underflow to replace info->underflow, but not do the same replacement in arp_tables.c. It caused invoking 32-bit "arptbale -P INPUT ACCEPT" failed in 64bit kernel. -------------------------------------- root@qemux86-64:~# arptables -P INPUT ACCEPT root@qemux86-64:~# arptables -P INPUT ACCEPT ERROR: Policy for `INPUT' offset 448 != underflow 0 arptables: Incompatible with this kernel -------------------------------------- Fixes: 09d96860 ("netfilter: x_tables: do compat validation via translate_table") Signed-off-by: NHongxu Jia <hongxu.jia@windriver.com> Acked-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 30 11月, 2016 2 次提交
-
-
由 Tobias Brunner 提交于
When handling inbound packets, the two halves of the sequence number stored on the skb are already in network order. Fixes: 7021b2e1 ("esp4: Switch to new AEAD interface") Signed-off-by: NTobias Brunner <tobias@strongswan.org> Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Arnaldo Carvalho de Melo 提交于
As it may get stale and lead to use after free. Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Alexander Duyck <aduyck@mirantis.com> Cc: Andrey Konovalov <andreyknvl@google.com> Fixes: cbc53e08 ("GSO: Add GSO type for fixed IPv4 ID") Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 11月, 2016 1 次提交
-
-
由 Julian Wollrath 提交于
Signed-off-by: NJulian Wollrath <jwollrath@web.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 11月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
In commits 93821778 ("udp: Fix rcv socket locking") and f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite was forgotten. This leads to crashes if UDPlite header is pulled twice, which happens starting from commit e6afc8ac ("udp: remove headers from UDP packets before queueing") Bug found by syzkaller team, thanks a lot guys ! Note that backlog use in UDP/UDPlite is scheduled to be removed starting from linux-4.10, so this patch is only needed up to linux-4.9 Fixes: 93821778 ("udp: Fix rcv socket locking") Fixes: f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NAndrey Konovalov <andreyknvl@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 11月, 2016 1 次提交
-
-
由 David Ahern 提交于
ip_route_me_harder is not considering the L3 domain and sending lookups to the wrong table. For example consider the following output rule: iptables -I OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset using perf to analyze lookups via the fib_table_lookup tracepoint shows: vrf-test 1187 [001] 46887.295927: fib:fib_table_lookup: table 255 oif 0 iif 0 src 0.0.0.0 dst 10.100.1.254 tos 0 scope 0 flags 0 ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms]) ffffffff81493aac fib_table_lookup ([kernel.kallsyms]) ffffffff8148dda3 __inet_dev_addr_type ([kernel.kallsyms]) ffffffff8148ddf6 inet_addr_type ([kernel.kallsyms]) ffffffff8149e344 ip_route_me_harder ([kernel.kallsyms]) and vrf-test 1187 [001] 46887.295933: fib:fib_table_lookup: table 255 oif 0 iif 1 src 10.100.1.254 dst 10.100.1.2 tos 0 scope 0 flags ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms]) ffffffff81493aac fib_table_lookup ([kernel.kallsyms]) ffffffff814998ff fib4_rule_action ([kernel.kallsyms]) ffffffff81437f35 fib_rules_lookup ([kernel.kallsyms]) ffffffff81499758 __fib_lookup ([kernel.kallsyms]) ffffffff8144f010 fib_lookup.constprop.34 ([kernel.kallsyms]) ffffffff8144f759 __ip_route_output_key_hash ([kernel.kallsyms]) ffffffff8144fc6a ip_route_output_flow ([kernel.kallsyms]) ffffffff8149e39b ip_route_me_harder ([kernel.kallsyms]) In both cases the lookups are directed to table 255 rather than the table associated with the device via the L3 domain. Update both lookups to pull the L3 domain from the dst currently attached to the skb. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 22 11月, 2016 1 次提交
-
-
由 Florian Westphal 提交于
We need to zero out the private data area when application switches connection to different algorithm (TCP_CONGESTION setsockopt). When congestion ops get assigned at connect time everything is already zeroed because sk_alloc uses GFP_ZERO flag. But in the setsockopt case this contains whatever previous cc placed there. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 11月, 2016 2 次提交
-
-
由 Alexander Duyck 提交于
Fix a small memory leak that can occur where we leak a fib_alias in the event of us not being able to insert it into the local table. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
The patch that removed the FIB offload infrastructure was a bit too aggressive and also removed code needed to clean up us splitting the table if additional rules were added. Specifically the function fib_trie_flush_external was called at the end of a new rule being added to flush the foreign trie entries from the main trie. I updated the code so that we only call fib_trie_flush_external on the main table so that we flush the entries for local from main. This way we don't call it for every rule change which is what was happening previously. Fixes: 347e3b28 ("switchdev: remove FIB offload infrastructure") Reported-by: NEric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 11月, 2016 2 次提交
-
-
由 Pablo Neira 提交于
Honor udptable parameter that is passed to __udp*_lib_mcast_deliver(), otherwise udplite broadcast/multicast use the wrong table and it breaks. Fixes: 2dc41cff ("udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hangbin Liu 提交于
In commit 24cf3af3 ("igmp: call ip_mc_clear_src..."), we forgot to remove igmpv3_clear_delrec() in ip_mc_down(), which also called ip_mc_clear_src(). This make us clear all IGMPv3 source filter info after NETDEV_DOWN. Move igmpv3_clear_delrec() to ip_mc_destroy_dev() and then no need ip_mc_clear_src() in ip_mc_destroy_dev(). On the other hand, we should restore back instead of free all source filter info in igmpv3_del_delrec(). Or we will not able to restore IGMPv3 source filter info after NETDEV_UP and NETDEV_POST_TYPE_CHANGE. Fixes: 24cf3af3 ("igmp: call ip_mc_clear_src() only when ...") Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 11月, 2016 2 次提交
-
-
由 Eric Dumazet 提交于
With syzkaller help, Marco Grassi found a bug in TCP stack, crashing in tcp_collapse() Root cause is that sk_filter() can truncate the incoming skb, but TCP stack was not really expecting this to happen. It probably was expecting a simple DROP or ACCEPT behavior. We first need to make sure no part of TCP header could be removed. Then we need to adjust TCP_SKB_CB(skb)->end_seq Many thanks to syzkaller team and Marco for giving us a reproducer. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NMarco Grassi <marco.gra@gmail.com> Reported-by: NVladis Dronov <vdronov@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Suryaputra Lin 提交于
In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0 and then the state of the neigh for the new_gw is checked. If the state isn't valid then the redirected route is deleted. This behavior is maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway is assigned to peer->redirect_learned.a4 before calling ipv4_neigh_lookup(). After commit 5943634f ("ipv4: Maintain redirect and PMTU info in struct rtable again."), ipv4_neigh_lookup() is performed without the rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw) isn't zero, the function uses it as the key. The neigh is most likely valid since the old_gw is the one that sends the ICMP redirect message. Then the new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may never gets resolved and the traffic is blackholed. So, use the new_gw for neigh lookup. Changes from v1: - use __ipv4_neigh_lookup instead (per Eric Dumazet). Fixes: 5943634f ("ipv4: Maintain redirect and PMTU info in struct rtable again.") Signed-off-by: NStephen Suryaputra Lin <ssurya@ieee.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 11月, 2016 1 次提交
-
-
由 Lance Richardson 提交于
This is a follow-up to commit 9ee6c5dc ("ipv4: allow local fragmentation in ip_finish_output_gso()"), updating the comment documenting cases in which fragmentation is needed for egress GSO packets. Suggested-by: NShmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: NShmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: NLance Richardson <lrichard@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 11月, 2016 2 次提交
-
-
由 David Ahern 提交于
Lorenzo noted an Android unit test failed due to e0d56fdd: "The expectation in the test was that the RST replying to a SYN sent to a closed port should be generated with oif=0. In other words it should not prefer the interface where the SYN came in on, but instead should follow whatever the routing table says it should do." Revert the change to ip_send_unicast_reply and tcp_v6_send_response such that the oif in the flow is set to the skb_iif only if skb_iif is an L3 master. Fixes: e0d56fdd ("net: l3mdev: remove redundant calls") Reported-by: NLorenzo Colitti <lorenzo@google.com> Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Tested-by: NLorenzo Colitti <lorenzo@google.com> Acked-by: NLorenzo Colitti <lorenzo@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
icmp_send is called in response to some event. The skb may not have the device set (skb->dev is NULL), but it is expected to have an rt. Update icmp_route_lookup to use the rt on the skb to determine L3 domain. Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX") Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 11月, 2016 1 次提交
-
-
由 Alexander Duyck 提交于
The display of /proc/net/route has had a couple issues due to the fact that when I originally rewrote most of fib_trie I made it so that the iterator was tracking the next value to use instead of the current. In addition it had an off by 1 error where I was tracking the first piece of data as position 0, even though in reality that belonged to the SEQ_START_TOKEN. This patch updates the code so the iterator tracks the last reported position and key instead of the next expected position and key. In addition it shifts things so that all of the leaves start at 1 instead of trying to report leaves starting with offset 0 as being valid. With these two issues addressed this should resolve any off by one errors that were present in the display of /proc/net/route. Fixes: 25b97c01 ("ipv4: off-by-one in continuation handling in /proc/net/route") Cc: Andy Whitcroft <apw@canonical.com> Reported-by: NJason Baron <jbaron@akamai.com> Tested-by: NJason Baron <jbaron@akamai.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 11月, 2016 4 次提交
-
-
由 Eric Dumazet 提交于
After my commit, tcp_sendmsg() might restart its loop after processing socket backlog. If sk_err is set, we blindly return an error, even though we copied data to user space before. We should instead return number of bytes that could be copied, otherwise user space might resend data and corrupt the stream. This might happen if another thread is using recvmsg(MSG_ERRQUEUE) to process timestamps. Issue was diagnosed by Soheil and Willem, big kudos to them ! Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Tested-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Lance Richardson 提交于
Some configurations (e.g. geneve interface with default MTU of 1500 over an ethernet interface with 1500 MTU) result in the transmission of packets that exceed the configured MTU. While this should be considered to be a "bad" configuration, it is still allowed and should not result in the sending of packets that exceed the configured MTU. Fix by dropping the assumption in ip_finish_output_gso() that locally originated gso packets will never need fragmentation. Basic testing using iperf (observing CPU usage and bandwidth) have shown no measurable performance impact for traffic not requiring fragmentation. Fixes: c7ba65d7 ("net: ip: push gso skb forwarding handling down the stack") Reported-by: NJan Tluka <jtluka@redhat.com> Signed-off-by: NLance Richardson <lrichard@redhat.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Imagine initial value of max_skb_frags is 17, and last skb in write queue has 15 frags. Then max_skb_frags is lowered to 14 or smaller value. tcp_sendmsg() will then be allowed to add additional page frags and eventually go past MAX_SKB_FRAGS, overflowing struct skb_shared_info. Fixes: 5f74f82e ("net:Add sysctl_max_skb_frags") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Cc: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
Andrey reported this kernel warning: WARNING: CPU: 0 PID: 4608 at kernel/sched/core.c:7724 __might_sleep+0x14c/0x1a0 kernel/sched/core.c:7719 do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff811f5a5c>] prepare_to_wait+0xbc/0x210 kernel/sched/wait.c:178 Modules linked in: CPU: 0 PID: 4608 Comm: syz-executor Not tainted 4.9.0-rc2+ #320 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88006625f7a0 ffffffff81b46914 ffff88006625f818 0000000000000000 ffffffff84052960 0000000000000000 ffff88006625f7e8 ffffffff81111237 ffff88006aceac00 ffffffff00001e2c ffffed000cc4beff ffffffff84052960 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81b46914>] dump_stack+0xb3/0x10f lib/dump_stack.c:51 [<ffffffff81111237>] __warn+0x1a7/0x1f0 kernel/panic.c:550 [<ffffffff8111132c>] warn_slowpath_fmt+0xac/0xd0 kernel/panic.c:565 [<ffffffff811922fc>] __might_sleep+0x14c/0x1a0 kernel/sched/core.c:7719 [< inline >] slab_pre_alloc_hook mm/slab.h:393 [< inline >] slab_alloc_node mm/slub.c:2634 [< inline >] slab_alloc mm/slub.c:2716 [<ffffffff81508da0>] __kmalloc_track_caller+0x150/0x2a0 mm/slub.c:4240 [<ffffffff8146be14>] kmemdup+0x24/0x50 mm/util.c:113 [<ffffffff8388b2cf>] dccp_feat_clone_sp_val.part.5+0x4f/0xe0 net/dccp/feat.c:374 [< inline >] dccp_feat_clone_sp_val net/dccp/feat.c:1141 [< inline >] dccp_feat_change_recv net/dccp/feat.c:1141 [<ffffffff8388d491>] dccp_feat_parse_options+0xaa1/0x13d0 net/dccp/feat.c:1411 [<ffffffff83894f01>] dccp_parse_options+0x721/0x1010 net/dccp/options.c:128 [<ffffffff83891280>] dccp_rcv_state_process+0x200/0x15b0 net/dccp/input.c:644 [<ffffffff838b8a94>] dccp_v4_do_rcv+0xf4/0x1a0 net/dccp/ipv4.c:681 [< inline >] sk_backlog_rcv ./include/net/sock.h:872 [<ffffffff82b7ceb6>] __release_sock+0x126/0x3a0 net/core/sock.c:2044 [<ffffffff82b7d189>] release_sock+0x59/0x1c0 net/core/sock.c:2502 [< inline >] inet_wait_for_connect net/ipv4/af_inet.c:547 [<ffffffff8316b2a2>] __inet_stream_connect+0x5d2/0xbb0 net/ipv4/af_inet.c:617 [<ffffffff8316b8d5>] inet_stream_connect+0x55/0xa0 net/ipv4/af_inet.c:656 [<ffffffff82b705e4>] SYSC_connect+0x244/0x2f0 net/socket.c:1533 [<ffffffff82b72dd4>] SyS_connect+0x24/0x30 net/socket.c:1514 [<ffffffff83fbf701>] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209 Unlike commit 26cabd31 ("sched, net: Clean up sk_wait_event() vs. might_sleep()"), the sleeping function is called before schedule_timeout(), this is indeed a bug. Fix this by moving the wait logic to the new API, it is similar to commit ff960a73 ("netdev, sched/wait: Fix sleeping inside wait event"). Reported-by: NAndrey Konovalov <andreyknvl@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 11月, 2016 1 次提交
-
-
由 Florian Westphal 提交于
If a congestion control module doesn't provide .undo_cwnd function, tcp_undo_cwnd_reduction() will set cwnd to tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1); ... which makes sense for reno (it sets ssthresh to half the current cwnd), but it makes no sense for dctcp, which sets ssthresh based on the current congestion estimate. This can cause severe growth of cwnd (eventually overflowing u32). Fix this by saving last cwnd on loss and restore cwnd based on that, similar to cubic and other algorithms. Fixes: e3118e83 ("net: tcp: add DCTCP congestion control algorithm") Cc: Lawrence Brakmo <brakmo@fb.com> Cc: Andrew Shewmaker <agshew@gmail.com> Cc: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 10月, 2016 1 次提交
-
-
由 Liping Zhang 提交于
The NFTA_DUP_SREG_DEV attribute is not a must option, so we should use it in routing lookup only when the user specify it. Fixes: d877f071 ("netfilter: nf_tables: add nft_dup expression") Signed-off-by: NLiping Zhang <zlpnobody@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 27 10月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
First bug was added in commit ad6f939a ("ip: Add offset parameter to ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr)); Then commit e6afc8ac ("udp: remove headers from UDP packets before queueing") forgot to adjust the offsets now UDP headers are pulled before skb are put in receive queue. Fixes: ad6f939a ("ip: Add offset parameter to ip_cmsg_recv") Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Sam Kumar <samanthakumar@google.com> Cc: Willem de Bruijn <willemb@google.com> Tested-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 10月, 2016 1 次提交
-
-
由 WANG Cong 提交于
This reverts commit a681574c ("ipv4: disable BH in set_ping_group_range()") because we never read ping_group_range in BH context (unlike local_port_range). Then, since we already have a lock for ping_group_range, those using ip_local_ports.lock for ping_group_range are clearly typos. We might consider to share a same lock for both ping_group_range and local_port_range w.r.t. space saving, but that should be for net-next. Fixes: a681574c ("ipv4: disable BH in set_ping_group_range()") Fixes: ba6b918a ("ping: move ping_group_range out of CONFIG_SYSCTL") Cc: Eric Dumazet <edumazet@google.com> Cc: Eric Salo <salo@google.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 10月, 2016 3 次提交
-
-
由 Eric Dumazet 提交于
In commit 4ee3bd4a ("ipv4: disable BH when changing ip local port range") Cong added BH protection in set_local_port_range() but missed that same fix was needed in set_ping_group_range() Fixes: b8f1a556 ("udp: Add function to make source port for UDP tunnels") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NEric Salo <salo@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Baozeng Ding reported KASAN traces showing uses after free in udp_lib_get_port() and other related UDP functions. A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash. I could write a reproducer with two threads doing : static int sock_fd; static void *thr1(void *arg) { for (;;) { connect(sock_fd, (const struct sockaddr *)arg, sizeof(struct sockaddr_in)); } } static void *thr2(void *arg) { struct sockaddr_in unspec; for (;;) { memset(&unspec, 0, sizeof(unspec)); connect(sock_fd, (const struct sockaddr *)&unspec, sizeof(unspec)); } } Problem is that udp_disconnect() could run without holding socket lock, and this was causing list corruptions. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NBaozeng Ding <sploving1@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
Currently, GRO can do unlimited recursion through the gro_receive handlers. This was fixed for tunneling protocols by limiting tunnel GRO to one level with encap_mark, but both VLAN and TEB still have this problem. Thus, the kernel is vulnerable to a stack overflow, if we receive a packet composed entirely of VLAN headers. This patch adds a recursion counter to the GRO layer to prevent stack overflow. When a gro_receive function hits the recursion limit, GRO is aborted for this skb and it is processed normally. This recursion counter is put in the GRO CB, but could be turned into a percpu counter if we run out of space in the CB. Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report. Fixes: CVE-2016-7039 Fixes: 9b174d88 ("net: Add Transparent Ethernet Bridging GRO support.") Fixes: 66e5133f ("vlan: Add GRO support for non hardware accelerated vlan") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Reviewed-by: NJiri Benc <jbenc@redhat.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 10月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Since commit b2fb4f54 ("tcp: uninline tcp_prequeue()") we no longer access sysctl_tcp_low_latency from a module. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 10月, 2016 2 次提交
-
-
由 Tobias Klauser 提交于
Remove the unused but set variable icsk in listening_get_next to fix the following GCC warning when building with 'W=1': net/ipv4/tcp_ipv4.c: In function ‘listening_get_next’: net/ipv4/tcp_ipv4.c:1890:31: warning: variable ‘icsk’ set but not used [-Wunused-but-set-variable] Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tobias Klauser 提交于
Remove the unused but set variable dev in ip_do_fragment to fix the following GCC warning when building with 'W=1': net/ipv4/ip_output.c: In function ‘ip_do_fragment’: net/ipv4/ip_output.c:541:21: warning: variable ‘dev’ set but not used [-Wunused-but-set-variable] Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 10月, 2016 1 次提交
-
-
由 David Ahern 提交于
Currently, socket lookups for l3mdev (vrf) use cases can match a socket that is bound to a port but not a device (ie., a global socket). If the sysctl tcp_l3mdev_accept is not set this leads to ack packets going out based on the main table even though the packet came in from an L3 domain. The end result is that the connection does not establish creating confusion for users since the service is running and a socket shows in ss output. Fix by requiring an exact dif to sk_bound_dev_if match if the skb came through an interface enslaved to an l3mdev device and the tcp_l3mdev_accept is not set. skb's through an l3mdev interface are marked by setting a flag in inet{6}_skb_parm. The IPv6 variant is already set; this patch adds the flag for IPv4. Using an skb flag avoids a device lookup on the dif. The flag is set in the VRF driver using the IP{6}CB macros. For IPv4, the inet_skb_parm struct is moved in the cb per commit 971f10ec, so the match function in the TCP stack needs to use TCP_SKB_CB. For IPv6, the move is done after the socket lookup, so IP6CB is used. The flags field in inet_skb_parm struct needs to be increased to add another flag. There is currently a 1-byte hole following the flags, so it can be expanded to u16 without increasing the size of the struct. Fixes: 193125db ("net: Introduce VRF device driver") Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 10月, 2016 1 次提交
-
-
由 David Ahern 提交于
Commit e0d56fdd was a bit aggressive removing l3mdev calls in the IPv4 stack. If the fib_lookup fails we do not want to drop to make_route if the oif is an l3mdev device. Also reverts 19664c6a ("net: l3mdev: Remove netif_index_is_l3_master") which removed netif_index_is_l3_master. Fixes: e0d56fdd ("net: l3mdev: remove redundant calls") Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 10月, 2016 2 次提交
-
-
由 Alexey Dobriyan 提交于
Current supplementary groups code can massively overallocate memory and is implemented in a way so that access to individual gid is done via 2D array. If number of gids is <= 32, memory allocation is more or less tolerable (140/148 bytes). But if it is not, code allocates full page (!) regardless and, what's even more fun, doesn't reuse small 32-entry array. 2D array means dependent shifts, loads and LEAs without possibility to optimize them (gid is never known at compile time). All of the above is unnecessary. Switch to the usual trailing-zero-len-array scheme. Memory is allocated with kmalloc/vmalloc() and only as much as needed. Accesses become simpler (LEA 8(gi,idx,4) or even without displacement). Maximum number of gids is 65536 which translates to 256KB+8 bytes. I think kernel can handle such allocation. On my usual desktop system with whole 9 (nine) aux groups, struct group_info shrinks from 148 bytes to 44 bytes, yay! Nice side effects: - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing, - fix little mess in net/ipv4/ping.c should have been using GROUP_AT macro but this point becomes moot, - aux group allocation is persistent and should be accounted as such. Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.bySigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Vasily Kulikov <segoon@openwall.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
The cgroup core and the memory controller need to track socket ownership for different purposes, but the tracking sites being entirely different is kind of ugly. Be a better citizen and rename the memory controller callbacks to match the cgroup core callbacks, then move them to the same place. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20160914194846.11153-3-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NTejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 10月, 2016 1 次提交
-
-
由 Al Viro 提交于
since pipe_lock is the outermost now, we don't need to drop/regain socket locks around the call of splice_to_pipe() from skb_splice_bits(), which kills the need to have a socket-specific callback; we can just call splice_to_pipe() and be done with that. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 9月, 2016 1 次提交
-
-
由 Jia He 提交于
This is to suppress the checkpatch.pl warning "Comparison to NULL could be written". No functional changes here. Signed-off-by: NJia He <hejianet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-