- 24 8月, 2016 4 次提交
-
-
由 David Ahern 提交于
inet_diag_find_one_icsk takes a reference to a socket that is not released if sock_diag_destroy returns an error. Fix by changing tcp_diag_destroy to manage the refcnt for all cases and remove the sock_put calls from tcp_abort. Fixes: c1e64e29 ("net: diag: Support destroying TCP sockets") Reported-by: NLorenzo Colitti <lorenzo@google.com> Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
After commit ca065d0c ("udp: no longer use SLAB_DESTROY_BY_RCU") we do not need this special allocation mode anymore, even if it is harmless. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When sending an ack in SYN_RECV state, we must scale the offered window if wscale option was negotiated and accepted. Tested: Following packetdrill test demonstrates the issue : 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // Establish a connection. +0 < S 0:0(0) win 20000 <mss 1000,sackOK,wscale 7, nop, TS val 100 ecr 0> +0 > S. 0:0(0) ack 1 win 28960 <mss 1460,sackOK, TS val 100 ecr 100, nop, wscale 7> +0 < . 1:11(10) ack 1 win 156 <nop,nop,TS val 99 ecr 100> // check that window is properly scaled ! +0 > . 1:1(0) ack 1 win 226 <nop,nop,TS val 200 ecr 100> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Laura tracked poll() [and friends] regression caused by commit e6afc8ac ("udp: remove headers from UDP packets before queueing") udp_poll() needs to know if there is a valid packet in receive queue, even if its payload length is 0. Change first_packet_length() to return an signed int, and use -1 as the indication of an empty queue. Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing") Reported-by: NLaura Abbott <labbott@redhat.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Tested-by: NLaura Abbott <labbott@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 8月, 2016 1 次提交
-
-
由 Shmulik Ladkani 提交于
In b8247f09, "net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs" gso skbs arriving from an ingress interface that go through UDP tunneling, are allowed to be fragmented if the resulting encapulated segments exceed the dst mtu of the egress interface. This aligned the behavior of gso skbs to non-gso skbs going through udp encapsulation path. However the non-gso vs gso anomaly is present also in the following cases of a GRE tunnel: - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set (e.g. OvS vport-gre with df_default=false) - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set In both of the above cases, the non-gso skbs get fragmented, whereas the gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped, as they don't go through the segment+fragment code path. Fix: Setting IPSKB_FRAG_SEGS if the tunnel specified IP_DF bit is NOT set. Tunnels that do set IP_DF, will not go to fragmentation of segments. This preserves behavior of ip_gre in (the default) pmtudisc mode. Fixes: b8247f09 ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs") Reported-by: Nwenxu <wenxu@ucloud.cn> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com> Tested-by: Nwenxu <wenxu@ucloud.cn> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 8月, 2016 1 次提交
-
-
由 Xunlei Pang 提交于
1) Fix one typo: s/tn/tp/ 2) Fix the description about the "u" bits. Signed-off-by: NXunlei Pang <xlpang@redhat.com> Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 8月, 2016 1 次提交
-
-
由 Simon Horman 提交于
Ensure that the inner_protocol is set on transmit so that GSO segmentation, which relies on that field, works correctly. This is achieved by setting the inner_protocol in gre_build_header rather than each caller of that function. It ensures that the inner_protocol is set when gre_fb_xmit() is used to transmit GRE which was not previously the case. I have observed this is not the case when OvS transmits GRE using lwtunnel metadata (which it always does). Fixes: 38720352 ("gre: Use inner_proto to obtain inner header protocol") Cc: Pravin Shelar <pshelar@ovn.org> Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NSimon Horman <simon.horman@netronome.com> Acked-by: NPravin B Shelar <pshelar@ovn.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 8月, 2016 1 次提交
-
-
由 Lance Richardson 提交于
When executing the script included below, the netns delete operation hangs with the following message (repeated at 10 second intervals): kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1 This occurs because a reference to the lo interface in the "secure" netns is still held by a dst entry in the xfrm bundle cache in the init netns. Address this problem by garbage collecting the tunnel netns flow cache when a cross-namespace vti interface receives a NETDEV_DOWN notification. A more detailed description of the problem scenario (referencing commands in the script below): (1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1 The vti_test interface is created in the init namespace. vti_tunnel_init() attaches a struct ip_tunnel to the vti interface's netdev_priv(dev), setting the tunnel net to &init_net. (2) ip link set vti_test netns secure The vti_test interface is moved to the "secure" netns. Note that the associated struct ip_tunnel still has tunnel->net set to &init_net. (3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1 The first packet sent using the vti device causes xfrm_lookup() to be called as follows: dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0); Note that tunnel->net is the init namespace, while skb_dst(skb) references the vti_test interface in the "secure" namespace. The returned dst references an interface in the init namespace. Also note that the first parameter to xfrm_lookup() determines which flow cache is used to store the computed xfrm bundle, so after xfrm_lookup() returns there will be a cached bundle in the init namespace flow cache with a dst referencing a device in the "secure" namespace. (4) ip netns del secure Kernel begins to delete the "secure" namespace. At some point the vti_test interface is deleted, at which point dst_ifdown() changes the dst->dev in the cached xfrm bundle flow from vti_test to lo (still in the "secure" namespace however). Since nothing has happened to cause the init namespace's flow cache to be garbage collected, this dst remains attached to the flow cache, so the kernel loops waiting for the last reference to lo to go away. <Begin script> ip link add br1 type bridge ip link set dev br1 up ip addr add dev br1 1.1.1.1/8 ip netns add secure ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1 ip link set vti_test netns secure ip netns exec secure ip link set vti_test up ip netns exec secure ip link s lo up ip netns exec secure ip addr add dev lo 192.168.100.1/24 ip netns exec secure ip route add 192.168.200.0/24 dev vti_test ip xfrm policy flush ip xfrm state flush ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \ proto esp mode tunnel mark 1 ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \ proto esp mode tunnel mark 1 ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \ mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788 ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \ mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788 ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1 ip netns del secure <End script> Reported-by: NHangbin Liu <haliu@redhat.com> Reported-by: NJan Tluka <jtluka@redhat.com> Signed-off-by: NLance Richardson <lrichard@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2016 1 次提交
-
-
由 David Forster 提交于
Panic occurs when issuing "cat /proc/net/route" whilst populating FIB with > 1M routes. Use of cached node pointer in fib_route_get_idx is unsafe. BUG: unable to handle kernel paging request at ffffc90001630024 IP: [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0 Oops: 0000 [#1] SMP Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000 RIP: 0010:[<ffffffff814cf6a0>] [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 RSP: 0018:ffff88011a05fda0 EFLAGS: 00010202 RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20 RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950 RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000 R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950 R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480 FS: 00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0 Stack: ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00 ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000 ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80 Call Trace: [<ffffffff814d00d3>] ? fib_route_seq_start+0x93/0xc0 [<ffffffff811dd8b9>] ? seq_read+0x149/0x380 [<ffffffff811f8714>] ? fsnotify+0x3b4/0x500 [<ffffffff8138dce0>] ? process_echoes+0x70/0x70 [<ffffffff8121cfa7>] ? proc_reg_read+0x47/0x70 [<ffffffff811bb823>] ? __vfs_read+0x23/0xd0 [<ffffffff811bbd42>] ? rw_verify_area+0x52/0xf0 [<ffffffff811bbe61>] ? vfs_read+0x81/0x120 [<ffffffff811bcbc2>] ? SyS_read+0x42/0xa0 [<ffffffff81549ab2>] ? entry_SYSCALL_64_fastpath+0x16/0x75 Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89 RIP [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 RSP <ffff88011a05fda0> CR2: ffffc90001630024 Signed-off-by: NDave Forster <dforster@brocade.com> Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 7月, 2016 1 次提交
-
-
由 Soheil Hassas Yeganeh 提交于
tcp_select_initial_window() intends to advertise a window scaling for the maximum possible window size. To do so, it considers the maximum of net.ipv4.tcp_rmem[2] and net.core.rmem_max as the only possible upper-bounds. However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE to set the socket's receive buffer size to values larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max. Thus, SO_RCVBUFFORCE is effectively ignored by tcp_select_initial_window(). To fix this, consider the maximum of net.ipv4.tcp_rmem[2], net.core.rmem_max and socket's initial buffer space. Fixes: b0573dea ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options") Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Suggested-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 7月, 2016 1 次提交
-
-
由 Nikolay Aleksandrov 提交于
Currently lastuse is updated on entry creation and cache hit, but it should also be updated on entry change. Since both on add and update the ttl array is updated we can simply update the lastuse in ipmr_update_thresholds. Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Donald Sharp <sharpd@cumulusnetworks.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 7月, 2016 1 次提交
-
-
由 Daniel Borkmann 提交于
After a6127697 ("udp: prevent bugcheck if filter truncates packet too much"), there followed various other fixes for similar cases such as f4979fce ("rose: limit sk_filter trim to payload"). Latter introduced a new helper sk_filter_trim_cap(), where we can pass the trim limit directly to the socket filter handling. Make use of it here as well with sizeof(struct udphdr) as lower cap limit and drop the extra skb->len test in UDP's input path. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Cc: Willem de Bruijn <willemb@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 7月, 2016 2 次提交
-
-
由 Shmulik Ladkani 提交于
net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs Given: - tap0 and vxlan0 are bridged - vxlan0 stacked on eth0, eth0 having small mtu (e.g. 1400) Assume GSO skbs arriving from tap0 having a gso_size as determined by user-provided virtio_net_hdr (e.g. 1460 corresponding to VM mtu of 1500). After encapsulation these skbs have skb_gso_network_seglen that exceed eth0's ip_skb_dst_mtu. These skbs are accidentally passed to ip_finish_output2 AS IS. Alas, each final segment (segmented either by validate_xmit_skb or by hardware UFO) would be larger than eth0 mtu. As a result, those above-mtu segments get dropped on certain networks. This behavior is not aligned with the NON-GSO case: Assume a non-gso 1500-sized IP packet arrives from tap0. After encapsulation, the vxlan datagram is fragmented normally at the ip_finish_output-->ip_fragment code path. The expected behavior for the GSO case would be segmenting the "gso-oversized" skb first, then fragmenting each segment according to dst mtu, and finally passing the resulting fragments to ip_finish_output2. 'ip_finish_output_gso' already supports this "Slowpath" behavior, according to the IPSKB_FRAG_SEGS flag, which is only set during ipv4 forwarding (not set in the bridged case). In order to support the bridged case, we'll mark skbs arriving from an ingress interface that get udp-encaspulated as "allowed to be fragmented", causing their network_seglen to be validated by 'ip_finish_output_gso' (and fragment if needed). Note the TUNNEL_DONT_FRAGMENT tun_flag is still honoured (both in the gso and non-gso cases), which serves users wishing to forbid fragmentation at the udp tunnel endpoint. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shmulik Ladkani 提交于
This flag indicates whether fragmentation of segments is allowed. Formerly this policy was hardcoded according to IPSKB_FORWARDED (set by either ip_forward or ipmr_forward). Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 7月, 2016 1 次提交
-
-
由 Florian Westphal 提交于
The dummy ruleset I used to test the original validation change was broken, most rules were unreachable and were not tested by mark_source_chains(). In some cases rulesets that used to load in a few seconds now require several minutes. sample ruleset that shows the behaviour: echo "*filter" for i in $(seq 0 100000);do printf ":chain_%06x - [0:0]\n" $i done for i in $(seq 0 100000);do printf -- "-A INPUT -j chain_%06x\n" $i printf -- "-A INPUT -j chain_%06x\n" $i printf -- "-A INPUT -j chain_%06x\n" $i done echo COMMIT [ pipe result into iptables-restore ] This ruleset will be about 74mbyte in size, with ~500k searches though all 500k[1] rule entries. iptables-restore will take forever (gave up after 10 minutes) Instead of always searching the entire blob for a match, fill an array with the start offsets of every single ipt_entry struct, then do a binary search to check if the jump target is present or not. After this change ruleset restore times get again close to what one gets when reverting 36472341 (~3 seconds on my workstation). [1] every user-defined rule gets an implicit RETURN, so we get 300k jumps + 100k userchains + 100k returns -> 500k rule entries Fixes: 36472341 ("netfilter: x_tables: validate targets of jumps") Reported-by: NJeff Wu <wujiafu@gmail.com> Tested-by: NJeff Wu <wujiafu@gmail.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 17 7月, 2016 1 次提交
-
-
由 Nikolay Aleksandrov 提交于
In preparation for hardware offloading of ipmr/ip6mr we need an interface that allows to check (and later update) the age of entries. Relying on stats alone can show activity but not actual age of the entry, furthermore when there're tens of thousands of entries a lot of the hardware implementations only support "hit" bits which are cleared on read to denote that the entry was active and shouldn't be aged out, these can then be naturally translated into age timestamp and will be compatible with the software forwarding age. Using a lastuse entry doesn't affect performance because the members in that cache line are written to along with the age. Since all new users are encouraged to use ipmr via netlink, this is exported via the RTA_EXPIRES attribute. Also do a minor local variable declaration style adjustment - arrange them longest to shortest. Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Shrijeet Mukherjee <shm@cumulusnetworks.com> CC: Satish Ashok <sashok@cumulusnetworks.com> CC: Donald Sharp <sharpd@cumulusnetworks.com> CC: David S. Miller <davem@davemloft.net> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 7月, 2016 2 次提交
-
-
由 Richard Sailer 提交于
This adds kernel-doc style descriptions for 6 functions and fixes 1 typo. Signed-off-by: NRichard Sailer <richard@weltraumpflege.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Baron 提交于
The per-socket rate limit for 'challenge acks' was introduced in the context of limiting ack loops: commit f2b2c582 ("tcp: mitigate ACK loops for connections as tcp_sock") And I think it can be extended to rate limit all 'challenge acks' on a per-socket basis. Since we have the global tcp_challenge_ack_limit, this patch allows for tcp_challenge_ack_limit to be set to a large value and effectively rely on the per-socket limit, or set tcp_challenge_ack_limit to a lower value and still prevents a single connections from consuming the entire challenge ack quota. It further moves in the direction of eliminating the global limit at some point, as Eric Dumazet has suggested. This a follow-up to: Subject: tcp: make challenge acks less predictable Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Yue Cao <ycao009@ucr.edu> Signed-off-by: NJason Baron <jbaron@akamai.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 7月, 2016 5 次提交
-
-
由 Paul Gortmaker 提交于
The Makefile controlling compilation of this file is obj-y, meaning that it currently is never being built as a module. Since MODULE_ALIAS is a no-op for non-modular code, we can simply remove the MODULE_ALIAS_NETPROTO variant used here. We replace module.h with kmod.h since the file does make use of request_module() in order to load other modules from here. We don't have to worry about init.h coming in via the removed module.h since the file explicitly includes init.h already. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julian Anastasov 提交于
Vegard Nossum is reporting for a crash in fib_dump_info when nh_dev = NULL and fib_nhs == 1: Pid: 50, comm: netlink.exe Not tainted 4.7.0-rc5+ RIP: 0033:[<00000000602b3d18>] RSP: 0000000062623890 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000006261b800 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000024 RDI: 000000006245ba00 RBP: 00000000626238f0 R08: 000000000000029c R09: 0000000000000000 R10: 0000000062468038 R11: 000000006245ba00 R12: 000000006245ba00 R13: 00000000625f96c0 R14: 00000000601e16f0 R15: 0000000000000000 Kernel panic - not syncing: Kernel mode fault at addr 0x2e0, ip 0x602b3d18 CPU: 0 PID: 50 Comm: netlink.exe Not tainted 4.7.0-rc5+ #581 Stack: 626238f0 960226a02 00000400 000000fe 62623910 600afca7 62623970 62623a48 62468038 00000018 00000000 00000000 Call Trace: [<602b3e93>] rtmsg_fib+0xd3/0x190 [<602b6680>] fib_table_insert+0x260/0x500 [<602b0e5d>] inet_rtm_newroute+0x4d/0x60 [<60250def>] rtnetlink_rcv_msg+0x8f/0x270 [<60267079>] netlink_rcv_skb+0xc9/0xe0 [<60250d4b>] rtnetlink_rcv+0x3b/0x50 [<60265400>] netlink_unicast+0x1a0/0x2c0 [<60265e47>] netlink_sendmsg+0x3f7/0x470 [<6021dc9a>] sock_sendmsg+0x3a/0x90 [<6021e0d0>] ___sys_sendmsg+0x300/0x360 [<6021fa64>] __sys_sendmsg+0x54/0xa0 [<6021fac0>] SyS_sendmsg+0x10/0x20 [<6001ea68>] handle_syscall+0x88/0x90 [<600295fd>] userspace+0x3fd/0x500 [<6001ac55>] fork_handler+0x85/0x90 $ addr2line -e vmlinux -i 0x602b3d18 include/linux/inetdevice.h:222 net/ipv4/fib_semantics.c:1264 Problem happens when RTNH_F_LINKDOWN is provided from user space when creating routes that do not use the flag, catched with netlink fuzzer. Currently, the kernel allows user space to set both flags to nh_flags and fib_flags but this is not intentional, the assumption was that they are not set. Fix this by rejecting both flags with EINVAL. Reported-by: NVegard Nossum <vegard.nossum@oracle.com> Fixes: 0eeb075f ("net: ipv4 sysctl option to ignore routes when nexthop link is down") Signed-off-by: NJulian Anastasov <ja@ssi.bg> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Cc: Dinesh Dutt <ddutt@cumulusnetworks.com> Cc: Scott Feldman <sfeldma@gmail.com> Reviewed-by: NAndy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Yue Cao claims that current host rate limiting of challenge ACKS (RFC 5961) could leak enough information to allow a patient attacker to hijack TCP sessions. He will soon provide details in an academic paper. This patch increases the default limit from 100 to 1000, and adds some randomization so that the attacker can no longer hijack sessions without spending a considerable amount of probes. Based on initial analysis and patch from Linus. Note that we also have per socket rate limiting, so it is tempting to remove the host limit in the future. v2: randomize the count of challenge acks per second, not the period. Fixes: 282f23c6 ("tcp: implement RFC 5961 3.2") Reported-by: NYue Cao <ycao009@ucr.edu> Signed-off-by: NEric Dumazet <edumazet@google.com> Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Simon Horman 提交于
Using a combination if #if conditionals and goto labels to unwind tunnel4_init seems unwieldy. This patch takes a simpler approach of directly unregistering previously registered protocols when an error occurs. This fixes a number of problems with the current implementation including the potential presence of labels when they are unused and the potential absence of unregister code when it is needed. Fixes: 8afe97e5 ("tunnels: support MPLS over IPv4 tunnels") Signed-off-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kubeček 提交于
If socket filter truncates an udp packet below the length of UDP header in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if kernel is configured that way) can be easily enforced by an unprivileged user which was reported as CVE-2016-6162. For a reproducer, see http://seclists.org/oss-sec/2016/q3/8 Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing") Reported-by: NMarco Grassi <marco.gra@gmail.com> Signed-off-by: NMichal Kubecek <mkubecek@suse.cz> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 7月, 2016 1 次提交
-
-
由 Liping Zhang 提交于
When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack hash table via /sys/module/nf_conntrack/parameters/hashsize, race will happen, because reader can observe a newly allocated hash but the old size (or vice versa). So oops will happen like follows: BUG: unable to handle kernel NULL pointer dereference at 0000000000000017 IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack] Call Trace: [<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack] [<ffffffff81261a1c>] seq_read+0x2cc/0x390 [<ffffffff812a8d62>] proc_reg_read+0x42/0x70 [<ffffffff8123bee7>] __vfs_read+0x37/0x130 [<ffffffff81347980>] ? security_file_permission+0xa0/0xc0 [<ffffffff8123cf75>] vfs_read+0x95/0x140 [<ffffffff8123e475>] SyS_read+0x55/0xc0 [<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4 It is very easy to reproduce this kernel crash. 1. open one shell and input the following cmds: while : ; do echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize done 2. open more shells and input the following cmds: while : ; do cat /proc/net/nf_conntrack done 3. just wait a monent, oops will happen soon. The solution in this patch is based on Florian's Commit 5e3c61f9 ("netfilter: conntrack: fix lookup race during hash resize"). And add a wrapper function nf_conntrack_get_ht to get hash and hsize suggested by Florian Westphal. Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 10 7月, 2016 3 次提交
-
-
由 Eric Dumazet 提交于
inet_forward_change() runs with RTNL held. We are allowed to sleep if required. If we use __in_dev_get_rtnl() instead of __in_dev_get_rcu(), we no longer have to use GFP_ATOMIC allocations in inet_netconf_notify_devconf(), meaning we are less likely to miss notifications under memory pressure, and wont touch precious memory reserves either and risk dropping incoming packets. inet_netconf_get_devconf() can also use GFP_KERNEL allocation. Fixes: edc9e748 ("rtnl/ipv4: use netconf msg to advertise forwarding status") Fixes: 9e551110 ("rtnl/ipv4: add support of RTM_GETNETCONF") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Simon Horman 提交于
Extend the IPIP driver to support MPLS over IPv4. The implementation is an extension of existing support for IPv4 over IPv4 and is based of multiple inner-protocol support for the SIT driver. Signed-off-by: NSimon Horman <simon.horman@netronome.com> Reviewed-by: NDinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Simon Horman 提交于
Extend tunnel support to MPLS over IPv4. The implementation extends the existing differentiation between IPIP and IPv6 over IPv4 to also cover MPLS over IPv4. Signed-off-by: NSimon Horman <simon.horman@netronome.com> Reviewed-by: NDinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 7月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
Pinned timers must carry the pinned attribute in the timer structure itself, so convert the code to the new API. No functional change. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094341.617891430@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 7月, 2016 1 次提交
-
-
由 Joe Perches 提交于
netfilter uses multiple FWINV #defines with identical form that hide a specific structure variable and dereference it with a invflags member. $ git grep "#define FWINV" include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg)) net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg)) net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg))) net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg))) net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg))) net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg))) Consolidate these macros into a single NF_INVF macro. Miscellanea: o Neaten the alignment around these uses o A few lines are > 80 columns for intelligibility Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 01 7月, 2016 2 次提交
-
-
由 Pablo Neira Ayuso 提交于
No need for a special case to handle NF_INET_POST_ROUTING, this is basically the same handling as for prerouting, input, forward. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Eric Dumazet 提交于
Some arches have virtually mapped kernel stacks, or will soon have. tcp_md5_hash_header() uses an automatic variable to copy tcp header before mangling th->check and calling crypto function, which might be problematic on such arches. David says that using percpu storage is also problematic on non SMP builds. Just use kmalloc() to allocate scratch areas. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 6月, 2016 2 次提交
-
-
由 Shmulik Ladkani 提交于
ip_skb_dst_mtu uses skb->sk, assuming it is an AF_INET socket (e.g. it calls ip_sk_use_pmtu which casts sk as an inet_sk). However, in the case of UDP tunneling, the skb->sk is not necessarily an inet socket (could be AF_PACKET socket, or AF_UNSPEC if arriving from tun/tap). OTOH, the sk passed as an argument throughout IP stack's output path is the one which is of PMTU interest: - In case of local sockets, sk is same as skb->sk; - In case of a udp tunnel, sk is the tunneling socket. Fix, by passing ip_finish_output's sk to ip_skb_dst_mtu. This augments 7026b1dd 'netfilter: Pass socket pointer down through okfn().' Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andrey Vagin 提交于
We found that sometimes a restored tcp socket doesn't work. A reason of this bug is incorrect window parameters and in this case tcp_acceptable_seq() returns tcp_wnd_end(tp) instead of tp->snd_nxt. The other side drops packets with this seq, because seq is less than tp->rcv_nxt ( tcp_sequence() ). Data from a send queue is sent only if there is enough space in a window, so when we restore unacked data, we need to expand a window to fit this data. This was in a first version of this patch: "tcp: extend window to fit all restored unacked data in a send queue" Then Alexey recommended me to restore window parameters instead of adjusted them according with data in a sent queue. This sounds resonable. rcv_wnd has to be restored, because it was reported to another side and the offered window is never shrunk. One of reasons why we need to restore snd_wnd was described above. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: NAndrey Vagin <avagin@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 6月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Arjun reported a bug in TCP stack and bisected it to a recent commit. In case where we process SACK, we can coalesce multiple skbs into fat ones (tcp_shift_skb_data()), to lower write queue overhead, because we do not expect to retransmit these packets. However, SACK reneging can happen, forcing the sender to retransmit all these packets. If skb->len is above 64KB, we then send buggy IP packets that could hang TSO engine on cxgb4. Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs so that we cook packets of optimal size vs TCP/pacing. Thanks to Arjun for reporting the bug and running the tests ! Fixes: 10d3be56 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NArjun V <arjun@chelsio.com> Tested-by: NArjun V <arjun@chelsio.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 6月, 2016 4 次提交
-
-
由 David Ahern 提交于
Add support to inet_diag facility to filter sockets based on device index. If an interface index is in the filter only sockets bound to that index (sk_bound_dev_if) are returned. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Goff 提交于
This fixes wrong-interface signaling on 32-bit platforms for entries created when jiffies > 2^31 + MFC_ASSERT_THRESH. Signed-off-by: NTom Goff <thomas.goff@ll.mit.edu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Huw Davies 提交于
If set, these will take precedence over the parent's options during both sending and child creation. If they're not set, the parent's options (if any) will be used. This is to allow the security_inet_conn_request() hook to modify the IPv6 options in just the same way that it already may do for IPv4. Signed-off-by: NHuw Davies <huw@codeweavers.com> Signed-off-by: NPaul Moore <paul@paul-moore.com>
-
由 Huw Davies 提交于
This is to allow the CALIPSO labelling engine to use these. Signed-off-by: NHuw Davies <huw@codeweavers.com> Signed-off-by: NPaul Moore <paul@paul-moore.com>
-
- 24 6月, 2016 1 次提交
-
-
由 Liping Zhang 提交于
In iptables, if the user add a rule to send tcp RST and specify the non-TCP protocol, such as UDP, kernel will reject this request. But in nftables, this validity check only occurs in nft tool, i.e. only in userspace. This means that user can add such a rule like follows via nfnetlink: "nft add rule filter forward ip protocol udp reject with tcp reset" This will generate some confusing tcp RST packets. So we should send tcp RST only when it is TCP packet. Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 23 6月, 2016 1 次提交
-
-
由 Herbert Xu 提交于
Blair Steven noticed that ESN in conjunction with UDP encapsulation is broken because we set the temporary ESP header to the wrong spot. This patch fixes this by first of all using the right spot, i.e., 4 bytes off the real ESP header, and then saving this information so that after encryption we can restore it properly. Fixes: 7021b2e1 ("esp4: Switch to new AEAD interface") Reported-by: NBlair Steven <Blair.Steven@alliedtelesis.co.nz> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-