- 25 2月, 2014 9 次提交
-
-
由 Steffen Klassert 提交于
We need to be protocol family indepenent to support inter addresss family tunneling with vti. So use a dst_entry instead of the ipv4 rtable in vti_tunnel_xmit. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
This was used from vti and is replaced by the IPsec protocol multiplexer hooks. It is now unused, so remove it. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
With this patch, vti uses the IPsec protocol multiplexer to register it's own receive side hooks for ESP, AH and IPCOMP. Vti now does the following on receive side: 1. Do an input policy check for the IPsec packet we received. This is required because this packet could be already prosecces by IPsec, so an inbuond policy check is needed. 2. Mark the packet with the i_key. The policy and the state must match this key now. Policy and state belong to the outer namespace and policy enforcement is done at the further layers. 3. Call the generic xfrm layer to do decryption and decapsulation. 4. Wait for a callback from the xfrm layer to properly clean the skb to not leak informations on namespace and to update the device statistics. On transmit side: 1. Mark the packet with the o_key. The policy and the state must match this key now. 2. Do a xfrm_lookup on the original packet with the mark applied. 3. Check if we got an IPsec route. 4. Clean the skb to not leak informations on namespace transitions. 5. Attach the dst_enty we got from the xfrm_lookup to the skb. 6. Call dst_output to do the IPsec processing. 7. Do the device statistics. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
Vti uses the o_key to mark packets that were transmitted or received by a vti interface. Unfortunately we can't apply different marks to in and outbound packets with only one key availabe. Vti interfaces typically use wildcard selectors for vti IPsec policies. On forwarding, the same output policy will match for both directions. This generates a loop between the IPsec gateways until the ttl of the packet is exceeded. The gre i_key/o_key are usually there to find the right gre tunnel during a lookup. When vti uses the i_key to mark packets, the tunnel lookup does not work any more because vti does not use the gre keys as a hash key for the lookup. This patch workarounds this my not including the i_key when comupting the hash for the tunnel lookup in case of vti tunnels. With this we have separate keys available for the transmitting and receiving side of the vti interface. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
IPsec vti_rcv needs to remind the tunnel pointer to check it later at the vti_rcv_cb callback. So add this pointer to the IPsec common buffer, initialize it and check it to avoid transport state matching of a tunneled packet. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
Switch ipcomp4 to use the new IPsec protocol multiplexer. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
Switch ah4 to use the new IPsec protocol multiplexer. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
Switch esp4 to use the new IPsec protocol multiplexer. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
This patch add an IPsec protocol multiplexer. With this it is possible to add alternative protocol handlers as needed for IPsec virtual tunnel interfaces. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 20 2月, 2014 2 次提交
-
-
由 Florian Westphal 提交于
Currently the kernel tries to announce a zero window when free_space is below the current receiver mss estimate. When a sender is transmitting small packets and reader consumes data slowly (or not at all), receiver might be unable to shrink the receive win because a) we cannot withdraw already-commited receive window, and, b) we have to round the current rwin up to a multiple of the wscale factor, else we would shrink the current window. This causes the receive buffer to fill up until the rmem limit is hit. When this happens, we start dropping packets. Moreover, tcp_clamp_window may continue to grow sk_rcvbuf towards rmem[2] even if socket is not being read from. As we cannot avoid the "current_win is rounded up to multiple of mss" issue [we would violate a) above] at least try to prevent the receive buf growth towards tcp_rmem[2] limit by attempting to move to zero-window announcement when free_space becomes less than 1/16 of the current allowed receive buffer maximum. If tcp_rmem[2] is large, this will increase our chances to get a zero-window announcement out in time. Reproducer: On server: $ nc -l -p 12345 <suspend it: CTRL-Z> Client: #!/usr/bin/env python import socket import time sock = socket.socket() sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1) sock.connect(("192.168.4.1", 12345)); while True: sock.send('A' * 23) time.sleep(0.005) socket buffer on server-side will grow until tcp_rmem[2] is hit, at which point the client rexmits data until -EDTIMEOUT: tcp_data_queue invokes tcp_try_rmem_schedule which will call tcp_prune_queue which calls tcp_clamp_window(). And that function will grow sk->sk_rcvbuf up until it eventually hits tcp_rmem[2]. Thanks to Eric Dumazet for running regression tests. Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Tested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
In case we decide in udp6_sendmsg to send the packet down the ipv4 udp_sendmsg path because the destination is either of family AF_INET or the destination is an ipv4 mapped ipv6 address, we don't honor the maybe specified ipv4 mapped ipv6 address in IPV6_PKTINFO. We simply can check for this option in ip_cmsg_send because no calls to ipv6 module functions are needed to do so. Reported-by: NGert Doering <gert@space.net> Cc: Tore Anderson <tore@fud.no> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 2月, 2014 1 次提交
-
-
由 Duan Jiong 提交于
since commit 89aef892("ipv4: Delete routing cache."), the counter in_slow_tot can't work correctly. The counter in_slow_tot increase by one when fib_lookup() return successfully in ip_route_input_slow(), but actually the dst struct maybe not be created and cached, so we can increase in_slow_tot after the dst struct is created. Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 2月, 2014 2 次提交
-
-
由 Florian Westphal 提交于
Currently this always returns ENOBUFS, because the return value of __ip_tunnel_create is discarded. A more common failure is a duplicate name (EEXIST). Propagate the real error code so userspace can display a more meaningful error message. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Duan Jiong 提交于
since commit 251da413("ipv4: Cache ip_error() routes even when not forwarding."), the counter IPSTATS_MIB_INADDRERRORS can't work correctly, because the value of err was always set to ENETUNREACH. Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 2月, 2014 2 次提交
-
-
由 Eric Dumazet 提交于
Add two new fields to struct tcp_info, to report sk_pacing_rate and sk_max_pacing_rate to monitoring applications, as ss from iproute2. User exported fields are 64bit, even if kernel is currently using 32bit fields. lpaa5:~# ss -i .. skmem:(r0,rb357120,t0,tb2097152,f1584,w1980880,o0,bl0) ts sack cubic wscale:6,6 rto:400 rtt:0.875/0.75 mss:1448 cwnd:1 ssthresh:12 send 13.2Mbps pacing_rate 3336.2Mbps unacked:15 retrans:1/5448 lost:15 rcv_space:29200 Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
There are many drivers calling alloc_percpu() to allocate pcpu stats and then initializing ->syncp. So just introduce a helper function for them. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 2月, 2014 5 次提交
-
-
由 FX Le Bail 提交于
Even if the 'time_before' macro expand with parentheses, the look is bad. Signed-off-by: NFrancois-Xavier Le Bail <fx.lebail@yahoo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Denis Kirjanov 提交于
Packets which have L2 address different from ours should be already filtered before entering into ip_forward(). Perform that check at the beginning to avoid processing such packets. Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
One of my pet coding style peeves is the practice of adding extra return; at the end of function. Kill several instances of this in network code. I suppose some coccinelle wizardy could do this automatically. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stanislav Fomichev 提交于
Commit 684bad11 "tcp: use PRR to reduce cwin in CWR state" removed all calls to min_cwnd, so we can safely remove it. Also, remove tcp_reno_min_cwnd because it was only used for min_cwnd. Signed-off-by: NStanislav Fomichev <stfomichev@yandex-team.ru> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Marcelo Ricardo Leitner reported problems when the forwarding link path has a lower mtu than the incoming one if the inbound interface supports GRO. Given: Host <mtu1500> R1 <mtu1200> R2 Host sends tcp stream which is routed via R1 and R2. R1 performs GRO. In this case, the kernel will fail to send ICMP fragmentation needed messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu checks in forward path. Instead, Linux tries to send out packets exceeding the mtu. When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does not fragment the packets when forwarding, and again tries to send out packets exceeding R1-R2 link mtu. This alters the forwarding dstmtu checks to take the individual gso segment lengths into account. For ipv6, we send out pkt too big error for gso if the individual segments are too big. For ipv4, we either send icmp fragmentation needed, or, if the DF bit is not set, perform software segmentation and let the output path create fragments when the packet is leaving the machine. It is not 100% correct as the error message will contain the headers of the GRO skb instead of the original/segmented one, but it seems to work fine in my (limited) tests. Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid sofware segmentation. However it turns out that skb_segment() assumes skb nr_frags is related to mss size so we would BUG there. I don't want to mess with it considering Herbert and Eric disagree on what the correct behavior should be. Hannes Frederic Sowa notes that when we would shrink gso_size skb_segment would then also need to deal with the case where SKB_MAX_FRAGS would be exceeded. This uses sofware segmentation in the forward path when we hit ipv4 non-DF packets and the outgoing link mtu is too small. Its not perfect, but given the lack of bug reports wrt. GRO fwd being broken this is a rare case anyway. Also its not like this could not be improved later once the dust settles. Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Reported-by: NMarcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 2月, 2014 2 次提交
-
-
由 Fan Du 提交于
This patch add esn support for AH input stage by attaching upper 32bits sequence number right after packet payload as specified by RFC 4302. Then the ICV value will guard upper 32bits sequence number as well when packet getting in. Signed-off-by: NFan Du <fan.du@windriver.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Fan Du 提交于
This patch add esn support for AH output stage by attaching upper 32bits sequence number right after packet payload as specified by RFC 4302. Then the ICV value will guard upper 32bits sequence number as well when packet going out. Signed-off-by: NFan Du <fan.du@windriver.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 11 2月, 2014 1 次提交
-
-
由 John Ogness 提交于
Commit 46d3ceab ("tcp: TCP Small Queues") introduced a possible regression for applications using TCP_NODELAY. If TCP session is throttled because of tsq, we should consult tp->nonagle when TX completion is done and allow us to send additional segment, especially if this segment is not a full MSS. Otherwise this segment is sent after an RTO. [edumazet] : Cooked the changelog, added another fix about testing sk_wmem_alloc twice because TX completion can happen right before setting TSQ_THROTTLED bit. This problem is particularly visible with recent auto corking, but might also be triggered with low tcp_limit_output_bytes values or NIC drivers delaying TX completion by hundred of usec, and very low rtt. Thomas Glanzmann for example reported an iscsi regression, caused by tcp auto corking making this bug quite visible. Fixes: 46d3ceab ("tcp: TCP Small Queues") Signed-off-by: NJohn Ogness <john.ogness@linutronix.de> Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NThomas Glanzmann <thomas@glanzmann.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 2月, 2014 1 次提交
-
-
由 Jesper Juhl 提交于
As far as I can tell we have used a default of 60 seconds for FIN_WAIT2 timeout for ages (since 2.x times??). In any case, the timeout these days is 60 seconds, so the 3 min comment is wrong (and cost me a few minutes of my life when I was debugging a FIN_WAIT2 related problem in a userspace application and checked the kernel source for details). Signed-off-by: NJesper Juhl <jj@chaosbits.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 2月, 2014 2 次提交
-
-
由 Eric Dumazet 提交于
TCP pacing depends on an accurate srtt estimation. Current srtt estimation is using jiffie resolution, and has an artificial offset of at least 1 ms, which can produce slowdowns when FQ/pacing is used, especially in DC world, where typical rtt is below 1 ms. We are planning a switch to usec resolution for linux-3.15, but in the meantime, this patch removes the 1 ms offset. All we need is to have tp->srtt minimal value of 1 to differentiate the case of srtt being initialized or not, not 8. The problematic behavior was observed on a 40Gbit testbed, where 32 concurrent netperf were reaching 12Gbps of aggregate speed, instead of line speed. This patch also has the effect of reporting more accurate srtt and send rates to iproute2 ss command as in : $ ss -i dst cca2 Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port tcp ESTAB 0 0 10.244.129.1:56984 10.244.129.2:12865 cubic wscale:6,6 rto:200 rtt:0.25/0.25 ato:40 mss:1448 cwnd:10 send 463.4Mbps rcv_rtt:1 rcv_space:29200 tcp ESTAB 0 390960 10.244.129.1:60247 10.244.129.2:50204 cubic wscale:6,6 rto:200 rtt:0.875/0.75 mss:1448 cwnd:73 ssthresh:51 send 966.4Mbps unacked:73 retrans:0/121 rcv_space:29200 Reported-by: NVytautas Valancius <valas@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Geert Uytterhoeven 提交于
On m68k/ARAnyM: WARNING: CPU: 0 PID: 407 at net/ipv4/devinet.c:1599 0x316a99() Modules linked in: CPU: 0 PID: 407 Comm: ifconfig Not tainted 3.13.0-atari-09263-g0c71d68014d1 #1378 Stack from 10c4fdf0: 10c4fdf0 002ffabb 000243e8 00000000 008ced6c 00024416 00316a99 0000063f 00316a99 00000009 00000000 002501b4 00316a99 0000063f c0a86117 00000080 c0a86117 00ad0c90 00250a5a 00000014 00ad0c90 00000000 00000000 00000001 00b02dd0 00356594 00000000 00356594 c0a86117 eff6c9e4 008ced6c 00000002 008ced60 0024f9b4 00250b52 00ad0c90 00000000 00000000 00252390 00ad0c90 eff6c9e4 0000004f 00000000 00000000 eff6c9e4 8000e25c eff6c9e4 80001020 Call Trace: [<000243e8>] warn_slowpath_common+0x52/0x6c [<00024416>] warn_slowpath_null+0x14/0x1a [<002501b4>] rtmsg_ifa+0xdc/0xf0 [<00250a5a>] __inet_insert_ifa+0xd6/0x1c2 [<0024f9b4>] inet_abc_len+0x0/0x42 [<00250b52>] inet_insert_ifa+0xc/0x12 [<00252390>] devinet_ioctl+0x2ae/0x5d6 Adding some debugging code reveals that net_fill_ifaddr() fails in put_cacheinfo(skb, ifa->ifa_cstamp, ifa->ifa_tstamp, preferred, valid)) nla_put complains: lib/nlattr.c:454: skb_tailroom(skb) = 12, nla_total_size(attrlen) = 20 Apparently commit 5c766d64 ("ipv4: introduce address lifetime") forgot to take into account the addition of struct ifa_cacheinfo in inet_nlmsg_size(). Hence add it, like is already done for ipv6. Suggested-by: NCong Wang <cwang@twopensource.com> Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NCong Wang <cwang@twopensource.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 2月, 2014 3 次提交
-
-
由 Patrick McHardy 提交于
Add a reject module for NFPROTO_INET. It does nothing but dispatch to the AF-specific modules based on the hook family. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Patrick McHardy 提交于
Currently the nft_reject module depends on symbols from ipv6. This is wrong since no generic module should force IPv6 support to be loaded. Split up the module into AF-specific and a generic part. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Alexey Dobriyan 提交于
Similar bug fixed in SIP module in 3f509c68 ("netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation"). BUG: unable to handle kernel paging request at 00100104 IP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack] ... Call Trace: [<c0244bd8>] ? del_timer+0x48/0x70 [<f8215687>] nf_ct_remove_expectations+0x47/0x60 [nf_conntrack] [<f8211c99>] nf_ct_delete_from_lists+0x59/0x90 [nf_conntrack] [<f8212e5e>] death_by_timeout+0x14e/0x1c0 [nf_conntrack] [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack] [<c024442d>] call_timer_fn+0x1d/0x80 [<c024461e>] run_timer_softirq+0x18e/0x1a0 [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack] [<c023e6f3>] __do_softirq+0xa3/0x170 [<c023e650>] ? __local_bh_enable+0x70/0x70 <IRQ> [<c023e587>] ? irq_exit+0x67/0xa0 [<c0202af6>] ? do_IRQ+0x46/0xb0 [<c027ad05>] ? clockevents_notify+0x35/0x110 [<c066ac6c>] ? common_interrupt+0x2c/0x40 [<c056e3c1>] ? cpuidle_enter_state+0x41/0xf0 [<c056e6fb>] ? cpuidle_idle_call+0x8b/0x100 [<c02085f8>] ? arch_cpu_idle+0x8/0x30 [<c027314b>] ? cpu_idle_loop+0x4b/0x140 [<c0273258>] ? cpu_startup_entry+0x18/0x20 [<c066056d>] ? rest_init+0x5d/0x70 [<c0813ac8>] ? start_kernel+0x2ec/0x2f2 [<c081364f>] ? repair_env_string+0x5b/0x5b [<c0813269>] ? i386_start_kernel+0x33/0x35 Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 05 2月, 2014 1 次提交
-
-
由 Shlomo Pongratz 提交于
RCU writer side should use rcu_dereference_protected() and not rcu_dereference(), fix that. This also removes the "suspicious RCU usage" warning seen when running with CONFIG_PROVE_RCU. Also, don't use rcu_assign_pointer/rcu_dereference for pointers which are invisible beyond the udp offload code. Fixes: b582ef09 ('net: Add GRO support for UDP encapsulating protocols') Reported-by: NEric Dumazet <edumazet@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NShlomo Pongratz <shlomop@mellanox.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 2月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
Setting rt variable to NULL at the beginning of ip_tunnel_xmit() missed possible use of this variable as a scratch value. Also fixes a possible dst leak in tunnel_dst_check() : If we had to call tunnel_dst_reset(), we forgot to release the reference on dst. Merges tunnel_dst_get()/tunnel_dst_check() into a single tunnel_rtable_get() function for clarity. Many thanks to Tommi for his report and tests. Fixes: 7d442fab ("ipv4: Cache dst in tunnels") Reported-by: NTommi Rantala <tt.rantala@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Tested-by: NTommi Rantala <tt.rantala@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 1月, 2014 1 次提交
-
-
由 Or Gerlitz 提交于
Since udp_add_offload() can be called from non-sleepable context e.g under this call tree from the vxlan driver use case: vxlan_socket_create() <-- holds the spinlock -> vxlan_notify_add_rx_port() -> udp_add_offload() <-- schedules we should allocate the udp_offloads structure in atomic manner. Fixes: b582ef09 ('net: Add GRO support for UDP encapsulating protocols') Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 1月, 2014 3 次提交
-
-
由 Duan Jiong 提交于
When dealing with icmp messages, the skb->data points the ip header that triggered the sending of the icmp message. In gre_cisco_err(), the parse_gre_header() is called, and the iptunnel_pull_header() is called to pull the skb at the end of the parse_gre_header(), so the skb->data doesn't point the inner ip header. Unfortunately, the ipgre_err still needs those ip addresses in inner ip header to look up tunnel by ip_tunnel_lookup(). So just use icmp_hdr() to get inner ip header instead of skb->data. Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Holger Eitzenberger 提交于
I see a memory leak when using a transparent HTTP proxy using TPROXY together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable): unreferenced object 0xffff88008cba4a40 (size 1696): comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s) hex dump (first 32 bytes): 0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00 .. j@..7..2..... 02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9 [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5 [<ffffffff812702cf>] sk_clone_lock+0x14/0x283 [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b [<ffffffff8129a893>] netlink_broadcast+0x14/0x16 [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3 [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0 [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55 [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725 [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154 [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514 [<ffffffff8127aa77>] process_backlog+0xee/0x1c5 [<ffffffff8127c949>] net_rx_action+0xa7/0x200 [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157 But there are many more, resulting in the machine going OOM after some days. From looking at the TPROXY code, and with help from Florian, I see that the memory leak is introduced in tcp_v4_early_demux(): void tcp_v4_early_demux(struct sk_buff *skb) { /* ... */ iph = ip_hdr(skb); th = tcp_hdr(skb); if (th->doff < sizeof(struct tcphdr) / 4) return; sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo, iph->saddr, th->source, iph->daddr, ntohs(th->dest), skb->skb_iif); if (sk) { skb->sk = sk; where the socket is assigned unconditionally to skb->sk, also bumping the refcnt on it. This is problematic, because in our case the skb has already a socket assigned in the TPROXY target. This then results in the leak I see. The very same issue seems to be with IPv6, but haven't tested. Reviewed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NHolger Eitzenberger <holger@eitzenberger.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sachin Kamat 提交于
PTR_RET is deprecated. Use PTR_ERR_OR_ZERO instead. While at it also include missing err.h header. Signed-off-by: NSachin Kamat <sachin.kamat@linaro.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 1月, 2014 1 次提交
-
-
由 Oliver Hartkopp 提交于
The two commits 0115e8e3 (net: remove delay at device dismantle) and 748e2d93 (net: reinstate rtnl in call_netdevice_notifiers()) silently removed a NULL pointer check for in_dev since Linux 3.7. This patch re-introduces this check as it causes crashing the kernel when setting small mtu values on non-ip capable netdevices. Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 1月, 2014 3 次提交
-
-
由 Alex Elder 提交于
Now that the definition is centralized in <linux/kernel.h>, the definitions of U32_MAX (and related) elsewhere in the kernel can be removed. Signed-off-by: NAlex Elder <elder@linaro.org> Acked-by: NSage Weil <sage@inktank.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alex Elder 提交于
The symbol U32_MAX is defined in several spots. Change these definitions to be conditional. This is in preparation for the next patch, which centralizes the definition in <linux/kernel.h>. Signed-off-by: NAlex Elder <elder@linaro.org> Cc: Sage Weil <sage@inktank.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Duan Jiong 提交于
commit a6222602("ip_tunnel: fix kernel panic with icmp_dest_unreach") clear IPCB in ip_tunnel_xmit() , or else skb->cb[] may contain garbage from GSO segmentation layer. But commit 0e6fbc5b("ip_tunnels: extend iptunnel_xmit()") refactor codes, and it clear IPCB behind the dst_link_failure(). So clear IPCB in ip_tunnel_xmit() just like commti a6222602("ip_tunnel: fix kernel panic with icmp_dest_unreach"). Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-