- 02 10月, 2013 4 次提交
-
-
由 Steffen Klassert 提交于
When queueing the netdevices for removal, we queue the fallback device twice in ip_tunnel_destroy(). The first time when we queue all netdevices in the namespace and then again explicitly. Fix this by removing the explicit queueing of the fallback device. Bug was introduced when network namespace support was added with commit 6c742e71 ("ipip: add x-netns support"). Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()") moved the IP header installation to iptunnel_xmit() and changed skb_push() to __skb_push(). This makes possible bugs hard to track down, so change it back to skb_push(). Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
Currently we can not update the tunnel parameters of the fallback tunnels because we don't find them in the hash lists. Fix this by adding them on initialization. Bug was introduced with commit c5441932 ("GRE: Refactor GRE tunneling code.") Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
We might extend the used aera of a skb beyond the total headroom when we install the ipip header. Fix this by calling skb_cow_head() unconditionally. Bug was introduced with commit c5441932 ("GRE: Refactor GRE tunneling code.") Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 10月, 2013 4 次提交
-
-
由 Salam Noureddine 提交于
It is possible for the timer handlers to run after the call to ip_mc_down so use in_dev_put instead of __in_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the in_device being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: NSalam Noureddine <noureddine@aristanetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
- Move sysctl_local_ports from a global variable into struct netns_ipv4. - Modify inet_get_local_port_range to take a struct net, and update all of the callers. - Move the initialization of sysctl_local_ports into sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c v2: - Ensure indentation used tabs - Fixed ip.h so it applies cleanly to todays net-next v3: - Compile fixes of strange callers of inet_get_local_port_range. This patch now successfully passes an allmodconfig build. Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range Originally-by: NSamya <samya@twitter.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When TCP Small Queues was added, we used a sysctl to limit amount of packets queues on Qdisc/device queues for a given TCP flow. Problem is this limit is either too big for low rates, or too small for high rates. Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO auto sizing, it can better control number of packets in Qdisc/device queues. New limit is two packets or at least 1 to 2 ms worth of packets. Low rates flows benefit from this patch by having even smaller number of packets in queues, allowing for faster recovery, better RTT estimations. High rates flows benefit from this patch by allowing more than 2 packets in flight as we had reports this was a limiting factor to reach line rate. [ In particular if TX completion is delayed because of coalescing parameters ] Example for a single flow on 10Gbp link controlled by FQ/pacing 14 packets in flight instead of 2 $ tc -s -d qd qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0 requeues 6822476) rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476 2047 flow, 2046 inactive, 1 throttled, delay 15673 ns 2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit Note that sk_pacing_rate is currently set to twice the actual rate, but this might be refined in the future when a flow is in congestion avoidance. Additional change : skb->destructor should be set to tcp_wfree(). A future patch (for linux 3.13+) might remove tcp_limit_output_bytes Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pravin B Shelar 提交于
While sending packet skb_cow_head() can change skb header which invalidates inner_iph pointer to skb header. Following patch avoid using it. Found by code inspection. This bug was introduced by commit 0e6fbc5b (ip_tunnels: extend iptunnel_xmit()). Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 9月, 2013 1 次提交
-
-
由 Patrick McHardy 提交于
TCP packets hitting the SYN proxy through the SYNPROXY target are not validated by TCP conntrack. When th->doff is below 5, an underflow happens when calculating the options length, causing skb_header_pointer() to return NULL and triggering the BUG_ON(). Handle this case gracefully by checking for NULL instead of using BUG_ON(). Reported-by: NMartin Topholm <mph@one.com> Tested-by: NMartin Topholm <mph@one.com> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 29 9月, 2013 4 次提交
-
-
由 Eric Dumazet 提交于
As mentioned in commit afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler"), this patch adds a new socket option. SO_MAX_PACING_RATE offers the application the ability to cap the rate computed by transport layer. Value is in bytes per second. u32 val = 1000000; setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val)); To be effectively paced, a flow must use FQ packet scheduler. Note that a packet scheduler takes into account the headers for its computations. The effective payload rate depends on MSS and retransmits if any. I chose to make this pacing rate a SOL_SOCKET option instead of a TCP one because this can be used by other protocols. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francesco Fusco 提交于
If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out packets with the specified TTL or TOS overriding the socket values specified with the traditional setsockopt(). The struct inet_cork stores the values of TOS, TTL and priority that are passed through the struct ipcm_cookie. If there are user-specified TOS (tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are used to override the per-socket values. In case of TOS also the priority is changed accordingly. Two helper functions get_rttos and get_rtconn_flags are defined to take into account the presence of a user specified TOS value when computing RT_TOS and RT_CONN_FLAGS. Signed-off-by: NFrancesco Fusco <ffusco@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francesco Fusco 提交于
This patch enables the IP_TTL and IP_TOS values passed from userspace to be stored in the ipcm_cookie struct. Three fields are added to the struct: - the TTL, expressed as __u8. The allowed values are in the [1-255]. A value of 0 means that the TTL is not specified. - the TOS, expressed as __s16. The allowed values are in the range [0,255]. A value of -1 means that the TOS is not specified. - the priority, expressed as a char and computed when handling the ancillary data. Signed-off-by: NFrancesco Fusco <ffusco@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
A host might need net_secret[] and never open a single socket. Problem added in commit aebda156 ("net: defer net_secret[] initialization") Based on prior patch from Hannes Frederic Sowa. Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NHannes Frederic Sowa <hannes@strressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 9月, 2013 5 次提交
-
-
由 Eric Dumazet 提交于
Dynamic Right Sizing (DRS) is supposed to open TCP receive window automatically, but suffers from two bugs, presented by order of importance. 1) tcp_rcv_space_adjust() fix : Using twice the last received amount is very pessimistic, because it doesn't allow fast recovery or proper slow start ramp up, if sender wants to increase cwin by 100% every RTT. copied = bytes received in previous RTT 2*copied = bytes we expect to receive in next RTT 4*copied = bytes we need to advertise in rwin at end of next RTT DRS is one RTT late, it needs a 4x factor. If sender is not using ABC, and increases cwin by 50% every rtt, then we needed 1.5*1.5 = 2.25 factor. This is probably why this bug was not really noticed. 2) There is no window adjustment after first RTT. DRS triggers only after the second RTT. DRS needs two RTT to initialize, so tcp_fixup_rcvbuf() should setup sk_rcvbuf to allow proper window grow for first two RTT. This patch increases TCP efficiency particularly for large RTT flows when autotuning is used at the receiver, and more particularly in presence of packet losses. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Halve mss table size to make blind cookie guessing more difficult. This is sad since the tables were already small, but there is little alternative except perhaps adding more precise mss information in the tcp timestamp. Timestamps are unfortunately not ubiquitous. Guessing all possible cookie values still has 8-in 2**32 chance. Reported-by: NJakob Lell <jakob@jakoblell.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
We currently accept cookies that were created less than 4 minutes ago (ie, cookies with counter delta 0-3). Combined with the 8 mss table values, this yields 32 possible values (out of 2**32) that will be valid. Reducing the lifetime to < 2 minutes halves the guessing chance while still providing a large enough period. While at it, get rid of jiffies value -- they overflow too quickly on 32 bit platforms. getnstimeofday is used to create a counter that increments every 64s. perf shows getnstimeofday cost is negible compared to sha_transform; normal tcp initial sequence number generation uses getnstimeofday, too. Reported-by: NJakob Lell <jakob@jakoblell.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Duan Jiong 提交于
Redirect isn't an error condition, it should leave the error handler without touching the socket. Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Duan Jiong 提交于
Redirect isn't an error condition, it should leave the error handler without touching the socket. Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 9月, 2013 2 次提交
-
-
由 Ansis Atteka 提交于
If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: NAnsis Atteka <aatteka@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ansis Atteka 提交于
skb->data already points to IP header, but for the sake of consistency we can also use ip_hdr() to retrieve it. Signed-off-by: NAnsis Atteka <aatteka@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 9月, 2013 1 次提交
-
-
由 Neal Cardwell 提交于
Commit 1b7fdd2a ("tcp: do not use cached RTT for RTT estimation") did not correctly account for the fact that crtt is the RTT shifted left 3 bits. Fix the calculation to consistently reflect this fact. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Acked-By: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 9月, 2013 1 次提交
-
-
由 Sha Zhengju 提交于
RESOURCE_MAX is far too general name, change it to RES_COUNTER_MAX. Signed-off-by: NSha Zhengju <handai.szj@taobao.com> Signed-off-by: NQiang Huang <h.huangqiang@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 9月, 2013 2 次提交
-
-
由 Eric Dumazet 提交于
TCP receive window handling is multi staged. A socket has a memory budget, static or dynamic, in sk_rcvbuf. Because we do not really know how this memory budget translates to a TCP window (payload), TCP announces a small initial window (about 20 MSS). When a packet is received, we increase TCP rcv_win depending on the payload/truesize ratio of this packet. Good citizen packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2 This heuristic takes place in tcp_grow_window() Problem is : We currently call tcp_grow_window() only for in-order packets. This means that reorders or packet losses stop proper grow of rcv_win, and senders are unable to benefit from fast recovery, or proper reordering level detection. Really, a packet being stored in OFO queue is not a bad citizen. It should be part of the game as in-order packets. In our traces, we very often see sender is limited by linux small receive windows, even if linux hosts use autotuning (DRS) and should allow rcv_win to grow to ~3MB. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
In commit 0f7cc9a3 "tcp: increase throughput when reordering is high", it only allows cwnd to increase in Open state. This mistakenly disables slow start after timeout (CA_Loss). Moreover cwnd won't grow if the state moves from Disorder to Open later in tcp_fastretrans_alert(). Therefore the correct logic should be to allow cwnd to grow as long as the data is received in order in Open, Loss, or even Disorder state. Signed-off-by: NYuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 9月, 2013 1 次提交
-
-
由 Dave Jones 提交于
Signed-off-by: NDave Jones <davej@fedoraproject.org> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 9月, 2013 1 次提交
-
-
由 Yuchung Cheng 提交于
Commit 1b7fdd2a("tcp: do not use cached RTT for RTT estimation") removes important comments on how RTO is initialized and updated. Hopefully this patch puts those information back. Signed-off-by: NYuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 9月, 2013 9 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Packets reaching SYNPROXY were default dropped, as they were most likely invalid (given the recommended state matching). This patch, changes SYNPROXY target to let packets, not consumed, continue being processed by the stack. This will be more in line other target modules. As it will allow more flexible configurations of handling, logging or matching on packets in INVALID states. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jesper Dangaard Brouer 提交于
Its seems Patrick missed to incoorporate some of my requested changes during review v2 of SYNPROXY netfilter module. Which were, to avoid SYN+ACK packets to enter the path, meant for the ACK packet from the client (from the 3WHS). Further there were a bug in ip6t_SYNPROXY.c, for matching SYN packets that didn't exclude the ACK flag. Go a step further with SYN packet/flag matching by excluding flags ACK+FIN+RST, in both IPv4 and IPv6 modules. The intented usage of SYNPROXY is as follows: (gracefully describing usage in commit) iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK iptables -A INPUT -i eth0 -p tcp --dport 80 -m state UNTRACKED,INVALID \ -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose This does filter SYN flags early, for packets in the UNTRACKED state, but packets in the INVALID state with other TCP flags could still reach the module, thus this stricter flag matching is still needed. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Vijay Subramanian 提交于
tcp_rcv_established() returns only one value namely 0. We change the return value to void (as suggested by David Miller). After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in response to SYNs. We can remove the check and processing on the return value of tcp_rcv_established(). We also fix jtcp_rcv_established() in tcp_probe.c to match that of tcp_rcv_established(). Signed-off-by: NVijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
With recent changes in tcp_probe module (e.g. f925d0a6 ("net: tcp_probe: add IPv6 support")) we also need to take into account that tbuf needs to be updated as format string will be further expanded. tbuf sits on the stack in tcpprobe_read() function that is invoked when user space reads procfs file /proc/net/tcpprobe, hence not fast path as in jtcp_rcv_established(). Having a size similarly as in sctp_probe module of 256 bytes is fully sufficient for that, we need theoretical maximum of 252 bytes otherwise we could get truncated. Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
The goal of this patch is to harmonize cleanup done on a skbuff on rx path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
This function was only used when a packet was sent to another netns. Now, it can also be used after tunnel encapsulation or decapsulation. Only skb_orphan() should not be done when a packet is not crossing netns. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
This argument is not used, let's remove it. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tim Gardner 提交于
This config option is superfluous in that it only guards a call to neigh_app_ns(). Enabling CONFIG_ARPD by default has no change in behavior. There will now be call to __neigh_notify() for each ARP resolution, which has no impact unless there is a user space daemon waiting to receive the notification, i.e., the case for which CONFIG_ARPD was designed anyways. Suggested-by: NEric W. Biederman <ebiederm@xmission.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Joe Perches <joe@perches.com> Cc: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: NTim Gardner <tim.gardner@canonical.com> Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 9月, 2013 1 次提交
-
-
由 Cong Wang 提交于
Fengguang reported: net/built-in.o: In function `in6_dev_finish_destroy': (.text+0x4ca7d): undefined reference to `snmp_mib_free' this is due to snmp_mib_free() is defined when CONFIG_INET is enabled, but in6_dev_finish_destroy() is now moved to core kernel. I think snmp_mib_free() is small enough to be inlined, so just make it static inline. Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 9月, 2013 1 次提交
-
-
由 Cong Wang 提交于
As suggested by Pravin, we can unify the code in case of duplicated code. Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 8月, 2013 3 次提交
-
-
由 Li Hongjun 提交于
Since commit 3d7b46cd (ip_tunnel: push generic protocol handling to ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on an IPv4 over IPv4 tunnel. xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb). Signed-off-by: NLi Hongjun <hongjun.li@6wind.com> Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Phil Oester 提交于
In commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so, the netfilter owner match lost its ability to block the SYNACK packet on outbound listening sockets. Revert the change, restoring the owner match functionality. This closes netfilter bugzilla #847. Signed-off-by: NPhil Oester <kernel@linuxace.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
RTT cached in the TCP metrics are valuable for the initial timeout because SYN RTT usually does not account for serialization delays on low BW path. However using it to seed the RTT estimator maybe disruptive because other components (e.g., pacing) require the smooth RTT to be obtained from actual connection. The solution is to use the higher cached RTT to set the first RTO conservatively like tcp_rtt_estimator(), but avoid seeding the other RTT estimator variables such as srtt. It is also a good idea to keep RTO conservative to obtain the first RTT sample, and the performance is insured by TCP loss probe if SYN RTT is available. To keep the seeding formula consistent across SYN RTT and cached RTT, the rttvar is twice the cached RTT instead of cached RTTVAR value. The reason is because cached variation may be too small (near min RTO) which defeats the purpose of being conservative on first RTO. However the metrics still keep the RTT variations as they might be useful for user applications (through ip). Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Tested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-