- 06 September 2014, 1 commit
-
-
Submitted by Eric Dumazet
TCP_SKB_CB(skb)->when has different meanings in the output and input paths. In the output path it contains a timestamp; in the input path it contains an ISN, chosen by tcp_timewait_state_process(). Let's add a different name to ease code comprehension. Note that the 'when' field will disappear in a following patch; since skb_mstamp already contains the timestamp, the anonymous union will promptly disappear as well.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
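A minimal, userspace-compilable sketch of the idea (the struct below is a simplified stand-in, not the kernel's TCP_SKB_CB layout): an anonymous union lets the same 32-bit slot carry either meaning, with a name for each path.

    #include <stdint.h>

    /* Simplified stand-in for the control block: the same slot is a
     * transmit timestamp on output and a timewait-chosen ISN on input. */
    struct tcp_skb_cb_sketch {
        union {
            uint32_t when;        /* output path: transmit timestamp */
            uint32_t tcp_tw_isn;  /* input path: ISN from tcp_timewait_state_process() */
        };
        /* ...remaining control-block fields elided... */
    };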
-
- 02 September 2014, 1 commit
-
-
Submitted by stephen hemminger
Fix places where there is a space before a tab, long lines, awkward if(){ placement, double spacing, etc. Add a blank line after declarations/initializations.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 August 2014, 1 commit
-
-
Submitted by Neal Cardwell
Make sure we use the correct address-family-specific function for handling MTU reductions from within tcp_release_cb(). Previously, AF_INET6 sockets would incorrectly always use the IPv6 code path even when they were handling IPv4 traffic and thus had an IPv4 dst.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Willem de Bruijn <willemb@google.com>
Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
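A hedged sketch of the shape of the fix (the types are simplified stand-ins, not the kernel's inet_connection_sock_af_ops): dispatch the MTU-reduction handler through the socket's per-address-family ops pointer instead of hard-coding the IPv6 handler for every AF_INET6 socket.

    #include <stdio.h>

    struct af_ops_sketch {
        void (*mtu_reduced)(void *sk);
    };

    static void v4_mtu_reduced(void *sk) { printf("IPv4 MTU reduction on sock %p\n", sk); }
    static void v6_mtu_reduced(void *sk) { printf("IPv6 MTU reduction on sock %p\n", sk); }

    static const struct af_ops_sketch v4_ops = { .mtu_reduced = v4_mtu_reduced };
    static const struct af_ops_sketch v6_ops = { .mtu_reduced = v6_mtu_reduced };

    struct sock_sketch {
        /* May point at the IPv4 ops even for an AF_INET6 socket that is
         * carrying IPv4 traffic and holds an IPv4 dst. */
        const struct af_ops_sketch *icsk_af_ops;
    };

    /* tcp_release_cb()-style deferred handling: always go through the ops. */
    static void handle_deferred_mtu_event(struct sock_sketch *sk)
    {
        sk->icsk_af_ops->mtu_reduced(sk);
    }

    int main(void)
    {
        struct sock_sketch mapped_v4 = { .icsk_af_ops = &v4_ops };
        struct sock_sketch native_v6 = { .icsk_af_ops = &v6_ops };

        handle_deferred_mtu_event(&mapped_v4);  /* -> IPv4 handler */
        handle_deferred_mtu_event(&native_v6);  /* -> IPv6 handler */
        return 0;
    }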
-
- 07 August 2014, 1 commit
-
-
Submitted by Dmitry Popov
Since a8afca03 ("tcp: md5: protects md5sig_info with RCU"), tcp_md5_do_lookup() does not require the socket lock; rcu_read_lock() is enough. Therefore the socket lock is no longer required for tcp_v{4,6}_inbound_md5_hash either, so we can move these calls (wrapped with rcu_read_{,un}lock) before bh_lock_sock: from tcp_v{4,6}_do_rcv to tcp_v{4,6}_rcv.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 August 2014, 1 commit
-
-
Submitted by Dmitry Popov
tcpm_key is an array inside struct tcp_md5sig, so there is no need to check it against NULL.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 August 2014, 1 commit
-
-
Submitted by Duan Jiong
When dealing with ICMPv[46] error messages, icmp_socket_deliver() and icmpv6_notify() already perform valid checks on the packet's length, yet some protocols check the packet's length again, redundantly. Remove those duplicated statements, and increment the ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS counters in icmp_socket_deliver() and icmpv6_notify() respectively. In addition, add the missing counter in udp6/udplite6 when the socket is NULL.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 July 2014, 2 commits
-
-
Submitted by Tom Herbert
For a connected socket we can precompute the flow hash for setting in skb->hash on output. This is a performance advantage over calculating skb->hash for every packet on the connection. The computation is done using the common hash algorithm, to be consistent with the computations done for packets of the connection in other states where there is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. The functions inet_set_txhash and ip6_set_txhash are added and called from the points in TCP and UDP where the socket moves to the established state. skb_set_hash_from_sk is a function which sets skb->hash from the sock's txhash value; it is called in the UDP and TCP transmit paths when transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan interface (in this case skb_get_hash is called on every TX packet to create a UDP source port).

Before fix:
    95.02% CPU utilization
    154/256/505 90/95/99% latencies
    1.13042e+06 tps
    Time in functions: 0.28% skb_flow_dissect, 0.21% __skb_get_hash

After fix:
    94.95% CPU utilization
    156/254/485 90/95/99% latencies
    1.15447e+06 tps
    Neither __skb_get_hash nor skb_flow_dissect appear in perf

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
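A minimal, userspace-compilable sketch of the idea (struct and function names are simplified stand-ins, not the kernel API): hash the 4-tuple once when the connection is established, cache it on the socket, and copy it onto each outgoing packet instead of dissecting headers per transmit.

    #include <stdint.h>
    #include <stdio.h>

    struct sock_sketch { uint32_t txhash; };              /* cached once per connection */
    struct skb_sketch  { uint32_t hash; int l4_hash; };   /* per-packet metadata        */

    /* Stand-in for the common flow-hash algorithm (the kernel uses a keyed
     * hash over the 4-tuple; this mix is purely illustrative). */
    static uint32_t flow_hash(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
    {
        return saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);
    }

    /* Called once, when the socket reaches the established state. */
    static void set_txhash(struct sock_sketch *sk, uint32_t saddr, uint32_t daddr,
                           uint16_t sport, uint16_t dport)
    {
        sk->txhash = flow_hash(saddr, daddr, sport, dport);
    }

    /* Called on every transmit within the socket context: just a copy. */
    static void skb_set_hash_from_sock(struct skb_sketch *skb, const struct sock_sketch *sk)
    {
        skb->hash = sk->txhash;
        skb->l4_hash = 1;
    }

    int main(void)
    {
        struct sock_sketch sk;
        struct skb_sketch skb;

        set_txhash(&sk, 0x0a000001, 0x0a000002, 40000, 443);
        skb_set_hash_from_sock(&skb, &sk);
        printf("skb->hash = 0x%08x\n", (unsigned int)skb.hash);
        return 0;
    }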
-
Submitted by Neal Cardwell
Always store in snt_synack the time at which the server received the first client SYN and attempted to send the first SYNACK. Recent commit aa27fc50 ("tcp: tcp_v[46]_conn_request: fix snt_synack initialization") resolved an inconsistency between IPv4 and IPv6 in the initialization of snt_synack. This commit brings back the idea from 843f4a55 ("tcp: use tcp_v4_send_synack on first SYN-ACK"), which was going for the original behavior of snt_synack from the commit where it was added, 9ad7c049 ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side"), in v3.1. In addition to being simpler (and probably a tiny bit faster), unconditionally storing the time of the first SYNACK attempt is useful because it allows calculating a performance metric quantifying how long it took to establish a passive TCP connection.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Jerry Chu <hkchu@google.com>
Acked-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
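Illustrative only (the field and helper names are stand-ins, not the kernel's): with snt_synack always stamped at the first SYNACK attempt, the passive-open setup time is just a subtraction at 3WHS completion.

    #include <stdint.h>

    struct req_sketch { uint32_t snt_synack; };  /* stamped at the first SYNACK attempt */

    /* Call when the 3WHS completes; 'now' and snt_synack use the same clock. */
    uint32_t passive_setup_time(const struct req_sketch *req, uint32_t now)
    {
        return now - req->snt_synack;
    }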
-
- 28 June 2014, 10 commits
-
-
Submitted by Octavian Purdila
Create tcp_conn_request and remove most of the code from tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
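The commits below build toward this unification by routing every address-family-specific step through tcp_request_sock_ops. A hedged, declaration-only sketch of the pattern (member names echo the series, but the types are simplified stand-ins, not the kernel structures):

    #include <stdint.h>

    /* One protocol-independent tcp_conn_request() can serve both families
     * once every AF-specific step is reachable through an ops table. */
    struct tcp_request_sock_ops_sketch {
        uint16_t mss_clamp;                                   /* AF-specific MSS clamp        */
        void     (*init_req)(void *req, const void *skb);     /* AF-specific request init     */
        uint32_t (*cookie_init_seq)(const void *skb);         /* AF-specific syncookie ISN    */
        void    *(*route_req)(void *sk, const void *skb);     /* AF-specific route lookup     */
        uint32_t (*init_seq)(const void *skb);                /* AF-specific initial sequence */
        int      (*send_synack)(void *sk, void *dst, void *req);
        void     (*queue_hash_add)(void *sk, void *req);      /* add to listener queue/hash   */
    };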
-
Submitted by Octavian Purdila
Add a queue_add_hash member to tcp_request_sock_ops so that we can later unify tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Octavian Purdila
Add an mss_clamp member to tcp_request_sock_ops so that we can later unify tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Octavian Purdila
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Octavian Purdila
Create a new tcp_request_sock_ops method to unify the IPv4/IPv6 signature for tcp_v[46]_send_synack. This allows us to later unify tcp_v4_rtx_synack with tcp_v6_rtx_synack and tcp_v4_conn_request with tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Octavian Purdila
More work in preparation for unifying tcp_v4_conn_request and tcp_v6_conn_request: indirect the initial sequence number calls via tcp_request_sock_ops.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Octavian Purdila
Create wrappers with the same signature for the IPv4/IPv6 request routing calls and use these wrappers (via the route_req method of tcp_request_sock_ops) in tcp_v4_conn_request and tcp_v6_conn_request, with the purpose of unifying the two functions in a later patch. We can later drop the wrapper functions and modify inet_csk_route_req and inet6_csk_route_req to use the same signature.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Octavian Purdila
Move the specific IPv4/IPv6 cookie sequence initialization to a new method in tcp_request_sock_ops, in preparation for unifying tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Octavian Purdila
Move the specific IPv4/IPv6 initializations to a new method in tcp_request_sock_ops, in preparation for unifying tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Octavian Purdila
Commit 016818d0 ("tcp: TCP Fast Open Server - take SYNACK RTT after completing 3WHS") changed the code to only take a snt_synack timestamp when a SYNACK transmit or retransmit succeeds. This behaviour was later broken by commit 843f4a55 ("tcp: use tcp_v4_send_synack on first SYN-ACK"), as snt_synack is now updated even if tcp_v4_send_synack fails. Also, commit 3a19ce0e ("tcp: IPv6 support for fastopen server") misses the required IPv6 updates for 016818d0. This patch makes sure that snt_synack is updated only when the SYNACK transmit/retransmit succeeds, for both IPv4 and IPv6.
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Daniel Lee <longinus00@gmail.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 June 2014, 1 commit
-
-
Submitted by Octavian Purdila
ir_mark initialization is done for both TCP v4 and v6, so move it into the common tcp_openreq_init function.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 14 May 2014, 5 commits
-
-
Submitted by Lorenzo Colitti
When using mark-based routing, sockets returned from accept() may need to be marked differently depending on the incoming connection request. This is the case, for example, if different socket marks identify different networks: a listening socket may want to accept connections from all networks, but each connection should be marked with the network that the request came in on, so that subsequent packets are sent on the correct network.

This patch adds a sysctl to mark TCP sockets based on the fwmark of the incoming SYN packet. If enabled, and an unmarked socket receives a SYN, then the SYN packet's fwmark is written to the connection's inet_request_sock, and later written back to the accepted socket when the connection is established. If the socket already has a nonzero mark, then the behaviour is the same as it is today, i.e., the listening socket's fwmark is used.

Black-box tested using user-mode Linux:
- IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the mark of the incoming SYN packet.
- The socket returned by accept() is marked with the mark of the incoming SYN packet.
- Tested with syncookies=1 and syncookies=2.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
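A small userspace sketch of how an application might observe the behaviour described above. The sysctl name implied by the patch description (net.ipv4.tcp_fwmark_accept) is an assumption here; the getsockopt(SO_MARK) read itself is a standard Linux socket option.

    #include <stdio.h>
    #include <sys/socket.h>

    /* With accept-marking enabled (assumed sysctl: net.ipv4.tcp_fwmark_accept),
     * an unmarked listener hands out accepted sockets whose mark equals the
     * fwmark of the SYN that created them. This helper reads it back. */
    void print_connection_mark(int accepted_fd)
    {
        unsigned int mark = 0;
        socklen_t len = sizeof(mark);

        if (getsockopt(accepted_fd, SOL_SOCKET, SO_MARK, &mark, &len) == 0)
            printf("accepted connection fwmark: %u\n", mark);
        else
            perror("getsockopt(SO_MARK)");
    }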
-
Submitted by Yuchung Cheng
If a Fast Open socket has already been accepted by the user, it should be treated like a connected socket so that the ICMP error is recorded in sk_softerr, where the user can fetch it. Do that in both tcp_v4_err and tcp_v6_err. Also refactor the sequence window check to improve readability (e.g., there were two local variables named 'req').
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Yuchung Cheng
To avoid large code duplication in IPv6, we first need to simplify the complicated SYN-ACK sending code in tcp_v4_conn_request(). To use tcp_v4(6)_send_synack() to send all SYN-ACKs, we need to initialize the mini socket's receive window before trying to create the child socket and/or building the SYN-ACK packet. So we move that initialization from tcp_make_synack() into tcp_v4_conn_request() as a new function, tcp_openreq_init_req_rwin(). After this refactoring the SYN-ACK sending code is simpler, which makes it easier to implement Fast Open for IPv6.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Yuchung Cheng
Consolidate the various cookie checking and generation code to simplify Fast Open processing. The main goal is to reduce code duplication in tcp_v4_conn_request() for IPv6 support. Remove the two experimental sysctl flags TFO_SERVER_ALWAYS and TFO_SERVER_COOKIE_NOT_CHKD, which were used primarily for developmental debugging.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Yuchung Cheng
Move common TFO functions that will be used by both v4 and v6 to tcp_fastopen.c. Create a helper, tcp_fastopen_queue_check().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 May 2014, 1 commit
-
-
Submitted by Tom Herbert
Call skb_checksum_init instead of private functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 April 2014, 1 commit
-
-
Submitted by David S. Miller
Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it can be consumed and freed, so this skb->len access is potentially a read of freed memory. Furthermore, skb->len can be modified by the consumer, so the value isn't necessarily accurate. And finally, no actual implementation of this callback uses the length argument; since nobody cared about its value, lots of call sites pass arbitrary values such as '0' or even '1'. So just remove the length argument from the callback; that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide.
Signed-off-by: David S. Miller <davem@davemloft.net>
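A compilable toy sketch of the fixed pattern (the types are illustrative stand-ins, not the kernel's struct sock): queue first, then signal readiness without touching the buffer again, and with no length parameter left to go stale.

    #include <stddef.h>

    struct sock_sketch {
        void (*data_ready)(struct sock_sketch *sk);   /* length argument removed */
        void *receive_queue;
    };

    /* The producer must not dereference 'skb' after queueing it: the reader
     * may consume and free it immediately. */
    static void enqueue_and_notify(struct sock_sketch *sk, void *skb)
    {
        /* queue_tail(sk->receive_queue, skb);  -- ownership handed off here */
        (void)skb;
        sk->data_ready(sk);                     /* no skb->len read after the handoff */
    }

    static void wake_reader(struct sock_sketch *sk) { (void)sk; /* e.g. wake poll/recv */ }

    int main(void)
    {
        struct sock_sketch sk = { .data_ready = wake_reader, .receive_queue = NULL };
        int dummy_skb;

        enqueue_and_notify(&sk, &dummy_skb);
        return 0;
    }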
-
- 29 March 2014, 1 commit
-
-
Submitted by Eric Dumazet
It seems I missed one change in get_timewait4_sock() to compute the remaining time before deletion of an IPv4 timewait socket. This could result in wrong output in /proc/net/tcp for the tm->when field.
Fixes: 96f817fe ("tcp: shrink tcp6_timewait_sock by one cache line")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 March 2014, 1 commit
-
-
Submitted by Daniel Baluta
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 March 2014, 1 commit
-
-
Submitted by Yuchung Cheng
Add the following SNMP stats:

TCPFastOpenActiveFail: Fast Open attempts (SYN/data) that failed because the remote does not accept it or the attempts timed out.

TCPSynRetrans: number of SYN and SYN/ACK retransmits, to break retransmissions down into SYN, fast retransmits, timeout retransmits, etc.

TCPOrigDataSent: number of outgoing packets with original data (excluding retransmissions but including data-in-SYN). This counter differs from TcpOutSegs because TcpOutSegs also tracks pure ACKs; TCPOrigDataSent is more useful for tracking the TCP retransmission rate.

Also change TCPFastOpenActive to track only successful Fast Opens, to be symmetric with TCPFastOpenPassive.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Lawrence Brakmo <brakmo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 February 2014, 1 commit
-
-
Submitted by Eric Dumazet
Upcoming congestion controls for TCP require usec resolution for RTT estimations. Millisecond resolution is simply not enough these days. FQ/pacing in DC environments also requires this change for finer control and for removing the bimodal behavior due to the current hack in tcp_update_pacing_rate() for 'small rtt'. TCP_CONG_RTT_STAMP is no longer needed.

As Julian Anastasov pointed out, we need to keep user compatibility: tcp_metrics used to export RTT and RTTVAR in msec resolution, so we add RTT_US and RTTVAR_US. An iproute2 patch is needed to use the new attributes if provided by the kernel.

In this example the ss command displays an srtt of 32 usecs (10Gbit link):

    lpk51:~# ./ss -i dst lpk52
    Netid State Recv-Q Send-Q Local Address:Port  Peer Address:Port
    tcp   ESTAB 0      1      10.246.11.51:42959  10.246.11.52:64614
        cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448 cwnd:10
        send 3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

The updated iproute2 ip command displays:

    lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
    10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source 10.246.11.51

The old binary displays:

    lpk51:~# ip tcp_metrics | grep 10.246.11.52
    10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source 10.246.11.51

With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
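A minimal sketch of why microsecond resolution matters for the estimator (standard RFC 6298 smoothing written in plain C; nothing below is the kernel's implementation): at millisecond granularity, the 32 us samples from the example above simply round to zero.

    #include <stdint.h>
    #include <stdio.h>

    /* RFC 6298 smoothed RTT / RTT variance, kept in microseconds. */
    static void rtt_update_us(uint32_t *srtt, uint32_t *rttvar, uint32_t sample)
    {
        if (*srtt == 0) {                 /* first measurement */
            *srtt = sample;
            *rttvar = sample / 2;
            return;
        }
        uint32_t delta = sample > *srtt ? sample - *srtt : *srtt - sample;

        *rttvar = (3 * *rttvar + delta) / 4;   /* rttvar = 3/4*rttvar + 1/4*|srtt - sample| */
        *srtt   = (7 * *srtt + sample) / 8;    /* srtt   = 7/8*srtt   + 1/8*sample          */
    }

    int main(void)
    {
        uint32_t srtt = 0, rttvar = 0;
        uint32_t samples[] = { 32, 30, 35, 31 };   /* usec samples, as on a 10Gbit LAN */

        for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
            rtt_update_us(&srtt, &rttvar, samples[i]);
        printf("srtt=%u us rttvar=%u us\n", srtt, rttvar);
        return 0;
    }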
-
- 22 January 2014, 1 commit
-
-
Submitted by Peter Pan (潘卫平)
As tcp_rcv_state_process() already calls tcp_mtup_init() for non-fastopen sockets, we can delete the redundant calls to tcp_mtup_init() in tcp_{v4,v6}_syn_recv_sock().
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 December 2013, 1 commit
-
-
Submitted by Weilong Chen
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 December 2013, 1 commit
-
-
Submitted by Steffen Klassert
FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the possibility to sleep until the needed states are resolved. This code is gone, so FLOWI_FLAG_CAN_SLEEP is not needed anymore.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
- 30 November 2013, 1 commit
-
-
Submitted by Eric Dumazet
In commit c9e90429 ("ipv4: fix possible seqlock deadlock") I left other places where IP_INC_STATS_BH() was improperly used: udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from process context, not from softirq context. This was detected by lockdep seqlock support.
Reported-by: jongman heo <jongman.heo@samsung.com>
Fixes: 584bdf8c ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 November 2013, 1 commit
-
-
Submitted by Tetsuo Handa
All seq_printf() users here use "%n" for calculating the padding size; convert them to use the seq_setwidth()/seq_pad() pair instead.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
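A hedged, kernel-style sketch of the converted pattern (the record contents and the TMPSZ width are illustrative, not taken from tcp_ipv4.c): seq_setwidth() declares the target line width up front and seq_pad() fills the remainder and terminates the line, replacing the "%n" bookkeeping.

    #include <linux/seq_file.h>

    #define TMPSZ 150   /* illustrative fixed record width */

    static void show_one_record(struct seq_file *seq, int num, unsigned int state)
    {
        seq_setwidth(seq, TMPSZ - 1);                    /* instead of "%n" length capture */
        seq_printf(seq, "%4d: state %02X", num, state);
        seq_pad(seq, '\n');                              /* space-pad to width, then '\n'  */
    }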
-
- 06 November 2013, 1 commit
-
-
Submitted by Hannes Frederic Sowa
Sockets marked with IP_PMTUDISC_INTERFACE won't do path MTU discovery: such sockets won't accept and install new path MTU information, and they will always use the interface MTU for outgoing packets. It is guaranteed that the packet is not fragmented locally, but we won't set the DF flag on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers never generate outgoing fragments. They may well be fragmented on the path, but the server never stores or uses path MTU values, which could well be forged in an attack. (The root of the problem with path MTU discovery is that there is no reliable way to authenticate ICMP Fragmentation Needed But DF Set messages, because they are sent from intermediate routers with their source addresses, and the ICMP payload will not always contain sufficient information to identify a flow.)

Recent research in the DNS community showed that it is possible to implement an attack where DNS cache poisoning is feasible by spoofing fragments. This work was done by Amir Herzberg and Haya Shulman: <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>. This issue was previously discussed among the DNS community, e.g. <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>, without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK" for the enforcement of the non-fragmentable checks. If users other than ip_append_page/data should use this semantic too, we have to add a new flag to IPCB(skb)->flags to suppress local fragmentation and check for it in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing this patch.
Cc: David S. Miller <davem@davemloft.net>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
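A userspace usage sketch of the new mode: a UDP server such as a DNS resolver opts out of accepting or storing path-MTU hints and always sends at the interface MTU, without DF. The fallback value of 4 for IP_PMTUDISC_INTERFACE matches the Linux uapi headers but is defined defensively for older toolchains.

    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <stdio.h>

    #ifndef IP_PMTUDISC_INTERFACE
    #define IP_PMTUDISC_INTERFACE 4   /* value from the Linux uapi headers */
    #endif

    /* Apply the interface-MTU-only policy to an already-created socket fd. */
    int use_interface_mtu_only(int fd)
    {
        int val = IP_PMTUDISC_INTERFACE;

        if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val)) < 0) {
            perror("setsockopt(IP_MTU_DISCOVER)");
            return -1;
        }
        return 0;
    }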
-
- 22 October 2013, 1 commit
-
-
Submitted by Eric W. Biederman
The code that is implemented is per memory cgroup, not per netns, and having per-netns bits is just confusing. Remove the per-netns bits to make it easier to see what is really going on.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 October 2013, 1 commit
-
-
Submitted by Eric Dumazet
TCP listener refactoring, part 5: we want to be able to insert request sockets (SYN_RECV) into the main ehash table instead of the per-listener hash table, to allow RCU lookups and remove listener lock contention. This patch includes the needed struct sock_common in front of struct request_sock, which means there is no longer an IPv6-specific inet6_request_sock structure.

The following inet_request_sock fields were renamed, as they became macros referencing fields from struct sock_common. The prefix ir_ was chosen to avoid name collisions:

    loc_port -> ir_loc_port
    loc_addr -> ir_loc_addr
    rmt_addr -> ir_rmt_addr
    rmt_port -> ir_rmt_port
    iif      -> ir_iif

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
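A toy sketch of the renaming mechanism (the layout and macro subset are simplified stand-ins, not the kernel definitions): once struct sock_common leads the request sock, the old field names become ir_-prefixed macros over the embedded common fields.

    #include <stdint.h>

    struct sock_common_sketch {
        uint32_t skc_rcv_saddr;   /* local address  */
        uint32_t skc_daddr;       /* remote address */
        uint16_t skc_num;         /* local port     */
        uint16_t skc_dport;       /* remote port    */
    };

    struct inet_request_sock_sketch {
        struct sock_common_sketch common;   /* must come first so lookup keys line up */
    };

    /* Old accessors become macros over the shared layout (subset shown). */
    #define ir_loc_addr common.skc_rcv_saddr
    #define ir_rmt_addr common.skc_daddr
    #define ir_loc_port common.skc_num
    #define ir_rmt_port common.skc_dport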
-
- 09 October 2013, 1 commit
-
-
Submitted by Eric Dumazet
TCP listener refactoring, part 3: our goal is to hash SYN_RECV sockets into the main ehash for fast lookup and parallel SYN processing.

The current inet_ehash_bucket contains two chains, one for ESTABLISHED (and friend states) sockets, another for TIME_WAIT sockets only. As the hash table is sized to get at most one socket per bucket, it makes little sense to have a separate twchain: it makes the lookup slightly more complicated and doubles hash table memory usage. If we make sure all socket types have the lookup keys at the same offsets, we can use a generic and faster lookup. It turns out TIME_WAIT and ESTABLISHED sockets already have common lookup fields for IPv4. [ INET_TW_MATCH() is no longer needed ] I'll provide a follow-up to factorize the IPv6 lookup as well, to remove INET6_TW_MATCH(). This way, SYN_RECV pseudo sockets will be supported the same way.

A new sock_gen_put() helper is added, doing either a sock_put() or inet_twsk_put() [ and it will support SYN_RECV later ]. Note this helper should only be called in the real slow path, when an RCU lookup found a socket that was moved to another identity (freed/reused immediately), but it could eventually be used in other contexts, like sock_edemux().

Before patch:
    dmesg | grep "TCP established"
    TCP established hash table entries: 524288 (order: 11, 8388608 bytes)

After patch:
    TCP established hash table entries: 524288 (order: 10, 4194304 bytes)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 03 October 2013, 1 commit
-
-
Submitted by Eric Dumazet
tcp_fixup_sndbuf() is underestimating initial send buffer requirements. It was not noticed because big GSO packets were escaping the limitation, but with smaller TSO packets (or TSO/GSO/SG off), the application hits sk_sndbuf before having a chance to fill enough packets into the socket write queue.

- The initial cwnd can be bigger than 10 for specific routes.
- SKB_TRUESIZE() is a bit under real needs in some cases, because of power-of-two rounding in kmalloc().
- Fast Recovery (RFC 5681 3.2): Cubic needs a 70% factor.
- Extra cushion (the application might react slowly to POLLOUT).

tcp_v4_conn_req_fastopen() needs to call tcp_init_metrics() before calling tcp_init_buffer_space(). Then we realize tcp_new_space() should call tcp_fixup_sndbuf() instead of duplicating this stuff. Rename tcp_fixup_sndbuf() to tcp_sndbuf_expand() to be more descriptive.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-