- 23 9月, 2006 2 次提交
-
-
由 Herbert Xu 提交于
By passing a Linux-generated TSO packet straight back into Linux, Xen becomes our first LRO user :) Unfortunately, there is at least one spot in our stack that needs to be changed to cope with this. The receive MSS estimate is computed from the raw packet size. This is broken if the packet is GSO/LRO. Fortunately the real MSS can be found in gso_size so we simply need to use that if it is non-zero. Real LRO NICs should of course set the gso_size field in future. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Brian Haley 提交于
Change net/core, ipv4 and ipv6 sysctl variables to __read_mostly. Couldn't actually measure any performance increase while testing (.3% I consider noise), but seems like the right thing to do. Signed-off-by: NBrian Haley <brian.haley@hp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 9月, 2006 1 次提交
-
-
由 Stephen Hemminger 提交于
Turn Appropriate Byte Count off by default because it unfairly penalizes applications that do small writes. Add better documentation to describe what it is so users will understand why they might want to turn it on. Signed-off-by: NStephen Hemminger <shemminger@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 8月, 2006 1 次提交
-
-
由 Daikichi Osuga 提交于
1) fix slow start after retransmit timeout 2) fix case of L=2*SMSS acked bytes comparison Signed-off-by: NDaikichi Osuga <osugad@s1.nttdocomo.co.jp> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 8月, 2006 1 次提交
-
-
由 Ilpo Järvinen 提交于
Whenever a transfer is application limited, we are allowed at least initial window worth of data per window unless cwnd is previously less than that. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 7月, 2006 1 次提交
-
-
由 Jörn Engel 提交于
Signed-off-by: NJörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: NAdrian Bunk <bunk@stusta.de>
-
- 30 6月, 2006 1 次提交
-
-
由 Michael Chan 提交于
In the current TSO implementation, NETIF_F_TSO and ECN cannot be turned on together in a TCP connection. The problem is that most hardware that supports TSO does not handle CWR correctly if it is set in the TSO packet. Correct handling requires CWR to be set in the first packet only if it is set in the TSO header. This patch adds the ability to turn on NETIF_F_TSO and ECN using GSO if necessary to handle TSO packets with CWR set. Hardware that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev-> features flag. All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set. If the output device does not have the NETIF_F_TSO_ECN feature set, GSO will split the packet up correctly with CWR only set in the first segment. With help from Herbert Xu <herbert@gondor.apana.org.au>. Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock flag is completely removed. Signed-off-by: NMichael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 6月, 2006 1 次提交
-
-
由 Herbert Xu 提交于
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not going to scale if we add any more segmentation methods (e.g., DCCP). So let's merge them. They were used to tell the protocol of a packet. This function has been subsumed by the new gso_type field. This is essentially a set of netdev feature bits (shifted by 16 bits) that are required to process a specific skb. As such it's easy to tell whether a given device can process a GSO skb: you just have to and the gso_type field and the netdev's features field. I've made gso_type a conjunction. The idea is that you have a base type (e.g., SKB_GSO_TCPV4) that can be modified further to support new features. For example, if we add a hardware TSO type that supports ECN, they would declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO packets would be SKB_GSO_TCPV4. This means that only the CWR packets need to be emulated in software. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 6月, 2006 3 次提交
-
-
由 Stephen Hemminger 提交于
Many of the TCP congestion methods all just use ssthresh as the minimum congestion window on decrease. Rather than duplicating the code, just have that be the default if that handle in the ops structure is not set. Minor behaviour change to TCP compound. It probably wants to use this (ssthresh) as lower bound, rather than ssthresh/2 because the latter causes undershoot on loss. Signed-off-by: NStephen Hemminger <shemminger@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
We only want to take receive RTT mesaurements for data bearing frames, here in the header prediction fast path for a pure-sender, we know that we have a pure-ACK and thus the checks in tcp_rcv_rtt_mesaure_ts() will not pass. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chris Leech 提交于
Locks down user pages and sets up for DMA in tcp_recvmsg, then calls dma_async_try_early_copy in tcp_v4_do_rcv Signed-off-by: NChris Leech <christopher.leech@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 6月, 2006 1 次提交
-
-
由 Aki M Nyrhinen 提交于
From: Aki M Nyrhinen <anyrhine@cs.helsinki.fi> IMHO the current fix to the problem (in_flight underflow in reno) is incorrect. it treats the symptons but ignores the problem. the problem is timing out packets other than the head packet when we don't have sack. i try to explain (sorry if explaining the obvious). with sack, scanning the retransmit queue for timed out packets is fine because we know which packets in our retransmit queue have been acked by the receiver. without sack, we know only how many packets in our retransmit queue the receiver has acknowledged, but no idea which packets. think of a "typical" slow-start overshoot case, where for example every third packet in a window get lost because a router buffer gets full. with sack, we check for timeouts on those every third packet (as the rest have been sacked). the packet counting works out and if there is no reordering, we'll retransmit exactly the packets that were lost. without sack, however, we check for timeout on every packet and end up retransmitting consecutive packets in the retransmit queue. in our slow-start example, 2/3 of those retransmissions are unnecessary. these unnecessary retransmissions eat the congestion window and evetually prevent fast recovery from continuing, if enough packets were lost. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 5月, 2006 1 次提交
-
-
由 Angelo P. Castellani 提交于
From: "Angelo P. Castellani" <angelo.castellani+lkml@gmail.com> Using NewReno, if a sk_buff is timed out and is accounted as lost_out, it should also be removed from the sacked_out. This is necessary because recovery using NewReno fast retransmit could take up to a lot RTTs and the sk_buff RTO can expire without actually being really lost. left_out = sacked_out + lost_out in_flight = packets_out - left_out + retrans_out Using NewReno without this patch, on very large network losses, left_out becames bigger than packets_out + retrans_out (!!). For this reason unsigned integer in_flight overflows to 2^32 - something. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 4月, 2006 1 次提交
-
-
由 Adrian Bunk 提交于
This patch contains the following possible cleanups: - make the following needlessly global function static: - arp.c: arp_rcv() - remove the following unused EXPORT_SYMBOL's: - devinet.c: devinet_ioctl - fib_frontend.c: ip_rt_ioctl - inet_hashtables.c: inet_bind_bucket_create - inet_hashtables.c: inet_bind_hash - tcp_input.c: sysctl_tcp_abc - tcp_ipv4.c: sysctl_tcp_tw_reuse - tcp_output.c: sysctl_tcp_mtu_probing - tcp_output.c: sysctl_tcp_base_mss Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 3月, 2006 2 次提交
-
-
由 John Heffner 提交于
This moves some TCP-specific MTU probing state out of inet_connection_sock back to tcp_sock. Signed-off-by: NJohn Heffner <jheffner@psc.edu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Heffner 提交于
Implementation of packetization layer path mtu discovery for TCP, based on the internet-draft currently found at <http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-05.txt>. Signed-off-by: NJohn Heffner <jheffner@psc.edu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 2月, 2006 1 次提交
-
-
由 John Heffner 提交于
The rcvbuf lock should probably be honored here. Signed-off-by: NJohn Heffner <jheffner@psc.edu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 1月, 2006 1 次提交
-
-
由 Kris Katterjohn 提交于
This changes some simple "if (x) BUG();" statements to "BUG_ON(x);" Signed-off-by: NKris Katterjohn <kjak@users.sourceforge.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 1月, 2006 3 次提交
-
-
由 Stephen Hemminger 提交于
TCP inline usage cleanup: * get rid of inline in several places * replace __inline__ with inline where possible * move functions used in one file out of tcp.h * let compiler decide on used once cases On x86_64: text data bss dec hex filename 3594701 648348 567400 4810449 4966d1 vmlinux.orig 3593133 648580 567400 4809113 496199 vmlinux On sparc64: text data bss dec hex filename 2538278 406152 530392 3474822 350586 vmlinux.ORIG 2536382 406384 530392 3473158 34ff06 vmlinux Signed-off-by: NStephen Hemminger <shemminger@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnaldo Carvalho de Melo 提交于
As DCCP needs to be called in the same spots. Now we have a member in inet_sock (is_icsk), set at sock creation time from struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and DCCP) to see if a struct sock instance is a inet_connection_sock for places like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if sk_type was SOCK_STREAM, that is insufficient because we now use the same code for DCCP, that has sk_type SOCK_DCCP. Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnaldo Carvalho de Melo 提交于
And move it to struct inet_connection_sock. DCCP will use it in the upcoming changesets. Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 11月, 2005 1 次提交
-
-
由 Stephen Hemminger 提交于
From Joe Perches Signed-off-by: NStephen Hemminger <shemminger@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 11月, 2005 5 次提交
-
-
由 Stephen Hemminger 提交于
Use "hints" to speed up the SACK processing. Various forms of this have been used by TCP developers (Web100, STCP, BIC) to avoid the 2x linear search of outstanding segments. Signed-off-by: NStephen Hemminger <shemminger@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
Minor spelling fixes for TCP code. Signed-off-by: NStephen Hemminger <shemminger@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Heffner 提交于
This is a patch for discussion addressing some receive buffer growing issues. This is partially related to the thread "Possible BUG in IPv4 TCP window handling..." last week. Specifically it addresses the problem of an interaction between rcvbuf moderation (receiver autotuning) and rcv_ssthresh. The problem occurs when sending small packets to a receiver with a larger MTU. (A very common case I have is a host with a 1500 byte MTU sending to a host with a 9k MTU.) In such a case, the rcv_ssthresh code is targeting a window size corresponding to filling up the current rcvbuf, not taking into account that the new rcvbuf moderation may increase the rcvbuf size. One hunk makes rcv_ssthresh use tcp_rmem[2] as the size target rather than rcvbuf. The other changes the behavior when it overflows its memory bounds with in-order data so that it tries to grow rcvbuf (the same as with out-of-order data). These changes should help my problem of mixed MTUs, and should also help the case from last week's thread I think. (In both cases though you still need tcp_rmem[2] to be set much larger than the TCP window.) One question is if this is too aggressive at trying to increase rcvbuf if it's under memory stress. Orignally-from: John Heffner <jheffner@psc.edu> Signed-off-by: NStephen Hemminger <shemminger@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
This is an updated version of the RFC3465 ABC patch originally for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting bytes ack'd rather than packets when updating congestion control. The orignal ABC described in the RFC applied to a Reno style algorithm. For advanced congestion control there is little change after leaving slow start. Signed-off-by: NStephen Hemminger <shemminger@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
Simplify the code that comuputes microsecond rtt estimate used by TCP Vegas. Move the callback out of the RTT sampler and into the end of the ack cleanup. Signed-off-by: NStephen Hemminger <shemminger@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 10月, 2005 1 次提交
-
-
由 Herbert Xu 提交于
This bug is responsible for causing the infamous "Treason uncloaked" messages that's been popping up everywhere since the printk was added. It has usually been blamed on foreign operating systems. However, some of those reports implicate Linux as both systems are running Linux or the TCP connection is going across the loopback interface. In fact, there really is a bug in the Linux TCP header prediction code that's been there since at least 2.1.8. This bug was tracked down with help from Dale Blount. The effect of this bug ranges from harmless "Treason uncloaked" messages to hung/aborted TCP connections. The details of the bug and fix is as follows. When snd_wnd is updated, we only update pred_flags if tcp_fast_path_check succeeds. When it fails (for example, when our rcvbuf is used up), we will leave pred_flags with an out-of-date snd_wnd value. When the out-of-date pred_flags happens to match the next incoming packet we will again hit the fast path and use the current snd_wnd which will be wrong. In the case of the treason messages, it just happens that the snd_wnd cached in pred_flags is zero while tp->snd_wnd is non-zero. Therefore when a zero-window packet comes in we incorrectly conclude that the window is non-zero. In fact if the peer continues to send us zero-window pure ACKs we will continue making the same mistake. It's only when the peer transmits a zero-window packet with data attached that we get a chance to snap out of it. This is what triggers the treason message at the next retransmit timeout. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
-
- 30 9月, 2005 1 次提交
-
-
由 Alexey Kuznetsov 提交于
From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Handle better the case where the sender sends full sized frames initially, then moves to a mode where it trickles out small amounts of data at a time. This known problem is even mentioned in the comments above tcp_grow_window() in tcp_input.c, specifically: ... * The scheme does not work when sender sends good segments opening * window and then starts to feed us spagetti. But it should work * in common situations. Otherwise, we have to rely on queue collapsing. ... When the sender gives full sized frames, the "struct sk_buff" overhead from each packet is small. So we'll advertize a larger window. If the sender moves to a mode where small segments are sent, this ratio becomes tilted to the other extreme and we start overrunning the socket buffer space. tcp_clamp_window() tries to address this, but it's clamping of tp->window_clamp is a wee bit too aggressive for this particular case. Fix confirmed by Ion Badulescu. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 9月, 2005 1 次提交
-
-
由 Herbert Xu 提交于
The problem is that the SACK fragmenting code may incorrectly call tcp_fragment() with a length larger than the skb->len. This happens when the skb on the transmit queue completely falls to the LHS of the SACK. And add a BUG() check to tcp_fragment() so we can spot this kind of error more quickly in the future. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 9月, 2005 1 次提交
-
-
由 David S. Miller 提交于
All we need to do is resegment the queue so that we record SACK information accurately. The edges of the SACK blocks guide our resegmenting decisions. With help from Herbert Xu. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 8月, 2005 7 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
Of this type, mostly: CHECK net/ipv6/netfilter.c net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static? net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static? Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Reduces skb size by 8 bytes on 64-bit. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnaldo Carvalho de Melo 提交于
This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(), minimal renaming/moving done in this changeset to ease review. Most of it is just changes of struct tcp_sock * to struct sock * parameters. With this we move to a state closer to two interesting goals: 1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used for any INET transport protocol that has struct inet_hashinfo and are derived from struct inet_connection_sock. Keeps the userspace API, that will just not display DCCP sockets, while newer versions of tools can support DCCP. 2. INET generic transport pluggable Congestion Avoidance infrastructure, using the current TCP CA infrastructure with DCCP. Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnaldo Carvalho de Melo 提交于
With this we're very close to getting all of the current TCP refactorings in my dccp-2.6 tree merged, next changeset will export some functions needed by the current DCCP code and then dccp-2.6.git will be born! Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnaldo Carvalho de Melo 提交于
Completing the previous changeset, this also generalises tcp_v4_synq_add, renaming it to inet_csk_reqsk_queue_hash_add, already geing used in the DCCP tree, which I plan to merge RSN. Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnaldo Carvalho de Melo 提交于
This creates struct inet_connection_sock, moving members out of struct tcp_sock that are shareable with other INET connection oriented protocols, such as DCCP, that in my private tree already uses most of these members. The functions that operate on these members were renamed, using a inet_csk_ prefix while not being moved yet to a new file, so as to ease the review of these changes. Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Remove the "list" member of struct sk_buff, as it is entirely redundant. All SKB list removal callers know which list the SKB is on, so storing this in sk_buff does nothing other than taking up some space. Two tricky bits were SCTP, which I took care of, and two ATM drivers which Francois Romieu <romieu@fr.zoreil.com> fixed up. Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NFrancois Romieu <romieu@fr.zoreil.com>
-
- 09 7月, 2005 1 次提交
-
-
由 David S. Miller 提交于
This is part of the grand scheme to eliminate the qlen member of skb_queue_head, and subsequently remove the 'list' member of sk_buff. Most users of skb_queue_len() want to know if the queue is empty or not, and that's trivially done with skb_queue_empty() which doesn't use the skb_queue_head->qlen member and instead uses the queue list emptyness as the test. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 7月, 2005 1 次提交
-
-
由 David S. Miller 提交于
Make TSO segment transmit size decisions at send time not earlier. The basic scheme is that we try to build as large a TSO frame as possible when pulling in the user data, but the size of the TSO frame output to the card is determined at transmit time. This is guided by tp->xmit_size_goal. It is always set to a multiple of MSS and tells sendmsg/sendpage how large an SKB to try and build. Later, tcp_write_xmit() and tcp_push_one() chop up the packet if necessary and conditions warrant. These routines can also decide to "defer" in order to wait for more ACKs to arrive and thus allow larger TSO frames to be emitted. A general observation is that TSO elongates the pipe, thus requiring a larger congestion window and larger buffering especially at the sender side. Therefore, it is important that applications 1) get a large enough socket send buffer (this is accomplished by our dynamic send buffer expansion code) 2) do large enough writes. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-