- 03 5月, 2012 8 次提交
-
-
由 Alexander Duyck 提交于
This change reorders the code related to the use of an skb->head_frag so it is placed before we check the rest of the frags. This allows the code to read more linearly instead of like some sort of loop. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch addresses several issues in the way we were tracking the truesize in tcp_try_coalesce. First it was using ksize which prevents us from having a 0 sized head frag and getting a usable result. To resolve that this patch uses the end pointer which is set based off either ksize, or the frag_size supplied in build_skb. This allows us to compute the original truesize of the entire buffer and remove that value leaving us with just what was added as pages. The second issue was the use of skb->len if there is a mergeable head frag. We should only need to remove the size of an data aligned sk_buff from our current skb->truesize to compute the delta for a buffer with a reused head. By using skb->len the value of truesize was being artificially reduced which means that head frags could use more memory than buffers using standard allocations. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change is meant ot prevent stealing the skb->head to use as a page in the event that the skb->head was cloned. This allows the other clones to track each other via shinfo->dataref. Without this we break down to two methods for tracking the reference count, one being dataref, the other being the page count. As a result it becomes difficult to track how many references there are to skb->head. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Extend tcp coalescing implementing it from tcp_queue_rcv(), the main receiver function when application is not blocked in recvmsg(). Function tcp_queue_rcv() is moved a bit to allow its call from tcp_data_queue() This gives good results especially if GRO could not kick, and if skb head is a fragment. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Before stealing fragments or skb head, we must make sure skbs are not cloned. Alexander was worried about destination skb being cloned : In bridge setups, a driver could be fooled if skb->data_len would not match skb nr_frags. If source skb is cloned, we must take references on pages instead. Bug happened using tcpdump (if not using mmap()) Introduce kfree_skb_partial() helper to cleanup code. Reported-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2). Delays the fast retransmit by an interval of RTT/4. We borrow the RTO timer to implement the delay. If we receive another ACK or send a new packet, the timer is cancelled and restored to original RTO value offset by time elapsed. When the delayed-ER timer fires, we enter fast recovery and perform fast retransmit. Signed-off-by: NYuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
This patch implements RFC 5827 early retransmit (ER) for TCP. It reduces DUPACK threshold (dupthresh) if outstanding packets are less than 4 to recover losses by fast recovery instead of timeout. While the algorithm is simple, small but frequent network reordering makes this feature dangerous: the connection repeatedly enter false recovery and degrade performance. Therefore we implement a mitigation suggested in the appendix of the RFC that delays entering fast recovery by a small interval, i.e., RTT/4. Currently ER is conservative and is disabled for the rest of the connection after the first reordering event. A large scale web server experiment on the performance impact of ER is summarized in section 6 of the paper "Proportional Rate Reduction for TCP”, IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf Note that Linux has a similar feature called THIN_DUPACK. The differences are THIN_DUPACK do not mitigate reorderings and is only used after slow start. Currently ER is disabled if THIN_DUPACK is enabled. I would be happy to merge THIN_DUPACK feature with ER if people think it's a good idea. ER is enabled by sysctl_tcp_early_retrans: 0: Disables ER 1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4. 2: (Default) reduce dupthresh like mode 1. In addition, delay entering fast recovery by RTT/4. Note: mode 2 is implemented in the third part of this patch series. Signed-off-by: NYuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
This a prepartion patch that refactors the code to enter recovery into a new function tcp_enter_recovery(). It's needed to implement the delayed fast retransmit in ER. Signed-off-by: NYuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 5月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
TCP coalesce can check if skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We had to disable coalescing in this case, for performance reasons. We 'upgrade' skb->head as a fragment in itself. This reduces number of cache misses when user makes its copies, since a less sk_buff are fetched. This makes receive and ofo queues shorter and thus reduce cache line misses in TCP stack. This is a followup of patch "net: allow skb->head to be a page fragment" Tested with tg3 nic, with GRO on or off. We can see "TCPRcvCoalesce" counter being incremented. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 4月, 2012 2 次提交
-
-
由 Eric Dumazet 提交于
This clarifies code intention, as suggested by David. Suggested-by: NDavid Miller <davem@davemloft.net> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
commit c8628155 (tcp: reduce out_of_order memory use) took care of coalescing tcp segments provided by legacy devices (linear skbs) We extend this idea to fragged skbs, as their truesize can be heavy. ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment. Use this coalescing strategy for receive queue too. This contributes to reduce number of tcp collapses, at minimal cost, and reduces memory overhead and packets drops. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 4月, 2012 1 次提交
-
-
由 Pavel Emelyanov 提交于
This is just the preparation patch, which makes the needed for TCP repair code ready for use. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 4月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
tcp_grow_window() has to grow rcv_ssthresh up to window_clamp, allowing sender to increase its window. tcp_grow_window() still assumes a tcp frame is under MSS, but its no longer true with LRO/GRO. This patch fixes one of the performance issue we noticed with GRO on. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 4月, 2012 2 次提交
-
-
由 Vijay Subramanian 提交于
tcp_enter_quickack_mode() already calls tcp_incr_quickack() and sets icsk->icsk_ack.ato to TCP_ATO_MIN. This patch removes the duplication. Signed-off-by: NVijay Subramanian <subramanian.vijay@gmail.com> Reviewed-by: NFlavio Leitner <fbl@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Updates some comments to track RFC6298 Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 4月, 2012 1 次提交
-
-
由 Neal Cardwell 提交于
Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and unscaled RTT samples. The intent in the code was to only use the 'm' measurement if it was a new minimum. However, since 'm' had not yet been shifted left 3 bits but 'new_sample' had, this comparison would nearly always succeed, leading us to erroneously set our receive-side RTT estimate to the 'm' sample when that sample could be nearly 8x too high to use. The overall effect is to often cause the receive-side RTT estimate to be significantly too large (up to 40% too large for brief periods in my tests). Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 4月, 2012 1 次提交
-
-
由 Dave Jiang 提交于
This is the fallout from adding memcpy alignment workaround for certain IOATDMA hardware. NetDMA will only use DMA engine that can handle byte align ops. Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 20 3月, 2012 2 次提交
-
-
由 Eric Dumazet 提交于
With increasing receive window sizes, but speed of light not improved that much, out of order queue can contain a huge number of skbs, waiting to be moved to receive_queue when missing packets can fill the holes. Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct sk_buff)) to store regular (MTU <= 1500) frames. This makes highly probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in many cases. When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true latency killer and cpu cache blower. Doing the coalescing attempt each time we add a frame in ofo queue permits to keep memory use tight and in many cases avoid the tcp_collapse() thing later. Tested on various wireless setups (b43, ath9k, ...) known to use big skb truesize, this patch removed the "packets collapsed in receive queue due to low socket buffer" I had before. This also reduced average memory used by tcp sockets. With help from Neal Cardwell. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Split tcp_data_queue() in two parts for better readability. tcp_data_queue_ofo() is responsible for queueing incoming skb into out of order queue. Change code layout so that the skb_set_owner_r() is performed only if skb is not dropped. This is a preliminary patch before "reduce out_of_order memory use" following patch. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 3月, 2012 1 次提交
-
-
由 Joe Perches 提交于
Add #define pr_fmt(fmt) as appropriate. Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files. Standardize on "UDPLite: " for appropriate uses. Some prefixes were previously "UDPLITE: " and "UDP-Lite: ". Add KBUILD_MODNAME ": " to icmp and gre. Remove embedded prefixes as appropriate. Add missing "\n" to pr_info in gre.c. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 3月, 2012 1 次提交
-
-
由 Joe Perches 提交于
Use a more current kernel messaging style. Convert a printk block to print_hex_dump. Coalesce formats, align arguments. Use %s, __func__ instead of embedding function names. Some messages that were prefixed with <foo>_close are now prefixed with <foo>_fini. Some ah4 and esp messages are now not prefixed with "ip ". The intent of this patch is to later add something like #define pr_fmt(fmt) "IPv4: " fmt. to standardize the output messages. Text size is trivially reduced. (x86-32 allyesconfig) $ size net/ipv4/built-in.o* text data bss dec hex filename 887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new 887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 3月, 2012 1 次提交
-
-
由 Neal Cardwell 提交于
This commit fixes tcp_shift_skb_data() so that it does not shift SACKed data below snd_una. This fixes an issue whose symptoms exactly match reports showing tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at net/ipv4/tcp_input.c:3418" thread on netdev). Since 2008 (832d11c5) tcp_shift_skb_data() had been shifting SACKed ranges that were below snd_una. It checked that the *end* of the skb it was about to shift from was above snd_una, but did not check that the end of the actual shifted range was above snd_una; this commit adds that check. Shifting SACKed ranges below snd_una is problematic because for such ranges tcp_sacktag_one() short-circuits: it does not declare anything as SACKed and does not increase sacked_out. Before the fixes in commits cc9a672e and daef52ba, shifting SACKed ranges below snd_una happened to work because tcp_shifted_skb() was always (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence tcp_sacktag_one() never short-circuited and always increased tp->sacked_out in this case. After those two fixes, my testing has verified that shifting SACKed ranges below snd_una could cause tp->sacked_out to go negative with the following sequence of events: (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una, then shifts a prefix of that skb that is below snd_una (2) tcp_shifted_skb() increments the packet count of the already-SACKed prev sk_buff (3) tcp_sacktag_one() sees the end of the new SACKed range is below snd_una, so it short-circuits and doesn't increase tp->sacked_out (5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed, decrements tp->sacked_out by this "inflated" pcount that was missing a matching increase in tp->sacked_out, and hence tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted to s32 is negative. (6) this leads to the warnings seen in the recent "WARNING: at net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.: tcp_input.c:3418 WARN_ON((int)tp->sacked_out < 0); More generally, I think this bug can be tickled in some cases where two or more ACKs from the receiver are lost and then a DSACK arrives that is immediately above an existing SACKed skb in the write queue. This fix changes tcp_shift_skb_data() to abort this sequence at step (1) in the scenario above by noticing that the bytes are below snd_una and not shifting them. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 3月, 2012 1 次提交
-
-
由 Neal Cardwell 提交于
In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb to mark the first portion as lost. This is for two primary reasons: (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When doing this, it preserves the sum of their packet counts in order to reflect the real-world dynamics on the wire. But given that skbs can have remainders that do not align to MSS boundaries, this packet count preservation means that for SACKed skbs there is not necessarily a direct linear relationship between tcp_skb_pcount(skb) and skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment off and mark as lost a prefix of length (packets - oldcnt)*mss from SACKed skbs were leading to occasional failures of the WARN_ON(len > skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the recent "crash in tcp_fragment" thread on netdev). (2) there is no real point in fragmenting off part of a SACKed skb and calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP for SACKed skbs. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: NYuchung Cheng <ycheng@google.com> Acked-by: NNandita Dukkipati <nanditad@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 2月, 2012 1 次提交
-
-
由 Neal Cardwell 提交于
When tcp_shifted_skb() shifts bytes from the skb that is currently pointed to by 'highest_sack' then the increment of TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This implicit advancement, combined with the recent fix to pass the correct SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think that the newly SACKed range was before the tcp_highest_sack_seq(), leading to a call to tcp_update_reordering() with a degree of reordering matching the size of the newly SACKed range (typically just 1 packet, which is a NOP, but potentially larger). This commit fixes this by simply calling tcp_sacktag_one() before the TCP_SKB_CB(skb)->seq advancement that can advance our notion of the highest SACKed sequence. Correspondingly, we can simplify the code a little now that tcp_shifted_skb() should update the lost_cnt_hint in all cases where skb == tp->lost_skb_hint. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 2月, 2012 1 次提交
-
-
由 Neal Cardwell 提交于
This commit ensures that lost_cnt_hint is correctly updated in tcp_shifted_skb() for FACK TCP senders. The lost_cnt_hint adjustment in tcp_sacktag_one() only applies to non-FACK senders, so FACK senders need their own adjustment. This applies the spirit of 1e5289e1 - except now that the sequence range passed into tcp_sacktag_one() is correct we need only have a special case adjustment for FACK. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 2月, 2012 2 次提交
-
-
由 Neal Cardwell 提交于
Fix the newly-SACKed range to be the range of newly-shifted bytes. Previously - since 832d11c5 - tcp_shifted_skb() incorrectly called tcp_sacktag_one() with the start and end sequence numbers of the skb it passes in set to the range just beyond the range that is newly-SACKed. This commit also removes a special-case adjustment to lost_cnt_hint in tcp_shifted_skb() since the pre-existing adjustment of lost_cnt_hint in tcp_sacktag_one() now properly handles this things now that the correct start sequence number is passed in. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
This commit allows callers of tcp_sacktag_one() to pass in sequence ranges that do not align with skb boundaries, as tcp_shifted_skb() needs to do in an upcoming fix in this patch series. In fact, now tcp_sacktag_one() does not need to depend on an input skb at all, which makes its semantics and dependencies more clear. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 1月, 2012 1 次提交
-
-
由 Yuchung Cheng 提交于
Correctly implement a loss detection heuristic: New sequences (above high_seq) sent during the fast recovery are deemed lost when higher sequences are SACKed. Current code does not catch these losses, because tcp_mark_head_lost() does not check packets beyond high_seq. The fix is straight-forward by checking packets until the highest sacked packet. In addition, all the FLAG_DATA_LOST logic are in-effective and redundant and can be removed. Update the loss heuristic comments. The algorithm above is documented as heuristic B, but it is redundant too because heuristic A already covers B. Note that this change only marks some forward-retransmitted packets LOST. It does NOT forbid TCP performing further CWR on new losses. A potential follow-up patch under preparation is to perform another CWR on "new" losses such as 1) sequence above high_seq is lost (by resetting high_seq to snd_nxt) 2) retransmission is lost. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 12月, 2011 1 次提交
-
-
由 Vijay Subramanian 提交于
to record the state of SACK/FACK and DSACK for better readability and maintenance. Signed-off-by: NVijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 12月, 2011 1 次提交
-
-
由 Glauber Costa 提交于
This patch replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to acessor macros. Those macros can either receive a socket argument, or a mem_cgroup argument, depending on the context they live in. Since we're only doing a macro wrapping here, no performance impact at all is expected in the case where we don't have cgroups disabled. Signed-off-by: NGlauber Costa <glommer@parallels.com> Reviewed-by: NHiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 12月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 12月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his linux machines to their limits. Dont call conn_request() if the TCP flags includes SYN flag Reported-by: NDenys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 11月, 2011 5 次提交
-
-
由 Neal Cardwell 提交于
The problem: Senders were overriding cwnd values picked during an undo by calling tcp_moderate_cwnd() in tcp_try_to_open(). The fix: Don't moderate cwnd in tcp_try_to_open() if we're in TCP_CA_Open, since doing so is generally unnecessary and specifically would override a DSACK-based undo of a cwnd reduction made in fast recovery. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
Previously, SACK-enabled connections hung around in TCP_CA_Disorder state while snd_una==high_seq, just waiting to accumulate DSACKs and hopefully undo a cwnd reduction. This could and did lead to the following unfortunate scenario: if some incoming ACKs advance snd_una beyond high_seq then we were setting undo_marker to 0 and moving to TCP_CA_Open, so if (due to reordering in the ACK return path) we shortly thereafter received a DSACK then we were no longer able to undo the cwnd reduction. The change: Simplify the congestion avoidance state machine by removing the behavior where SACK-enabled connections hung around in the TCP_CA_Disorder state just waiting for DSACKs. Instead, when snd_una advances to high_seq or beyond we typically move to TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or TCP_CA_Disorder if we later receive enough DSACKs. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
The bug: When the ACK field is below snd_una (which can happen when ACKs are reordered), senders ignored DSACKs (preventing undo) and did not call tcp_fastretrans_alert, so they did not increment prr_delivered to reflect newly-SACKed sequence ranges, and did not call tcp_xmit_retransmit_queue, thus passing up chances to send out more retransmitted and new packets based on any newly-SACKed packets. The change: When the ACK field is below snd_una (the "old_ack" goto label), call tcp_fastretrans_alert to allow undo based on any newly-arrived DSACKs and try to send out more packets based on newly-SACKed packets. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
The bug: Senders ignored DSACKs after recovery when there were no outstanding packets (a common scenario for HTTP servers). The change: when there are no outstanding packets (the "no_queue" goto label), call tcp_fastretrans_alert() in order to use DSACKs to undo congestion window reductions. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
Allow callers to decide whether an ACK is a duplicate ACK. This is a prerequisite to allowing fastretrans_alert to be called from new contexts, such as the no_queue and old_ack code paths, from which we have extra info that tells us whether an ACK is a dupack. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 10月, 2011 2 次提交
-
-
由 Eric Dumazet 提交于
Adding const qualifiers to pointers can ease code review, and spot some bugs. It might allow compiler to optimize code further. For example, is it legal to temporary write a null cksum into tcphdr in tcp_md5_hash_header() ? I am afraid a sniffer could catch the temporary null value... Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tcp_fin() only needs socket pointer, we can remove skb and th params. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-