- 30 5月, 2009 1 次提交
-
-
由 Ilpo Järvinen 提交于
Somewhat luckily, I was looking into these parts with very fine comb because I've made somewhat similar changes on the same area (conflicts that arose weren't that lucky though). The loop was very much overengineered recently in commit 91521944 (tcp: Use SKB queue and list helpers instead of doing it by-hand), while it basically just wants to know if there are skbs after 'skb'. Also it got broken because skb1 = skb->next got translated into skb1 = skb1->next (though abstracted) improperly. Note that 'skb1' is pointing to previous sk_buff than skb or NULL if at head. Two things went wrong: - We'll kfree 'skb' on the first iteration instead of the skbuff following 'skb' (it would require required SACK reneging to recover I think). - The list head case where 'skb1' is NULL is checked too early and the loop won't execute whereas it previously did. Conclusion, mostly revert the recent changes which makes the cset very messy looking but using proper accessor in the previous-like version. The effective changes against the original can be viewed with: git-diff 91521944^ \ net/ipv4/tcp_input.c | sed -n -e '57,70 p' Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 5月, 2009 1 次提交
-
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 5月, 2009 2 次提交
-
-
由 Satoru SATOH 提交于
tcp_prequeue() refers to the constant value (TCP_RTO_MIN) regardless of the actual value might be tuned. The following patches fix this and make tcp_prequeue get the actual value returns from tcp_rto_min(). Signed-off-by: NSatoru SATOH <satoru.satoh@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
This should be very safe compared with full enabled, so I see no reason why it shouldn't be done right away. As ECN can only be negotiated if the SYN sending party is also supporting it, somebody in the loop probably knows what he/she is doing. If SYN does not ask for ECN, the server side SYN-ACK is identical to what it is without ECN. Thus it's quite safe. The chosen value is safe w.r.t to existing configs which choose to currently set manually either 0 or 1 but silently upgrades those who have not explicitly requested ECN off. Whether to just enable both sides comes up time to time but unless that gets done now we can at least make the servers aware of ECN already. As there are some known problems to occur if ECN is enabled, it's currently questionable whether there's any real gain from enabling clients as servers mostly won't support it anyway (so we'd hit just the negative sides). After enabling the servers and getting that deployed, the client end enable really has some potential gain too. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 4月, 2009 1 次提交
-
-
由 Ilpo Järvinen 提交于
A long-standing feature in tcp_init_metrics() is such that any of its goto reset prevents call to tcp_init_cwnd(). Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 3月, 2009 1 次提交
-
-
由 John Dykstra 提交于
Discard incoming packets whose ack field iincludes data not yet sent. This is consistent with RFC 793 Section 3.9. Change tcp_ack() to distinguish between too-small and too-large ack field values. Keep segments with too-large ack fields out of the fast path, and change slow path to discard them. Reported-by: NOliver Zheng <mailinglists+netdev@oliverzheng.com> Signed-off-by: NJohn Dykstra <john.dykstra1@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 3月, 2009 1 次提交
-
-
由 Ilpo Järvinen 提交于
tcp_sack_swap seems unnecessary so I pushed swap to the caller. Also removed comment that seemed then pointless, and added include when not already there. Compile tested. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 3月, 2009 5 次提交
-
-
由 Ilpo Järvinen 提交于
There's very little need for most of the callsites to get tp->xmit_goal_size updated. That will cost us divide as is, so slice the function in two. Also, the only users of the tp->xmit_goal_size are directly behind tcp_current_mss(), so there's no need to store that variable into tcp_sock at all! The drop of xmit_goal_size currently leaves 16-bit hole and some reorganization would again be necessary to change that (but I'm aiming to fill that hole with u16 xmit_goal_size_segs to cache the results of the remaining divide to get that tso on regression). Bring xmit_goal_size parts into tcp.c Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
It seems that no variables clash such that we couldn't do the check just once later on. Therefore move it. Also kill dead obvious comment, dead argument and add unlikely since this mtu probe does not happen too often. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Wow, it was quite tricky to merge that stream of negations but I think I finally got it right: check & replace_ts_recent: (s32)(rcv_tsval - ts_recent) >= 0 => 0 (s32)(ts_recent - rcv_tsval) <= 0 => 0 discard: (s32)(ts_recent - rcv_tsval) > TCP_PAWS_WINDOW => 1 (s32)(ts_recent - rcv_tsval) <= TCP_PAWS_WINDOW => 0 I toggled the return values of tcp_paws_check around since the old encoding added yet-another negation making tracking of truth-values really complicated. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
I've already forgotten what for this was necessary, anyway it's no longer used (if it ever was). Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
In the pure assignment case, the earlier zeroing is still in effect. David S. Miller raised concerns if the ifs are there to avoid dirtying cachelines. I came to these conclusions: > We'll be dirty it anyway (now that I check), the first "real" statement > in tcp_rcv_established is: > > tp->rx_opt.saw_tstamp = 0; > > ...that'll land on the same dword. :-/ > > I suppose the blocks are there just because they had more complexity > inside when they had to calculate the eff_sacks too (maybe it would > have been better to just remove them in that drop-patch so you would > have had less head-ache :-)). Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 3月, 2009 1 次提交
-
-
由 Hantzis Fotis 提交于
The above functions from include/net/tcp.h have been defined with an argument that they never use. The argument is 'u32 ack' which is never used inside the function body, and thus it can be removed. The rest of the patch involves the necessary changes to the function callers of the above two functions. Signed-off-by: NHantzis Fotis <xantzis@ceid.upatras.gr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 3月, 2009 4 次提交
-
-
由 Ilpo Järvinen 提交于
Also fixes insignificant bug that would cause sending of stale SACK block (would occur in some corner cases). Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Some comment about its current state added. So far I have seen very few cases where the thing is actually useful, usually just marginally (though admittedly I don't usually see top of window losses where it seems possible that there could be some gain), instead, more often the cases suffer from L-marking spike which is certainly not desirable (I'll bury improving it to my todo list, but on a low prio position). Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Arnd Hannemann <hannemann@nets.rwth-aachen.de> noticed and was puzzled by the fact that !tcp_is_fack(tp) leads to early return near the beginning and the later on tcp_is_fack(tp) was still used in an if condition. The later check was a left-over from RFC3517 SACK stuff (== !tcp_is_fack(tp) behavior nowadays) as there wasn't clear way how to handle this particular check cheaply in the spirit of RFC3517 (using only SACK blocks, not holes + SACK blocks as with FACK). I sort of left it there as a reminder but since it's confusing other people just remove it and comment the missing-feature stuff instead. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
It is possible that lost_cnt_hint gets underflow in tcp_clean_rtx_queue because the cumulative ACK can cover the segment where lost_skb_hint points to only partially, which means that the hint is not cleared, opposite to what my (earlier) comment claimed. Also I don't agree what I ended up writing about non-trivial case there to be what I intented to say. It was not supposed to happen that the hint won't get cleared and we underflow in any scenario. In general, this is quite hard to trigger in practice. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 3月, 2009 1 次提交
-
-
由 Ilpo Järvinen 提交于
There's conflicting assumptions in shifting, the caller assumes that dupsack results in S'ed skbs (or a part of it) for sure but never gave a hint to tcp_sacktag_one when dsack is actually in use. Thus DSACK retrans_out -= pcount was not taken and the counter became out of sync. Remove obstacle from that information flow to get DSACKs accounted in tcp_sacktag_one as expected. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: NDenys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 1月, 2009 1 次提交
-
-
由 Dan Williams 提交于
Use the general-purpose channel allocation provided by dmaengine. Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 06 12月, 2008 7 次提交
-
-
由 Ilpo Järvinen 提交于
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Both are called by the same sites. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
There are just too many args to some sacktag functions. This idea was first proposed by David S. Miller around a year ago, and the current situation is much worse that what it was back then. tcp_sacktag_one can be made a bit simpler by returning the new sacked (it can be achieved with a single variable though the previous code "caching" sacked into a local variable and therefore it is not exactly equal but the results will be the same). codiff on x86_64 tcp_sacktag_one | -15 tcp_shifted_skb | -50 tcp_match_skb_to_sack | -1 tcp_sacktag_walk | -64 tcp_sacktag_write_queue | -59 tcp_urg | +1 tcp_event_data_recv | -1 7 functions changed, 1 bytes added, 190 bytes removed, diff: -189 Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
I noticed that since skb->len has nothing to do with actual segment length with gso, we need to figure it out separately, reuse a function from the recent shifting stuff (generalize it). Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
S|R won't result in S if just SACK is received. DSACK is another story (but it is covered correctly already). Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Also make if-goto forest nicer looking. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 11月, 2008 1 次提交
-
-
由 Ilpo Järvinen 提交于
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 11月, 2008 7 次提交
-
-
由 Ilpo Järvinen 提交于
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
The earlier version was just very basic one which is "playing safe" by always clearing the hints. However, clearing of a hint is extremely costly operation with large windows, so it must be avoided at all cost whenever possible, there is a way with shifting too achieve not-clearing. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
During SACK processing, most of the benefits of TSO are eaten by the SACK blocks that one-by-one fragment SKBs to MSS sized chunks. Then we're in problems when cleanup work for them has to be done when a large cumulative ACK comes. Try to return back to pre-split state already while more and more SACK info gets discovered by combining newly discovered SACK areas with the previous skb if that's SACKed as well. This approach has a number of benefits: 1) The processing overhead is spread more equally over the RTT 2) Write queue has less skbs to process (affect everything which has to walk in the queue past the sacked areas) 3) Write queue is consistent whole the time, so no other parts of TCP has to be aware of this (this was not the case with some other approach that was, well, quite intrusive all around). 4) Clean_rtx_queue can release most of the pages using single put_page instead of previous PAGE_SIZE/mss+1 calls In case a hole is fully filled by the new SACK block, we attempt to combine the next skb too which allows construction of skbs that are even larger than what tso split them to and it handles hole per on every nth patterns that often occur during slow start overshoot pretty nicely. Though this to be really useful also a retransmission would have to get lost since cumulative ACKs advance one hole at a time in the most typical case. TODO: handle upwards only merging. That should be rather easy when segment is fully sacked but I'm leaving that as future work item (it won't make very large difference anyway since this current approach already covers quite a lot of normal cases). I was earlier thinking of some sophisticated way of tracking timestamps of the first and the last segment but later on realized that it won't be that necessary at all to store the timestamp of the last segment. The cases that can occur are basically either: 1) ambiguous => no sensible measurement can be taken anyway 2) non-ambiguous is due to reordering => having the timestamp of the last segment there is just skewing things more off than does some good since the ack got triggered by one of the holes (besides some substle issues that would make determining right hole/skb even harder problem). Anyway, it has nothing to do with this change then. I choose to route some abnormal looking cases with goto noop, some could be handled differently (eg., by stopping the walking at that skb but again). In general, they either shouldn't happen at all or are rare enough to make no difference in practice. In theory this change (as whole) could cause some macroscale regression (global) because of cache misses that are taken over the round-trip time but it gets very likely better because of much less (local) cache misses per other write queue walkers and the big recovery clearing cumulative ack. Worth to note that these benefits would be very easy to get also without TSO/GSO being on as long as the data is in pages so that we can merge them. Currently I won't let that happen because DSACK splitting at fragment that would mess up pcounts due to sk_can_gso in tcp_set_skb_tso_segs. Once DSACKs fragments gets avoided, we have some conditions that can be made less strict. TODO: I will probably have to convert the excessive pointer passing to struct sacktag_state... :-) My testing revealed that considerable amount of skbs couldn't be shifted because they were cloned (most likely still awaiting tx reclaim)... [The rest is considering future work instead since I got repeatably EFAULT to tcpdump's recvfrom when I added pskb_expand_head to deal with clones, so I separated that into another, later patch] ...To counter that, I gave up on the fifth advantage: 5) When growing previous SACK block, less allocs for new skbs are done, basically a new alloc is needed only when new hole is detected and when the previous skb runs out of frags space ...which now only happens of if reclaim is fast enough to dispose the clone before the SACK block comes in (the window is RTT long), otherwise we'll have to alloc some. With clones being handled I got these numbers (will be somewhat worse without that), taken with fine-grained mibs: TCPSackShifted 398 TCPSackMerged 877 TCPSackShiftFallback 320 TCPSACKCOLLAPSEFALLBACKGSO 0 TCPSACKCOLLAPSEFALLBACKSKBBITS 0 TCPSACKCOLLAPSEFALLBACKSKBDATA 0 TCPSACKCOLLAPSEFALLBACKBELOW 0 TCPSACKCOLLAPSEFALLBACKFIRST 1 TCPSACKCOLLAPSEFALLBACKPREVBITS 318 TCPSACKCOLLAPSEFALLBACKMSS 1 TCPSACKCOLLAPSEFALLBACKNOHEAD 0 TCPSACKCOLLAPSEFALLBACKSHIFT 0 TCPSACKCOLLAPSENOOPSEQ 0 TCPSACKCOLLAPSENOOPSMALLPCOUNT 0 TCPSACKCOLLAPSENOOPSMALLLEN 0 TCPSACKCOLLAPSEHOLE 12 Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
This is preparatory work for SACK combiner patch which may have to count TCP state changes for only a part of the skb because it will intentionally avoids splitting skb to SACKed and not sacked parts. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Sadly enough, this adds possible divide though we try to avoid it by checking one mss as common case. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
I knew already when rewriting the sacktag that this condition was too conservative, change it now since it prevent lot of useless work (especially in the sack shifter decision code that is being added by a later patch). This shouldn't change anything really, just save some processing regardless of the shifter. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 10月, 2008 1 次提交
-
-
由 Harvey Harrison 提交于
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 10月, 2008 1 次提交
-
-
由 Harvey Harrison 提交于
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 10月, 2008 1 次提交
-
-
由 Harvey Harrison 提交于
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 10月, 2008 3 次提交
-
-
由 Ali Saidi 提交于
From: Ali Saidi <saidi@engin.umich.edu> When TCP receive copy offload is enabled it's possible that tcp_rcv_established() will cause two acks to be sent for a single packet. In the case that a tcp_dma_early_copy() is successful, copied_early is set to true which causes tcp_cleanup_rbuf() to be called early which can send an ack. Further along in tcp_rcv_established(), __tcp_ack_snd_check() is called and will schedule a delayed ACK. If no packets are processed before the delayed ack timer expires the packet will be acked twice. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
I'm quite sure that if I give this function in its old format for you to inspect, you start to wonder what is the type of demanded or if it's a global variable. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
It all started from me noticing that this urgent check in tcp_clean_rtx_queue is unnecessarily inside the loop. Then I took a longer look to it and found out that the users of urg_mode can trivially do without, well almost, there was one gotcha. Bonus: those funny people who use urg with >= 2^31 write_seq - snd_una could now rejoice too (that's the only purpose for the between being there, otherwise a simple compare would have done the thing). Not that I assume that the rest of the tcp code happily lives with such mind-boggling numbers :-). Alas, it turned out to be impossible to set wmem to such numbers anyway, yes I really tried a big sendfile after setting some wmem but nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf... So I hacked a bit variable to long and found out that it seems to work... :-) Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-