Commits · 988ade6b8e27e79311812f83a87b5cea11fabcd7 · openanolis / cloud-kernel

02 Oct, 2009 1 commit

IPv4 TCP fails to send window scale option when window scale is zero · 89e95a61

Authored by Ori Finkelman on Oct 01, 2009

Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
and SYN headers even if our window scale is zero.

This fixes the following observed behavior:

1. Client sends a SYN with the TCP window scale option and a non-zero window
scale value to a Linux box.
2. Linux box notes the large receive window from the client.
3. Linux decides on a window scale of zero for its part.
4. Because the code compares against the requested window scale value, Linux
does not send the window scale TCP option in the SYN/ACK at all.

With the following result:

The client thinks TCP window scaling is not supported, since the SYN/ACK had no
TCP window scale option, while Linux thinks that TCP window scaling is
supported (and the scale might be non-zero), since the SYN had a TCP window
scale option. The client and server therefore end up with mismatched ideas
about window sizes.

It probably also fixes the following bug (not observed in practice):

1. Linux box opens a TCP connection to some server.
2. Linux decides on a window scale of zero.
3. Because the code compares against the computed window scale value, Linux
does not set the window scale TCP option in the SYN.

With the expected result that the server OS does not use the window scale option
because it did not receive such an option in the SYN headers, leading to
suboptimal performance.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
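The difference between the old and the fixed decision can be sketched in plain C. This is a user-space illustration only, not the kernel code: `struct syn_opts`, `OPTION_WSCALE`, and both function names are hypothetical, and the real option-building code in tcp_output.c has more inputs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit, used only in this sketch. */
#define OPTION_WSCALE (1u << 3)

/* Hypothetical container for the options chosen for a SYN or SYN/ACK. */
struct syn_opts {
    uint8_t options; /* bitmask of options to emit */
    uint8_t ws;      /* our window scale shift */
};

/* Old behavior as described above: the option is emitted only when
 * our own shift count is non-zero. */
void set_wscale_old(struct syn_opts *o, bool wscale_ok, uint8_t rcv_wscale)
{
    o->options = 0;
    o->ws = 0;
    if (wscale_ok && rcv_wscale != 0) {
        o->ws = rcv_wscale;
        o->options |= OPTION_WSCALE;
    }
}

/* Fixed behavior: the option is emitted whenever window scaling is in
 * use, even if our shift count is zero. */
void set_wscale_fixed(struct syn_opts *o, bool wscale_ok, uint8_t rcv_wscale)
{
    o->options = 0;
    o->ws = 0;
    if (wscale_ok) {
        o->ws = rcv_wscale;
        o->options |= OPTION_WSCALE;
    }
}
```

With a shift of zero, the old variant silently drops the option while the fixed variant still acknowledges window scale support with a shift count of 0.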

03 Sep, 2009 1 commit

tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076

Authored by Wu Fengguang on Sep 02, 2009

This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
			tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.

But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.

CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Jul, 2009 1 commit

TCP: Add comments to (near) all functions in tcp_output.c v3 · 67edfef7

Authored by Andi Kleen on Jul 21, 2009

While looking for something else I spent some time adding
one liner comments to the tcp_output.c functions that
didn't have any. That makes the comments more consistent.

I hope I documented everything right.

No code changes.

v2: Incorporated feedback from Ilpo.
v3: Change style of one liner comments, add a few more comments.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Jul, 2009 1 commit

tcp: Fix MD5 signature checking on IPv4 mapped sockets · e3afe7b7

Authored by John Dykstra on Jul 16, 2009

Fix MD5 signature checking so that an IPv4 active open
to an IPv6 socket can succeed.  In particular, use the
correct address family's signature generation function
for the SYN/ACK.
Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Jun, 2009 1 commit

tcp: Stop non-TSO packets morphing into TSO · 8e5b9dda

Authored by Herbert Xu on Jun 28, 2009

If a socket starts out on a non-TSO route, and then switches to
a TSO route, then the tail on the tx queue can morph into a TSO
packet, causing mischief because the rest of the stack does not
expect a partially linear TSO packet.

This patch fixes this by ensuring that skb->ip_summed is set to
CHECKSUM_PARTIAL before declaring a packet as TSO.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
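The guard described above might look roughly like the following user-space sketch. It assumes a simplified packet structure; `struct pkt`, `set_tso_segs()`, and the enum values are illustrative stand-ins and not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative checksum states; the numeric values are assumptions. */
enum { CHECKSUM_NONE = 0, CHECKSUM_PARTIAL = 3 };

/* Hypothetical, heavily simplified stand-in for an skb. */
struct pkt {
    unsigned int len;      /* payload length in bytes */
    int ip_summed;         /* checksum state */
    unsigned int gso_segs; /* number of segments this packet represents */
    unsigned int gso_size; /* per-segment size, 0 for a non-TSO packet */
};

/* Mark a packet as TSO only if it is larger than one MSS, the route can
 * do GSO, and the packet was built for checksum offload. */
void set_tso_segs(struct pkt *p, unsigned int mss_now, bool route_can_gso)
{
    if (p->len <= mss_now || !route_can_gso ||
        p->ip_summed != CHECKSUM_PARTIAL) {
        p->gso_segs = 1;
        p->gso_size = 0;
    } else {
        p->gso_segs = (p->len + mss_now - 1) / mss_now; /* round up */
        p->gso_size = mss_now;
    }
}
```

A packet queued while the socket was on a non-TSO route keeps `CHECKSUM_NONE` in this sketch, so it stays a single plain segment even after the route changes.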

03 Jun, 2009 1 commit

net: skb->dst accessors · adf30907

Authored by Eric Dumazet on Jun 02, 2009

Define three accessors to get/set the dst attached to an skb:

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of:
dst_release(skb->dst)
skb->dst = NULL;

Delete the skb->dst field.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
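The accessor pattern can be tried out in user space with mock types. The three accessor names and signatures come from the message above; the struct layouts, the `dst_private` field name, and the reference-counting `dst_release()` are assumptions made only so the sketch compiles on its own.

```c
#include <stddef.h>

/* Mock types: the real kernel structures are far larger. */
struct dst_entry {
    int refcnt; /* simplified reference count */
};

struct sk_buff {
    struct dst_entry *dst_private; /* hypothetical hidden field */
};

/* Mock release: drops one reference. */
void dst_release(struct dst_entry *dst)
{
    if (dst)
        dst->refcnt--;
}

/* Read the dst attached to an skb. */
struct dst_entry *skb_dst(const struct sk_buff *skb)
{
    return skb->dst_private;
}

/* Attach a dst to an skb. */
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
{
    skb->dst_private = dst;
}

/* Release the attached dst and clear the pointer in one step. */
void skb_dst_drop(struct sk_buff *skb)
{
    dst_release(skb->dst_private);
    skb->dst_private = NULL;
}
```

Routing every access through these helpers means callers no longer depend on how the pointer is stored inside the skb.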

05 May, 2009 1 commit

tcp: extend ECN sysctl to allow server-side only ECN · 255cac91

Authored by Ilpo Järvinen on May 04, 2009

This should be very safe compared with full enabled, so I see
no reason why it shouldn't be done right away. As ECN can only
be negotiated if the SYN sending party is also supporting it,
somebody in the loop probably knows what he/she is doing. If
SYN does not ask for ECN, the server side SYN-ACK is identical
to what it is without ECN. Thus it's quite safe.

The chosen value is safe w.r.t. existing configs which
currently set either 0 or 1 manually, but it
silently upgrades those who have not explicitly requested
ECN off.

Whether to just enable both sides comes up from time to time, but
unless that gets done now we can at least make the servers
aware of ECN already. As there are some known problems that occur
if ECN is enabled, it's currently questionable whether there's
any real gain from enabling clients, as servers mostly won't
support it anyway (so we'd hit just the negative sides). After
enabling the servers and getting that deployed, enabling the
client end really has some potential gain too.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
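A three-valued setting of this kind could be modelled as below. The message only says the new value differs from 0 and 1; using 2 for "server-side only" is an assumption here, as are the enum and function names.

```c
#include <stdbool.h>

/* Hypothetical names; the value 2 for server-side only is an assumption. */
enum ecn_mode {
    ECN_OFF = 0,         /* never negotiate ECN */
    ECN_ON = 1,          /* request ECN on outgoing SYNs and accept it */
    ECN_SERVER_ONLY = 2  /* accept ECN if the peer's SYN asks for it */
};

/* Client side: do we ask for ECN in our own SYN? */
bool ecn_request_on_syn(int mode)
{
    return mode == ECN_ON;
}

/* Server side: do we agree to ECN in the SYN-ACK? Only if the incoming
 * SYN asked for it and ECN is not switched off. */
bool ecn_accept_on_synack(int mode, bool syn_asked_ecn)
{
    return mode != ECN_OFF && syn_asked_ecn;
}
```

In server-side-only mode the SYN-ACK for a non-ECN SYN is unchanged, which is the safety argument the message makes.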

20 Apr, 2009 1 commit

tcp: fix mid-wq adjustment helper · 52cf3cc8

Authored by Ilpo Järvinen on Apr 18, 2009

Just noticed while doing some new work that the recent
mid-wq adjustment logic will misbehave when FACK is not
in use (which happens either due to it being sysctl'ed off or
due to auto-detected reordering) because I forgot the relevant
TCPCB tagbit.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Apr, 2009 2 commits

tcp: miscounts due to tcp_fragment pcount reset · 9eb9362e

Authored by Ilpo Järvinen on Apr 01, 2009

It seems that a trivial reset of pcount to one was not sufficient
in tcp_retransmit_skb. Multiple counters experience a positive
miscount when the skb's pcount gets lowered without the necessary
adjustments (which ones exactly depends on the skb's sacked bits); at
worst a packets_out miscount can crash at RTO if the write queue
is empty!

Triggering this requires an mss change, so bidir tcp or an mtu probe or
the like.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Uwe Bugla <uwe.bugla@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add helper for counter tweaking due mid-wq change · 797108d1

Authored by Ilpo Järvinen on Apr 01, 2009

We need full-scale adjustment to fix a TCP miscount in the next
patch, so just move it into a helper and call for that from the
other places.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Mar, 2009 2 commits

tcp: simplify tcp_current_mss · 0c54b85f

Authored by Ilpo Järvinen on Mar 14, 2009

There's very little need for most of the callsites to get
tp->xmit_goal_size updated. That will cost us a divide as is,
so slice the function in two. Also, the only users of
tp->xmit_goal_size are directly behind tcp_current_mss(),
so there's no need to store that variable in tcp_sock
at all! Dropping xmit_goal_size currently leaves a 16-bit
hole and some reorganization would again be necessary to
change that (but I'm aiming to fill that hole with u16
xmit_goal_size_segs to cache the results of the remaining
divide to get that tso on regression).

Bring xmit_goal_size parts into tcp.c
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove pointless .dsack/.num_sacks code · 5861f8e5

Authored by Ilpo Järvinen on Mar 14, 2009

In the pure assignment case, the earlier zeroing is
still in effect.

David S. Miller raised concerns if the ifs are there to avoid
dirtying cachelines. I came to these conclusions:

> We'll dirty it anyway (now that I check), the first "real" statement
> in tcp_rcv_established is:
>
>       tp->rx_opt.saw_tstamp = 0;
>
> ...that'll land on the same dword. :-/
>
> I suppose the blocks are there just because they had more complexity
> inside when they had to calculate the eff_sacks too (maybe it would
> have been better to just remove them in that drop-patch so you would
> have had less head-ache :-)).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Mar, 2009 1 commit

tcp: tcp_init_wl / tcp_update_wl argument cleanup · ee7537b6

Authored by Hantzis Fotis on Mar 02, 2009

The above functions from include/net/tcp.h have been defined with an
argument that they never use. The argument is 'u32 ack' which is never
used inside the function body, and thus it can be removed. The rest of
the patch involves the necessary changes to the function callers of the
above two functions.
Signed-off-by: Hantzis Fotis <xantzis@ceid.upatras.gr>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Mar, 2009 7 commits

tcp: get rid of two unnecessary u16s in TCP skb flags copying · 9ce01461

Authored by Ilpo Järvinen on Feb 28, 2009

I guess these fields were one day 16-bit in the struct but
nowadays they're just using 8 bits anyway.

This is just a precaution; it didn't result in any change in my
case, but who knows what all those varying gcc versions &
options do. I've been told that 16-bit is not so nice with
some cpus.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: kill eff_sacks "cache", the sole user can calculate itself · cabeccbd

Authored by Ilpo Järvinen on Feb 28, 2009

Also fixes an insignificant bug that would cause a stale
SACK block to be sent (would occur in some corner cases).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: drop unnecessary local var in collapse · e6c7d085

Authored by Ilpo Järvinen on Feb 28, 2009

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix corner case issue in segmentation during rexmitting · 02276f3c

Authored by Ilpo Järvinen on Feb 28, 2009

If cur_mss grew very recently so that the previously G/TSOed skb
now fits well into a single segment, it would get sent in
parts unless we calculate the number of segments again. This corner case
could happen e.g. after an mtu probe completes or when fewer sack blocks
than previously are required for the opposite direction.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Don't clear hints when tcp_fragmenting · d3d2ae45

Authored by Ilpo Järvinen on Feb 28, 2009

1) We didn't remove any skbs, so no need to handle stale refs.

2) scoreboard_skb_hint is trivial, no timestamps were changed
   so no need to clear that one

3) lost_skb_hint needs tweaking similar to that of
   tcp_sacktag_one().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: deferring in middle of queue makes very little sense · 62ad2761

Authored by Ilpo Järvinen on Feb 28, 2009

If skb can be sent right away, we certainly should do that
if it's in the middle of the queue because it won't get
more data into it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: don't backtrack to sacked skbs · ac11ba75

Authored by Ilpo Järvinen on Feb 28, 2009

Backtracking to sacked skbs is a horrible performance killer
since the hint cannot be advanced successfully past them...
...And it's totally unnecessary too.

In theory this is a 2.6.27..28 regression, but I doubt anybody
can make .28 perform worse because of the other TCP
improvements.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Feb, 2009 1 commit

tcp: Always set urgent pointer if it's beyond snd_nxt · 7691367d

Authored by Herbert Xu on Feb 21, 2009

Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.

This behaviour is different from the BSDs, and clearly contradicts
the purpose of urgent mode, which is to send the notification
(though not necessarily the associated data) as soon as possible.
Our current behaviour may in fact delay the urgent notification
indefinitely if the receiver window does not open up.

Simply matching BSD however may break legacy applications which
incorrectly rely on the out-of-band delivery of urgent data, and
conversely the in-band delivery of non-urgent data.

Alexey Kuznetsov suggested a safe solution of following BSD only
if the urgent pointer itself has not yet been transmitted.  This
way we guarantee that when the remote end sees the packet with
non-urgent data marked as urgent due to wrap-around we would have
advanced the urgent pointer beyond, either to the actual urgent
data or to an as-yet untransmitted packet.

The only potential downside is that applications on the remote
end may see multiple SIGURG notifications.  However, this would
occur anyway with other TCP stacks.  More importantly, the outcome
of such a duplicate notification is likely to be harmless since
the signal itself does not carry any information other than the
fact that we're in urgent mode.

Thanks to Ilpo Järvinen for fixing a critical bug in this and
Jeff Chua for reporting that bug.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
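The rule described above can be written down as a small stand-alone function. This is a sketch under assumptions rather than the kernel code: `compute_urg()` and `struct urg_result` are hypothetical, `seq` stands for the segment's starting sequence number, `snd_up` for the urgent point, and `snd_nxt` for the next sequence number to be sent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe sequence number comparisons. */
bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

bool seq_after(uint32_t a, uint32_t b)
{
    return seq_before(b, a);
}

/* Hypothetical result type: URG flag plus 16-bit urgent pointer. */
struct urg_result {
    bool urg;
    uint16_t urg_ptr;
};

struct urg_result compute_urg(uint32_t seq, uint32_t snd_up, uint32_t snd_nxt)
{
    struct urg_result r = { false, 0 };

    if (!seq_before(seq, snd_up))
        return r; /* this segment starts at or past the urgent point */

    if (seq_before(snd_up, seq + 0x10000u)) {
        /* Urgent point fits in 16 bits: send the exact offset. */
        r.urg = true;
        r.urg_ptr = (uint16_t)(snd_up - seq);
    } else if (seq_after(seq + 0xFFFFu, snd_nxt)) {
        /* Too far away, but the byte a clamped pointer would reference
         * has not been transmitted yet: follow BSD and clamp. */
        r.urg = true;
        r.urg_ptr = 0xFFFF;
    }
    return r;
}
```

The third case, where the clamped pointer would land on data that has already been sent, leaves the URG flag clear, which matches the compromise attributed to Alexey Kuznetsov.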

19 Feb, 2009 1 commit

tcp: remove obsoleted comment about different passes · 5209921c

Authored by Ilpo Järvinen on Feb 18, 2009

This is obsolete since the passes got combined.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Feb, 2009 1 commit

Revert "tcp: Always set urgent pointer if it's beyond snd_nxt" · a23f4bbd

Authored by David S. Miller on Feb 05, 2009

This reverts commit 64ff3b93.

Jeff Chua reports that it breaks rlogin for him.
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Dec, 2008 1 commit

tcp: Always set urgent pointer if it's beyond snd_nxt · 64ff3b93

Authored by Herbert Xu on Dec 25, 2008

Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.

This behaviour is different from the BSDs, and clearly contradicts
the purpose of urgent mode, which is to send the notification
(though not necessarily the associated data) as soon as possible.
Our current behaviour may in fact delay the urgent notification
indefinitely if the receiver window does not open up.

Simply matching BSD however may break legacy applications which
incorrectly rely on the out-of-band delivery of urgent data, and
conversely the in-band delivery of non-urgent data.

Alexey Kuznetsov suggested a safe solution of following BSD only
if the urgent pointer itself has not yet been transmitted.  This
way we guarantee that when the remote end sees the packet with
non-urgent data marked as urgent due to wrap-around we would have
advanced the urgent pointer beyond, either to the actual urgent
data or to an as-yet untransmitted packet.

The only potential downside is that applications on the remote
end may see multiple SIGURG notifications.  However, this would
occur anyway with other TCP stacks.  More importantly, the outcome
of such a duplicate notification is likely to be harmless since
the signal itself does not carry any information other than the
fact that we're in urgent mode.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Dec, 2008 3 commits

tcp: fix tso_should_defer in 64bit · a2acde07

Authored by Ilpo Järvinen on Dec 05, 2008

Since jiffies is unsigned long, the types get expanded into
that and after a long enough time the difference will therefore
always be > 1 (and that probably happens near boot as well, as
iirc the first jiffies wrap is scheduled close after boot to
find out problems related to that early).

This was originally noted by Bill Fink in Dec'07 but nobody
ever ended up fixing it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
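The type-widening problem is easy to reproduce outside the kernel. The sketch below assumes a 64-bit jiffies counter and a 32-bit stored timestamp; the function names are hypothetical and the stored stamp is simplified to the low 32 bits of jiffies, which is not exactly how the kernel encodes it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy comparison: the 32-bit stamp is widened to 64 bits before the
 * subtraction, so once jiffies exceeds 32 bits the difference is huge. */
bool deferred_too_long_buggy(uint64_t jiffies, uint32_t tso_deferred)
{
    return jiffies - tso_deferred > 1;
}

/* Fixed comparison: truncate jiffies to 32 bits first, then look at the
 * signed 32-bit difference. */
bool deferred_too_long_fixed(uint64_t jiffies, uint32_t tso_deferred)
{
    return (int32_t)((uint32_t)jiffies - tso_deferred) > 1;
}
```

Once the counter has passed 2^32, the buggy variant reports "deferred too long" even when no time has elapsed, so deferral effectively never happens.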

tcp: use tcp_write_xmit also in tcp_push_one · d5dd9175

Authored by Ilpo Järvinen on Dec 05, 2008

tcp_minshall_update is not a significant difference since it only
checks for a not full-sized skb, which is BUG'ed on the push_one
path anyway.

tcp_snd_test is tcp_nagle_test+tcp_cwnd_test+tcp_snd_wnd_test,
just the order changed slightly.

net/ipv4/tcp_output.c:
  tcp_snd_test              |  -89
  tcp_mss_split_point       |  -91
  tcp_may_send_now          |  +53
  tcp_cwnd_validate         |  -98
  tso_fragment              | -239
  __tcp_push_pending_frames | -1340
  tcp_push_one              | -146
 7 functions changed, 53 bytes added, 2003 bytes removed, diff: -1950

net/ipv4/tcp_output.c:
  tcp_write_xmit | +1772
 1 function changed, 1772 bytes added, diff: +1772

tcp_output.o.new:
 8 functions changed, 1825 bytes added, 2003 bytes removed, diff: -178
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move some parts from tcp_write_xmit · 726e07a8

Authored by Ilpo Järvinen on Dec 05, 2008

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Dec, 2008 1 commit

tcp: make urg+gso work for real this time · f8269a49

Authored by Ilpo Järvinen on Dec 03, 2008

I should have noticed this earlier... :-) The previous solution
to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zag
patterns, or even worse, a steep downward slope into packet
counting because each skb pcount would be truncated to a pcount
of 2 and then the following fragments of the later portion would
restore the window again.

Basically this reverts "tcp: Do not use TSO/GSO when there is
urgent data" (33cf71ce). It also removes some unnecessary code
from tcp_current_mss that didn't work as intended either (could
be that something was changed down the road, or it might have
been broken since the dawn of time) because it only works once
urg is already written while this bug shows up starting from
~64k before the urg point.

The retransmissions are already split into mss sized chunks, so
only new data sending paths need splitting in case they have
a segment otherwise suitable for gso/tso. The actual check
can be made narrower, but since this is a late -rc
already, I'll postpone thinking about the more fine-grained things.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Nov, 2008 2 commits

tcp: move tcp_simple_retransmit to tcp_input · e1aa680f

Authored by Ilpo Järvinen on Nov 24, 2008

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: collapse more than two on retransmission · 4a17fc3a

Authored by Ilpo Järvinen on Nov 24, 2008

I had always thought that collapsing up to two at a time was
an intentional decision to avoid excessive processing if 1 byte
sized skbs are to be combined for a full mtu, and consecutive
retransmissions would make the size of the retransmittee
double each round anyway, but some recent discussion made me
understand that was not the case. Thus make collapse work
more and wait less.

It would be possible to take advantage of the shifting
machinery (added in the later patch) in the case of paged
data but that can be implemented on top of this change.

tcp_skb_is_last check is now provided by the loop.

I tested a bit (ss-after-idle-off, fill 4096x4096B xfer,
10s sleep + 4096 x 1byte writes while dropping them for
a while with netem):

. 16774097:16775545(1448) ack 1 win 46
. 16775545:16776993(1448) ack 1 win 46
. ack 16759617 win 2399
P 16776993:16777217(224) ack 1 win 46
. ack 16762513 win 2399
. ack 16765409 win 2399
. ack 16768305 win 2399
. ack 16771201 win 2399
. ack 16774097 win 2399
. ack 16776993 win 2399
. ack 16777217 win 2399
P 16777217:16777257(40) ack 1 win 46
. ack 16777257 win 2399
P 16777257:16778705(1448) ack 1 win 46
P 16778705:16780153(1448) ack 1 win 46
FP 16780153:16781313(1160) ack 1 win 46
. ack 16778705 win 2399
. ack 16780153 win 2399
F 1:1(0) ack 16781314 win 2399

While without drop-all period I get this:

. 16773585:16775033(1448) ack 1 win 46
. ack 16764897 win 9367
. ack 16767793 win 9367
. ack 16770689 win 9367
. ack 16773585 win 9367
. 16775033:16776481(1448) ack 1 win 46
P 16776481:16777217(736) ack 1 win 46
. ack 16776481 win 9367
. ack 16777217 win 9367
P 16777217:16777218(1) ack 1 win 46
P 16777218:16777219(1) ack 1 win 46
P 16777219:16777220(1) ack 1 win 46
  ...
P 16777247:16777248(1) ack 1 win 46
. ack 16777218 win 9367
. ack 16777219 win 9367
  ...
. ack 16777233 win 9367
. ack 16777248 win 9367
P 16777248:16778696(1448) ack 1 win 46
P 16778696:16780144(1448) ack 1 win 46
FP 16780144:16781313(1169) ack 1 win 46
. ack 16780144 win 9367
F 1:1(0) ack 16781314 win 9367

The window seems to be 30-40 segments, which were successfully
combined into: P 16777217:16777257(40) ack 1 win 46
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Nov, 2008 1 commit

tcp: Do not use TSO/GSO when there is urgent data · 33cf71ce

Authored by Petr Tesarik on Nov 21, 2008

This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014

Since most (if not all) implementations of TSO and even the in-kernel
software GSO do not update the urgent pointer when splitting a large
segment, it is necessary to turn off TSO/GSO for all outgoing traffic
with the URG pointer set.

Looking at tcp_current_mss (and the preceding comment) I even think
this was the original intention. However, this approach is insufficient,
because TSO/GSO is turned off only for newly created frames, not for
frames which were already pending at the arrival of a message with
MSG_OOB set. These frames were created when TSO/GSO was enabled,
so they may be large, and they will have the urgent pointer set
in tcp_transmit_skb().

With this patch, such large packets will be fragmented again before
going to the transmit routine.

As a side note, at least the following NICs are known to screw up
the urgent pointer in the TCP header when doing TSO:

	Intel 82566MM (PCI ID 8086:1049)
	Intel 82566DC (PCI ID 8086:104b)
	Intel 82541GI (PCI ID 8086:1076)
	Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Nov, 2008 1 commit

net: clean up net/ipv4/ip_sockglue.c tcp_output.c · 09cb105e

Authored by Jianjun Kong on Nov 03, 2008

Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Oct, 2008 1 commit

syncookies: fix inclusion of tcp options in syn-ack · 8b5f12d0

Authored by Florian Westphal on Oct 26, 2008

David Miller noticed that commit
33ad798c ('tcp: options clean up')
did not move the req->cookie_ts check.
This essentially disabled commit 4dfc2817
('[Syncookies]: Add support for TCP options via timestamps.').

This restores the original logic.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Oct, 2008 1 commit

tcp: Restore ordering of TCP options for the sake of inter-operability · fd6149d3

Authored by Ilpo Järvinen on Oct 23, 2008

This is not our bug! Sadly some devices cannot cope with the change
of TCP option ordering which was a result of the recent rewrite of
the option code (not that there was some particular reason stemming
from the rewrite for the reordering) though any ordering of TCP
options is perfectly legal. Thus we restore the original ordering
to allow interoperability with/through such broken devices and add
some warning about this trap. Since the reordering just happened
without any particular reason, this change shouldn't cost us
anything.

There are already a couple of known failure reports (within close
proximity of the last release), so the problem might be more
widespread than a single device. There are also other reports which may
be due to the same problem though the symptoms were less obvious.
Analysis of one of the cases revealed (with very high probability)
that sack capability cannot be negotiated as the first option
(SYN never got a response).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Aldo Maggi <sentiniate@tiscali.it>
Tested-by: Aldo Maggi <sentiniate@tiscali.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Oct, 2008 1 commit

tcp: should use number of sack blocks instead of -1 · 75e3d8db

Authored by Ilpo Järvinen on Oct 21, 2008

While looking for the recent "sack issue" I also read all eff_sacks
usage that was played around by some relevant commit. I found
out that there's another thing that is asking for a fix (unrelated
to the "sack issue" though).

This feature has probably very little significance in practice.
Opposite direction timeout with bidirectional tcp comes to me as
the most likely scenario though there might be other cases as
well related to non-data segments we send (e.g., response to the
opposite direction segment). Also some ACK losses, or option space
wasted for other purposes, are necessary to prevent the earlier
SACK feedback getting to the sender.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Oct, 2008 1 commit

tcp: kill pointless urg_mode · 33f5f57e

Authored by Ilpo Järvinen on Oct 07, 2008

It all started from me noticing that this urgent check in
tcp_clean_rtx_queue is unnecessarily inside the loop. Then
I took a longer look at it and found out that the users of
urg_mode can trivially do without it, well almost, there was
one gotcha.

Bonus: those funny people who use urg with >= 2^31 write_seq -
snd_una could now rejoice too (that's the only purpose for the
between being there, otherwise a simple compare would have done
the thing). Not that I assume that the rest of the tcp code
happily lives with such mind-boggling numbers :-). Alas, it
turned out to be impossible to set wmem to such numbers anyway,
yes I really tried a big sendfile after setting some wmem but
nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf...
So I hacked a bit, changing the variable to long, and found out that it seems
to work... :-)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Oct, 2008 1 commit

tcp: Port redirection support for TCP · a3116ac5

Authored by KOVACS Krisztian on Oct 01, 2008

Current TCP code relies on the local port of the listening socket
being the same as the destination port of the incoming
connection. Port redirection used by many transparent proxying
techniques obviously breaks this, so we have to store the original
destination port.

This patch extends struct inet_request_sock and stores the incoming
destination port value there. It also modifies the handshake code to
use that value as the source port when sending reply packets.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
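The idea of remembering the original destination port per request can be sketched as follows. The structures and function names are hypothetical and ports are kept in host byte order for readability; the message above only states that struct inet_request_sock gains a field for the incoming destination port.

```c
#include <stdint.h>

/* Hypothetical listening socket: bound to the proxy's own port. */
struct listener {
    uint16_t local_port;
};

/* Hypothetical per-connection request state. */
struct request {
    uint16_t rmt_port; /* client's source port */
    uint16_t loc_port; /* destination port the client actually addressed */
};

/* Record both ports from the incoming SYN, regardless of which port
 * the listener itself is bound to. */
void record_request(struct request *req, uint16_t syn_src_port,
                    uint16_t syn_dst_port)
{
    req->rmt_port = syn_src_port;
    req->loc_port = syn_dst_port;
}

/* The reply must appear to come from the port the client addressed,
 * not from the listener's own port. */
uint16_t synack_source_port(const struct request *req)
{
    return req->loc_port;
}
```

With a listener on port 50080 and a SYN redirected from port 80, the reply in this sketch still carries source port 80, so the client sees a consistent connection.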

23 Sep, 2008 1 commit

tcp: Fix order of tests in tcp_retransmit_skb() · 77d40a09

Authored by David S. Miller on Sep 23, 2008

tcp_write_queue_next() must only be called if we know that
tcp_skb_is_last() evaluates to false.
Signed-off-by: David S. Miller <davem@davemloft.net>
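The ordering rule relies on C's short-circuit evaluation of `&&`. A minimal sketch with a hypothetical singly linked queue (not the kernel's write queue, and with made-up helper names) shows the safe order:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue node and queue; the last node has next == NULL. */
struct node {
    struct node *next;
    int len;
};

struct queue {
    struct node *tail;
};

bool is_last(const struct queue *q, const struct node *n)
{
    return n == q->tail;
}

/* Only meaningful when n is not the last node. */
struct node *queue_next(const struct node *n)
{
    return n->next;
}

/* The "is last" test comes first, so queue_next() is never evaluated
 * for the tail node. */
bool can_collapse_with_next(const struct queue *q, const struct node *n,
                            int mss)
{
    return !is_last(q, n) && n->len + queue_next(n)->len <= mss;
}
```

Swapping the two operands would dereference the successor of the tail node before checking that a successor exists.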

21 Sep, 2008 2 commits

tcp: advertise MSS requested by user · f5fff5dc

Authored by Tom Quetchenbach on Sep 21, 2008

I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS
for both sides of a bidirectional connection.

man tcp says: "If this option is set before connection establishment, it
also changes the MSS value announced to the other end in the initial
packet."

However, the kernel only uses the MTU/route cache to set the advertised
MSS. That means if I set the MSS to, say, 500 before calling connect(),
I will send at most 500-byte packets, but I will still receive 1500-byte
packets in reply.

This is a bug, either in the kernel or the documentation.

This patch (applies to latest net-2.6) reduces the advertised value to
that requested by the user as long as setsockopt() is called before
connect() or accept(). This seems like the behavior that one would
expect as well as that which is documented.

I've tried to make sure that things that depend on the advertised MSS
are set correctly.
Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
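The clamping described above amounts to taking the smaller of the route-derived MSS and the user's request. A minimal sketch, assuming a hypothetical helper `advertised_mss()` in which a `user_mss` of 0 means "not set by the user":

```c
#include <stdint.h>

/* Pick the MSS to advertise: the route/MTU-derived value, reduced to the
 * user's TCP_MAXSEG request when one was set and is smaller.
 * A user_mss of 0 means the option was never set (assumption). */
uint16_t advertised_mss(uint16_t route_mss, uint16_t user_mss)
{
    if (user_mss != 0 && user_mss < route_mss)
        return user_mss;
    return route_mss;
}
```

In this sketch a user value larger than the route MSS is ignored, since the message only talks about reducing the advertised value.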

tcp: back retransmit_high when it over-estimated · 618d9f25

Authored by Ilpo Järvinen on Sep 20, 2008

If lost skb is sacked, we might have nothing to retransmit
as high as the retransmit_high is pointing to, so place
it lower to avoid unnecessary walking.

This is mainly for the case where high L'ed skbs gets sacked.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

openanolis / cloud-kernel synced successfully about 1 year ago