提交 · 79320d7e14900c549c3520791a297328f53ff71e · openeuler / raspberrypi-kernel

12 6月, 2006 1 次提交

[TCP]: continued: reno sacked_out count fix · 79320d7e

由 Aki M Nyrhinen 提交于 6月 11, 2006

From: Aki M Nyrhinen <anyrhine@cs.helsinki.fi>

IMHO the current fix to the problem (in_flight underflow in reno)
is incorrect.  it treats the symptons but ignores the problem. the
problem is timing out packets other than the head packet when we
don't have sack. i try to explain (sorry if explaining the obvious).

with sack, scanning the retransmit queue for timed out packets is
fine because we know which packets in our retransmit queue have been
acked by the receiver.

without sack, we know only how many packets in our retransmit queue the
receiver has acknowledged, but no idea which packets.

think of a "typical" slow-start overshoot case, where for example
every third packet in a window get lost because a router buffer gets
full.

with sack, we check for timeouts on those every third packet (as the
rest have been sacked). the packet counting works out and if there
is no reordering, we'll retransmit exactly the packets that were 
lost.

without sack, however, we check for timeout on every packet and end up
retransmitting consecutive packets in the retransmit queue. in our
slow-start example, 2/3 of those retransmissions are unnecessary. these
unnecessary retransmissions eat the congestion window and evetually
prevent fast recovery from continuing, if enough packets were lost.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79320d7e

17 5月, 2006 1 次提交

[TCP]: reno sacked_out count fix · 8872d8e1

由 Angelo P. Castellani 提交于 5月 16, 2006

From: "Angelo P. Castellani" <angelo.castellani+lkml@gmail.com>

Using NewReno, if a sk_buff is timed out and is accounted as lost_out,
it should also be removed from the sacked_out.

This is necessary because recovery using NewReno fast retransmit could
take up to a lot RTTs and the sk_buff RTO can expire without actually
being really lost.

left_out = sacked_out + lost_out
in_flight = packets_out - left_out + retrans_out

Using NewReno without this patch, on very large network losses,
left_out becames bigger than packets_out + retrans_out (!!).

For this reason unsigned integer in_flight overflows to 2^32 - something.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8872d8e1

15 4月, 2006 1 次提交

[IPV4]: Possible cleanups. · 6c97e72a

由 Adrian Bunk 提交于 4月 12, 2006

This patch contains the following possible cleanups:
- make the following needlessly global function static:
  - arp.c: arp_rcv()
- remove the following unused EXPORT_SYMBOL's:
  - devinet.c: devinet_ioctl
  - fib_frontend.c: ip_rt_ioctl
  - inet_hashtables.c: inet_bind_bucket_create
  - inet_hashtables.c: inet_bind_hash
  - tcp_input.c: sysctl_tcp_abc
  - tcp_ipv4.c: sysctl_tcp_tw_reuse
  - tcp_output.c: sysctl_tcp_mtu_probing
  - tcp_output.c: sysctl_tcp_base_mss
Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c97e72a

21 3月, 2006 2 次提交

[TCP] mtu probing: move tcp-specific data out of inet_connection_sock · 0e7b1368

由 John Heffner 提交于 3月 20, 2006

This moves some TCP-specific MTU probing state out of
inet_connection_sock back to tcp_sock.
Signed-off-by: NJohn Heffner <jheffner@psc.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e7b1368

[TCP]: MTU probing · 5d424d5a

由 John Heffner 提交于 3月 20, 2006

Implementation of packetization layer path mtu discovery for TCP, based on
the internet-draft currently found at
<http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-05.txt>.
Signed-off-by: NJohn Heffner <jheffner@psc.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d424d5a

10 2月, 2006 1 次提交

[TCP]: rcvbuf lock when tcp_moderate_rcvbuf enabled · 6fcf9412

由 John Heffner 提交于 2月 09, 2006

The rcvbuf lock should probably be honored here.
Signed-off-by: NJohn Heffner <jheffner@psc.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6fcf9412

10 1月, 2006 1 次提交

[NET]: Change some "if (x) BUG();" to "BUG_ON(x);" · 09a62660

由 Kris Katterjohn 提交于 1月 08, 2006

This changes some simple "if (x) BUG();" statements to "BUG_ON(x);"
Signed-off-by: NKris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09a62660

04 1月, 2006 3 次提交

[TCP]: less inline's · 40efc6fa

由 Stephen Hemminger 提交于 1月 03, 2006

TCP inline usage cleanup:
 * get rid of inline in several places
 * replace __inline__ with inline where possible
 * move functions used in one file out of tcp.h
 * let compiler decide on used once cases

On x86_64: 
   text	   data	    bss	    dec	    hex	filename
3594701	 648348	 567400	4810449	 4966d1	vmlinux.orig
3593133	 648580	 567400	4809113	 496199	vmlinux

On sparc64:
   text	   data	    bss	    dec	    hex	filename
2538278	 406152	 530392	3474822	 350586	vmlinux.ORIG
2536382	 406384	 530392	3473158	 34ff06	vmlinux
Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

40efc6fa

[IP_SOCKGLUE]: Remove most of the tcp specific calls · d83d8461

由 Arnaldo Carvalho de Melo 提交于 12月 13, 2005

As DCCP needs to be called in the same spots.

Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and
DCCP) to see if a struct sock instance is a inet_connection_sock for places
like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if
sk_type was SOCK_STREAM, that is insufficient because we now use the same code
for DCCP, that has sk_type SOCK_DCCP.
Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d83d8461

[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops · 8292a17a

由 Arnaldo Carvalho de Melo 提交于 12月 13, 2005

And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.
Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8292a17a

16 11月, 2005 1 次提交

[TCP]: More spelling fixes. · 31f34269

由 Stephen Hemminger 提交于 11月 15, 2005

From Joe Perches
Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31f34269

11 11月, 2005 5 次提交

[TCP]: speed up SACK processing · 6a438bbe

由 Stephen Hemminger 提交于 11月 10, 2005

Use "hints" to speed up the SACK processing. Various forms 
of this have been used by TCP developers (Web100, STCP, BIC)
to avoid the 2x linear search of outstanding segments.
Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a438bbe

[TCP]: spelling fixes · caa20d9a

由 Stephen Hemminger 提交于 11月 10, 2005

Minor spelling fixes for TCP code.
Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

caa20d9a

[TCP]: receive buffer growth limiting with mixed MTU · 326f36e9

由 John Heffner 提交于 11月 10, 2005

This is a patch for discussion addressing some receive buffer growing issues.
This is partially related to the thread "Possible BUG in IPv4 TCP window
handling..." last week.

Specifically it addresses the problem of an interaction between rcvbuf
moderation (receiver autotuning) and rcv_ssthresh. The problem occurs when
sending small packets to a receiver with a larger MTU. (A very common case I
have is a host with a 1500 byte MTU sending to a host with a 9k MTU.) In
such a case, the rcv_ssthresh code is targeting a window size corresponding
to filling up the current rcvbuf, not taking into account that the new rcvbuf
moderation may increase the rcvbuf size.

One hunk makes rcv_ssthresh use tcp_rmem[2] as the size target rather than
rcvbuf. The other changes the behavior when it overflows its memory bounds
with in-order data so that it tries to grow rcvbuf (the same as with
out-of-order data).

These changes should help my problem of mixed MTUs, and should also help the
case from last week's thread I think. (In both cases though you still need
tcp_rmem[2] to be set much larger than the TCP window.) One question is if
this is too aggressive at trying to increase rcvbuf if it's under memory
stress.

Orignally-from: John Heffner <jheffner@psc.edu>
Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

326f36e9

[TCP]: Appropriate Byte Count support · 9772efb9

由 Stephen Hemminger 提交于 11月 10, 2005

This is an updated version of the RFC3465 ABC patch originally
for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting
bytes ack'd rather than packets when updating congestion control.

The orignal ABC described in the RFC applied to a Reno style
algorithm. For advanced congestion control there is little
change after leaving slow start.
Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9772efb9

[TCP]: simplify microsecond rtt sampling · 2d2abbab

由 Stephen Hemminger 提交于 11月 10, 2005

Simplify the code that comuputes microsecond rtt estimate used
by TCP Vegas. Move the callback out of the RTT sampler and into
the end of the ack cleanup.
Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d2abbab

28 10月, 2005 1 次提交

[TCP]: Clear stale pred_flags when snd_wnd changes · 2ad41065

由 Herbert Xu 提交于 10月 27, 2005

This bug is responsible for causing the infamous "Treason uncloaked"
messages that's been popping up everywhere since the printk was added.
It has usually been blamed on foreign operating systems.  However,
some of those reports implicate Linux as both systems are running
Linux or the TCP connection is going across the loopback interface.

In fact, there really is a bug in the Linux TCP header prediction code
that's been there since at least 2.1.8.  This bug was tracked down with
help from Dale Blount.

The effect of this bug ranges from harmless "Treason uncloaked"
messages to hung/aborted TCP connections.  The details of the bug
and fix is as follows.

When snd_wnd is updated, we only update pred_flags if
tcp_fast_path_check succeeds.  When it fails (for example,
when our rcvbuf is used up), we will leave pred_flags with
an out-of-date snd_wnd value.

When the out-of-date pred_flags happens to match the next incoming
packet we will again hit the fast path and use the current snd_wnd
which will be wrong.

In the case of the treason messages, it just happens that the snd_wnd
cached in pred_flags is zero while tp->snd_wnd is non-zero.  Therefore
when a zero-window packet comes in we incorrectly conclude that the
window is non-zero.

In fact if the peer continues to send us zero-window pure ACKs we
will continue making the same mistake.  It's only when the peer
transmits a zero-window packet with data attached that we get a
chance to snap out of it.  This is what triggers the treason
message at the next retransmit timeout.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>

2ad41065

30 9月, 2005 1 次提交

[TCP]: Don't over-clamp window in tcp_clamp_window() · 09e9ec87

由 Alexey Kuznetsov 提交于 9月 29, 2005

From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>

Handle better the case where the sender sends full sized
frames initially, then moves to a mode where it trickles
out small amounts of data at a time.

This known problem is even mentioned in the comments
above tcp_grow_window() in tcp_input.c, specifically:

...
 * The scheme does not work when sender sends good segments opening
 * window and then starts to feed us spagetti. But it should work
 * in common situations. Otherwise, we have to rely on queue collapsing.
...

When the sender gives full sized frames, the "struct sk_buff" overhead
from each packet is small.  So we'll advertize a larger window.
If the sender moves to a mode where small segments are sent, this
ratio becomes tilted to the other extreme and we start overrunning
the socket buffer space.

tcp_clamp_window() tries to address this, but it's clamping of
tp->window_clamp is a wee bit too aggressive for this particular case.

Fix confirmed by Ion Badulescu.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09e9ec87

15 9月, 2005 1 次提交

[TCP]: Compute in_sacked properly when we split up a TSO frame. · 3c05d92e

由 Herbert Xu 提交于 9月 14, 2005

The problem is that the SACK fragmenting code may incorrectly call
tcp_fragment() with a length larger than the skb->len.  This happens
when the skb on the transmit queue completely falls to the LHS of the
SACK.

And add a BUG() check to tcp_fragment() so we can spot this kind of
error more quickly in the future.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c05d92e

02 9月, 2005 1 次提交

[TCP]: Keep TSO enabled even during loss events. · 6475be16

由 David S. Miller 提交于 9月 01, 2005

All we need to do is resegment the queue so that
we record SACK information accurately.  The edges
of the SACK blocks guide our resegmenting decisions.

With help from Herbert Xu.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6475be16

30 8月, 2005 7 次提交

[NET]: Fix sparse warnings · 20380731

由 Arnaldo Carvalho de Melo 提交于 8月 16, 2005

Of this type, mostly:

CHECK net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?
Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20380731

[NET]: Store skb->timestamp as offset to a base timestamp · a61bbcf2

由 Patrick McHardy 提交于 8月 14, 2005

Reduces skb size by 8 bytes on 64-bit.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a61bbcf2

[ICSK]: Move TCP congestion avoidance members to icsk · 6687e988

由 Arnaldo Carvalho de Melo 提交于 8月 10, 2005

This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(),
minimal renaming/moving done in this changeset to ease review.

Most of it is just changes of struct tcp_sock * to struct sock * parameters.

With this we move to a state closer to two interesting goals:

1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used
for any INET transport protocol that has struct inet_hashinfo and are
derived from struct inet_connection_sock. Keeps the userspace API, that will
just not display DCCP sockets, while newer versions of tools can support
DCCP.

2. INET generic transport pluggable Congestion Avoidance infrastructure, using
the current TCP CA infrastructure with DCCP.
Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6687e988

[ICSK]: Introduce reqsk_queue_prune from code in tcp_synack_timer · 295f7324

由 Arnaldo Carvalho de Melo 提交于 8月 09, 2005

With this we're very close to getting all of the current TCP
refactorings in my dccp-2.6 tree merged, next changeset will export
some functions needed by the current DCCP code and then dccp-2.6.git
will be born!
Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

295f7324

[NET]: Just move the inet_connection_sock function from tcp sources · 3f421baa

由 Arnaldo Carvalho de Melo 提交于 8月 09, 2005

Completing the previous changeset, this also generalises tcp_v4_synq_add,
renaming it to inet_csk_reqsk_queue_hash_add, already geing used in the
DCCP tree, which I plan to merge RSN.
Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f421baa

[NET]: Introduce inet_connection_sock · 463c84b9

由 Arnaldo Carvalho de Melo 提交于 8月 09, 2005

This creates struct inet_connection_sock, moving members out of struct
tcp_sock that are shareable with other INET connection oriented
protocols, such as DCCP, that in my private tree already uses most of
these members.

The functions that operate on these members were renamed, using a
inet_csk_ prefix while not being moved yet to a new file, so as to
ease the review of these changes.
Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

463c84b9

[NET]: Kill skb->list · 8728b834

由 David S. Miller 提交于 8月 09, 2005

Remove the "list" member of struct sk_buff, as it is entirely
redundant.  All SKB list removal callers know which list the
SKB is on, so storing this in sk_buff does nothing other than
taking up some space.

Two tricky bits were SCTP, which I took care of, and two ATM
drivers which Francois Romieu <romieu@fr.zoreil.com> fixed
up.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NFrancois Romieu <romieu@fr.zoreil.com>

8728b834

09 7月, 2005 1 次提交

[NET]: Transform skb_queue_len() binary tests into skb_queue_empty() · b03efcfb

由 David S. Miller 提交于 7月 08, 2005

This is part of the grand scheme to eliminate the qlen
member of skb_queue_head, and subsequently remove the
'list' member of sk_buff.

Most users of skb_queue_len() want to know if the queue is
empty or not, and that's trivially done with skb_queue_empty()
which doesn't use the skb_queue_head->qlen member and instead
uses the queue list emptyness as the test.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b03efcfb

06 7月, 2005 6 次提交

[TCP]: Move to new TSO segmenting scheme. · c1b4a7e6

由 David S. Miller 提交于 7月 05, 2005

Make TSO segment transmit size decisions at send time not earlier.

The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.

This is guided by tp->xmit_size_goal. It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.

Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant. These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.

A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side. Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c1b4a7e6

[TCP]: Break out send buffer expansion test. · 0d9901df

由 David S. Miller 提交于 7月 05, 2005

This makes it easier to understand, and allows easier
tweaking of the heuristic later on.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d9901df

[TCP]: Do not call tcp_tso_acked() if no work to do. · cb83199a

由 David S. Miller 提交于 7月 05, 2005

In tcp_clean_rtx_queue(), if the TSO packet is not even partially
acked, do not waste time calling tcp_tso_acked().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb83199a

[TCP]: Kill bogus comment above tcp_tso_acked(). · a5647696

由 David S. Miller 提交于 7月 05, 2005

Everything stated there is out of data.  tcp_trim_skb()
does adjust the available socket send buffer space and
skb->truesize now.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5647696

[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling. · 55c97f3e

由 David S. Miller 提交于 7月 05, 2005

'nonagle' should be passed to the tcp_snd_test() function
as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the
tail of the write_queue.  This is because Nagle does not
apply to such frames since we cannot possibly tack more
data onto them.

However, while doing this __tcp_push_pending_frames() makes
all of the packets in the write_queue use this modified
'nonagle' value.

Fix the bug and simplify this function by just calling
tcp_write_xmit() directly if sk_send_head is non-NULL.

As a result, we can now make tcp_data_snd_check() just call
tcp_push_pending_frames() instead of the specialized
__tcp_data_snd_check().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

55c97f3e

[TCP]: Move __tcp_data_snd_check into tcp_output.c · 84d3e7b9

由 David S. Miller 提交于 7月 05, 2005

It reimplements portions of tcp_snd_check(), so it
we move it to tcp_output.c we can consolidate it's
logic much easier in a later change.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84d3e7b9

24 6月, 2005 1 次提交

[TCP]: Add pluggable congestion control algorithm infrastructure. · 317a76f9

由 Stephen Hemminger 提交于 6月 23, 2005

Allow TCP to have multiple pluggable congestion control algorithms.
Algorithms are defined by a set of operations and can be built in
or modules.  The legacy "new RENO" algorithm is used as a starting
point and fallback.
Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

317a76f9

24 5月, 2005 1 次提交

[TCP]: Fix stretch ACK performance killer when doing ucopy. · 31432412

由 David S. Miller 提交于 5月 23, 2005

When we are doing ucopy, we try to defer the ACK generation to
cleanup_rbuf().  This works most of the time very well, but if the
ucopy prequeue is large, this ACKing behavior kills performance.

With TSO, it is possible to fill the prequeue so large that by the
time the ACK is sent and gets back to the sender, most of the window
has emptied of data and performance suffers significantly.

This behavior does help in some cases, so we should think about
re-enabling this trick in the future, using some kind of limit in
order to avoid the bug case.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31432412

06 5月, 2005 1 次提交

[PATCH] update Ross Biro bouncing email address · 02c30a84

由 Jesper Juhl 提交于 5月 05, 2005

Ross moved.  Remove the bad email address so people will find the correct
one in ./CREDITS.
Signed-off-by: NJesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

02c30a84

26 4月, 2005 1 次提交

[TCP]: Trivial tcp_data_queue() cleanup · 088dd3a4

由 James Morris 提交于 4月 25, 2005

This patch removes a superfluous intialization from tcp_data_queue().
Signed-off-by: NJames Morris <jmorris@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

088dd3a4

17 4月, 2005 1 次提交

Linux-2.6.12-rc2 · 1da177e4

由 Linus Torvalds 提交于 4月 16, 2005

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

1da177e4