Commits · 7b0eb22b1d3b049306813a4aaa52966650f7491c · openeuler / raspberrypi-kernel

26 Apr, 2007 6 commits

[TCP] FRTO: Use Disorder state during operation instead of Open · 7b0eb22b

Authored by Ilpo Järvinen on Feb 21, 2007

Retransmission counter assumptions are to be changed. There are
compelling reasons to do this: using the sysctl in the check would be
racy as soon as FRTO starts to ignore some ACKs (which it does in the
following patches). Userspace may disable it at any moment,
giving a nice oops if the timing is right. frto_counter would be
inaccessible from userspace, but with SACK-enhanced FRTO
retrans_out can include segments other than the head, possibly leaving
it non-zero after a spurious RTO: boom again.

Luckily, the solution seems rather simple: never go directly to the Open
state but use Disorder instead. This does not really change much,
since TCP could change its state to Disorder during FRTO anyway,
via the path tcp_fastretrans_alert -> tcp_try_to_open (e.g., when
a SACK block makes an ACK dubious). Besides, Disorder seems to be
the state where TCP should be if it is not recovering (in Recovery or
Loss state) while having some retransmissions in flight (see
tcp_try_to_open), which is exactly what happens with FRTO.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh · 7487c48c

Authored by Ilpo Järvinen on Feb 21, 2007

In case a latency spike causes more than one RTO, the later ones should not
cause the already reduced ssthresh to propagate into prior_ssthresh,
since FRTO declares either all such RTOs spurious at once or none of them. In
the treatment of ssthresh, we mimic what tcp_enter_loss() does.

The previous state (in frto_counter) must be available until we have
checked it in tcp_enter_frto(), and also the ACK information flag in
process_frto().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
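
The ssthresh handling described in this commit message can be sketched in plain C. This is a minimal user-space illustration, not the kernel code: `struct frto_state`, `enter_frto()` and `undo_spurious_rto()` are hypothetical names, the boolean stands in for a non-zero frto_counter, and halving cwnd is an assumed reduction rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified sender state; not the kernel's struct tcp_sock. */
struct frto_state {
    uint32_t snd_cwnd;
    uint32_t ssthresh;
    uint32_t prior_ssthresh;
    bool     frto_active;   /* stands in for a non-zero frto_counter */
};

/* Assumed reduction rule: half of cwnd, but at least 2 segments. */
static uint32_t reduced_ssthresh(uint32_t cwnd)
{
    uint32_t half = cwnd / 2;
    return half > 2 ? half : 2;
}

/* Called on every retransmission timeout. */
static void enter_frto(struct frto_state *s)
{
    if (!s->frto_active) {
        /* First RTO of the episode: save the old value, then reduce. */
        s->prior_ssthresh = s->ssthresh;
        s->ssthresh = reduced_ssthresh(s->snd_cwnd);
    }
    /* A later RTO in the same episode leaves both values untouched. */
    s->frto_active = true;
}

/* Called when the RTOs of the episode are declared spurious. */
static void undo_spurious_rto(struct frto_state *s)
{
    s->ssthresh = s->prior_ssthresh;
    s->frto_active = false;
}
```

Because the second timeout does not overwrite prior_ssthresh, undoing the episode restores the value from before the first RTO rather than the already reduced one.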

[TCP] FRTO: Comment cleanup & improvement · 30935cf4

Authored by Ilpo Järvinen on Feb 21, 2007

Moved comments out from the body of process_frto() to the head
(preferred way; see Documentation/CodingStyle). Bonus: it's much
easier to read in this compacted form.

The FRTO algorithm and its implementation are described in greater detail.
For the interested reader, more information is available in RFC4138.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c · bdaae17d

Authored by Ilpo Järvinen on Feb 21, 2007

In addition, removed inline.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] FRTO: Separated response from FRTO detection algorithm · 9ead9a1d

Authored by Ilpo Järvinen on Feb 21, 2007

FRTO spurious RTO detection algorithm (RFC4138) does not include response
to a detected spurious RTO but can use different response algorithms.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit · 522e7548

Authored by Ilpo Järvinen on Feb 21, 2007

FRTO was slightly too brave... Should only clear
TCPCB_SACKED_RETRANS bit.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Feb, 2007 1 commit

[NET] IPV4: Fix whitespace errors. · e905a9ed

Authored by YOSHIFUJI Hideaki on Feb 09, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Feb, 2007 3 commits

[TCP]: Check num sacks in SACK fast path · 8a3c3a97

Authored by Baruch Even on Feb 04, 2007

We clear the unused parts of the SACK cache. This prevents us from mistakenly
taking the cache data if the old data in the SACK cache is the same as the data
in the SACK block. This assumes that we never receive an empty SACK block with
start and end both at zero.
Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Seperate DSACK from SACK fast path · 6f74651a

Authored by Baruch Even on Feb 04, 2007

Move DSACK code outside the SACK fast-path checking code. If the DSACK
code determined that the information was too old, we were left with a
partially copied cache. Most likely this matters very little, since the next
packet will not be a DSACK and we will find it in the cache, but it's still not
good form and there is little reason to couple the two checks.

Since the SACK receive cache doesn't need the data to be in host order, we also
remove the ntohl in the checking loop.
Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Advance fast path pointer for first block only · fda03fbb

Authored by Baruch Even on Feb 04, 2007

Only advance the SACK fast-path pointer for the first block. The
fast path assumes that only the first block advances next time, so we
should not move the cached skb for the subsequent SACK blocks.
Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Jan, 2007 1 commit

[TCP]: Fix sorting of SACK blocks. · db3ccdac

Authored by Baruch Even on Jan 25, 2007

The sorting of SACK blocks actually munges them rather than sorting them,
causing the TCP stack to ignore some SACK information and breaking the
assumption of ordered SACK blocks after sorting.

The sort takes the data from a second buffer which isn't moved, causing
subsequent data moves to occur from the wrong location. The fix is to
use a temporary buffer as a normal sort does.
Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
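
As an illustration of the fix described in this commit message, here is a small stand-alone sort that swaps blocks through a temporary. It is a sketch, not the kernel routine: `struct sack_block`, `seq_after()` and `sort_sack_blocks()` are names made up for this example, and the bubble sort is simply a convenient choice for a handful of blocks.

```c
#include <stdint.h>

/* Simplified SACK block in host byte order (hypothetical type). */
struct sack_block {
    uint32_t start_seq;
    uint32_t end_seq;
};

/* Wraparound-safe test: is sequence number a later than b? */
static int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Sort blocks by start_seq, in place. */
static void sort_sack_blocks(struct sack_block *sp, int n)
{
    for (int i = n - 1; i > 0; i--) {
        for (int j = 0; j < i; j++) {
            if (seq_after(sp[j].start_seq, sp[j + 1].start_seq)) {
                /* Swap via a temporary so both values come from the
                 * array being sorted, never from a stale second copy. */
                struct sack_block tmp = sp[j];
                sp[j] = sp[j + 1];
                sp[j + 1] = tmp;
            }
        }
    }
}
```

Reading the swapped values from the same array that is being reordered is what keeps start and end of each block paired up after every move.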

24 Jan, 2007 1 commit

[TCP]: skb is unexpectedly freed. · fb7e2399

Authored by Masayuki Nakagawa on Jan 23, 2007

I encountered a kernel panic with my test program, which is a very
simple IPv6 client-server program.

The server side sets IPV6_RECVPKTINFO on a listening socket, and the
client side just sends a message to the server.  Then the kernel panic
occurs on the server.  (If you need the test program, please let me
know. I can provide it.)

This problem happens because a skb is forcibly freed in
tcp_rcv_state_process().

When a socket in listening state (TCP_LISTEN) receives a SYN packet,
tcp_v6_conn_request() is called from
tcp_rcv_state_process().  If tcp_v6_conn_request() returns
successfully, the skb is discarded by __kfree_skb().

However, for a listening socket on which IPV6_RECVPKTINFO has already
been set, the address of the skb is stored in
treq->pktopts and the ref count of the skb is incremented in
tcp_v6_conn_request().  The skb is then freed even though it is still
in use, and whoever uses the freed skb afterwards causes the
kernel panic.

I suggest using kfree_skb() instead of __kfree_skb().
Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
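
The difference between the two free routines can be illustrated with a toy reference-counted buffer. This is a user-space sketch under stated assumptions: `struct buf` and the `buf_*` helpers are hypothetical stand-ins for struct sk_buff, kfree_skb() and __kfree_skb(), and a flag is set instead of actually releasing memory so that the effect can be observed.

```c
/* Hypothetical reference-counted buffer standing in for struct sk_buff. */
struct buf {
    int users;   /* number of holders */
    int freed;   /* set instead of releasing memory, to keep this observable */
};

/* Take an additional reference (what storing the skb in pktopts requires). */
static void buf_get(struct buf *b)
{
    b->users++;
}

/* Analogue of __kfree_skb(): releases regardless of other holders. */
static void buf_free_unconditional(struct buf *b)
{
    b->freed = 1;
}

/* Analogue of kfree_skb(): drops one reference, releases only at zero. */
static void buf_put(struct buf *b)
{
    if (--b->users == 0)
        buf_free_unconditional(b);
}
```

With a second holder registered through buf_get(), a single buf_put() leaves the buffer alive, whereas buf_free_unconditional() would release it under the other holder's feet.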

07 Dec, 2006 1 commit

[NET]: Memory barrier cleanups · e16aa207

Authored by Ralf Baechle on Dec 07, 2006

I believe all the memory barriers below only matter on SMP, so
the smp_* variant of the barrier should be used.

I'm wondering if the barrier in net/ipv4/inet_timewait_sock.c should be
dropped entirely. schedule_work's implementation currently implies a
memory barrier, and I think sane semantics of schedule_work() should imply
a memory barrier as needed, so the caller shouldn't have to worry.
It's not quite obvious why the barrier in net/packet/af_packet.c is
needed; maybe it should be implied through flush_dcache_page?
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Dec, 2006 3 commits

[NET]: Annotate __skb_checksum_complete() and friends. · b51655b9

Authored by Al Viro on Nov 14, 2006

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: MD5 Signature Option (RFC2385) support. · cfb6eeb4

Authored by YOSHIFUJI Hideaki on Nov 14, 2006

Based on implementation by Rick Payne.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

SELinux: Return correct context for SO_PEERSEC · 6b877699

Authored by Venkat Yekkirala on Nov 08, 2006

Fix SO_PEERSEC for tcp sockets to return the security context of
the peer (as represented by the SA from the peer) as opposed to the
SA used by the local/source socket.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>

04 Oct, 2006 1 commit

[TCP]: Kill warning in tcp_clean_rtx_queue(). · 80246ab3

Authored by David S. Miller on Oct 03, 2006

GCC can't tell we always initialize 'tv' in all the cases
we actually use it, so explicitly set it up with zeros.
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Sep, 2006 3 commits

[TCP]: Fix and simplify microsecond rtt sampling · 8ea333eb

Authored by John Heffner on Sep 28, 2006

This changes the microsecond RTT sampling so that samples are taken in
the same way that RTT samples are taken for the RTO calculator: on the
last segment acknowledged, and only when the segment hasn't been
retransmitted.
Signed-off-by: John Heffner <jheffner@psc.edu>
Acked-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] net/ipv4/tcp_input.c: trivial annotations · 4f3608b7

Authored by Al Viro on Sep 27, 2006

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: struct tcp_sack_block annotations · 269bd27e

Authored by Al Viro on Sep 27, 2006

Some of the instances of tcp_sack_block are host-endian, some - net-endian.
Define struct tcp_sack_block_wire identical to struct tcp_sack_block
with u32 replaced with __be32; annotate uses of tcp_sack_block replacing
net-endian ones with tcp_sack_block_wire. Change is obviously safe since
for cc(1) __be32 is typedefed to u32.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
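
To show why separate host-order and wire-order types help, here is a user-space sketch. The type and function names are hypothetical; in the kernel the wire fields are declared __be32 so that the sparse checker can flag mixed use, while this sketch only documents the byte order in comments.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* ntohl() */

/* Host-order block, safe to compare and do arithmetic on. */
struct sack_block {
    uint32_t start_seq;
    uint32_t end_seq;
};

/* Same layout, but the fields hold network-order (big-endian) values. */
struct sack_block_wire {
    uint32_t start_seq;  /* __be32 in the kernel */
    uint32_t end_seq;    /* __be32 in the kernel */
};

/* The single place where wire values are converted to host order. */
static struct sack_block sack_from_wire(struct sack_block_wire w)
{
    struct sack_block h;

    h.start_seq = ntohl(w.start_seq);
    h.end_seq = ntohl(w.end_seq);
    return h;
}
```

Keeping the conversion in one function means a value can never be used in the wrong byte order without passing through a type change that a reader, or a checker, can see.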

23 Sep, 2006 3 commits

[TCP]: Send ACKs each 2nd received segment. · 1ef9696c

Authored by Alexey Kuznetsov on Sep 19, 2006

It does not affect either mss-sized connections (obviously) or
connections controlled by Nagle (because there is only one small
segment in flight).

The idea is to record the fact that a small segment arrives on a
connection, where one small segment has already been received and
still not-ACKed. In this case ACK is forced after tcp_recvmsg() drains
receive buffer.

In other words, it is a "soft" each-2nd-segment ACK, which is enough
to preserve ACK clock even when ABC is enabled.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Fix rcv mss estimate for LRO · ff9b5e0f

Authored by Herbert Xu on Aug 31, 2006

By passing a Linux-generated TSO packet straight back into Linux, Xen
becomes our first LRO user :) Unfortunately, there is at least one spot
in our stack that needs to be changed to cope with this.

The receive MSS estimate is computed from the raw packet size. This is
broken if the packet is GSO/LRO. Fortunately the real MSS can be found
in gso_size so we simply need to use that if it is non-zero.

Real LRO NICs should of course set the gso_size field in future.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
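
The rule from this commit message, use gso_size when it is non-zero and the raw length otherwise, fits in a few lines. The sketch below uses a hypothetical `struct pkt` rather than the kernel's sk_buff, and `rcv_mss_sample()` is a name invented for the example.

```c
/* Hypothetical minimal packet descriptor; not the kernel's struct sk_buff. */
struct pkt {
    unsigned int len;       /* raw size of the (possibly aggregated) packet */
    unsigned int gso_size;  /* per-segment size, 0 if not GSO/LRO */
};

/* Size to feed into the receive MSS estimate. */
static unsigned int rcv_mss_sample(const struct pkt *p)
{
    return p->gso_size ? p->gso_size : p->len;
}
```

For an aggregated packet of 64000 bytes with gso_size 1448, the sample is 1448 rather than 64000, so the estimate is not inflated by the aggregation.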

[NET/IPV4/IPV6]: Change some sysctl variables to __read_mostly · ab32ea5d

Authored by Brian Haley on Sep 22, 2006

Change net/core, ipv4 and ipv6 sysctl variables to __read_mostly.

Couldn't actually measure any performance increase while testing (.3%
I consider noise), but seems like the right thing to do.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Sep, 2006 1 commit

[TCP]: Turn ABC off. · b3a8a40d

Authored by Stephen Hemminger on Sep 13, 2006

Turn Appropriate Byte Count off by default because it unfairly
penalizes applications that do small writes.  Add better documentation
to describe what it is so users will understand why they might want to
turn it on.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Aug, 2006 1 commit

[TCP]: Two RFC3465 Appropriate Byte Count fixes. · 3fdf3f0c

Authored by Daikichi Osuga on Aug 29, 2006

1) fix slow start after retransmit timeout
2) fix case of L=2*SMSS acked bytes comparison
Signed-off-by: Daikichi Osuga <osugad@s1.nttdocomo.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Aug, 2006 1 commit

[TCP]: Fixes IW > 2 cases when TCP is application limited · d254bcdb

Authored by Ilpo Järvinen on Aug 04, 2006

Whenever a transfer is application limited, we are allowed at least
initial window worth of data per window unless cwnd is previously
less than that.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Jul, 2006 1 commit

Remove obsolete #include <linux/config.h> · 6ab3d562

Authored by Jörn Engel on Jun 30, 2006

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

30 Jun, 2006 1 commit

[NET]: Add ECN support for TSO · b0da8537

Authored by Michael Chan on Jun 29, 2006

In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->
features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

With help from Herbert Xu <herbert@gondor.apana.org.au>.

Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
flag is completely removed.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Jun, 2006 1 commit

[NET]: Merge TSO/UFO fields in sk_buff · 7967168c

Authored by Herbert Xu on Jun 22, 2006

Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP).  So
let's merge them.

They were used to tell the protocol of a packet.  This function has been
subsumed by the new gso_type field.  This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb.  As such it's easy to tell whether a given device can process a GSO
skb: you just have to AND the gso_type field with the netdev's features
field.

I've made gso_type a conjunction.  The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN.  All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4.  This means that only the CWR packets need
to be emulated in software.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
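
The capability check described in this commit message can be sketched as follows. The bit values and names (`GSO_TCPV4`, `F_TSO`, `dev_can_segment()` and so on) are assumptions chosen for this example and are not the kernel's definitions; only the idea of shifting gso_type by 16 bits and ANDing it with the feature bits is taken from the text.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define GSO_SHIFT      16u
#define GSO_TCPV4      (UINT32_C(1) << 0)
#define GSO_TCPV4_ECN  (UINT32_C(1) << 1)

/* Device feature bits are the GSO type bits shifted up by GSO_SHIFT. */
#define F_TSO          (GSO_TCPV4 << GSO_SHIFT)
#define F_TSO_ECN      (GSO_TCPV4_ECN << GSO_SHIFT)

/* Returns 1 if the device offers every feature this packet requires. */
static int dev_can_segment(uint32_t features, uint32_t gso_type)
{
    uint32_t needed = gso_type << GSO_SHIFT;

    return (features & needed) == needed;
}
```

A device that only declares F_TSO passes the check for plain GSO_TCPV4 packets but fails it for GSO_TCPV4 | GSO_TCPV4_ECN, so only the CWR packets fall back to software segmentation.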

18 Jun, 2006 3 commits

[TCP]: Minimum congestion window consolidation. · 72dc5b92

Authored by Stephen Hemminger on Jun 05, 2006

Many of the TCP congestion methods just use ssthresh
as the minimum congestion window on decrease.  Rather than
duplicating the code, just have that be the default if that
handler in the ops structure is not set.

Minor behaviour change to TCP compound.  It probably wants
to use this (ssthresh) as lower bound, rather than ssthresh/2
because the latter causes undershoot on loss.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
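
The "default when the handler is unset" pattern from this commit message looks roughly like this in stand-alone C. The structures and names (`struct conn`, `struct cong_ops`, `cwnd_min()`, `half_ssthresh()`) are simplified, hypothetical versions made up for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified connection state. */
struct conn {
    uint32_t snd_ssthresh;
};

/* Per-algorithm operations; min_cwnd is optional and may be NULL. */
struct cong_ops {
    uint32_t (*min_cwnd)(const struct conn *c);
};

/* Lower bound for cwnd on decrease: the handler if set, else ssthresh. */
static uint32_t cwnd_min(const struct conn *c, const struct cong_ops *ops)
{
    if (ops->min_cwnd != NULL)
        return ops->min_cwnd(c);
    return c->snd_ssthresh;
}

/* Example of an algorithm-specific override. */
static uint32_t half_ssthresh(const struct conn *c)
{
    return c->snd_ssthresh / 2;
}
```

An algorithm that is happy with ssthresh as the bound simply leaves min_cwnd unset and no longer needs its own copy of the same function.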

[TCP]: tcp_rcv_rtt_measure_ts() call in pure-ACK path is superfluous · 15986e1a

Authored by David S. Miller on May 25, 2006

We only want to take receive RTT measurements for data-bearing
frames. Here, in the header prediction fast path
for a pure sender, we know that we have a pure ACK, and
thus the checks in tcp_rcv_rtt_measure_ts() will not pass.
Signed-off-by: David S. Miller <davem@davemloft.net>

[I/OAT]: TCP recv offload to I/OAT · 1a2449a8

Authored by Chris Leech on May 23, 2006

Locks down user pages and sets up for DMA in tcp_recvmsg, then calls
dma_async_try_early_copy in tcp_v4_do_rcv.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Jun, 2006 1 commit

[TCP]: continued: reno sacked_out count fix · 79320d7e

Authored by Aki M Nyrhinen on Jun 11, 2006

From: Aki M Nyrhinen <anyrhine@cs.helsinki.fi>

IMHO the current fix to the problem (in_flight underflow in reno)
is incorrect.  it treats the symptoms but ignores the problem. the
problem is timing out packets other than the head packet when we
don't have sack. i try to explain (sorry if explaining the obvious).

with sack, scanning the retransmit queue for timed out packets is
fine because we know which packets in our retransmit queue have been
acked by the receiver.

without sack, we know only how many packets in our retransmit queue the
receiver has acknowledged, but no idea which packets.

think of a "typical" slow-start overshoot case, where for example
every third packet in a window gets lost because a router buffer gets
full.

with sack, we check for timeouts on those every third packet (as the
rest have been sacked). the packet counting works out and if there
is no reordering, we'll retransmit exactly the packets that were
lost.

without sack, however, we check for timeout on every packet and end up
retransmitting consecutive packets in the retransmit queue. in our
slow-start example, 2/3 of those retransmissions are unnecessary. these
unnecessary retransmissions eat the congestion window and eventually
prevent fast recovery from continuing, if enough packets were lost.
Signed-off-by: David S. Miller <davem@davemloft.net>

17 May, 2006 1 commit

[TCP]: reno sacked_out count fix · 8872d8e1

Authored by Angelo P. Castellani on May 16, 2006

From: "Angelo P. Castellani" <angelo.castellani+lkml@gmail.com>

Using NewReno, if a sk_buff is timed out and is accounted as lost_out,
it should also be removed from the sacked_out.

This is necessary because recovery using NewReno fast retransmit could
take many RTTs, and the sk_buff RTO can expire without the segment
actually being lost.

left_out = sacked_out + lost_out
in_flight = packets_out - left_out + retrans_out

Using NewReno without this patch, on very large network losses,
left_out becomes bigger than packets_out + retrans_out (!!).

For this reason the unsigned integer in_flight wraps around to 2^32 - something.
Signed-off-by: David S. Miller <davem@davemloft.net>
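
The wraparound described in this commit message is easy to reproduce with the two formulas from the text. The sketch below is stand-alone C with a hypothetical `struct counters`; the counter values in the note that follows are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the four counters named in the text. */
struct counters {
    uint32_t packets_out;
    uint32_t sacked_out;
    uint32_t lost_out;
    uint32_t retrans_out;
};

/* in_flight = packets_out - left_out + retrans_out, in unsigned arithmetic. */
static uint32_t in_flight(const struct counters *c)
{
    uint32_t left_out = c->sacked_out + c->lost_out;

    return c->packets_out - left_out + c->retrans_out;
}
```

With packets_out = 10, sacked_out = 8, lost_out = 4 and retrans_out = 1, left_out is 12, which exceeds packets_out + retrans_out = 11, and the unsigned result wraps to 2^32 - 1. If the four timed-out segments are also removed from sacked_out (leaving 4), the same formula yields 3.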

15 Apr, 2006 1 commit

[IPV4]: Possible cleanups. · 6c97e72a

Authored by Adrian Bunk on Apr 12, 2006

This patch contains the following possible cleanups:
- make the following needlessly global function static:
  - arp.c: arp_rcv()
- remove the following unused EXPORT_SYMBOL's:
  - devinet.c: devinet_ioctl
  - fib_frontend.c: ip_rt_ioctl
  - inet_hashtables.c: inet_bind_bucket_create
  - inet_hashtables.c: inet_bind_hash
  - tcp_input.c: sysctl_tcp_abc
  - tcp_ipv4.c: sysctl_tcp_tw_reuse
  - tcp_output.c: sysctl_tcp_mtu_probing
  - tcp_output.c: sysctl_tcp_base_mss
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Mar, 2006 2 commits

[TCP] mtu probing: move tcp-specific data out of inet_connection_sock · 0e7b1368

Authored by John Heffner on Mar 20, 2006

This moves some TCP-specific MTU probing state out of
inet_connection_sock back to tcp_sock.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: MTU probing · 5d424d5a

Authored by John Heffner on Mar 20, 2006

Implementation of packetization layer path mtu discovery for TCP, based on
the internet-draft currently found at
<http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-05.txt>.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Feb, 2006 1 commit

[TCP]: rcvbuf lock when tcp_moderate_rcvbuf enabled · 6fcf9412

Authored by John Heffner on Feb 09, 2006

The rcvbuf lock should probably be honored here.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Jan, 2006 1 commit

[NET]: Change some "if (x) BUG();" to "BUG_ON(x);" · 09a62660

Authored by Kris Katterjohn on Jan 08, 2006

This changes some simple "if (x) BUG();" statements to "BUG_ON(x);"
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Jan, 2006 1 commit

[TCP]: less inline's · 40efc6fa

Authored by Stephen Hemminger on Jan 03, 2006

TCP inline usage cleanup:
 * get rid of inline in several places
 * replace __inline__ with inline where possible
 * move functions used in one file out of tcp.h
 * let compiler decide on used once cases

On x86_64: 
   text	   data	    bss	    dec	    hex	filename
3594701	 648348	 567400	4810449	 4966d1	vmlinux.orig
3593133	 648580	 567400	4809113	 496199	vmlinux

On sparc64:
   text	   data	    bss	    dec	    hex	filename
2538278	 406152	 530392	3474822	 350586	vmlinux.ORIG
2536382	 406384	 530392	3473158	 34ff06	vmlinux
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>