Commits · 3cfe3baaf07c9e40a75f9a70662de56df1c246a8 · openeuler / Kernel

26 Apr, 2007 4 commits

[TCP]: Add two new spurious RTO responses to FRTO · 3cfe3baa

Committed by Ilpo Järvinen on Feb 27, 2007

New sysctl tcp_frto_response is added to select amongst these
responses:
	- Rate halving based; reuses CA_CWR state (default)
	- Very conservative; used to be the only one available (=1)
	- Undo cwr; undoes ssthresh and cwnd reductions (=2)

The response with rate halving requires a new parameter to
tcp_enter_cwr because FRTO has already reduced ssthresh and
doing a second reduction there has to be prevented. In addition,
to keep things nice on 80 cols screen, a local variable was
added.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
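
The three responses listed above can be pictured as a single switch on the sysctl value. The following is only a rough userspace sketch: `struct conn`, its fields, and `spurious_rto_response` are hypothetical names, and the rate-halving branch is reduced to a one-step clamp rather than the gradual reduction done in the CA_CWR state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified connection state (not the kernel's tcp_sock). */
struct conn {
    uint32_t cwnd;           /* congestion window, in segments */
    uint32_t ssthresh;       /* threshold already reduced by FRTO */
    uint32_t prior_cwnd;     /* cwnd saved before the RTO */
    uint32_t prior_ssthresh; /* ssthresh saved before the RTO */
    uint32_t in_flight;      /* segments currently in flight */
};

/* Apply one spurious-RTO response, selected the way tcp_frto_response is. */
static void spurious_rto_response(struct conn *c, int frto_response)
{
    switch (frto_response) {
    case 2: /* undo: restore the values saved before the RTO */
        c->cwnd = c->prior_cwnd;
        c->ssthresh = c->prior_ssthresh;
        break;
    case 1: /* very conservative: drop cwnd to ssthresh at once */
        if (c->cwnd > c->ssthresh)
            c->cwnd = c->ssthresh;
        break;
    default: /* rate halving (simplified): ssthresh is NOT reduced again */
        if (c->cwnd > c->in_flight + 1)
            c->cwnd = c->in_flight + 1;
        break;
    }
}
```

Note that only the default branch leaves `ssthresh` untouched on purpose, which mirrors the message's point that a second reduction has to be prevented.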

[TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh. · 886236c1

Committed by John Heffner on Mar 25, 2007

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] FRTO: Entry is allowed only during (New)Reno like recovery · 46d0de4e

Committed by Ilpo Järvinen on Feb 21, 2007

This interpretation comes from RFC4138:
    "If the sender implements some loss recovery algorithm other
     than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD
     NOT be entered when earlier fast recovery is underway."

I think the RFC means to say (especially in the light of
Appendix B) that ...recovery is underway (not just fast recovery)
or was underway when it was interrupted by an earlier (F-)RTO
that hasn't yet been resolved (snd_una has not advanced enough).
Thus, my interpretation is that whenever TCP has ever
retransmitted other than head, basic version cannot be used
because then the order assumptions which are used as FRTO basis
do not hold.

NewReno has only the head segment retransmitted at a time.
Therefore, walk up to the first segment that has not been SACKed. If
neither that segment nor anything before it has been retransmitted, we
know for sure that nothing after the non-SACKed segment has been
either. This assumption is valid because TCPCB_EVER_RETRANS does
not leave holes but each non-SACKed segment is rexmitted
in-order.

Check for retrans_out > 1 avoids more expensive walk through the
skb list, as we can know the result beforehand: F-RTO will not be
allowed.

A SACKed skb can turn into a non-SACKed one only in the extremely
rare case of SACK reneging; in that case we might fail to detect
retransmissions of segments other than the head. To get rid of
that feature, the whole rexmit queue would have to be
walked (always) or FRTO should be prevented when SACK reneging
happens. Of course RTO should still trigger after reneging which
makes this issue even less likely to show up. And as long as the
response is as conservative as it's now, nothing bad happens even
then.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
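
The entry check described above (cheap `retrans_out` test first, then a walk that stops at the first non-SACKed segment) might look roughly like this. It is a userspace sketch over a plain array; `struct seg` and `frto_allowed` are hypothetical names, not the kernel's skb queue API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-segment flags standing in for the skb control block. */
struct seg {
    bool sacked;       /* segment has been SACKed */
    bool ever_retrans; /* segment was retransmitted at some point */
};

/* Return true if basic F-RTO may be entered for this retransmit queue. */
static bool frto_allowed(const struct seg *q, size_t n, unsigned retrans_out)
{
    /* More than one retransmission outstanding: result is known already. */
    if (retrans_out > 1)
        return false;

    /* Skip the head (index 0); it is the one segment NewReno retransmits. */
    for (size_t i = 1; i < n; i++) {
        if (q[i].ever_retrans)
            return false;   /* something other than the head was resent */
        if (!q[i].sacked)
            break;          /* first non-SACKed segment checked: stop */
    }
    return true;
}
```

The early `break` is what keeps the walk short: everything past the first non-SACKed segment is assumed clean, exactly the in-order assumption the message argues for.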

[TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c · bdaae17d

Committed by Ilpo Järvinen on Feb 21, 2007

In addition, removed inline.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Feb, 2007 1 commit

[TCP]: remove tcp header from tcp_v4_check (take #2) · ba7808ea

Committed by Frederik Deweerdt on Feb 04, 2007

The tcphdr struct passed to tcp_v4_check is not used; the following
patch removes it from the parameter list.

This adds the netfilter modifications missing in the patch I sent
for rc3-mm1.
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Jan, 2007 1 commit

[TCP]: Use old definition of before · 0d630cc0

Committed by Gerrit Renker on Jan 04, 2007

This reverts the new (unambiguous) definition of the TCP `before'
relation. As pointed out in an example by Herbert Xu, there is 
existing code which implicitly requires the old definition in order
to work correctly.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Dec, 2006 1 commit

[TCP]: Fix ambiguity in the `before' relation. · 9a036b9c

Committed by Gerrit Renker on Dec 20, 2006

While looking at DCCP sequence numbers, I stumbled over a problem with
the following definition of before in tcp.h:

static inline int before(__u32 seq1, __u32 seq2)
{
        return (__s32)(seq1-seq2) < 0;
}

Problem: This definition suffers from an ambiguity, i.e. always

           before(a, (a + 2^31) % 2^32) = 1
           before((a + 2^31) % 2^32, a) = 1

         In text: when the difference between a and b amounts to 2^31,
         a is always considered `before' b, so the function cannot decide.
         The reason is that implicitly 0 is `before' 1 ... 2^31-1 ... 2^31

Solution: There is a simple fix, by defining before in such a way that
          0 is no longer `before' 2^31, i.e. 0 `before' 1 ... 2^31-1
          By not using the middle between 0 and 2^32, before can be made
          unambiguous.
          This is achieved by testing whether seq2-seq1 > 0 (using signed
          32-bit arithmetic).

I attach a patch to codify this. Since the `after' relation is basically
a redefinition of `before', it is now defined as a macro after before.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
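
The ambiguity can be reproduced in userspace with fixed-width types. In this sketch `before_old` copies the definition quoted in the message, while `before_new` is one reading of the proposed fix (testing seq2-seq1 > 0); both names are made up here. The conversion of a large unsigned value to `int32_t` is implementation-defined in C, and the sketch assumes the usual two's-complement wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Definition quoted in the commit message, with fixed-width types. */
static int before_old(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* Proposed fix as described: test seq2 - seq1 > 0 in signed arithmetic. */
static int before_new(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq2 - seq1) > 0;
}
```

With two values exactly 2^31 apart, `before_old` reports each one as before the other, whereas `before_new` reports neither; ordinary and wrapped comparisons behave the same under both definitions.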

03 Dec, 2006 7 commits

[NET]: Fix assorted misannotations (from md5 and udplite merges). · 8e5200f5

Committed by Al Viro on Nov 20, 2006

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Annotate __skb_checksum_complete() and friends. · b51655b9

Committed by Al Viro on Nov 14, 2006

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Annotate csum_tcpudp_magic() callers in net/* · 6b11687e

Committed by Al Viro on Nov 14, 2006

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: MD5 Signature Option (RFC2385) support. · cfb6eeb4

Committed by YOSHIFUJI Hideaki on Nov 14, 2006

Based on implementation by Rick Payne.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Restrict congestion control choices. · ce7bc3bf

Committed by Stephen Hemminger on Nov 09, 2006

Allow normal users to only choose among a restricted set of congestion
control choices.  The default is reno and whatever has been configured
as default. But the policy can be changed by the administrator at any time.

For example, to allow any choice:
    cp /proc/sys/net/ipv4/tcp_available_congestion_control \
       /proc/sys/net/ipv4/tcp_allowed_congestion_control
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Add tcp_available_congestion_control sysctl. · 3ff825b2

Committed by Stephen Hemminger on Nov 09, 2006

Create /proc/sys/net/ipv4/tcp_available_congestion_control
that reflects currently available TCP choices.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Size listen hash tables using backlog hint · 72a3effa

Committed by Eric Dumazet on Nov 16, 2006

We currently allocate a fixed-size hash table (TCP_SYNQ_HSIZE=512 slots) for
each LISTEN socket, regardless of various parameters (listen backlog for
example).

On x86_64, this means order-1 allocations (might fail), even for 'small'
sockets, expecting few connections. On the contrary, a huge server wanting a
backlog of 50000 is slowed down a bit because of this fixed limit.

This patch makes the sizing of the listen hash table dynamic,
depending on:
- net.core.somaxconn tunable (default is 128)
- net.ipv4.tcp_max_syn_backlog tunable (default: 256, 1024 or 128)
- backlog value given by user application (2nd parameter of listen())

For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
kmalloc().

We still limit memory allocation with the two existing tunables (somaxconn &
tcp_max_syn_backlog). So for standard setups, this patch actually reduces RAM
usage.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
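
How the three inputs could combine into a table size is sketched below. The clamping by both tunables follows the message; the floor of 8 entries and the rounding up to a power of two are assumptions of this sketch, and `listen_hash_entries` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: derive a listen hash table size from the backlog hint. */
static uint32_t listen_hash_entries(uint32_t backlog,
                                    uint32_t somaxconn,
                                    uint32_t max_syn_backlog)
{
    uint32_t n = backlog;

    /* The two existing tunables still cap the allocation. */
    if (n > somaxconn)
        n = somaxconn;
    if (n > max_syn_backlog)
        n = max_syn_backlog;

    /* Assumed floor so tiny backlogs still get a usable table. */
    if (n < 8)
        n = 8;

    /* Assumed rounding: hash tables are usually a power of two in size. */
    uint32_t size = 1;
    while (size < n)
        size <<= 1;
    return size;
}
```

With the default somaxconn of 128, even a requested backlog of 50000 yields a 128-entry table, which is smaller than the old fixed 512 slots; raising both tunables lets a large server get a correspondingly larger table.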

03 Aug, 2006 1 commit

[TCP]: SNMPv2 tcpAttemptFails counter error · 3687b1dc

Committed by Wei Yongjun on Jul 30, 2006

Refer to RFC2012, tcpAttemptFails is defined as follows:
  tcpAttemptFails OBJECT-TYPE
      SYNTAX      Counter32
      MAX-ACCESS  read-only
      STATUS      current
      DESCRIPTION
              "The number of times TCP connections have made a direct
              transition to the CLOSED state from either the SYN-SENT
              state or the SYN-RCVD state, plus the number of times TCP
              connections have made a direct transition to the LISTEN
              state from the SYN-RCVD state."
      ::= { tcp 7 }

Looking into RFC793, I found that the state change should occur
under the following conditions:
  1. SYN-SENT -> CLOSED
     a) Received ACK,RST segment when SYN-SENT state.

  2. SYN-RCVD -> CLOSED
     b) Received SYN segment when SYN-RCVD state(came from LISTEN).
     c) Received RST segment when SYN-RCVD state(came from SYN-SENT).
     d) Received SYN segment when SYN-RCVD state(came from SYN-SENT).

  3. SYN-RCVD -> LISTEN
     e) Received RST segment when SYN-RCVD state(came from LISTEN).

In my tests, these direct state transitions are not counted in
tcpAttemptFails.
Signed-off-by: Wei Yongjun <yjwei@nanjing-fnst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Jul, 2006 1 commit

[NET] gso: Fix up GSO packets with broken checksums · a430a43d

Committed by Herbert Xu on Jul 08, 2006

Certain subsystems in the stack (e.g., netfilter) can break the partial
checksum on GSO packets. Until they're fixed, this patch allows this to
work by recomputing the partial checksums through the GSO mechanism.

Once they've all been converted to update the partial checksum instead of
clearing it, this workaround can be removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Jul, 2006 1 commit

[NET]: Generalise TSO-specific bits from skb_setup_caps · bcd76111

Committed by Herbert Xu on Jun 30, 2006

This patch generalises the TSO-specific bits from sk_setup_caps by adding
the sk_gso_type member to struct sock.  This makes sk_setup_caps generic
so that it can be used by TCPv6 or UFO.

The only catch is that whoever uses this must provide a GSO implementation
for their protocol which I think is a fair deal :) For now UFO continues to
live without a GSO implementation which is OK since it doesn't use the sock
caps field at the moment.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Jun, 2006 1 commit

[NET]: Added GSO header verification · 576a30eb

Committed by Herbert Xu on Jun 27, 2006

When GSO packets come from an untrusted source (e.g., a Xen guest domain),
we need to verify the header integrity before passing it to the hardware.

Since the first step in GSO is to verify the header, we can reuse that
code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this
bit set can only be fed directly to devices with the corresponding bit
NETIF_F_GSO_ROBUST. If the device doesn't have that bit, then the skb
is fed to the GSO engine which will allow the packet to be sent to the
hardware if it passes the header check.

This patch changes the sg flag to a full features flag. The same method
can be used to implement TSO ECN support. We simply have to mark packets
with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment
the packet, or segment the first MTU and pass the rest to the hardware for
further segmentation.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Jun, 2006 2 commits

[NET]: Add software TSOv4 · f4c50d99

Committed by Herbert Xu on Jun 22, 2006

This patch adds the GSO implementation for IPv4 TCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Merge TSO/UFO fields in sk_buff · 7967168c

Committed by Herbert Xu on Jun 22, 2006

Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP).  So
let's merge them.

They were used to tell the protocol of a packet.  This function has been
subsumed by the new gso_type field.  This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb.  As such it's easy to tell whether a given device can process a GSO
skb: you just have to AND the gso_type field with the netdev's features
field.

I've made gso_type a conjunction.  The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN.  All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4.  This means that only the CWR packets need
to be emulated in software.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
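
The "shifted feature bits" idea can be illustrated with a small mask test. The constant values and the names `GSO_TCPV4`, `F_TSO`, and `dev_can_segment` below are placeholders chosen for this sketch, not the kernel's definitions; only the 16-bit shift comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GSO_SHIFT      16u            /* shift stated in the message */
#define GSO_TCPV4      (1u << 0)      /* placeholder base type bit */
#define GSO_TCPV4_ECN  (1u << 1)      /* placeholder modifier bit */
#define F_TSO          (GSO_TCPV4 << GSO_SHIFT)
#define F_TSO_ECN      (GSO_TCPV4_ECN << GSO_SHIFT)

/* A device can take the skb only if it offers every bit gso_type needs. */
static bool dev_can_segment(uint32_t features, uint32_t gso_type)
{
    uint32_t needed = gso_type << GSO_SHIFT;
    return (features & needed) == needed;
}
```

A device advertising only `F_TSO` accepts plain `GSO_TCPV4` packets but not ones that also carry `GSO_TCPV4_ECN`, so only the CWR packets fall back to software, as the message describes.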

18 Jun, 2006 5 commits

[TCP]: Add tcp_slow_start_after_idle sysctl. · 35089bb2

Committed by David S. Miller on Jun 13, 2006

A lot of people have asked for a way to disable tcp_cwnd_restart(),
and it seems reasonable to add a sysctl to do that.
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Minimum congestion window consolidation. · 72dc5b92

Committed by Stephen Hemminger on Jun 05, 2006

Many of the TCP congestion methods just use ssthresh
as the minimum congestion window on decrease.  Rather than
duplicating the code, just have that be the default if that
handle in the ops structure is not set.

Minor behaviour change to TCP compound.  It probably wants
to use this (ssthresh) as lower bound, rather than ssthresh/2
because the latter causes undershoot on loss.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
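
The "default when the handler is unset" pattern can be sketched as follows. The structures and the names `ca_ops`, `conn`, `min_cwnd`, and `half_ssthresh` are simplified stand-ins invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

struct conn;

/* Hypothetical ops table: a module may leave min_cwnd unset (NULL). */
struct ca_ops {
    uint32_t (*min_cwnd)(const struct conn *c);
};

struct conn {
    uint32_t ssthresh;
    const struct ca_ops *ops;
};

/* Lower bound for cwnd on decrease: module hook if set, else ssthresh. */
static uint32_t min_cwnd(const struct conn *c)
{
    if (c->ops && c->ops->min_cwnd)
        return c->ops->min_cwnd(c);
    return c->ssthresh;
}

/* Example of a module-specific override, e.g. ssthresh/2. */
static uint32_t half_ssthresh(const struct conn *c)
{
    return c->ssthresh / 2;
}
```

Modules that are happy with ssthresh simply leave the hook empty, which removes the duplicated one-line callbacks the message refers to.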

[I/OAT]: Add a sysctl for tuning the I/OAT offloaded I/O threshold · 95937825

Committed by Chris Leech on May 23, 2006

Any socket recv of less than this amount will not be offloaded.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[I/OAT]: Rename cleanup_rbuf to tcp_cleanup_rbuf and make non-static · 0e4b4992

Committed by Chris Leech on May 23, 2006

Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[I/OAT]: Structure changes for TCP recv offload to I/OAT · 97fc2f08

Committed by Chris Leech on May 23, 2006

Adds an async_wait_queue and some additional fields to tcp_sock, and a
dma_cookie_t to sk_buff.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Apr, 2006 1 commit

Don't include linux/config.h from anywhere else in include/ · 62c4f0a2

Committed by David Woodhouse on Apr 26, 2006

Signed-off-by: David Woodhouse <dwmw2@infradead.org>

31 Mar, 2006 1 commit

[TCP]: Kill unused extern decl for tcp_v4_hash_connecting() · 0803dbed

Committed by David S. Miller on Mar 31, 2006

Noticed by Alan Menegotto.
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Mar, 2006 3 commits

[NET]: {get|set}sockopt compatibility layer · 3fdadf7d

Committed by Dmitry Mishin on Mar 20, 2006

This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: sysctl to allow TCP window > 32767 sans wscale · 15d99e02

Committed by Rick Jones on Mar 20, 2006

Back in the dark ages, we had to be conservative and only allow 15-bit
window fields if the window scale option was not negotiated.  Some
ancient stacks used a signed 16-bit quantity for the window field of
the TCP header and would get confused.

Those days are long gone, so we can use the full 16-bits by default
now.

There is a sysctl added so that we can still interact with such old
stacks.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
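
The difference between the 15-bit and 16-bit limits is just a choice of clamp. In this sketch, `legacy_peer` is a hypothetical flag standing in for the new sysctl (whose name the message does not give), and `unscaled_window` is an invented helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest window to advertise when window scaling was not negotiated. */
static uint32_t unscaled_window(uint32_t space, bool legacy_peer)
{
    /* 32767 fits a signed 16-bit field; 65535 uses all 16 bits. */
    uint32_t limit = legacy_peer ? 32767u : 65535u;
    return space < limit ? space : limit;
}
```

With the workaround off, a receiver with 60000 bytes of buffer space can advertise all of it; with it on, the advertisement is capped at 32767.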

[TCP]: MTU probing · 5d424d5a

Committed by John Heffner on Mar 20, 2006

Implementation of packetization layer path mtu discovery for TCP, based on
the internet-draft currently found at
<http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-05.txt>.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Jan, 2006 4 commits

[TCP]: less inline's · 40efc6fa

Committed by Stephen Hemminger on Jan 03, 2006

TCP inline usage cleanup:
 * get rid of inline in several places
 * replace __inline__ with inline where possible
 * move functions used in one file out of tcp.h
 * let compiler decide on used once cases

On x86_64: 
   text	   data	    bss	    dec	    hex	filename
3594701	 648348	 567400	4810449	 4966d1	vmlinux.orig
3593133	 648580	 567400	4809113	 496199	vmlinux

On sparc64:
   text	   data	    bss	    dec	    hex	filename
2538278	 406152	 530392	3474822	 350586	vmlinux.ORIG
2536382	 406384	 530392	3473158	 34ff06	vmlinux
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Don't use __constant_htonl for a non const arg · 8639a11e

Committed by Arnaldo Carvalho de Melo on Dec 27, 2005

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TWSK]: Introduce struct timewait_sock_ops · 6d6ee43e

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.

Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops · 8292a17a

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Nov, 2005 1 commit

[TCP]: More spelling fixes. · 31f34269

Committed by Stephen Hemminger on Nov 15, 2005

From Joe Perches
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Nov, 2005 5 commits

[TCP]: speed up SACK processing · 6a438bbe

Committed by Stephen Hemminger on Nov 10, 2005

Use "hints" to speed up the SACK processing. Various forms 
of this have been used by TCP developers (Web100, STCP, BIC)
to avoid the 2x linear search of outstanding segments.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: spelling fixes · caa20d9a

Committed by Stephen Hemminger on Nov 10, 2005

Minor spelling fixes for TCP code.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Appropriate Byte Count support · 9772efb9

Committed by Stephen Hemminger on Nov 10, 2005

This is an updated version of the RFC3465 ABC patch originally
for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting
bytes ack'd rather than packets when updating congestion control.

The original ABC described in the RFC applied to a Reno style
algorithm. For advanced congestion control there is little
change after leaving slow start.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
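
Counting bytes instead of packets can be illustrated with the congestion-avoidance rule from RFC3465: accumulate acknowledged bytes and grow the window by one segment once a full window's worth has been acknowledged. The sketch keeps cwnd in bytes as the RFC does; `struct abc` and `abc_cong_avoid` are hypothetical names, and this is not the patch's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-based state for the illustration. */
struct abc {
    uint32_t cwnd;        /* congestion window, in bytes */
    uint32_t bytes_acked; /* bytes acknowledged since the last increase */
    uint32_t mss;         /* sender maximum segment size, in bytes */
};

/* Congestion-avoidance step driven by the number of bytes newly ACKed. */
static void abc_cong_avoid(struct abc *s, uint32_t acked_bytes)
{
    s->bytes_acked += acked_bytes;
    if (s->bytes_acked >= s->cwnd) {
        s->bytes_acked -= s->cwnd; /* consume one window's worth */
        s->cwnd += s->mss;         /* grow by one full-sized segment */
    }
}
```

Because growth depends on bytes rather than on the number of ACKs, a receiver that acknowledges every other segment drives the window at the same rate as one that acknowledges every segment.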

[TCP]: add tcp_slow_start helper · 7faffa1c

Committed by Stephen Hemminger on Nov 10, 2005

Move all the code that does linear TCP slowstart to one
inline function to ease later patch to add ABC support.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: fix congestion window update when using TSO deferal · f4805ede

Committed by Stephen Hemminger on Nov 10, 2005

TCP performance with TSO over networks with delay is awful.
On a 100Mbit link with 150ms delay, we get 4Mbits/sec with TSO and
50Mbits/sec without TSO.

The problem is that with TSO, we intentionally do not keep the maximum
number of packets in flight to fill the window; we hold out until
we can send an MSS chunk. But we also don't update the congestion window
unless we have filled it, as per RFC2861.

This patch replaces the check for the congestion window being full
with something smarter that accounts for TSO.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel synced successfully about 1 year ago