Commits · 04b964dbad25cbd6edd8ecbeca2efb40c9860865 · openeuler / raspberrypi-kernel

26 Apr, 2007 40 commits

[SK_BUFF] ipconfig: Another conversion to skb_reset_network_header related to skb_put · 04b964db

Authored by Arnaldo Carvalho de Melo on Mar 10, 2007

boot_pkt->iph is the first member, that is at skb->data, so just use
skb_reset_network_header().
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SK_BUFF]: Some more skb_put cases converted to skb_reset_network_header · 2ca9e6f2

Authored by Arnaldo Carvalho de Melo on Mar 10, 2007

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SK_BUFF]: Some more simple skb_reset_network_header conversions · 31c7711b

Authored by Arnaldo Carvalho de Melo on Mar 10, 2007

This time of the type:

 skb->nh.iph = (struct iphdr *)skb->data;

That is completely equivalent to:

 skb->nh.raw = skb->data;

Wonder why people love casts... :-)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SK_BUFF]: Use skb_reset_network_header where the return of __pskb_pull was being used · 4209fb60

Authored by Arnaldo Carvalho de Melo on Mar 10, 2007

It returns skb->data, so we can just use skb_reset_network_header after it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SK_BUFF]: Use skb_reset_network_header where the skb_pull return was being used · 7e28ecc2

Authored by Arnaldo Carvalho de Melo on Mar 10, 2007

But only in the cases where it's a newly allocated skb, i.e. one where skb->tail
is equal to skb->data, or just after skb_reserve, where this requirement is
maintained.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SK_BUFF]: Use skb_reset_network_header in skb_push cases · e2d1bca7

Authored by Arnaldo Carvalho de Melo on Apr 10, 2007

skb_push updates and returns skb->data, so we can just call
skb_reset_network_header after the call to skb_push.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SK_BUFF]: Introduce skb_reset_network_header(skb) · c1d2bbe1

Authored by Arnaldo Carvalho de Melo on Apr 10, 2007

For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can
later turn skb->nh.raw into an offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
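
The motivation in the commit above (hiding 'skb->nh.raw = skb->data' behind a helper so the field can later become an offset) can be illustrated with a minimal userspace sketch. Everything named `toy_*` here is hypothetical and is not the kernel's `struct sk_buff` API; the 32-bit offset layout is only the future direction the message describes, assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct toy_skb {
	unsigned char *head;     /* start of the allocated buffer */
	unsigned char *data;     /* start of the current payload */
	uint32_t network_header; /* offset from head instead of a pointer */
};

/* Replaces the open-coded "nh.raw = data" assignment. */
static inline void toy_skb_reset_network_header(struct toy_skb *skb)
{
	assert(skb->data >= skb->head);
	skb->network_header = (uint32_t)(skb->data - skb->head);
}

/* Callers that need a pointer rebuild it from the stored offset. */
static inline unsigned char *toy_skb_network_header(const struct toy_skb *skb)
{
	return skb->head + skb->network_header;
}

/* skb_push-like helper: grow the payload towards head, return new data. */
static inline unsigned char *toy_skb_push(struct toy_skb *skb, size_t len)
{
	assert((size_t)(skb->data - skb->head) >= len);
	skb->data -= len;
	return skb->data;
}
```

Because callers only go through the two accessors, switching the field between a pointer and an offset would not require touching them again, which seems to be the point of the conversion series.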

[SK_BUFF]: Introduce skb_mac_header() · 98e399f8

Authored by Arnaldo Carvalho de Melo on Mar 19, 2007

For the places where we need a pointer to the mac header, it is still legal to
touch skb->mac.raw directly if just adding to, subtracting from or setting it
to another layer header.

This one also converts some more cases to skb_reset_mac_header() that my
regex missed as it had no spaces before nor after '=', ugh.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Use skb_set_mac_header in tcp_collapse · 31713c33

Authored by Arnaldo Carvalho de Melo on Mar 10, 2007

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Do the layer header setting in tcp_collapse relative to skb->data · c51957da

Authored by Arnaldo Carvalho de Melo on Mar 10, 2007

That is equal to skb->head before skb_reserve, to help in the layer header
changes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SK_BUFF] xfrm: Use skb_set_mac_header in the memmove cases · 39f69c6f

Authored by Arnaldo Carvalho de Melo on Mar 10, 2007

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SK_BUFF]: Introduce skb_reset_mac_header(skb) · 459a98ed

Authored by Arnaldo Carvalho de Melo on Mar 19, 2007

For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into an offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[UDP]: Use __skb_pull since we have checked it won't fail with pskb_may_pull · c7a3c5da

Authored by Arnaldo Carvalho de Melo on Mar 09, 2007

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[UDP]: deinline · 3fbe070a

Authored by Stephen Hemminger on Mar 08, 2007

A couple of functions are exported or used indirectly
so it is pointless to mark them as inline.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: whitespace cleanup · 2de979bd

Authored by Stephen Hemminger on Mar 08, 2007

Add whitespace around keywords.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: cleanup · 132adf54

Authored by Stephen Hemminger on Mar 08, 2007

Add whitespace around keywords.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[UDP]: ipv4 whitespace cleanup · 6516c655

Authored by Stephen Hemminger on Mar 08, 2007

Fix whitespace around keywords.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution · ae40eb1e

Authored by Eric Dumazet on Mar 18, 2007

Now that network timestamps use the ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access nanosecond resolution.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Abstract out all write queue operations. · fe067e8a

Authored by David S. Miller on Mar 07, 2007

This allows the write queue implementation to be changed,
for example, to one which allows fast interval searching.
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET] IPV4: Use hton{s,l}() where appropriate. · 4412ec49

Authored by YOSHIFUJI Hideaki on Mar 07, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[UDP]: Clean up UDP-Lite receive checksum · 759e5d00

Authored by Herbert Xu on Mar 25, 2007

This patch eliminates some duplicate code for the verification of
receive checksums between UDP-Lite and UDP.  It does this by
introducing __skb_checksum_complete_head which is identical to
__skb_checksum_complete apart from the fact that it takes
a length parameter rather than computing the first skb->len bytes.

As a result UDP-Lite will be able to use hardware checksum offload
for packets which do not use partial coverage checksums.  It also
means that UDP-Lite loopback no longer does unnecessary checksum
verification.

If any NICs start supporting UDP-Lite, this would also start working
automatically.

This patch removes the assumption that msg_flags has MSG_TRUNC clear
upon entry in recvmsg.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
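
The idea of a checksum helper that takes an explicit length, so that a partial-coverage protocol and a full-coverage one can share it, can be sketched in userspace as below. This is an illustration only: `toy_csum_head` and `toy_csum_complete` are hypothetical names, they work on a flat byte buffer rather than an skb, and they use a plain RFC 1071 style ones'-complement sum rather than the kernel's checksum routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement 16-bit checksum over the first len bytes of buf. */
static inline uint16_t toy_csum_head(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	/* Add the data as big-endian 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint64_t)((buf[i] << 8) | buf[i + 1]);

	/* An odd trailing byte is padded with a zero byte. */
	if (len & 1)
		sum += (uint64_t)(buf[len - 1] << 8);

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Full-coverage variant: the same helper with the whole length. */
static inline uint16_t toy_csum_complete(const uint8_t *buf, size_t total_len)
{
	return toy_csum_head(buf, total_len);
}
```

A partial-coverage caller would pass its coverage length to `toy_csum_head`, while a full-coverage caller passes the whole packet length, so both share one code path.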

[IPV4]: Optimize inet_getpeer() · 243bbcaa

Authored by Eric Dumazet on Mar 06, 2007

1) Some sysctl vars are declared __read_mostly

2) We can avoid updating stack[] when doing an AVL lookup only.

The lookup() macro is extended to receive a second parameter, that may be NULL
in case of a pure lookup (no need to save the AVL path). This removes
unnecessary instructions, because compiler knows if this _stack parameter is
NULL or not.

text size of net/ipv4/inetpeer.o is 2063 bytes instead of 2107 on x86_64
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] TCP Yeah: cleanup · 43e68392

Authored by Stephen Hemminger on Mar 06, 2007

Eliminate need for full 64/64 divide to compute queue.
Variable maxqueue was really a constant.
Fix indentation.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] tcp_cubic: faster cube root · c5f5877c

Authored by Stephen Hemminger on Mar 25, 2007

The Newton-Raphson method is quadratically convergent so
only a small fixed number of steps are necessary.
Therefore it is faster to unroll the loop. Since div64_64 is no longer
inline it won't cause code explosion.

Also fixes a bug that can occur if x^2 was bigger than 32 bits.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
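
The reasoning above (quadratic convergence means a small fixed number of Newton-Raphson steps suffices, so the loop can be unrolled) can be illustrated with a minimal userspace sketch. This is not the tcp_cubic code: `toy_cubic_root` and `toy_cbrt_step` are hypothetical names, the power-of-two starting guess is an assumption of this sketch, and each step is guarded so the estimate never increases, which makes the result exactly floor(cbrt(a)).

```c
#include <stdint.h>

/* One Newton-Raphson step for f(x) = x^3 - a; never lets x grow. */
static inline uint64_t toy_cbrt_step(uint64_t a, uint64_t x)
{
	/* x * x is computed in 64 bits, so it cannot overflow here. */
	uint64_t next = (2 * x + a / (x * x)) / 3;

	return next < x ? next : x;
}

/* Integer cube root with a fixed, unrolled number of steps. */
static inline uint32_t toy_cubic_root(uint64_t a)
{
	unsigned int bits = 0;
	uint64_t t, x;

	if (a == 0)
		return 0;

	/* Bit length of a: 2^(bits-1) <= a < 2^bits. */
	for (t = a; t != 0; t >>= 1)
		bits++;

	/* Power-of-two first guess, with cbrt(a) < x <= 2 * cbrt(a). */
	x = (uint64_t)1 << ((bits + 2) / 3);

	/* Unrolled: six steps suffice for any 64-bit input from this guess. */
	x = toy_cbrt_step(a, x);
	x = toy_cbrt_step(a, x);
	x = toy_cbrt_step(a, x);
	x = toy_cbrt_step(a, x);
	x = toy_cbrt_step(a, x);
	x = toy_cbrt_step(a, x);

	return (uint32_t)x;
}
```

For example, `toy_cubic_root(1000000)` settles on 100 after two steps and the remaining steps leave it unchanged.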

[NET]: convert network timestamps to ktime_t · b7aa0bf7

Authored by Eric Dumazet on Apr 19, 2007

We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.

This has some drawbacks:
- Fixed microsecond resolution.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

Once this ktime infrastructure is adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

Note: this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
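
The size and resolution argument above can be sketched in userspace: a single 64-bit nanosecond count is stored, and the coarser 'struct timeval' or the finer 'struct timespec' view is derived only when needed. The `toy_ktime_t` type and the `toy_*` helpers are hypothetical and are not the kernel's ktime API; the sketch assumes non-negative timestamps.

```c
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

#define TOY_NSEC_PER_SEC  1000000000LL
#define TOY_NSEC_PER_USEC 1000LL

/* Hypothetical 8-byte timestamp: nanoseconds since the epoch. */
typedef int64_t toy_ktime_t;

/* Full nanosecond resolution view. */
static inline struct timespec toy_ktime_to_timespec(toy_ktime_t kt)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(kt / TOY_NSEC_PER_SEC);
	ts.tv_nsec = (long)(kt % TOY_NSEC_PER_SEC);
	return ts;
}

/* Legacy microsecond view; sub-microsecond digits are truncated. */
static inline struct timeval toy_ktime_to_timeval(toy_ktime_t kt)
{
	struct timeval tv;

	tv.tv_sec = (time_t)(kt / TOY_NSEC_PER_SEC);
	tv.tv_usec = (long)((kt % TOY_NSEC_PER_SEC) / TOY_NSEC_PER_USEC);
	return tv;
}
```

Storing the 8-byte count and converting on demand is what would let both a microsecond and a nanosecond interface be served from the same field.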

[NET]: div64_64 consolidate (rev3) · 3927f2e8

Authored by Stephen Hemminger on Mar 25, 2007

Here is the current version of the 64 bit divide common code.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Convert xtime.tv_sec to get_seconds() · 9d729f72

Authored by James Morris on Mar 04, 2007

Where appropriate, convert references to xtime.tv_sec to the
get_seconds() helper function.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: FRTO undo response falls back to ratehalving one if ECEd · e317f6f6

Authored by Ilpo Järvinen on Mar 02, 2007

Undoing ssthresh is disabled in fastretrans_alert whenever
FLAG_ECE is set by clearing prior_ssthresh. The clearing does
not protect FRTO because FRTO operates before fastretrans_alert.
Moving the clearing of prior_ssthresh earlier seems to be a
suboptimal solution to the FRTO case because then FLAG_ECE will
cause a second ssthresh reduction in try_to_open (the first
occurred when FRTO was entered). So instead, FRTO falls back
immediately to the rate halving response, which switches TCP to
CA_CWR state preventing the latter reduction of ssthresh.

If the first ECE arrived before the ACK after which FRTO is able
to decide RTO as spurious, prior_ssthresh is already cleared.
Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be
set also in the following ACKs resulting in rate halving response
that sees TCP is already in CA_CWR, which again prevents an extra
ssthresh reduction on that round-trip.

If the first ECE arrived before RTO, ssthresh has already been
adapted and prior_ssthresh remains cleared on entry because TCP
is in CA_CWR (the same applies also to a case where FRTO is
entered more than once and ECE comes in the middle).

High_seq must not be touched after tcp_enter_cwr because CWR
round-trip calculation depends on it.

I believe that after this patch, FRTO should be ECN-safe and
even able to take advantage of synergy benefits.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Complete icsk-to-local-variable change (in tcp_enter_cwr) · e01f9d77

Authored by Ilpo Järvinen on Mar 02, 2007

A local variable for icsk was created but this change was
missing. Spotted by Jarek Poplawski.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Add two new spurious RTO responses to FRTO · 3cfe3baa

Authored by Ilpo Järvinen on Feb 27, 2007

New sysctl tcp_frto_response is added to select amongst these
responses:
	- Rate halving based; reuses CA_CWR state (default)
	- Very conservative; used to be the only one available (=1)
	- Undo cwr; undoes ssthresh and cwnd reductions (=2)

The response with rate halving requires a new parameter to
tcp_enter_cwr because FRTO has already reduced ssthresh and
doing a second reduction there has to be prevented. In addition,
to keep things nice on 80 cols screen, a local variable was
added.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Correct reordering detection change (no FRTO case) · c5e7af0d

Authored by Ilpo Järvinen on Feb 23, 2007

The reordering detection must work also when FRTO has not been
used at all which was the original intention of mine, just the
expression of the idea was flawed.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Keep copied_seq, rcv_wup and rcv_next together. · 54287cc1

Authored by Eric Dumazet on Feb 22, 2007

I noticed in an oprofile study a cache miss in tcp_rcv_established() when reading
copied_seq.

ffffffff80400a80 <tcp_rcv_established>: /* tcp_rcv_established total: 4034293  2.0400 */

 55493  0.0281 :ffffffff80400bc9:   mov    0x4c8(%r12),%eax   copied_seq
543103  0.2746 :ffffffff80400bd1:   cmp    0x3e0(%r12),%eax   rcv_nxt

if (tp->copied_seq == tp->rcv_nxt &&
        len - tcp_header_len <= tp->ucopy.len) {

In this function, the cache line 0x4c0 -> 0x500 is used only for reading
the 'copied_seq' field.

rcv_wup and copied_seq should be next to the rcv_nxt field, to lower the number of
active cache lines in hot paths. (tcp_rcv_established(), tcp_poll(), ...)

As you suggested, I changed tcp_create_openreq_child() so that these fields
are changed together, to avoid adding a new store buffer stall.

Patch is 64bit friendly (no new hole because of alignment constraints)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
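
The cache-line argument above can be checked with simple offset arithmetic. The sketch below is an illustration only: the 64-byte line size matches the 0x4c0 -> 0x500 range quoted in the message, but `struct toy_tcp_sock`, its placeholder padding and the `TOY_*` macros are hypothetical and do not reflect the real `struct tcp_sock` layout.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE 64 /* assumed line size; real hardware varies */

/* Index of the cache line that holds a given byte offset. */
#define TOY_LINE_OF(offset) ((size_t)(offset) / TOY_CACHE_LINE)

/* Hypothetical layout with the fields that are read together kept adjacent. */
struct toy_tcp_sock {
	uint8_t  other_state[0x3e0]; /* placeholder for preceding members */
	uint32_t rcv_nxt;
	uint32_t copied_seq;
	uint32_t rcv_wup;
};

#define TOY_SAME_LINE(type, a, b) \
	(TOY_LINE_OF(offsetof(type, a)) == TOY_LINE_OF(offsetof(type, b)))

/* Compile-time check that the hot fields share one cache line. */
_Static_assert(TOY_SAME_LINE(struct toy_tcp_sock, rcv_nxt, copied_seq),
	       "rcv_nxt and copied_seq should share a cache line");
_Static_assert(TOY_SAME_LINE(struct toy_tcp_sock, rcv_nxt, rcv_wup),
	       "rcv_nxt and rcv_wup should share a cache line");
```

With the offsets from the profile, 0x4c8 and 0x3e0 fall on different 64-byte lines, which is the extra miss the commit describes; in the toy layout the three fields land on the same line.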

[TCP]: struct *sock argument renamed: sp -> sk · cf4c6bf8

Authored by Ilpo Järvinen on Feb 22, 2007

In general, TCP code uses "sk" for struct sock pointer.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh. · 886236c1

Authored by John Heffner on Mar 25, 2007

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] YeAH-TCP: algorithm implementation · 5ef81475

Authored by Angelo P. Castellani on Feb 22, 2007

YeAH-TCP is a sender-side high-speed enabled TCP congestion control
algorithm, which uses a mixed loss/delay approach to compute the
congestion window. Its design goals target high efficiency, internal,
RTT and Reno fairness, resilience to link loss while keeping network
elements load as low as possible.

For further details look here:
http://wil.cs.caltech.edu/pfldnet2007/paper/YeAH_TCP.pdf
Signed-off-by: Angelo P. Castellani <angelo.castellani@gmail.con>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: SACK enhanced FRTO · 4dc2665e

Authored by Ilpo Järvinen on Feb 21, 2007

Implements the SACK-enhanced FRTO given in RFC4138 using the
variant given in Appendix B.

RFC4138, Appendix B:
  "This means that in order to declare timeout spurious, the TCP
   sender must receive an acknowledgment for non-retransmitted
   segment between SND.UNA and RecoveryPoint in algorithm step 3.
   RecoveryPoint is defined in conservative SACK-recovery
   algorithm [RFC3517]"

The basic version of the FRTO algorithm can still be used also
when SACK is enabled. To enable the SACK-enhanced version, the tcp_frto
sysctl is set to 2.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Prevent reordering adjustments during FRTO · 288035f9

Authored by Ilpo Järvinen on Feb 21, 2007

To be honest, I'm not too sure how the reord stuff works in the
first place but this seems necessary.

When FRTO has been active, the one and only retransmission could
be unnecessary but the state and sending order might not be what
the sacktag code expects it to be (to work correctly).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] FRTO: Fake cwnd for ssthresh callback · 66e93e45

Authored by Ilpo Järvinen on Feb 21, 2007

TCP without FRTO would be in Loss state with small cwnd. FRTO,
however, leaves cwnd (typically) to a larger value which causes
ssthresh to become too large in case RTO is triggered again
compared to what conventional recovery would do. Because
consecutive RTOs result in only a single ssthresh reduction,
RTO+cumulative ACK+RTO pattern is required to trigger this
event.

A large comment is included for congestion control module writers
trying to figure out what CA_EVENT_FRTO handler should do because
there exists a remote possibility of incompatibility between
FRTO and module defined ssthresh functions.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] FRTO: Reverse RETRANS bit clearing logic · d1a54c6a

Authored by Ilpo Järvinen on Feb 21, 2007

Previously RETRANS bits were cleared on the entry to FRTO. We
postpone that into tcp_enter_frto_loss, which is really the
place where the clearing should be done anyway. This allows
simplification of the logic from a clearing loop to the head skb
clearing only.

Besides, the other changes made in the previous patches to
tcp_use_frto made it impossible for the non-SACKed FRTO to be
entered if other than the head has been rexmitted.

With SACK-enhanced FRTO (and Appendix B), however, there can be
a number of retransmissions in flight when RTO expires (same thing
could happen before this patchset also with non-SACK FRTO). To
not introduce any jumpiness into the packet counting during FRTO,
instead of clearing RETRANS bits from skbs during entry, do it
later on.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP] FRTO: Entry is allowed only during (New)Reno like recovery · 46d0de4e

Authored by Ilpo Järvinen on Feb 21, 2007

This interpretation comes from RFC4138:
    "If the sender implements some loss recovery algorithm other
     than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD
     NOT be entered when earlier fast recovery is underway."

I think the RFC means to say (especially in the light of
Appendix B) that ...recovery is underway (not just fast recovery)
or was underway when it was interrupted by an earlier (F-)RTO
that hasn't yet been resolved (snd_una has not advanced enough).
Thus, my interpretation is that whenever TCP has ever
retransmitted other than head, basic version cannot be used
because then the order assumptions which are used as FRTO basis
do not hold.

NewReno has only the head segment retransmitted at a time.
Therefore, walk up to the segment that has not been SACKed, if
that segment is not retransmitted nor anything before it, we know
for sure, that nothing after the non-SACKed segment should be
either. This assumption is valid because TCPCB_EVER_RETRANS does
not leave holes but each non-SACKed segment is rexmitted
in-order.

Check for retrans_out > 1 avoids more expensive walk through the
skb list, as we can know the result beforehand: F-RTO will not be
allowed.

A SACKed skb can turn into a non-SACKed one only in the extremely rare
case of SACK reneging; in this case we might fail to detect
retransmissions of segments other than the head. To
get rid of that feature, the whole rexmit queue would have to be
walked (always) or FRTO should be prevented when SACK reneging
happens. Of course RTO should still trigger after reneging which
makes this issue even less likely to show up. And as long as the
response is as conservative as it's now, nothing bad happens even
then.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>