提交 · e0683e707c12a431919e1be814e15a4360523533 · openanolis / cloud-kernel

01 11月, 2012 2 次提交

tcp: make tcp_clear_md5_list static · e0683e70

由 stephen hemminger 提交于 10月 26, 2012

Trivial. Only used in one file.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0683e70

net/ipv4/ipconfig: add device address to a KERN_INFO message · 9ecd1c3d

由 Claudio Fontana 提交于 10月 25, 2012

adds a "hwaddr" to the "IP-Config: Complete" KERN_INFO message
with the dev_addr of the device selected for auto configuration.
Signed-off-by: NClaudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ecd1c3d

29 10月, 2012 2 次提交

rtnl/ipv4: add support of RTM_GETNETCONF · 9e551110

由 Nicolas Dichtel 提交于 10月 25, 2012

This message allows to get the devconf for an interface.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e551110

rtnl/ipv4: use netconf msg to advertise forwarding status · edc9e748

由 Nicolas Dichtel 提交于 10月 25, 2012

Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

edc9e748

24 10月, 2012 2 次提交

sock-diag: Report shutdown for inet and unix sockets (v2) · e4e541a8

由 Pavel Emelyanov 提交于 10月 23, 2012

Make it simple -- just put new nlattr with just sk->sk_shutdown bits.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4e541a8

ipv4: tcp: clean up tcp_v4_early_demux() · 45f00f99

由 Eric Dumazet 提交于 10月 22, 2012

Use same header helpers than tcp_v6_early_demux() because they
are a bit faster, and as they make IPv4/IPv6 versions look
the same.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45f00f99

23 10月, 2012 3 次提交

ipv4: 16 slots in initial fib_info hash table · d94ce9b2

由 Eric Dumazet 提交于 10月 21, 2012

A small host typically needs ~10 fib_info structures, so create initial
hash table with 16 slots instead of only one. This removes potential
false sharing and reallocs/rehashes (1->2->4->8->16)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d94ce9b2

tcp: speedup SIOCINQ ioctl · 0e71c55c

由 Eric Dumazet 提交于 10月 21, 2012

SIOCINQ can use the lock_sock_fast() version to avoid double acquisition
of socket lock.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e71c55c

tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation · 354e4aa3

由 Eric Dumazet 提交于 10月 21, 2012

RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation]

  All TCP stacks MAY implement the following mitigation.  TCP stacks
  that implement this mitigation MUST add an additional input check to
  any incoming segment.  The ACK value is considered acceptable only if
  it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
  SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
  above condition MUST be discarded and an ACK sent back.

Move tcp_send_challenge_ack() before tcp_ack() to avoid a forward
declaration.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

354e4aa3

19 10月, 2012 2 次提交

tcp: fix FIONREAD/SIOCINQ · a3374c42

由 Eric Dumazet 提交于 10月 18, 2012

tcp_ioctl() tries to take into account if tcp socket received a FIN
to report correct number bytes in receive queue.

But its flaky because if the application ate the last skb,
we return 1 instead of 0.

Correct way to detect that FIN was received is to test SOCK_DONE.
Reported-by: NElliot Hughes <enh@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3374c42

ipv4: Fix flushing of cached routing informations · 13d82bf5

由 Steffen Klassert 提交于 10月 17, 2012

Currently we can not flush cached pmtu/redirect informations via
the ipv4_sysctl_rtcache_flush sysctl. We need to check the rt_genid
of the old route and reset the nh exeption if the old route is
expired when we bind a new route to a nh exeption.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13d82bf5

13 10月, 2012 2 次提交

vti: fix sparse bit endian warnings · 8437e761

由 stephen hemminger 提交于 10月 11, 2012

Use be32_to_cpu instead of htonl to keep sparse happy.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8437e761

tcp: resets are misrouted · 4c675258

由 Alexey Kuznetsov 提交于 10月 12, 2012

After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
sock case").. tcp resets are always lost, when routing is asymmetric.
Yes, backing out that patch will result in misrouting of resets for
dead connections which used interface binding when were alive, but we
actually cannot do anything here.  What's died that's died and correct
handling normal unbound connections is obviously a priority.

Comment to comment:
> This has few benefits:
>   1. tcp_v6_send_reset already did that.

It was done to route resets for IPv6 link local addresses. It was a
mistake to do so for global addresses. The patch fixes this as well.

Actually, the problem appears to be even more serious than guaranteed
loss of resets.  As reported by Sergey Soloviev <sol@eqv.ru>, those
misrouted resets create a lot of arp traffic and huge amount of
unresolved arp entires putting down to knees NAT firewalls which use
asymmetric routing.
Signed-off-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>

4c675258

12 10月, 2012 1 次提交

tcp: sysctl interface leaks 16 bytes of kernel memory · 0e24c4fc

由 Alan Cox 提交于 10月 11, 2012

If the rc_dereference of tcp_fastopen_ctx ever fails then we copy 16 bytes
of kernel stack into the proc result.
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e24c4fc

11 10月, 2012 1 次提交

ipv4: fix route mark sparse warning · 68aaed54

由 stephen hemminger 提交于 10月 10, 2012

Sparse complains about RTA_MARK which is should be host order according
to include file and usage in iproute.

net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types)
net/ipv4/route.c:2223:46: expected restricted __be32 [usertype] value
net/ipv4/route.c:2223:46: got unsigned int [unsigned] [usertype] flowic_mark
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68aaed54

09 10月, 2012 8 次提交

ipv4: Add FLOWI_FLAG_KNOWN_NH · c92b9655

由 Julian Anastasov 提交于 10月 08, 2012

Add flag to request that output route should be
returned with known rt_gateway, in case we want to use
it as nexthop for neighbour resolving.

	The returned route can be cached as follows:

- in NH exception: because the cached routes are not shared
	with other destinations
- in FIB NH: when using gateway because all destinations for
	NH share same gateway

	As last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c92b9655

ipv4: introduce rt_uses_gateway · 155e8336

由 Julian Anastasov 提交于 10月 08, 2012

Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. By this way we force the neighbour
resolving to work with the routed destination but we
can use different address in the packet, feature needed
for IPVS-DR where original packet for virtual IP is routed
via route to real IP.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

155e8336

ipv4: make sure nh_pcpu_rth_output is always allocated · f8a17175

由 Julian Anastasov 提交于 10月 08, 2012

Avoid checking nh_pcpu_rth_output in fast path,
abort fib_info creation on alloc_percpu failure.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8a17175

ipv4: fix forwarding for strict source routes · e0adef0f

由 Julian Anastasov 提交于 10月 08, 2012

After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d) rt_gateway can be 0 but ip_forward() compares
it directly with nexthop. What we want here is to check if traffic
is to directly connected nexthop and to fail if using gateway.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0adef0f

ipv4: fix sending of redirects · e81da0e1

由 Julian Anastasov 提交于 10月 08, 2012

After "Cache input routes in fib_info nexthops" (commit
d2d68ba9) and "Elide fib_validate_source() completely when possible"
(commit 7a9bc9b8) we can not send ICMP redirects. It seems we
should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
the same fib_info can be used for traffic that is not redirected,
eg. from other input devices or from sources that are not in same subnet.

	As result, we have to disable the caching of RTCF_DOREDIRECT
flag and to force source validation for the case when forwarding
traffic to the input device. If traffic comes from directly connected
source we allow redirection as it was done before both changes.

	Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
is disabled, this can avoid source address validation and to
help caching the routes.

	After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d) we should make sure our ICMP_REDIR_HOST messages
contain daddr instead of 0.0.0.0 when target is directly connected.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e81da0e1

ipv4: Don't report stale pmtu values to userspace · ee9a8f7a

由 Steffen Klassert 提交于 10月 08, 2012

We report cached pmtu values even if they are already expired.
Change this to not report these values after they are expired
and fix a race in the expire time calculation, as suggested by
Eric Dumazet.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee9a8f7a

ipv4: Don't create nh exeption when the device mtu is smaller than the reported pmtu · 7f92d334

由 Steffen Klassert 提交于 10月 07, 2012

When a local tool like tracepath tries to send packets bigger than
the device mtu, we create a nh exeption and set the pmtu to device
mtu. The device mtu does not expire, so check if the device mtu is
smaller than the reported pmtu and don't crerate a nh exeption in
that case.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f92d334

ipv4: Always invalidate or update the route on pmtu events · d851c12b

由 Steffen Klassert 提交于 10月 07, 2012

Some protocols, like IPsec still cache routes. So we need to invalidate
the old route on pmtu events to avoid the reuse of stale routes.
We also need to update the mtu and expire time of the route if we already
use a nh exception route, otherwise we ignore newly learned pmtu values
after the first expiration.

With this patch we always invalidate or update the route on pmtu events.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d851c12b

06 10月, 2012 1 次提交

sections: fix section conflicts in net · 04a6f82c

由 Andi Kleen 提交于 10月 04, 2012

Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

04a6f82c

05 10月, 2012 1 次提交

ipv4: add a fib_type to fib_info · f4ef85bb

由 Eric Dumazet 提交于 10月 04, 2012

commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.

This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.

David suggested to add fib_type to fib_info so that we dont
inadvertently share same fib_info for different purposes.

With help from Julian Anastasov who provided very helpful
hints, reproduced here :

<quote>
        Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:

broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src
192.168.0.1
192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1

        The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.

        And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.

</quote>

Many thanks to Chris Clayton for his patience and help.
Reported-by: NChris Clayton <chris2553@googlemail.com>
Bisected-by: NChris Clayton <chris2553@googlemail.com>
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Tested-by: NChris Clayton <chris2553@googlemail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f4ef85bb

02 10月, 2012 4 次提交

igmp: export symbol ip_mc_leave_group · 193ba924

由 stephen hemminger 提交于 10月 01, 2012

Needed for VXLAN.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

193ba924

gre: fix sparse warning · 9fbef059

由 stephen hemminger 提交于 10月 01, 2012

Use be16 consistently when looking at flags.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fbef059

ipv4: gre: add GRO capability · 60769a5d

由 Eric Dumazet 提交于 9月 27, 2012

Add GRO capability to IPv4 GRE tunnels, using the gro_cells
infrastructure.

Tested using IPv4 and IPv6 TCP traffic inside this tunnel, and
checking GRO is building large packets.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60769a5d

tcp: gro: add checksuming helpers · 861b6501

由 Eric Dumazet 提交于 9月 27, 2012

skb with CHECKSUM_NONE cant currently be handled by GRO, and
we notice this deep in GRO stack in tcp[46]_gro_receive()

But there are cases where GRO can be a benefit, even with a lack
of checksums.

This preliminary work is needed to add GRO support
to tunnels.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

861b6501

28 9月, 2012 6 次提交

inetpeer: fix token initialization · bc9259a8

由 Nicolas Dichtel 提交于 9月 27, 2012

When jiffies wraps around (for example, 5 minutes after the boot, see
INITIAL_JIFFIES) and peer has just been created, now - peer->rate_last can be
< XRLIM_BURST_FACTOR * timeout, so token is not set to the maximum value, thus
some icmp packets can be unexpectedly dropped.

Fix this case by initializing last_rate to 60 seconds in the past.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc9259a8

tcp: Remove unused parameter from tcp_v4_save_options · 5dff747b

由 Christoph Paasch 提交于 9月 26, 2012

struct sock *sk is not used inside tcp_v4_save_options. Thus it can be
removed.
Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5dff747b

tunnel: drop packet if ECN present with not-ECT · eccc1bb8

由 stephen hemminger 提交于 9月 25, 2012

Linux tunnels were written before RFC6040 and therefore never
implemented the corner case of ECN getting set in the outer header
and the inner header not being ready for it.

Section 4.2.  Default Tunnel Egress Behaviour.
 o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
      propagate any other ECN codepoint onwards.  This is because the
      inner Not-ECT marking is set by transports that rely on dropped
      packets as an indication of congestion and would not understand or
      respond to any other ECN codepoint [RFC4774].  Specifically:

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         CE, the decapsulator MUST drop the packet.

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
         outgoing packet with the ECN field cleared to Not-ECT.

This patch moves the ECN decap logic out of the individual tunnels
into a common place.

It also adds logging to allow detecting broken systems that
set ECN bits incorrectly when tunneling (or an intermediate
router might be changing the header).

Overloads rx_frame_error to keep track of ECN related error.

Thanks to Chris Wright who caught this while reviewing the new VXLAN
tunnel.

This code was tested by injecting faulty logic in other end GRE
to send incorrectly encapsulated packets.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eccc1bb8

xfrm: remove extranous rcu_read_lock · b0558ef2

由 stephen hemminger 提交于 9月 24, 2012

The handlers for xfrm_tunnel are always invoked with rcu read lock
already.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b0558ef2

gre: remove unnecessary rcu_read_lock/unlock · 0c5794a6

由 stephen hemminger 提交于 9月 24, 2012

The gre function pointers for receive and error handling are
always called (from gre.c) with rcu_read_lock already held.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c5794a6

gre: fix handling of key 0 · d2083287

由 stephen hemminger 提交于 9月 24, 2012

GRE driver incorrectly uses zero as a flag value. Zero is a perfectly
valid value for key, and the tunnel should match packets with no key only
with tunnels created without key, and vice versa.

This is a slightly visible  change since previously it might be possible to
construct a working tunnel that sent key 0 and received only because
of the key wildcard of zero.  I.e the sender sent key of zero, but tunnel
was defined without key.

Note: using gre key 0 requires iproute2 utilities v3.2 or later.
The original utility code was broken as well.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2083287

26 9月, 2012 1 次提交

ipconfig: fix trivial build error · 842b08bb

由 Andy Shevchenko 提交于 9月 24, 2012

The commit 5e953778 ("ipconfig: add nameserver
IPs to kernel-parameter ip=") introduces ic_nameservers_predef() that defined
only for BOOTP. However it is used by ip_auto_config_setup() as well. This
patch moves it outside of #ifdef BOOTP.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Fritz <chf.fritz@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

842b08bb

25 9月, 2012 2 次提交

net: raw: revert unrelated change · 8489c1d9

由 Eric Dumazet 提交于 9月 25, 2012

Commit 5640f768 ("net: use a per task frag allocator")
accidentally contained an unrelated change to net/ipv4/raw.c,
later committed (without the pr_err() debugging bits) in
net tree as commit ab43ed8b (ipv4: raw: fix icmp_filter())

This patch reverts this glitch, noticed by Stephen Rothwell.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8489c1d9

net: use a per task frag allocator · 5640f768

由 Eric Dumazet 提交于 9月 23, 2012

We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NVijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5640f768

23 9月, 2012 2 次提交

tcp: TCP Fast Open Server - record retransmits after 3WHS · 30099b2e

由 Neal Cardwell 提交于 9月 22, 2012

When recording the number of SYNACK retransmits for servers using TCP
Fast Open, fix the code to ensure that we copy over the retransmit
count from the request_sock after we receive the ACK that completes
the 3-way handshake.

The story here is similar to that of SYNACK RTT
measurements. Previously we were always doing this in
tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections
tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we
receive the SYN. So for TFO we must copy the final SYNACK retransmit
count in tcp_rcv_state_process().

Note that copying over the SYNACK retransmit count will give us the
correct count since, as is mentioned in a comment in
tcp_retransmit_timer(), before we receive an ACK for our SYN-ACK a TFO
passive connection does not retransmit anything else (e.g., data or
FIN segments).
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30099b2e

tcp: TCP Fast Open Server - call tcp_validate_incoming() for all packets · e69bebde

由 Neal Cardwell 提交于 9月 22, 2012

A TCP Fast Open (TFO) passive connection must call both
tcp_check_req() and tcp_validate_incoming() for all incoming ACKs that
are attempting to complete the 3WHS.

This is needed to parallel all the action that happens for a non-TFO
connection, where for an ACK that is attempting to complete the 3WHS
we call both tcp_check_req() and tcp_validate_incoming().

For example, upon receiving the ACK that completes the 3WHS, we need
to call tcp_fast_parse_options() and update ts_recent based on the
incoming timestamp value in the ACK.

One symptom of the problem with the previous code was that for passive
TFO connections using TCP timestamps, the outgoing TS ecr values
ignored the incoming TS val value on the ACK that completed the 3WHS.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e69bebde

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功