Commits · 61648d91fc278fd1d500da8061d17e6920cd3500 · openeuler / raspberrypi-kernel

30 Jul, 2012 2 commits

Committed by Lin Ming on Jul 29, 2012

The first parameter struct trie *t is not used anymore.
Remove it.
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

61648d91

ipv4: fix debug info in tnode_new · 4ea4bf7e

Committed by Lin Ming on Jul 29, 2012

It should print the size of the struct rt_trie_node * pointers allocated
instead of the size of struct rt_trie_node.
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ea4bf7e
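
Illustration for 4ea4bf7e: a minimal userspace sketch of why the two sizes differ. It assumes the node being allocated holds an array of 2^bits child pointers; `struct node`, `child_array_bytes()` and `misreported_bytes()` are hypothetical names, not kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct rt_trie_node. */
struct node {
	unsigned long parent;
	unsigned int key;
};

/* Bytes taken by an array of 2^bits child pointers. */
size_t child_array_bytes(unsigned int bits)
{
	return sizeof(struct node *) << bits;
}

/* Bytes a message would report if it used the struct size instead. */
size_t misreported_bytes(unsigned int bits)
{
	return sizeof(struct node) << bits;
}
```

For example, `child_array_bytes(3)` is eight pointers' worth of storage, while `misreported_bytes(3)` is eight whole structs, which is larger whenever the struct is bigger than a pointer.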

28 Jul, 2012 3 commits

tcp: perform DMA to userspace only if there is a task waiting for it · 59ea33a6

Committed by Jiri Kosina on Jul 27, 2012

Back in 2006, commit 1a2449a8 ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.

The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.

This is not a problem under normal circumstances, but there is a corner
case where this doesn't work -- and that's when MSG_TRUNC flag to
recvmsg() is used.

If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.

This problem can be observed trivially in practice -- for example, 'tbench' is a
good reproducer, as it makes heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as the
data has already been early-copied in tcp_rcv_established() using the dma engine.

This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

59ea33a6

ipv4: fix TCP early demux · 505fbcf0

Committed by Eric Dumazet on Jul 27, 2012

commit 92101b3b (ipv4: Prepare for change of rt->rt_iif encoding.)
invalidated TCP early demux, because rx_dst_ifindex is not properly
initialized and checked.

Also remove the use of inet_iif(skb) in favor of skb->skb_iif
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

505fbcf0

tcp: Add TCP_USER_TIMEOUT negative value check · 42493570

Committed by Hangbin Liu on Jul 26, 2012

TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int. But
patch "tcp: Add TCP_USER_TIMEOUT socket option" (dca43c75) didn't check for negative
values. If a user assigns -1 to it, the option is set successfully and the socket waits
for 4294967295 milliseconds. This patch adds a negative value check to avoid
this issue.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

42493570
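
Illustration for 42493570: a minimal userspace sketch of the check described above, assuming the option value arrives as a signed int and is stored in an unsigned field. `struct sock_opts`, `set_user_timeout()` and `unchecked_store()` are hypothetical names, not the kernel's code.

```c
#include <errno.h>

/* Hypothetical stand-in for the per-socket field holding the timeout. */
struct sock_opts {
	unsigned int user_timeout_ms;
};

/* Without a check, a negative int silently wraps to a huge unsigned value. */
unsigned int unchecked_store(int val)
{
	return (unsigned int)val;
}

/* With the check, negative values are rejected before they are stored. */
int set_user_timeout(struct sock_opts *opts, int val)
{
	if (val < 0)
		return -EINVAL;
	opts->user_timeout_ms = (unsigned int)val;
	return 0;
}
```

On a platform with a 32-bit unsigned int, `unchecked_store(-1)` yields 4294967295, the value quoted in the commit message, while `set_user_timeout(&opts, -1)` returns `-EINVAL` and leaves the stored timeout untouched.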

27 Jul, 2012 2 commits

ipv6: Early TCP socket demux · c7109986

Committed by Eric Dumazet on Jul 26, 2012

These are the missing IPv6 bits for the infrastructure added in commit
41063e9d (ipv4: Early TCP socket demux.)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7109986

ipv4: Fix input route performance regression. · c6cffba4

Committed by David S. Miller on Jul 26, 2012

With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.

Reinstate the noref path when we hit a cached route in the FIB
nexthops.

With help from Eric Dumazet.
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6cffba4

26 Jul, 2012 1 commit

ipv4: rt_cache_valid must check expired routes · 4331debc

Committed by Eric Dumazet on Jul 25, 2012

commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.)
introduced the rt_cache_valid() helper. It unfortunately doesn't check
whether the route is expired before caching it.

I noticed sk_setup_caps() was constantly called on a tcp workload.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4331debc

25 Jul, 2012 1 commit

tcp: early_demux fixes · 9cb429d6

Committed by Eric Dumazet on Jul 24, 2012

1) Remove an unneeded pskb_may_pull() in tcp_v4_early_demux()
   and fix a potential bug if skb->head was reallocated
   (iph & th pointers were not reloaded)

TCP stack will pull/check headers anyway.

2) must reload iph in ip_rcv_finish() after early_demux()
   call since skb->head might have changed.

3) skb->dev->ifindex can now be replaced by skb->skb_iif
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9cb429d6

24 Jul, 2012 6 commits

ipv4: Change rt->rt_iif encoding. · 13378cad

Committed by David S. Miller on Jul 23, 2012

On input packet processing, rt->rt_iif will be zero if we should
use skb->dev->ifindex.

Since we access rt->rt_iif consistently via inet_iif(), that is
the only spot whose interpretation has to be adjusted.
Signed-off-by: David S. Miller <davem@davemloft.net>

13378cad
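
Illustration for 13378cad: a minimal sketch of the encoding described above, where a zero rt_iif means "fall back to the receiving device's ifindex". The `mock_*` structures and `mock_inet_iif()` are simplified, hypothetical stand-ins and do not mirror the real kernel layouts.

```c
/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_dev {
	int ifindex;
};

struct mock_route {
	int rt_iif;	/* 0 means "use the device index of the packet" */
};

struct mock_skb {
	const struct mock_dev *dev;
	const struct mock_route *rt;
};

/* Resolve the input interface under the zero-means-device encoding. */
int mock_inet_iif(const struct mock_skb *skb)
{
	int iif = skb->rt->rt_iif;

	if (iif)
		return iif;
	return skb->dev->ifindex;
}
```

Keeping this decision inside one accessor is what lets the encoding change without touching every caller.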

ipv4: Prepare for change of rt->rt_iif encoding. · 92101b3b

Committed by David S. Miller on Jul 23, 2012

Use inet_iif() consistently, and for TCP record the input interface of
cached RX dst in inet sock.

rt->rt_iif is going to be encoded differently, so that we can
legitimately cache input routes in the FIB info more aggressively.

When the input interface is "use SKB device index" the rt->rt_iif will
be set to zero.

This forces us to move the TCP RX dst cache installation into the ipv4
specific code, and as well it should since doing the route caching for
ipv6 is pointless at the moment since it is not inspected in the ipv6
input paths yet.

Also, remove the unlikely on dst->obsolete, all ipv4 dsts have
obsolete set to a non-zero value to force invocation of the check
callback.
Signed-off-by: David S. Miller <davem@davemloft.net>

92101b3b

ipv4: Remove all RTCF_DIRECTSRC handliing. · fe3edf45

Committed by David S. Miller on Jul 23, 2012

The last and final kernel user, ICMP address replies,
has been removed.
Signed-off-by: David S. Miller <davem@davemloft.net>

fe3edf45

ipv4: Really ignore ICMP address requests/replies. · 838942a5

Committed by David S. Miller on Jul 23, 2012

Alexey removed kernel side support for requests, and the
only thing we do for replies is log a message if something
doesn't look right.

As Alexey's comment indicates, this belongs in userspace (if
anywhere), and thus we can safely just get rid of this code.
Signed-off-by: David S. Miller <davem@davemloft.net>

838942a5

net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse. · e7d4b18c

Committed by Saurabh on Jul 23, 2012

With CONFIG_SPARSE_RCU_POINTER=y sparse identified references which did not
specify __rcu in ip_vti.c
Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7d4b18c

ipv4: Remove redundant assignment · 8fe5cb87

Committed by Lin Ming on Jul 23, 2012

It is redundant to set no_addr and accept_local to 0 and then set them
with other values just after that.
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fe5cb87

23 Jul, 2012 3 commits

tcp: dont drop MTU reduction indications · 563d34d0

Committed by Eric Dumazet on Jul 23, 2012

ICMP messages generated in output path if frame length is bigger than
mtu are actually lost because socket is owned by user (doing the xmit)

One example is the ipgre_tunnel_xmit() calling
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));

We had a similar case fixed in commit a34a101e (ipv6: disable GSO on
sockets hitting dst_allfrag).

The problem with that fix is that it relied on retransmit timers, so short tcp
sessions paid too high a latency price.

This patch uses the tcp_release_cb() infrastructure so that MTU
reduction messages (ICMP messages) are not lost, and no extra delay
is added in TCP transmits.
Reported-by: Maciej Żenczykowski <maze@google.com>
Diagnosed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

563d34d0

tcp: avoid oops in tcp_metrics and reset tcpm_stamp · 9a0a9502

Committed by Julian Anastasov on Jul 23, 2012

	In tcp_tw_remember_stamp we incorrectly checked tw
instead of tm, it can lead to oops if the cached entry is
not found.

	tcpm_stamp was not updated in tcpm_check_stamp when
tcpm_suck_dst was called, move the update into tcpm_suck_dst,
so that we do not call it infinitely on every next cache hit
after TCP_METRICS_TIMEOUT.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a0a9502

ipv4: tcp: set unicast_sock uc_ttl to -1 · 0980e56e

Committed by Eric Dumazet on Jul 20, 2012

Set unicast_sock uc_ttl to -1 so that we select the right ttl,
instead of sending packets with a 0 ttl.

Bug added in commit be9f4a44 (ipv4: tcp: remove per net tcp_sock)
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0980e56e
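
Illustration for 0980e56e: a minimal sketch of why the initial value matters, assuming a negative uc_ttl acts as a "not set, use the route's hop limit" sentinel while any non-negative value is used verbatim. `struct mock_inet` and `select_ttl()` are hypothetical names for illustration only.

```c
/* Hypothetical stand-in for the per-socket unicast TTL setting. */
struct mock_inet {
	int uc_ttl;	/* negative: no explicit TTL configured */
};

/* Pick the TTL for an outgoing packet. */
int select_ttl(const struct mock_inet *inet, int route_hoplimit)
{
	int ttl = inet->uc_ttl;

	if (ttl < 0)
		ttl = route_hoplimit;
	return ttl;
}
```

Under that assumption, a zero-initialized socket would send with a TTL of 0, whereas a socket initialized with -1 picks up the route's hop limit.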

21 Jul, 2012 21 commits

ipv4: Kill rt->fi · 2860583f

Committed by David S. Miller on Jul 17, 2012

It's not really needed.

We only grabbed a reference to the fib_info for the sake of fib_info
local metrics.

However, fib_info objects are freed using RCU, as are therefore their
private metrics (if any).

We would have triggered a route cache flush if we eliminated a
reference to a fib_info object in the routing tables.

Therefore, any existing cached routes will first check and see that
they have been invalidated before an errant reference to these
metric values would occur.
Signed-off-by: David S. Miller <davem@davemloft.net>

2860583f

ipv4: Turn rt->rt_route_iif into rt->rt_is_input. · 9917e1e8

Committed by David S. Miller on Jul 17, 2012

That is this value's only use, as a boolean to indicate whether
a route is an input route or not.

So implement it that way, using a u16 gap present in the struct
already.
Signed-off-by: David S. Miller <davem@davemloft.net>

9917e1e8

ipv4: Kill rt->rt_oif · 4fd551d7

Committed by David S. Miller on Jul 17, 2012

Never actually used.

It was being set on output routes to the original OIF specified in the
flow key used for the lookup.

Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of
the flowi4_oif and flowi4_iif values, thanks to feedback from Julian
Anastasov.
Signed-off-by: David S. Miller <davem@davemloft.net>

4fd551d7

ipv4: Dirty less cache lines in route caching paths. · 93ac5341

Committed by David S. Miller on Jul 17, 2012

Don't bother incrementing dst->__use and setting dst->lastuse,
they are completely pointless and just slow things down.
Signed-off-by: David S. Miller <davem@davemloft.net>

93ac5341

ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code. · ba3f7f04

Committed by David S. Miller on Jul 17, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

ba3f7f04

ipv4: Cache input routes in fib_info nexthops. · d2d68ba9

Committed by David S. Miller on Jul 17, 2012

Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions.  (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).

However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.
Signed-off-by: David S. Miller <davem@davemloft.net>

d2d68ba9

ipv4: Cache output routes in fib_info nexthops. · f2bb4bed

Committed by David S. Miller on Jul 17, 2012

If we have an output route that lacks nexthop exceptions, we can cache
it in the FIB info nexthop.

Such routes will have DST_HOST cleared because such routes refer to a
family of destinations, rather than just one.

The sequence of the handling of exceptions during route lookup is
adjusted to make the logic work properly.

Before we allocate the route, we lookup the exception.

Then we know if we will cache this route or not, and therefore whether
DST_HOST should be set on the allocated route.

Then we use DST_HOST to key off whether we should store the resulting
route, during rt_set_nexthop(), in the FIB nexthop cache.

With help from Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>

f2bb4bed

ipv4: Kill routes during PMTU/redirect updates. · ceb33206

Committed by David S. Miller on Jul 17, 2012

Mark them obsolete so there will be a re-lookup to fetch the
FIB nexthop exception info.
Signed-off-by: David S. Miller <davem@davemloft.net>

ceb33206

net: Document dst->obsolete better. · f5b0a874

Committed by David S. Miller on Jul 19, 2012

Add a big comment explaining how the field works, and use defines
instead of magic constants for the values assigned to it.

Suggested by Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>

f5b0a874

ipv4: Adjust semantics of rt->rt_gateway. · f8126f1d

Committed by David S. Miller on Jul 13, 2012

In order to allow prefixed routes, we have to adjust how rt_gateway
is set and interpreted.

The new interpretation is:

1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr

2) rt_gateway != 0, destination requires a nexthop gateway

Abstract the fetching of the proper nexthop value using a new
inline helper, rt_nexthop(), as suggested by Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>

f8126f1d
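
Illustration for f8126f1d: a minimal sketch of the two-case interpretation listed above. `struct mock_rt_gw` and `mock_rt_nexthop()` are simplified, hypothetical stand-ins that use plain uint32_t addresses rather than the kernel's types.

```c
#include <stdint.h>

/* Hypothetical stand-in carrying only the gateway field. */
struct mock_rt_gw {
	uint32_t rt_gateway;	/* 0: destination is on-link */
};

/* Return the address the packet should be handed to next. */
uint32_t mock_rt_nexthop(const struct mock_rt_gw *rt, uint32_t daddr)
{
	if (rt->rt_gateway)
		return rt->rt_gateway;	/* case 2: go through the gateway */
	return daddr;			/* case 1: deliver directly */
}
```

Callers then ask the helper for the nexthop instead of reading rt_gateway themselves, so the zero-means-on-link rule lives in one place.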

ipv4: Remove 'rt_dst' from 'struct rtable' · f1ce3062

Committed by David S. Miller on Jul 12, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

f1ce3062

ipv4: Remove 'rt_mark' from 'struct rtable' · b4869889

Committed by David Miller on Jul 01, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

b4869889

ipv4: Kill 'rt_src' from 'struct rtable' · d6c0a4f6

Committed by David Miller on Jul 01, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

d6c0a4f6

ipv4: Remove rt_key_{src,dst,tos} from struct rtable. · 1a00fee4

Committed by David Miller on Jul 01, 2012

They are always used in contexts where they can be reconstituted,
or where the finally resolved rt->rt_{src,dst} is semantically
equivalent.
Signed-off-by: David S. Miller <davem@davemloft.net>

1a00fee4

ipv4: Kill ip_route_input_noref(). · 38a424e4

Committed by David Miller on Jul 01, 2012

The "noref" argument to ip_route_input_common() is now always ignored
because we do not cache routes, and in that case we must always grab
a reference to the resulting 'dst'.
Signed-off-by: David S. Miller <davem@davemloft.net>

38a424e4

ipv4: Delete routing cache. · 89aef892

Committed by David S. Miller on Jul 17, 2012

The ipv4 routing cache is non-deterministic, performance wise, and is
subject to reasonably easy to launch denial of service attacks.

The routing cache works great for well behaved traffic, and the world
was a much friendlier place when the tradeoffs that led to the routing
cache's design were considered.

What it boils down to is that the performance of the routing cache is
a product of the traffic patterns seen by a system rather than being a
product of the contents of the routing tables.  The former is
controllable by external entities.

Even for "well behaved" legitimate traffic, high volume sites can see
hit rates in the routing cache of only ~10%.
Signed-off-by: David S. Miller <davem@davemloft.net>

89aef892

ipv4: show pmtu in route list · 521f5490

Committed by Julian Anastasov on Jul 20, 2012

Override the metrics with rt_pmtu
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

521f5490

tcp: improve latencies of timer triggered events · 6f458dfb

Committed by Eric Dumazet on Jul 20, 2012

Modern TCP stack highly depends on tcp_write_timer() having a small
latency, but current implementation doesn't exactly meet the
expectations.

When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay hoping next run will be more
successful.

tcp_write_timer(), for example, uses a 50ms delay for the next try, which
defeats many attempts to get predictable TCP behavior in terms of
latency.

Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.

This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait
RTO timer to be transmitted, while cwnd should allow us to send them
sooner)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f458dfb

tcp: fix ABC in tcp_slow_start() · 9dc27415

Committed by Eric Dumazet on Jul 20, 2012

When/if sysctl_tcp_abc > 1, we expect to increase cwnd by 2 if the
received ACK acknowledges more than 2*MSS bytes, in tcp_slow_start()

Problem is this RFC 3465 statement is not correctly coded, as
the while () loop increases snd_cwnd one by one.

Add a new variable to avoid this off-by-one error.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9dc27415
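
Illustration for 9dc27415: a userspace simulation of the off-by-one described above. It assumes the pre-fix loop incremented snd_cwnd inside the loop body, which raises the loop's own threshold on every pass, and that the fix counts increments in a separate variable and applies them afterwards. `struct mock_tp`, `grow_in_loop()` and `grow_with_delta()` are hypothetical names, and `cnt` stands for the per-ACK credit (already doubled when ABC allows it).

```c
/* Hypothetical stand-in for the congestion fields of struct tcp_sock. */
struct mock_tp {
	unsigned int snd_cwnd;
	unsigned int snd_cwnd_cnt;
	unsigned int snd_cwnd_clamp;
};

/* Assumed pre-fix shape: cwnd moves while the loop still compares to it. */
void grow_in_loop(struct mock_tp *tp, unsigned int cnt)
{
	if (tp->snd_cwnd == 0)
		return;		/* guard for the sketch: avoid an endless loop */
	tp->snd_cwnd_cnt += cnt;
	while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
		tp->snd_cwnd_cnt -= tp->snd_cwnd;
		if (tp->snd_cwnd < tp->snd_cwnd_clamp)
			tp->snd_cwnd++;
	}
}

/* Assumed post-fix shape: count passes first, apply the growth once. */
void grow_with_delta(struct mock_tp *tp, unsigned int cnt)
{
	unsigned int delta = 0;
	unsigned int next;

	if (tp->snd_cwnd == 0)
		return;		/* same guard as above */
	tp->snd_cwnd_cnt += cnt;
	while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
		tp->snd_cwnd_cnt -= tp->snd_cwnd;
		delta++;
	}
	next = tp->snd_cwnd + delta;
	tp->snd_cwnd = next < tp->snd_cwnd_clamp ? next : tp->snd_cwnd_clamp;
}
```

Starting from snd_cwnd = 10 with a credit of 20, the first variant ends at 11 because the threshold rises to 11 after one pass, while the second ends at 12, the increase by 2 that the commit message expects.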

tcp: use hash_32() in tcp_metrics · 5815d5e7

Committed by Eric Dumazet on Jul 19, 2012

Fix a missing roundup_pow_of_two(), since tcpmhash_entries is not
guaranteed to be a power of two.

Use hash_32() instead of a custom hash.

tcpmhash_entries should be an unsigned int.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5815d5e7
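
Illustration for 5815d5e7: a userspace sketch of why the table size has to be rounded up. A hash that returns a fixed number of bits can only index a table with a power-of-two number of slots. `round_up_pow2()`, `log2_pow2()` and `mock_hash_32()` are hypothetical userspace analogues, and the multiplier is a commonly used multiplicative-hash constant chosen for illustration, not necessarily the one the kernel uses.

```c
#include <stdint.h>

/* Smallest power of two >= n (returns 1 for n == 0, 0 if it cannot fit). */
uint32_t round_up_pow2(uint32_t n)
{
	uint32_t p = 1;

	if (n > 0x80000000u)
		return 0;	/* no 32-bit power of two is large enough */
	while (p < n)
		p <<= 1;
	return p;
}

/* Number of index bits for a power-of-two table size. */
unsigned int log2_pow2(uint32_t p)
{
	unsigned int bits = 0;

	while (p > 1) {
		p >>= 1;
		bits++;
	}
	return bits;
}

/* Multiplicative hash keeping the top `bits` bits (0 <= bits <= 32). */
uint32_t mock_hash_32(uint32_t val, unsigned int bits)
{
	uint32_t h = val * 0x9e3779b1u;	/* illustrative constant */

	if (bits == 0)
		return 0;	/* avoid an undefined shift by 32 */
	return h >> (32 - bits);
}
```

A requested size of 1000 entries would be rounded to 1024 slots, giving 10 index bits, so every value returned by `mock_hash_32(key, 10)` is a valid slot number.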

tcp: Return bool instead of int where appropriate · 67b95bd7

Committed by Vijay Subramanian on Jul 19, 2012

Applied to a set of static inline functions in tcp_input.c
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

67b95bd7

20 Jul, 2012 1 commit

ipv4: Fix again the time difference calculation · f31fd383

Committed by Julian Anastasov on Jul 19, 2012

	Fix again the diff value in rt_bind_exception
after a collision between the two latest patches; my original commit
actually fixed the same problem.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

f31fd383