Commits · 3be0686b6e2f953afe83626e871b4a7b0ceae49b · openanolis / cloud-kernel

08 Mar, 2011 · 2 commits

ipv4: Inline fib_semantic_match into check_leaf · 3be0686b

Committed by David S. Miller on Mar 07, 2011

This eliminates a lot of pure overhead due to parameter
passing.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Validate route entry type at insert instead of every lookup. · 4c8237cd

Committed by David S. Miller on Mar 07, 2011

fib_semantic_match() requires that if the type doesn't signal an
automatic error, it must be of type RTN_UNICAST, RTN_LOCAL,
RTN_BROADCAST, RTN_ANYCAST, or RTN_MULTICAST.

Checking this on every route lookup is pointless work.

Instead validate it during route insertion, via fib_create_info().

Also, there was nothing making sure the type value was less than
RTN_MAX, so add that missing check while we're here.
Signed-off-by: David S. Miller <davem@davemloft.net>

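A minimal sketch of validating the type once, at insert time, written as standalone C. The `route_type` enum is a local stand-in modeled on the RTN_* names. `validate_route_type()` and `route_type_is_error()` are hypothetical helpers, not the kernel's code. Two further assumptions: the set of "automatic error" types, and the exact form of the RTN_MAX bound (here, anything past the last defined type is rejected).

```c
#include <assert.h>
#include <errno.h>

/* Local stand-in modeled on the RTN_* route types (assumed ordering). */
enum route_type {
	RT_UNSPEC,
	RT_UNICAST,
	RT_LOCAL,
	RT_BROADCAST,
	RT_ANYCAST,
	RT_MULTICAST,
	RT_BLACKHOLE,
	RT_UNREACHABLE,
	RT_PROHIBIT,
	RT_THROW,
	RT_NAT,
	RT_XRESOLVE,
	RT_MAX = RT_XRESOLVE,
};

/* Hypothetical: types whose lookup result is an automatic error. */
static int route_type_is_error(unsigned int type)
{
	return type == RT_BLACKHOLE || type == RT_UNREACHABLE ||
	       type == RT_PROHIBIT || type == RT_THROW;
}

/* Run once when a route is inserted, so lookups can skip the check. */
static int validate_route_type(unsigned int type)
{
	if (type > RT_MAX)		/* the previously missing bound */
		return -EINVAL;
	if (route_type_is_error(type))
		return 0;
	switch (type) {
	case RT_UNICAST:
	case RT_LOCAL:
	case RT_BROADCAST:
	case RT_ANYCAST:
	case RT_MULTICAST:
		return 0;
	default:
		return -EINVAL;
	}
}
```

Rejecting bad values here means the per-packet lookup path can trust the stored type without re-checking it.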
05 Mar, 2011 · 5 commits

ipv4: Remove flowi from struct rtable. · 5e2b61f7

Committed by David S. Miller on Mar 04, 2011

The only necessary parts are the src/dst addresses, the
interface indexes, the TOS, and the mark.

The rest is unnecessary bloat, which amounts to nearly
50 bytes on 64-bit.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Set rt->rt_iif more sanely on output routes. · 1018b5c0

Committed by David S. Miller on Mar 04, 2011

rt->rt_iif is only ever inspected on input routes; for example, DCCP
uses it to populate a route lookup flow key when generating replies
to another packet.

Therefore, setting it to anything other than zero on output routes
makes no sense.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Get peer more cheaply in rt_init_metrics(). · 3c0afdca

Committed by David S. Miller on Mar 04, 2011

We know this is a new route object, so doing atomics and
stuff makes no sense at all.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Optimize flow initialization in output route lookup. · 44713b67

Committed by David S. Miller on Mar 04, 2011

We burn a lot of useless cycles, cpu store buffer traffic, and
memory operations memset()'ing the on-stack flow used to perform
output route lookups in __ip_route_output_key().

Only the first half of the flow object members even matter for
output route lookups in this context, specifically:

FIB rules matching cares about:

	dst, src, tos, iif, oif, mark

FIB trie lookup cares about:

	dst

FIB semantic match cares about:

	tos, scope, oif

Therefore only initialize these specific members and elide the
memset entirely.

On Niagara2 this kills about 300 cycles from the output route
lookup path.

Likely, we can take things further, since all callers of output
route lookups essentially throw away the on-stack flow they use.
So they don't care if we use it as a scratch-pad to compute the
final flow key.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

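The idea can be sketched in plain C. It assumes a hypothetical `struct flow_key` whose leading members are the ones listed above and whose tail stands in for everything else a flow carries. The `loopback_ifindex` parameter is an assumption about the value `iif` takes for output lookups. This is an illustration, not the body of __ip_route_output_key().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: leading members are read by output route
 * lookup, trailing members stand in for the rest of a full flow. */
struct flow_key {
	uint32_t dst, src;
	uint8_t tos, scope;
	int iif, oif;
	uint32_t mark;
	/* not read by output route lookup */
	uint16_t sport, dport;
	uint8_t proto;
	uint32_t secid;
};

/* Instead of memset() of the whole key followed by field stores, write
 * only the members that FIB rules, the trie lookup and semantic match
 * read. Trailing members keep whatever they held, so nothing on this
 * path may read them. */
static void init_output_flow(struct flow_key *key, const struct flow_key *req,
			     int loopback_ifindex)
{
	key->dst = req->dst;
	key->src = req->src;
	key->tos = req->tos;
	key->scope = req->scope;
	key->iif = loopback_ifindex;	/* assumed value for output lookups */
	key->oif = req->oif;
	key->mark = req->mark;
}
```

The saving comes entirely from what is not written. It is safe only because, as the commit argues, nothing on this path reads the untouched members.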
inetpeer: seqlock optimization · 65e8354e

Committed by Eric Dumazet on Mar 04, 2011

David noticed:

------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.

If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.

This makes the case of a create==0 lookup for a not-present entry
really expensive.  It is on the order of 600 cpu cycles on my
Niagara2.

I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.

This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------

One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.

After a failed AVL tree lookup, we can easily detect whether a writer
made changes during our lookup. Taking the lock and redoing the lookup
is only necessary in that case.

Note: add a private rcu_deref_locked() macro so that access to the
spinlock embedded in the seqlock is confined to one spot.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

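Here is a userspace sketch of the seqlock read side described above, using C11 atomics. A flat array stands in for the AVL tree, and `struct peer_base`, `peer_lookup()` and `peer_insert()` are hypothetical names. The sequence-counter protocol is the technique the commit describes: the counter is odd while a writer is active, and a miss is re-checked under the lock only if the counter moved.

```c
#include <assert.h>
#include <stdatomic.h>

#define TABLE_SIZE 64

/* Hypothetical peer store; slots are atomics so lockless readers never
 * race on plain memory. Keys must be nonzero. */
struct peer_base {
	atomic_uint seq;			/* odd while a writer is active */
	atomic_flag lock;			/* serializes writers */
	atomic_int count;			/* slots in use */
	_Atomic unsigned int keys[TABLE_SIZE];
};

static void spin_lock(struct peer_base *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;
}

static void spin_unlock(struct peer_base *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Snapshot the sequence, waiting out any writer in progress. */
static unsigned int read_begin(struct peer_base *b)
{
	unsigned int s;

	while ((s = atomic_load_explicit(&b->seq, memory_order_acquire)) & 1)
		;
	return s;
}

/* Nonzero if a writer ran since read_begin() returned `start`. */
static int read_retry(struct peer_base *b, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&b->seq, memory_order_relaxed) != start;
}

static int search(struct peer_base *b, unsigned int key)
{
	int n = atomic_load_explicit(&b->count, memory_order_relaxed);

	for (int i = 0; i < n && i < TABLE_SIZE; i++)
		if (atomic_load_explicit(&b->keys[i], memory_order_relaxed) == key)
			return 1;
	return 0;
}

/* create==0 style lookup: a miss is re-checked under the lock only when
 * the counter shows a writer may have hidden the entry from us. */
static int peer_lookup(struct peer_base *b, unsigned int key)
{
	unsigned int seq = read_begin(b);
	int found = search(b, key);

	if (!found && read_retry(b, seq)) {
		spin_lock(b);
		found = search(b, key);
		spin_unlock(b);
	}
	return found;
}

/* Writers bump the counter to odd, modify, then bump it back to even. */
static int peer_insert(struct peer_base *b, unsigned int key)
{
	int n, ok = 0;

	spin_lock(b);
	atomic_store_explicit(&b->seq,
			      atomic_load_explicit(&b->seq, memory_order_relaxed) + 1,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	n = atomic_load_explicit(&b->count, memory_order_relaxed);
	if (n < TABLE_SIZE) {
		atomic_store_explicit(&b->keys[n], key, memory_order_relaxed);
		atomic_store_explicit(&b->count, n + 1, memory_order_relaxed);
		ok = 1;
	}
	atomic_store_explicit(&b->seq,
			      atomic_load_explicit(&b->seq, memory_order_relaxed) + 1,
			      memory_order_release);
	spin_unlock(b);
	return ok;
}
```

Hits never touch the lock. A miss takes it only when the counter shows that a writer overlapped the search, which is the case David's profile singled out.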
04 Mar, 2011 · 2 commits

ipv4: Fix __ip_dev_find() to use ifa_local instead of ifa_address. · e066008b

Committed by David S. Miller on Mar 03, 2011

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix crash in dst_release when udp_sendmsg route lookup fails. · 06dc94b1

Committed by David S. Miller on Mar 03, 2011

As reported by Eric:

[11483.697233] IP: [<c12b0638>] dst_release+0x18/0x60
 ...
[11483.697741] Call Trace:
[11483.697764]  [<c12fc9d2>] udp_sendmsg+0x282/0x6e0
[11483.697790]  [<c12a1c01>] ? memcpy_toiovec+0x51/0x70
[11483.697818]  [<c12dbd90>] ? ip_generic_getfrag+0x0/0xb0

The pointer passed to dst_release() is -EINVAL because we
accidentally leave an error pointer in the local variable "rt".

NULL it out to fix the bug.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

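The failure mode can be reproduced in miniature. This sketch re-creates the error-pointer convention in userspace. `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()` are redefined locally, assuming the kernel convention of encoding errno values in the top 4095 addresses. Hypothetical `route_lookup()`, `route_release()` and `send_msg()` stand in for the route lookup, dst_release() and udp_sendmsg().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct route { int refcnt; };

/* Hypothetical lookup: returns a referenced route or an error pointer. */
static struct route *route_lookup(int fail)
{
	static struct route r = { 1 };

	if (fail)
		return ERR_PTR(-EINVAL);
	r.refcnt++;
	return &r;
}

/* Like dst_release(): tolerates NULL, but not an error pointer. */
static void route_release(struct route *rt)
{
	if (rt) {
		assert(!IS_ERR(rt));	/* the reported crash */
		rt->refcnt--;
	}
}

static int send_msg(int fail)
{
	int err = 0;
	struct route *rt = route_lookup(fail);

	if (IS_ERR(rt)) {
		err = PTR_ERR(rt);
		rt = NULL;		/* the fix: never leave ERR_PTR in rt */
		goto out;
	}
	/* ... build and transmit the datagram ... */
out:
	route_release(rt);
	return err;
}
```

Without the `rt = NULL;` line, `route_release()` would receive the encoded -EINVAL, the same bogus pointer seen in the oops above.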
03 Mar, 2011 · 3 commits

ipv4: ip_route_output_key() is better as an inline. · 5bfa787f

Committed by David S. Miller on Mar 02, 2011

This avoids a stack frame at zero cost.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Make output route lookup return rtable directly. · b23dd4fe

Committed by David S. Miller on Mar 02, 2011

Instead of on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Return dst directly from xfrm_lookup() · 452edd59

Committed by David S. Miller on Mar 02, 2011

Instead of on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Mar, 2011 · 13 commits

inet: Replace left-over references to inet->cork · 07df5294

Committed by Herbert Xu on Mar 01, 2011

The patch to replace inet->cork with cork left out two spots in
__ip_append_data that can result in bogus packet construction.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Make icmp route lookup code a bit clearer. · f6d460cf

Committed by David S. Miller on Mar 01, 2011

The route lookup code in icmp_send() is slightly tricky as a result of
having to handle all of the requirements of RFC 4301 host relookups.

Pull the route resolution into a separate function, so that the error
handling and route reference counting are hopefully easier to see and
contained wholly within this new routine.
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Handle blackhole route creation via afinfo. · 2774c131

Committed by David S. Miller on Mar 01, 2011

That way we don't have to potentially do this in every xfrm_lookup()
caller.
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Kill XFRM_LOOKUP_WAIT flag. · 80c0bc9e

Committed by David S. Miller on Mar 01, 2011

This can be determined from the flow flags instead.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Kill can_sleep arg to ip_route_output_flow() · 273447b3

Committed by David S. Miller on Mar 01, 2011

This boolean state is now available in the flow flags.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add FLOWI_FLAG_CAN_SLEEP. · 5df65e55

Committed by David S. Miller on Mar 01, 2011

And set it in contexts where the route resolution can sleep.
Signed-off-by: David S. Miller <davem@davemloft.net>

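Moving the "can sleep" state into the flow flags can be sketched as follows. `FLOW_FLAG_CAN_SLEEP`, its value, the minimal `struct flow_key` and `flow_can_sleep()` are hypothetical stand-ins for FLOWI_FLAG_CAN_SLEEP and the flow's flags field, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, modeled on the flow's flags field. */
#define FLOW_FLAG_ANYSRC	0x01
#define FLOW_FLAG_CAN_SLEEP	0x02

struct flow_key {
	unsigned int flags;
	/* ... addresses, ports, etc. omitted ... */
};

/* Before: lookup(key, bool can_sleep). After: the bit travels inside
 * the key, so every layer that receives the key can test it without an
 * extra argument being threaded through each call. */
static bool flow_can_sleep(const struct flow_key *key)
{
	return (key->flags & FLOW_FLAG_CAN_SLEEP) != 0;
}
```

This is also why a separate "wait" flag to xfrm_lookup() becomes redundant: the same bit is already in the flow it is handed.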
ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep" · 420d44da

Committed by David S. Miller on Mar 01, 2011

Since that is what the current vague "flags" argument means.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Can final ip_route_connect() arg to boolean "can_sleep". · abdf7e72

Committed by David S. Miller on Mar 01, 2011

Since that's what the current vague "flags" thing means.
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Add lockless transmit path · 903ab86d

Committed by Herbert Xu on Mar 01, 2011

The UDP transmit path has been running under the socket lock
for a long time because of the corking feature.  This means that
transmitting to the same socket in multiple threads does not
scale at all.

However, as most users don't actually use corking, the locking
can be removed in the common case.

This patch creates a lockless fast path where corking is not used.

Please note that this does create a slight inaccuracy in the
enforcement of socket send buffer limits.  In particular, we
may exceed the socket limit by up to (number of CPUs) * (packet
size) because of the way the limit is computed.

As the primary purpose of socket buffers is to indicate congestion,
this should not be a great problem for now.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

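To put the bound described above in numbers, here is a hypothetical helper that computes it. The model is an assumption: every CPU passes the limit check before any of them has charged its packet to the socket.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case bytes queued on one socket under the lockless path,
 * assuming ncpus senders each pass the limit check against an almost
 * full buffer before any of them charges its packet. */
static uint64_t worst_case_sndbuf(uint64_t sndbuf, unsigned int ncpus,
				  uint64_t pkt_size)
{
	return sndbuf + (uint64_t)ncpus * pkt_size;
}
```

For example, with an illustrative 212992-byte limit, 4 CPUs and 65507-byte datagrams (values chosen for illustration, not taken from the commit), the worst case is 212992 + 4 × 65507 = 475020 bytes.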
udp: Switch to ip_finish_skb · f6b9664f

Committed by Herbert Xu on Mar 01, 2011

This patch converts UDP to use the new ip_finish_skb API.  This
then allows us to more easily use ip_make_skb, which lets
UDP run without a socket lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: Add ip_make_skb and ip_finish_skb · 1c32c5ad

Committed by Herbert Xu on Mar 01, 2011

This patch adds the helper ip_make_skb which is like ip_append_data
and ip_push_pending_frames all rolled into one, except that it does
not send the skb produced.  The sending part is carried out by
ip_send_skb, which the transport protocol can call after it has
tweaked the skb.

It is meant to be called in cases where corking is not used, i.e.
where each call corresponds one-to-one with a sendmsg.

This patch also adds the helper ip_finish_skb, which is meant to
replace ip_push_pending_frames when corking is required.
Previously the protocol stack would peek at the socket write
queue and add its header to the first packet.  With ip_finish_skb,
the protocol stack can directly operate on the final skb instead,
just like the non-corking case with ip_make_skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: Remove explicit write references to sk/inet in ip_append_data · 1470ddf7

Committed by Herbert Xu on Mar 01, 2011

In order to allow simultaneous calls to ip_append_data on the same
socket, it must not modify any shared state in sk or inet (other
than those that are designed to allow that such as atomic counters).

This patch abstracts out write references to sk and inet_sk in
ip_append_data and its friends so that we may use the underlying
code in parallel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: Remove unused sk_sndmsg_* from UFO · 5a2ef920

Committed by Herbert Xu on Mar 01, 2011

UFO doesn't really use the sk_sndmsg_* parameters so touching
them is pointless.  It can't use them anyway since the whole
point of UFO is to use the original pages without copying.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Feb, 2011 · 1 commit

ipv4: Rearrange how ip_route_newports() gets port keys. · dca8b089

Committed by David S. Miller on Feb 24, 2011

ip_route_newports() is the only place in the entire kernel that
cares about the port members in the routing cache entry's lookup
flow key.

Therefore the only reason we store an entire flow inside of the
struct rtentry is for this one special case.

Rewrite ip_route_newports() such that:

1) The caller passes in the original port values, so we don't need
   to use the rth->fl.fl_ip_{s,d}port values to remember them.

2) The lookup flow is constructed by hand instead of being copied
   from the routing cache entry's flow.
Signed-off-by: David S. Miller <davem@davemloft.net>

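This sketch builds the relookup key by hand from the caller-supplied ports. `struct cached_route`, `struct flow_key` and `route_newports_key()` are hypothetical, trimmed-down stand-ins. The real ip_route_newports() has a different signature and performs the lookup itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical relookup key. */
struct flow_key {
	uint32_t dst, src;
	int oif;
	uint8_t proto;
	uint16_t sport, dport;
};

/* Hypothetical cached route: it no longer stores a full flow. */
struct cached_route {
	uint32_t dst, src;
	int oif;
};

/* The caller supplies the ports it originally used, so the route need
 * not remember them. Returns 1 and fills *key when a relookup is
 * needed, 0 when the ports are unchanged. */
static int route_newports_key(const struct cached_route *rt, uint8_t proto,
			      uint16_t orig_sport, uint16_t orig_dport,
			      uint16_t sport, uint16_t dport,
			      struct flow_key *key)
{
	if (sport == orig_sport && dport == orig_dport)
		return 0;
	key->dst = rt->dst;
	key->src = rt->src;
	key->oif = rt->oif;
	key->proto = proto;
	key->sport = sport;
	key->dport = dport;
	return 1;
}
```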
24 Feb, 2011 · 2 commits

xfrm: Const'ify address arguments to ->dst_lookup() · 5e6b930f

Committed by David S. Miller on Feb 24, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Const'ify tmpl and address arguments to ->init_temprop() · 19bd6244

Committed by David S. Miller on Feb 24, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

23 Feb, 2011 · 3 commits

xfrm: Mark flowi arg to ->init_tempsel() const. · 73e5ebb2

Committed by David S. Miller on Feb 22, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Mark flowi arg to ->fill_dst() const. · 0c7b3eef

Committed by David S. Miller on Feb 22, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Mark flowi arg to ->get_tos() const. · 05d84025

Committed by David S. Miller on Feb 22, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

22 Feb, 2011 · 1 commit

tcp: undo_retrans counter fixes · c24f691b

Committed by Yuchung Cheng on Feb 07, 2011

Fix a bug where undo_retrans is incorrectly decremented when undo_marker is
not set or undo_retrans is already 0. This happens when the sender receives
more DSACK ACKs than packets retransmitted during the current
undo phase. It may also happen when the sender receives a DSACK after
the undo operation is completed or cancelled.

Fix another bug where undo_retrans is incorrectly incremented when the
sender retransmits an skb and tcp_skb_pcount(skb) > 1 (TSO). This case
is rare but not impossible.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

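The two accounting rules can be sketched under stated assumptions. `struct tcp_undo_state`, `on_retransmit()` and `on_dsack()` are hypothetical. The increment is assumed to count every segment the skb covers. The real DSACK path also performs sequence-number checks, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of TCP sender state. */
struct tcp_undo_state {
	uint32_t undo_marker;	/* nonzero while an undo phase is active */
	int undo_retrans;	/* retransmitted segments not yet DSACKed */
};

/* A retransmitted skb covering pcount segments (pcount > 1 with TSO)
 * is counted as pcount retransmissions, not one. */
static void on_retransmit(struct tcp_undo_state *tp, int pcount)
{
	tp->undo_retrans += pcount;
}

/* A DSACK only cancels a retransmission while an undo phase is active,
 * and the counter never goes below zero. */
static void on_dsack(struct tcp_undo_state *tp)
{
	if (tp->undo_marker && tp->undo_retrans > 0)
		tp->undo_retrans--;
}
```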
21 Feb, 2011 · 1 commit

tcp: Remove debug macro of TCP_CHECK_TIMER · 089c3482

Committed by Shan Wei on Feb 19, 2011

TCP_CHECK_TIMER is no longer used for debugging; it does nothing,
and it has been that way for several years, perhaps six.

Remove it to keep the code clearer.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Feb, 2011 · 1 commit

tcp: fix inet_twsk_deschedule() · 91035f0b

Committed by Eric Dumazet on Feb 18, 2011

Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule().

This is caused by inet_twsk_purge(), run from process context,
and commit 575f4cd5 (net: Use rcu lookups in inet_twsk_purge.)
removed the BH disabling that was necessary.

Add the BH disabling back, but in a fine-grained way: right before
calling inet_twsk_deschedule() instead of around the whole function.

With help from Linus Torvalds and Eric W. Biederman.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Lezcano <daniel.lezcano@free.fr>
CC: Pavel Emelyanov <xemul@openvz.org>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: stable <stable@kernel.org> (# 2.6.33+)
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Feb, 2011 · 3 commits

ipv4: Implement __ip_dev_find using new interface address hash. · 9435eb1c

Committed by David S. Miller on Feb 18, 2011

Much quicker than going through the FIB tables.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Add hash table of interface addresses. · fd23c3b3

Committed by David S. Miller on Feb 18, 2011

This will be used to optimize __ip_dev_find() and friends.

With help from Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>

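This sketch replaces a FIB walk with a hash of local addresses. The bucket count, the multiplicative hash and the names `addr_hash_insert()` and `dev_find()` are assumptions. Entries are keyed on the local address, in the spirit of the later fix that switched __ip_dev_find() to ifa_local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_HASH_BITS 8
#define ADDR_HASH_SIZE (1u << ADDR_HASH_BITS)

/* Hypothetical interface-address entry, chained per hash bucket. */
struct ifaddr_entry {
	uint32_t local;			/* local IPv4 address */
	int ifindex;
	struct ifaddr_entry *next;
};

static struct ifaddr_entry *addr_hash[ADDR_HASH_SIZE];

/* Multiplicative hash keeping the top ADDR_HASH_BITS bits (assumed;
 * the kernel's hash function may differ). */
static unsigned int addr_hashfn(uint32_t addr)
{
	return (uint32_t)(addr * 0x9e370001u) >> (32 - ADDR_HASH_BITS);
}

static void addr_hash_insert(struct ifaddr_entry *e)
{
	unsigned int h = addr_hashfn(e->local);

	e->next = addr_hash[h];
	addr_hash[h] = e;
}

/* Analogue of __ip_dev_find(): one bucket walk instead of a FIB lookup.
 * Returns the owning ifindex, or 0 if no interface has the address. */
static int dev_find(uint32_t addr)
{
	for (const struct ifaddr_entry *e = addr_hash[addr_hashfn(addr)]; e; e = e->next)
		if (e->local == addr)
			return e->ifindex;
	return 0;
}
```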
net: provide default_advmss() methods to blackhole dst_ops · 214f45c9

Committed by Eric Dumazet on Feb 18, 2011

Commit 0dbaee3b (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init(), when
dst->default_advmss() is called from dst_metric_advmss().
Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Feb, 2011 · 3 commits

ipv4: Use const'ify fib_result deep in the route call chains. · 982721f3

Committed by David S. Miller on Feb 16, 2011

The only troublesome bit here is __mkroute_output, which wants
to override res->fi and res->type; compute those in local
variables instead.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Avoid use of signed integers in fib_trie code. · 3b004569

Committed by David S. Miller on Feb 16, 2011

GCC emits all kinds of crazy zero extensions when we go from signed
int, to unsigned short, etc. etc.

This transformation has to be legal because:

1) In tkey_extract_bits() in mask_pfx(), the values are used to
   perform shifts, on which negative values are undefined by C.

2) In fib_table_lookup() we perform comparisons with unsigned
   values, constants, and additions.  None of which should
   encounter negative values.
Signed-off-by: David S. Miller <davem@davemloft.net>

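This sketch extracts bits from unsigned keys. It is modeled on tkey_extract_bits() but not copied from it: `t_key` and `KEYLENGTH` mirror fib_trie names, and `extract_bits()` with its explicit range guard is a hypothetical addition that keeps every shift count below the key width.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t t_key;
#define KEYLENGTH (8 * sizeof(t_key))

/* Extract `bits` bits of `a`, starting `offset` bits from the most
 * significant end. Unsigned parameters rule out negative shift counts;
 * the guard keeps both shifts strictly below KEYLENGTH. */
static t_key extract_bits(t_key a, unsigned int offset, unsigned int bits)
{
	if (bits == 0 || bits > KEYLENGTH || offset >= KEYLENGTH)
		return 0;
	return (t_key)(a << offset) >> (KEYLENGTH - bits);
}
```

For 192.168.0.1 (0xC0A80001), `extract_bits(key, 8, 8)` yields the second octet, 0xA8.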
net: Add initial_ref arg to dst_alloc(). · 3c7bd1a1

Committed by David S. Miller on Feb 16, 2011

This allows avoiding multiple writes to the initial __refcnt.

The simplest cases of wanting an initial reference of "1"
in ipv4 and ipv6 have been converted; the rest have been left
alone and kept at the existing "0".
Signed-off-by: David S. Miller <davem@davemloft.net>

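This sketch shows why an initial_ref argument saves a store, using a hypothetical reference-counted `struct obj`. `obj_alloc()` and `obj_put()` are illustrative only; they do not reproduce dst_alloc() or the dst reference-counting rules.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a dst entry. */
struct obj {
	atomic_int refcnt;
};

/* With the initial count passed in, the counter is written once here,
 * instead of being set to 0 and then immediately incremented by the
 * caller. */
static struct obj *obj_alloc(int initial_ref)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->refcnt, initial_ref);
	return o;
}

/* Generic put for this sketch: frees on the last reference. */
static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcnt, 1) == 1)
		free(o);
}
```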
openanolis / cloud-kernel: synced successfully nearly 2 years ago