Commits · 78fbfd8a653ca972afe479517a40661bfff6d8c3 · openanolis / cloud-kernel

13 Mar, 2011 1 commit

ipv4: Create and use route lookup helpers. · 78fbfd8a

Committed by David S. Miller on Mar 12, 2011

The idea here is to minimize the number of places one has to edit
in order to change how flows are defined and used.
Signed-off-by: David S. Miller <davem@davemloft.net>

78fbfd8a

11 Mar, 2011 3 commits

ipv4: Kill flowi arg to fib_select_multipath() · 1b7fe593

Committed by David S. Miller on Mar 10, 2011

Completely unused.
Signed-off-by: David S. Miller <davem@davemloft.net>

1b7fe593

ipv4: Remove unnecessary test from ip_mkroute_input() · ff3fccb3

Committed by David S. Miller on Mar 10, 2011

fl->oif will always be zero on the input path, so there is no reason
to test for that.
Signed-off-by: David S. Miller <davem@davemloft.net>

ff3fccb3

ipv4: Remove redundant RCU locking in ip_check_mc(). · dbdd9a52

Committed by David S. Miller on Mar 10, 2011

All callers are under rcu_read_lock() protection already.

Rename to ip_check_mc_rcu() to make it even more clear.
Signed-off-by: David S. Miller <davem@davemloft.net>

dbdd9a52

10 Mar, 2011 7 commits

tcp: mark tcp_congestion_ops read_mostly · a252bebe

Committed by Stephen Hemminger on Mar 10, 2011

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a252bebe

ipv4: Optimize flow initialization in fib_validate_source(). · cc7e17ea

Committed by David S. Miller on Mar 09, 2011

Like in commit 44713b67
("ipv4: Optimize flow initialization in output route lookup."),
we can optimize the on-stack flow setup to only initialize
the members which are actually used.

Otherwise we bzero the entire structure, then initialize
explicitly the first half of it.
Signed-off-by: David S. Miller <davem@davemloft.net>

cc7e17ea

ipv4: Optimize flow initialization in input route lookup. · 67e28ffd

Committed by David S. Miller on Mar 09, 2011

Like in commit 44713b67
("ipv4: Optimize flow initialization in output route lookup."),
we can optimize the on-stack flow setup to only initialize
the members which are actually used.

Otherwise we bzero the entire structure, then initialize
explicitly the first half of it.
Signed-off-by: David S. Miller <davem@davemloft.net>

67e28ffd

net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules · 8909c9ad

Committed by Vasiliy Kulikov on Mar 02, 2011

Since a8f80e8f any process with
CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
limited to /lib/modules/**.  However, the CAP_NET_ADMIN capability shouldn't
allow anybody to load any module not related to networking.

This patch restricts the ability to autoload modules to netdev modules
with explicit aliases.  This fixes CVE-2011-1019.

Arnd Bergmann suggested leaving untouched the old pre-v2.6.32 behavior
of loading netdev modules by name (without any prefix) for processes
with CAP_SYS_MODULE, to maintain compatibility with network scripts
that autoload netdev modules by aliases like "eth0", "wlan0".

Currently there are only three users of the feature in the upstream
kernel: ipip, ip_gre and sit.

    root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	fffffff800001000
    CapEff:	fffffff800001000
    CapBnd:	fffffff800001000
    root@albatros:~# modprobe xfs
    FATAL: Error inserting xfs
    (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit
    sit: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit0
    sit0      Link encap:IPv6-in-IPv4
	      NOARP  MTU:1480  Metric:1

    root@albatros:~# lsmod | grep sit
    sit                    10457  0
    tunnel4                 2957  1 sit

For CAP_SYS_MODULE module loading is still relaxed:

    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	ffffffffffffffff
    CapEff:	ffffffffffffffff
    CapBnd:	ffffffffffffffff
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    xfs                   745319  0

Reference: https://lkml.org/lkml/2011/2/24/203
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>

8909c9ad

tcp: ioctl type SIOCOUTQNSD returns amount of data not sent · 2f4e1b39

Committed by Mario Schuknecht on Mar 09, 2011

In contrast to SIOCOUTQ, which returns the amount of data sent
but not yet acknowledged plus data not yet sent, this patch only
returns the data not sent.

For various methods of live streaming bitrate control it may
be helpful to know how much data in the TCP outqueue has
not been sent yet.
Signed-off-by: Mario Schuknecht <m.schuknecht@dresearch.de>
Signed-off-by: Steffen Sledz <sledz@dresearch.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f4e1b39
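
The difference between the two ioctls can be sketched as sequence-number arithmetic. This is a minimal user-space illustration, not the kernel code: the struct and function names are hypothetical, and the three counters are assumed to behave like TCP's usual "oldest unacknowledged", "next to send", and "end of queued data" sequence numbers.

```c
#include <stdint.h>

/* Hypothetical snapshot of a TCP sender's sequence state. */
struct tcp_seq_sketch {
    uint32_t snd_una;   /* oldest byte sent but not yet acknowledged */
    uint32_t snd_nxt;   /* next byte that will be sent */
    uint32_t write_seq; /* one past the last byte queued by the application */
};

/* SIOCOUTQ-style answer: unacknowledged data plus data not yet sent. */
static uint32_t outq_bytes(const struct tcp_seq_sketch *tp)
{
    /* Unsigned subtraction stays correct across 32-bit wraparound. */
    return tp->write_seq - tp->snd_una;
}

/* SIOCOUTQNSD-style answer: only the data that has not been sent at all. */
static uint32_t outq_not_sent_bytes(const struct tcp_seq_sketch *tp)
{
    return tp->write_seq - tp->snd_nxt;
}
```

From user space, both values would be read with `ioctl()` on a connected TCP socket; a bitrate controller would typically watch the second value to see how far the application is running ahead of the wire.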

ipv4: Lookup multicast routes by rtable using helper. · ee3f1aaf

Committed by David S. Miller on Mar 09, 2011

Create a common helper for this operation, since we do
it identically in three spots.

Suggested by Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>

ee3f1aaf

ipv4: Fix erroneous uses of ifa_address. · 6c91afe1

Committed by David S. Miller on Mar 09, 2011

In usual cases ifa_address == ifa_local, but in the case where
SIOCSIFDSTADDR sets the destination address on a point-to-point
link, ifa_address gets set to that destination address.

Therefore we should use ifa_local when we want the local interface
address.

There were two cases where the selection was done incorrectly:

1) When devinet_ioctl() does matching, it checks ifa_address even
   though gifconf correctly reported ifa_local to the user.

2) IN_DEV_ARP_NOTIFY handling sends a gratuitous ARP using
   ifa_address instead of ifa_local.
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c91afe1
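
A small sketch of why the two fields diverge on a point-to-point link. The struct and helper names are hypothetical stand-ins; only the field names ifa_local and ifa_address come from the commit message.

```c
#include <stdint.h>

/* Hypothetical, reduced view of an interface address entry. */
struct ifaddr_sketch {
    uint32_t ifa_local;   /* our own address on the interface */
    uint32_t ifa_address; /* equals ifa_local unless a peer address is set */
};

/* Assigning the interface address sets both fields to the same value. */
static void sketch_set_addr(struct ifaddr_sketch *ifa, uint32_t addr)
{
    ifa->ifa_local = addr;
    ifa->ifa_address = addr;
}

/* Models SIOCSIFDSTADDR on a point-to-point link: only ifa_address changes. */
static void sketch_set_dstaddr(struct ifaddr_sketch *ifa, uint32_t peer)
{
    ifa->ifa_address = peer;
}

/* Code that wants "our" address (matching, gratuitous ARP) reads ifa_local. */
static uint32_t sketch_local_addr(const struct ifaddr_sketch *ifa)
{
    return ifa->ifa_local;
}
```

After `sketch_set_dstaddr()`, reading ifa_address would yield the peer's address, which is the mistake both fixed call sites were making.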

09 Mar, 2011 2 commits

inetpeer: Don't disable BH for initial fast RCU lookup. · 7b46ac4e

Committed by David S. Miller on Mar 08, 2011

If modifications on other cpus are ok, then modifications to
the tree during lookup done by the local cpu are ok too.
Signed-off-by: David S. Miller <davem@davemloft.net>

7b46ac4e

ipv4: Fix scope value used in route src-address caching. · a7ac8fc1

Committed by David S. Miller on Mar 08, 2011

We have to use cfg->fc_scope, not the final nh_scope value.
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7ac8fc1

08 Mar, 2011 3 commits

ipv4: Cache source address in nexthop entries. · 1fc050a1

Committed by David S. Miller on Mar 07, 2011

When doing output route lookups, we have to select the source address
if the user has not specified an explicit one.

First, if the route has an explicit preferred source address
specified, then we use that.

Otherwise we search the route's outgoing interface for a suitable
address.

This search can be precomputed and cached at route insertion time.

The only missing part is that we have to refresh this precomputed
value any time addresses are added or removed from the interface, and
this is accomplished by fib_update_nh_saddrs().
Signed-off-by: David S. Miller <davem@davemloft.net>

1fc050a1
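
The selection order and the caching idea can be sketched as follows. This is an illustrative user-space model under stated assumptions: all type and function names are hypothetical, "suitable address" is simplified to "first address on the interface", and the refresh helper only plays the role the message assigns to fib_update_nh_saddrs().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface with a small list of configured addresses. */
struct iface_sketch {
    uint32_t addrs[4];
    size_t naddrs;
};

/* Hypothetical nexthop entry carrying the precomputed source address. */
struct nexthop_sketch {
    const struct iface_sketch *dev;
    uint32_t prefsrc;      /* explicit preferred source, 0 if none */
    uint32_t cached_saddr; /* result of the interface search, cached */
};

/* Simplified search: pick the first address, or 0 if there is none. */
static uint32_t iface_pick_addr(const struct iface_sketch *dev)
{
    return dev->naddrs ? dev->addrs[0] : 0;
}

/* Must be called whenever addresses are added to or removed from dev. */
static void nh_refresh_saddr(struct nexthop_sketch *nh)
{
    nh->cached_saddr = iface_pick_addr(nh->dev);
}

/* Route insertion time: do the search once and remember the result. */
static void nh_init(struct nexthop_sketch *nh, const struct iface_sketch *dev,
                    uint32_t prefsrc)
{
    nh->dev = dev;
    nh->prefsrc = prefsrc;
    nh_refresh_saddr(nh);
}

/* Lookup time: user choice, then preferred source, then the cached value. */
static uint32_t nh_source(const struct nexthop_sketch *nh, uint32_t user_src)
{
    if (user_src)
        return user_src;
    if (nh->prefsrc)
        return nh->prefsrc;
    return nh->cached_saddr;
}
```

The trade-off is the usual one for caches: lookups no longer walk the interface's address list, but a missed refresh would leave a stale source address in place.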

ipv4: Inline fib_semantic_match into check_leaf · 3be0686b

Committed by David S. Miller on Mar 07, 2011

This eliminates a lot of pure overhead due to parameter
passing.
Signed-off-by: David S. Miller <davem@davemloft.net>

3be0686b

ipv4: Validate route entry type at insert instead of every lookup. · 4c8237cd

Committed by David S. Miller on Mar 07, 2011

fib_semantic_match() requires that if the type doesn't signal an
automatic error, it must be of type RTN_UNICAST, RTN_LOCAL,
RTN_BROADCAST, RTN_ANYCAST, or RTN_MULTICAST.

Checking this every route lookup is pointless work.

Instead validate it during route insertion, via fib_create_info().

Also, there was nothing making sure the type value was less than
RTN_MAX, so add that missing check while we're here.
Signed-off-by: David S. Miller <davem@davemloft.net>

4c8237cd
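
The move from per-lookup checks to insert-time validation can be sketched like this. It is a hedged illustration only: the enum, table, and function names are hypothetical, and the set of types is reduced to four, so it does not mirror the real RTN_* values or fib_create_info().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced set of route types. */
enum {
    RT_SK_UNSPEC,
    RT_SK_UNICAST,
    RT_SK_LOCAL,
    RT_SK_UNREACHABLE,
    RT_SK_MAX
};

/* Per-type properties: an automatic error, or a deliverable route. */
struct type_prop_sketch {
    int error;
    bool deliverable;
};

static const struct type_prop_sketch type_props[RT_SK_MAX] = {
    [RT_SK_UNSPEC]      = { 0, false },
    [RT_SK_UNICAST]     = { 0, true },
    [RT_SK_LOCAL]       = { 0, true },
    [RT_SK_UNREACHABLE] = { -EHOSTUNREACH, false },
};

struct route_sketch {
    unsigned int type;
};

/* Insert time: reject out-of-range types and types that are neither
 * automatic errors nor deliverable. Runs once per route. */
static int route_insert(struct route_sketch *rt, unsigned int type)
{
    if (type >= RT_SK_MAX)
        return -EINVAL;
    if (type_props[type].error == 0 && !type_props[type].deliverable)
        return -EINVAL;
    rt->type = type;
    return 0;
}

/* Lookup time: the type is known to be valid, so no re-validation. */
static int route_lookup_result(const struct route_sketch *rt)
{
    return type_props[rt->type].error;
}
```

The range check in `route_insert()` also protects the table index used at lookup time, which is the role of the added "less than RTN_MAX" check in the commit.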

05 Mar, 2011 5 commits

ipv4: Remove flowi from struct rtable. · 5e2b61f7

Committed by David S. Miller on Mar 04, 2011

The only necessary parts are the src/dst addresses, the
interface indexes, the TOS, and the mark.

The rest is unnecessary bloat, which amounts to nearly
50 bytes on 64-bit.
Signed-off-by: David S. Miller <davem@davemloft.net>

5e2b61f7

ipv4: Set rt->rt_iif more sanely on output routes. · 1018b5c0

Committed by David S. Miller on Mar 04, 2011

rt->rt_iif is only ever inspected on input routes; for example, DCCP
uses this to populate a route lookup flow key when generating replies
to another packet.

Therefore, setting it to anything other than zero on output routes
makes no sense.
Signed-off-by: David S. Miller <davem@davemloft.net>

1018b5c0

ipv4: Get peer more cheaply in rt_init_metrics(). · 3c0afdca

Committed by David S. Miller on Mar 04, 2011

We know this is a new route object, so doing atomics and
stuff makes no sense at all.
Signed-off-by: David S. Miller <davem@davemloft.net>

3c0afdca

ipv4: Optimize flow initialization in output route lookup. · 44713b67

Committed by David S. Miller on Mar 04, 2011

We burn a lot of useless cycles, cpu store buffer traffic, and
memory operations memset()'ing the on-stack flow used to perform
output route lookups in __ip_route_output_key().

Only the first half of the flow object members even matter for
output route lookups in this context, specifically:

FIB rules matching cares about:

	dst, src, tos, iif, oif, mark

FIB trie lookup cares about:

	dst

FIB semantic match cares about:

	tos, scope, oif

Therefore only initialize these specific members and elide the
memset entirely.

On Niagara2 this kills about ~300 cycles from the output route
lookup path.

Likely, we can take things further, since all callers of output
route lookups essentially throw away the on-stack flow they use.
So they don't care if we use it as a scratch-pad to compute the
final flow key.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

44713b67
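
The optimization can be sketched by comparing the two initialization styles. This is a minimal illustration, not the kernel's flow structure: the struct layout, the 64-byte tail, and all function names are assumptions chosen to show the idea.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key: lookup-relevant members first, then a large
 * tail that output route lookups never read. */
struct flow_sketch {
    uint32_t dst, src;
    int oif, iif;
    uint32_t mark;
    uint8_t tos, scope;
    uint8_t unused_tail[64];
};

/* Old style: zero the whole object, then overwrite the first half. */
static void flow_init_memset(struct flow_sketch *fl, uint32_t dst,
                             uint32_t src, uint8_t tos, int oif,
                             uint32_t mark)
{
    memset(fl, 0, sizeof(*fl));
    fl->dst = dst;
    fl->src = src;
    fl->tos = tos;
    fl->oif = oif;
    fl->mark = mark;
}

/* New style: store only the members the lookup reads, no memset. */
static void flow_init_minimal(struct flow_sketch *fl, uint32_t dst,
                              uint32_t src, uint8_t tos, int oif,
                              uint32_t mark)
{
    fl->dst = dst;
    fl->src = src;
    fl->tos = tos;
    fl->scope = 0;
    fl->iif = 0;
    fl->oif = oif;
    fl->mark = mark;
    /* unused_tail is deliberately left uninitialized. */
}

/* Compares exactly the members listed in the commit message. */
static int flow_keys_equal(const struct flow_sketch *a,
                           const struct flow_sketch *b)
{
    return a->dst == b->dst && a->src == b->src && a->tos == b->tos &&
           a->scope == b->scope && a->iif == b->iif && a->oif == b->oif &&
           a->mark == b->mark;
}
```

The minimal variant is only safe as long as no consumer reads the untouched tail, which is why the commit message enumerates what each lookup stage cares about.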

inetpeer: seqlock optimization · 65e8354e

Committed by Eric Dumazet on Mar 04, 2011

David noticed:

------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.

If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.

This makes the case of a create==0 lookup for a not-present entry
really expensive.  It is on the order of 600 cpu cycles on my
Niagara2.

I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.

This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------

One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.

After a failed avl tree lookup, we can easily detect if a writer made
some changes during our lookup. Taking the lock and redoing the lookup is
only necessary in this case.

Note: Add one private rcu_deref_locked() macro to place in one spot the
access to the spinlock included in the seqlock.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65e8354e
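
The control flow of the seqlock approach can be modeled in a few lines. This is a single-threaded sketch of the decision logic only: the names are hypothetical, a flat array stands in for the AVL tree, and a plain counter stands in for the kernel's seqlock primitives, so it is not a correct concurrent implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the peer base: a sequence counter and keys. */
struct peer_base_sketch {
    unsigned int seq;            /* bumped twice per write, odd mid-write */
    int keys[8];
    size_t nkeys;
    unsigned int locked_lookups; /* counts slow-path lookups, for the demo */
};

/* Reader samples the counter before its lockless lookup. */
static unsigned int read_begin(const struct peer_base_sketch *b)
{
    return b->seq;
}

/* True if a writer was active at, or has run since, read_begin(). */
static bool read_retry(const struct peer_base_sketch *b, unsigned int start)
{
    return (start & 1u) || b->seq != start;
}

static bool find_key(const struct peer_base_sketch *b, int key)
{
    for (size_t i = 0; i < b->nkeys; i++)
        if (b->keys[i] == key)
            return true;
    return false;
}

/* Writer: bump the counter before and after modifying the structure. */
static void insert_key(struct peer_base_sketch *b, int key)
{
    b->seq++;
    if (b->nkeys < 8)
        b->keys[b->nkeys++] = key;
    b->seq++;
}

/* Lookup without creation; "start" is the value from read_begin(). */
static bool lookup_nocreate(struct peer_base_sketch *b, int key,
                            unsigned int start)
{
    if (find_key(b, key))
        return true;        /* hit on the lockless path */
    if (!read_retry(b, start))
        return false;       /* miss and no writer ran: trust the miss */
    b->locked_lookups++;    /* real code would take the lock here */
    return find_key(b, key);
}
```

In the common case of a miss with no concurrent writer, the locked relookup is skipped entirely, which is where the saving described above comes from.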

04 Mar, 2011 2 commits

ipv4: Fix __ip_dev_find() to use ifa_local instead of ifa_address. · e066008b

Committed by David S. Miller on Mar 03, 2011

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

e066008b

ipv4: Fix crash in dst_release when udp_sendmsg route lookup fails. · 06dc94b1

Committed by David S. Miller on Mar 03, 2011

As reported by Eric:

[11483.697233] IP: [<c12b0638>] dst_release+0x18/0x60
 ...
[11483.697741] Call Trace:
[11483.697764]  [<c12fc9d2>] udp_sendmsg+0x282/0x6e0
[11483.697790]  [<c12a1c01>] ? memcpy_toiovec+0x51/0x70
[11483.697818]  [<c12dbd90>] ? ip_generic_getfrag+0x0/0xb0

The pointer passed to dst_release() is -EINVAL, that's because
we leave an error pointer in the local variable "rt" by accident.

NULL it out to fix the bug.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06dc94b1
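
The bug class described above, an error pointer reaching a cleanup path that only tolerates NULL, can be reproduced in user space. This is a sketch under stated assumptions: the ERR_PTR-style helpers are local stand-ins written for this example, and the route type and functions are hypothetical rather than the udp_sendmsg() code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for kernel-style error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical route object with a reference count. */
struct rt_sketch {
    int refcnt;
};

/* Hypothetical lookup: returns an error pointer on failure. */
static struct rt_sketch *lookup_route(struct rt_sketch *store, int fail)
{
    if (fail)
        return ERR_PTR(-EINVAL);
    store->refcnt++;
    return store;
}

/* Release helper that accepts NULL but would dereference an error
 * pointer, which is the crash shown in the trace. */
static void release_route(struct rt_sketch *rt)
{
    if (rt)
        rt->refcnt--;
}

/* Returns 0 or a negative errno; the shared exit path is always safe. */
static int send_sketch(struct rt_sketch *store, int fail)
{
    struct rt_sketch *rt;
    int err = 0;

    rt = lookup_route(store, fail);
    if (IS_ERR(rt)) {
        err = (int)PTR_ERR(rt);
        rt = NULL; /* the fix: never carry an error pointer to cleanup */
        goto out;
    }
    /* ... transmit using rt ... */
out:
    release_route(rt);
    return err;
}
```

Without the `rt = NULL` assignment, `release_route()` would treat the encoded -EINVAL as a valid object and write through it.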

03 Mar, 2011 3 commits

ipv4: ip_route_output_key() is better as an inline. · 5bfa787f

Committed by David S. Miller on Mar 02, 2011

This avoids a stack frame at zero cost.
Signed-off-by: David S. Miller <davem@davemloft.net>

5bfa787f

ipv4: Make output route lookup return rtable directly. · b23dd4fe

Committed by David S. Miller on Mar 02, 2011

Instead of on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>

b23dd4fe

xfrm: Return dst directly from xfrm_lookup() · 452edd59

Committed by David S. Miller on Mar 02, 2011

Instead of on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>

452edd59

02 Mar, 2011 13 commits

inet: Replace left-over references to inet->cork · 07df5294

Committed by Herbert Xu on Mar 01, 2011

The patch to replace inet->cork with cork left out two spots in
__ip_append_data that can result in bogus packet construction.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

07df5294

ipv4: Make icmp route lookup code a bit clearer. · f6d460cf

Committed by David S. Miller on Mar 01, 2011

The route lookup code in icmp_send() is slightly tricky as a result of
having to handle all of the requirements of RFC 4301 host relookups.

Pull the route resolution into a separate function, so that the error
handling and route reference counting are hopefully easier to see and
contained wholly within this new routine.
Signed-off-by: David S. Miller <davem@davemloft.net>

f6d460cf

xfrm: Handle blackhole route creation via afinfo. · 2774c131

Committed by David S. Miller on Mar 01, 2011

That way we don't have to potentially do this in every xfrm_lookup()
caller.
Signed-off-by: David S. Miller <davem@davemloft.net>

2774c131

xfrm: Kill XFRM_LOOKUP_WAIT flag. · 80c0bc9e

Committed by David S. Miller on Mar 01, 2011

This can be determined from the flow flags instead.
Signed-off-by: David S. Miller <davem@davemloft.net>

80c0bc9e

ipv4: Kill can_sleep arg to ip_route_output_flow() · 273447b3

Committed by David S. Miller on Mar 01, 2011

This boolean state is now available in the flow flags.
Signed-off-by: David S. Miller <davem@davemloft.net>

273447b3

net: Add FLOWI_FLAG_CAN_SLEEP. · 5df65e55

Committed by David S. Miller on Mar 01, 2011

And set it in contexts where the route resolution can sleep.
Signed-off-by: David S. Miller <davem@davemloft.net>

5df65e55

ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep" · 420d44da

Committed by David S. Miller on Mar 01, 2011

Since that is what the current vague "flags" argument means.
Signed-off-by: David S. Miller <davem@davemloft.net>

420d44da

ipv4: Can final ip_route_connect() arg to boolean "can_sleep". · abdf7e72

Committed by David S. Miller on Mar 01, 2011

Since that's what the current vague "flags" thing means.
Signed-off-by: David S. Miller <davem@davemloft.net>

abdf7e72

udp: Add lockless transmit path · 903ab86d

Committed by Herbert Xu on Mar 01, 2011

The UDP transmit path has been running under the socket lock
for a long time because of the corking feature.  This means that
transmitting to the same socket in multiple threads does not
scale at all.

However, as most users don't actually use corking, the locking
can be removed in the common case.

This patch creates a lockless fast path where corking is not used.

Please note that this does create a slight inaccuracy in the
enforcement of socket send buffer limits.  In particular, we
may exceed the socket limit by up to (number of CPUs) * (packet
size) because of the way the limit is computed.

As the primary purpose of socket buffers is to indicate congestion,
this should not be a great problem for now.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

903ab86d
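
The overshoot bound quoted above follows from a "check, then charge" pattern without a lock. The model below is an assumption-laden sketch, not the kernel's accounting: it supposes every sender compares the same stale usage value against the limit before any of them charges its packet, and the function name is hypothetical.

```c
#include <stddef.h>

/* Worst case for lockless senders: all of them pass the limit check
 * against the same stale "used" value, then each charges one packet. */
static size_t racy_charge(size_t used, size_t limit, unsigned int senders,
                          size_t pkt)
{
    unsigned int admitted = 0;

    /* Phase 1: every sender checks before anyone has charged. */
    for (unsigned int i = 0; i < senders; i++)
        if (used < limit)
            admitted++;

    /* Phase 2: every admitted sender adds its packet to the total. */
    return used + (size_t)admitted * pkt;
}
```

With one sender per CPU, the result can never exceed limit + (number of CPUs) * (packet size), which matches the inaccuracy the commit message accepts.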

udp: Switch to ip_finish_skb · f6b9664f

Committed by Herbert Xu on Mar 01, 2011

This patch converts UDP to use the new ip_finish_skb API.  This
would then allow us to more easily use ip_make_skb, which allows
UDP to run without a socket lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6b9664f

inet: Add ip_make_skb and ip_finish_skb · 1c32c5ad

Committed by Herbert Xu on Mar 01, 2011

This patch adds the helper ip_make_skb which is like ip_append_data
and ip_push_pending_frames all rolled into one, except that it does
not send the skb produced.  The sending part is carried out by
ip_send_skb, which the transport protocol can call after it has
tweaked the skb.

It is meant to be called in cases where corking is not used and should
have a one-to-one correspondence to sendmsg.

This patch also adds the helper ip_finish_skb which is meant to
replace ip_push_pending_frames when corking is required.
Previously the protocol stack would peek at the socket write
queue and add its header to the first packet.  With ip_finish_skb,
the protocol stack can directly operate on the final skb instead,
just like the non-corking case with ip_make_skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c32c5ad

inet: Remove explicit write references to sk/inet in ip_append_data · 1470ddf7

Committed by Herbert Xu on Mar 01, 2011

In order to allow simultaneous calls to ip_append_data on the same
socket, it must not modify any shared state in sk or inet (other
than those that are designed to allow that such as atomic counters).

This patch abstracts out write references to sk and inet_sk in
ip_append_data and its friends so that we may use the underlying
code in parallel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1470ddf7

inet: Remove unused sk_sndmsg_* from UFO · 5a2ef920

Committed by Herbert Xu on Mar 01, 2011

UFO doesn't really use the sk_sndmsg_* parameters so touching
them is pointless.  It can't use them anyway since the whole
point of UFO is to use the original pages without copying.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a2ef920

25 Feb, 2011 1 commit

ipv4: Rearrange how ip_route_newports() gets port keys. · dca8b089

Committed by David S. Miller on Feb 24, 2011

ip_route_newports() is the only place in the entire kernel that
cares about the port members in the routing cache entry's lookup
flow key.

Therefore the only reason we store an entire flow inside of the
struct rtentry is for this one special case.

Rewrite ip_route_newports() such that:

1) The caller passes in the original port values, so we don't need
   to use the rth->fl.fl_ip_{s,d}port values to remember them.

2) The lookup flow is constructed by hand instead of being copied
   from the routing cache entry's flow.
Signed-off-by: David S. Miller <davem@davemloft.net>

dca8b089

openanolis / cloud-kernel · synced successfully more than a year ago
