提交 · 6e34a8b37aca63f109bf990d46131ee07206f5f1 · openeuler / raspberrypi-kernel

30 1月, 2013 3 次提交

net: cacheline adjust struct inet_frag_queue · 6e34a8b3

由 Jesper Dangaard Brouer 提交于 1月 28, 2013

Fragmentation code cacheline adjusting of struct inet_frag_queue.

Take advantage of the size of struct timer_list, and move all but
spinlock_t lock, below the timer struct.  On 64-bit 'lru_list',
'list' and 'refcnt', fits exactly into the next cacheline, and a
new cacheline starts at 'fragments'.

The netns_frags *net pointer is moved to the end of the struct,
because its used in a compare, with "next/close-by" elements of
which this struct is embedded into.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e34a8b3

net: cacheline adjust struct inet_frags for better frag performance · 5f8e1e8b

由 Jesper Dangaard Brouer 提交于 1月 28, 2013

The globally shared rwlock, of struct inet_frags, shares
cacheline with the 'rnd' number, which is used by the hash
calculations.  Fix this, as this obviously is a bad idea, as
unnecessary cache-misses will occur when accessing the 'rnd'
number.

Also small note that, moving function ptr (*match) up in struct,
is to avoid it lands on the next cacheline (on 64-bit).
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f8e1e8b

net: cacheline adjust struct netns_frags for better frag performance · cd39a789

由 Jesper Dangaard Brouer 提交于 1月 28, 2013

This small cacheline adjustment of struct netns_frags improves
performance significantly for the fragmentation code.

Struct members 'lru_list' and 'mem' are both hot elements, and it
hurts performance, due to cacheline bouncing at every call point,
when they share a cacheline.  Also notice, how mem is placed
together with 'high_thresh' and 'low_thresh', as they are used in
the compare operations together.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd39a789

29 1月, 2013 1 次提交

net neigh: Optimize neighbor entry size calculation. · 08433eff

由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 24, 2013

When allocating memory for neighbour cache entry, if
tbl->entry_size is not set, we always calculate
sizeof(struct neighbour) + tbl->key_len, which is common
in the same table.

With this change, set tbl->entry_size during the table
initialization phase, if it was not set, and use it in
neigh_alloc() and neighbour_priv().

This change also allow us to have both of protocol private
data and device priate data at tha same time.

Note that the only user of prototcol private is DECnet
and the only user of device private is ATM CLIP.
Since those are exclusive, we have not been facing issues
here.
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

08433eff

28 1月, 2013 3 次提交

net: add RCU annotation to sk_dst_cache field · 0e36cbb3

由 Cong Wang 提交于 1月 22, 2013

sock->sk_dst_cache is protected by RCU.
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <amwang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e36cbb3

decnet: use correct RCU API to deref sk_dst_cache field · cec771d6

由 Cong Wang 提交于 1月 22, 2013

sock->sk_dst_cache is protected by RCU, therefore we should
use __sk_dst_get() to deref it once we lock the sock.

This fixes several sparse warnings.

Cc: linux-decnet-user@lists.sourceforge.net
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <amwang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cec771d6

gro: Fix kcalloc argument order · a1b1add0

由 Joe Perches 提交于 1月 26, 2013

First number, then size.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1b1add0

24 1月, 2013 3 次提交

soreuseport: TCP/IPv6 implementation · 5ba24953

由 Tom Herbert 提交于 1月 22, 2013

Motivation for soreuseport would be something like a web server
binding to port 80 running with multiple threads, where each thread
might have it's own listener socket.  This could be done as an
alternative to other models: 1) have one listener thread which
dispatches completed connections to workers. 2) accept on a single
listener socket from multiple threads.  In case #1 the listener thread
can easily become the bottleneck with high connection turn-over rate.
In case #2, the proportion of connections accepted per thread tends
to be uneven under high connection load (assuming simple event loop:
while (1) { accept(); process() }, wakeup does not promote fairness
among the sockets.  We have seen the  disproportion to be as high
as 3:1 ratio between thread accepting most connections and the one
accepting the fewest.  With so_reusport the distribution is
uniform.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ba24953

soreuseport: TCP/IPv4 implementation · da5e3630

由 Tom Herbert 提交于 1月 22, 2013

Allow multiple listener sockets to bind to the same port.

Motivation for soresuseport would be something like a web server
binding to port 80 running with multiple threads, where each thread
might have it's own listener socket.  This could be done as an
alternative to other models: 1) have one listener thread which
dispatches completed connections to workers. 2) accept on a single
listener socket from multiple threads.  In case #1 the listener thread
can easily become the bottleneck with high connection turn-over rate.
In case #2, the proportion of connections accepted per thread tends
to be uneven under high connection load (assuming simple event loop:
while (1) { accept(); process() }, wakeup does not promote fairness
among the sockets.  We have seen the  disproportion to be as high
as 3:1 ratio between thread accepting most connections and the one
accepting the fewest.  With so_reusport the distribution is
uniform.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

da5e3630

soreuseport: infrastructure · 055dc21a

由 Tom Herbert 提交于 1月 22, 2013

Definitions and macros for implementing soreusport.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

055dc21a

23 1月, 2013 11 次提交

netfilter: nf_conntrack: refactor l4proto support for netns · c296bb4d

由 Gao feng 提交于 1月 23, 2013

Move the code that register/unregister l4proto to the
module_init/exit context.

Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:

nf_ct_l4proto_register
nf_ct_l4proto_pernet_register
nf_ct_l4proto_unregister
nf_ct_l4proto_pernet_unregister

We same many line breaks with it.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c296bb4d

netfilter: nf_conntrack: refactor l3proto support for netns · 6330750d

由 Gao feng 提交于 1月 21, 2013

Move the code that register/unregister l3proto to the
module_init/exit context.

Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:

nf_ct_l3proto_register
nf_ct_l3proto_pernet_register
nf_ct_l3proto_unregister
nf_ct_l3proto_pernet_unregister

We same many line breaks with it.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

6330750d

netfilter: nf_ct_proto: move initialization out of pernet_operations · 04d87001

由 Gao feng 提交于 1月 21, 2013

Move the global initial codes to the module_init/exit context.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

04d87001

netfilter: nf_ct_labels: move initialization out of pernet_operations · 5f69b8f5

由 Gao feng 提交于 1月 21, 2013

Move the global initial codes to the module_init/exit context.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

5f69b8f5

netfilter: nf_ct_helper: move initialization out of pernet_operations · 5e615b22

由 Gao feng 提交于 1月 21, 2013

Move the global initial codes to the module_init/exit context.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

5e615b22

netfilter: nf_ct_timeout: move initialization out of pernet_operations · 8684094c

由 Gao feng 提交于 1月 21, 2013

Move the global initial codes to the module_init/exit context.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

8684094c

netfilter: nf_ct_ecache: move initialization out of pernet_operations · 3fe0f943

由 Gao feng 提交于 1月 21, 2013

Move the global initial codes to the module_init/exit context.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

3fe0f943

netfilter: nf_ct_tstamp: move initialization out of pernet_operations · 73f4001a

由 Gao feng 提交于 1月 21, 2013

Move the global initial codes to the module_init/exit context.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

73f4001a

netfilter: nf_ct_acct: move initialization out of pernet_operations · b7ff3a1f

由 Gao feng 提交于 1月 21, 2013

Move the global initial codes to the module_init/exit context.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

b7ff3a1f

netfilter: nf_ct_expect: move initialization out of pernet_operations · 83b4dbe1

由 Gao feng 提交于 1月 21, 2013

Move the global initial codes to the module_init/exit context.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

83b4dbe1

netfilter: nf_conntrack: move initialization out of pernet operations · f94161c1

由 Gao feng 提交于 1月 21, 2013

nf_conntrack initialization and cleanup codes happens in pernet
operations function. This task should be done in module_init/exit.
We can't use init_net to identify if it's the right time to initialize
or cleanup since we cannot make assumption on the order netns are
created/destroyed.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

f94161c1

22 1月, 2013 2 次提交

ipv6: Unshare ip6_nd_hdr() and change return type to void. · 2576f17d

由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 21, 2013

- move ip6_nd_hdr() to its users' source files.
  In net/ipv6/mcast.c, it will be called ip6_mc_hdr().
- make return type to void since this function never fails.
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2576f17d

ndisc: Move ndisc_opt_addr_space() to include/net/ndisc.h. · c558e9fc

由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 21, 2013

This also makes ndisc_opt_addr_data() and ndisc_fill_addr_option()
use ndisc_opt_addr_space().
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c558e9fc

21 1月, 2013 5 次提交

xfrm: Remove unused defines · 02bfd8ec

由 Steffen Klassert 提交于 11月 27, 2012

XFRM_REPLAY_SEQ, XFRM_REPLAY_OSEQ and XFRM_REPLAY_SEQ_MASK
were introduced years ago but actually never used.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

02bfd8ec

Y
ipv6: Optimize ipv6_addr_is_ll_all_{nodes,routers}(). · d1641565
由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 20, 2013
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
d1641565
Y
ipv6: Optimize ipv6_addr_is_solict_mult(). · 9d100774
由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 20, 2013
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
9d100774
Y
ipv6: Introduce ipv6_addr_is_solict_mult() to check Solicited Node Multicast Addresses. · ca97a644
由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 20, 2013
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
ca97a644

ipv6: Make ipv6_addr_is_XXX() return boolean. · b27b28cb

由 YOSHIFUJI Hideaki 提交于 1月 21, 2013

ipv6_addr_is_{multicast,ll_all_nodes,ll_all_routers,isatap}()
return boolean.
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b27b28cb

19 1月, 2013 1 次提交

ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers. · 12fd84f4

由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 18, 2013

Because of rt->n removal, we do not need neigh argument any more.
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12fd84f4

18 1月, 2013 8 次提交

ipv6: Complete neighbour entry removal from dst_entry. · 887c95cc

由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 17, 2013

CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

887c95cc

ipv6: Introduce rt6_nexthop() to select nexthop address. · 9bb5a148

由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 17, 2013

For RTF_GATEWAY route, return rt->rt6i_gateway.
Otherwise, return 2nd argument (destination address).

This will be used by following patches which remove rt->n
dependency patches in ip6_dst_lookup_tail() and ip6_finish_output2().
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bb5a148

ndisc: Introduce __ipv6_neigh_lookup_noref(). · ac3175fe

由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 17, 2013

This function, which looks up neighbour entry for an IPv6 address
without touching refcnt, will be used for patches to remove
dependency on rt->n (neighbour entry in rt6_info).
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac3175fe

ndisc: Remove tbl argument for __ipv6_neigh_lookup(). · 8e022ee6

由 YOSHIFUJI Hideaki / 吉藤英明提交于 1月 17, 2013

We can refer to nd_tbl directly.
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e022ee6

netfilter: ctnetlink: allow userspace to modify labels · 9b21f6a9

由 Florian Westphal 提交于 1月 11, 2013

Add the ability to set/clear labels assigned to a conntrack
via ctnetlink.

To allow userspace to only alter specific bits, Pablo suggested to add
a new CTA_LABELS_MASK attribute:

The new set of active labels is then determined via

active = (active & ~mask) ^ changeset

i.e., the mask selects those bits in the existing set that should be
changed.

This follows the same method already used by MARK and CONNMARK targets.

Omitting CTA_LABELS_MASK is the same as setting all bits in CTA_LABELS_MASK
to 1: The existing set is replaced by the one from userspace.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

9b21f6a9

netfilter: add connlabel conntrack extension · c539f017

由 Florian Westphal 提交于 1月 11, 2013

similar to connmarks, except labels are bit-based; i.e.
all labels may be attached to a flow at the same time.

Up to 128 labels are supported.  Supporting more labels
is possible, but requires increasing the ct offset delta
from u8 to u16 type due to increased extension sizes.

Mapping of bit-identifier to label name is done in userspace.

The extension is enabled at run-time once "-m connlabel" netfilter
rules are added.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c539f017

ipv6: fix ipv6_prefix_equal64_half mask conversion · 512613d7

由 Fabio Baltieri 提交于 1月 16, 2013

Fix the 64bit optimized version of ipv6_prefix_equal to convert the
bitmask to network byte order only after the bit-shift.

The bug was introduced in:

38675170 ipv6: 64bit version of ipv6_prefix_equal().
Signed-off-by: NFabio Baltieri <fabio.baltieri@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

512613d7

net: increase fragment memory usage limits · c2a93660

由 Jesper Dangaard Brouer 提交于 1月 15, 2013

Increase the amount of memory usage limits for incomplete
IP fragments.

Arguing for new thresh high/low values:

 High threshold = 4 MBytes
 Low  threshold = 3 MBytes

The fragmentation memory accounting code, tries to account for the
real memory usage, by measuring both the size of frag queue struct
(inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize.

We want to be able to handle/hold-on-to enough fragments, to ensure
good performance, without causing incomplete fragments to hurt
scalability, by causing the number of inet_frag_queue to grow too much
(resulting longer searches for frag queues).

For IPv4, how much memory does the largest frag consume.

Maximum size fragment is 64K, which is approx 44 fragments with
MTU(1500) sized packets. Sizeof(struct ipq) is 200.  A 1500 byte
packet results in a truesize of 2944 (not 2048 as I first assumed)

  (44*2944)+200 = 129736 bytes

The current default high thresh of 262144 bytes, is obviously
problematic, as only two 64K fragments can fit in the queue at the
same time.

How many 64K fragment can we fit into 4 MBytes:

  4*2^20/((44*2944)+200) = 32.34 fragment in queues

An attacker could send a separate/distinct fake fragment packets per
queue, causing us to allocate one inet_frag_queue per packet, and thus
attacking the hash table and its lists.

How many frag queue do we need to store, and given a current hash size
of 64, what is the average list length.

Using one MTU sized fragment per inet_frag_queue, each consuming
(2944+200) 3144 bytes.

  4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length

An attack could send small fragments, the smallest packet I could send
resulted in a truesize of 896 bytes (I'm a little surprised by this).

  4*2^20/(896+200)  = 3827 frag queues -> 59 avg list length

When increasing these number, we also need to followup with
improvements, that is going to help scalability.  Simply increasing
the hash size, is not enough as the current implementation does not
have a per hash bucket locking.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2a93660

17 1月, 2013 3 次提交

sk-filter: Add ability to lock a socket filter program · d59577b6

由 Vincent Bernat 提交于 1月 16, 2013

While a privileged program can open a raw socket, attach some
restrictive filter and drop its privileges (or send the socket to an
unprivileged program through some Unix socket), the filter can still
be removed or modified by the unprivileged program. This commit adds a
socket option to lock the filter (SO_LOCK_FILTER) preventing any
modification of a socket filter program.

This is similar to OpenBSD BIOCLOCK ioctl on bpf sockets, except even
root is not allowed change/drop the filter.

The state of the lock can be read with getsockopt(). No error is
triggered if the state is not changed. -EPERM is returned when a user
tries to remove the lock or to change/remove the filter while the lock
is active. The check is done directly in sk_attach_filter() and
sk_detach_filter() and does not affect only setsockopt() syscall.
Signed-off-by: NVincent Bernat <bernat@luffy.cx>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d59577b6

ipv6: Fix endianess warning in ip6_flow_hdr(). · 07f623d3

由 YOSHIFUJI Hideaki 提交于 1月 17, 2013

Commit 3e4e4c1f ("ipv6: Introduce ip6_flow_hdr() to fill version,
tclass and flowlabel.) uses ntohl(), which should be htonl().

Found by Fengguang Wu <fengguang.wu@intel.com>.
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07f623d3

cfg80211: check radar interface combinations · 11c4a075

由 Simon Wunderlich 提交于 1月 08, 2013

To ease further DFS development regarding interface combinations, use
the interface combinations structure to test for radar capabilities.
Drivers can specify which channel widths they support, and in which
modes. Right now only a single AP interface is allowed, but as the
DFS code evolves other combinations can be enabled.
Signed-off-by: NSimon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

11c4a075