1. 30 Jan 2013 (4 commits)
    • ipv4: introduce address lifetime · 5c766d64
      Jiri Pirko authored
      There are some use cases where a lifetime for ipv4 addresses might be
      helpful. For example:
      1) initramfs networkmanager uses a DHCP daemon to learn network
      configuration parameters
      2) initramfs networkmanager applies the addresses, routes and DNS
      configuration
      3) initramfs networkmanager is requested to stop
      4) initramfs networkmanager stops all daemons, including dhclient
      5) there are addresses and routes configured but no daemon running. If
      the system doesn't start networkmanager for some reason, the addresses
      and routes will be used forever, which violates RFC 2131.
      
      This patch is essentially a backport of the ipv6 address lifetime
      mechanism for ipv4 addresses.
      
      Current "ip" tool supports this without any patch (since it does not
      distinguish between ipv4 and ipv6 addresses in this perspective.
      
      Also, this should be back-compatible with all current netlink users.
      Reported-by: Pavel Šimerda <psimerda@redhat.com>
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
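      As an illustration (not part of the commit itself), here is a minimal
      userspace sketch of requesting an ipv4 address with a finite lifetime
      over rtnetlink, the interface this patch teaches the kernel to honour
      for ipv4.  The ifindex, address and lifetime values are made up, and
      error handling is kept minimal:

          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>
          #include <arpa/inet.h>
          #include <sys/socket.h>
          #include <linux/netlink.h>
          #include <linux/rtnetlink.h>
          #include <linux/if_addr.h>

          /* Append one attribute to a netlink message. */
          static void add_attr(struct nlmsghdr *nlh, int type,
                               const void *data, int len)
          {
              struct rtattr *rta = (struct rtattr *)
                  ((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
              rta->rta_type = type;
              rta->rta_len = RTA_LENGTH(len);
              memcpy(RTA_DATA(rta), data, len);
              nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) +
                               RTA_ALIGN(rta->rta_len);
          }

          int main(void)
          {
              char buf[512];
              struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
              struct ifaddrmsg *ifa;
              struct in_addr a;
              /* Finite lifetimes: the attribute ipv6 already honoured. */
              struct ifa_cacheinfo ci = { .ifa_prefered = 3600,
                                          .ifa_valid = 7200 };
              struct sockaddr_nl dst = { .nl_family = AF_NETLINK };
              int fd;

              memset(buf, 0, sizeof(buf));
              nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifa));
              nlh->nlmsg_type = RTM_NEWADDR;
              nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;

              ifa = NLMSG_DATA(nlh);
              ifa->ifa_family = AF_INET;
              ifa->ifa_prefixlen = 24;
              ifa->ifa_index = 2;            /* made-up ifindex */

              inet_pton(AF_INET, "192.0.2.10", &a);
              add_attr(nlh, IFA_LOCAL, &a, sizeof(a));
              add_attr(nlh, IFA_CACHEINFO, &ci, sizeof(ci));

              fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
              if (fd < 0 || sendto(fd, nlh, nlh->nlmsg_len, 0,
                                   (struct sockaddr *)&dst,
                                   sizeof(dst)) < 0) {
                  perror("rtnetlink");
                  return 1;
              }
              close(fd);
              return 0;
          }

      With the stock "ip" tool the equivalent is simply
      "ip addr add 192.0.2.10/24 dev eth0 valid_lft 7200 preferred_lft 3600".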
    • net: frag, move LRU list maintenance outside of rwlock · 3ef0eb0d
      Jesper Dangaard Brouer authored
      Updating the fragmentation queues' LRU (Least-Recently-Used) list
      required taking the hash writer lock.  However, the LRU list isn't
      tied to the hash at all, so we can use a separate lock for it.
      Original-idea-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
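      The resulting pattern, roughly (a sketch; names are shortened from
      the kernel's inet_frag helpers):

          #include <linux/list.h>
          #include <linux/spinlock.h>

          /* The LRU list gets its own lock, so the hash rwlock no longer
           * needs to be write-locked just to bump a queue to the tail. */
          struct frag_lru {
              spinlock_t       lru_lock;  /* protects lru_list only */
              struct list_head lru_list;  /* oldest queues at the head */
          };

          static inline void frag_lru_move(struct frag_lru *f,
                                           struct list_head *q)
          {
              spin_lock(&f->lru_lock);
              list_move_tail(q, &f->lru_list);  /* most-recently-used */
              spin_unlock(&f->lru_lock);
          }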
    • net: use lib/percpu_counter API for fragmentation mem accounting · 6d7b857d
      Jesper Dangaard Brouer authored
      Replace the per-network-namespace shared atomic "mem" accounting
      variable in the fragmentation code with a lib/percpu_counter.
      
      Getting percpu_counter to scale to the fragmentation code usage
      requires some tweaks.
      
      At first glance, percpu_counter looks superfast, but it does not
      scale on multi-CPU/NUMA machines, because the default batch size
      is too small for the frag code's usage pattern.  Thus, I have
      adjusted the batch size by using __percpu_counter_add() directly,
      instead of percpu_counter_sub() and percpu_counter_add().
      
      The batch size is increased to 130,000, based on the largest 64K
      fragment's memory usage.  This does introduce some imprecision into
      the memory accounting, but it does not need to be strict for this
      use-case.
      
      It is also essential that the percpu_counter does not share a
      cacheline with other writers, in order for this to scale.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
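      A sketch of the pattern described above (helper names are
      abbreviated from the actual kernel code):

          #include <linux/percpu_counter.h>

          /* A large batch keeps each CPU on its local counter and avoids
           * bouncing the shared cacheline on every add/sub; 130000 covers
           * the largest 64K fragment's memory usage. */
          #define FRAG_MEM_BATCH 130000

          static struct percpu_counter frag_mem; /* percpu_counter_init()
                                                  * at setup time */

          static inline void frag_mem_add(unsigned int bytes)
          {
              __percpu_counter_add(&frag_mem, bytes, FRAG_MEM_BATCH);
          }

          static inline void frag_mem_sub(unsigned int bytes)
          {
              __percpu_counter_add(&frag_mem, -(s64)bytes, FRAG_MEM_BATCH);
          }

          static inline s64 frag_mem_read(void)
          {
              /* Fast but imprecise: may be off by up to
               * num_online_cpus() * FRAG_MEM_BATCH. */
              return percpu_counter_read(&frag_mem);
          }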
    • net: frag helper functions for mem limit tracking · d433673e
      Jesper Dangaard Brouer authored
      This change is primarily a preparation to ease the extension of
      memory limit tracking.
      
      The change does reduce the number of atomic operations during the
      freeing of a frag queue.  This yields some performance improvement,
      as these atomic operations are at the core of the performance
      problems seen on NUMA systems.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
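      The key trick during queue free, sketched (function name is
      illustrative):

          #include <linux/skbuff.h>
          #include <linux/atomic.h>

          /* Sum the truesize of all fragments locally and charge the
           * memory accounting once, instead of one atomic op per skb. */
          static void frag_queue_free_skbs(struct sk_buff *fp,
                                           atomic_t *mem)
          {
              unsigned int sum_truesize = 0;

              while (fp) {
                  struct sk_buff *xp = fp->next;

                  sum_truesize += fp->truesize;  /* local, no atomics */
                  kfree_skb(fp);
                  fp = xp;
              }
              atomic_sub(sum_truesize, mem);     /* single atomic update */
          }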
  2. 29 Jan 2013 (5 commits)
  3. 28 Jan 2013 (3 commits)
    • net: fix possible wrong checksum generation · cef401de
      Eric Dumazet authored
      Pravin Shelar mentioned that GSO could potentially generate a
      wrong TX checksum if an skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested linearizing skbs, but this extra copy can be
      avoided for normal tcp skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
      in skb_shinfo(skb)->gso_type if at least one frag can be
      modified by the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and macvtap/tun/virtio_net drivers.
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      Performance of the SENDFILE test is impacted by the extra allocation
      and copy, and because it uses order-0 pages, while the TCP_STREAM
      test uses bigger pages.
      Reported-by: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
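      The flag usage, sketched (SKB_GSO_SHARED_FRAG is the flag this
      patch adds; the helper names here are illustrative):

          #include <linux/skbuff.h>

          /* A path handing user-owned pages to the stack (splice,
           * sendfile, macvtap/tun/virtio_net) marks the skb: */
          static void mark_shared_frags(struct sk_buff *skb)
          {
              skb_shinfo(skb)->gso_type |= SKB_GSO_SHARED_FRAG;
          }

          /* The segmentation side then copies the payload instead of
           * sharing the fragments whenever the flag is set: */
          static bool gso_frags_may_change(const struct sk_buff *skb)
          {
              return skb_shinfo(skb)->gso_type & SKB_GSO_SHARED_FRAG;
          }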
    • decnet: use correct RCU API to deref sk_dst_cache field · cec771d6
      Cong Wang authored
      sock->sk_dst_cache is protected by RCU; therefore we should use
      __sk_dst_get() to dereference it once we have locked the sock.
      
      This fixes several sparse warnings.
      
      Cc: linux-decnet-user@lists.sourceforge.net
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
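      The pattern, sketched (the function name is illustrative):

          #include <net/sock.h>
          #include <net/dst.h>

          /* With the socket lock held, __sk_dst_get() is the sparse-clean
           * way to read the RCU-protected sk_dst_cache pointer. */
          static struct dst_entry *get_dst_locked(struct sock *sk)
          {
              struct dst_entry *dst;

              lock_sock(sk);
              dst = __sk_dst_get(sk);  /* rcu_dereference under lock */
              if (dst)
                  dst_hold(dst);       /* our own ref, outlives unlock */
              release_sock(sk);
              return dst;
          }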
    • irda: buffer overflow in irnet_ctrl_read() · 4bf613c6
      Dan Carpenter authored
      The comments here say /* Max event is 61 char */, but in 2003 we
      changed the event format and now the max event size is 75.  The
      longest event is:
      
      	"Discovered %08x (%s) behind %08x {hints %02X-%02X}\n",
               12345678901    23  456789012    34567890    1    2 3
      	            +8    +21        +8          +2   +2     +1
               = 75 characters.
      
      There was a check to return -EOVERFLOW if the user gave us a "count"
      value of less than 64.  Raising the limit to 75 might break backwards
      compatibility.  Instead, I removed the check, and now a truncated
      string is returned if "count" is too low.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
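      The new behaviour, sketched (the helper name is illustrative):

          #include <linux/kernel.h>
          #include <linux/string.h>
          #include <linux/uaccess.h>

          /* Instead of rejecting short reads with -EOVERFLOW, copy at
           * most "count" bytes; userspace sees a truncated event. */
          static ssize_t event_read(char __user *buf, size_t count,
                                    const char *event)
          {
              size_t len = min(count, strlen(event));

              if (copy_to_user(buf, event, len))
                  return -EFAULT;
              return len;
          }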
  4. 27 Jan 2013 (1 commit)
  5. 26 Jan 2013 (6 commits)
  6. 24 Jan 2013 (5 commits)
    • soreuseport: UDP/IPv6 implementation · 72289b96
      Tom Herbert authored
      The motivation for soreuseport would be something like a DNS server.
      An alternative would be to recv on the same socket from multiple
      threads.  As in the case of TCP, the load across these threads tends
      to be disproportionate, and we also see a lot of contention on the
      socket lock.  Note that SO_REUSEADDR already allows multiple UDP
      sockets to bind to the same port; however, there is no provision to
      prevent hijacking and nothing to distribute packets across all the
      sockets sharing the same bound port.  This patch does not change the
      semantics of SO_REUSEADDR, but provides usable functionality of it
      for unicast.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
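      Userspace usage, sketched; each worker thread or process opens its
      own socket like this, and the kernel spreads incoming datagrams
      across them (the port value is made up):

          #include <stdio.h>
          #include <string.h>
          #include <sys/socket.h>
          #include <netinet/in.h>

          static int bound_udp6_socket(unsigned short port)
          {
              int fd = socket(AF_INET6, SOCK_DGRAM, 0);
              int one = 1;
              struct sockaddr_in6 addr;

              /* Must be set on every socket before bind(). */
              setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

              memset(&addr, 0, sizeof(addr));
              addr.sin6_family = AF_INET6;
              addr.sin6_addr = in6addr_any;
              addr.sin6_port = htons(port);
              if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                  perror("bind");
                  return -1;
              }
              return fd;  /* each worker loops on recvfrom(fd, ...) */
          }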
    • soreuseport: TCP/IPv6 implementation · 5ba24953
      Tom Herbert authored
      The motivation for soreuseport would be something like a web server
      binding to port 80 running with multiple threads, where each thread
      might have its own listener socket.  This could be done as an
      alternative to other models: 1) have one listener thread which
      dispatches completed connections to workers; 2) accept on a single
      listener socket from multiple threads.  In case #1 the listener
      thread can easily become the bottleneck with a high connection
      turn-over rate.  In case #2, the proportion of connections accepted
      per thread tends to be uneven under high connection load (assuming a
      simple event loop: while (1) { accept(); process(); }); wakeup does
      not promote fairness among the sockets.  We have seen the
      disproportion to be as high as a 3:1 ratio between the thread
      accepting the most connections and the one accepting the fewest.
      With so_reuseport the distribution is uniform.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
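      The per-thread listener, sketched (the port and backlog values are
      made up):

          #include <stdio.h>
          #include <string.h>
          #include <sys/socket.h>
          #include <netinet/in.h>

          /* Every thread creates its own listener bound to the same port;
           * accepted connections are spread evenly among them. */
          static int reuseport_listener(unsigned short port)
          {
              int fd = socket(AF_INET6, SOCK_STREAM, 0);
              int one = 1;
              struct sockaddr_in6 addr;

              setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

              memset(&addr, 0, sizeof(addr));
              addr.sin6_family = AF_INET6;
              addr.sin6_addr = in6addr_any;
              addr.sin6_port = htons(port);
              if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
                  listen(fd, 128) < 0) {
                  perror("listener");
                  return -1;
              }
              return fd;  /* each thread then loops on accept(fd, ...) */
          }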
    • soreuseport: UDP/IPv4 implementation · ba418fa3
      Tom Herbert authored
      Allow multiple UDP sockets to bind to the same port.
      
      The motivation for soreuseport would be something like a DNS server.
      An alternative would be to recv on the same socket from multiple
      threads.  As in the case of TCP, the load across these threads tends
      to be disproportionate, and we also see a lot of contention on the
      socket lock.  Note that SO_REUSEADDR already allows multiple UDP
      sockets to bind to the same port; however, there is no provision to
      prevent hijacking and nothing to distribute packets across all the
      sockets sharing the same bound port.  This patch does not change the
      semantics of SO_REUSEADDR, but provides usable functionality of it
      for unicast.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: TCP/IPv4 implementation · da5e3630
      Tom Herbert authored
      Allow multiple listener sockets to bind to the same port.
      
      The motivation for soreuseport would be something like a web server
      binding to port 80 running with multiple threads, where each thread
      might have its own listener socket.  This could be done as an
      alternative to other models: 1) have one listener thread which
      dispatches completed connections to workers; 2) accept on a single
      listener socket from multiple threads.  In case #1 the listener
      thread can easily become the bottleneck with a high connection
      turn-over rate.  In case #2, the proportion of connections accepted
      per thread tends to be uneven under high connection load (assuming a
      simple event loop: while (1) { accept(); process(); }); wakeup does
      not promote fairness among the sockets.  We have seen the
      disproportion to be as high as a 3:1 ratio between the thread
      accepting the most connections and the one accepting the fewest.
      With so_reuseport the distribution is uniform.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: infrastructure · 055dc21a
      Tom Herbert authored
      Definitions and macros for implementing soreuseport.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 23 Jan 2013 (16 commits)