Commits · 9dfa9a27b620640322588df399eb8f624b48d877 · openanolis / cloud-kernel

13 Nov, 2014 · 4 commits

irda: Fix build failures after IRDA_DEBUG->pr_debug · a768851f

Committed by Joe Perches on Nov 12, 2014

Fix the build failures that result from the use of pr_debug
without the referenced char * arrays being defined.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
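Why can a pr_debug() that never prints break the build? When neither DEBUG nor dynamic debug is enabled, pr_debug() expands to no_printk(), which hides the print call behind `if (0)`: nothing is emitted, but the arguments are still compiled and type-checked, so every array they reference must be defined. The following is a minimal userspace sketch of that pattern, assuming `printf()` as a stand-in for `printk()`; `state_names` and `log_state()` are hypothetical.

```c
#include <stdio.h>

/* Userspace mimic of the no_printk() pattern (GNU C statement expression):
 * the call is never executed, but it is still compiled, so its arguments
 * must name real, correctly typed objects. */
#define no_printk(fmt, ...)                     \
	({                                      \
		if (0)                          \
			printf(fmt, ##__VA_ARGS__); \
		0;                              \
	})

#ifdef DEBUG
#define pr_debug(fmt, ...) printf(fmt, ##__VA_ARGS__)
#else
#define pr_debug(fmt, ...) no_printk(fmt, ##__VA_ARGS__)
#endif

/* Hypothetical lookup table: deleting it breaks compilation even though
 * the pr_debug() below never prints without DEBUG. */
static const char *const state_names[] = { "IDLE", "CONNECTED" };

int log_state(int state)
{
	pr_debug("state: %s\n", state_names[state]);
	return state;
}
```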

ip_tunnel: Ops registration for secondary encap (fou, gue) · a8c5f90f

Committed by Tom Herbert on Nov 12, 2014

Instead of calling fou and gue functions directly from ip_tunnel,
use ops for these that were previously registered. This patch adds the
logic to add and remove encapsulation operations for ip_tunnel,
and modifies fou (and gue) to register with ip_tunnels.

This patch also addresses a circular dependency between ip_tunnel
and fou that was causing link errors when CONFIG_NET_IP_TUNNEL=y
and CONFIG_NET_FOU=m. References to fou and gue have been removed from
ip_tunnel.c.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
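The registration scheme can be pictured as a small table of ops pointers indexed by encapsulation type: the tunnel core only dereferences table slots, and fou/gue fill their slot when they load, which removes the link-time reference from ip_tunnel to fou. This is a simplified userspace sketch under assumptions: all names (`encap_add_ops()`, `tunnel_encap_hlen()`, `ENCAP_FOU`, ...) are hypothetical, and the concurrency handling the kernel needs (RCU, atomic slot updates) is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical slot numbers; the core knows only these, not fou/gue. */
enum { ENCAP_NONE, ENCAP_FOU, ENCAP_GUE, MAX_ENCAP_OPS };

struct encap_ops {
	size_t (*encap_hlen)(void);   /* extra header bytes this encap adds */
};

static const struct encap_ops *encap_table[MAX_ENCAP_OPS];

/* Called by a provider module when it loads. */
int encap_add_ops(const struct encap_ops *ops, unsigned int num)
{
	if (num >= MAX_ENCAP_OPS || encap_table[num])
		return -1;            /* bad slot or already registered */
	encap_table[num] = ops;
	return 0;
}

/* Called by a provider module when it unloads. */
int encap_del_ops(const struct encap_ops *ops, unsigned int num)
{
	if (num >= MAX_ENCAP_OPS || encap_table[num] != ops)
		return -1;
	encap_table[num] = NULL;
	return 0;
}

/* Tunnel-core side: goes through the table, never calls fou directly. */
long tunnel_encap_hlen(unsigned int num)
{
	if (num == ENCAP_NONE)
		return 0;
	if (num >= MAX_ENCAP_OPS || !encap_table[num])
		return -1;            /* provider not loaded */
	return (long)encap_table[num]->encap_hlen();
}

/* Example provider standing in for fou: adds one UDP header. */
static size_t fou_hlen(void) { return 8; }
const struct encap_ops fou_ops = { .encap_hlen = fou_hlen };
```

Until the provider registers, `tunnel_encap_hlen(ENCAP_FOU)` reports -1 at run time instead of the build failing at link time.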

udp: Neaten function pointer calls and add braces · 4243cdc2

Committed by Joe Perches on Nov 11, 2014

Standardize function pointer uses.

Convert calling style from:
	(*foo)(args...);
to:
	foo(args...);

Other miscellanea:

o Add braces around loops with single ifs on multiple lines
o Realign arguments around these functions
o Invert logic in if to return immediately.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
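In C, calling through a function pointer with or without the explicit dereference is equivalent, so the conversion is purely stylistic. The sketch below (all names hypothetical) shows both call forms side by side, plus the brace and early-return conventions from the list above.

```c
#include <stddef.h>

typedef int (*lookup_fn)(int port);

static int double_port(int port) { return port * 2; }

/* Old style: explicit dereference of the function pointer. */
int call_old_style(lookup_fn fn, int port) { return (*fn)(port); }

/* New style: same call, no dereference. */
int call_new_style(lookup_fn fn, int port) { return fn(port); }

int count_matches(lookup_fn fn, const int *ports, int n, int want)
{
	int hits = 0;

	if (!fn)
		return 0;                 /* inverted test: return immediately */

	for (int i = 0; i < n; i++) {     /* braces: the body spans lines */
		if (fn(ports[i]) == want)
			hits++;
	}
	return hits;
}
```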

irda: Convert IRDA_DEBUG to pr_debug · 955a9d20

Committed by Joe Perches on Nov 11, 2014

Use the normal kernel debugging mechanism which also
enables dynamic_debug at the same time.

Other miscellanea:

o Remove sysctl for irda_debug
o Remove function tracing like uses (use ftrace instead)
o Coalesce formats
o Realign arguments
o Remove unnecessary OOM messages
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Nov, 2014 · 6 commits

irda: Remove IRDA_<TYPE> logging macros · 6c91023d

Committed by Joe Perches on Nov 11, 2014

And use the more common mechanisms directly.

Other miscellanea:

o Coalesce formats
o Add missing newlines
o Realign arguments
o Remove unnecessary OOM message logging as
  there's a generic stack dump already on OOM.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: remove dynamic neigh table registration support · d7480fd3

Committed by WANG Cong on Nov 10, 2014

Currently there are only three neigh tables in the whole kernel:
arp table, ndisc table and decnet neigh table. What's more,
we don't support registering multiple tables per family.
Therefore we can just make these tables statically built-in.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
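With a fixed set of tables and at most one per address family, the table list can become a constant array indexed by table number, and lookup needs no registration lock. A rough userspace sketch, assuming only the family field matters; the index names are modeled on the kernel's NEIGH_*_TABLE indices, while `neigh_family[]` and `neigh_table_index()` are hypothetical.

```c
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

#ifndef AF_DECnet
#define AF_DECnet 12      /* fallback if the libc headers lack it */
#endif

enum { NEIGH_ARP_TABLE, NEIGH_ND_TABLE, NEIGH_DN_TABLE, NEIGH_NR_TABLES };

/* Statically built-in: one entry per table, fixed at compile time. */
static const int neigh_family[NEIGH_NR_TABLES] = {
	[NEIGH_ARP_TABLE] = AF_INET,
	[NEIGH_ND_TABLE]  = AF_INET6,
	[NEIGH_DN_TABLE]  = AF_DECnet,
};

/* Returns the table index for a family, or -1 if no table serves it. */
int neigh_table_index(int family)
{
	for (int i = 0; i < NEIGH_NR_TABLES; i++)
		if (neigh_family[i] == family)
			return i;
	return -1;
}
```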

net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited · ba7a46f1

Committed by Joe Perches on Nov 11, 2014

Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_<LEVEL> uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were
emitted at KERN_INFO that are not enabled at all unless
DEBUG is defined or dynamic_debug is enabled.  Even so,
these messages are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings".  For backward compatibility,
the sysctl is not removed, but it has no function.  The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c.

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
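"Still ratelimited" means the messages keep going through the net_ratelimit() budget: a burst of messages per time window, with the rest suppressed (net_ratelimit() defaults to a 5 second window and a burst of 10). The sketch below is a userspace model of that windowed logic, not the kernel's `__ratelimit()` itself; `struct ratelimit` and `ratelimit_ok()` are hypothetical, and time is passed in explicitly so the behavior is easy to test.

```c
#include <stdbool.h>

struct ratelimit {
	long interval;   /* window length, in caller-chosen ticks */
	int burst;       /* messages allowed per window */
	long begin;      /* start of the current window */
	int printed;     /* messages allowed in the current window */
	int missed;      /* messages suppressed in the current window */
	bool started;
};

/* Returns true if a message may be emitted at time `now`. */
bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (!rs->started || now >= rs->begin + rs->interval) {
		rs->started = true;   /* open a new window */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With an interval of 5 and a burst of 10, the first ten calls in a window succeed, the next ones are counted as missed, and the first call in the following window succeeds again.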

dsa: Use netdev_<level> instead of printk · a2ae6007

Committed by Joe Perches on Nov 09, 2014

Neaten and standardize the logging output.

Other miscellanea:

o Use pr_notice_once instead of a guard flag.
o Convert existing pr_<level> uses too.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
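pr_notice_once() replaces a caller-maintained guard flag with a flag hidden inside the macro, one per call site. A userspace sketch of the pattern, assuming `printf()` as a stand-in for the kernel logger; the `notice_once()` macro and the message text are hypothetical, and the macro relies on GNU C statement expressions as the kernel does.

```c
#include <stdbool.h>
#include <stdio.h>

/* Each expansion gets its own static flag, so the caller needs no guard.
 * Evaluates to true only on the call that actually printed. */
#define notice_once(fmt, ...)                           \
	({                                              \
		static bool once_done_;                 \
		bool once_first_ = !once_done_;         \
		if (once_first_) {                      \
			once_done_ = true;              \
			printf(fmt, ##__VA_ARGS__);     \
		}                                       \
		once_first_;                            \
	})

int warn_legacy_binding(void)
{
	return notice_once("dsa: legacy binding in use\n");
}
```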

net: introduce SO_INCOMING_CPU · 2c8c56e1

Committed by Eric Dumazet on Nov 11, 2014

An alternative to RPS/RFS is to use hardware support for multiple
queues.

Then split a set of millions of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.

Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.

We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering in the socket structure which cpu delivered the last
packet is enough to solve the problem.

After accept(), connect(), or even file descriptor passing around
processes, applications can use:

 int cpu;
 socklen_t len = sizeof(cpu);

 getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);

and use this information to put the socket into the right silo
for optimal performance, as all networking stack processing should run
on the appropriate cpu, without the need to send IPIs (RPS/RFS).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
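The getsockopt() call above can be wrapped in a small helper and combined with a worker-selection policy. This is a sketch under assumptions: the fallback value 49 is the asm-generic definition of SO_INCOMING_CPU (some architectures use a different number), and `incoming_cpu()`, `pick_worker()` and the `cpu % nworkers` mapping are hypothetical, sensible only when workers are laid out one per CPU.

```c
#include <sys/socket.h>

#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49   /* asm-generic value; differs on some arches */
#endif

/* Returns the CPU recorded for fd, or -1 if getsockopt() fails
 * (for example on kernels that predate SO_INCOMING_CPU). */
int incoming_cpu(int fd)
{
	int cpu = -1;
	socklen_t len = sizeof(cpu);

	if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0)
		return -1;
	return cpu;
}

/* Hypothetical silo choice: one epoll worker per RX queue/CPU, with a
 * fallback worker while no CPU is known. */
int pick_worker(int cpu, int nworkers, int fallback)
{
	if (cpu < 0 || nworkers <= 0)
		return fallback;
	return cpu % nworkers;
}
```

A server would call `pick_worker(incoming_cpu(fd), nworkers, 0)` right after accept() and hand `fd` to that worker's epoll set.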

tcp: move sk_mark_napi_id() at the right place · 3d97379a

Committed by Eric Dumazet on Nov 11, 2014

sk_mark_napi_id() is used to record, for a flow, the napi id of incoming
packets for busypoll's sake.
We should do this only on established flows, not on listeners.

This was 'working' by virtue of the socket cloning, but doing
this on SYN packets is unnecessary cache line dirtying.

Even if we move sk_napi_id into the same cache line as sk_lock,
we are working to make SYN processing lockless, so it is desirable
to set sk_napi_id only for established flows.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Nov, 2014 · 3 commits

ipv4: Avoid reading user iov twice after raw_probe_proto_opt · c008ba5b

Committed by Herbert Xu on Nov 07, 2014

Ever since raw_probe_proto_opt was added it had the problem of
causing the user iov to be read twice, once during the probe for
the protocol header and once again in ip_append_data.

This is a potential security problem since it means that whatever
we're probing may be invalid.  This patch plugs the hole by
firstly advancing the iov so we don't read the same spot again,
and secondly saving what we read the first time around for use
by ip_append_data.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
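The fix follows a read-once pattern: copy the bytes needed for the probe a single time, keep them, and start the later copy after them. A minimal userspace model with hypothetical names (`struct probe_state`, `probe_proto_opt()`, `append_data()`), assuming a plain buffer in place of the user iovec:

```c
#include <stddef.h>
#include <string.h>

struct probe_state {
	unsigned char hdr[2];   /* ICMP type and code, as seen by the probe */
	size_t hdr_len;         /* how many saved bytes are valid */
	size_t offset;          /* where the next read of user data starts */
};

/* Probe: read the first two bytes once and advance past them. */
void probe_proto_opt(struct probe_state *st, const unsigned char *user,
		     size_t len)
{
	st->hdr_len = len < 2 ? len : 2;
	memcpy(st->hdr, user, st->hdr_len);
	st->offset = st->hdr_len;   /* never read these bytes again */
}

/* Append: saved header first, then the remaining user bytes.
 * Assumes len is the same length that was passed to the probe. */
size_t append_data(const struct probe_state *st, const unsigned char *user,
		   size_t len, unsigned char *out)
{
	memcpy(out, st->hdr, st->hdr_len);
	memcpy(out + st->hdr_len, user + st->offset, len - st->offset);
	return len;
}
```

Even if the user buffer changes between the two steps, the appended packet carries the type and code that were actually probed.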

ipv4: Use standard iovec primitive in raw_probe_proto_opt · 32b5913a

Committed by Herbert Xu on Nov 07, 2014

The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets.  In doing so it processes the iovec by hand and
overcomplicates things.

This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
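memcpy_fromiovecend() copies `len` bytes that begin `offset` bytes into the data described by an iovec array, walking segment boundaries for the caller. Below is a userspace re-implementation for illustration only: `copy_from_iovec_end()` is a hypothetical name, and the kernel helper copies from user memory and can fail with -EFAULT, which this sketch does not model.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Assumes the iovec array holds at least offset + len bytes. */
void copy_from_iovec_end(unsigned char *kdata, const struct iovec *iov,
			 size_t offset, size_t len)
{
	if (len == 0)
		return;

	/* Skip whole segments that lie before offset. */
	for (; offset >= iov->iov_len; iov++)
		offset -= iov->iov_len;

	while (len > 0) {
		size_t n = iov->iov_len - offset;

		if (n > len)
			n = len;
		memcpy(kdata, (const unsigned char *)iov->iov_base + offset, n);
		kdata += n;
		len -= n;
		offset = 0;   /* later segments are read from their start */
		iov++;
	}
}
```

For the ICMP probe, a call such as `copy_from_iovec_end(hdr, iov, 0, 2)` would fetch the type and code bytes regardless of how the sender split its buffers.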

net: gro: add a per device gro flush timer · 3b47d303

Committed by Eric Dumazet on Nov 06, 2014

Tuning coalescing parameters on NIC can be really hard.

Servers can handle both bulk and RPC-like traffic, with conflicting
goals: bulk flows want GRO packets as big as possible, RPC wants minimal
latencies.

To get big GRO packets on a 10GbE NIC, one can use:

ethtool -C eth0 rx-usecs 4 rx-frames 44

But this penalizes RPC sessions, increasing latencies by up to
50% in some cases, as NICs generally do not force an interrupt when
a packet with the TCP Push flag is received.

Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.

This patch uses a different strategy: let the GRO stack decide what to do,
based on the traffic pattern.

Packets with the Push flag won't be delayed.
Packets without the Push flag might be held in the GRO engine, if we keep
receiving data.

This new mechanism is off by default, and is enabled by setting
/sys/class/net/ethX/gro_flush_timeout to a value in nanoseconds.

To fully enable this mechanism, drivers should use napi_complete_done()
instead of napi_complete().

Tested:
 Ran 200 netperf TCP_STREAM from A to B (10GbE mlx4 link, 8 RX queues)

Without this feature, we send back about 305,000 ACKs per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce the number of ACK packets. (811/19.2 = 42)

The receiver performs fewer calls to upper stacks and fewer wake-ups.
This also reduces cpu usage on the sender, as it receives fewer ACK
packets.

Note that reducing the number of wake-ups increases cpu efficiency, but can
decrease QPS, as applications won't have the chance to warm up cpu caches
by doing a partial read of RPC requests/answers if they fit in one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
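The decision rule described in this commit (flush on Push, otherwise hold while data keeps arriving) can be illustrated with a toy model, together with the commit's segments-per-ACK arithmetic. Both functions are hypothetical illustrations, not kernel code: the model ignores NAPI budgets, per-flow GRO lists and the hrtimer, and simply treats one arriving segment as one poll.

```c
#include <stdbool.h>
#include <stddef.h>

struct seg {
	long t_ns;   /* arrival time in nanoseconds */
	bool psh;    /* TCP Push flag */
};

/* With timeout_ns == 0 the aggregate is flushed at the end of every poll,
 * as without this feature. Otherwise segments are held, and the aggregate
 * is flushed when a PSH segment arrives or once it has been held for
 * timeout_ns. Returns how many packets reach the upper stack. */
size_t gro_deliveries(const struct seg *s, size_t n, long timeout_ns)
{
	size_t delivered = 0;
	bool holding = false;
	long hold_start = 0;

	for (size_t i = 0; i < n; i++) {
		if (!holding) {
			holding = true;
			hold_start = s[i].t_ns;
		}
		if (timeout_ns == 0 || s[i].psh ||
		    s[i].t_ns - hold_start >= timeout_ns) {
			delivered++;   /* flush the current aggregate */
			holding = false;
		}
	}
	if (holding)
		delivered++;           /* the flush timer eventually fires */
	return delivered;
}

/* The commit's arithmetic: rxpck/s divided by txpck/s on the receiver,
 * i.e. data segments received per ACK sent back. */
double segments_per_ack(double rx_pps, double tx_pps)
{
	return tx_pps > 0 ? rx_pps / tx_pps : 0.0;
}
```

With the sar averages quoted above, `segments_per_ack()` gives about 2.65 without the timer and about 42 with gro_flush_timeout set to 2000.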

10 Nov, 2014 · 6 commits

openvswitch: Add support for OVS_FLOW_ATTR_PROBE. · 05da5898

Committed by Jarno Rajahalme on Nov 06, 2014

This new flag is useful for suppressing error logging while probing
for datapath features using flow commands.  For backwards
compatibility reasons the commands are executed normally, but error
logging is suppressed.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
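The probe attribute only changes whether errors are logged, never how the command is processed. A userspace sketch with hypothetical names (`FLOW_ERR()`, `flow_cmd_new()`, a made-up key-length check); deriving a logging flag from the probe attribute is an assumption about how such gating can be wired, not a copy of the datapath code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int log_count;   /* counts emitted error logs, for illustration */

#define FLOW_ERR(log_errors, fmt, ...)                         \
	do {                                                   \
		if (log_errors) {                              \
			log_count++;                           \
			fprintf(stderr, fmt "\n", ##__VA_ARGS__); \
		}                                              \
	} while (0)

/* Validates a hypothetical flow request; the result is the same whether
 * or not it is a probe, only the logging differs. */
int flow_cmd_new(int key_len, bool probe)
{
	bool log_errors = !probe;

	if (key_len <= 0) {
		FLOW_ERR(log_errors, "invalid flow key length %d", key_len);
		return -EINVAL;
	}
	return 0;
}
```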

openvswitch: Constify various function arguments · 12eb18f7

Committed by Thomas Graf on Nov 06, 2014

Help produce better optimized code.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Remove redundant key ref from upcall_info. · e8eedb85

Committed by Pravin B Shelar on Nov 06, 2014

struct dp_upcall_info has a pointer to pkt_key, which is already
available in OVS_CB.  This also simplifies upcall handling
for gso packets.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>

openvswitch: Optimize recirc action. · fff06c36

Committed by Pravin B Shelar on Nov 06, 2014

OVS needs the flow key for flow lookup in the recirc action, so OVS
does key extraction in the recirc action. In most cases we could
use the OVS_CB packet key directly and avoid packet flow key
extraction. For the SET action we can update the flow key along with
the packet to keep them consistent. But some actions, like MPLS
pop, force OVS to do flow extraction. In such cases we
can mark the flow key as invalid so that a subsequent recirc
action can do a full flow extraction.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
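The idea is a validity mark on the cached flow key: actions that can keep the key in sync update it in place, actions that cannot simply invalidate it, and recirc pays for a full extraction only when the mark is cleared. A toy model with hypothetical names, assuming a single TTL field stands in for the whole packet header and flow key:

```c
#include <stdbool.h>

enum action { ACT_SET_TTL, ACT_MPLS_POP, ACT_RECIRC };

struct pkt {
	int ttl;          /* stands in for the packet headers */
	int key_ttl;      /* cached flow key field */
	bool key_valid;   /* false after actions that reshape the packet */
	int extracts;     /* number of full key extractions performed */
};

static void key_extract(struct pkt *p)
{
	p->key_ttl = p->ttl;
	p->key_valid = true;
	p->extracts++;
}

void execute(struct pkt *p, const enum action *acts, int n)
{
	for (int i = 0; i < n; i++) {
		switch (acts[i]) {
		case ACT_SET_TTL:
			p->ttl = 64;
			p->key_ttl = 64;        /* update key with packet */
			break;
		case ACT_MPLS_POP:
			p->key_valid = false;   /* key can't be patched cheaply */
			break;
		case ACT_RECIRC:
			if (!p->key_valid)
				key_extract(p);     /* only when really needed */
			/* the flow lookup would use p->key_ttl here */
			break;
		}
	}
}
```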

openvswitch: Extend packet attribute for egress tunnel info · 8f0aad6f

Committed by Wenyu Zhang on Nov 06, 2014

OVS vswitch has extended its IPFIX exporter to export tunnel headers
to improve network visibility.
To export this information, userspace needs to know the egress tunnel
for a given packet. By extending packet attributes, the datapath can
export egress tunnel info for a given packet, so that userspace
can ask for egress tunnel info in a userspace action. This
information is used to build IPFIX data for a given flow.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Romain Lenglet <rlenglet@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Export symbols as GPL symbols. · 9ba559d9

Committed by Pravin B Shelar on Nov 06, 2014

vports can be compiled as modules, therefore openvswitch needs
to export a few symbols. Export them as GPL symbols.

CC: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

09 Nov, 2014 · 1 commit

dccp: Convert DCCP_WARN to net_warn_ratelimited · c0560b9c

Committed by Joe Perches on Nov 06, 2014

Remove the dependency on the "warning" sysctl (net_msg_warn)
which is only used by the LIMIT_NETDEBUG macro.

Convert the LIMIT_NETDEBUG use in DCCP_WARN to the more
common net_warn_ratelimited mechanism.

This still ratelimits based on the net_ratelimit()
function, but removes the check for the sysctl.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Nov, 2014 · 3 commits

udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts · 36cbb245

Committed by Rick Jones on Nov 06, 2014

As NIC multicast filtering isn't perfect, and some platforms are
quite content to spew broadcasts, we should not trigger an event
for skb:kfree_skb when we do not have a match for such an incoming
datagram.  We do though want to avoid sweeping the matter under the
rug entirely, so increment a suitable statistic.

This incorporates feedback from David L. Stevens, Karl Neiss and Eric
Dumazet.

V3 - use bool per David Miller
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
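The receive-side decision can be sketched as follows, under assumptions: `struct udp_stats`, `udp_rcv_unmatched()` and the verdict enum are hypothetical, `ignored_multi` stands in for UDP_MIB_IGNOREDMULTI, and the unicast path (which in the kernel also involves an ICMP port-unreachable reply and its own counters) is reduced to a single drop counter.

```c
#include <stdbool.h>

struct udp_stats {
	unsigned long ignored_multi;   /* stands in for UDP_MIB_IGNOREDMULTI */
	unsigned long drops;           /* other unmatched datagrams */
};

enum verdict { DELIVERED, CONSUMED, DROPPED };

enum verdict udp_rcv_unmatched(struct udp_stats *st, bool has_listener,
			       bool is_multicast_or_broadcast)
{
	if (has_listener)
		return DELIVERED;
	if (is_multicast_or_broadcast) {
		st->ignored_multi++;   /* counted, but not reported as a drop */
		return CONSUMED;
	}
	st->drops++;
	return DROPPED;
}
```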

net: Kill skb_copy_datagram_const_iovec · bfe1be38

Committed by Herbert Xu on Nov 07, 2014

Now that both macvtap and tun are using skb_copy_datagram_iter, we
can kill the abomination that is skb_copy_datagram_const_iovec.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: Add skb_copy_datagram_iter · a8f820aa

Committed by Herbert Xu on Nov 07, 2014

This patch adds skb_copy_datagram_iter, which is identical to
skb_copy_datagram_iovec except that it operates on iov_iter
instead of iovec.

Eventually all users of skb_copy_datagram_iovec should switch
over to iov_iter and then we can remove skb_copy_datagram_iovec.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
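The practical difference between an iovec array and an iov_iter is that the iterator carries its own position, so successive copies resume where the previous one stopped. A hypothetical userspace stand-in (`struct iter` and `iter_copy_out()` are not the kernel's types or functions) showing that bookkeeping:

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

struct iter {
	const struct iovec *iov;   /* current segment */
	size_t nr_segs;            /* segments left, including current */
	size_t iov_offset;         /* position inside the current segment */
	size_t count;              /* bytes of room left overall */
};

/* Copies up to len bytes into the iterator's buffers and advances it.
 * Returns the number of bytes copied (short if the buffers fill up). */
size_t iter_copy_out(const void *src, size_t len, struct iter *it)
{
	const unsigned char *from = src;
	size_t done = 0;

	while (done < len && it->count > 0 && it->nr_segs > 0) {
		size_t room = it->iov->iov_len - it->iov_offset;
		size_t n = len - done < room ? len - done : room;

		if (n > it->count)
			n = it->count;
		memcpy((unsigned char *)it->iov->iov_base + it->iov_offset,
		       from + done, n);
		done += n;
		it->count -= n;
		it->iov_offset += n;
		if (it->iov_offset == it->iov->iov_len) {
			it->iov++;          /* move on to the next segment */
			it->nr_segs--;
			it->iov_offset = 0;
		}
	}
	return done;
}
```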

07 Nov, 2014 · 6 commits

net: esp: Convert NETDEBUG to pr_info · 45083497

Committed by Joe Perches on Nov 05, 2014

Commit 64ce2073 ("[NET]: Make NETDEBUG pure printk wrappers")
originally had these NETDEBUG printks as always emitting.

Commit a2a316fd ("[NET]: Replace CONFIG_NET_DEBUG with sysctl")
added a net_msg_warn sysctl to these NETDEBUG uses.

Convert these NETDEBUG uses to normal pr_info calls.

This changes the output prefix from "ESP: " to include
"IPSec: " for the ipv4 case and "IPv6: " for the ipv6 case.

These output lines are now like the other messages in the files.

Other miscellanea:

Neaten the arithmetic spacing to be consistent with other
arithmetic spacing in the files.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net; ipv[46] - Remove 2 unnecessary NETDEBUG OOM messages · cbffccc9

Committed by Joe Perches on Nov 05, 2014

These messages aren't useful as there's a generic dump_stack()
on OOM.

Neaten the comment and if test above the OOM by separating the
assignment in the if into an allocation followed by an if test.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: slave: Fix autoneg for phys on switch MDIO bus · b31f65fb

Committed by Andrew Lunn on Nov 05, 2014

When the ports' phys are connected to the switch's internal MDIO bus,
we need to connect the phy to the slave netdev, otherwise
auto-negotiation etc. does not work.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched: fix act file names in header comment · 0c6965dd

Committed by Jiri Pirko on Nov 05, 2014

Fixes: 4bba3925 ("[PKT_SCHED]: Prefix tc actions with act_")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: Add support for wildcard tunnel endpoints. · ea3dc960

Committed by Steffen Klassert on Nov 05, 2014

This patch adds support for tunnels with local or
remote wildcard endpoints. With this we get an
NBMA tunnel mode like we have for ipv4 and
sit tunnels.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Allow sending packets through tunnels with wildcard endpoints · d5005140

Committed by Steffen Klassert on Nov 05, 2014

Currently we need the IP6_TNL_F_CAP_XMIT capability to transmit
packets through an ipv6 tunnel. This capability is set when the
tunnel gets configured, based on the tunnel endpoint addresses.

On tunnels with wildcard tunnel endpoints, we need to do the
capability checking on a per packet basis like it is done in
the receive path.

This patch extends ip6_tnl_xmit_ctl() to take local and remote
addresses as parameters to allow for per packet capability
checking.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Nov, 2014 · 11 commits

openvswitch: Avoid NULL mask check while building mask · a85311bf

Committed by Pravin B Shelar on Oct 19, 2014

OVS does mask validation even if it does not need to convert
netlink mask attributes to a mask structure.  The ovs_nla_get_match()
caller can pass a NULL mask structure pointer if the caller does
not need the mask.  Therefore a NULL check is required in the SW_FLOW_KEY*
macros.  This patch does not convert mask netlink attributes
if the mask pointer is NULL, so we do not need these checks in the
SW_FLOW_KEY* macros.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Andy Zhou <azhou@nicira.com>

openvswitch: Refactor action alloc and copy api. · 2fdb957d

Committed by Pravin B Shelar on Oct 19, 2014

There are two separate APIs to allocate and copy an actions list. Anytime
OVS needs to copy an action list, it needs to call both functions.
This patch moves action allocation into the copy function to avoid
code duplication.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

openvswitch: Move key_attr_size() to flow_netlink.h. · 41af73e9

Committed by Joe Stringer on Oct 18, 2014

flow-netlink contains the netlink-related code.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Remove flow member from struct ovs_skb_cb · d98612b8

Committed by Lorand Jakab on Oct 06, 2014

The 'flow' member was chosen for removal because it's only used
in ovs_execute_actions(), so we can pass it as an argument to this
function.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Drop packets when interdev is not up · e1f9c356

Committed by Chunhe Li on Sep 08, 2014

If the internal device is not up, it should drop received
packets. Sometimes it receives broadcast or multicast
packets, and passing them up to the IP protocol stack wastes
cpu.
Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Refactor get_dp() function into multiple access APIs. · cc3a5ae6

Committed by Andy Zhou on Sep 08, 2014

Avoid recursive rcu_read_lock() by using the lighter weight
get_dp_rcu() API. Add proper locking assertions to get_dp().
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Refactor ovs_flow_cmd_fill_info(). · ca7105f2

Committed by Joe Stringer on Sep 08, 2014

Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a
dump reply. This will be used to streamline flow_dump in a future patch.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: refactor do_output() to move NULL check out of fast path · 738967b8

Committed by Andy Zhou on Sep 08, 2014

The skb_clone() NULL check is implemented in do_output(), as part of the
common (fast) path. Refactor so that the NULL check is done in the
slow path, immediately after skb_clone() is called.

Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Additional logging for -EINVAL on flow setups. · 426cda5c

Committed by Jesse Gross on Oct 06, 2014

There are many possible ways that a flow can be invalid so we've
added logging for most of them. This adds logs for the remaining
possible cases so there isn't any ambiguity while debugging.

CC: Federico Iezzi <fiezzi@enter.it>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Remove redundant tcp_flags code. · 1b760fb9

Committed by Joe Stringer on Sep 07, 2014

These two cases used to be treated differently for IPv4/IPv6,
but they are now identical.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Move table destroy to dp-rcu callback. · 9b996e54

Committed by Pravin B Shelar on May 06, 2014

This simplifies the flow-table-destroy API. There is no need to pass an
explicit parameter about the context.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
