Commits · 6a5d58b67e205f2ffc62d0a9ee4ef7d237e9a7fb · openanolis / cloud-kernel

20 Sep, 2016: 2 commits

net sched ife action: add 16 bit helpers · 6a5d58b6

Committed by Jamal Hadi Salim on Sep 18, 2016

Encoder and checker for 16-bit metadata.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

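The message only names the helpers. As a rough userspace sketch of what a 16-bit metadata encoder and length checker can look like, assume a hypothetical TLV layout of a 16-bit type, a 16-bit value length and then the value, all in network byte order. The names meta_encode_u16, meta_check_u16 and meta_decode_u16 are illustrative, not the kernel's IFE API.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV layout (assumption, not the IFE wire format):
 * 16-bit type, 16-bit value length, then the value, network byte order. */
#define META_HDR_LEN 4

/* Encode a 16-bit value; returns bytes written, or -1 if buf is too small. */
int meta_encode_u16(uint8_t *buf, size_t buflen, uint16_t type, uint16_t value)
{
    uint16_t t = htons(type);
    uint16_t l = htons((uint16_t)sizeof(value));
    uint16_t v = htons(value);

    if (buflen < META_HDR_LEN + sizeof(value))
        return -1;
    memcpy(buf, &t, 2);
    memcpy(buf + 2, &l, 2);
    memcpy(buf + 4, &v, 2);
    return META_HDR_LEN + (int)sizeof(value);
}

/* Checker: a received value length must be exactly 16 bits. */
int meta_check_u16(uint16_t vlen)
{
    return vlen == sizeof(uint16_t) ? 0 : -1;
}

/* Decode the value of a TLV that already passed meta_check_u16(). */
uint16_t meta_decode_u16(const uint8_t *buf)
{
    uint16_t v;

    memcpy(&v, buf + META_HDR_LEN, 2);
    return ntohs(v);
}
```

The checker rejects any length other than 2, so the decoder never reads a short or oversized value as a 16-bit one.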

gso: Support partial splitting at the frag_list pointer · 07b26c94

Committed by Steffen Klassert on Sep 19, 2016

Since commit 8a29111c ("net: gro: allow to build full sized skb"),
gro may build buffers with a frag_list. This can hurt forwarding
because most NICs can't offload such packets; they need to be
segmented in software. This patch splits buffers with a frag_list
at the frag_list pointer into buffers that can be TSO offloaded.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


19 Sep, 2016: 18 commits

sched: add and use qdisc_skb_head helpers · 48da34b7

Committed by Florian Westphal on Sep 18, 2016

This change replaces the sk_buff_head struct in Qdiscs with the new qdisc_skb_head.

It's similar to the sk_buff_head API, but does not use skb->prev pointers.

Qdiscs will commonly enqueue at the tail of a list and dequeue at the head.
While sk_buff_head works fine for this, enqueue/dequeue also needs to
adjust the prev pointer of the next element.

The ->prev pointer is not required for qdiscs, so we can just leave
it undefined and avoid one cacheline write access for en/dequeue.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

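A minimal sketch of the idea, assuming a hypothetical struct pkt in place of sk_buff: a FIFO that keeps only head, tail and a length and links elements through ->next alone. Enqueue writes only the old tail, and dequeue never touches a ->prev field of the new head. This is modeled on the description above, not copied from the qdisc_skb_head code.

```c
#include <stddef.h>

/* Hypothetical stand-in for sk_buff: only the ->next link is used. */
struct pkt {
    struct pkt *next;
    int id;
};

/* Head, tail and length, with no ->prev anywhere. */
struct pkt_head {
    struct pkt *head;
    struct pkt *tail;
    unsigned int qlen;
};

void pkt_enqueue_tail(struct pkt_head *q, struct pkt *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;   /* only the old tail is written */
    else
        q->head = p;         /* queue was empty */
    q->tail = p;
    q->qlen++;
}

struct pkt *pkt_dequeue_head(struct pkt_head *q)
{
    struct pkt *p = q->head;

    if (p) {
        q->head = p->next;   /* no back-pointer on the new head to fix up */
        if (!q->head)
            q->tail = NULL;  /* queue is now empty */
        p->next = NULL;
        q->qlen--;
    }
    return p;
}
```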

sched: replace __skb_dequeue with __qdisc_dequeue_head · ed760cb8

Committed by Florian Westphal on Sep 18, 2016

After the previous patch these functions are identical.
Replace __skb_dequeue in qdiscs with __qdisc_dequeue_head.

The next patch will then make __qdisc_dequeue_head handle a
singly-linked list instead of a struct sk_buff_head argument.

Doesn't change generated code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


sched: remove qdisc arg from __qdisc_dequeue_head · ec323368

Committed by Florian Westphal on Sep 18, 2016

Moves qdisc stat accounting to qdisc_dequeue_head.

The only direct caller of the __qdisc_dequeue_head version now
open-codes this.

This allows us to later use __qdisc_dequeue_head as a replacement
of __skb_dequeue() (which operates on an sk_buff_head list).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


sched: don't use skb queue helpers · 97d0678f

Committed by Florian Westphal on Sep 18, 2016

A followup change will replace the sk_buff_head in the qdisc
struct with a slightly different list.

Use of the sk_buff_head helpers will thus cause compiler
warnings.

Open-code these accesses in an extra change to ease review.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


pie: use qdisc_dequeue_head wrapper · 1486587b

Committed by Florian Westphal on Sep 18, 2016

Doesn't change generated code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: Remove some redundant code · e8bc8f9a

Committed by Christophe Jaillet on Sep 16, 2016

In commit 311b2177 ("sctp: simplify sk_receive_queue locking"), a call
to 'skb_queue_splice_tail_init()' was made explicit. Previously it was
hidden in 'sctp_skb_list_tail()'.

Now, the code around it looks redundant. The '_init()' part of
'skb_queue_splice_tail_init()' should already do the same.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: Add _nf_(un)register_hooks symbols · e8bffe0c

Committed by Mahesh Bandewar on Sep 16, 2016

Add _nf_register_hooks() and _nf_unregister_hooks() calls, which allow
the caller to hold the RTNL mutex.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv6: Export ip6_route_input_lookup symbol · d409b847

Committed by Mahesh Bandewar on Sep 16, 2016

Make ip6_route_input_lookup available outside of the ipv6 module,
similar to ip_route_input_noref in the IPv4 world.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: core: Add offload stats to if_stats_msg · 69ae6ad2

Committed by Nogah Frankel on Sep 16, 2016

Add a nested attribute of offload stats to if_stats_msg,
named IFLA_STATS_LINK_OFFLOAD_XSTATS.
Under it, add SW stats, i.e. stats only for packets that went via the
slow path to the CPU, named IFLA_OFFLOAD_XSTATS_CPU_HIT.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


pkt_sched: fq: use proper locking in fq_dump_stats() · 695b4ec0

Committed by Eric Dumazet on Sep 15, 2016

When fq is used on 32bit kernels, we need to lock the qdisc before
copying 64bit fields.

Otherwise "tc -s qdisc ..." might report bogus values.

Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

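To illustrate why the lock matters, here is a userspace sketch that assumes a pthread mutex as a stand-in for the qdisc lock and a hypothetical struct fq_like_stats. On a 32-bit CPU a 64-bit load is typically done as two 32-bit loads, so an unlocked reader can combine one half from before an update with the other half from after it. Copying under the same lock the writer holds rules that out.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical qdisc-private stats; field names are illustrative only. */
struct fq_like_stats {
    pthread_mutex_t lock;        /* stands in for the qdisc lock */
    uint64_t gc_flows;
    uint64_t highprio_packets;
};

/* Writer side: counters are updated with the lock held. */
void stats_add(struct fq_like_stats *s, uint64_t flows, uint64_t prio)
{
    pthread_mutex_lock(&s->lock);
    s->gc_flows += flows;
    s->highprio_packets += prio;
    pthread_mutex_unlock(&s->lock);
}

/* Dump side: copy every 64-bit field under the same lock, so a reader
 * can never see a half-updated (torn) value. */
void stats_snapshot(struct fq_like_stats *s, uint64_t *flows, uint64_t *prio)
{
    pthread_mutex_lock(&s->lock);
    *flows = s->gc_flows;
    *prio = s->highprio_packets;
    pthread_mutex_unlock(&s->lock);
}
```

The first update in a carry case like 0xFFFFFFFF + 1 changes both 32-bit halves, which is exactly when an unlocked 32-bit reader could report a bogus value.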

openvswitch: use percpu flow stats · db74a333

Committed by Thadeu Lima de Souza Cascardo on Sep 15, 2016

Instead of using flow stats per NUMA node, use them per CPU. When using
megaflows, the stats lock can be a scalability bottleneck.

On an E5-2690 12-core system, typical throughput went from ~4Mpps to
~15Mpps when forwarding between two 40GbE ports with a single flow
configured on the datapath.

This has been tested on a system with possible CPUs 0-7,16-23. After
module removal, there was no corruption in the slab cache.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Cc: pravin shelar <pshelar@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

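A rough sketch of per-CPU counting, assuming a hypothetical fixed-size array indexed by CPU id (the kernel uses its own per-CPU allocation, which this does not model). Each CPU touches only its own slot on the fast path, and readers sum over the possible CPUs, which may be sparse as in the 0-7,16-23 layout mentioned in the message.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 32   /* hypothetical upper bound on CPU ids */

/* One slot per CPU: writers on different CPUs never share a slot,
 * so they no longer contend on a single per-node lock. */
struct flow_stats_sketch {
    uint64_t packets[NR_CPUS_SKETCH];
    uint64_t bytes[NR_CPUS_SKETCH];
};

/* Fast path: the current CPU updates only its own slot. */
void flow_stats_update(struct flow_stats_sketch *s, int cpu, unsigned int len)
{
    s->packets[cpu]++;
    s->bytes[cpu] += len;
}

/* Slow path: sum over every possible CPU. The possible set is passed in
 * explicitly because it need not be contiguous. */
void flow_stats_get(const struct flow_stats_sketch *s, const int *possible,
                    int n_possible, uint64_t *packets, uint64_t *bytes)
{
    *packets = 0;
    *bytes = 0;
    for (int i = 0; i < n_possible; i++) {
        *packets += s->packets[possible[i]];
        *bytes += s->bytes[possible[i]];
    }
}
```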

openvswitch: fix flow stats accounting when node 0 is not possible · 40773966

Committed by Thadeu Lima de Souza Cascardo on Sep 15, 2016

On a system with only node 1 as possible, all statistics are going to be
accounted on node 0, as it will have a single writer.

However, when getting and clearing the statistics, node 0 is not
considered, as it's not a possible node.

Tested that statistics are not zero on a system with only node 1
possible. Also compile-tested with CONFIG_NUMA off.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

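The failure mode can be sketched with a hypothetical array of per-node slots in which slot 0 receives the writes, as the message describes. A reader that visits only possible nodes misses slot 0 when only node 1 is possible; including slot 0 unconditionally fixes the sum. Names and sizes here are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES_SKETCH 4   /* hypothetical */

/* Slot 0 is where the (single) writer accounts, whether or not
 * node 0 is a possible node. */
struct node_stats_sketch {
    uint64_t packets[MAX_NODES_SKETCH];
};

/* Buggy reader: visits only possible nodes, so it skips slot 0
 * when node 0 is not possible. */
uint64_t stats_sum_possible_only(const struct node_stats_sketch *s,
                                 const bool *node_possible)
{
    uint64_t sum = 0;

    for (int n = 0; n < MAX_NODES_SKETCH; n++)
        if (node_possible[n])
            sum += s->packets[n];
    return sum;
}

/* Fixed reader: always include slot 0, then the other possible nodes. */
uint64_t stats_sum_fixed(const struct node_stats_sketch *s,
                         const bool *node_possible)
{
    uint64_t sum = s->packets[0];

    for (int n = 1; n < MAX_NODES_SKETCH; n++)
        if (node_possible[n])
            sum += s->packets[n];
    return sum;
}
```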

sctp: not return ENOMEM err back in sctp_packet_transmit · 41001faf

Committed by Xin Long on Sep 14, 2016

As David and Marcelo suggested, an ENOMEM err shouldn't be returned to
the user in the transmit path. Instead, sctp's retransmit takes care of
the chunks that fail to send because of ENOMEM.

This patch only does some release work when alloc_skb fails, and no
longer returns ENOMEM.

Besides, it also cleans up sctp_packet_transmit's err path and fixes
some issues in it:

 - It didn't free the head skb in the nomem: path.
 - No need to check nskb in the no_route: path.
 - It should goto the err: path if alloc_skb fails for head.
 - Not all the NOMEMs should free nskb.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: make sctp_outq_flush/tail/uncork return void · 83dbc3d4

Committed by Xin Long on Sep 14, 2016

The sctp_outq_flush return value is meaningless now. This patch makes
sctp_outq_flush return void, as well as sctp_outq_fail
and sctp_outq_uncork.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: save transmit error to sk_err in sctp_outq_flush · 64519440

Committed by Xin Long on Sep 14, 2016

Every time sctp calls sctp_outq_flush, it sends out the chunks of the
control queue, retransmit queue and data queue. Even if some chunks
fail to transmit, it still has to flush all the transports, as it's
the only chance to clean that transmit_list.

So only the latest transmit error here would be returned. This transmit
error is an internal error of the sctp stack.

I checked all the places that use the transmit error (the return
value of sctp_outq_flush); most of them actually just save it to
sk_err.

The exceptions are sctp_assoc/endpoint_bh_rcv, which drop the chunk if
sending a REPLY fails. That is actually incorrect, as we can't
be sure the error that sctp_outq_flush returns comes from sending that
REPLY.

So it's meaningless for sctp_outq_flush to return the error.

This patch saves the transmit error to sk_err in sctp_outq_flush; a
new error overwrites the old value. Eventually, sctp_wait_for_* will
check for it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

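A minimal sketch of the resulting control flow, assuming a hypothetical struct sock_sketch and an array of per-transport results in place of real transmits (0 for success, a negative errno for failure). Every transport is flushed even after a failure, the latest error wins, and it is recorded as a positive errno in sk_err instead of being returned.

```c
#include <errno.h>

/* Hypothetical stand-in for the socket; only sk_err matters here. */
struct sock_sketch {
    int sk_err;
};

/* Flush all transports even after a failure (it is the only chance to
 * drain each transmit list) and record the latest error in sk_err.
 * Nothing is returned to the caller. */
void outq_flush_sketch(struct sock_sketch *sk, const int *xmit_results, int n)
{
    for (int i = 0; i < n; i++) {
        int err = xmit_results[i];

        if (err < 0)
            sk->sk_err = -err;   /* sk_err holds a positive errno */
    }
}
```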

sctp: free msg->chunks when sctp_primitive_SEND return err · b61c654f

Committed by Xin Long on Sep 14, 2016

The previous patch, "sctp: do not return the transmit err back to sctp_sendmsg",
made sctp_primitive_SEND return err only when the asoc state is unavailable.
In this case the chunks are not enqueued, so they have no chance to be freed
if we don't take care of them later.

This patch effectively reverts commit 1cd4d5c4 ("sctp: remove the
unused sctp_datamsg_free()"), commit 69b5777f ("sctp: hold the chunks
only after the chunk is enqueued in outq") and commit 8b570dc9 ("sctp:
only drop the reference on the datamsg after sending a msg"), to use
sctp_datamsg_free to free the chunks of the current msg.

Fixes: 8b570dc9 ("sctp: only drop the reference on the datamsg after sending a msg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: do not return the transmit err back to sctp_sendmsg · 66388f2c

Committed by Xin Long on Sep 14, 2016

Once a chunk is enqueued successfully, sctp queues can take care of it.
Even if it fails to transmit (e.g. because of nomem), it should be
put into the retransmit queue.

If sctp reports this error to users, it confuses them: they may resend
that msg, although the in-kernel sctp stack is already in charge of
retransmitting it.

Besides, this error probably does not come from failing to transmit the
current msg, but from transmitting or retransmitting another msg's chunks,
as sctp_outq_flush just tries to send out all transports' chunks.

This patch makes sctp_cmd_send_msg return void and no longer returns the
transmit err back to sctp_sendmsg.

Fixes: 8b570dc9 ("sctp: only drop the reference on the datamsg after sending a msg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: remove the unnecessary state check in sctp_outq_tail · 2c89791e

Committed by Xin Long on Sep 14, 2016

Data chunks are only sent by sctp_primitive_SEND, in which sctp checks
the asoc's state through the state table before calling sctp_outq_tail. So
there's no need to check the asoc's state again in sctp_outq_tail.

Besides, sctp_do_sm is protected by lock_sock: even if sending a msg is
interrupted by timer events, the event handlers still need to acquire
lock_sock first. This means no other CMDs can be enqueued into the side-effect
list before CMD_SEND_MSG to change asoc->state, so it's safe to remove the check.

This patch removes the redundant asoc->state check from sctp_outq_tail.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


17 Sep, 2016: 20 commits

ip6_tunnel: add collect_md mode to IPv6 tunnels · 8d79266b

Committed by Alexei Starovoitov on Sep 15, 2016

Similar to gre, vxlan and geneve tunnels, allow IPIP6 and IP6IP6 tunnels
to operate in 'collect metadata' mode.
Unlike the ipv4 code, here it's possible to reuse the ip6_tnl_xmit() function
for both collect_md and traditional tunnels.
bpf_skb_[gs]et_tunnel_key() helpers and ovs (in the future) are the users.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


ip_tunnel: add collect_md mode to IPIP tunnel · cfc7381b

Committed by Alexei Starovoitov on Sep 15, 2016

Similar to gre, vxlan and geneve tunnels, allow IPIP tunnels to
operate in 'collect metadata' mode.
bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away.
ovs can use it as well in the future (once appropriate ovs-vport
abstractions and user APIs are added).
Note that, just like in other tunnels, we cannot cache the dst,
since tunnel_info metadata can be different for every packet.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


l2tp: constify net_device_ops structures · eb94737d

Committed by Julia Lawall on Sep 15, 2016

Check for net_device_ops structures that are only stored in the netdev_ops
field of a net_device structure.  This field is declared const, so
net_device_ops structures that have this property can be declared as const
also.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct net_device_ops i@p = { ... };

@ok@
identifier r.i;
struct net_device e;
position p;
@@
e.netdev_ops = &i@p;

@bad@
position p != {r.p,ok.p};
identifier r.i;
struct net_device_ops e;
@@
e@i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct net_device_ops i = { ... };
// </smpl>

The result of size on this file before the change is:
   text    data     bss     dec     hex filename
   3401     931      44    4376    1118 net/l2tp/l2tp_eth.o

and after the change it is:
   text    data     bss     dec     hex filename
   3993     347      44    4384    1120 net/l2tp/l2tp_eth.o
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


llc: switch type to bool as the timeout is only tested versus 0 · 5ff904d5

Committed by Alan Cox on Sep 15, 2016

(As asked by Dave in February.)
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: prepare skbs for better sack shifting · 3613b3db

Committed by Eric Dumazet on Sep 15, 2016

With large BDP TCP flows and lossy networks, it is very important
to keep a low number of skbs in the write queue.

RACK and SACK processing can perform a linear scan of it.

We should avoid putting any payload in skb->head, so that SACK
shifting can be done if needed.

With this patch, we allow packing ~0.5 MB per skb instead of
the 64KB initially cooked at tcp_sendmsg() time.

This reduces the number of skbs in the write queue by a factor of eight.
tcp_rack_detect_loss() likes this.

We still allow payload in skb->head for the first skb put in the queue,
to not impact RPC workloads.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

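The factor of eight is the ratio of the two per-skb sizes. A small helper makes the arithmetic concrete, using the message's round figures of 64KB and 512KB (~0.5 MB) per skb and an arbitrary 8 MB of queued data. It ignores headers and truesize, so treat it as a back-of-the-envelope check only.

```c
#include <stdint.h>

/* Number of skbs needed to hold `bytes` of queued payload when each skb
 * carries at most `per_skb` bytes (ceiling division). Per-skb overhead
 * is deliberately ignored. */
uint64_t skbs_needed(uint64_t bytes, uint64_t per_skb)
{
    return (bytes + per_skb - 1) / per_skb;
}
```

For 8 MB of queued payload this gives 128 skbs at 64KB each versus 16 at 512KB each, so a linear scan of the write queue visits eight times fewer skbs.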

rxrpc: Add config to inject packet loss · 8a681c36

Committed by David Howells on Sep 17, 2016

Add a configuration option to inject packet loss by discarding
approximately every 8th packet received and approximately every 8th DATA
packet transmitted.

Note that no locking is used, but it shouldn't really matter.
Signed-off-by: David Howells <dhowells@redhat.com>

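A minimal sketch of such a test, assuming a hypothetical should_drop_packet() helper with a plain, unlocked counter that returns true on every 8th call. Concurrent unlocked increments can occasionally lose an update, so the pattern is only approximate, which is consistent with the note that the lack of locking shouldn't really matter. The commit applies loss to received packets and to transmitted DATA packets; a real version would presumably keep one counter per direction, and only one is shown here.

```c
#include <stdbool.h>

/* Hypothetical loss-injection counter: deliberately unlocked. */
static unsigned int lose_counter;

/* Returns true for every 8th call (counter values 7, 15, 23, ...). */
bool should_drop_packet(void)
{
    return (lose_counter++ & 7) == 7;
}
```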

rxrpc: Improve skb tracing · 71f3ca40

Committed by David Howells on Sep 17, 2016

Improve sk_buff tracing within AF_RXRPC by the following means:

 (1) Use an enum to note the event type rather than plain integers and use
     an array of event names rather than a big multi ?: list.

 (2) Distinguish Rx from Tx packets and account them separately.  This
     requires the call phase to be tracked so that we know what we might
     find in rxtx_buffer[].

 (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
     event type.

 (4) A pair of 'rotate' events are added to indicate packets that are about
     to be rotated out of the Rx and Tx windows.

 (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
     packet loss injection recording.
Signed-off-by: David Howells <dhowells@redhat.com>

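Point (1) can be sketched in C as an enum whose values index a table of names built with designated initializers, so a name lookup is one bounds check and an array access rather than a chain of ?: tests. The event names below are hypothetical, not the actual AF_RXRPC trace symbols.

```c
/* Hypothetical event set; the real AF_RXRPC enum differs. */
enum skb_trace_sketch {
    skb_rx_received,
    skb_rx_rotated,
    skb_tx_queued,
    skb_tx_lost,
    skb_trace__nr          /* number of events, sizes the table */
};

/* One name per enum value; designated initializers keep the table
 * correct even if the enum is reordered. */
static const char *const skb_trace_names[skb_trace__nr] = {
    [skb_rx_received] = "Rx RCV",
    [skb_rx_rotated]  = "Rx ROT",
    [skb_tx_queued]   = "Tx QUE",
    [skb_tx_lost]     = "Tx LOS",
};

/* Look up a printable name, falling back to "??" for bad values. */
const char *skb_trace_name(enum skb_trace_sketch ev)
{
    return (unsigned int)ev < skb_trace__nr ? skb_trace_names[ev] : "??";
}
```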

rxrpc: Remove printks from rxrpc_recvmsg_data() to fix uninit var · ba39f3a0

Committed by David Howells on Sep 17, 2016

Remove _enter/_debug/_leave calls from rxrpc_recvmsg_data() of which one
uses an uninitialised variable.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Add a tracepoint to follow what recvmsg does · 84997905

Committed by David Howells on Sep 17, 2016

Add a tracepoint to follow what recvmsg does within AF_RXRPC.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Add a tracepoint to follow packets in the Rx buffer · 58dc63c9

Committed by David Howells on Sep 17, 2016

Add a tracepoint to follow the life of packets that get added to a call's
receive buffer.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Add a tracepoint to log ACK transmission · f3639df2

Committed by David Howells on Sep 17, 2016

Add a tracepoint to log information about ACK transmission.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Add a tracepoint to log received ACK packets · ec71eb9a

Committed by David Howells on Sep 17, 2016

Add a tracepoint to log information from received ACK packets.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Add a tracepoint to follow the life of a packet in the Tx buffer · a124fe3e

Committed by David Howells on Sep 17, 2016

Add a tracepoint to follow the insertion of a packet into the transmit
buffer, its transmission and its rotation out of the buffer.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Add connection tracepoint and client conn state tracepoint · 363deeab

Committed by David Howells on Sep 17, 2016

Add a pair of tracepoints, one to track rxrpc_connection struct ref
counting and the other to track the client connection cache state.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Add some additional call tracing · a84a46d7

Committed by David Howells on Sep 17, 2016

Add additional call tracepoints for noting call-connected,
call-released and connection-failed events.

Also fix one tracepoint that was using an integer instead of the
corresponding enum value as the point type.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Print the packet type name in the Rx packet trace · a3868bfc

Committed by David Howells on Sep 17, 2016

Print a symbolic packet type name for each valid received packet in the
trace output, not just a number.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Fix the basic transmit DATA packet content size at 1412 bytes · 182f5056

Committed by David Howells on Sep 17, 2016

Fix the basic transmit DATA packet content size at 1412 bytes so that they
can be arbitrarily assembled into jumbo packets.

In the future, I'm thinking of moving to keeping a jumbo packet header at
the beginning of each packet in the Tx queue and creating the packet header
on the spot when kernel_sendmsg() is invoked. That way, jumbo packets can
be assembled on the spur of the moment for (re-)transmission.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Be consistent about switch value in rxrpc_send_call_packet() · 2311e327

Committed by David Howells on Sep 17, 2016

rxrpc_send_call_packet() should use type in both its switch-statements
rather than using pkt->whdr.type.  This might give the compiler an easier
job of uninitialised variable checking.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Don't transmit an ACK if there's no reason set · 27d0fc43

Committed by David Howells on Sep 17, 2016

Don't transmit an ACK if call->ackr_reason is unset. There's the
possibility of a race between recvmsg() sending an ACK and the background
processing thread trying to send the same one.
Signed-off-by: David Howells <dhowells@redhat.com>

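One way to keep two senders from transmitting the same ACK is to have whoever consumes the pending reason also clear it, and to send nothing when the reason is already zero. The sketch below assumes a hypothetical struct call_sketch and uses a C11 atomic exchange for the claim; it illustrates the "no reason, no ACK" rule rather than the locking AF_RXRPC actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical call state: 0 means "no ACK pending". */
struct call_sketch {
    atomic_int ackr_reason;
};

/* Atomically take the pending reason and clear it. Only a caller that
 * sees a non-zero reason transmits, so recvmsg() and a background worker
 * racing on the same call cannot both send the same ACK.
 * Returns true if this caller should send. */
bool claim_ack(struct call_sketch *call, int *reason_out)
{
    int reason = atomic_exchange(&call->ackr_reason, 0);

    if (reason == 0)
        return false;   /* reason unset: don't transmit */
    *reason_out = reason;
    return true;
}
```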

rxrpc: Fix retransmission algorithm · dfa7d920

Committed by David Howells on Sep 17, 2016

Make the retransmission algorithm use for-loops instead of do-loops and
move the counter increments into the for-statement increment slots.

Though the do-loops are slightly more efficient, since there will be at least
one pass through each loop, the counter increments are harder to get
right as the continue-statements skip them.

Without this, if there are any positive ACKs within the loop, the do-loop
will cycle forever because the counter increment is never done.
Signed-off-by: David Howells <dhowells@redhat.com>

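The hazard is easy to reproduce in miniature. In the hypothetical count_unacked() below, the increment sits in the for-statement, so a continue still advances the sequence number. In a do { ...; seq++; } while (seq <= last) loop, the same continue would jump straight to the condition with seq unchanged and spin forever on the first acked packet.

```c
#include <stdbool.h>

/* Hypothetical window walk: acked[i] marks packets the peer has
 * positively acked. Returns how many packets in [first, last] still
 * need retransmission. */
int count_unacked(const bool *acked, int first, int last)
{
    int n = 0;

    /* seq++ lives in the for-statement, so `continue` cannot skip it. */
    for (int seq = first; seq <= last; seq++) {
        if (acked[seq])
            continue;   /* already acked: nothing to resend */
        n++;
    }
    return n;
}
```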
