Commits · 6543ac523558b2392271f3f8088e6455b3f00bb1 · openanolis / cloud-kernel

07 Sep, 2016 · 7 commits

rxrpc: Use rxrpc_is_service_call() rather than rxrpc_conn_is_service() · 6543ac52

Committed by David Howells on Sep 07, 2016

Use rxrpc_is_service_call() rather than rxrpc_conn_is_service() if the call
is available, just in case call->conn is NULL.
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Pass the connection pointer to rxrpc_post_packet_to_call() · 8b7fac50

Committed by David Howells on Sep 07, 2016

Pass the connection pointer to rxrpc_post_packet_to_call() as the call
might get disconnected whilst we're looking at it, but the connection
pointer determined by rxrpc_data_read() is guaranteed by RCU for the
duration of the call.
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Cache the security index in the rxrpc_call struct · 278ac0cd

Committed by David Howells on Sep 07, 2016

Cache the security index in the rxrpc_call struct so that we can get at it
even when the call has been disconnected and the connection pointer
cleared.
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Use call->peer rather than call->conn->params.peer · f4fdb352

Committed by David Howells on Sep 07, 2016

Use call->peer rather than call->conn->params.peer to avoid the possibility
of call->conn being NULL and, whilst we're at it, check it for NULL before we
access it.
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Improve the call tracking tracepoint · fff72429

Committed by David Howells on Sep 07, 2016

Improve the call tracking tracepoint by showing more differentiation
between some of the put and get events, including:

  (1) Getting and putting refs for the socket call user ID tree.

  (2) Getting and putting refs for queueing and failing to queue the call
      processor work item.

Note that these aren't necessarily used in this patch, but will be taken
advantage of in future patches.

An enum is added for the event subtype numbers rather than coding them
directly as decimal numbers and a table of 3-letter strings is provided
rather than a sequence of ?: operators.
Signed-off-by: David Howells <dhowells@redhat.com>
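
A minimal C sketch of the enum-plus-string-table pattern fff72429 describes. The enum and tag names below are hypothetical, not the ones in net/rxrpc; the point is only that the subtype indexes a fixed table of 3-letter strings instead of being decoded by a chain of ?: operators.

```c
#include <string.h>	/* strcmp(), for callers comparing tags */

/* Hypothetical subset of call-trace event subtypes. */
enum call_trace {
	call_trace_new,
	call_trace_get,
	call_trace_put,
	call_trace_queued,	/* ref taken for queueing the work item */
	call_trace_qfail,	/* ref dropped because queueing failed */
	call_trace__nr
};

/* One 3-letter tag per subtype, indexed by enum value (4 bytes incl. NUL). */
static const char call_trace_names[call_trace__nr][4] = {
	[call_trace_new]    = "NEW",
	[call_trace_get]    = "GET",
	[call_trace_put]    = "PUT",
	[call_trace_queued] = "QUE",
	[call_trace_qfail]  = "QFL",
};

/* Look up the tag, guarding against out-of-range values. */
static const char *call_trace_name(enum call_trace t)
{
	if ((unsigned int)t >= call_trace__nr)
		return "???";
	return call_trace_names[t];
}
```

Adding a subtype then means adding one enum entry and one table entry, and designated initializers keep the two in step.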

rxrpc: Delete unused rxrpc_kernel_free_skb() · e796cb41

Committed by David Howells on Sep 07, 2016

Delete rxrpc_kernel_free_skb() as it's unused.
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Whitespace cleanup · 71a17de3

Committed by David Howells on Sep 07, 2016

Remove some whitespace.
Signed-off-by: David Howells <dhowells@redhat.com>

05 Sep, 2016 · 8 commits

rxrpc: Move enum rxrpc_command to sendmsg.c · 3dc20f09

Committed by David Howells on Sep 04, 2016

Move enum rxrpc_command to sendmsg.c as it's now only used in that file.
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Rearrange net/rxrpc/sendmsg.c · df423a4a

Committed by David Howells on Sep 02, 2016

Rearrange net/rxrpc/sendmsg.c to be in a more logical order.  This makes it
easier to follow and eliminates forward declarations.
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Split sendmsg from packet transmission code · 0b58b8a1

Committed by David Howells on Sep 02, 2016

Split the sendmsg code from the packet transmission code (mostly to be
found in output.c).
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Don't change the epoch · 090f85de

Committed by David Howells on Sep 04, 2016

It seems the local epoch should only be changed on boot, so remove the code
that changes it for client connections.
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Randomise epoch and starting client conn ID values · 5f2d9c44

Committed by David Howells on Sep 02, 2016

Create a random epoch value rather than a time-based one on startup and set
the top bit to indicate that this is the case.

Also create a random starting client connection ID value.  This will be
incremented from here as new client connections are created.
Signed-off-by: David Howells <dhowells@redhat.com>
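
The epoch and connection-ID scheme in 5f2d9c44 can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `EPOCH_RANDOM_FLAG`, `random_u32()` and the global variables are hypothetical names, and `rand()` merely stands in for the kernel's random number generator (it is not suitable for real protocol use).

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed flag: the top bit marks a random (not time-based) epoch. */
#define EPOCH_RANDOM_FLAG 0x80000000u

static uint32_t local_epoch;
static uint32_t next_client_conn_id;

/* Portable stand-in for a kernel RNG; rand() is NOT cryptographically secure. */
static uint32_t random_u32(void)
{
	return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

/* Pick a random epoch with the top bit set, and a random ID base. */
static void init_epoch_and_ids(void)
{
	local_epoch = random_u32() | EPOCH_RANDOM_FLAG;
	next_client_conn_id = random_u32();
}

/* Hand out successive connection IDs starting from the random base. */
static uint32_t alloc_client_conn_id(void)
{
	return next_client_conn_id++;
}
```

Setting the top bit lets a receiver tell a random epoch from a time-based one without any extra field.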

netns: avoid disabling irq for netns id · bc51dddf

Committed by WANG Cong on Sep 01, 2016

We never read or change the netns id in hardirq context;
the only place we read the netns id in softirq context
is vxlan_xmit(). So it should be enough to just
disable BH.

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: call peernet2id() in fdb notification · 38f507f1

Committed by WANG Cong on Sep 01, 2016

The netns id should already be allocated each time we change
netns, that is, in dev_change_net_namespace() (more precisely,
in rtnl_fill_ifinfo()), so it is safe to just call peernet2id() here.

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Free tmpl with tmpl_free. · 76644232

Committed by Joe Stringer on Sep 01, 2016

When an error occurs during conntrack template creation as part of
actions validation, we need to free the template. Previously we've been
using nf_ct_put() to do this, but nf_ct_tmpl_free() is more appropriate.
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Sep, 2016 · 1 commit

rxrpc: The client call state must be changed before attachment to conn · af338a9e

Committed by David Howells on Sep 04, 2016

We must set the client call state to RXRPC_CALL_CLIENT_SEND_REQUEST before
attaching the call to the connection struct, not after, as it's liable to
receive errors and conn aborts as soon as the assignment is made - and
these will cause its state to be changed outside of the initiating thread's
control.
Signed-off-by: David Howells <dhowells@redhat.com>
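
To illustrate the ordering af338a9e enforces, here is a minimal userspace sketch using C11 atomics. The struct layouts, state names and the `abort_conn()` helper are hypothetical stand-ins, not rxrpc's code. The release store plays the role of publishing the call to the connection, so the state has to be written before it; otherwise a concurrent abort could be silently overwritten by the late assignment.

```c
#include <stdatomic.h>
#include <stddef.h>

enum call_state { CALL_UNINITIALISED, CALL_CLIENT_SEND_REQUEST, CALL_ABORTED };

struct call { enum call_state state; };
struct conn { _Atomic(struct call *) chan; };	/* hypothetical single channel */

/* Write the state first, then publish with a release store (the kernel's
 * rcu_assign_pointer() provides similar ordering). After this point other
 * threads may change call->state, so the initiator must not touch it. */
static void attach_call(struct conn *conn, struct call *call)
{
	call->state = CALL_CLIENT_SEND_REQUEST;
	atomic_store_explicit(&conn->chan, call, memory_order_release);
}

/* What another thread, e.g. a conn abort handler, might do once the call is
 * visible. Real code would also serialise state changes with a lock. */
static void abort_conn(struct conn *conn)
{
	struct call *call = atomic_load_explicit(&conn->chan, memory_order_acquire);

	if (call)
		call->state = CALL_ABORTED;
}
```

With the order reversed, an abort landing between the store and the state assignment would be lost when the initiator later wrote `CALL_CLIENT_SEND_REQUEST`.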

03 Sep, 2016 · 5 commits

tipc: send broadcast nack directly upon sequence gap detection · e0a05ebe

Committed by Jon Paul Maloy on Sep 01, 2016

Because of the risk of an excessive number of NACK messages and
retransmissions, receivers have until now abstained from sending
broadcast NACKs directly upon detection of a packet sequence number
gap. We have instead relied on such gaps being detected by link
protocol STATE message exchange, something that by necessity delays
such detection and subsequent retransmissions.

With the introduction of unicast NACK transmission and rate control
of retransmissions, we can now remove this limitation. We now allow
receiving nodes to send NACKs immediately, while coordinating the
permission to do so among the nodes in order to avoid NACK storms.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: rate limit broadcast retransmissions · 7c4a54b9

Committed by Jon Paul Maloy on Sep 01, 2016

As cluster sizes grow, so does the number of identical or overlapping
broadcast NACKs generated by the packet receivers. This often leads to
'NACK crunches' resulting in huge numbers of redundant retransmissions
of the same packet ranges.

In this commit, we introduce rate control of broadcast retransmissions,
so that a retransmitted range cannot be retransmitted again until after
at least 10 ms. This reduces the frequency of duplicate, redundant
retransmissions by an order of magnitude, while having a significant
positive impact on overall throughput and scalability.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
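
The 10 ms rule from 7c4a54b9 can be sketched as a per-packet hold-off check. This is a simplified model, not TIPC's implementation: `struct bcast_pkt`, its timestamp field and `retransmit_range()` are hypothetical, and 16-bit sequence-number wraparound is deliberately ignored.

```c
#include <stdint.h>

#define RETRANSMIT_HOLDOFF_MS 10u

/* Hypothetical per-packet bookkeeping: when it was last retransmitted. */
struct bcast_pkt {
	uint16_t seqno;
	uint64_t last_retr_ms;	/* 0 = never retransmitted */
};

/* Retransmit packets in [from, to] unless they were already retransmitted
 * within the last RETRANSMIT_HOLDOFF_MS. Returns how many were resent. */
static int retransmit_range(struct bcast_pkt *pkts, int n,
			    uint16_t from, uint16_t to, uint64_t now_ms)
{
	int sent = 0;

	for (int i = 0; i < n; i++) {
		struct bcast_pkt *p = &pkts[i];

		if (p->seqno < from || p->seqno > to)
			continue;
		if (p->last_retr_ms &&
		    now_ms - p->last_retr_ms < RETRANSMIT_HOLDOFF_MS)
			continue;	/* too soon: suppress the duplicate */
		p->last_retr_ms = now_ms;
		sent++;		/* a real link would requeue p here */
	}
	return sent;
}
```

Overlapping NACKs from many receivers inside the same 10 ms window then collapse into a single retransmission per packet.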

tipc: transfer broadcast nacks in link state messages · 02d11ca2

Committed by Jon Paul Maloy on Sep 01, 2016

When we send broadcasts in clusters of more than 70-80 nodes, we sometimes
see the broadcast link resetting because of an excessive number of
retransmissions. This is caused by a combination of two factors:

1) A 'NACK crunch', where loss of broadcast packets is discovered
   and NACK'ed by several nodes simultaneously, leading to multiple
   redundant broadcast retransmissions.

2) The fact that the NACKs themselves are also sent as broadcasts, leading
   to excessive load and packet loss on the transmitting switch/bridge.

This commit deals with the latter problem by moving the sending of
broadcast NACKs from the dedicated BCAST_PROTOCOL/NACK message type
to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused
bits in word 8 of the said message for this purpose, and introduce a
new capability bit, TIPC_BCAST_STATE_NACK, in order to keep the change
backwards compatible.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
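
Carrying a NACK in "10 unused bits in word 8" amounts to a masked read-modify-write on a big-endian header word. The sketch below assumes the field sits at bit 0 of word 8; the real bit position, and the `struct hdr`, `set_bits()` and `get_bits()` helpers, are hypothetical and shown only to illustrate the technique.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */

/* Hypothetical header: 16 32-bit words in network byte order. */
struct hdr { uint32_t w[16]; };

#define BCGAP_POS  0		/* assumed bit position within word 8 */
#define BCGAP_MASK 0x3ffu	/* 10 bits */

/* Replace the masked field in word w, leaving the other bits untouched. */
static void set_bits(struct hdr *h, int w, int pos, uint32_t mask, uint32_t val)
{
	uint32_t word = ntohl(h->w[w]);

	word &= ~(mask << pos);
	word |= (val & mask) << pos;	/* values wider than 10 bits are clipped */
	h->w[w] = htonl(word);
}

static uint32_t get_bits(const struct hdr *h, int w, int pos, uint32_t mask)
{
	return (ntohl(h->w[w]) >> pos) & mask;
}
```

A sender would only fill the field when the peer advertises the TIPC_BCAST_STATE_NACK capability, which is what keeps older nodes working.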

rxrpc: Fix uninitialised variable warning · 00b5407e

Committed by David Howells on Sep 02, 2016

Fix the following uninitialised variable warning:

../net/rxrpc/call_event.c: In function 'rxrpc_process_call':
../net/rxrpc/call_event.c:879:58: warning: 'error' may be used uninitialized in this function [-Wmaybe-uninitialized]
    _debug("post net error %d", error);
                                                          ^
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: fix undefined behavior in rxrpc_mark_call_released · 30787a41

Committed by Arnd Bergmann on Sep 02, 2016

gcc -Wmaybe-uninitialized correctly points out a newly introduced bug
through which we can end up calling rxrpc_queue_call() for a dead
connection:

net/rxrpc/call_object.c: In function 'rxrpc_mark_call_released':
net/rxrpc/call_object.c:600:5: error: 'sched' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This sets the 'sched' variable to zero to restore the previous
behavior.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: f5c17aae ("rxrpc: Calls should only have one terminal state")
Signed-off-by: David Howells <dhowells@redhat.com>

02 Sep, 2016 · 13 commits

net: bridge: add per-port multicast flood flag · b6cb5ac8

Committed by Nikolay Aleksandrov on Aug 31, 2016

Add a per-port flag to control the unknown multicast flood, similar to the
unknown unicast flood flag, and break a few long lines in the netlink flag
exports.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: change unicast boolean to exact pkt_type · 8addd5e7

Committed by Nikolay Aleksandrov on Aug 31, 2016

Remove the unicast flag and introduce an exact pkt_type. That will help
with the upcoming per-port multicast flood flag and also slightly reduce the
tests in the input fast path.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
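
Taken together with the per-port multicast flood flag (b6cb5ac8), an exact packet type lets the flood decision become a single switch. The enum, flag names and the treatment of broadcast below are assumptions made for this sketch, not the bridge code itself.

```c
#include <stdbool.h>

/* Exact packet class instead of a single "unicast" boolean (names assumed). */
enum pkt_type { PKT_UNICAST, PKT_MULTICAST, PKT_BROADCAST };

/* Hypothetical per-port flags. */
#define PORT_FLOOD       (1u << 0)	/* flood unknown unicast */
#define PORT_MCAST_FLOOD (1u << 1)	/* flood unknown multicast */

/* Decide whether an unknown-destination frame may be flooded out a port. */
static bool may_flood(unsigned int port_flags, enum pkt_type type)
{
	switch (type) {
	case PKT_UNICAST:
		return port_flags & PORT_FLOOD;
	case PKT_MULTICAST:
		return port_flags & PORT_MCAST_FLOOD;
	case PKT_BROADCAST:
		return true;	/* assumed: not gated by either flag here */
	}
	return false;
}
```

With a boolean, multicast and broadcast would be indistinguishable at this point, which is why the exact type is needed before the multicast flag can be honoured.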

rtnetlink: fdb dump: optimize by saving last interface markers · d297653d

Committed by Roopa Prabhu on Aug 30, 2016

fdb dumps spanning multiple skbs currently restart from the first
interface again for every skb. This results in unnecessary
iterations on the already visited interfaces and their fdb
entries. In large-scale setups, we have seen this slow
down fdb dumps considerably. On a system with 30k MACs, we
see fdb dumps spanning more than 300 skbs.

To fix the problem, this patch replaces the existing single fdb
marker with three markers: netdev hash entries, netdevs and fdb
index to continue where we left off instead of restarting from the
first netdev. This is consistent with link dumps.

In the process of fixing the performance issue, this patch also
re-implements the fix done by
commit 472681d5 ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump")
(with an internal fix from Wilson Kok) in the following ways:
- change ndo_fdb_dump handlers to return an error code instead
  of the last fdb index
- use cb->args strictly for dump frag markers and not error codes.
  This is consistent with other dump functions.

The results below were taken on a system with 1000 netdevs
and 35085 fdb entries:
before patch:
$ time bridge fdb show | wc -l
15065

real    1m11.791s
user    0m0.070s
sys     1m8.395s

(the existing code does not return all MACs)

after patch:
$ time bridge fdb show | wc -l
35085

real    0m2.017s
user    0m0.113s
sys     0m1.942s
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
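
The three resume markers from d297653d (netdev hash bucket, netdev, fdb index) can be modelled with a toy table. In this sketch `fdb_count[h][d]` is a hypothetical stand-in for the entries hanging off netdev `d` in bucket `h`, and `cap` plays the role of how many entries fit in one skb. Each call continues exactly where the previous one stopped instead of rescanning from the first netdev.

```c
#include <stdbool.h>

#define NBUCKETS 2	/* hypothetical netdev hash size */
#define MAXDEV   3	/* hypothetical netdevs per bucket */

/* Resume markers kept between dump calls (like cb->args). */
struct dump_state {
	int h;		/* netdev hash bucket */
	int dev;	/* netdev within the bucket */
	int fdb;	/* fdb entry within the netdev */
};

/* Emit up to cap entries starting at *st; *done is set once all are sent. */
static int fdb_dump_chunk(int fdb_count[NBUCKETS][MAXDEV],
			  struct dump_state *st, int cap, bool *done)
{
	int emitted = 0;

	for (; st->h < NBUCKETS; st->h++, st->dev = 0) {
		for (; st->dev < MAXDEV; st->dev++, st->fdb = 0) {
			for (; st->fdb < fdb_count[st->h][st->dev]; st->fdb++) {
				if (emitted == cap) {
					*done = false;	/* "skb" full: keep markers */
					return emitted;
				}
				emitted++;	/* a real dump would fill a message here */
			}
		}
	}
	*done = true;
	return emitted;
}
```

Because the markers point at the next unsent entry, every fdb entry is visited once in total, rather than once per skb for every entry before it.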

rxrpc: Don't expose skbs to in-kernel users [ver ] · d001648e

Committed by David Howells on Aug 30, 2016

Don't expose skbs to in-kernel users, such as the AFS filesystem, but
instead provide a notification hook that indicates that a call needs
attention and another that indicates that there's a new call to be
collected.

This makes the following possibilities more achievable:

 (1) Call refcounting can be made simpler if skbs don't hold refs to calls.

 (2) skbs referring to non-data events will be able to be freed much sooner
     rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
     will be able to consult the call state.

 (3) We can shortcut the receive phase when a call is remotely aborted
     because we don't have to go through all the packets to get to the one
     cancelling the operation.

 (4) It makes it easier to do encryption/decryption directly between AFS's
     buffers and sk_buffs.

 (5) Encryption/decryption can more easily be done in AFS's thread
     contexts (usually that of the userspace process that issued a syscall)
     rather than in one of rxrpc's background threads on a workqueue.

 (6) AFS will be able to wait synchronously on a call inside AF_RXRPC.

To make this work, the following interface function has been added:

     int rxrpc_kernel_recv_data(
		struct socket *sock, struct rxrpc_call *call,
		void *buffer, size_t bufsize, size_t *_offset,
		bool want_more, u32 *_abort_code);

This is the recvmsg equivalent.  It allows the caller to find out about the
state of a specific call and to transfer received data into a buffer
piecemeal.

afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
logic between them.  They don't wait synchronously yet because the socket
lock needs to be dealt with.

Five interface functions have been removed:

	rxrpc_kernel_is_data_last()
	rxrpc_kernel_get_abort_code()
	rxrpc_kernel_get_error_number()
	rxrpc_kernel_free_skb()
	rxrpc_kernel_data_consumed()

As a temporary hack, sk_buffs going to an in-kernel call are queued on the
rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
in-kernel user.  To process the queue internally, a temporary function,
temp_deliver_data(), has been added.  This will be replaced with common code
between the rxrpc_recvmsg() path and the rxrpc_kernel_recv_data() path in a
future patch.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: make nla_policy const · f5bb341e

Committed by stephen hemminger on Aug 31, 2016

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: make nla_policy const · 4f70c96f

Committed by stephen hemminger on Aug 31, 2016

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ila: make nla_policy const · 6501f34f

Committed by stephen hemminger on Aug 31, 2016

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fou: make nla_policy const · 3f18ff2b

Committed by stephen hemminger on Aug 31, 2016

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: make nla_policy const · 3ee5256d

Committed by stephen hemminger on Aug 31, 2016

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

batman: make netlink attributes const · deeb91f5

Committed by stephen hemminger on Aug 31, 2016

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drop_monitor: make genl_multicast_group const · 85bae4bd

Committed by stephen hemminger on Aug 31, 2016

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make genetlink ctrl ops const · 12d8de6d

Committed by stephen hemminger on Aug 31, 2016

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: get rid of trivial returns · ce927bf1

Committed by stephen hemminger on Sep 01, 2016

A return at the end of a function is useless.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Sep, 2016 · 1 commit

net: dsa: add MDB support · 8df30255

Committed by Vivien Didelot on Aug 31, 2016

Add SWITCHDEV_OBJ_ID_PORT_MDB support to the DSA layer.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Aug, 2016 · 3 commits

net: mpls: Fixups for GSO · 48d2ab60

Committed by David Ahern on Aug 24, 2016

As reported by Lennert, the MPLS GSO code is failing to properly segment
large packets. There are a couple of problems:

1. The inner protocol is not set, so the gso segment functions for inner
   protocol layers are not getting run, and

2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
   are not properly accounted for in mpls_gso_segment.

The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
to call the gso segment functions for the higher layer protocols. That
means skb_mac_gso_segment is called twice -- once with the network
protocol set to MPLS and again with the network protocol set to the
inner protocol.

This patch sets the inner skb protocol, addressing item 1 above, and sets
the network_header and inner_network_header to mark where the MPLS labels
start and end. The MPLS code in OVS is also updated to set the two
network markers.

From there, the MPLS GSO code uses the difference between the network
header and the inner network header to know the size of the MPLS header
that was pushed. It then pulls the MPLS header, resets the mac_len and
protocol for the inner protocol, and then calls skb_mac_gso_segment
to segment the skb.

After the inner protocol segmentation is done, the skb protocol
is set to mpls for each segment and the network and mac headers are
restored.
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
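
The header-size computation in 48d2ab60 reduces to subtracting two offsets. In this sketch `struct fake_skb` is a hypothetical stand-in for the two skb header offsets the commit describes; `MPLS_HLEN` reflects the fact that each MPLS label stack entry is 4 bytes.

```c
#include <stdint.h>

#define MPLS_HLEN 4	/* one MPLS label stack entry is 4 bytes */

/* Hypothetical stand-in for the skb header offsets. */
struct fake_skb {
	uint16_t network_header;	/* start of the MPLS label stack */
	uint16_t inner_network_header;	/* start of the inner (e.g. IP) header */
};

/* Size of the pushed MPLS header: the gap between the two markers. */
static int mpls_hdr_len(const struct fake_skb *skb)
{
	return skb->inner_network_header - skb->network_header;
}

/* Number of labels, or -1 if the gap is not a whole number of entries. */
static int mpls_label_count(const struct fake_skb *skb)
{
	int len = mpls_hdr_len(skb);

	return (len >= 0 && len % MPLS_HLEN == 0) ? len / MPLS_HLEN : -1;
}
```

For example, with the labels starting right after a 14-byte Ethernet header and the inner IP header at offset 22, the pushed MPLS header is 8 bytes, i.e. two labels.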

net: lwtunnel: Handle fragmentation · 14972cbd

Committed by Roopa Prabhu on Aug 24, 2016

Today, the mpls iptunnel lwtunnel_output redirect expects the tunnel
output function to handle fragmentation. This is ok, but it can be
avoided if we do not do the mpls output redirect too early,
i.e. we could wait until ip fragmentation is done and then call
mpls output for each ip fragment.

To make this work, we will need:
1) the lwtunnel state to carry encap headroom, and
2) to do the redirect to the encap output handler on the ip fragment
   (essentially, do the output redirect after fragmentation).

This patch adds tunnel headroom in lwtstate to make sure we
account for tunnel data in mtu calculations during fragmentation,
and adds a new xmit redirect handler to redirect to the lwtunnel xmit
func after ip fragmentation.

This includes IPv6 and some mtu fixes, and testing, from David Ahern.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
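
Why the headroom matters for fragmentation can be shown with a small calculation. This sketch uses a hypothetical `struct lwt_state` with only a headroom field, and models IPv4 fragmentation (each non-final fragment carries a multiple of 8 payload bytes); it is not the kernel's MTU code.

```c
/* Hypothetical lwtunnel state: only the encap headroom matters here. */
struct lwt_state { unsigned int headroom; };

/* MTU left for the IP packet once encap headroom is reserved. */
static unsigned int effective_mtu(unsigned int dev_mtu,
				  const struct lwt_state *lwt)
{
	unsigned int hr = lwt ? lwt->headroom : 0;

	return dev_mtu > hr ? dev_mtu - hr : 0;
}

/* Fragments needed for a payload; non-final fragments carry a multiple
 * of 8 bytes, so round the per-fragment payload down. */
static unsigned int ip_frag_count(unsigned int payload, unsigned int mtu,
				  unsigned int iphdr_len)
{
	unsigned int per_frag;

	if (mtu <= iphdr_len)
		return 0;
	per_frag = (mtu - iphdr_len) & ~7u;
	if (per_frag == 0)
		return 0;
	return (payload + per_frag - 1) / per_frag;
}
```

For a 2950-byte payload on a 1500-byte link with a 20-byte IP header, ignoring a 4-byte label gives 2 fragments that would each overflow the link once the label is pushed, while reserving the headroom gives 3 fragments that fit.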

net: batch calls to flush_all_backlogs() · 41852497

Committed by Eric Dumazet on Aug 26, 2016

After commit 145dd5f9 ("net: flush the softnet backlog in process
context"), we can easily batch calls to flush_all_backlogs() for all
devices processed in rollback_registered_many().

Tested:

Before patch, on an idle host.

modprobe dummy numdummies=10000
perf stat -e context-switches -a rmmod dummy

 Performance counter stats for 'system wide':

         1,211,798      context-switches

       1.302137465 seconds time elapsed

After patch:

perf stat -e context-switches -a rmmod dummy

 Performance counter stats for 'system wide':

           225,523      context-switches

       0.721623566 seconds time elapsed
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Aug, 2016 · 2 commits

rxrpc: Pass struct socket * to more rxrpc kernel interface functions · 4de48af6

Committed by David Howells on Aug 30, 2016

Pass struct socket * to more rxrpc kernel interface functions.  They should
be starting from this rather than the socket pointer in the rxrpc_call
struct if they need to access the socket.

I have left:

	rxrpc_kernel_is_data_last()
	rxrpc_kernel_get_abort_code()
	rxrpc_kernel_get_error_number()
	rxrpc_kernel_free_skb()
	rxrpc_kernel_data_consumed()

unmodified as they're all about to be removed (and, in any case, don't
touch the socket).
Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Use call->peer rather than going to the connection · ea82aaec

Committed by David Howells on Aug 30, 2016

Use call->peer rather than call->conn->params.peer as call->conn may become
NULL.
Signed-off-by: David Howells <dhowells@redhat.com>

openanolis / cloud-kernel · last synced successfully more than 1 year ago
