Commits · ba39f3a0ed756ccd882adf4a77916ec863db3ce4 · openanolis / cloud-kernel

17 Sep, 2016 24 commits

rxrpc: Remove printks from rxrpc_recvmsg_data() to fix uninit var · ba39f3a0

Committed by David Howells on Sep 17, 2016

Remove the _enter/_debug/_leave calls from rxrpc_recvmsg_data(), one of
which uses an uninitialised variable.
Signed-off-by: David Howells <dhowells@redhat.com>

ba39f3a0

rxrpc: Add a tracepoint to follow what recvmsg does · 84997905

Committed by David Howells on Sep 17, 2016

Add a tracepoint to follow what recvmsg does within AF_RXRPC.
Signed-off-by: David Howells <dhowells@redhat.com>

84997905

rxrpc: Add a tracepoint to follow packets in the Rx buffer · 58dc63c9

Committed by David Howells on Sep 17, 2016

Add a tracepoint to follow the life of packets that get added to a call's
receive buffer.
Signed-off-by: David Howells <dhowells@redhat.com>

58dc63c9

rxrpc: Add a tracepoint to log ACK transmission · f3639df2

Committed by David Howells on Sep 17, 2016

Add a tracepoint to log information about ACK transmission.
Signed-off-by: David Howells <dhowells@redhat.com>

f3639df2

rxrpc: Add a tracepoint to log received ACK packets · ec71eb9a

Committed by David Howells on Sep 17, 2016

Add a tracepoint to log information from received ACK packets.
Signed-off-by: David Howells <dhowells@redhat.com>

ec71eb9a

rxrpc: Add a tracepoint to follow the life of a packet in the Tx buffer · a124fe3e

Committed by David Howells on Sep 17, 2016

Add a tracepoint to follow the insertion of a packet into the transmit
buffer, its transmission and its rotation out of the buffer.
Signed-off-by: David Howells <dhowells@redhat.com>

a124fe3e

rxrpc: Add connection tracepoint and client conn state tracepoint · 363deeab

Committed by David Howells on Sep 17, 2016

Add a pair of tracepoints, one to track rxrpc_connection struct ref
counting and the other to track the client connection cache state.
Signed-off-by: David Howells <dhowells@redhat.com>

363deeab

rxrpc: Add some additional call tracing · a84a46d7

Committed by David Howells on Sep 17, 2016

Add additional call tracepoint points for noting call-connected,
call-released and connection-failed events.

Also fix one tracepoint that was using an integer instead of the
corresponding enum value as the point type.
Signed-off-by: David Howells <dhowells@redhat.com>

a84a46d7

rxrpc: Print the packet type name in the Rx packet trace · a3868bfc

Committed by David Howells on Sep 17, 2016

Print a symbolic packet type name for each valid received packet in the
trace output, not just a number.
Signed-off-by: David Howells <dhowells@redhat.com>

a3868bfc

rxrpc: Fix the basic transmit DATA packet content size at 1412 bytes · 182f5056

Committed by David Howells on Sep 17, 2016

Fix the basic transmit DATA packet content size at 1412 bytes so that they
can be arbitrarily assembled into jumbo packets.

In the future, I'm thinking of moving to keeping a jumbo packet header at
the beginning of each packet in the Tx queue and creating the packet header
on the spot when kernel_sendmsg() is invoked. That way, jumbo packets can
be assembled on the spur of the moment for (re-)transmission.
Signed-off-by: David Howells <dhowells@redhat.com>

182f5056

rxrpc: Be consistent about switch value in rxrpc_send_call_packet() · 2311e327

Committed by David Howells on Sep 17, 2016

rxrpc_send_call_packet() should use type in both its switch-statements
rather than using pkt->whdr.type.  This might give the compiler an easier
job of uninitialised variable checking.
Signed-off-by: David Howells <dhowells@redhat.com>

2311e327

rxrpc: Don't transmit an ACK if there's no reason set · 27d0fc43

Committed by David Howells on Sep 17, 2016

Don't transmit an ACK if call->ackr_reason is unset.  There's the
possibility of a race between recvmsg() sending an ACK and the background
processing thread trying to send the same one.
Signed-off-by: David Howells <dhowells@redhat.com>

27d0fc43

rxrpc: Fix retransmission algorithm · dfa7d920

Committed by David Howells on Sep 17, 2016

Make the retransmission algorithm use for-loops instead of do-loops and
move the counter increments into the for-statement increment slots.

Though the do-loops are slightly more efficient since there will be at least
one pass through each loop, the counter increments are harder to get
right as the continue-statements skip them.

Without this, if there are any positive acks within the loop, the do-loop
will cycle forever because the counter increment is never done.
Signed-off-by: David Howells <dhowells@redhat.com>
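
The loop shape described in this message can be illustrated with a minimal
userspace sketch. It is not the kernel code: the enum, the array-based window
and the function name are hypothetical stand-ins chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-slot annotations for a transmit window. */
enum tx_annotation { TX_ACK, TX_NAK, TX_UNACKED };

/*
 * Count the slots that still need retransmission.  The increment sits in
 * the for-statement, so `continue` cannot skip it.  In the equivalent
 *
 *     do { if (acked) continue; ...; i++; } while (i < n);
 *
 * the `continue` jumps straight to the loop condition, so `i` never
 * advances past an acked slot and the loop spins forever.
 */
static size_t count_retransmits(const enum tx_annotation *window, size_t n)
{
	size_t resend = 0;

	assert(window != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (window[i] == TX_ACK)
			continue;	/* positive ack: nothing to resend */
		resend++;
	}
	return resend;
}
```

Keeping the increment in the for-statement costs one extra condition check
up front, which is the small inefficiency the message accepts in exchange.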

dfa7d920

rxrpc: Fix the parsing of soft-ACKs · d01dc4c3

Committed by David Howells on Sep 17, 2016

The soft-ACK parser doesn't increment the pointer into the soft-ACK list,
resulting in the first ACK/NACK value being applied to all the relevant
packets in the Tx queue.  This has the potential to miss retransmissions
and cause excessive retransmissions.

Fix this by incrementing the pointer.
Signed-off-by: David Howells <dhowells@redhat.com>
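
A rough userspace sketch of the corrected parsing pattern follows. The byte
encoding (1 for ACK, anything else for NACK), the flat arrays and the function
name are assumptions made for the example, not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SOFT_ACK 1	/* hypothetical encoding: 1 = ACK, otherwise NACK */

/*
 * Copy each soft-ACK byte onto the matching Tx slot and return the number
 * of NACKs seen.  Advancing `acks` on every pass is the point of the fix:
 * without `acks++`, every slot would receive the first list entry.
 */
static size_t apply_soft_acks(uint8_t *tx_state, const uint8_t *acks,
			      size_t nr_acks)
{
	size_t nacks = 0;

	assert(nr_acks == 0 || (tx_state != NULL && acks != NULL));
	for (size_t i = 0; i < nr_acks; i++, acks++) {
		tx_state[i] = *acks;
		if (*acks != SOFT_ACK)
			nacks++;
	}
	return nacks;
}
```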

d01dc4c3

rxrpc: Fix unexposed client conn release · 78883793

Committed by David Howells on Sep 17, 2016

If the last call on a client connection is released after the connection has
had a bunch of calls allocated but before any DATA packets are sent (so
that it's not yet marked RXRPC_CONN_EXPOSED), an assertion will fail in
rxrpc_disconnect_client_call().

	af_rxrpc: Assertion failed - 1(0x1) >= 2(0x2) is false
	------------[ cut here ]------------
	kernel BUG at ../net/rxrpc/conn_client.c:753!

This is because it's expecting the conn to have been exposed and to have 2
or more refs - but this isn't necessarily the case.

Simply remove the assertion.  This allows the conn to be moved into the
inactive state and deleted if it isn't resurrected before the final put is
called.
Signed-off-by: David Howells <dhowells@redhat.com>

78883793

rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call() · 357f5ef6

Committed by David Howells on Sep 17, 2016

Call rxrpc_release_call() on getting an error in rxrpc_new_client_call()
rather than trying to do the cleanup ourselves.  This isn't a problem,
provided we set RXRPC_CALL_HAS_USERID only if we actually add the call to
the calls tree as cleanup code fragments that would otherwise cause
problems are conditional.

Without this, we miss some of the cleanup.
Signed-off-by: David Howells <dhowells@redhat.com>

357f5ef6

rxrpc: Fix the putting of client connections · 66d58af7

Committed by David Howells on Sep 17, 2016

In rxrpc_put_one_client_conn(), if a connection has RXRPC_CONN_COUNTED set
on it, then it's accounted for in rxrpc_nr_client_conns and may be on
various lists - and this is cleaned up correctly.

However, if the connection doesn't have RXRPC_CONN_COUNTED set on it, then
the put routine returns rather than just skipping the extra bit of cleanup.

Fix this by making the extra bit of clean up conditional instead and always
killing off the connection.

This manifests itself as connections with a zero usage count hanging around
in /proc/net/rxrpc_conns because the connection was allocated, but then
discarded, due to a race with another process that set up a parallel
connection, which was then shared instead.
Signed-off-by: David Howells <dhowells@redhat.com>
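
The control-flow change described here (make the extra cleanup conditional
rather than returning early) can be sketched in userspace as below. The
struct, the counter and the function name are hypothetical stand-ins and do
not mirror the real rxrpc_connection layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the connection and the global counter. */
struct conn {
	bool counted;	/* plays the role of RXRPC_CONN_COUNTED */
	bool destroyed;
};

static int nr_client_conns;

/* Drop the final reference on a connection. */
static void put_last_ref(struct conn *conn)
{
	assert(conn != NULL);
	if (conn->counted) {
		/* Extra cleanup applies only to accounted connections. */
		conn->counted = false;
		nr_client_conns--;
	}
	/* Always reached, whether or not the connection was counted. */
	conn->destroyed = true;
}
```

The buggy shape would be `if (!conn->counted) return;` at the top, which
leaves uncounted connections alive with a zero usage count.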

66d58af7

rxrpc: Purge the to_be_accepted queue on socket release · 0360da6d

Committed by David Howells on Sep 17, 2016

Purge the queue of to_be_accepted calls on socket release. Note that
purging sock_calls doesn't release the ref owned by to_be_accepted.

The sock_calls list is probably redundant given the purging of the
recvmsg_q, the to_be_accepted queue and the calls tree.
Signed-off-by: David Howells <dhowells@redhat.com>

0360da6d

rxrpc: Record calls that need to be accepted · e6f3afb3

Committed by David Howells on Sep 17, 2016

Record calls that need to be accepted using sk_acceptq_added(); otherwise
the backlog counter goes negative because sk_acceptq_removed() is called.
This causes the preallocator to malfunction.

Calls that are preaccepted by AFS within the kernel aren't affected by
this.
Signed-off-by: David Howells <dhowells@redhat.com>

e6f3afb3

rxrpc: Fix handling of the last packet in rxrpc_recvmsg_data() · 816c9fce

Committed by David Howells on Sep 17, 2016

The code for determining the last packet in rxrpc_recvmsg_data() has been
using the RXRPC_CALL_RX_LAST flag to determine if the rx_top pointer points
to the last packet or not.  This isn't a good idea, however, as the input
code may be running simultaneously on another CPU and that sets the flag
*before* updating the top pointer.

Fix this by the following means:

 (1) Restrict the use of RXRPC_CALL_RX_LAST to the input routines only.
     There's otherwise a synchronisation problem between detecting the flag
     and checking rx_top.  This could probably be dealt with by appropriate
     application of memory barriers, but there's a simpler way.

 (2) Set RXRPC_CALL_RX_LAST after setting rx_top.

 (3) Make rxrpc_rotate_rx_window() consult the flags header field of the
     DATA packet it's about to discard to see if that was the last packet.
     Use this as the basis for ending the Rx phase.  This shouldn't be a
     problem because the recvmsg side of things is guaranteed to see the
     packets in order.

 (4) Make rxrpc_recvmsg_data() return 1 to indicate the end of the data if:

     (a) the packet it has just processed is marked as RXRPC_LAST_PACKET

     (b) the call's Rx phase has been ended.
Signed-off-by: David Howells <dhowells@redhat.com>
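
Points (3) and (4) above can be pictured with a small userspace sketch. The
structs, the flag value and both function names are assumptions made for
illustration; the real code works on sk_buffs and the call's Rx ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LAST_PACKET_FLAG 0x04	/* assumed bit for "last packet" */

struct rx_packet { uint8_t flags; };
struct rx_call { unsigned int hard_ack; bool rx_ended; };

/*
 * Discard the packet at the head of the window and end the Rx phase if
 * the packet's own header says it was the last one.  No shared call flag
 * is consulted, so there is nothing to race with the input path.
 */
static void rotate_rx_window(struct rx_call *call, const struct rx_packet *pkt)
{
	call->hard_ack++;
	if (pkt->flags & LAST_PACKET_FLAG)
		call->rx_ended = true;
}

/* Consume one packet; return 1 at end of data, 0 if more may follow. */
static int recvmsg_one(struct rx_call *call, const struct rx_packet *pkt)
{
	bool last;

	assert(call != NULL && pkt != NULL);
	last = (pkt->flags & LAST_PACKET_FLAG) != 0;
	rotate_rx_window(call, pkt);
	return (last || call->rx_ended) ? 1 : 0;
}
```

Because the receive side sees packets in order, reading the flag from the
packet being discarded is sufficient in this sketch.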

816c9fce

rxrpc: Check the return value of rxrpc_locate_data() · 2e2ea51d

Committed by David Howells on Sep 17, 2016

Check the return value of rxrpc_locate_data() in rxrpc_recvmsg_data().
Signed-off-by: David Howells <dhowells@redhat.com>

2e2ea51d

rxrpc: Move the check of rx_pkt_offset from rxrpc_locate_data() to caller · 4b22457c

Committed by David Howells on Sep 17, 2016

Move the check of rx_pkt_offset from rxrpc_locate_data() to the caller,
rxrpc_recvmsg_data(), so that it's clearer what's going on there.
Signed-off-by: David Howells <dhowells@redhat.com>

4b22457c

rxrpc: Remove some whitespace. · fabf9201

Committed by David Howells on Sep 17, 2016

Remove a tab that's on a line that should otherwise be blank.
Signed-off-by: David Howells <dhowells@redhat.com>

fabf9201

rxrpc: Make IPv6 support conditional on CONFIG_IPV6 · d1912747

Committed by David Howells on Sep 17, 2016

Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
This is then made conditional on CONFIG_IPV6.

Without this, the following can be seen:

   net/built-in.o: In function `rxrpc_init_peer':
>> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1912747

16 Sep, 2016 9 commits

net-next: dsa: add Qualcomm tag RX/TX handler · cafdc45c

Committed by John Crispin on Sep 15, 2016

Add support for the 2-byte Qualcomm tag that gigabit switches such as
the QCA8337/N might insert when receiving packets, or that we need
to insert while targeting specific switch ports. The tag is inserted
directly behind the ethernet header.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cafdc45c

tcp: fix a stale ooo_last_skb after a replace · 76f0dcbb

Committed by Eric Dumazet on Sep 13, 2016

When an skb replaces another one in the ooo queue, I forgot to also
update tp->ooo_last_skb if the replaced skb was the last one
in the queue.

To fix this, we can simply re-use the code that runs after an insertion,
trying to merge skbs at the right of the current skb.

This not only fixes the bug, but also removes all small skbs that might
be a subset of the new one.

Example:

We receive segments 2001:3001,  4001:5001

Then we receive 2001:8001: We should replace 2001:3001 with the big
skb, but also remove 4001:5001 from the queue to save space.

packetdrill test demonstrating the bug

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
+0.100 < . 1:1(0) ack 1 win 1024
+0 accept(3, ..., ...) = 4

+0.01 < . 1001:2001(1000) ack 1 win 1024
+0    > . 1:1(0) ack 1 <nop,nop, sack 1001:2001>

+0.01 < . 1001:3001(2000) ack 1 win 1024
+0    > . 1:1(0) ack 1 <nop,nop, sack 1001:2001 1001:3001>

Fixes: 9f5afeae ("tcp: use an RB tree for ooo receive queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
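
The segment example in this message (2001:3001 and 4001:5001, then
2001:8001) can be reproduced with a simplified userspace sketch. It uses a
flat array rather than the kernel's RB tree, ignores sequence-number
wraparound, and does not merge partial overlaps or keep the array sorted;
the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seg { uint32_t start, end; };	/* half-open range [start, end) */

/*
 * Store `incoming` and drop every queued segment it fully covers.
 * Returns the new number of queued segments.
 */
static size_t ooo_insert(struct seg *q, size_t n, size_t cap,
			 struct seg incoming)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].start >= incoming.start && q[i].end <= incoming.end)
			continue;	/* subset of the new segment: drop */
		q[kept++] = q[i];
	}
	assert(kept < cap);
	q[kept++] = incoming;
	return kept;
}
```

With the queue holding 2001:3001 and 4001:5001, inserting 2001:8001 leaves a
single entry, which is the space saving the message describes.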

76f0dcbb

openvswitch: avoid deferred execution of recirc actions · 2679d040

Committed by Lance Richardson on Sep 13, 2016

The ovs kernel data path currently defers the execution of all
recirc actions until stack utilization is at a minimum.
This is too limiting for some packet forwarding scenarios due to
the small size of the deferred action FIFO (10 entries). For
example, broadcast traffic sent out more than 10 ports with
recirculation results in packet drops when the deferred action
FIFO becomes full, as reported here:

     http://openvswitch.org/pipermail/dev/2016-March/067672.html

Since the current recursion depth is available (it is already tracked
by the exec_actions_level pcpu variable), we can use it to determine
whether to execute recirculation actions immediately (safe when
recursion depth is low) or defer execution until more stack space is
available.

With this change, the deferred action fifo size becomes a non-issue
for currently failing scenarios because it is no longer used when
there are three or fewer recursions through ovs_execute_actions().
Suggested-by: Pravin Shelar <pshelar@ovn.org>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
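
The decision described in this message might be sketched as follows. This is
a userspace illustration only: the limit of 3 is read from the "three or
fewer recursions" wording, the FIFO size of 10 from the text above, and the
struct, enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RECURSION_LIMIT 3	/* assumed from "three or fewer recursions" */
#define FIFO_SIZE 10		/* deferred action FIFO size from the text */

struct deferred_fifo { int used; };

enum recirc_result { RECIRC_NOW, RECIRC_DEFERRED, RECIRC_DROPPED };

/*
 * Decide how to handle a recirculation action at the given recursion
 * level: run it immediately while the stack is shallow, otherwise queue
 * it, and report a drop only when the FIFO is already full.
 */
static enum recirc_result recirc(int level, struct deferred_fifo *fifo)
{
	assert(fifo != NULL);
	if (level <= RECURSION_LIMIT)
		return RECIRC_NOW;
	if (fifo->used >= FIFO_SIZE)
		return RECIRC_DROPPED;
	fifo->used++;
	return RECIRC_DEFERRED;
}
```

In this sketch a broadcast to more than 10 ports at a shallow recursion
level never touches the FIFO, so it cannot overflow.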

2679d040

net/sched: cls_flower: Remove an unused field from the filter key structure · a53d850a

Committed by Or Gerlitz on Sep 15, 2016

Commit c3f83241 "net: Add full IPv6 addresses to flow_keys" added an
unused instance of struct flow_dissector_key_addrs into struct fl_flow_key;
remove it.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a53d850a

net/sched: cls_flower: Support masking for matching on tcp/udp ports · aa72d708

Committed by Or Gerlitz on Sep 15, 2016

Add the definitions for src/dst udp/tcp port masks and use
them when setting && dumping the relevant keys.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa72d708

net_sched: Introduce skbmod action · 86da71b5

Committed by Jamal Hadi Salim on Sep 12, 2016

This action is intended as an upgrade over pedit from a usability
perspective (as well as operational debuggability).
Compare this:

sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action pedit munge offset -14 u8 set 0x02 \
munge offset -13 u8 set 0x15 \
munge offset -12 u8 set 0x15 \
munge offset -11 u8 set 0x15 \
munge offset -10 u16 set 0x1515 \
pipe

to:

sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbmod dmac 02:15:15:15:15:15

Also try to do a MAC address swap with pedit or, worse,
try to debug a policy with destination MAC, source MAC and
ethertype. Then make a few rules out of those and you'll get my point.

In the future, common use cases on pedit can be migrated to this action
(for example, different fields in ip v4/6, transports like tcp/udp/sctp
etc). For this first cut, this allows modifying the basic ethernet header.

The most important ethernet use case at the moment is when redirecting or
mirroring packets to a remote machine. The dst mac address needs a re-write
so that the packet doesn't get dropped by, or confuse, an interconnecting
(learning) switch, or get dropped by a target machine (which looks at the
dst mac). And at times when flipping back the packet a swap of the MAC
addresses is needed.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
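
For readers unfamiliar with what a dmac rewrite or a MAC swap does to the
frame, here is a small userspace sketch operating on a raw Ethernet header
(destination MAC in bytes 0-5, source MAC in bytes 6-11). It is not the
skbmod implementation; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Overwrite the destination MAC, the first 6 bytes of the frame. */
static void set_dmac(uint8_t *frame, const uint8_t dmac[MAC_LEN])
{
	assert(frame != NULL && dmac != NULL);
	memcpy(frame, dmac, MAC_LEN);
}

/* Swap destination (bytes 0-5) and source (bytes 6-11) MAC addresses. */
static void swap_macs(uint8_t *frame)
{
	uint8_t tmp[MAC_LEN];

	assert(frame != NULL);
	memcpy(tmp, frame, MAC_LEN);
	memcpy(frame, frame + MAC_LEN, MAC_LEN);
	memcpy(frame + MAC_LEN, tmp, MAC_LEN);
}
```

Each helper corresponds to one intent (set the destination, or swap both),
whereas the pedit form above spells the same change out byte by byte.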

86da71b5

bpf: use skb_at_tc_ingress helper in tcf_bpf · f53d8c7b

Committed by Daniel Borkmann on Sep 12, 2016

We have a small skb_at_tc_ingress() helper for testing for ingress, so
make use of it. cls_bpf already uses it and so should act_bpf.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f53d8c7b

bpf: drop unnecessary test in cls_bpf_classify and tcf_bpf · 04b3f8de

Committed by Daniel Borkmann on Sep 12, 2016

The skb_mac_header_was_set() test in cls_bpf's and act_bpf's fast-path is
actually unnecessary and can be removed altogether. This was added by
commit a166151c ("bpf: fix bpf helpers to use skb->mac_header relative
offsets"), which was later on improved by 3431205e ("bpf: make programs
see skb->data == L2 for ingress and egress"). We're always guaranteed to
have valid mac header at the time we invoke cls_bpf_classify() or tcf_bpf().

Reason is that since 6d1ccff6 ("net: reset mac header in dev_start_xmit()")
we do skb_reset_mac_header() in __dev_queue_xmit() before we could call
into sch_handle_egress() or any subsequent enqueue. sch_handle_ingress()
always sees a valid mac header as well (things like skb_reset_mac_len()
would badly fail otherwise). Thus, drop the unnecessary test in classifier
and action case.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

04b3f8de

net/sched: act_tunnel_key: Remove rcu_read_lock protection · 07c0f09e

Committed by Hadar Hen Zion on Sep 12, 2016

Remove rcu_read_lock protection from tunnel_key_dump and use
rtnl_dereference, since the dump operation is protected by the rtnl lock.

Also, remove rcu_read_lock from tunnel_key_release and use
rcu_dereference_protected.

Both operations are running exclusively and a writer couldn't modify
t->params while those functions are executed.

Fixes: 54d94fd89d90 ('net/sched: Introduce act_tunnel_key')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07c0f09e

14 Sep, 2016 7 commits

rxrpc: Add IPv6 support · 75b54cb5

Committed by David Howells on Sep 13, 2016

Add IPv6 support to AF_RXRPC.  With this, AF_RXRPC sockets can be created:

	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6);

instead of:

	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);

The AFS filesystem doesn't support IPv6 at the moment, though, since that
requires upgrades to some of the RPC calls.

Note that a good portion of this patch is replacing "%pI4:%u" in print
statements with "%pISpc" which is able to handle both protocols and print
the port.
Signed-off-by: David Howells <dhowells@redhat.com>

75b54cb5

rxrpc: Use rxrpc_extract_addr_from_skb() rather than doing this manually · 1c2bc7b9

Committed by David Howells on Sep 13, 2016

There are two places that want to transmit a packet in response to one just
received and manually pick the address to reply to out of the sk_buff.
Make them use rxrpc_extract_addr_from_skb() instead so that IPv6 is handled
automatically.
Signed-off-by: David Howells <dhowells@redhat.com>

1c2bc7b9

rxrpc: Don't specify protocol when creating transport socket · aaa31cbc

Committed by David Howells on Sep 13, 2016

Pass 0 as the protocol argument when creating the transport socket rather
than IPPROTO_UDP.
Signed-off-by: David Howells <dhowells@redhat.com>

aaa31cbc

rxrpc: Create an address for sendmsg() to bind unbound socket with · cd5892c7

Committed by David Howells on Sep 13, 2016

Create an address for sendmsg() to bind an unbound socket with rather than
using a completely blank address; otherwise the transport socket creation
will fail because it will try to use address family 0.

We use the address family specified in the protocol argument when the
AF_RXRPC socket was created and SOCK_DGRAM as the default.  For anything
else, bind() must be used.
Signed-off-by: David Howells <dhowells@redhat.com>

cd5892c7

rxrpc: Correctly initialise, limit and transmit call->rx_winsize · 75e42126

Committed by David Howells on Sep 13, 2016

call->rx_winsize should be initialised to the sysctl setting and the sysctl
setting should be limited to the maximum we want to permit. Further, we
need to place this in the ACK info instead of the sysctl setting.

Furthermore, discard the idea of accepting the subpackets of a jumbo packet
that lie beyond the receive window when the first packet of the jumbo is
within the window. Just discard the excess subpackets instead. This
allows the receive window to be opened up right to the buffer size less one
for the dead slot.
Signed-off-by: David Howells <dhowells@redhat.com>
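
The limiting rule in this message (the window may open up to the buffer size
less one dead slot) can be sketched as a simple clamp. The ring size of 64
and the function name are hypothetical values chosen for the example.

```c
#include <assert.h>

#define RX_BUFF_SIZE 64u	/* hypothetical Rx ring size */

/*
 * Initialise a call's receive window from the sysctl value, capped at
 * the ring size less one for the dead slot.
 */
static unsigned int init_rx_winsize(unsigned int sysctl_winsize)
{
	const unsigned int limit = RX_BUFF_SIZE - 1;

	assert(limit > 0);
	return sysctl_winsize > limit ? limit : sysctl_winsize;
}
```

The clamped per-call value, rather than the raw sysctl setting, is what the
message says should be advertised in the ACK info.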

75e42126

rxrpc: Fix prealloc refcounting · 3432a757

Committed by David Howells on Sep 13, 2016

The preallocated call buffer holds a ref on the calls within that buffer.
The ref was being released in the wrong place - it worked okay for incoming
calls to the AFS cache manager service, but doesn't work right for incoming
calls to a userspace service.

Instead of releasing an extra ref on service calls in rxrpc_release_call(),
the ref needs to be released during the acceptance/rejection process.  To
this end:

 (1) The prealloc ref is now normally released during
     rxrpc_new_incoming_call().

 (2) For preallocated kernel API calls, the kernel API's ref needs to be
     released when the call is discarded on socket close.

 (3) We shouldn't take a second ref in rxrpc_accept_call().

 (4) rxrpc_recvmsg_new_call() needs to get a ref of its own when it adds
     the call to the to_be_accepted socket queue.

In doing (4) above, we would prefer not to put the call's refcount down to
0 as that entails doing cleanup in softirq context, but it's unlikely as
there are several refs held elsewhere, at least one of which must be put by
someone in process context calling rxrpc_release_call().  However, it's not
a problem if we do have to do that.
Signed-off-by: David Howells <dhowells@redhat.com>

3432a757

rxrpc: Adjust the call ref tracepoint to show kernel API refs · cbd00891

Committed by David Howells on Sep 13, 2016

Adjust the call ref tracepoint to show references held on a call by the
kernel API separately as much as possible and add an additional trace at
the allocation point from the preallocation buffer for an incoming call.

Note that this doesn't show the allocation of a client call for the kernel
separately at the moment.
Signed-off-by: David Howells <dhowells@redhat.com>

cbd00891

openanolis / cloud-kernel · synced successfully more than 1 year ago
