提交 · ef68622da9cc0c4e5202f90093a3a5314e41e9e9 · openanolis / cloud-kernel

02 3月, 2017 1 次提交

rxrpc: Fix deadlock between call creation and sendmsg/recvmsg · 540b1c48

由 David Howells 提交于 2月 27, 2017

All the routines by which rxrpc is accessed from the outside are serialised
by means of the socket lock (sendmsg, recvmsg, bind,
rxrpc_kernel_begin_call(), ...) and this presents a problem:

 (1) If a number of calls on the same socket are in the process of
     connection to the same peer, a maximum of four concurrent live calls
     are permitted before further calls need to wait for a slot.

 (2) If a call is waiting for a slot, it is deep inside sendmsg() or
     rxrpc_kernel_begin_call() and the entry function is holding the socket
     lock.

 (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
     from servicing the other calls as they need to take the socket lock to
     do so.

 (4) The socket is stuck until a call is aborted and makes its slot
     available to the waiter.

Fix this by:

 (1) Provide each call with a mutex ('user_mutex') that arbitrates access
     by the users of rxrpc separately for each specific call.

 (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
     they've got a call and taken its mutex.

     Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
     set but someone else has the lock.  Should I instead only return
     EWOULDBLOCK if there's nothing currently to be done on a socket, and
     sleep in this particular instance because there is something to be
     done, but we appear to be blocked by the interrupt handler doing its
     ping?

 (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
     call, locking its user mutex and adding it to the socket's call tree.
     The call is returned locked so that sendmsg() can add data to it
     immediately.

     From the moment the call is in the socket tree, it is subject to
     access by sendmsg() and recvmsg() - even if it isn't connected yet.

 (4) Lock new service calls in the UDP data_ready handler (in
     rxrpc_new_incoming_call()) because they may already be in the socket's
     tree and the data_ready handler makes them live immediately if a user
     ID has already been preassigned.

     Note that the new call is locked before any notifications are sent
     that it is live, so doing mutex_trylock() *ought* to always succeed.
     Userspace is prevented from doing sendmsg() on calls that are in a
     too-early state in rxrpc_do_sendmsg().

 (5) Make rxrpc_new_incoming_call() return the call with the user mutex
     held so that a ping can be scheduled immediately under it.

     Note that it might be worth moving the ping call into
     rxrpc_new_incoming_call() and then we can drop the mutex there.

 (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
     release the socket after adding the call to the socket's tree.  This
     is slightly tricky as we've dequeued the call by that point and have
     to requeue it.

     Note that requeuing emits a trace event.

 (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
     new mutex immediately and don't bother with the socket mutex at all.

This patch has the nice bonus that calls on the same socket are now to some
extent parallelisable.

Note that we might want to move rxrpc_service_prealloc() calls out from the
socket lock and give it its own lock, so that we don't hang progress in
other calls because we're waiting for the allocator.

We probably also want to avoid calling rxrpc_notify_socket() from within
the socket lock (rxrpc_accept_call()).
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Tested-by: NMarc Dionne <marc.c.dionne@auristor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

540b1c48

05 1月, 2017 2 次提交

rxrpc: Add some more tracing · b1d9f7fd

由 David Howells 提交于 1月 05, 2017

Add the following extra tracing information:

 (1) Modify the rxrpc_transmit tracepoint to record the Tx window size as
     this is varied by the slow-start algorithm.

 (2) Modify the rxrpc_rx_ack tracepoint to record more information from
     received ACK packets.

 (3) Add an rxrpc_rx_data tracepoint to record the information in DATA
     packets.

 (4) Add an rxrpc_disconnect_call tracepoint to record call disconnection,
     including the reason the call was disconnected.

 (5) Add an rxrpc_improper_term tracepoint to record implicit termination
     of a call by a client either by starting a new call on a particular
     connection channel without first transmitting the final ACK for the
     previous call.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

b1d9f7fd

rxrpc: Fix handling of enums-to-string translation in tracing · b54a134a

由 David Howells 提交于 1月 05, 2017

Fix the way enum values are translated into strings in AF_RXRPC
tracepoints. The problem with just doing a lookup in a normal flat array
of strings or chars is that external tracing infrastructure can't find it.
Rather, TRACE_DEFINE_ENUM must be used.

Also sort the enums and string tables to make it easier to keep them in
order so that a future patch to __print_symbolic() can be optimised to try
a direct lookup into the table first before iterating over it.

A couple of _proto() macro calls are removed because they refered to tables
that got moved to the tracing infrastructure. The relevant data can be
found by way of tracing.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

b54a134a

30 9月, 2016 3 次提交

rxrpc: Keep the call timeouts as ktimes rather than jiffies · df0adc78

由 David Howells 提交于 9月 26, 2016

Keep that call timeouts as ktimes rather than jiffies so that they can be
expressed as functions of RTT.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

df0adc78

rxrpc: Actually display the tx_data trace retransmission note · 265a44bb

由 David Howells 提交于 9月 30, 2016

Actually display in the tx_data trace the retransmission note added in a
previous patch.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

265a44bb

rxrpc: Make Tx loss-injection go through normal return and adjust tracing · a1767077

由 David Howells 提交于 9月 29, 2016

In rxrpc_send_data_packet() make the loss-injection path return through the
same code as the transmission path so that the RTT determination is
initiated and any future timer shuffling will be done, despite the packet
having been binned.

Whilst we're at it:

 (1) Add to the tx_data tracepoint an indication of whether or not we're
     retransmitting a data packet.

 (2) When we're deciding whether or not to request an ACK, rather than
     checking if we're in fast-retransmit mode check instead if we're
     retransmitting.

 (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
     not altering the sk_buff refcount nor are we just seeing it after
     getting it off the Tx list.

 (4) The rxrpc_skb_tx_lost note is then no longer used so remove it.

 (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

a1767077

25 9月, 2016 1 次提交

rxrpc: Implement slow-start · 57494343

由 David Howells 提交于 9月 24, 2016

Implement RxRPC slow-start, which is similar to RFC 5681 for TCP.  A
tracepoint is added to log the state of the congestion management algorithm
and the decisions it makes.

Notes:

 (1) Since we send fixed-size DATA packets (apart from the final packet in
     each phase), counters and calculations are in terms of packets rather
     than bytes.

 (2) The ACK packet carries the equivalent of TCP SACK.

 (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
     suited to SACK of a small number of packets.  It seems that, almost
     inevitably, by the time three 'duplicate' ACKs have been seen, we have
     narrowed the loss down to one or two missing packets, and the
     FLIGHT_SIZE calculation ends up as 2.

 (4) In rxrpc_resend(), if there was no data that apparently needed
     retransmission, we transmit a PING ACK to ask the peer to tell us what
     its Rx window state is.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

57494343

23 9月, 2016 5 次提交

rxrpc: Add a tracepoint to log which packets will be retransmitted · c6672e3f

由 David Howells 提交于 9月 23, 2016

Add a tracepoint to log in rxrpc_resend() which packets will be
retransmitted.  Note that if a positive ACK comes in whilst we have dropped
the lock to retransmit another packet, the actual retransmission may not
happen, though some of the effects will (such as altering the congestion
management).
Signed-off-by: NDavid Howells <dhowells@redhat.com>

c6672e3f

rxrpc: Add tracepoint for ACK proposal · 9c7ad434

由 David Howells 提交于 9月 23, 2016

Add a tracepoint to log proposed ACKs, including whether the proposal is
used to update a pending ACK or is discarded in favour of an easlier,
higher priority ACK.

Whilst we're at it, get rid of the rxrpc_acks() function and access the
name array directly.  We do, however, need to validate the ACK reason
number given to trace_rxrpc_rx_ack() to make sure we don't overrun the
array.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

9c7ad434

rxrpc: Add a tracepoint to log injected Rx packet loss · 89b475ab

由 David Howells 提交于 9月 23, 2016

Add a tracepoint to log received packets that get discarded due to Rx
packet loss.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

89b475ab

rxrpc: Add data Tx tracepoint and adjust Tx ACK tracepoint · be832aec

由 David Howells 提交于 9月 23, 2016

Add a tracepoint to log transmission of DATA packets (including loss
injection).

Adjust the ACK transmission tracepoint to include the packet serial number
and to line this up with the DATA transmission display.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

be832aec

rxrpc: Add a tracepoint for the call timer · fc7ab6d2

由 David Howells 提交于 9月 23, 2016

Add a tracepoint to log call timer initiation, setting and expiry.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

fc7ab6d2

22 9月, 2016 1 次提交

rxrpc: Add per-peer RTT tracker · cf1a6474

由 David Howells 提交于 9月 22, 2016

Add a function to track the average RTT for a peer.  Sources of RTT data
will be added in subsequent patches.

The RTT data will be useful in the future for determining resend timeouts
and for handling the slow-start part of the Rx protocol.

Also add a pair of tracepoints, one to log transmissions to elicit a
response for RTT purposes and one to log responses that contribute RTT
data.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

cf1a6474

17 9月, 2016 8 次提交

rxrpc: Improve skb tracing · 71f3ca40

由 David Howells 提交于 9月 17, 2016

Improve sk_buff tracing within AF_RXRPC by the following means:

 (1) Use an enum to note the event type rather than plain integers and use
     an array of event names rather than a big multi ?: list.

 (2) Distinguish Rx from Tx packets and account them separately.  This
     requires the call phase to be tracked so that we know what we might
     find in rxtx_buffer[].

 (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
     event type.

 (4) A pair of 'rotate' events are added to indicate packets that are about
     to be rotated out of the Rx and Tx windows.

 (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
     packet loss injection recording.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

71f3ca40

rxrpc: Add a tracepoint to follow what recvmsg does · 84997905

由 David Howells 提交于 9月 17, 2016

Add a tracepoint to follow what recvmsg does within AF_RXRPC.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

84997905

rxrpc: Add a tracepoint to follow packets in the Rx buffer · 58dc63c9

由 David Howells 提交于 9月 17, 2016

Add a tracepoint to follow the life of packets that get added to a call's
receive buffer.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

58dc63c9

rxrpc: Add a tracepoint to log ACK transmission · f3639df2

由 David Howells 提交于 9月 17, 2016

Add a tracepoint to log information about ACK transmission.
Signed-off-by: NDavid Howels <dhowells@redhat.com>

f3639df2

rxrpc: Add a tracepoint to log received ACK packets · ec71eb9a

由 David Howells 提交于 9月 17, 2016

Add a tracepoint to log information from received ACK packets.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

ec71eb9a

rxrpc: Add a tracepoint to follow the life of a packet in the Tx buffer · a124fe3e

由 David Howells 提交于 9月 17, 2016

Add a tracepoint to follow the insertion of a packet into the transmit
buffer, its transmission and its rotation out of the buffer.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

a124fe3e

rxrpc: Add connection tracepoint and client conn state tracepoint · 363deeab

由 David Howells 提交于 9月 17, 2016

Add a pair of tracepoints, one to track rxrpc_connection struct ref
counting and the other to track the client connection cache state.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

363deeab

rxrpc: Print the packet type name in the Rx packet trace · a3868bfc

由 David Howells 提交于 9月 17, 2016

Print a symbolic packet type name for each valid received packet in the
trace output, not just a number.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

a3868bfc

08 9月, 2016 2 次提交

rxrpc: Add tracepoints to record received packets and end of data_ready · 49e19ec7

由 David Howells 提交于 9月 08, 2016

Add two tracepoints:

 (1) Record the RxRPC protocol header of packets retrieved from the UDP
     socket by the data_ready handler.

 (2) Record the outcome of the data_ready handler.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

49e19ec7

rxrpc: Remove skb_count from struct rxrpc_call · 2ab27215

由 David Howells 提交于 9月 08, 2016

Remove the sk_buff count from the rxrpc_call struct as it's less useful
once we stop queueing sk_buffs.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

2ab27215

07 9月, 2016 2 次提交

rxrpc: Add tracepoint for working out where aborts happen · 5a42976d

由 David Howells 提交于 9月 06, 2016

Add a tracepoint for working out where local aborts happen. Each
tracepoint call is labelled with a 3-letter code so that they can be
distinguished - and the DATA sequence number is added too where available.

rxrpc_kernel_abort_call() also takes a 3-letter code so that AFS can
indicate the circumstances when it aborts a call.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

5a42976d

rxrpc: Improve the call tracking tracepoint · fff72429

由 David Howells 提交于 9月 07, 2016

Improve the call tracking tracepoint by showing more differentiation
between some of the put and get events, including:

  (1) Getting and putting refs for the socket call user ID tree.

  (2) Getting and putting refs for queueing and failing to queue the call
      processor work item.

Note that these aren't necessarily used in this patch, but will be taken
advantage of in future patches.

An enum is added for the event subtype numbers rather than coding them
directly as decimal numbers and a table of 3-letter strings is provided
rather than a sequence of ?: operators.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

fff72429

30 8月, 2016 1 次提交

rxrpc: Trace rxrpc_call usage · e34d4234

由 David Howells 提交于 8月 30, 2016

Add a trace event for debuging rxrpc_call struct usage.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

e34d4234

23 8月, 2016 1 次提交

rxrpc: Use a tracepoint for skb accounting debugging · df844fd4

由 David Howells 提交于 8月 23, 2016

Use a tracepoint to log various skb accounting points to help in debugging
refcounting errors.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

df844fd4

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功