提交 · d7b4c24f45d2efe51b8f213da4593fefd49240ba · openanolis / cloud-kernel

28 9月, 2018 2 次提交

rxrpc: Emit BUSY packets when supposed to rather than ABORTs · ece64fec

由 David Howells 提交于 9月 27, 2018

In the input path, a received sk_buff can be marked for rejection by
setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
(such as an abort code) in skb->priority.  The rejection is handled by
queueing the sk_buff up for dealing with in process context.  The output
code reads the mark and priority and, theoretically, generates an
appropriate response packet.

However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
message with a random abort code is generated (since skb->priority wasn't
set to anything).

Fix this by outputting the appropriate sort of packet.

Also, whilst we're at it, most of the marks are no longer used, so remove
them and rename the remaining two to something more obvious.

Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: NDavid Howells <dhowells@redhat.com>

ece64fec

rxrpc: Fix RTT gathering · b604dd98

由 David Howells 提交于 9月 27, 2018

Fix RTT information gathering in AF_RXRPC by the following means:

 (1) Enable Rx timestamping on the transport socket with SO_TIMESTAMPNS.

 (2) If the sk_buff doesn't have a timestamp set when rxrpc_data_ready()
     collects it, set it at that point.

 (3) Allow ACKs to be requested on the last packet of a client call, but
     not a service call.  We need to be careful lest we undo:

	bf7d620a
	Author: David Howells <dhowells@redhat.com>
	Date:   Thu Oct 6 08:11:51 2016 +0100
	rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase

     but that only really applies to service calls that we're handling,
     since the client side gets to send the final ACK (or not).

 (4) When about to transmit an ACK or DATA packet, record the Tx timestamp
     before only; don't update the timestamp afterwards.

 (5) Switch the ordering between recording the serial and recording the
     timestamp to always set the serial number first.  The serial number
     shouldn't be seen referenced by an ACK packet until we've transmitted
     the packet bearing it - so in the Rx path, we don't need the timestamp
     until we've checked the serial number.

Fixes: cf1a6474 ("rxrpc: Add per-peer RTT tracker")
Signed-off-by: NDavid Howells <dhowells@redhat.com>

b604dd98

09 8月, 2018 1 次提交

rxrpc: Fix the keepalive generator [ver ] · 330bdcfa

由 David Howells 提交于 8月 08, 2018

AF_RXRPC has a keepalive message generator that generates a message for a
peer ~20s after the last transmission to that peer to keep firewall ports
open.  The implementation is incorrect in the following ways:

 (1) It mixes up ktime_t and time64_t types.

 (2) It uses ktime_get_real(), the output of which may jump forward or
     backward due to adjustments to the time of day.

 (3) If the current time jumps forward too much or jumps backwards, the
     generator function will crank the base of the time ring round one slot
     at a time (ie. a 1s period) until it catches up, spewing out VERSION
     packets as it goes.

Fix the problem by:

 (1) Only using time64_t.  There's no need for sub-second resolution.

 (2) Use ktime_get_seconds() rather than ktime_get_real() so that time
     isn't perceived to go backwards.

 (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
     parts:

     (a) The "worker" function that manages the buckets and the timer.

     (b) The "dispatch" function that takes the pending peers and
     	 potentially transmits a keepalive packet before putting them back
     	 in the ring into the slot appropriate to the revised last-Tx time.

 (4) Taking everything that's pending out of the ring and splicing it into
     a temporary collector list for processing.

     In the case that there's been a significant jump forward, the ring
     gets entirely emptied and then the time base can be warped forward
     before the peers are processed.

     The warping can't happen if the ring isn't empty because the slot a
     peer is in is keepalive-time dependent, relative to the base time.

 (5) Limit the number of iterations of the bucket array when scanning it.

 (6) Set the timer to skip any empty slots as there's no point waking up if
     there's nothing to do yet.

This can be triggered by an incoming call from a server after a reboot with
AF_RXRPC and AFS built into the kernel causing a peer record to be set up
before userspace is started.  The system clock is then adjusted by
userspace, thereby potentially causing the keepalive generator to have a
meltdown - which leads to a message like:

	watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
	...
	Workqueue: krxrpcd rxrpc_peer_keepalive_worker
	EIP: lock_acquire+0x69/0x80
	...
	Call Trace:
	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
	 ? _raw_spin_lock_bh+0x29/0x60
	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
	 ? __lock_acquire+0x3d3/0x870
	 ? process_one_work+0x110/0x340
	 ? process_one_work+0x166/0x340
	 ? process_one_work+0x110/0x340
	 ? worker_thread+0x39/0x3c0
	 ? kthread+0xdb/0x110
	 ? cancel_delayed_work+0x90/0x90
	 ? kthread_stop+0x70/0x70
	 ? ret_from_fork+0x19/0x24

Fixes: ace45bec ("rxrpc: Fix firewall route keepalive")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

330bdcfa

01 8月, 2018 1 次提交

rxrpc: Trace packet transmission · 4764c0da

由 David Howells 提交于 7月 23, 2018

Trace successful packet transmission (kernel_sendmsg() succeeded, that is)
in AF_RXRPC.  We can share the enum that defines the transmission points
with the trace_rxrpc_tx_fail() tracepoint, so rename its constants to be
applicable to both.

Also, save the internal call->debug_id in the rxrpc_channel struct so that
it can be used in retransmission trace lines.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

4764c0da

11 5月, 2018 2 次提交

rxrpc: Trace UDP transmission failure · 6b47fe1d

由 David Howells 提交于 5月 10, 2018

Add a tracepoint to log transmission failure from the UDP transport socket
being used by AF_RXRPC.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

6b47fe1d

rxrpc: Fix missing start of call timeout · c54e43d7

由 David Howells 提交于 5月 10, 2018

The expect_rx_by call timeout is supposed to be set when a call is started
to indicate that we need to receive a packet by that point.  This is
currently put back every time we receive a packet, but it isn't started
when we first send a packet.  Without this, the call may wait forever if
the server doesn't deign to reply.

Fix this by setting the timeout upon a successful UDP sendmsg call for the
first DATA packet.  The timeout is initiated only for initial transmission
and not for subsequent retries as we don't want the retry mechanism to
extend the timeout indefinitely.

Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
Reported-by: NMarc Dionne <marc.dionne@auristor.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

c54e43d7

31 3月, 2018 1 次提交

rxrpc: Fix firewall route keepalive · ace45bec

由 David Howells 提交于 3月 30, 2018

Fix the firewall route keepalive part of AF_RXRPC which is currently
function incorrectly by replying to VERSION REPLY packets from the server
with VERSION REQUEST packets.

Instead, send VERSION REPLY packets to the peers of service connections to
act as keep-alives 20s after the latest packet was transmitted to that
peer.

Also, just discard VERSION REPLY packets rather than replying to them.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

ace45bec

23 2月, 2018 1 次提交

rxrpc: Fix send in rxrpc_send_data_packet() · 93c62c45

由 David Howells 提交于 2月 22, 2018

All the kernel_sendmsg() calls in rxrpc_send_data_packet() need to send
both parts of the iov[] buffer, but one of them does not.  Fix it so that
it does.

Without this, short IPv6 rxrpc DATA packets may be seen that have the rxrpc
header included, but no payload.

Fixes: 5a924b89 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs")
Reported-by: NMarc Dionne <marc.dionne@auristor.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93c62c45

24 11月, 2017 2 次提交

rxrpc: Add keepalive for a call · 415f44e4

由 David Howells 提交于 11月 24, 2017

We need to transmit a packet every so often to act as a keepalive for the
peer (which has a timeout from the last time it received a packet) and also
to prevent any intervening firewalls from closing the route.

Do this by resetting a timer every time we transmit a packet. If the timer
ever expires, we transmit a PING ACK packet and thereby also elicit a PING
RESPONSE ACK from the other side - which prevents our last-rx timeout from
expiring.

The timer is set to 1/6 of the last-rx timeout so that we can detect the
other side going away if it misses 6 replies in a row.

This is particularly necessary for servers where the processing of the
service function may take a significant amount of time.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

415f44e4

rxrpc: Add a timeout for detecting lost ACKs/lost DATA · bd1fdf8c

由 David Howells 提交于 11月 24, 2017

Add an extra timeout that is set/updated when we send a DATA packet that
has the request-ack flag set.  This allows us to detect if we don't get an
ACK in response to the latest flagged packet.

The ACK packet is adjudged to have been lost if it doesn't turn up within
2*RTT of the transmission.

If the timeout occurs, we schedule the sending of a PING ACK to find out
the state of the other side.  If a new DATA packet is ready to go sooner,
we cancel the sending of the ping and set the request-ack flag on that
instead.

If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
we had at the time of the ping transmission, we adjudge all the DATA
packets sent between the response tx_top and the ping-time tx_top to have
been lost and retransmit immediately.

Rather than sending a PING ACK, we could just pick a DATA packet and
speculatively retransmit that with request-ack set.  It should result in
either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu the
a PING-RESPONSE ACK mentioned above.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

bd1fdf8c

02 11月, 2017 2 次提交

rxrpc: Fix call expiry handling · dcbefc30

由 David Howells 提交于 11月 02, 2017

Fix call expiry handling in the following ways

 (1) If all the request data from a client call is acked, don't send a
     follow up IDLE ACK with firstPacket == 1 and previousPacket == 0 as
     this appears to fool some servers into thinking everything has been
     accepted.

 (2) Never send an abort back to the server once it has ACK'd all the
     request packets; rather just try to reuse the channel for the next
     call.  The first request DATA packet of the next call on the same
     channel will implicitly ACK the entire reply of the dead call - even
     if we haven't transmitted it yet.

 (3) Don't send RX_CALL_TIMEOUT in an ABORT packet, librx uses abort codes
     to pass local errors to the caller in addition to remote errors, and
     this is meant to be local only.

The following also need to be addressed in future patches:

 (4) Service calls should send PING ACKs as 'keep alives' if the server is
     still processing the call.

 (5) VERSION REPLY packets should be sent to the peers of service
     connections to act as keep-alives.  This is used to keep firewall
     routes in place.  The AFS CM should enable this.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

dcbefc30

rxrpc: Fix a null ptr deref in rxrpc_fill_out_ack() · 1457cc4c

由 David Howells 提交于 11月 02, 2017

rxrpc_fill_out_ack() needs to be passed the connection pointer from its
caller rather than using call->conn as the call may be disconnected in
parallel with it, clearing call->conn, leading to:

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
	IP: rxrpc_send_ack_packet+0x231/0x6a4
Signed-off-by: NDavid Howells <dhowells@redhat.com>

1457cc4c

29 8月, 2017 1 次提交

rxrpc: Fix IPv6 support · 7b674e39

由 David Howells 提交于 8月 29, 2017

Fix IPv6 support in AF_RXRPC in the following ways:

 (1) When extracting the address from a received IPv4 packet, if the local
     transport socket is open for IPv6 then fill out the sockaddr_rxrpc
     struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead
     of an AF_INET one.

 (2) When sending CHALLENGE or RESPONSE packets, the transport length needs
     to be set from the sockaddr_rxrpc::transport_len field rather than
     sizeof() on the IPv4 transport address.

 (3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up
     the address correctly before searching for the affected peer.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

7b674e39

05 6月, 2017 1 次提交

rxrpc: Add service upgrade support for client connections · 4e255721

由 David Howells 提交于 6月 05, 2017

Make it possible for a client to use AuriStor's service upgrade facility.

The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
the first sendmsg() of a call.  This takes no parameters.

When recvmsg() starts returning data from the call, the service ID field in
the returned msg_name will reflect the result of the upgrade attempt.  If
the upgrade was ignored, srx_service will match what was set in the
sendmsg(); if the upgrade happened the srx_service will be altered to
indicate the service the server upgraded to.

Note that:

 (1) The choice of upgrade service is up to the server

 (2) Further client calls to the same server that would share a connection
     are blocked if an upgrade probe is in progress.

 (3) This should only be used to probe the service.  Clients should then
     use the returned service ID in all subsequent communications with that
     server (and not set the upgrade).  Note that the kernel will not
     retain this information should the connection expire from its cache.

 (4) If a server that supports upgrading is replaced by one that doesn't,
     whilst a connection is live, and if the replacement is running, say,
     OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
     server will not respond to packets sent to the upgraded connection.

     At this point, calls will time out and the server must be reprobed.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

4e255721

06 10月, 2016 3 次提交

rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase · bf7d620a

由 David Howells 提交于 10月 06, 2016

Don't request an ACK on the last DATA packet of a call's Tx phase as for a
client there will be a reply packet or some sort of ACK to shift phase. If
the ACK is requested, OpenAFS sends a REQUESTED-ACK ACK with soft-ACKs in
it and doesn't follow up with a hard-ACK.

If we don't set the flag, OpenAFS will send a DELAY ACK that hard-ACKs the
reply data, thereby allowing the call to terminate cleanly.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

bf7d620a

rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs · a5af7e1f

由 David Howells 提交于 10月 06, 2016

Separate the output of PING ACKs from the output of other sorts of ACK so
that if we receive a PING ACK and schedule transmission of a PING RESPONSE
ACK, the response doesn't get cancelled by a PING ACK we happen to be
scheduling transmission of at the same time.

If a PING RESPONSE gets lost, the other side might just sit there waiting
for it and refuse to proceed otherwise.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

a5af7e1f

rxrpc: Fix warning by splitting rxrpc_send_call_packet() · 26cb02aa

由 David Howells 提交于 10月 06, 2016

Split rxrpc_send_data_packet() to separate ACK generation (which is more
complicated) from ABORT generation. This simplifies the code a bit and
fixes the following warning:

In file included from ../net/rxrpc/output.c:20:0:
net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/rxrpc/output.c:103:24: note: 'top' was declared here
net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]
Reported-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

26cb02aa

30 9月, 2016 2 次提交

rxrpc: Request more ACKs in slow-start mode · b112a670

由 David Howells 提交于 9月 29, 2016

Set the request-ACK on more DATA packets whilst we're in slow start mode so
that we get sufficient ACKs back to supply information to configure the
window.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

b112a670

rxrpc: Make Tx loss-injection go through normal return and adjust tracing · a1767077

由 David Howells 提交于 9月 29, 2016

In rxrpc_send_data_packet() make the loss-injection path return through the
same code as the transmission path so that the RTT determination is
initiated and any future timer shuffling will be done, despite the packet
having been binned.

Whilst we're at it:

 (1) Add to the tx_data tracepoint an indication of whether or not we're
     retransmitting a data packet.

 (2) When we're deciding whether or not to request an ACK, rather than
     checking if we're in fast-retransmit mode check instead if we're
     retransmitting.

 (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
     not altering the sk_buff refcount nor are we just seeing it after
     getting it off the Tx list.

 (4) The rxrpc_skb_tx_lost note is then no longer used so remove it.

 (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

a1767077

25 9月, 2016 2 次提交

rxrpc: Implement slow-start · 57494343

由 David Howells 提交于 9月 24, 2016

Implement RxRPC slow-start, which is similar to RFC 5681 for TCP.  A
tracepoint is added to log the state of the congestion management algorithm
and the decisions it makes.

Notes:

 (1) Since we send fixed-size DATA packets (apart from the final packet in
     each phase), counters and calculations are in terms of packets rather
     than bytes.

 (2) The ACK packet carries the equivalent of TCP SACK.

 (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
     suited to SACK of a small number of packets.  It seems that, almost
     inevitably, by the time three 'duplicate' ACKs have been seen, we have
     narrowed the loss down to one or two missing packets, and the
     FLIGHT_SIZE calculation ends up as 2.

 (4) In rxrpc_resend(), if there was no data that apparently needed
     retransmission, we transmit a PING ACK to ask the peer to tell us what
     its Rx window state is.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

57494343

rxrpc: Send an ACK after every few DATA packets we receive · 805b21b9

由 David Howells 提交于 9月 24, 2016

Send an ACK if we haven't sent one for the last two packets we've received.
This keeps the other end apprised of where we've got to - which is
important if they're doing slow-start.

We do this in recvmsg so that we can dispatch a packet directly without the
need to wake up the background thread.

This should possibly be made configurable in future.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

805b21b9

23 9月, 2016 3 次提交

rxrpc: Add tracepoint for ACK proposal · 9c7ad434

由 David Howells 提交于 9月 23, 2016

Add a tracepoint to log proposed ACKs, including whether the proposal is
used to update a pending ACK or is discarded in favour of an easlier,
higher priority ACK.

Whilst we're at it, get rid of the rxrpc_acks() function and access the
name array directly.  We do, however, need to validate the ACK reason
number given to trace_rxrpc_rx_ack() to make sure we don't overrun the
array.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

9c7ad434

rxrpc: Add data Tx tracepoint and adjust Tx ACK tracepoint · be832aec

由 David Howells 提交于 9月 23, 2016

Add a tracepoint to log transmission of DATA packets (including loss
injection).

Adjust the ACK transmission tracepoint to include the packet serial number
and to line this up with the DATA transmission display.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

be832aec

rxrpc: Don't call the tx_ack tracepoint if don't generate an ACK · b86e218e

由 David Howells 提交于 9月 23, 2016

rxrpc_send_call_packet() is invoking the tx_ack tracepoint before it checks
whether there's an ACK to transmit (another thread may jump in and transmit
it).

Fix this by only invoking the tracepoint if we get a valid ACK to transmit.

Further, only allocate a serial number if we're going to actually transmit
something.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

b86e218e

22 9月, 2016 4 次提交

rxrpc: Reduce the number of ACK-Requests sent · 0d4b103c

由 David Howells 提交于 9月 22, 2016

Reduce the number of ACK-Requests we set on DATA packets that we're sending
to reduce network traffic. We set the flag on odd-numbered DATA packets to
start off the RTT cache until we have at least three entries in it and then
probe once per second thereafter to keep it topped up.

This could be made tunable in future.

Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA
packets as we transmit them and not stored statically in the sk_buff.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

0d4b103c

rxrpc: Obtain RTT data by requesting ACKs on DATA packets · 50235c4b

由 David Howells 提交于 9月 22, 2016

In addition to sending a PING ACK to gain RTT data, we can set the
RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK. The
ACK packet contains the serial number of the packet it is in response to,
so we can look through the Tx buffer for a matching DATA packet.

This requires that the data packets be stamped with the time of
transmission as a ktime rather than having the resend_at time in jiffies.

This further requires the resend code to do the resend determination in
ktimes and convert to jiffies to set the timer.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

50235c4b

rxrpc: Send pings to get RTT data · 8e83134d

由 David Howells 提交于 9月 22, 2016

Send a PING ACK packet to the peer when we get a new incoming call from a
peer we don't have a record for.  The PING RESPONSE ACK packet will tell us
the following about the peer:

 (1) its receive window size

 (2) its MTU sizes

 (3) its support for jumbo DATA packets

 (4) if it supports slow start (similar to RFC 5681)

 (5) an estimate of the RTT

This is necessary because the peer won't normally send us an ACK until it
gets to the Rx phase and we send it a packet, but we would like to know
some of this information before we start sending packets.

A pair of tracepoints are added so that RTT determination can be observed.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

8e83134d

rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs · 5a924b89

由 David Howells 提交于 9月 22, 2016

Don't store the rxrpc protocol header in sk_buffs on the transmit queue,
but rather generate it on the fly and pass it to kernel_sendmsg() as a
separate iov. This reduces the amount of storage required.

Note that the security header is still stored in the sk_buff as it may get
encrypted along with the data (and doesn't change with each transmission).
Signed-off-by: NDavid Howells <dhowells@redhat.com>

5a924b89

17 9月, 2016 6 次提交

rxrpc: Add config to inject packet loss · 8a681c36

由 David Howells 提交于 9月 17, 2016

Add a configuration option to inject packet loss by discarding
approximately every 8th packet received and approximately every 8th DATA
packet transmitted.

Note that no locking is used, but it shouldn't really matter.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

8a681c36

rxrpc: Improve skb tracing · 71f3ca40

由 David Howells 提交于 9月 17, 2016

Improve sk_buff tracing within AF_RXRPC by the following means:

 (1) Use an enum to note the event type rather than plain integers and use
     an array of event names rather than a big multi ?: list.

 (2) Distinguish Rx from Tx packets and account them separately.  This
     requires the call phase to be tracked so that we know what we might
     find in rxtx_buffer[].

 (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
     event type.

 (4) A pair of 'rotate' events are added to indicate packets that are about
     to be rotated out of the Rx and Tx windows.

 (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
     packet loss injection recording.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

71f3ca40

rxrpc: Add a tracepoint to log ACK transmission · f3639df2

由 David Howells 提交于 9月 17, 2016

Add a tracepoint to log information about ACK transmission.
Signed-off-by: NDavid Howels <dhowells@redhat.com>

f3639df2

rxrpc: Be consistent about switch value in rxrpc_send_call_packet() · 2311e327

由 David Howells 提交于 9月 17, 2016

rxrpc_send_call_packet() should use type in both its switch-statements
rather than using pkt->whdr.type.  This might give the compiler an easier
job of uninitialised variable checking.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

2311e327

rxrpc: Don't transmit an ACK if there's no reason set · 27d0fc43

由 David Howells 提交于 9月 17, 2016

Don't transmit an ACK if call->ackr_reason in unset.  There's the
possibility of a race between recvmsg() sending an ACK and the background
processing thread trying to send the same one.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

27d0fc43

rxrpc: Make IPv6 support conditional on CONFIG_IPV6 · d1912747

由 David Howells 提交于 9月 17, 2016

Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
This is then made conditional on CONFIG_IPV6.

Without this, the following can be seen:

   net/built-in.o: In function `rxrpc_init_peer':
>> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1912747

14 9月, 2016 3 次提交

rxrpc: Add IPv6 support · 75b54cb5

由 David Howells 提交于 9月 13, 2016

Add IPv6 support to AF_RXRPC.  With this, AF_RXRPC sockets can be created:

	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6);

instead of:

	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);

The AFS filesystem doesn't support IPv6 at the moment, though, since that
requires upgrades to some of the RPC calls.

Note that a good portion of this patch is replacing "%pI4:%u" in print
statements with "%pISpc" which is able to handle both protocols and print
the port.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

75b54cb5

rxrpc: Use rxrpc_extract_addr_from_skb() rather than doing this manually · 1c2bc7b9

由 David Howells 提交于 9月 13, 2016

There are two places that want to transmit a packet in response to one just
received and manually pick the address to reply to out of the sk_buff.
Make them use rxrpc_extract_addr_from_skb() instead so that IPv6 is handled
automatically.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

1c2bc7b9

rxrpc: Correctly initialise, limit and transmit call->rx_winsize · 75e42126

由 David Howells 提交于 9月 13, 2016

call->rx_winsize should be initialised to the sysctl setting and the sysctl
setting should be limited to the maximum we want to permit. Further, we
need to place this in the ACK info instead of the sysctl setting.

Furthermore, discard the idea of accepting the subpackets of a jumbo packet
that lie beyond the receive window when the first packet of the jumbo is
within the window. Just discard the excess subpackets instead. This
allows the receive window to be opened up right to the buffer size less one
for the dead slot.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

75e42126

08 9月, 2016 1 次提交

rxrpc: Rewrite the data and ack handling code · 248f219c

由 David Howells 提交于 9月 08, 2016

Rewrite the data and ack handling code such that:

 (1) Parsing of received ACK and ABORT packets and the distribution and the
     filing of DATA packets happens entirely within the data_ready context
     called from the UDP socket.  This allows us to process and discard ACK
     and ABORT packets much more quickly (they're no longer stashed on a
     queue for a background thread to process).

 (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim().  We instead
     keep track of the offset and length of the content of each packet in
     the sk_buff metadata.  This means we don't do any allocation in the
     receive path.

 (3) Jumbo DATA packet parsing is now done in data_ready context.  Rather
     than cloning the packet once for each subpacket and pulling/trimming
     it, we file the packet multiple times with an annotation for each
     indicating which subpacket is there.  From that we can directly
     calculate the offset and length.

 (4) A call's receive queue can be accessed without taking locks (memory
     barriers do have to be used, though).

 (5) Incoming calls are set up from preallocated resources and immediately
     made live.  They can than have packets queued upon them and ACKs
     generated.  If insufficient resources exist, DATA packet #1 is given a
     BUSY reply and other DATA packets are discarded).

 (6) sk_buffs no longer take a ref on their parent call.

To make this work, the following changes are made:

 (1) Each call's receive buffer is now a circular buffer of sk_buff
     pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
     between the call and the socket.  This permits each sk_buff to be in
     the buffer multiple times.  The receive buffer is reused for the
     transmit buffer.

 (2) A circular buffer of annotations (rxtx_annotations) is kept parallel
     to the data buffer.  Transmission phase annotations indicate whether a
     buffered packet has been ACK'd or not and whether it needs
     retransmission.

     Receive phase annotations indicate whether a slot holds a whole packet
     or a jumbo subpacket and, if the latter, which subpacket.  They also
     note whether the packet has been decrypted in place.

 (3) DATA packet window tracking is much simplified.  Each phase has just
     two numbers representing the window (rx_hard_ack/rx_top and
     tx_hard_ack/tx_top).

     The hard_ack number is the sequence number before base of the window,
     representing the last packet the other side says it has consumed.
     hard_ack starts from 0 and the first packet is sequence number 1.

     The top number is the sequence number of the highest-numbered packet
     residing in the buffer.  Packets between hard_ack+1 and top are
     soft-ACK'd to indicate they've been received, but not yet consumed.

     Four macros, before(), before_eq(), after() and after_eq() are added
     to compare sequence numbers within the window.  This allows for the
     top of the window to wrap when the hard-ack sequence number gets close
     to the limit.

     Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
     to indicate when rx_top and tx_top point at the packets with the
     LAST_PACKET bit set, indicating the end of the phase.

 (4) Calls are queued on the socket 'receive queue' rather than packets.
     This means that we don't need have to invent dummy packets to queue to
     indicate abnormal/terminal states and we don't have to keep metadata
     packets (such as ABORTs) around

 (5) The offset and length of a (sub)packet's content are now passed to
     the verify_packet security op.  This is currently expected to decrypt
     the packet in place and validate it.

     However, there's now nowhere to store the revised offset and length of
     the actual data within the decrypted blob (there may be a header and
     padding to skip) because an sk_buff may represent multiple packets, so
     a locate_data security op is added to retrieve these details from the
     sk_buff content when needed.

 (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
     individually secured and needs to be individually decrypted.  The code
     to do this is broken out into rxrpc_recvmsg_data() and shared with the
     kernel API.  It now iterates over the call's receive buffer rather
     than walking the socket receive queue.

Additional changes:

 (1) The timers are condensed to a single timer that is set for the soonest
     of three timeouts (delayed ACK generation, DATA retransmission and
     call lifespan).

 (2) Transmission of ACK and ABORT packets is effected immediately from
     process-context socket ops/kernel API calls that cause them instead of
     them being punted off to a background work item.  The data_ready
     handler still has to defer to the background, though.

 (3) A shutdown op is added to the AF_RXRPC socket so that the AFS
     filesystem can shut down the socket and flush its own work items
     before closing the socket to deal with any in-progress service calls.

Future additional changes that will need to be considered:

 (1) Make sure that a call doesn't hog the front of the queue by receiving
     data from the network as fast as userspace is consuming it to the
     exclusion of other calls.

 (2) Transmit delayed ACKs from within recvmsg() when we've consumed
     sufficiently more packets to avoid the background work item needing to
     run.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

248f219c

07 9月, 2016 1 次提交

rxrpc: Calls shouldn't hold socket refs · 8d94aa38

由 David Howells 提交于 9月 07, 2016

rxrpc calls shouldn't hold refs on the sock struct.  This was done so that
the socket wouldn't go away whilst the call was in progress, such that the
call could reach the socket's queues.

However, we can mark the socket as requiring an RCU release and rely on the
RCU read lock.

To make this work, we do:

 (1) rxrpc_release_call() removes the call's call user ID.  This is now
     only called from socket operations and not from the call processor:

	rxrpc_accept_call() / rxrpc_kernel_accept_call()
	rxrpc_reject_call() / rxrpc_kernel_reject_call()
	rxrpc_kernel_end_call()
	rxrpc_release_calls_on_socket()
	rxrpc_recvmsg()

     Though it is also called in the cleanup path of
     rxrpc_accept_incoming_call() before we assign a user ID.

 (2) Pass the socket pointer into rxrpc_release_call() rather than getting
     it from the call so that we can get rid of uninitialised calls.

 (3) Fix call processor queueing to pass a ref to the work queue and to
     release that ref at the end of the processor function (or to pass it
     back to the work queue if we have to requeue).

 (4) Skip out of the call processor function asap if the call is complete
     and don't requeue it if the call is complete.

 (5) Clean up the call immediately that the refcount reaches 0 rather than
     trying to defer it.  Actual deallocation is deferred to RCU, however.

 (6) Don't hold socket refs for allocated calls.

 (7) Use the RCU read lock when queueing a message on a socket and treat
     the call's socket pointer according to RCU rules and check it for
     NULL.

     We also need to use the RCU read lock when viewing a call through
     procfs.

 (8) Transmit the final ACK/ABORT to a client call in rxrpc_release_call()
     if this hasn't been done yet so that we can then disconnect the call.
     Once the call is disconnected, it won't have any access to the
     connection struct and the UDP socket for the call work processor to be
     able to send the ACK.  Terminal retransmission will be handled by the
     connection processor.

 (9) Release all calls immediately on the closing of a socket rather than
     trying to defer this.  Incomplete calls will be aborted.

The call refcount model is much simplified.  Refs are held on the call by:

 (1) A socket's user ID tree.

 (2) A socket's incoming call secureq and acceptq.

 (3) A kernel service that has a call in progress.

 (4) A queued call work processor.  We have to take care to put any call
     that we failed to queue.

 (5) sk_buffs on a socket's receive queue.  A future patch will get rid of
     this.

Whilst we're at it, we can do:

 (1) Get rid of the RXRPC_CALL_EV_RELEASE event.  Release is now done
     entirely from the socket routines and never from the call's processor.

 (2) Get rid of the RXRPC_CALL_DEAD state.  Calls now end in the
     RXRPC_CALL_COMPLETE state.

 (3) Get rid of the rxrpc_call::destroyer work item.  Calls are now torn
     down when their refcount reaches 0 and then handed over to RCU for
     final cleanup.

 (4) Get rid of the rxrpc_call::deadspan timer.  Calls are cleaned up
     immediately they're finished with and don't hang around.
     Post-completion retransmission is handled by the connection processor
     once the call is disconnected.

 (5) Get rid of the dead call expiry setting as there's no longer a timer
     to set.

 (6) rxrpc_destroy_all_calls() can just check that the call list is empty.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

8d94aa38

05 9月, 2016 1 次提交

rxrpc: Split sendmsg from packet transmission code · 0b58b8a1

由 David Howells 提交于 9月 02, 2016

Split the sendmsg code from the packet transmission code (mostly to be
found in output.c).
Signed-off-by: NDavid Howells <dhowells@redhat.com>

0b58b8a1

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功