Commits · a343b174b4bdde851033996960bca5ad1394d04b · openeuler / Kernel

06 Jan, 2023 2 commits

rxrpc: Only set/transmit aborts in the I/O thread · a343b174

Committed by David Howells on Oct 12, 2022

Only set the abort call completion state in the I/O thread and only
transmit ABORT packets from there.  rxrpc_abort_call() can then be made to
actually send the packet.

Further, ABORT packets should only be sent if the call has been exposed to
the network (i.e. at least one attempted DATA transmission has occurred for
it).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

a343b174

rxrpc: Make the local endpoint hold a ref on a connected call · 5040011d

Committed by David Howells on Nov 02, 2022

Make the local endpoint and its I/O thread hold a reference on a connected
call until that call is disconnected.  Without this, we're reliant on
either the AF_RXRPC socket to hold a ref (which is dropped when the call is
released) or a queued work item to hold a ref (the work item is being
replaced with the I/O thread).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

5040011d

19 Dec, 2022 1 commit

rxrpc: Fix security setting propagation · fdb99487

Committed by David Howells on Dec 15, 2022

Fix the propagation of the security settings from sendmsg to the rxrpc_call
struct.

Fixes: f3441d41 ("rxrpc: Copy client call parameters into rxrpc_call earlier")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

fdb99487

01 Dec, 2022 15 commits

rxrpc: Remove the _bh annotation from all the spinlocks · 3dd9c8b5

Committed by David Howells on Jan 24, 2020

None of the spinlocks in rxrpc need a _bh annotation now as the RCU
callback routines no longer take spinlocks and the bulk of the packet
wrangling code is now run in the I/O thread, not softirq context.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

3dd9c8b5

rxrpc: Make the I/O thread take over the call and local processor work · 5e6ef4f1

Committed by David Howells on Jan 23, 2020

Move the functions from the call->processor and local->processor work items
into the domain of the I/O thread.

The call event processor, now called from the I/O thread, then takes over
the job of cranking the call state machine, processing incoming packets and
transmitting DATA, ACK and ABORT packets.  In a future patch,
rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it
for later transmission.

The call event processor becomes purely received-skb driven.  It only
transmits things in response to events.  We use "pokes" to queue a dummy
skb to make it do things like start/resume transmitting data.  Timer expiry
also results in pokes.

The connection event processor becomes similar, though crypto events, such
as dealing with CHALLENGE and RESPONSE packets, are offloaded to a work item
to avoid doing crypto in the I/O thread.

The local event processor is removed and VERSION response packets are
generated directly from the packet parser.  Similarly, ABORTs generated in
response to protocol errors will be transmitted immediately rather than
being pushed onto a queue for later transmission.

Changes:
========
ver #2)
 - Fix a couple of introduced lock context imbalances.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

5e6ef4f1

rxrpc: Remove RCU from peer->error_targets list · 29fb4ec3

Committed by David Howells on Oct 12, 2022

Remove the RCU requirements from the peer's list of error targets so that
the error distributor can call sleeping functions.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

29fb4ec3

rxrpc: Move DATA transmission into call processor work item · cf37b598

Committed by David Howells on Mar 31, 2022

Move DATA transmission into the call processor work item.  In a future
patch, this will be called from the I/O thread rather than being its own
work item.

This will allow DATA transmission to be driven directly by incoming ACKs,
pokes and timers as those are processed.

The Tx queue is also split: The queue of packets prepared by sendmsg is now
placed in call->tx_sendmsg and the packet dispatcher decants the packets
into call->tx_buffer as space becomes available in the transmission
window.  This allows sendmsg to run ahead of the available space to try and
prevent an underflow in transmission.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

cf37b598
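The split Tx queue described in this commit can be pictured with a small user-space model. This is only a sketch under stated assumptions: tx_sendmsg and tx_buffer are the queue names from the commit message, while the sketch_* types and helpers, the fixed queue capacity and the tx_winsize field are hypothetical and are not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QMAX 64	/* hypothetical capacity of each model queue */

/* A tiny FIFO of packet sequence numbers standing in for a packet list. */
struct sketch_queue {
	unsigned int seq[SKETCH_QMAX];
	size_t head, count;
};

struct sketch_call {
	struct sketch_queue tx_sendmsg;	/* packets prepared by sendmsg */
	struct sketch_queue tx_buffer;	/* packets inside the Tx window */
	size_t tx_winsize;		/* assumed window size, in packets */
};

/* Append a sequence number; returns -1 if the model queue is full. */
static int sketch_push(struct sketch_queue *q, unsigned int seq)
{
	if (q->count == SKETCH_QMAX)
		return -1;
	q->seq[(q->head + q->count) % SKETCH_QMAX] = seq;
	q->count++;
	return 0;
}

/* Remove and return the oldest sequence number. */
static unsigned int sketch_pop(struct sketch_queue *q)
{
	unsigned int seq;

	assert(q->count > 0);
	seq = q->seq[q->head];
	q->head = (q->head + 1) % SKETCH_QMAX;
	q->count--;
	return seq;
}

/*
 * Decant packets from tx_sendmsg into tx_buffer for as long as the
 * transmission window has room.  Returns the number of packets moved.
 */
static size_t sketch_decant(struct sketch_call *call)
{
	size_t moved = 0;

	assert(call->tx_winsize <= SKETCH_QMAX);
	while (call->tx_sendmsg.count > 0 &&
	       call->tx_buffer.count < call->tx_winsize) {
		sketch_push(&call->tx_buffer, sketch_pop(&call->tx_sendmsg));
		moved++;
	}
	return moved;
}
```

In this model sendmsg may keep pushing onto tx_sendmsg while the window is full, and each ACK that frees a slot in tx_buffer lets sketch_decant() move one more packet across.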

rxrpc: Copy client call parameters into rxrpc_call earlier · f3441d41

Committed by David Howells on Oct 20, 2022

Copy client call parameters into rxrpc_call earlier so that the struct can
be used to convey them to the connection code - which can then be offloaded
to the I/O thread.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

f3441d41

rxrpc: Implement a mechanism to send an event notification to a call · 15f661dc

Committed by David Howells on Oct 10, 2022

Provide a means by which an event notification can be sent to a call such
that the I/O thread can process it rather than it being done in a separate
workqueue.  This will allow a lot of locking to be removed.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

15f661dc

rxrpc: Remove call->input_lock · 4041a8ff

Committed by David Howells on Jan 23, 2020

Remove call->input_lock as it was only necessary to serialise access to the
state stored in the rxrpc_call struct by simultaneous softirq handlers
presenting received packets.  They now dump the packets in a queue and a
single process-context handler now processes them.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

4041a8ff

rxrpc: Move packet reception processing into I/O thread · 446b3e14

Committed by David Howells on Oct 10, 2022

Split the packet input handler to make the softirq side just dump the
received packet into the local endpoint receive queue and then call the
remainder of the input function from the I/O thread.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

446b3e14

rxrpc: Don't hold a ref for call timer or workqueue · 3feda9d6

Committed by David Howells on Nov 25, 2022

Currently, rxrpc gives the call timer a ref on the call when it starts it
and this is passed along to the workqueue by the timer expiration function.
The problem comes when queue_work() fails (i.e. the work item is already
queued): the timer routine must put the ref - but this may cause the
cleanup code to run.

This has the unfortunate effect that the cleanup code may then be run in
softirq context - which means that any spinlocks it might need to touch
have to be guarded to disable softirqs (i.e. they need a "_bh" suffix).

Fix this by:

 (1) Not giving a ref to the timer.

 (2) Making the expiration function not do anything if the refcount is 0.
     Note that this is more of an optimisation.

 (3) Making sure that the cleanup routine waits for the timer to complete.

However, this has a consequence that the timer cannot give a ref to the work
item.  Therefore the following fixes are also necessary:

 (4) Don't give a ref to the work item.

 (5) Make the work item return asap if it sees the ref count is 0.

 (6) Make sure that the cleanup routine waits for the work item to
     complete.

Unfortunately, neither the timer nor the work item can simply get around
the problem by just using refcount_inc_not_zero() as the waits would still
have to be done, and there would still be the possibility of having to put
the ref in the expiration function.

Note the call work item is going to go away with the work being transferred
to the I/O thread, so the wait in (6) will become obsolete.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

3feda9d6

rxrpc: trace: Don't use __builtin_return_address for sk_buff tracing · 9a36a6bc

Committed by David Howells on Oct 21, 2022

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the sk_buff tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

9a36a6bc

rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracing · cb0fc0c9

Committed by David Howells on Oct 21, 2022

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_call tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

cb0fc0c9

rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracing · 7fa25105

Committed by David Howells on Oct 21, 2022

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_conn tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

7fa25105

rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing · 47c810a7

Committed by David Howells on Oct 21, 2022

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_peer tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

47c810a7

rxrpc: Drop rxrpc_conn_parameters from rxrpc_connection and rxrpc_bundle · 2cc80086

Committed by David Howells on Oct 19, 2022

Remove the rxrpc_conn_parameters struct from the rxrpc_connection and
rxrpc_bundle structs and emplace the members directly.  These are going to
get filled in from the rxrpc_call struct in future.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2cc80086

rxrpc: Remove the [_k]net() debugging macros · e969c92c

Committed by David Howells on Oct 20, 2022

Remove the _net() and knet() debugging macros in favour of tracepoints.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

e969c92c

09 Nov, 2022 9 commits

rxrpc: Fix congestion management · 1fc4fa2a

Committed by David Howells on Oct 03, 2022

rxrpc has a problem in its congestion management in that it saves the
congestion window size (cwnd) from one call to another, but if this is 0 at
the time it is saved, then the next call may not actually manage to ever
transmit anything.

To this end:

 (1) Don't save cwnd between calls, but rather reset back down to the
     initial cwnd and re-enter slow-start if data transmission is idle for
     more than an RTT.

 (2) Preserve ssthresh instead, as that is a handy estimate of pipe
     capacity.  Knowing roughly when to stop slow start and enter
     congestion avoidance can reduce the tendency to overshoot and drop
     larger amounts of packets when probing.

In future, cwnd growth also needs to be constrained when the window isn't
being filled due to being application limited.
Reported-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

1fc4fa2a
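The two rules in this commit (do not carry cwnd between calls, but do preserve ssthresh) might be modelled roughly as follows. This is an illustrative sketch, not the kernel code: every sketch_* name, the initial window of 4 packets and the microsecond timestamps are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_INITIAL_CWND 4u	/* assumed initial window, in packets */

/* State that outlives a call: only ssthresh is kept. */
struct sketch_peer {
	uint32_t ssthresh;
};

/* Per-call congestion state. */
struct sketch_call {
	uint32_t cwnd;		/* always starts from the initial value */
	uint32_t ssthresh;	/* seeded from the peer's saved estimate */
	bool slow_start;
	uint64_t last_tx_us;	/* time of the last DATA transmission */
};

/* Begin a call with a fresh cwnd and the preserved ssthresh. */
static void sketch_call_init(struct sketch_call *call,
			     const struct sketch_peer *peer, uint64_t now_us)
{
	call->cwnd = SKETCH_INITIAL_CWND;
	call->ssthresh = peer->ssthresh;
	call->slow_start = true;
	call->last_tx_us = now_us;
}

/*
 * Called before transmitting DATA: if transmission has been idle for
 * more than one RTT, reset to the initial cwnd and re-enter slow start.
 */
static void sketch_before_tx(struct sketch_call *call,
			     uint64_t now_us, uint64_t rtt_us)
{
	assert(now_us >= call->last_tx_us);
	if (now_us - call->last_tx_us > rtt_us) {
		call->cwnd = SKETCH_INITIAL_CWND;
		call->slow_start = true;
	}
	call->last_tx_us = now_us;
}

/* At the end of a call, save ssthresh only; cwnd is deliberately dropped. */
static void sketch_call_end(const struct sketch_call *call,
			    struct sketch_peer *peer)
{
	peer->ssthresh = call->ssthresh;
}
```

Because cwnd is never copied back to the peer record, a call that ended with a collapsed window cannot leave the next call unable to transmit.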

rxrpc: Remove the rxtx ring · 6869ddb8

Committed by David Howells on Jun 15, 2022

The Rx/Tx ring is no longer used, so remove it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

6869ddb8

rxrpc: Save last ACK's SACK table rather than marking txbufs · d57a3a15

Committed by David Howells on May 07, 2022

Improve the tracking of which packets need to be transmitted by saving the
last ACK packet that we receive that has a populated soft-ACK table rather
than marking packets.  Then we can step through the soft-ACK table and look
at the packets we've transmitted beyond that to determine which packets we
might want to retransmit.

We also look at the highest serial number that has been acked to try and
guess which packets we've transmitted the peer is likely to have seen.  If
necessary, we send a ping to retrieve that number.

One downside that might be a problem is that we can't then compare the
previous acked/unacked state so easily in rxrpc_input_soft_acks() - which
is a potential problem for the slow-start algorithm.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

d57a3a15

rxrpc: Remove call->lock · 4e76bd40

Committed by David Howells on May 06, 2022

call->lock is no longer necessary, so remove it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

4e76bd40

rxrpc: Don't use a ring buffer for call Tx queue · a4ea4c47

Committed by David Howells on Mar 31, 2022

Change the way the Tx queueing works to make the following ends easier to
achieve:

 (1) The filling of packets, the encryption of packets and the transmission
     of packets can be handled in parallel by separate threads, rather than
     rxrpc_sendmsg() allocating, filling, encrypting and transmitting each
     packet before moving onto the next one.

 (2) Get rid of the fixed-size ring which sets a hard limit on the number
     of packets that can be retained in the ring.  This allows the number
     of packets to increase without having to allocate a very large ring or
     having variable-sized rings.

     [Note: the downside of this is that it's then less efficient to locate
     a packet for retransmission as we then have to step through a list and
     examine each buffer in the list.]

 (3) Allow the filler/encrypter to run ahead of the transmission window.

 (4) Make it easier to do zero copy UDP from the packet buffers.

 (5) Make it easier to do zero copy from userspace to the packet buffers -
     and thence to UDP (only for unauthenticated connections).

To that end, the following changes are made:

 (1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets
     to be transmitted in.  This allows them to be placed on multiple
     queues simultaneously.  An sk_buff isn't really necessary as it's
     never passed on to lower-level networking code.

 (2) Keep the transmissible packets in a linked list on the call struct
     rather than in a ring.  As a consequence, the annotation buffer isn't
     used either; rather a flag is set on the packet to indicate ackedness.

 (3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be
     transmitted has been queued.  Add RXRPC_CALL_TX_ALL_ACKED to indicate
     that all packets up to and including the last got hard acked.

 (4) Wire headers are now stored in the txbuf rather than being concocted
     on the stack and they're stored immediately before the data, thereby
     allowing zerocopy of a single span.

 (5) Don't bother with instant-resend on transmission failure; rather,
     leave it for a timer or an ACK packet to trigger.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

a4ea4c47

rxrpc: Get rid of the Rx ring · 5d7edbc9

Committed by David Howells on Aug 27, 2022

Get rid of the Rx ring and replace it with a pair of queues instead.  One
queue gets the packets that are in-sequence and are ready for processing by
recvmsg(); the other queue gets the out-of-sequence packets for addition to
the first queue as the holes get filled.

The annotation ring is removed and replaced with a SACK table.  The SACK
table has the bits set that correspond exactly to the sequence number of
the packet being acked.  The SACK ring is copied when an ACK packet is
being assembled and rotated so that the first ACK is in byte 0.

Flow control handling is altered so that packets that are moved to the
in-sequence queue are hard-ACK'd even before they're consumed - and then
the Rx window size in the ACK packet (rsize) is shrunk down to compensate
(even going to 0 if the window is full).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

5d7edbc9
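The copy-and-rotate step for the SACK table can be illustrated with a short sketch. It assumes a table indexed by sequence number modulo its size, with one byte per entry; the table size of 8 and the sketch_* helper names are hypothetical and chosen only to keep the example small.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SACK_SIZE 8	/* hypothetical table size, kept tiny on purpose */

/* Mark a received packet in the slot given by its sequence number. */
static void sketch_sack_set(uint8_t *table, uint32_t seq)
{
	table[seq % SKETCH_SACK_SIZE] = 1;
}

/*
 * Copy the table into out[], rotated so that the entry for first_seq
 * lands in out[0], the next sequence number in out[1], and so on.
 */
static void sketch_sack_rotate(const uint8_t *table, uint32_t first_seq,
			       uint8_t *out)
{
	size_t ix = first_seq % SKETCH_SACK_SIZE;
	size_t tail = SKETCH_SACK_SIZE - ix;

	assert(ix < SKETCH_SACK_SIZE);
	memcpy(out, table + ix, tail);	/* slots from first_seq to the end */
	memcpy(out + tail, table, ix);	/* wrapped-around slots */
}
```

For example, with packets 14 and 17 marked and an ACK whose first soft-ACK covers sequence 13, the rotated copy has out[1] and out[4] set and everything else clear.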

rxrpc: Clean up ACK handling · 530403d9

Committed by David Howells on Jan 30, 2020

Clean up the rxrpc_propose_ACK() function.  If deferred PING ACK proposal
is split out, it's only really needed for deferred DELAY ACKs.  All other
ACKs, bar the terminal IDLE ACK, are sent immediately.  The deferred IDLE ACK
submission can be handled by conversion of a DELAY ACK into an IDLE ACK if
there's nothing to be SACK'd.

Also, because there's a delay between an ACK being generated and being
transmitted, it's possible that other ACKs of the same type will be
generated during that interval.  Apart from the ACK time and the serial
number responded to, most of the ACK body, including window and SACK
parameters, are not filled out till the point of transmission - so we can
avoid generating a new ACK if there's one pending that will cover the SACK
data we need to convey.

Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one
already pending.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

530403d9
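A rough model of the deferred-ACK rule described in this commit: a new DELAY or IDLE ACK is not proposed while one is pending, and a pending DELAY ACK becomes an IDLE ACK at transmission time if there is nothing to SACK. The enum, struct and function names below are invented for the sketch and do not correspond to the kernel's rxrpc_propose_ACK() interface.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_ack {
	SKETCH_ACK_NONE,
	SKETCH_ACK_DELAY,
	SKETCH_ACK_IDLE,
};

struct sketch_call {
	enum sketch_ack pending;	/* the deferred ACK, if any */
};

/*
 * Propose a deferred DELAY or IDLE ACK.  Returns false without changing
 * anything if one is already pending, since the body of the pending ACK
 * is only filled in at transmission time and will cover the new data.
 */
static bool sketch_propose_ack(struct sketch_call *call, enum sketch_ack reason)
{
	assert(reason == SKETCH_ACK_DELAY || reason == SKETCH_ACK_IDLE);
	if (call->pending != SKETCH_ACK_NONE)
		return false;
	call->pending = reason;
	return true;
}

/*
 * Pick the ACK reason at the point of transmission and clear the pending
 * state.  A DELAY ACK with nothing to SACK is converted into an IDLE ACK.
 */
static enum sketch_ack sketch_ack_to_send(struct sketch_call *call,
					  unsigned int nr_sacks)
{
	enum sketch_ack reason = call->pending;

	if (reason == SKETCH_ACK_DELAY && nr_sacks == 0)
		reason = SKETCH_ACK_IDLE;
	call->pending = SKETCH_ACK_NONE;
	return reason;
}
```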

rxrpc: Remove call->tx_phase · a11e6ff9

Committed by David Howells on Oct 07, 2022

Remove call->tx_phase as it's only ever set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

a11e6ff9

rxrpc: Split call timer-expiration from call timer-set tracepoint · 334dfbfc

Committed by David Howells on Apr 22, 2022

Split the tracepoint for call timer-set to separate out the call
timer-expiration event.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

334dfbfc

26 Aug, 2022 1 commit

rxrpc: Fix locking in rxrpc's sendmsg · b0f571ec

Committed by David Howells on Aug 24, 2022

Fix three bugs in rxrpc's sendmsg implementation:

 (1) rxrpc_new_client_call() should release the socket lock when returning
     an error from rxrpc_get_call_slot().

 (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
     held in the event that we're interrupted by a signal whilst waiting
     for tx space on the socket or relocking the call mutex afterwards.

     Fix this by: (a) moving the unlock/lock of the call mutex up to
     rxrpc_send_data() such that the lock is not held around all of
     rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
     whether we're returning with the lock dropped.  Note that this means
     recvmsg() will not block on this call whilst we're waiting.

 (3) After dropping and regaining the call mutex, rxrpc_send_data() needs
     to go and recheck the state of the tx_pending buffer and the
     tx_total_len check in case we raced with another sendmsg() on the same
     call.

Thinking on this some more, it might make sense to have different locks for
sendmsg() and recvmsg().  There's probably no need to make recvmsg() wait
for sendmsg().  It does mean that recvmsg() can return MSG_EOR indicating
that a call is dead before a sendmsg() to that call returns - but that can
currently happen anyway.

Without fix (2), something like the following can be induced:

	WARNING: bad unlock balance detected!
	5.16.0-rc6-syzkaller #0 Not tainted
	-------------------------------------
	syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
	[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
	but there are no more locks to release!

	other info that might help us debug this:
	no locks held by syz-executor011/3597.
	...
	Call Trace:
	 <TASK>
	 __dump_stack lib/dump_stack.c:88 [inline]
	 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
	 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
	 __lock_release kernel/locking/lockdep.c:5306 [inline]
	 lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
	 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
	 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
	 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
	 sock_sendmsg_nosec net/socket.c:704 [inline]
	 sock_sendmsg+0xcf/0x120 net/socket.c:724
	 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
	 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
	 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
	 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
	 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]

Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
cc: Hawkins Jiawei <yin31149@gmail.com>
cc: Khalid Masum <khalid.masum.92@gmail.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b0f571ec

23 May, 2022 2 commits

rxrpc: Fix locking issue · ad25f5cb

Committed by David Howells on May 21, 2022

There's a locking issue with the per-netns list of calls in rxrpc.  The
pieces of code that add and remove a call from the list use write_lock()
and the calls procfile uses read_lock() to access it.  However, the timer
callback function may trigger a removal by trying to queue a call for
processing and finding that it's already queued - at which point it has a
spare refcount that it has to do something with.  Unfortunately, if it puts
the call and this reduces the refcount to 0, the call will be removed from
the list.  Unfortunately, since the _bh variants of the locking functions
aren't used, this can deadlock.

================================
WARNING: inconsistent lock state
5.18.0-rc3-build4+ #10 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
{SOFTIRQ-ON-W} state was registered at:
...
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&rxnet->call_lock);
  <Interrupt>
    lock(&rxnet->call_lock);

 *** DEADLOCK ***

1 lock held by ksoftirqd/2/25:
 #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d

Changes
=======
ver #2)
 - Changed to using list_next_rcu() rather than rcu_dereference() directly.

Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

ad25f5cb

rxrpc: Use refcount_t rather than atomic_t · a0575429

Committed by David Howells on May 21, 2022

Move to using refcount_t rather than atomic_t for refcounts in rxrpc.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

a0575429

31 Mar, 2022 1 commit

rxrpc: Fix call timer start racing with call destruction · 4a7f62f9

Committed by David Howells on Mar 30, 2022

The rxrpc_call struct has a timer used to handle various timed events
relating to a call. This timer can get started from the packet input
routines that are run in softirq mode with just the RCU read lock held.
Unfortunately, because only the RCU read lock is held - and neither a ref nor
any other lock is taken - the call can start getting destroyed at the same time
a packet comes in addressed to that call. This causes the timer - which
was already stopped - to get restarted. Later, the timer dispatch code may
then oops if the timer got deallocated first.

Fix this by trying to take a ref on the rxrpc_call struct and, if
successful, passing that ref along to the timer. If the timer was already
running, the ref is discarded.

The timer completion routine can then pass the ref along to the call's work
item when it queues it. If the timer or work item were already
queued/running, the extra ref is discarded.

Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005073.html
Link: https://lore.kernel.org/r/164865115696.2943015.11097991776647323586.stgit@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

4a7f62f9
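The "try to take a ref, then hand it to the timer" pattern from this fix can be sketched in user space with C11 atomics. This is a single-threaded illustration only: the sketch_* names are hypothetical, the timer is reduced to a flag, and in the kernel the "already running" case would be reported by the timer API rather than read from a plain bool.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_call {
	atomic_int ref;		/* 0 means the call is being destroyed */
	bool timer_armed;	/* stand-in for a pending kernel timer */
};

/* Take a ref only if the count has not already reached zero. */
static bool sketch_try_get(struct sketch_call *call)
{
	int old = atomic_load(&call->ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&call->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a ref. */
static void sketch_put(struct sketch_call *call)
{
	int old = atomic_fetch_sub(&call->ref, 1);

	assert(old > 0);
}

/*
 * Start the call timer.  Returns false if the call is already being
 * destroyed, in which case the timer is left alone.
 */
static bool sketch_start_timer(struct sketch_call *call)
{
	if (!sketch_try_get(call))
		return false;
	if (call->timer_armed) {
		/* The timer already carries a ref; discard the extra one. */
		sketch_put(call);
		return true;
	}
	call->timer_armed = true;	/* the new ref now belongs to the timer */
	return true;
}
```

The key property is that a call whose refcount has reached zero can no longer have its stopped timer restarted by an incoming packet.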

05 Feb, 2021 1 commit

rxrpc: Fix clearance of Tx/Rx ring when releasing a call · 7b5eab57

Committed by David Howells on Feb 03, 2021

At the end of rxrpc_release_call(), rxrpc_cleanup_ring() is called to clear
the Rx/Tx skbuff ring, but this doesn't lock the ring whilst it's accessing
it.  Unfortunately, rxrpc_resend() might be trying to retransmit a packet
concurrently with this - and whilst it does lock the ring, this isn't
protection against rxrpc_cleanup_call().

Fix this by removing the call to rxrpc_cleanup_ring() from
rxrpc_release_call().  rxrpc_cleanup_ring() will be called again anyway
from rxrpc_cleanup_call().  The earlier call is just an optimisation to
recycle skbuffs more quickly.

Alternative solutions would be for rxrpc_release_call() to try to cancel the
work item or wait for it to complete, or for rxrpc_cleanup_ring() to lock
when accessing the ring (which would require a bh lock).

This can produce a report like the following:

  BUG: KASAN: use-after-free in rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372
  Read of size 4 at addr ffff888011606e04 by task kworker/0:0/5
  ...
  Workqueue: krxrpcd rxrpc_process_call
  Call Trace:
   ...
   kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413
   rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372
   rxrpc_resend net/rxrpc/call_event.c:266 [inline]
   rxrpc_process_call+0x1634/0x1f60 net/rxrpc/call_event.c:412
   process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
   ...

  Allocated by task 2318:
   ...
   sock_alloc_send_pskb+0x793/0x920 net/core/sock.c:2348
   rxrpc_send_data+0xb51/0x2bf0 net/rxrpc/sendmsg.c:358
   rxrpc_do_sendmsg+0xc03/0x1350 net/rxrpc/sendmsg.c:744
   rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:560
   ...

  Freed by task 2318:
   ...
   kfree_skb+0x140/0x3f0 net/core/skbuff.c:704
   rxrpc_free_skb+0x11d/0x150 net/rxrpc/skbuff.c:78
   rxrpc_cleanup_ring net/rxrpc/call_object.c:485 [inline]
   rxrpc_release_call+0x5dd/0x860 net/rxrpc/call_object.c:552
   rxrpc_release_calls_on_socket+0x21c/0x300 net/rxrpc/call_object.c:579
   rxrpc_release_sock net/rxrpc/af_rxrpc.c:885 [inline]
   rxrpc_release+0x263/0x5a0 net/rxrpc/af_rxrpc.c:916
   __sock_release+0xcd/0x280 net/socket.c:597
   ...

  The buggy address belongs to the object at ffff888011606dc0
   which belongs to the cache skbuff_head_cache of size 232

Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
Reported-by: syzbot+174de899852504e4a74a@syzkaller.appspotmail.com
Reported-by: syzbot+3d1c772efafd3c38d007@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
Link: https://lore.kernel.org/r/161234207610.653119.5287360098400436976.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7b5eab57

05 Oct, 2020 1 commit

rxrpc: Fix accept on a connection that need securing · 2d914c1b

Committed by David Howells on Sep 30, 2020

When a new incoming call arrives at a userspace rxrpc socket on a new
connection that has a security class set, the code currently pushes it onto
the accept queue to hold a ref on it for the socket.  This doesn't work,
however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
state and discards the ref.  This means that the call runs out of refs too
early and the kernel oopses.

By contrast, a kernel rxrpc socket manually pre-charges the incoming call
pool with calls that already have user call IDs assigned, so they are ref'd
by the call tree on the socket.

Change the mode of operation for userspace rxrpc server sockets to work
like this too.  Although this is a UAPI change, server sockets aren't
currently functional.

Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>

2d914c1b

09 Sep, 2020 1 commit

rxrpc: Impose a maximum number of client calls · b7a7d674

Committed by David Howells on Jul 02, 2020

Impose a maximum on the number of client rxrpc calls that are allowed
simultaneously. This will be in lieu of a maximum number of client
connections as this is easier to administer as, unlike connections, calls
aren't reusable (to be changed in a subsequent patch).

This doesn't affect the limits on service calls and connections.
Signed-off-by: David Howells <dhowells@redhat.com>

b7a7d674

21 Aug, 2020 1 commit

rxrpc: Fix loss of RTT samples due to interposed ACK · 4700c4d8

Committed by David Howells on Aug 19, 2020

The Rx protocol has a mechanism to help generate RTT samples that works by
a client transmitting a REQUESTED-type ACK when it receives a DATA packet
that has the REQUEST_ACK flag set.

The peer, however, may interpose other ACKs before transmitting the
REQUESTED-ACK, as can be seen in the following trace excerpt:

rxrpc_tx_data: c=00000044 DATA d0b5ece8:00000001 00000001 q=00000001 fl=07
rxrpc_rx_ack: c=00000044 00000001 PNG r=00000000 f=00000002 p=00000000 n=0
rxrpc_rx_ack: c=00000044 00000002 REQ r=00000001 f=00000002 p=00000001 n=0
...

DATA packet 1 (q=xx) has REQUEST_ACK set (bit 1 of fl=xx). The incoming
ping (labelled PNG) hard-acks the request DATA packet (f=xx exceeds the
sequence number of the DATA packet), causing it to be discarded from the Tx
ring. The ACK that was requested (labelled REQ, r=xx references the serial
of the DATA packet) comes after the ping, but the sk_buff holding the
timestamp has gone and the RTT sample is lost.

This is particularly noticeable on RPC calls used to probe the service
offered by the peer. A lot of peers end up with an unknown RTT because we
only ever sent a single RPC. This confuses the server rotation algorithm.

Fix this by caching the information about the outgoing packet in RTT
calculations in the rxrpc_call struct rather than looking in the Tx ring.

A four-deep buffer is maintained and both REQUEST_ACK-flagged DATA and
PING-ACK transmissions are recorded in there. When the appropriate
response ACK is received, the buffer is checked for a match and, if found,
an RTT sample is recorded.

If a received ACK refers to a packet with a later serial number than an
entry in the cache, that entry is presumed lost and the entry is made
available to record a new transmission.

ACK types other than REQUESTED-type and PING-type cause any matching
sample to be cancelled as they don't necessarily represent a useful
measurement.

If there's no space in the buffer on ping/data transmission, the sample
base is discarded.

Fixes: 50235c4b ("rxrpc: Obtain RTT data by requesting ACKs on DATA packets")
Signed-off-by: David Howells <dhowells@redhat.com>

4700c4d8
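The four-deep RTT sample cache described in this commit could look something like the following in a simplified user-space model. The slot count of four comes from the commit message; the sketch_* names, the struct layout, the microsecond timestamps and the wrap-safe serial comparison are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RTT_SLOTS 4	/* "four-deep buffer" from the commit message */

struct sketch_rtt_cache {
	bool in_use[SKETCH_RTT_SLOTS];
	uint32_t serial[SKETCH_RTT_SLOTS];	/* serial of the probe sent */
	uint64_t sent_us[SKETCH_RTT_SLOTS];	/* when it was transmitted */
};

/* True if serial a was issued before serial b, allowing for wraparound. */
static bool sketch_serial_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Record a REQUEST_ACK-flagged DATA or PING ACK transmission.  Returns
 * false if no slot is free, in which case the sample base is discarded.
 */
static bool sketch_rtt_begin(struct sketch_rtt_cache *c,
			     uint32_t serial, uint64_t now_us)
{
	for (int i = 0; i < SKETCH_RTT_SLOTS; i++) {
		if (!c->in_use[i]) {
			c->in_use[i] = true;
			c->serial[i] = serial;
			c->sent_us[i] = now_us;
			return true;
		}
	}
	return false;
}

/*
 * Process a received ACK that refers to acked_serial.  usable_type is
 * true for REQUESTED-type and PING-type ACKs.  Returns the RTT sample in
 * microseconds, or -1 if no sample was produced.
 */
static int64_t sketch_rtt_complete(struct sketch_rtt_cache *c,
				   uint32_t acked_serial, bool usable_type,
				   uint64_t now_us)
{
	int64_t sample = -1;

	for (int i = 0; i < SKETCH_RTT_SLOTS; i++) {
		if (!c->in_use[i])
			continue;
		if (c->serial[i] == acked_serial) {
			/* Other ACK types cancel the sample instead. */
			if (usable_type)
				sample = (int64_t)(now_us - c->sent_us[i]);
			c->in_use[i] = false;
		} else if (sketch_serial_before(c->serial[i], acked_serial)) {
			/* An older probe is presumed lost; free its slot. */
			c->in_use[i] = false;
		}
	}
	return sample;
}
```

Since the timestamps live in the cache rather than in the transmitted buffer, an interposed ping that hard-acks the DATA packet no longer destroys the information needed for the sample.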

31 Jul, 2020 1 commit

rxrpc: Fix race between recvmsg and sendmsg on immediate call failure · 65550098

Committed by David Howells on Jul 29, 2020

There's a race between rxrpc_sendmsg setting up a call, but then failing to
send anything on it due to an error, and recvmsg() seeing the call
completion occur and trying to return the state to the user.

An assertion fails in rxrpc_recvmsg() because the call has already been
released from the socket and is about to be released again as recvmsg deals
with it.  (The recvmsg_q queue on the socket holds a ref, so there's no
problem with use-after-free.)

We also have to be careful not to end up reporting an error twice, in such
a way that both returns indicate to userspace that the user ID supplied
with the call is no longer in use - which could cause the client to
malfunction if it recycles the user ID fast enough.

Fix this by the following means:

 (1) When sendmsg() creates a call, then once the call has been
     successfully added to the socket, don't return any errors through
     sendmsg(), but rather complete the call and let recvmsg() retrieve
     them.  Make sendmsg() return 0 at this point.  Further calls to
     sendmsg() for that call will fail with ESHUTDOWN.

     Note that at this point, we haven't sent any packets yet, so the
     server doesn't yet know about the call.

 (2) If sendmsg() returns an error when it was expected to create a new
     call, it means that the user ID wasn't used.

 (3) Mark the call disconnected before marking it completed to prevent an
     oops in rxrpc_release_call().

 (4) recvmsg() will then retrieve the error and set MSG_EOR to indicate
     that the user ID is no longer known by the kernel.
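
The reporting rules in (1) to (4) can be illustrated with a toy userspace
state model.  This is only a sketch of the contract seen by userspace, not
kernel code: every name below is hypothetical, and the error is returned
directly from the model's recvmsg function rather than through a control
message.

```c
#include <errno.h>
#include <stdbool.h>

struct model_call {
	bool attached;		/* added to the socket: user ID is in use */
	bool disconnected;
	bool completed;
	int  error;		/* completion error to be read by recvmsg */
};

/* Model of sendmsg() creating a call.  setup_err is an error hit before
 * the call is added to the socket, tx_err an error hit afterwards. */
static int model_sendmsg_new_call(struct model_call *call,
				  int setup_err, int tx_err)
{
	if (setup_err)
		return -setup_err;	/* (2): the user ID was never used */

	call->attached = true;
	if (tx_err) {
		call->disconnected = true;	/* (3): disconnect first... */
		call->completed = true;		/* ...then complete */
		call->error = tx_err;
	}
	return 0;			/* (1): errors go via recvmsg */
}

/* Model of a later sendmsg() on the same call. */
static int model_sendmsg_more(const struct model_call *call)
{
	return call->completed ? -ESHUTDOWN : 0;
}

/* Model of recvmsg(): reports the completion exactly once and sets *eor
 * to show that the user ID is no longer known. */
static int model_recvmsg(struct model_call *call, bool *eor)
{
	*eor = false;
	if (!call->attached || !call->completed)
		return -EAGAIN;		/* nothing to report */

	*eor = true;			/* (4): user ID released here */
	call->attached = false;
	return -call->error;
}
```

In this model a failure after the call is attached is seen exactly once,
through the recvmsg path with the end-of-record indication, so the user ID
cannot be reported as released twice.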

Without the fix, an oops like the following is produced:

	kernel BUG at net/rxrpc/recvmsg.c:605!
	...
	RIP: 0010:rxrpc_recvmsg+0x256/0x5ae
	...
	Call Trace:
	 ? __init_waitqueue_head+0x2f/0x2f
	 ____sys_recvmsg+0x8a/0x148
	 ? import_iovec+0x69/0x9c
	 ? copy_msghdr_from_user+0x5c/0x86
	 ___sys_recvmsg+0x72/0xaa
	 ? __fget_files+0x22/0x57
	 ? __fget_light+0x46/0x51
	 ? fdget+0x9/0x1b
	 do_recvmmsg+0x15e/0x232
	 ? _raw_spin_unlock+0xa/0xb
	 ? vtime_delta+0xf/0x25
	 __x64_sys_recvmmsg+0x2c/0x2f
	 do_syscall_64+0x4c/0x78
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 357f5ef6 ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()")
Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65550098

14 Mar, 2020 1 commit

rxrpc: Fix call interruptibility handling · e138aa7d

Committed by David Howells on Mar 13, 2020

Fix the interruptibility of kernel-initiated client calls so that they're
either only interruptible when they're waiting for a call slot to become
available or they're not interruptible at all.  Either way, they're not
interruptible during transmission.

This should help prevent StoreData calls from being interrupted when
writeback is in progress.  It doesn't, however, handle interruption during
the receive phase.

Userspace-initiated calls are still interruptible.  After the signal has
been handled, sendmsg() will return the amount of data copied out of the
buffer and userspace can perform another sendmsg() call to continue
transmission.
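
From the userspace side, continuing after a short return might look like
the loop below.  This is a generic sketch, not AF_RXRPC-specific code: the
xmit callback stands in for whatever wraps sendmsg() for the call in
question, and send_all(), struct sink and chunk_send() are hypothetical
names, the latter two existing only to exercise the loop.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Stand-in for a sendmsg() wrapper: returns bytes accepted or -1. */
typedef ssize_t (*xmit_fn)(void *ctx, const void *buf, size_t len);

/* Keep calling xmit until the whole buffer has been handed over,
 * resuming from the offset reached when a call returns short. */
static ssize_t send_all(xmit_fn xmit, void *ctx, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t n = xmit(ctx, p + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any copy */
			return -1;
		}
		if (n == 0)
			break;			/* no progress: give up */
		done += (size_t)n;		/* partial copy: carry on */
	}
	return (ssize_t)done;
}

/* Demonstration sender that accepts at most 3 bytes per call, imitating
 * a sendmsg() that keeps being cut short by signals. */
struct sink {
	char   data[64];
	size_t len;
	int    calls;
};

static ssize_t chunk_send(void *ctx, const void *buf, size_t len)
{
	struct sink *s = ctx;
	size_t n = len < 3 ? len : 3;

	if (s->len + n > sizeof(s->data)) {
		errno = ENOBUFS;
		return -1;
	}
	memcpy(s->data + s->len, buf, n);
	s->len += n;
	s->calls++;
	return (ssize_t)n;
}
```

Calling send_all(chunk_send, &sink, data, 9) on a zeroed sink hands the
nine bytes over in three calls of three bytes each.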

Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Signed-off-by: David Howells <dhowells@redhat.com>

e138aa7d

07 Feb, 2020 1 commit

rxrpc: Fix call RCU cleanup using non-bh-safe locks · 963485d4

Committed by David Howells on Feb 06, 2020

rxrpc_rcu_destroy_call(), which is called as an RCU callback to clean up a
put call, calls rxrpc_put_connection() which, deep in its bowels, takes a
number of spinlocks in a non-BH-safe way, including rxrpc_conn_id_lock and
local->client_conns_lock. RCU callbacks, however, are normally called from
softirq context, which can cause lockdep to notice the locking
inconsistency.

To get lockdep to detect this, it's necessary to have the connection
cleaned up on the put at the end of the last of its calls, though normally
the clean up is deferred. This can be induced, however, by starting a call
on an AF_RXRPC socket and then closing the socket without reading the
reply.

Fix this by having rxrpc_rcu_destroy_call() punt the destruction to a
workqueue if in softirq mode, thereby deferring it to process context.
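
The shape of this fix can be shown with a small userspace model.  It is a
sketch only: the pending list stands in for a kernel workqueue, the
in_softirq argument stands in for the kernel's context check, and all of
the toy_ names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a work item on a workqueue. */
struct deferred {
	void (*fn)(struct deferred *w);
	struct deferred *next;
};

static struct deferred *pending;	/* stand-in for the workqueue */

static void defer(struct deferred *w, void (*fn)(struct deferred *w))
{
	w->fn = fn;
	w->next = pending;
	pending = w;
}

/* Run by "process context" to drain the queue. */
static void run_deferred(void)
{
	while (pending) {
		struct deferred *w = pending;

		pending = w->next;
		w->fn(w);
	}
}

struct toy_call {
	struct deferred work;	/* first member, so the cast below is valid */
	bool destroyed;
};

/* The part that, in the real code, ends up taking non-BH-safe spinlocks. */
static void toy_destroy_call(struct deferred *w)
{
	struct toy_call *call = (struct toy_call *)w;

	call->destroyed = true;
}

/* Model of the RCU callback: destroy directly only when it is safe. */
static void toy_rcu_destroy_call(struct toy_call *call, bool in_softirq)
{
	if (in_softirq)
		defer(&call->work, toy_destroy_call);
	else
		toy_destroy_call(&call->work);
}
```

A call passed in with in_softirq set stays undestroyed until
run_deferred() is invoked, which is the point at which the model counts
as being in process context.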

Note that another way to fix this could be to add a bunch of bh-disable
annotations to the spinlocks concerned - and there might be more than just
those two - but that means spending more time with BHs disabled.

Note also that some of these places were covered by bh-disable spinlocks
belonging to the rxrpc_transport object, but these got removed without the
_bh annotation being retained on the next lock in.

Fixes: 999b69f8 ("rxrpc: Kill the client connection bundle concept")
Reported-by: syzbot+d82f3ac8d87e7ccbb2c9@syzkaller.appspotmail.com
Reported-by: syzbot+3f1fd6b8cbf8702d134e@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

963485d4

03 Feb, 2020 1 commit

rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect · 5273a191

Committed by David Howells on Jan 30, 2020

When a call is disconnected, the connection pointer from the call is
cleared to make sure it isn't used again and to prevent further attempted
transmission for the call. Unfortunately, there might be a daemon trying
to use it at the same time to transmit a packet.

Fix this by keeping call->conn set, but setting a flag on the call to
indicate disconnection instead.

Also remove the bits in the transmission functions where the conn pointer
is checked and a ref taken under spinlock as this is now redundant.
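
The idea of flagging disconnection rather than clearing the pointer can be
sketched as follows.  This is a userspace illustration under stated
assumptions, not the kernel code: tx_conn, tx_call and the tx_ functions
are hypothetical names, and a C11 atomic_bool stands in for a flag bit on
the call.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct tx_conn {
	int packets_sent;
};

struct tx_call {
	struct tx_conn *conn;		/* stays set after disconnection */
	atomic_bool disconnected;	/* says "do not transmit any more" */
};

/* Disconnection sets the flag and leaves the pointer alone, so a
 * transmitter that has already read call->conn never sees NULL. */
static void tx_disconnect(struct tx_call *call)
{
	atomic_store(&call->disconnected, true);
}

/* Transmission checks the flag instead of testing conn for NULL. */
static bool tx_transmit(struct tx_call *call)
{
	if (atomic_load(&call->disconnected))
		return false;

	/* Safe to dereference even if a disconnect races with us. */
	call->conn->packets_sent++;
	return true;
}
```

A transmitter that passes the flag check just before the disconnect may
still send one more packet in this model, but it can no longer oops on a
NULL pointer.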

Fixes: 8d94aa38 ("rxrpc: Calls shouldn't hold socket refs")
Signed-off-by: David Howells <dhowells@redhat.com>

5273a191

07 Oct, 2019 1 commit

rxrpc: Fix call crypto state cleanup · 91fcfbe8

Committed by David Howells on Oct 07, 2019

Fix the cleanup of the crypto state on a call after the call has been
disconnected.  As the call has been disconnected, its connection ref has
been discarded and so we can't go through that to get to the security ops
table.

Fix this by caching the security ops pointer in the rxrpc_call struct and
using that when freeing the call security state.  Also use this in other
places we're dealing with call-specific security.
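
As a rough illustration of caching the ops pointer, consider the userspace
model below.  All names are hypothetical and the structures are reduced to
the fields needed to show the point: the call copies the security ops
pointer when it is connected, so the cleanup no longer has to go through
the connection.

```c
#include <stddef.h>

struct toy_sec_call;

/* Reduced stand-in for a security ops table. */
struct toy_security {
	void (*free_call_crypto)(struct toy_sec_call *call);
};

struct toy_sec_conn {
	const struct toy_security *security;
};

struct toy_sec_call {
	struct toy_sec_conn *conn;		/* dropped on disconnect */
	const struct toy_security *security;	/* cached copy of the ops */
	int crypto_state;			/* 1 while "allocated" */
};

static void toy_free_crypto(struct toy_sec_call *call)
{
	call->crypto_state = 0;
}

static const struct toy_security toy_sec = {
	.free_call_crypto = toy_free_crypto,
};

/* Connect: take the conn and cache its ops pointer in the call. */
static void toy_connect(struct toy_sec_call *call, struct toy_sec_conn *conn)
{
	call->conn = conn;
	call->security = conn->security;
	call->crypto_state = 1;
}

/* Disconnect: the connection ref is discarded. */
static void toy_disconnect(struct toy_sec_call *call)
{
	call->conn = NULL;
}

/* Release: use the cached ops; call->conn may already be gone. */
static void toy_release(struct toy_sec_call *call)
{
	if (call->security)
		call->security->free_call_crypto(call);
}
```

Here toy_release() works after toy_disconnect() because it never touches
call->conn.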

The symptoms look like:

    BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60
    net/rxrpc/call_object.c:481
    Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764

Fixes: 1db88c53 ("rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto")
Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>

91fcfbe8

openeuler / Kernel synced successfully about 2 years ago