1. 06 Jan, 2023 2 commits
    • rxrpc: Only set/transmit aborts in the I/O thread · a343b174
      David Howells committed
      Only set the abort call completion state in the I/O thread and only
      transmit ABORT packets from there.  rxrpc_abort_call() can then be made to
      actually send the packet.
      
      Further, ABORT packets should only be sent if the call has been exposed to
      the network (ie. at least one attempted DATA transmission has occurred for
      it).
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Make the local endpoint hold a ref on a connected call · 5040011d
      David Howells committed
      Make the local endpoint and its I/O thread hold a reference on a connected
      call until that call is disconnected.  Without this, we're reliant on
      either the AF_RXRPC socket to hold a ref (which is dropped when the call is
      released) or a queued work item to hold a ref (the work item is being
      replaced with the I/O thread).
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
  2. 19 Dec, 2022 1 commit
  3. 01 Dec, 2022 15 commits
    • rxrpc: Remove the _bh annotation from all the spinlocks · 3dd9c8b5
      David Howells committed
      None of the spinlocks in rxrpc need a _bh annotation now as the RCU
      callback routines no longer take spinlocks and the bulk of the packet
      wrangling code is now run in the I/O thread, not softirq context.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Make the I/O thread take over the call and local processor work · 5e6ef4f1
      David Howells committed
      Move the functions from the call->processor and local->processor work items
      into the domain of the I/O thread.
      
      The call event processor, now called from the I/O thread, then takes over
      the job of cranking the call state machine, processing incoming packets and
      transmitting DATA, ACK and ABORT packets.  In a future patch,
      rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it
      for later transmission.
      
      The call event processor becomes purely received-skb driven.  It only
      transmits things in response to events.  We use "pokes" to queue a dummy
      skb to make it do things like start/resume transmitting data.  Timer expiry
      also results in pokes.
      
      The connection event processor becomes similar, though crypto events, such
      as dealing with CHALLENGE and RESPONSE packets, are offloaded to a work item
      to avoid doing crypto in the I/O thread.
      
      The local event processor is removed and VERSION response packets are
      generated directly from the packet parser.  Similarly, ABORTs generated in
      response to protocol errors will be transmitted immediately rather than
      being pushed onto a queue for later transmission.
      
      Changes:
      ========
      ver #2)
       - Fix a couple of introduced lock context imbalances.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Remove RCU from peer->error_targets list · 29fb4ec3
      David Howells committed
      Remove the RCU requirements from the peer's list of error targets so that
      the error distributor can call sleeping functions.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Move DATA transmission into call processor work item · cf37b598
      David Howells committed
      Move DATA transmission into the call processor work item.  In a future
      patch, this will be called from the I/O thread rather than being its own
      work item.
      
      This will allow DATA transmission to be driven directly by incoming ACKs,
      pokes and timers as those are processed.
      
      The Tx queue is also split: The queue of packets prepared by sendmsg is now
      placed in call->tx_sendmsg and the packet dispatcher decants the packets
      into call->tx_buffer as space becomes available in the transmission
      window.  This allows sendmsg to run ahead of the available space to try and
      prevent an underflow in transmission.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
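The split Tx queue described in this commit can be pictured with a minimal userspace sketch. Only the queue names tx_sendmsg and tx_buffer come from the commit message; the struct layouts, the tx_winsize field and the helper names are hypothetical simplifications, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; only tx_sendmsg and tx_buffer are named in the
 * commit message. */
struct txbuf {
	uint32_t seq;
	struct txbuf *next;
};

struct txq {
	struct txbuf *head;
	struct txbuf *tail;
	unsigned int len;
};

struct call {
	struct txq tx_sendmsg;   /* packets prepared by sendmsg */
	struct txq tx_buffer;    /* packets inside the transmission window */
	unsigned int tx_winsize; /* assumed window size, in packets */
};

/* Append a buffer to the tail of a queue. */
static void txq_push(struct txq *q, struct txbuf *b)
{
	assert(q && b);
	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
	q->len++;
}

/* Remove and return the buffer at the head of a queue, or NULL if empty. */
static struct txbuf *txq_pop(struct txq *q)
{
	struct txbuf *b = q->head;

	if (!b)
		return NULL;
	q->head = b->next;
	if (!q->head)
		q->tail = NULL;
	b->next = NULL;
	q->len--;
	return b;
}

/* Move packets into tx_buffer while the window has room; returns how many
 * packets were moved. */
static unsigned int call_decant(struct call *call)
{
	unsigned int moved = 0;

	while (call->tx_buffer.len < call->tx_winsize &&
	       call->tx_sendmsg.len > 0) {
		txq_push(&call->tx_buffer, txq_pop(&call->tx_sendmsg));
		moved++;
	}
	return moved;
}
```

In this sketch sendmsg can keep appending to tx_sendmsg regardless of the window, and call_decant() would be run whenever an ACK frees space, which is the "run ahead" behaviour the message describes.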
    • rxrpc: Copy client call parameters into rxrpc_call earlier · f3441d41
      David Howells committed
      Copy client call parameters into rxrpc_call earlier so that they can be
      used to convey them to the connection code - which can then be offloaded to
      the I/O thread.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Implement a mechanism to send an event notification to a call · 15f661dc
      David Howells committed
      Provide a means by which an event notification can be sent to a call such
      that the I/O thread can process it rather than it being done in a separate
      workqueue.  This will allow a lot of locking to be removed.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Remove call->input_lock · 4041a8ff
      David Howells committed
      Remove call->input_lock as it was only necessary to serialise access to the
      state stored in the rxrpc_call struct by simultaneous softirq handlers
      presenting received packets.  They now dump the packets in a queue and a
      single process-context handler now processes them.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Move packet reception processing into I/O thread · 446b3e14
      David Howells committed
      Split the packet input handler to make the softirq side just dump the
      received packet into the local endpoint receive queue and then call the
      remainder of the input function from the I/O thread.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Don't hold a ref for call timer or workqueue · 3feda9d6
      David Howells committed
      Currently, rxrpc gives the call timer a ref on the call when it starts it
      and this is passed along to the workqueue by the timer expiration function.
      The problem comes when queue_work() fails (ie. the work item is already
      queued): the timer routine must put the ref - but this may cause the
      cleanup code to run.
      
      This has the unfortunate effect that the cleanup code may then be run in
      softirq context - which means that any spinlocks it might need to touch
      have to be guarded to disable softirqs (ie. they need a "_bh" suffix).
      
      Fix this by:
      
       (1) Don't give a ref to the timer.
      
       (2) Make the expiration function not do anything if the refcount is 0.
           Note that this is more of an optimisation.
      
       (3) Make sure that the cleanup routine waits for the timer to complete.
      
      However, this has the consequence that the timer cannot give a ref to the work
      item.  Therefore the following fixes are also necessary:
      
       (4) Don't give a ref to the work item.
      
       (5) Make the work item return asap if it sees the ref count is 0.
      
       (6) Make sure that the cleanup routine waits for the work item to
           complete.
      
      Unfortunately, neither the timer nor the work item can simply get around
      the problem by just using refcount_inc_not_zero() as the waits would still
      have to be done, and there would still be the possibility of having to put
      the ref in the expiration function.
      
      Note the call work item is going to go away with the work being transferred
      to the I/O thread, so the wait in (6) will become obsolete.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: trace: Don't use __builtin_return_address for sk_buff tracing · 9a36a6bc
      David Howells committed
      In rxrpc tracing, use enums to generate lists of points of interest rather
      than __builtin_return_address() for the sk_buff tracepoint.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracing · cb0fc0c9
      David Howells committed
      In rxrpc tracing, use enums to generate lists of points of interest rather
      than __builtin_return_address() for the rxrpc_call tracepoint.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracing · 7fa25105
      David Howells committed
      In rxrpc tracing, use enums to generate lists of points of interest rather
      than __builtin_return_address() for the rxrpc_conn tracepoint.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing · 47c810a7
      David Howells committed
      In rxrpc tracing, use enums to generate lists of points of interest rather
      than __builtin_return_address() for the rxrpc_peer tracepoint.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Drop rxrpc_conn_parameters from rxrpc_connection and rxrpc_bundle · 2cc80086
      David Howells committed
      Remove the rxrpc_conn_parameters struct from the rxrpc_connection and
      rxrpc_bundle structs and emplace the members directly.  These are going to
      get filled in from the rxrpc_call struct in future.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Remove the [_k]net() debugging macros · e969c92c
      David Howells committed
      Remove the _net() and knet() debugging macros in favour of tracepoints.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
  4. 09 Nov, 2022 9 commits
    • rxrpc: Fix congestion management · 1fc4fa2a
      David Howells committed
      rxrpc has a problem in its congestion management in that it saves the
      congestion window size (cwnd) from one call to another, but if this is 0 at
      the time it is saved, then the next call may not actually manage to ever
      transmit anything.
      
      To this end:
      
       (1) Don't save cwnd between calls, but rather reset back down to the
           initial cwnd and re-enter slow-start if data transmission is idle for
           more than an RTT.
      
       (2) Preserve ssthresh instead, as that is a handy estimate of pipe
           capacity.  Knowing roughly when to stop slow start and enter
           congestion avoidance can reduce the tendency to overshoot and drop
           larger amounts of packets when probing.
      
      In future, cwnd growth also needs to be constrained when the window isn't
      being filled due to being application limited.
      Reported-by: Simon Wilkinson <sxw@auristor.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
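The two points in this commit message can be illustrated with a rough sketch: each call starts from an initial cwnd, falls back to it after more than an RTT of idleness, and only ssthresh is carried between calls. The struct layouts, the INITIAL_CWND value and all function names below are assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INITIAL_CWND 4u /* assumed initial window, in packets */

/* Hypothetical per-peer state: only ssthresh survives between calls. */
struct peer_state {
	uint32_t ssthresh;
};

/* Hypothetical per-call congestion state. */
struct call_cong {
	uint32_t cwnd;
	uint32_t ssthresh;
	bool slow_start;
	uint64_t last_tx_us; /* time of the last DATA transmission */
};

/* Start a call: cwnd is never inherited, ssthresh is. */
static void call_cong_init(struct call_cong *c, const struct peer_state *p,
			   uint64_t now_us)
{
	assert(c && p);
	c->cwnd = INITIAL_CWND;
	c->ssthresh = p->ssthresh;
	c->slow_start = true;
	c->last_tx_us = now_us;
}

/* Before transmitting: if idle for more than an RTT, reset cwnd and
 * re-enter slow start. */
static void call_cong_before_tx(struct call_cong *c, uint64_t now_us,
				uint64_t rtt_us)
{
	if (now_us - c->last_tx_us > rtt_us) {
		c->cwnd = INITIAL_CWND;
		c->slow_start = true;
	}
	c->last_tx_us = now_us;
}

/* At call end: save ssthresh (the pipe-capacity estimate) for the next call. */
static void call_cong_finish(const struct call_cong *c, struct peer_state *p)
{
	p->ssthresh = c->ssthresh;
}
```

Because cwnd is always rebuilt from INITIAL_CWND, a stale value of 0 can no longer be handed to the next call, while the preserved ssthresh still tells slow start roughly where to stop.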
    • rxrpc: Remove the rxtx ring · 6869ddb8
      David Howells committed
      The Rx/Tx ring is no longer used, so remove it.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Save last ACK's SACK table rather than marking txbufs · d57a3a15
      David Howells committed
      Improve the tracking of which packets need to be transmitted by saving the
      last ACK packet that we receive that has a populated soft-ACK table rather
      than marking packets.  Then we can step through the soft-ACK table and look
      at the packets we've transmitted beyond that to determine which packets we
      might want to retransmit.
      
      We also look at the highest serial number that has been acked to try and
      guess which packets we've transmitted the peer is likely to have seen.  If
      necessary, we send a ping to retrieve that number.
      
      One downside that might be a problem is that we can't then compare the
      previous acked/unacked state so easily in rxrpc_input_soft_acks() - which
      is a potential problem for the slow-start algorithm.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Remove call->lock · 4e76bd40
      David Howells committed
      call->lock is no longer necessary, so remove it.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Don't use a ring buffer for call Tx queue · a4ea4c47
      David Howells committed
      Change the way the Tx queueing works to make the following ends easier to
      achieve:
      
       (1) The filling of packets, the encryption of packets and the transmission
           of packets can be handled in parallel by separate threads, rather than
           rxrpc_sendmsg() allocating, filling, encrypting and transmitting each
           packet before moving onto the next one.
      
       (2) Get rid of the fixed-size ring which sets a hard limit on the number
           of packets that can be retained in the ring.  This allows the number
           of packets to increase without having to allocate a very large ring or
           having variable-sized rings.
      
           [Note: the downside of this is that it's then less efficient to locate
           a packet for retransmission as we then have to step through a list and
           examine each buffer in the list.]
      
       (3) Allow the filler/encrypter to run ahead of the transmission window.
      
       (4) Make it easier to do zero copy UDP from the packet buffers.
      
       (5) Make it easier to do zero copy from userspace to the packet buffers -
           and thence to UDP (only for unauthenticated connections).
      
      To that end, the following changes are made:
      
       (1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets
           to be transmitted in.  This allows them to be placed on multiple
           queues simultaneously.  An sk_buff isn't really necessary as it's
           never passed on to lower-level networking code.
      
       (2) Keep the transmissible packets in a linked list on the call struct
           rather than in a ring.  As a consequence, the annotation buffer isn't
           used either; rather a flag is set on the packet to indicate ackedness.
      
       (3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be
           transmitted has been queued.  Add RXRPC_CALL_TX_ALL_ACKED to indicate
           that all packets up to and including the last got hard acked.
      
       (4) Wire headers are now stored in the txbuf rather than being concocted
           on the stack and they're stored immediately before the data, thereby
           allowing zerocopy of a single span.
      
       (5) Don't bother with instant-resend on transmission failure; rather,
           leave it for a timer or an ACK packet to trigger.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Get rid of the Rx ring · 5d7edbc9
      David Howells committed
      Get rid of the Rx ring and replace it with a pair of queues instead.  One
      queue gets the packets that are in-sequence and are ready for processing by
      recvmsg(); the other queue gets the out-of-sequence packets for addition to
      the first queue as the holes get filled.
      
      The annotation ring is removed and replaced with a SACK table.  The SACK
      table has the bits set that correspond exactly to the sequence number of
      the packet being acked.  The SACK ring is copied when an ACK packet is
      being assembled and rotated so that the first ACK is in byte 0.
      
      Flow control handling is altered so that packets that are moved to the
      in-sequence queue are hard-ACK'd even before they're consumed - and then
      the Rx window size in the ACK packet (rsize) is shrunk down to compensate
      (even going to 0 if the window is full).
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
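The copy-and-rotate step for the SACK table mentioned in this commit can be sketched as below. This assumes one byte per packet, indexed by sequence number modulo the table size; the table size, struct and function names are hypothetical and chosen only to keep the example small.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SACK_SIZE 8u /* assumed table size; kept tiny for illustration */

/* Hypothetical receive-side SACK ring: slot (seq % SACK_SIZE) is 1 if the
 * packet with that sequence number has been received. */
struct rx_sack {
	uint8_t table[SACK_SIZE];
};

/* Record reception of a packet in the slot matching its sequence number. */
static void sack_mark(struct rx_sack *s, uint32_t seq)
{
	s->table[seq % SACK_SIZE] = 1;
}

/* Copy n entries into out[], rotated so that out[0] describes first_seq. */
static void sack_copy_rotated(const struct rx_sack *s, uint32_t first_seq,
			      uint8_t *out, unsigned int n)
{
	unsigned int start = first_seq % SACK_SIZE;
	unsigned int head = SACK_SIZE - start; /* entries before the wrap */

	assert(n <= SACK_SIZE);
	if (head > n)
		head = n;
	memcpy(out, s->table + start, head);
	memcpy(out + head, s->table, n - head);
}
```

Indexing by sequence number means marking a packet never needs to know where the window currently starts; only the copy made while assembling an ACK has to account for the rotation.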
    • rxrpc: Clean up ACK handling · 530403d9
      David Howells committed
      Clean up the rxrpc_propose_ACK() function.  If deferred PING ACK proposal
      is split out, it's only really needed for deferred DELAY ACKs.  All other
      ACKs, bar the terminal IDLE ACK, are sent immediately.  The deferred IDLE ACK
      submission can be handled by conversion of a DELAY ACK into an IDLE ACK if
      there's nothing to be SACK'd.
      
      Also, because there's a delay between an ACK being generated and being
      transmitted, it's possible that other ACKs of the same type will be
      generated during that interval.  Apart from the ACK time and the serial
      number responded to, most of the ACK body, including window and SACK
      parameters, are not filled out till the point of transmission - so we can
      avoid generating a new ACK if there's one pending that will cover the SACK
      data we need to convey.
      
      Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one
      already pending.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Remove call->tx_phase · a11e6ff9
      David Howells committed
      Remove call->tx_phase as it's only ever set.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Split call timer-expiration from call timer-set tracepoint · 334dfbfc
      David Howells committed
      Split the tracepoint for call timer-set to separate out the call
      timer-expiration event.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
  5. 26 Aug, 2022 1 commit
    • rxrpc: Fix locking in rxrpc's sendmsg · b0f571ec
      David Howells committed
      Fix three bugs in rxrpc's sendmsg implementation:
      
       (1) rxrpc_new_client_call() should release the socket lock when returning
           an error from rxrpc_get_call_slot().
      
       (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
           held in the event that we're interrupted by a signal whilst waiting
           for tx space on the socket or relocking the call mutex afterwards.
      
           Fix this by: (a) moving the unlock/lock of the call mutex up to
           rxrpc_send_data() such that the lock is not held around all of
           rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
           whether we're returning with the lock dropped.  Note that this means
           recvmsg() will not block on this call whilst we're waiting.
      
       (3) After dropping and regaining the call mutex, rxrpc_send_data() needs
           to go and recheck the state of the tx_pending buffer and the
           tx_total_len check in case we raced with another sendmsg() on the same
           call.
      
      Thinking on this some more, it might make sense to have different locks for
      sendmsg() and recvmsg().  There's probably no need to make recvmsg() wait
      for sendmsg().  It does mean that recvmsg() can return MSG_EOR indicating
      that a call is dead before a sendmsg() to that call returns - but that can
      currently happen anyway.
      
      Without fix (2), something like the following can be induced:
      
      	WARNING: bad unlock balance detected!
      	5.16.0-rc6-syzkaller #0 Not tainted
      	-------------------------------------
      	syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
      	[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
      	but there are no more locks to release!
      
      	other info that might help us debug this:
      	no locks held by syz-executor011/3597.
      	...
      	Call Trace:
      	 <TASK>
      	 __dump_stack lib/dump_stack.c:88 [inline]
      	 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      	 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
      	 __lock_release kernel/locking/lockdep.c:5306 [inline]
      	 lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
      	 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
      	 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
      	 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
      	 sock_sendmsg_nosec net/socket.c:704 [inline]
      	 sock_sendmsg+0xcf/0x120 net/socket.c:724
      	 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
      	 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
      	 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
      	 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      	 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      	 entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
      cc: Hawkins Jiawei <yin31149@gmail.com>
      cc: Khalid Masum <khalid.masum.92@gmail.com>
      cc: Dan Carpenter <dan.carpenter@oracle.com>
      cc: linux-afs@lists.infradead.org
      Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 23 May, 2022 2 commits
    • rxrpc: Fix locking issue · ad25f5cb
      David Howells committed
      There's a locking issue with the per-netns list of calls in rxrpc.  The
      pieces of code that add and remove a call from the list use write_lock()
      and the calls procfile uses read_lock() to access it.  However, the timer
      callback function may trigger a removal by trying to queue a call for
      processing and finding that it's already queued - at which point it has a
      spare refcount that it has to do something with.  Unfortunately, if it puts
      the call and this reduces the refcount to 0, the call will be removed from
      the list.  Unfortunately, since the _bh variants of the locking functions
      aren't used, this can deadlock.
      
      ================================
      WARNING: inconsistent lock state
      5.18.0-rc3-build4+ #10 Not tainted
      --------------------------------
      inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
      ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
      {SOFTIRQ-ON-W} state was registered at:
      ...
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&rxnet->call_lock);
        <Interrupt>
          lock(&rxnet->call_lock);
      
       *** DEADLOCK ***
      
      1 lock held by ksoftirqd/2/25:
       #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d
      
      Changes
      =======
      ver #2)
       - Changed to using list_next_rcu() rather than rcu_dereference() directly.
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Use refcount_t rather than atomic_t · a0575429
      David Howells committed
      Move to using refcount_t rather than atomic_t for refcounts in rxrpc.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 31 Mar, 2022 1 commit
  8. 05 Feb, 2021 1 commit
    • rxrpc: Fix clearance of Tx/Rx ring when releasing a call · 7b5eab57
      David Howells committed
      At the end of rxrpc_release_call(), rxrpc_cleanup_ring() is called to clear
      the Rx/Tx skbuff ring, but this doesn't lock the ring whilst it's accessing
      it.  Unfortunately, rxrpc_resend() might be trying to retransmit a packet
      concurrently with this - and whilst it does lock the ring, this isn't
      protection against rxrpc_cleanup_call().
      
      Fix this by removing the call to rxrpc_cleanup_ring() from
      rxrpc_release_call().  rxrpc_cleanup_ring() will be called again anyway
      from rxrpc_cleanup_call().  The earlier call is just an optimisation to
      recycle skbuffs more quickly.
      
      Alternative solutions: rxrpc_release_call() could try to cancel the work
      item or wait for it to complete, or rxrpc_cleanup_ring() could lock when
      accessing the ring (which would require a bh lock).
      
      This can produce a report like the following:
      
        BUG: KASAN: use-after-free in rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372
        Read of size 4 at addr ffff888011606e04 by task kworker/0:0/5
        ...
        Workqueue: krxrpcd rxrpc_process_call
        Call Trace:
         ...
         kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413
         rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372
         rxrpc_resend net/rxrpc/call_event.c:266 [inline]
         rxrpc_process_call+0x1634/0x1f60 net/rxrpc/call_event.c:412
         process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
         ...
      
        Allocated by task 2318:
         ...
         sock_alloc_send_pskb+0x793/0x920 net/core/sock.c:2348
         rxrpc_send_data+0xb51/0x2bf0 net/rxrpc/sendmsg.c:358
         rxrpc_do_sendmsg+0xc03/0x1350 net/rxrpc/sendmsg.c:744
         rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:560
         ...
      
        Freed by task 2318:
         ...
         kfree_skb+0x140/0x3f0 net/core/skbuff.c:704
         rxrpc_free_skb+0x11d/0x150 net/rxrpc/skbuff.c:78
         rxrpc_cleanup_ring net/rxrpc/call_object.c:485 [inline]
         rxrpc_release_call+0x5dd/0x860 net/rxrpc/call_object.c:552
         rxrpc_release_calls_on_socket+0x21c/0x300 net/rxrpc/call_object.c:579
         rxrpc_release_sock net/rxrpc/af_rxrpc.c:885 [inline]
         rxrpc_release+0x263/0x5a0 net/rxrpc/af_rxrpc.c:916
         __sock_release+0xcd/0x280 net/socket.c:597
         ...
      
        The buggy address belongs to the object at ffff888011606dc0
         which belongs to the cache skbuff_head_cache of size 232
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Reported-by: syzbot+174de899852504e4a74a@syzkaller.appspotmail.com
      Reported-by: syzbot+3d1c772efafd3c38d007@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hillf Danton <hdanton@sina.com>
      Link: https://lore.kernel.org/r/161234207610.653119.5287360098400436976.stgit@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  9. 05 Oct, 2020 1 commit
    • rxrpc: Fix accept on a connection that need securing · 2d914c1b
      David Howells committed
      When a new incoming call arrives at a userspace rxrpc socket on a new
      connection that has a security class set, the code currently pushes it onto
      the accept queue to hold a ref on it for the socket.  This doesn't work,
      however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
      state and discards the ref.  This means that the call runs out of refs too
      early and the kernel oopses.
      
      By contrast, a kernel rxrpc socket manually pre-charges the incoming call
      pool with calls that already have user call IDs assigned, so they are ref'd
      by the call tree on the socket.
      
      Change the mode of operation for userspace rxrpc server sockets to work
      like this too.  Although this is a UAPI change, server sockets aren't
      currently functional.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
  10. 09 Sep, 2020 1 commit
    • rxrpc: Impose a maximum number of client calls · b7a7d674
      David Howells committed
      Impose a maximum on the number of client rxrpc calls that are allowed
      simultaneously.  This will be in lieu of a maximum number of client
      connections as this is easier to administer as, unlike connections, calls
      aren't reusable (to be changed in a subsequent patch).
      
      This doesn't affect the limits on service calls and connections.
      Signed-off-by: David Howells <dhowells@redhat.com>
  11. 21 Aug, 2020 1 commit
    • rxrpc: Fix loss of RTT samples due to interposed ACK · 4700c4d8
      David Howells committed
      The Rx protocol has a mechanism to help generate RTT samples that works by
      a client transmitting a REQUESTED-type ACK when it receives a DATA packet
      that has the REQUEST_ACK flag set.
      
      The peer, however, may interpose other ACKs before transmitting the
      REQUESTED-ACK, as can be seen in the following trace excerpt:
      
       rxrpc_tx_data: c=00000044 DATA d0b5ece8:00000001 00000001 q=00000001 fl=07
       rxrpc_rx_ack: c=00000044 00000001 PNG r=00000000 f=00000002 p=00000000 n=0
       rxrpc_rx_ack: c=00000044 00000002 REQ r=00000001 f=00000002 p=00000001 n=0
       ...
      
      DATA packet 1 (q=xx) has REQUEST_ACK set (bit 1 of fl=xx).  The incoming
      ping (labelled PNG) hard-acks the request DATA packet (f=xx exceeds the
      sequence number of the DATA packet), causing it to be discarded from the Tx
      ring.  The ACK that was requested (labelled REQ, r=xx references the serial
      of the DATA packet) comes after the ping, but the sk_buff holding the
      timestamp has gone and the RTT sample is lost.
      
      This is particularly noticeable on RPC calls used to probe the service
      offered by the peer.  A lot of peers end up with an unknown RTT because we
      only ever sent a single RPC.  This confuses the server rotation algorithm.
      
      Fix this by caching the information about the outgoing packet in RTT
      calculations in the rxrpc_call struct rather than looking in the Tx ring.
      
      A four-deep buffer is maintained and both REQUEST_ACK-flagged DATA and
      PING-ACK transmissions are recorded in there.  When the appropriate
      response ACK is received, the buffer is checked for a match and, if found,
      an RTT sample is recorded.
      
      If a received ACK refers to a packet with a later serial number than an
      entry in the cache, that entry is presumed lost and the entry is made
      available to record a new transmission.
      
      ACK types other than REQUESTED-type and PING-type cause any matching
      sample to be cancelled as they don't necessarily represent a useful
      measurement.
      
      If there's no space in the buffer on ping/data transmission, the sample
      base is discarded.
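
      The caching scheme described above can be sketched roughly as follows.
      This is a minimal userspace model, not the kernel code: every name in
      it (rtt_cache, rtt_record_tx, rtt_on_ack, the nanosecond timestamps)
      is an assumption made for illustration, and the "usable" argument
      stands in for "the ACK is REQUESTED-type or PING-type".

```c
#include <stdbool.h>
#include <stdint.h>

#define RTT_SLOTS 4 /* the "four-deep buffer" from the description */

/* Hypothetical layout; the real fields live in struct rxrpc_call. */
struct rtt_slot {
	bool     in_use;
	uint32_t serial;     /* serial of the DATA or PING-ACK we transmitted */
	uint64_t sent_at_ns; /* when it was transmitted */
};

struct rtt_cache {
	struct rtt_slot slot[RTT_SLOTS];
};

/* Record an outgoing probe.  Returns false when the buffer is full, in
 * which case the sample base is simply discarded. */
static bool rtt_record_tx(struct rtt_cache *c, uint32_t serial, uint64_t now_ns)
{
	for (int i = 0; i < RTT_SLOTS; i++) {
		if (!c->slot[i].in_use) {
			c->slot[i] = (struct rtt_slot){ true, serial, now_ns };
			return true;
		}
	}
	return false;
}

/* Handle an incoming ACK that references acked_serial.  Returns true and
 * fills *rtt_ns only when a matching entry exists and the ACK type is one
 * that yields a useful measurement. */
static bool rtt_on_ack(struct rtt_cache *c, uint32_t acked_serial, bool usable,
		       uint64_t now_ns, uint64_t *rtt_ns)
{
	bool sampled = false;

	for (int i = 0; i < RTT_SLOTS; i++) {
		struct rtt_slot *s = &c->slot[i];

		if (!s->in_use)
			continue;
		if (s->serial == acked_serial) {
			if (usable) {
				*rtt_ns = now_ns - s->sent_at_ns;
				sampled = true;
			}
			s->in_use = false; /* consumed, or cancelled if !usable */
		} else if ((int32_t)(acked_serial - s->serial) > 0) {
			s->in_use = false; /* older entry: presumed lost */
		}
	}
	return sampled;
}
```

      Because the timestamps live in the cache rather than in the Tx ring,
      an interposed ping that hard-acks the DATA packet no longer destroys
      the information needed for the later REQUESTED ACK.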
      
      Fixes: 50235c4b ("rxrpc: Obtain RTT data by requesting ACKs on DATA packets")
      Signed-off-by: David Howells <dhowells@redhat.com>
      4700c4d8
  12. 31 Jul, 2020 1 commit
    • D
      rxrpc: Fix race between recvmsg and sendmsg on immediate call failure · 65550098
      David Howells authored
      There's a race between rxrpc_sendmsg setting up a call, but then failing to
      send anything on it due to an error, and recvmsg() seeing the call
      completion occur and trying to return the state to the user.
      
      An assertion fails in rxrpc_recvmsg() because the call has already been
      released from the socket and is about to be released again as recvmsg deals
      with it.  (The recvmsg_q queue on the socket holds a ref, so there's no
      problem with use-after-free.)
      
      We also have to be careful not to end up reporting an error twice, in such
      a way that both returns indicate to userspace that the user ID supplied
      with the call is no longer in use - which could cause the client to
      malfunction if it recycles the user ID fast enough.
      
      Fix this by the following means:
      
       (1) When sendmsg() creates a call after the point that the call has been
           successfully added to the socket, don't return any errors through
           sendmsg(), but rather complete the call and let recvmsg() retrieve
           them.  Make sendmsg() return 0 at this point.  Further calls to
           sendmsg() for that call will fail with ESHUTDOWN.
      
           Note that at this point, we haven't sent any packets yet, so the
           server doesn't yet know about the call.
      
       (2) If sendmsg() returns an error when it was expected to create a new
           call, it means that the user ID wasn't used.
      
       (3) Mark the call disconnected before marking it completed to prevent an
           oops in rxrpc_release_call().
      
       (4) recvmsg() will then retrieve the error and set MSG_EOR to indicate
           that the user ID is no longer known by the kernel.
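
      The division of labour in points (1), (2) and (4) can be modelled in
      a few lines.  This is only a toy: the fail_* types and functions are
      hypothetical, the error delivered by the real recvmsg() is folded
      into a plain return value here, and the eor flag stands in for
      MSG_EOR.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a call attached to a socket. */
struct fail_call {
	bool attached;  /* added to the socket: the user ID is in use */
	bool completed; /* call has reached a terminal state */
	int  error;     /* completion error for recvmsg() to report */
};

/* Model of sendmsg() creating a new call.  setup_err is a failure before
 * the call is added to the socket; tx_err is a failure after that point. */
static int fail_sendmsg_new(struct fail_call *call, int setup_err, int tx_err)
{
	if (setup_err)
		return -setup_err; /* (2): the user ID was never used */

	call->attached = true;
	if (tx_err) {
		call->completed = true; /* (1): complete the call instead of */
		call->error = -tx_err;  /*      returning the error here     */
	}
	return 0;
}

/* Further sendmsg() on a call that has already been completed. */
static int fail_sendmsg_more(const struct fail_call *call)
{
	return call->completed ? -ESHUTDOWN : 0;
}

/* Model of recvmsg(): hands back the stored error exactly once and sets
 * *eor to say the user ID is no longer known. */
static int fail_recvmsg(struct fail_call *call, bool *eor)
{
	*eor = false;
	if (!call->attached || !call->completed)
		return -EAGAIN; /* nothing to report */

	*eor = true;
	call->attached = false;
	return call->error;
}
```

      In this model the user ID is released on exactly one path: either
      sendmsg() fails before the call is attached, or recvmsg() reports the
      completion with the end-of-record indication.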
      
      An oops like the following is produced:
      
      	kernel BUG at net/rxrpc/recvmsg.c:605!
      	...
      	RIP: 0010:rxrpc_recvmsg+0x256/0x5ae
      	...
      	Call Trace:
      	 ? __init_waitqueue_head+0x2f/0x2f
      	 ____sys_recvmsg+0x8a/0x148
      	 ? import_iovec+0x69/0x9c
      	 ? copy_msghdr_from_user+0x5c/0x86
      	 ___sys_recvmsg+0x72/0xaa
      	 ? __fget_files+0x22/0x57
      	 ? __fget_light+0x46/0x51
      	 ? fdget+0x9/0x1b
      	 do_recvmmsg+0x15e/0x232
      	 ? _raw_spin_unlock+0xa/0xb
      	 ? vtime_delta+0xf/0x25
      	 __x64_sys_recvmmsg+0x2c/0x2f
      	 do_syscall_64+0x4c/0x78
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 357f5ef6 ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()")
      Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      65550098
  13. 14 Mar, 2020 1 commit
    • D
      rxrpc: Fix call interruptibility handling · e138aa7d
      David Howells authored
      Fix the interruptibility of kernel-initiated client calls so that they're
      either only interruptible when they're waiting for a call slot to come
      available or they're not interruptible at all.  Either way, they're not
      interruptible during transmission.
      
      This should help prevent StoreData calls from being interrupted when
      writeback is in progress.  It doesn't, however, handle interruption during
      the receive phase.
      
      Userspace-initiated calls are still interruptible.  After the signal has
      been handled, sendmsg() will return the amount of data copied out of the
      buffer and userspace can perform another sendmsg() call to continue
      transmission.
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Signed-off-by: David Howells <dhowells@redhat.com>
      e138aa7d
  14. 07 Feb, 2020 1 commit
    • D
      rxrpc: Fix call RCU cleanup using non-bh-safe locks · 963485d4
      David Howells authored
      rxrpc_rcu_destroy_call(), which is called as an RCU callback to clean up a
      put call, calls rxrpc_put_connection() which, deep in its bowels, takes a
      number of spinlocks in a non-BH-safe way, including rxrpc_conn_id_lock and
      local->client_conns_lock.  RCU callbacks, however, are normally called from
      softirq context, which can cause lockdep to notice the locking
      inconsistency.
      
      To get lockdep to detect this, it's necessary to have the connection
      cleaned up on the put at the end of the last of its calls, though normally
      the clean up is deferred.  This can be induced, however, by starting a call
      on an AF_RXRPC socket and then closing the socket without reading the
      reply.
      
      Fix this by having rxrpc_rcu_destroy_call() punt the destruction to a
      workqueue if in softirq-mode and defer the destruction to process context.
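
      The shape of that fix, reduced to a single-threaded userspace model,
      might look like the sketch below.  All names are hypothetical; a
      linked list stands in for the workqueue and a boolean argument
      stands in for the kernel's check of the current context.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object whose teardown needs non-BH-safe locks. */
struct punt_obj {
	bool destroyed;
	struct punt_obj *next_deferred;
};

/* Stand-in for a workqueue: objects waiting for process context. */
static struct punt_obj *punt_deferred_head;

/* The teardown proper; in the real code this is what ends up taking the
 * spinlocks, so it must not run in softirq context. */
static void punt_destroy_now(struct punt_obj *obj)
{
	obj->destroyed = true;
}

/* Model of the RCU callback: punt if called from softirq context,
 * otherwise destroy directly. */
static void punt_rcu_destroy(struct punt_obj *obj, bool in_softirq_ctx)
{
	if (in_softirq_ctx) {
		obj->next_deferred = punt_deferred_head;
		punt_deferred_head = obj;
	} else {
		punt_destroy_now(obj);
	}
}

/* Model of the worker running later in process context. */
static void punt_run_deferred(void)
{
	while (punt_deferred_head) {
		struct punt_obj *obj = punt_deferred_head;

		punt_deferred_head = obj->next_deferred;
		punt_destroy_now(obj);
	}
}
```

      The cost is a delayed teardown; the benefit is that no lock on the
      destruction path needs to disable BHs.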
      
      Note that another way to fix this could be to add a bunch of bh-disable
      annotations to the spinlocks concerned - and there might be more than just
      those two - but that means spending more time with BHs disabled.
      
      Note also that some of these places were covered by bh-disable spinlocks
      belonging to the rxrpc_transport object, but these got removed without the
      _bh annotation being retained on the next lock in.
      
      Fixes: 999b69f8 ("rxrpc: Kill the client connection bundle concept")
      Reported-by: syzbot+d82f3ac8d87e7ccbb2c9@syzkaller.appspotmail.com
      Reported-by: syzbot+3f1fd6b8cbf8702d134e@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      963485d4
  15. 03 Feb, 2020 1 commit
    • D
      rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect · 5273a191
      David Howells authored
      When a call is disconnected, the connection pointer from the call is
      cleared to make sure it isn't used again and to prevent further attempted
      transmission for the call.  Unfortunately, there might be a daemon trying
      to use it at the same time to transmit a packet.
      
      Fix this by keeping call->conn set, but setting a flag on the call to
      indicate disconnection instead.
      
      Also remove the bits in the transmission functions where the conn pointer is
      checked and a ref taken under spinlock as this is now redundant.
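
      As an illustration of the flag-instead-of-NULL approach, here is a
      minimal single-threaded sketch.  The type and function names are
      hypothetical, and a plain bool replaces whatever atomic flag the
      kernel actually uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection: just counts packets handed to it. */
struct flag_conn {
	int packets_sent;
};

struct flag_call {
	struct flag_conn *conn; /* stays set for the life of the call */
	bool disconnected;      /* set on disconnect instead of clearing conn */
};

static void flag_disconnect(struct flag_call *call)
{
	call->disconnected = true; /* note: call->conn is left untouched */
}

/* A transmitter can always dereference call->conn safely; it only has to
 * check the flag to decide whether to send. */
static bool flag_transmit(struct flag_call *call)
{
	if (call->disconnected)
		return false;
	call->conn->packets_sent++;
	return true;
}
```

      A transmitter that raced with disconnection would previously have
      found a NULL pointer; in this model it finds either a valid pointer
      and a clear flag, or a valid pointer and a set flag.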
      
      Fixes: 8d94aa38 ("rxrpc: Calls shouldn't hold socket refs")
      Signed-off-by: David Howells <dhowells@redhat.com>
      5273a191
  16. 07 Oct, 2019 1 commit
    • D
      rxrpc: Fix call crypto state cleanup · 91fcfbe8
      David Howells authored
      Fix the cleanup of the crypto state on a call after the call has been
      disconnected.  As the call has been disconnected, its connection ref has
      been discarded and so we can't go through that to get to the security ops
      table.
      
      Fix this by caching the security ops pointer in the rxrpc_call struct and
      using that when freeing the call security state.  Also use this in other
      places we're dealing with call-specific security.
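
      The idea of caching the ops pointer can be sketched as below.  This
      is an assumption-laden model: the sec_* names are invented for the
      example, and the crypto state is reduced to an integer.

```c
#include <stddef.h>

struct sec_call;

/* Hypothetical security ops table with a single operation. */
struct sec_ops {
	void (*free_call_crypto)(struct sec_call *call);
};

struct sec_conn {
	const struct sec_ops *ops;
};

struct sec_call {
	struct sec_conn *conn;     /* gone once the call is disconnected */
	const struct sec_ops *ops; /* cached copy, valid until the call dies */
	int crypto_state;          /* non-zero while crypto state is held */
};

static void sec_connect(struct sec_call *call, struct sec_conn *conn)
{
	call->conn = conn;
	call->ops = conn->ops; /* cache while the connection is reachable */
	call->crypto_state = 1;
}

static void sec_disconnect(struct sec_call *call)
{
	call->conn = NULL; /* connection ref discarded */
}

/* Cleanup goes through the cached pointer, never through call->conn. */
static void sec_cleanup(struct sec_call *call)
{
	if (call->ops)
		call->ops->free_call_crypto(call);
}

/* Example ops implementation used to exercise the model. */
static void sec_demo_free(struct sec_call *call)
{
	call->crypto_state = 0;
}

static const struct sec_ops sec_demo_ops = {
	.free_call_crypto = sec_demo_free,
};
```

      With the pointer cached on the call, cleanup after disconnection no
      longer depends on a connection that may already have been freed.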
      
      The symptoms look like:
      
          BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60
          net/rxrpc/call_object.c:481
          Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764
      
      Fixes: 1db88c53 ("rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto")
      Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      91fcfbe8