1. 03 8月, 2022 2 次提交
  2. 23 5月, 2022 1 次提交
    • D
      rxrpc, afs: Fix selection of abort codes · de696c47
      David Howells 提交于
      The RX_USER_ABORT code should really only be used to indicate that the user
      of the rxrpc service (ie. userspace) implicitly caused a call to be aborted
      - for instance if the AF_RXRPC socket is closed whilst the call was in
      progress.  (The user may also explicitly abort a call and specify the abort
      code to use).
      
      Change some of the points of generation to use other abort codes instead:
      
       (1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see
           ENOMEM and EFAULT during received data delivery and abort with
           RX_CALL_DEAD in the default case.
      
       (2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a
           reply.
      
       (3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had
           heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't.
      
       (4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not
           completed successfully or been aborted.
      Reported-by: NJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de696c47
  3. 23 4月, 2021 3 次提交
  4. 24 8月, 2020 1 次提交
  5. 04 6月, 2020 2 次提交
    • D
      afs: Reorganise volume and server trees to be rooted on the cell · 20325960
      David Howells 提交于
      Reorganise afs_volume objects such that they're in a tree keyed on volume
      ID, rooted at on an afs_cell object rather than being in multiple trees,
      each of which is rooted on an afs_server object.
      
      afs_server structs become per-cell and acquire a pointer to the cell.
      
      The process of breaking a callback then starts with finding the server by
      its network address, following that to the cell and then looking up each
      volume ID in the volume tree.
      
      This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web
      and allows those structs and the code for maintaining them to be simplified
      or removed.
      
      It does make a couple of things a bit more tricky, though:
      
       (1) Operations now start with a volume, not a server, so there can be more
           than one answer as to whether or not the server we'll end up using
           supports the FS.InlineBulkStatus RPC.
      
       (2) CB RPC operations that specify the server UUID.  There's still a tree
           of servers by UUID on the afs_net struct, but the UUIDs in it aren't
           guaranteed unique.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      20325960
    • D
      afs: Build an abstraction around an "operation" concept · e49c7b2f
      David Howells 提交于
      Turn the afs_operation struct into the main way that most fileserver
      operations are managed.  Various things are added to the struct, including
      the following:
      
       (1) All the parameters and results of the relevant operations are moved
           into it, removing corresponding fields from the afs_call struct.
           afs_call gets a pointer to the op.
      
       (2) The target volume is made the main focus of the operation, rather than
           the target vnode(s), and a bunch of op->vnode->volume are made
           op->volume instead.
      
       (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
           in most operations.  The vnode record (struct afs_vnode_param)
           contains:
      
      	- The vnode pointer.
      
      	- The fid of the vnode to be included in the parameters or that was
                returned in the reply (eg. FS.MakeDir).
      
      	- The status and callback information that may be returned in the
           	  reply about the vnode.
      
      	- Callback break and data version tracking for detecting
                simultaneous third-parth changes.
      
       (4) Pointers to dentries to be updated with new inodes.
      
       (5) An operations table pointer.  The table includes pointers to functions
           for issuing AFS and YFS-variant RPCs, handling the success and abort
           of an operation and handling post-I/O-lock local editing of a
           directory.
      
      To make this work, the following function restructuring is made:
      
       (A) The rotation loop that issues calls to fileservers that can be found
           in each function that wants to issue an RPC (such as afs_mkdir()) is
           extracted out into common code, in a new file called fs_operation.c.
      
       (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
           a much smaller piece of code that allocates an operation, sets the
           parameters and then calls out to the common code to do the actual
           work.
      
       (C) The code for handling the success and failure of an operation are
           moved into operation functions (as (5) above) and these are called
           from the core code at appropriate times.
      
       (D) The pseudo inode getting stuff used by the dynamic root code is moved
           over into dynroot.c.
      
       (E) struct afs_iget_data is absorbed into the operation struct and
           afs_iget() expects to be given an op pointer and a vnode record.
      
       (F) Point (E) doesn't work for the root dir of a volume, but we know the
           FID in advance (it's always vnode 1, unique 1), so a separate inode
           getter, afs_root_iget(), is provided to special-case that.
      
       (G) The inode status init/update functions now also take an op and a vnode
           record.
      
       (H) The RPC marshalling functions now, for the most part, just take an
           afs_operation struct as their only argument.  All the data they need
           is held there.  The result delivery functions write their answers
           there as well.
      
       (I) The call is attached to the operation and then the operation core does
           the waiting.
      
      And then the new operation code is, for the moment, made to just initialise
      the operation, get the appropriate vnode I/O locks and do the same rotation
      loop as before.
      
      This lays the foundation for the following changes in the future:
      
       (*) Overhauling the rotation (again).
      
       (*) Support for asynchronous I/O, where the fileserver rotation must be
           done asynchronously also.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      e49c7b2f
  6. 31 5月, 2020 3 次提交
    • D
      afs: Remove the error argument from afs_protocol_error() · 7126ead9
      David Howells 提交于
      Remove the error argument from afs_protocol_error() as it's always
      -EBADMSG.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      7126ead9
    • D
      afs: Set error flag rather than return error from file status decode · 38355eec
      David Howells 提交于
      Set a flag in the call struct to indicate an unmarshalling error rather
      than return and handle an error from the decoding of file statuses.  This
      flag is checked on a successful return from the delivery function.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      38355eec
    • D
      afs: Split the usage count on struct afs_server · 977e5f8e
      David Howells 提交于
      Split the usage count on the afs_server struct to have an active count that
      registers who's actually using it separately from the reference count on
      the object.
      
      This allows a future patch to dispatch polling probes without advancing the
      "unuse" time into the future each time we emit a probe, which would
      otherwise prevent unused server records from expiring.
      
      Included in this:
      
       (1) The latter part of afs_destroy_server() in which the RCU destruction
           of afs_server objects is invoked and the outstanding server count is
           decremented is split out into __afs_put_server().
      
       (2) afs_put_server() now calls __afs_put_server() rather then setting the
           management timer.
      
       (3) The calls begun by afs_fs_give_up_all_callbacks() and
           afs_fs_get_capabilities() can now take a ref on the server record, so
           afs_destroy_server() can just drop its ref and needn't wait for the
           completion of these calls.  They'll put the ref when they're done.
      
       (4) Because of (3), afs_fs_probe_done() no longer needs to wake up
           afs_destroy_server() with server->probe_outstanding.
      
       (5) afs_gc_servers can be simplified.  It only needs to check if
           server->active is 0 rather than playing games with the refcount.
      
       (6) afs_manage_servers() can propose a server for gc if usage == 0 rather
           than if ref == 1.  The gc is effected by (5).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      977e5f8e
  7. 29 5月, 2020 1 次提交
  8. 14 3月, 2020 4 次提交
    • D
      afs: Fix client call Rx-phase signal handling · 7d7587db
      David Howells 提交于
      Fix the handling of signals in client rxrpc calls made by the afs
      filesystem.  Ignore signals completely, leaving call abandonment or
      connection loss to be detected by timeouts inside AF_RXRPC.
      
      Allowing a filesystem call to be interrupted after the entire request has
      been transmitted and an abort sent means that the server may or may not
      have done the action - and we don't know.  It may even be worse than that
      for older servers.
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      7d7587db
    • D
      afs: Fix handling of an abort from a service handler · dde9f095
      David Howells 提交于
      When an AFS service handler function aborts a call, AF_RXRPC marks the call
      as complete - which means that it's not going to get any more packets from
      the receiver.  This is a problem because reception of the final ACK is what
      triggers afs_deliver_to_call() to drop the final ref on the afs_call
      object.
      
      Instead, aborted AFS service calls may then just sit around waiting for
      ever or until they're displaced by a new call on the same connection
      channel or a connection-level abort.
      
      Fix this by calling afs_set_call_complete() to finalise the afs_call struct
      representing the call.
      
      However, we then need to drop the ref that stops the call from being
      deallocated.  We can do this in afs_set_call_complete(), as the work queue
      is holding a separate ref of its own, but then we shouldn't do it in
      afs_process_async_call() and afs_delete_async_call().
      
      call->drop_ref is set to indicate that a ref needs dropping for a call and
      this is dealt with when we transition a call to AFS_CALL_COMPLETE.
      
      But then we also need to get rid of the ref that pins an asynchronous
      client call.  We can do this by the same mechanism, setting call->drop_ref
      for an async client call too.
      
      We can also get rid of call->incoming since nothing ever sets it and only
      one thing ever checks it (futilely).
      
      
      A trace of the rxrpc_call and afs_call struct ref counting looks like:
      
                <idle>-0     [001] ..s5   164.764892: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_new_incoming_call+0x473/0xb34 a=00000000442095b5
                <idle>-0     [001] .Ns5   164.766001: rxrpc_call: c=00000002 QUE u=4 sp=rxrpc_propose_ACK+0xbe/0x551 a=00000000442095b5
                <idle>-0     [001] .Ns4   164.766005: rxrpc_call: c=00000002 PUT u=3 sp=rxrpc_new_incoming_call+0xa3f/0xb34 a=00000000442095b5
                <idle>-0     [001] .Ns7   164.766433: afs_call: c=00000002 WAKE  u=2 o=11 sp=rxrpc_notify_socket+0x196/0x33c
           kworker/1:2-1810  [001] ...1   164.768409: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_process_call+0x25/0x7ae a=00000000442095b5
           kworker/1:2-1810  [001] ...1   164.769439: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000001 00000000 02 22 ACK CallAck
           kworker/1:2-1810  [001] ...1   164.769459: rxrpc_call: c=00000002 PUT u=2 sp=rxrpc_process_call+0x74f/0x7ae a=00000000442095b5
           kworker/1:2-1810  [001] ...1   164.770794: afs_call: c=00000002 QUEUE u=3 o=12 sp=afs_deliver_to_call+0x449/0x72c
           kworker/1:2-1810  [001] ...1   164.770829: afs_call: c=00000002 PUT   u=2 o=12 sp=afs_process_async_call+0xdb/0x11e
           kworker/1:2-1810  [001] ...2   164.771084: rxrpc_abort: c=00000002 95786a88:00000008 s=0 a=1 e=1 K-1
           kworker/1:2-1810  [001] ...1   164.771461: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000002 00000000 04 00 ABORT CallAbort
           kworker/1:2-1810  [001] ...1   164.771466: afs_call: c=00000002 PUT   u=1 o=12 sp=SRXAFSCB_ProbeUuid+0xc1/0x106
      
      The abort generated in SRXAFSCB_ProbeUuid(), labelled "K-1", indicates that
      the local filesystem/cache manager didn't recognise the UUID as its own.
      
      Fixes: 2067b2b3 ("afs: Fix the CB.ProbeUuid service handler to reply correctly")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      dde9f095
    • D
      afs: Fix some tracing details · 4636cf18
      David Howells 提交于
      Fix a couple of tracelines to indicate the usage count after the atomic op,
      not the usage count before it to be consistent with other afs and rxrpc
      trace lines.
      
      Change the wording of the afs_call_trace_work trace ID label from "WORK" to
      "QUEUE" to reflect the fact that it's queueing work, not doing work.
      
      Fixes: 341f741f ("afs: Refcount the afs_call struct")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      4636cf18
    • D
      rxrpc: Fix call interruptibility handling · e138aa7d
      David Howells 提交于
      Fix the interruptibility of kernel-initiated client calls so that they're
      either only interruptible when they're waiting for a call slot to come
      available or they're not interruptible at all.  Either way, they're not
      interruptible during transmission.
      
      This should help prevent StoreData calls from being interrupted when
      writeback is in progress.  It doesn't, however, handle interruption during
      the receive phase.
      
      Userspace-initiated calls are still interruptable.  After the signal has
      been handled, sendmsg() will return the amount of data copied out of the
      buffer and userspace can perform another sendmsg() call to continue
      transmission.
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      e138aa7d
  9. 21 11月, 2019 1 次提交
  10. 20 11月, 2019 1 次提交
    • D
      afs: Fix missing timeout reset · c74386d5
      David Howells 提交于
      In afs_wait_for_call_to_complete(), rather than immediately aborting an
      operation if a signal occurs, the code attempts to wait for it to
      complete, using a schedule timeout of 2*RTT (or min 2 jiffies) and a
      check that we're still receiving relevant packets from the server before
      we consider aborting the call.  We may even ping the server to check on
      the status of the call.
      
      However, there's a missing timeout reset in the event that we do
      actually get a packet to process, such that if we then get a couple of
      short stalls, we then time out when progress is actually being made.
      
      Fix this by resetting the timeout any time we get something to process.
      If it's the failure of the call then the call state will get changed and
      we'll exit the loop shortly thereafter.
      
      A symptom of this is data fetches and stores failing with EINTR when
      they really shouldn't.
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c74386d5
  11. 21 6月, 2019 1 次提交
  12. 31 5月, 2019 1 次提交
  13. 16 5月, 2019 5 次提交
    • D
      afs: Always get the reply time · 4571577f
      David Howells 提交于
      Always ask for the reply time from AF_RXRPC as it's used to calculate the
      callback expiry time and lock expiry times, so it's needed by most FS
      operations.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      4571577f
    • D
      afs: Get rid of afs_call::reply[] · ffba718e
      David Howells 提交于
      Replace the afs_call::reply[] array with a bunch of typed members so that
      the compiler can use type-checking on them.  It's also easier for the eye
      to see what's going on.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      ffba718e
    • D
      afs: Make some RPC operations non-interruptible · 20b8391f
      David Howells 提交于
      Make certain RPC operations non-interruptible, including:
      
       (*) Set attributes
       (*) Store data
      
           We don't want to get interrupted during a flush on close, flush on
           unlock, writeback or an inode update, leaving us in a state where we
           still need to do the writeback or update.
      
       (*) Extend lock
       (*) Release lock
      
           We don't want to get lock extension interrupted as the file locks on
           the server are time-limited.  Interruption during lock release is less
           of an issue since the lock is time-limited, but it's better to
           complete the release to avoid a several-minute wait to recover it.
      
           *Setting* the lock isn't a problem if it's interrupted since we can
            just return to the user and tell them they were interrupted - at
            which point they can elect to retry.
      
       (*) Silly unlink
      
           We want to remove silly unlink files if we can, rather than leaving
           them for the salvager to clear up.
      
      Note that whilst these calls are no longer interruptible, they do have
      timeouts on them, so if the server stops responding the call will fail with
      something like ETIME or ECONNRESET.
      
      Without this, the following:
      
      	kAFS: Unexpected error from FS.StoreData -512
      
      appears in dmesg when a pending store data gets interrupted and some
      processes may just hang.
      
      Additionally, make the code that checks/updates the server record ignore
      failure due to interruption if the main call is uninterruptible and if the
      server has an address list.  The next op will check it again since the
      expiration time on the old list has past.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: NJonathan Billings <jsbillings@jsbillings.org>
      Reported-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      20b8391f
    • D
      rxrpc: Allow the kernel to mark a call as being non-interruptible · b960a34b
      David Howells 提交于
      Allow kernel services using AF_RXRPC to indicate that a call should be
      non-interruptible.  This allows kafs to make things like lock-extension and
      writeback data storage calls non-interruptible.
      
      If this is set, signals will be ignored for operations on that call where
      possible - such as waiting to get a call channel on an rxrpc connection.
      
      It doesn't prevent UDP sendmsg from being interrupted, but that will be
      handled by packet retransmission.
      
      rxrpc_kernel_recv_data() isn't affected by this since that never waits,
      preferring instead to return -EAGAIN and leave the waiting to the caller.
      
      Userspace initiated calls can't be set to be uninterruptible at this time.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      b960a34b
    • D
      afs: Fix the maximum lifespan of VL and probe calls · 94f699c9
      David Howells 提交于
      If an older AFS server doesn't support an operation, it may accept the call
      and then sit on it forever, happily responding to pings that make kafs
      think that the call is still alive.
      
      Fix this by setting the maximum lifespan of Volume Location service calls
      in particular and probe calls in general so that they don't run on
      endlessly if they're not supported.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      94f699c9
  14. 25 4月, 2019 1 次提交
    • D
      afs: Split wait from afs_make_call() · 0b9bf381
      David Howells 提交于
      Split the call to afs_wait_for_call_to_complete() from afs_make_call() to
      make it easier to handle asynchronous calls and to make it easier to
      convert a synchronous call to an asynchronous one in future, for instance
      when someone tries to interrupt an operation by pressing Ctrl-C.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      0b9bf381
  15. 13 4月, 2019 3 次提交
  16. 09 4月, 2019 1 次提交
    • G
      afs: Mark expected switch fall-throughs · e690c9e3
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      Notice that in many cases I placed a /* Fall through */ comment
      at the bottom of the case, which what GCC is expecting to find.
      
      In other cases I had to tweak a bit the format of the comments.
      
      This patch suppresses ALL missing-break-in-switch false positives
      in fs/afs
      
      Addresses-Coverity-ID: 115042 ("Missing break in switch")
      Addresses-Coverity-ID: 115043 ("Missing break in switch")
      Addresses-Coverity-ID: 115045 ("Missing break in switch")
      Addresses-Coverity-ID: 1357430 ("Missing break in switch")
      Addresses-Coverity-ID: 115047 ("Missing break in switch")
      Addresses-Coverity-ID: 115050 ("Missing break in switch")
      Addresses-Coverity-ID: 115051 ("Missing break in switch")
      Addresses-Coverity-ID: 1467806 ("Missing break in switch")
      Addresses-Coverity-ID: 1467807 ("Missing break in switch")
      Addresses-Coverity-ID: 1467811 ("Missing break in switch")
      Addresses-Coverity-ID: 115041 ("Missing break in switch")
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      e690c9e3
  17. 17 1月, 2019 2 次提交
    • D
      afs: Fix race in async call refcounting · 34fa4761
      David Howells 提交于
      There's a race between afs_make_call() and afs_wake_up_async_call() in the
      case that an error is returned from rxrpc_kernel_send_data() after it has
      queued the final packet.
      
      afs_make_call() will try and clean up the mess, but the call state may have
      been moved on thereby causing afs_process_async_call() to also try and to
      delete the call.
      
      Fix this by:
      
       (1) Getting an extra ref for an asynchronous call for the call itself to
           hold.  This makes sure the call doesn't evaporate on us accidentally
           and will allow the call to be retained by the caller in a future
           patch.  The ref is released on leaving afs_make_call() or
           afs_wait_for_call_to_complete().
      
       (2) In the event of an error from rxrpc_kernel_send_data():
      
           (a) Don't set the call state to AFS_CALL_COMPLETE until *after* the
           	 call has been aborted and ended.  This prevents
           	 afs_deliver_to_call() from doing anything with any notifications
           	 it gets.
      
           (b) Explicitly end the call immediately to prevent further callbacks.
      
           (c) Cancel any queued async_work and wait for the work if it's
           	 executing.  This allows us to be sure the race won't recur when we
           	 change the state.  We put the work queue's ref on the call if we
           	 managed to cancel it.
      
           (d) Put the call's ref that we got in (1).  This belongs to us as long
           	 as the call is in state AFS_CALL_CL_REQUESTING.
      
      Fixes: 341f741f ("afs: Refcount the afs_call struct")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      34fa4761
    • D
      afs: Provide a function to get a ref on a call · 7a75b007
      David Howells 提交于
      Provide a function to get a reference on an afs_call struct.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      7a75b007
  18. 16 11月, 2018 1 次提交
    • D
      rxrpc: Fix life check · 7150ceaa
      David Howells 提交于
      The life-checking function, which is used by kAFS to make sure that a call
      is still live in the event of a pending signal, only samples the received
      packet serial number counter; it doesn't actually provoke a change in the
      counter, rather relying on the server to happen to give us a packet in the
      time window.
      
      Fix this by adding a function to force a ping to be transmitted.
      
      kAFS then keeps track of whether there's been a stall, and if so, uses the
      new function to ping the server, resetting the timeout to allow the reply
      to come back.
      
      If there's a stall, a ping and the call is *still* stalled in the same
      place after another period, then the call will be aborted.
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Fixes: f4d15fb6 ("rxrpc: Provide functions for allowing cleaner handling of signals")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7150ceaa
  19. 24 10月, 2018 6 次提交