1. 03 8月, 2022 1 次提交
  2. 22 1月, 2022 1 次提交
  3. 16 10月, 2020 2 次提交
    • D
      afs: Add tracing for cell refcount and active user count · dca54a7b
      David Howells 提交于
      Add a tracepoint to log the cell refcount and active user count and pass in
      a reason code through various functions that manipulate these counters.
      
      Additionally, a helper function, afs_see_cell(), is provided to log
      interesting places that deal with a cell without actually doing any
      accounting directly.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      dca54a7b
    • D
      afs: Fix cell refcounting by splitting the usage counter · 88c853c3
      David Howells 提交于
      Management of the lifetime of afs_cell struct has some problems due to the
      usage counter being used to determine whether objects of that type are in
      use in addition to whether anyone might be interested in the structure.
      
      This is made trickier by cell objects being cached for a period of time in
      case they're quickly reused as they hold the result of a setup process that
      may be slow (DNS lookups, AFS RPC ops).
      
      Problems include the cached root volume from alias resolution pinning its
      parent cell record, rmmod occasionally hanging and occasionally producing
      assertion failures.
      
      Fix this by splitting the count of active users from the struct reference
      count.  Things then work as follows:
      
       (1) The cell cache keeps +1 on the cell's activity count and this has to
           be dropped before the cell can be removed.  afs_manage_cell() tries to
           exchange the 1 to a 0 with the cells_lock write-locked, and if
           successful, the record is removed from the net->cells.
      
       (2) One struct ref is 'owned' by the activity count.  That is put when the
           active count is reduced to 0 (final_destruction label).
      
       (3) A ref can be held on a cell whilst it is queued for management on a
           work queue without confusing the active count.  afs_queue_cell() is
           added to wrap this.
      
       (4) The queue's ref is dropped at the end of the management.  This is
           split out into a separate function, afs_manage_cell_work().
      
       (5) The root volume record is put after a cell is removed (at the
           final_destruction label) rather then in the RCU destruction routine.
      
       (6) Volumes hold struct refs, but aren't active users.
      
       (7) Both counts are displayed in /proc/net/afs/cells.
      
      There are some management function changes:
      
       (*) afs_put_cell() now just decrements the refcount and triggers the RCU
           destruction if it becomes 0.  It no longer sets a timer to have the
           manager do this.
      
       (*) afs_use_cell() and afs_unuse_cell() are added to increase and decrease
           the active count.  afs_unuse_cell() sets the management timer.
      
       (*) afs_queue_cell() is added to queue a cell with approprate refs.
      
      There are also some other fixes:
      
       (*) Don't let /proc/net/afs/cells access a cell's vllist if it's NULL.
      
       (*) Make sure that candidate cells in lookups are properly destroyed
           rather than being simply kfree'd.  This ensures the bits it points to
           are destroyed also.
      
       (*) afs_dec_cells_outstanding() is now called in cell destruction rather
           than at "final_destruction".  This ensures that cell->net is still
           valid to the end of the destructor.
      
       (*) As a consequence of the previous two changes, move the increment of
           net->cells_outstanding that was at the point of insertion into the
           tree to the allocation routine to correctly balance things.
      
      Fixes: 989782dc ("afs: Overhaul cell database management")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      88c853c3
  4. 21 8月, 2020 2 次提交
  5. 09 6月, 2020 1 次提交
  6. 04 6月, 2020 3 次提交
    • D
      afs: Show more a bit more server state in /proc/net/afs/servers · 32275d3f
      David Howells 提交于
      Display more information about the state of a server record, including the
      flags, rtt and break counter plus the probe state for each server in
      /proc/net/afs/servers.
      
      Rearrange the server flags a bit to make them easier to read at a glance in
      the proc file.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      32275d3f
    • D
      afs: Reorganise volume and server trees to be rooted on the cell · 20325960
      David Howells 提交于
      Reorganise afs_volume objects such that they're in a tree keyed on volume
      ID, rooted at on an afs_cell object rather than being in multiple trees,
      each of which is rooted on an afs_server object.
      
      afs_server structs become per-cell and acquire a pointer to the cell.
      
      The process of breaking a callback then starts with finding the server by
      its network address, following that to the cell and then looking up each
      volume ID in the volume tree.
      
      This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web
      and allows those structs and the code for maintaining them to be simplified
      or removed.
      
      It does make a couple of things a bit more tricky, though:
      
       (1) Operations now start with a volume, not a server, so there can be more
           than one answer as to whether or not the server we'll end up using
           supports the FS.InlineBulkStatus RPC.
      
       (2) CB RPC operations that specify the server UUID.  There's still a tree
           of servers by UUID on the afs_net struct, but the UUIDs in it aren't
           guaranteed unique.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      20325960
    • D
      afs: Detect cell aliases 1 - Cells with root volumes · 8a070a96
      David Howells 提交于
      Put in the first phase of cell alias detection.  This part handles alias
      detection for cells that have root.cell volumes (which is expected to be
      likely).
      
      When a cell becomes newly active, it is probed for its root.cell volume,
      and if it has one, this volume is compared against other root.cell volumes
      to find out if the list of fileserver UUIDs have any in common - and if
      that's the case, do the address lists of those fileservers have any
      addresses in common.  If they do, the new cell is adjudged to be an alias
      of the old cell and the old cell is used instead.
      
      Comparing is aided by the server list in struct afs_server_list being
      sorted in UUID order and the addresses in the fileserver address lists
      being sorted in address order.
      
      The cell then retains the afs_volume object for the root.cell volume, even
      if it's not mounted for future alias checking.
      
      This necessary because:
      
       (1) Whilst fileservers have UUIDs that are meant to be globally unique, in
           practice they are not because cells get cloned without changing the
           UUIDs - so afs_server records need to be per cell.
      
       (2) Sometimes the DNS is used to make cell aliases - but if we don't know
           they're the same, we may end up with multiple superblocks and multiple
           afs_server records for the same thing, impairing our ability to
           deliver callback notifications of third party changes
      
       (3) The fileserver RPC API doesn't contain the cell name, so it can't tell
           us which cell it's notifying and can't see that a change made to to
           one cell should notify the same client that's also accessed as the
           other cell.
      Reported-by: NJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      8a070a96
  7. 31 5月, 2020 2 次提交
    • D
      afs: Show more information in /proc/net/afs/servers · 6d043a57
      David Howells 提交于
      Show more information in /proc/net/afs/servers to make it easier to see
      what's going on with the server probing.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      6d043a57
    • D
      afs: Split the usage count on struct afs_server · 977e5f8e
      David Howells 提交于
      Split the usage count on the afs_server struct to have an active count that
      registers who's actually using it separately from the reference count on
      the object.
      
      This allows a future patch to dispatch polling probes without advancing the
      "unuse" time into the future each time we emit a probe, which would
      otherwise prevent unused server records from expiring.
      
      Included in this:
      
       (1) The latter part of afs_destroy_server() in which the RCU destruction
           of afs_server objects is invoked and the outstanding server count is
           decremented is split out into __afs_put_server().
      
       (2) afs_put_server() now calls __afs_put_server() rather then setting the
           management timer.
      
       (3) The calls begun by afs_fs_give_up_all_callbacks() and
           afs_fs_get_capabilities() can now take a ref on the server record, so
           afs_destroy_server() can just drop its ref and needn't wait for the
           completion of these calls.  They'll put the ref when they're done.
      
       (4) Because of (3), afs_fs_probe_done() no longer needs to wake up
           afs_destroy_server() with server->probe_outstanding.
      
       (5) afs_gc_servers can be simplified.  It only needs to check if
           server->active is 0 rather than playing games with the refcount.
      
       (6) afs_manage_servers() can propose a server for gc if usage == 0 rather
           than if ref == 1.  The gc is effected by (5).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      977e5f8e
  8. 12 12月, 2019 1 次提交
  9. 31 5月, 2019 1 次提交
  10. 16 5月, 2019 1 次提交
  11. 24 10月, 2018 4 次提交
    • D
      afs: Probe multiple fileservers simultaneously · 3bf0fb6f
      David Howells 提交于
      Send probes to all the unprobed fileservers in a fileserver list on all
      addresses simultaneously in an attempt to find out the fastest route whilst
      not getting stuck for 20s on any server or address that we don't get a
      reply from.
      
      This alleviates the problem whereby attempting to access a new server can
      take a long time because the rotation algorithm ends up rotating through
      all servers and addresses until it finds one that responds.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      3bf0fb6f
    • D
      afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS · 3b6492df
      David Howells 提交于
      Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
      number equivalent) to 96 bits to allow the support of YFS.
      
      This requires the iget comparator to check the vnode->fid rather than i_ino
      and i_generation as i_ino is not sufficiently capacious.  It also requires
      this data to be placed into the vnode cache key for fscache.
      
      For the moment, just discard the top 32 bits of the vnode ID when returning
      it though stat.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      3b6492df
    • D
      afs: Fix TTL on VL server and address lists · ded2f4c5
      David Howells 提交于
      Currently the TTL on VL server and address lists isn't set in all
      circumstances and may be set to poor choices in others, since the TTL is
      derived from the SRV/AFSDB DNS record if and when available.
      
      Fix the TTL by limiting the range to a minimum and maximum from the current
      time.  At some point these can be made into sysctl knobs.  Further, use the
      TTL we obtained from the upcall to set the expiry on negative results too;
      in future a mechanism can be added to force reloading of such data.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      ded2f4c5
    • D
      afs: Implement VL server rotation · 0a5143f2
      David Howells 提交于
      Track VL servers as independent entities rather than lumping all their
      addresses together into one set and implement server-level rotation by:
      
       (1) Add the concept of a VL server list, where each server has its own
           separate address list.  This code is similar to the FS server list.
      
       (2) Use the DNS resolver to retrieve a set of servers and their associated
           addresses, ports, preference and weight ratings.
      
       (3) In the case of a legacy DNS resolver or an address list given directly
           through /proc/net/afs/cells, create a list containing just a dummy
           server record and attach all the addresses to that.
      
       (4) Implement a simple rotation policy, for the moment ignoring the
           priorities and weights assigned to the servers.
      
       (5) Show the address list through /proc/net/afs/<cell>/vlservers.  This
           also displays the source and status of the data as indicated by the
           upcall.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      0a5143f2
  12. 12 10月, 2018 1 次提交
    • D
      afs: Fix cell proc list · 6b3944e4
      David Howells 提交于
      Access to the list of cells by /proc/net/afs/cells has a couple of
      problems:
      
       (1) It should be checking against SEQ_START_TOKEN for the keying the
           header line.
      
       (2) It's only holding the RCU read lock, so it can't just walk over the
           list without following the proper RCU methods.
      
      Fix these by using an hlist instead of an ordinary list and using the
      appropriate accessor functions to follow it with RCU.
      
      Since the code that adds a cell to the list must also necessarily change,
      sort the list on insertion whilst we're at it.
      
      Fixes: 989782dc ("afs: Overhaul cell database management")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b3944e4
  13. 08 9月, 2018 1 次提交
    • D
      afs: Fix cell specification to permit an empty address list · ecfe951f
      David Howells 提交于
      Fix the cell specification mechanism to allow cells to be pre-created
      without having to specify at least one address (the addresses will be
      upcalled for).
      
      This allows the cell information preload service to avoid the need to issue
      loads of DNS lookups during boot to get the addresses for each cell (500+
      lookups for the 'standard' cell list[*]).  The lookups can be done later as
      each cell is accessed through the filesystem.
      
      Also remove the print statement that prints a line every time a new cell is
      added.
      
      [*] There are 144 cells in the list.  Each cell is first looked up for an
          SRV record, and if that fails, for an AFSDB record.  These get a list
          of server names, each of which then has to be looked up to get the
          addresses for that server.  E.g.:
      
      	dig srv _afs3-vlserver._udp.grand.central.org
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ecfe951f
  14. 15 6月, 2018 1 次提交
  15. 23 5月, 2018 2 次提交
  16. 18 5月, 2018 4 次提交
  17. 16 5月, 2018 1 次提交
  18. 10 4月, 2018 8 次提交
    • D
      afs: Add stats for data transfer operations · 76a5cb6f
      David Howells 提交于
      Add statistics to /proc/fs/afs/stats for data transfer RPC operations.  New
      lines are added that look like:
      
      	file-rd : n=55794 nb=10252282150
      	file-wr : n=9789 nb=3247763645
      
      where n= indicates the number of ops completed and nb= indicates the number
      of bytes successfully transferred.  file-rd is the counts for read/fetch
      operations and file-wr the counts for write/store operations.
      
      Note that directory and symlink downloading are included in the file-rd
      stats at the moment.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      76a5cb6f
    • D
      afs: Locally edit directory data for mkdir/create/unlink/... · 63a4681f
      David Howells 提交于
      Locally edit the contents of an AFS directory upon a successful inode
      operation that modifies that directory (such as mkdir, create and unlink)
      so that we can avoid the current practice of re-downloading the directory
      after each change.
      
      This is viable provided that the directory version number we get back from
      the modifying RPC op is exactly incremented by 1 from what we had
      previously.  The data in the directory contents is in a defined format that
      we have to parse locally to perform lookups and readdir, so modifying isn't
      a problem.
      
      If the edit fails, we just clear the VALID flag on the directory and it
      will be reloaded next time it is needed.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      63a4681f
    • D
      afs: Fix directory handling · f3ddee8d
      David Howells 提交于
      AFS directories are structured blobs that are downloaded just like files
      and then parsed by the lookup and readdir code and, as such, are currently
      handled in the pagecache like any other file, with the entire directory
      content being thrown away each time the directory changes.
      
      However, since the blob is a known structure and since the data version
      counter on a directory increases by exactly one for each change committed
      to that directory, we can actually edit the directory locally rather than
      fetching it from the server after each locally-induced change.
      
      What we can't do, though, is mix data from the server and data from the
      client since the server is technically at liberty to rearrange or compress
      a directory if it sees fit, provided it updates the data version number
      when it does so and breaks the callback (ie. sends a notification).
      
      Further, lookup with lookup-ahead, readdir and, when it arrives, local
      editing are likely want to scan the whole of a directory.
      
      So directory handling needs to be improved to maintain the coherency of the
      directory blob prior to permitting local directory editing.
      
      To this end:
      
       (1) If any directory page gets discarded, invalidate and reread the entire
           directory.
      
       (2) If readpage notes that if when it fetches a single page that the
           version number has changed, the entire directory is flagged for
           invalidation.
      
       (3) Read as much of the directory in one go as we can.
      
      Note that this removes local caching of directories in fscache for the
      moment as we can't pass the pages to fscache_read_or_alloc_pages() since
      page->lru is in use by the LRU.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f3ddee8d
    • D
      afs: Introduce a statistics proc file · d55b4da4
      David Howells 提交于
      Introduce a proc file that displays a bunch of statistics for the AFS
      filesystem in the current network namespace.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      d55b4da4
    • D
      afs: Implement @cell substitution handling · 37ab6368
      David Howells 提交于
      Implement @cell substitution handling such that if @cell is seen as a name
      in a dynamic root mount, then the name of the root cell for that network
      namespace will be substituted for @cell during lookup.
      
      The substitution of @cell for the current net namespace is set by writing
      the cell name to /proc/fs/afs/rootcell.  The value can be obtained by
      reading the file.
      
      For example:
      
      	# mount -t afs none /kafs -o dyn
      	# echo grand.central.org >/proc/fs/afs/rootcell
      	# ls /kafs/@cell
      	archive/  cvs/  doc/  local/  project/  service/  software/  user/  www/
      	# cat /proc/fs/afs/rootcell
      	grand.central.org
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      37ab6368
    • D
      afs: Implement @sys substitution handling · 6f8880d8
      David Howells 提交于
      Implement the AFS feature by which @sys at the end of a pathname component
      may be substituted for one of a list of values, typically naming the
      operating system.  Up to 16 alternatives may be specified and these are
      tried in turn until one works.  Each network namespace has[*] a separate
      independent list.
      
      Upon creation of a new network namespace, the list of values is
      initialised[*] to a single OpenAFS-compatible string representing arch type
      plus "_linux26".  For example, on x86_64, the sysname is "amd64_linux26".
      
      [*] Or will, once network namespace support is finalised in kAFS.
      
      The list may be set by:
      
      	# for i in foo bar linux-x86_64; do echo $i; done >/proc/fs/afs/sysname
      
      for which separate writes to the same fd are amalgamated and applied on
      close.  The LF character may be used as a separator to specify multiple
      items in the same write() call.
      
      The list may be cleared by:
      
      	# echo >/proc/fs/afs/sysname
      
      and read by:
      
      	# cat /proc/fs/afs/sysname
      	foo
      	bar
      	linux-x86_64
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      6f8880d8
    • D
      afs: Don't over-increment the cell usage count when pinning it · 17814aef
      David Howells 提交于
      AFS cells that are added or set as the workstation cell through /proc are
      pinned against removal by setting the AFS_CELL_FL_NO_GC flag on them and
      taking a ref.  The ref should be only taken if the flag wasn't already set.
      
      Fix this by making it conditional.
      
      Without this an assertion failure will occur during module removal
      indicating that the refcount is too elevated.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      17814aef
    • D
      afs: Fix checker warnings · fe342cf7
      David Howells 提交于
      Fix warnings raised by checker, including:
      
       (*) Warnings raised by unequal comparison for the purposes of sorting,
           where the endianness doesn't matter:
      
      fs/afs/addr_list.c:246:21: warning: restricted __be16 degrades to integer
      fs/afs/addr_list.c:246:30: warning: restricted __be16 degrades to integer
      fs/afs/addr_list.c:248:21: warning: restricted __be32 degrades to integer
      fs/afs/addr_list.c:248:49: warning: restricted __be32 degrades to integer
      fs/afs/addr_list.c:283:21: warning: restricted __be16 degrades to integer
      fs/afs/addr_list.c:283:30: warning: restricted __be16 degrades to integer
      
       (*) afs_set_cb_interest() is not actually used and can be removed.
      
       (*) afs_cell_gc_delay() should be provided with a sysctl.
      
       (*) afs_cell_destroy() needs to use rcu_access_pointer() to read
           cell->vl_addrs.
      
       (*) afs_init_fs_cursor() should be static.
      
       (*) struct afs_vnode::permit_cache needs to be marked __rcu.
      
       (*) afs_server_rcu() needs to use rcu_access_pointer().
      
       (*) afs_destroy_server() should use rcu_access_pointer() on
           server->addresses as the server object is no longer accessible.
      
       (*) afs_find_server() casts __be16/__be32 values to int in order to
           directly compare them for the purpose of finding a match in a list,
           but is should also annotate the cast with __force to avoid checker
           warnings.
      
       (*) afs_check_permit() accesses vnode->permit_cache outside of the RCU
           readlock, though it doesn't then access the value; the extraneous
           access is deleted.
      
      False positives:
      
       (*) Conditional locking around the code in xdr_decode_AFSFetchStatus.  This
           can be dealt with in a separate patch.
      
      fs/afs/fsclient.c:148:9: warning: context imbalance in 'xdr_decode_AFSFetchStatus' - different lock contexts for basic block
      
       (*) Incorrect handling of seq-retry lock context balance:
      
      fs/afs/inode.c:455:38: warning: context imbalance in 'afs_getattr' - different
      lock contexts for basic block
      fs/afs/server.c:52:17: warning: context imbalance in 'afs_find_server' - different lock contexts for basic block
      fs/afs/server.c:128:17: warning: context imbalance in 'afs_find_server_by_uuid' - different lock contexts for basic block
      
      Errors:
      
       (*) afs_lookup_cell_rcu() needs to break out of the seq-retry loop, not go
           round again if it successfully found the workstation cell.
      
       (*) Fix UUID decode in afs_deliver_cb_probe_uuid().
      
       (*) afs_cache_permit() has a missing rcu_read_unlock() before one of the
           jumps to the someone_else_changed_it label.  Move the unlock to after
           the label.
      
       (*) afs_vl_get_addrs_u() is using ntohl() rather than htonl() when
           encoding to XDR.
      
       (*) afs_deliver_yfsvl_get_endpoints() is using htonl() rather than ntohl()
           when decoding from XDR.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      fe342cf7
  19. 13 11月, 2017 3 次提交
    • D
      afs: Overhaul volume and server record caching and fileserver rotation · d2ddc776
      David Howells 提交于
      The current code assumes that volumes and servers are per-cell and are
      never shared, but this is not enforced, and, indeed, public cells do exist
      that are aliases of each other.  Further, an organisation can, say, set up
      a public cell and a private cell with overlapping, but not identical, sets
      of servers.  The difference is purely in the database attached to the VL
      servers.
      
      The current code will malfunction if it sees a server in two cells as it
      assumes global address -> server record mappings and that each server is in
      just one cell.
      
      Further, each server may have multiple addresses - and may have addresses
      of different families (IPv4 and IPv6, say).
      
      To this end, the following structural changes are made:
      
       (1) Server record management is overhauled:
      
           (a) Server records are made independent of cell.  The namespace keeps
           	 track of them, volume records have lists of them and each vnode
           	 has a server on which its callback interest currently resides.
      
           (b) The cell record no longer keeps a list of servers known to be in
           	 that cell.
      
           (c) The server records are now kept in a flat list because there's no
           	 single address to sort on.
      
           (d) Server records are now keyed by their UUID within the namespace.
      
           (e) The addresses for a server are obtained with the VL.GetAddrsU
           	 rather than with VL.GetEntryByName, using the server's UUID as a
           	 parameter.
      
           (f) Cached server records are garbage collected after a period of
           	 non-use and are counted out of existence before purging is allowed
           	 to complete.  This protects the work functions against rmmod.
      
           (g) The servers list is now in /proc/fs/afs/servers.
      
       (2) Volume record management is overhauled:
      
           (a) An RCU-replaceable server list is introduced.  This tracks both
           	 servers and their coresponding callback interests.
      
           (b) The superblock is now keyed on cell record and numeric volume ID.
      
           (c) The volume record is now tied to the superblock which mounts it,
           	 and is activated when mounted and deactivated when unmounted.
           	 This makes it easier to handle the cache cookie without causing a
           	 double-use in fscache.
      
           (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
           	 to get the server UUID list.
      
           (e) The volume name is updated if it is seen to have changed when the
           	 volume is updated (the update is keyed on the volume ID).
      
       (3) The vlocation record is got rid of and VLDB records are no longer
           cached.  Sufficient information is stored in the volume record, though
           an update to a volume record is now no longer shared between related
           volumes (volumes come in bundles of three: R/W, R/O and backup).
      
      and the following procedural changes are made:
      
       (1) The fileserver cursor introduced previously is now fleshed out and
           used to iterate over fileservers and their addresses.
      
       (2) Volume status is checked during iteration, and the server list is
           replaced if a change is detected.
      
       (3) Server status is checked during iteration, and the address list is
           replaced if a change is detected.
      
       (4) The abort code is saved into the address list cursor and -ECONNABORTED
           returned in afs_make_call() if a remote abort happened rather than
           translating the abort into an error message.  This allows actions to
           be taken depending on the abort code more easily.
      
           (a) If a VMOVED abort is seen then this is handled by rechecking the
           	 volume and restarting the iteration.
      
           (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
               handled by sleeping for a short period and retrying and/or trying
               other servers that might serve that volume.  A message is also
               displayed once until the condition has cleared.
      
           (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
           	 moment.
      
           (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
           	 see if it has been deleted; if not, the fileserver is probably
           	 indicating that the volume couldn't be attached and needs
           	 salvaging.
      
           (e) If statfs() sees one of these aborts, it does not sleep, but
           	 rather returns an error, so as not to block the umount program.
      
       (5) The fileserver iteration functions in vnode.c are now merged into
           their callers and more heavily macroised around the cursor.  vnode.c
           is removed.
      
       (6) Operations on a particular vnode are serialised on that vnode because
           the server will lock that vnode whilst it operates on it, so a second
           op sent will just have to wait.
      
       (7) Fileservers are probed with FS.GetCapabilities before being used.
           This is where service upgrade will be done.
      
       (8) A callback interest on a fileserver is set up before an FS operation
           is performed and passed through to afs_make_call() so that it can be
           set on the vnode if the operation returns a callback.  The callback
           interest is passed through to afs_iget() also so that it can be set
           there too.
      
      In general, record updating is done on an as-needed basis when we try to
      access servers, volumes or vnodes rather than offloading it to work items
      and special threads.
      
      Notes:
      
       (1) Pre AFS-3.4 servers are no longer supported, though this can be added
           back if necessary (AFS-3.4 was released in 1998).
      
       (2) VBUSY is retried forever for the moment at intervals of 1s.
      
       (3) /proc/fs/afs/<cell>/servers no longer exists.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      d2ddc776
    • D
      afs: Add an address list concept · 8b2a464c
      David Howells 提交于
      Add an RCU replaceable address list structure to hold a list of server
      addresses.  The list also holds the
      
      To this end:
      
       (1) A cell's VL server address list can be loaded directly via insmod or
           echo to /proc/fs/afs/cells or dynamically from a DNS query for AFSDB
           or SRV records.
      
       (2) Anyone wanting to use a cell's VL server address must wait until the
           cell record comes online and has tried to obtain some addresses.
      
       (3) An FS server's address list, for the moment, has a single entry that
           is the key to the server list.  This will change in the future when a
           server is instead keyed on its UUID and the VL.GetAddrsU operation is
           used.
      
       (4) An 'address cursor' concept is introduced to handle iteration through
           the address list.  This is passed to the afs_make_call() as, in the
           future, stuff (such as abort code) that doesn't outlast the call will
           be returned in it.
      
      In the future, we might want to annotate the list with information about
      how each address fares.  We might then want to propagate such annotations
      over address list replacement.
      
      Whilst we're at it, we allow IPv6 addresses to be specified in
      colon-delimited lists by enclosing them in square brackets.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      8b2a464c
    • D
      afs: Overhaul cell database management · 989782dc
      David Howells 提交于
      Overhaul the way that the in-kernel AFS client keeps track of cells in the
      following manner:
      
       (1) Cells are now held in an rbtree to make walking them quicker and RCU
           managed (though this is probably overkill).
      
       (2) Cells now have a manager work item that:
      
           (A) Looks after fetching and refreshing the VL server list.
      
           (B) Manages cell record lifetime, including initialising and
           	 destruction.
      
           (B) Manages cell record caching whereby threads are kept around for a
           	 certain time after last use and then destroyed.
      
           (C) Manages the FS-Cache index cookie for a cell.  It is not permitted
           	 for a cookie to be in use twice, so we have to be careful to not
           	 allow a new cell record to exist at the same time as an old record
           	 of the same name.
      
       (3) Each AFS network namespace is given a manager work item that manages
           the cells within it, maintaining a single timer to prod cells into
           updating their DNS records.
      
           This uses the reduce_timer() facility to make the timer expire at the
           soonest timed event that needs happening.
      
       (4) When a module is being unloaded, cells and cell managers are now
           counted out using dec_after_work() to make sure the module text is
           pinned until after the data structures have been cleaned up.
      
       (5) Each cell's VL server list is now protected by a seqlock rather than a
           semaphore.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      989782dc