1. 28 10月, 2020 1 次提交
  2. 04 6月, 2020 1 次提交
    • D
      afs: Build an abstraction around an "operation" concept · e49c7b2f
      David Howells 提交于
      Turn the afs_operation struct into the main way that most fileserver
      operations are managed.  Various things are added to the struct, including
      the following:
      
       (1) All the parameters and results of the relevant operations are moved
           into it, removing corresponding fields from the afs_call struct.
           afs_call gets a pointer to the op.
      
       (2) The target volume is made the main focus of the operation, rather than
           the target vnode(s), and a bunch of op->vnode->volume are made
           op->volume instead.
      
       (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
           in most operations.  The vnode record (struct afs_vnode_param)
           contains:
      
      	- The vnode pointer.
      
      	- The fid of the vnode to be included in the parameters or that was
                returned in the reply (eg. FS.MakeDir).
      
      	- The status and callback information that may be returned in the
           	  reply about the vnode.
      
      	- Callback break and data version tracking for detecting
                simultaneous third-parth changes.
      
       (4) Pointers to dentries to be updated with new inodes.
      
       (5) An operations table pointer.  The table includes pointers to functions
           for issuing AFS and YFS-variant RPCs, handling the success and abort
           of an operation and handling post-I/O-lock local editing of a
           directory.
      
      To make this work, the following function restructuring is made:
      
       (A) The rotation loop that issues calls to fileservers that can be found
           in each function that wants to issue an RPC (such as afs_mkdir()) is
           extracted out into common code, in a new file called fs_operation.c.
      
       (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
           a much smaller piece of code that allocates an operation, sets the
           parameters and then calls out to the common code to do the actual
           work.
      
       (C) The code for handling the success and failure of an operation are
           moved into operation functions (as (5) above) and these are called
           from the core code at appropriate times.
      
       (D) The pseudo inode getting stuff used by the dynamic root code is moved
           over into dynroot.c.
      
       (E) struct afs_iget_data is absorbed into the operation struct and
           afs_iget() expects to be given an op pointer and a vnode record.
      
       (F) Point (E) doesn't work for the root dir of a volume, but we know the
           FID in advance (it's always vnode 1, unique 1), so a separate inode
           getter, afs_root_iget(), is provided to special-case that.
      
       (G) The inode status init/update functions now also take an op and a vnode
           record.
      
       (H) The RPC marshalling functions now, for the most part, just take an
           afs_operation struct as their only argument.  All the data they need
           is held there.  The result delivery functions write their answers
           there as well.
      
       (I) The call is attached to the operation and then the operation core does
           the waiting.
      
      And then the new operation code is, for the moment, made to just initialise
      the operation, get the appropriate vnode I/O locks and do the same rotation
      loop as before.
      
      This lays the foundation for the following changes in the future:
      
       (*) Overhauling the rotation (again).
      
       (*) Support for asynchronous I/O, where the fileserver rotation must be
           done asynchronously also.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      e49c7b2f
  3. 31 5月, 2020 1 次提交
  4. 21 11月, 2019 1 次提交
  5. 24 5月, 2019 1 次提交
  6. 16 5月, 2019 4 次提交
    • D
      afs: Fix application of status and callback to be under same lock · a58823ac
      David Howells 提交于
      When applying the status and callback in the response of an operation,
      apply them in the same critical section so that there's no race between
      checking the callback state and checking status-dependent state (such as
      the data version).
      
      Fix this by:
      
       (1) Allocating a joint {status,callback} record (afs_status_cb) before
           calling the RPC function for each vnode for which the RPC reply
           contains a status or a status plus a callback.  A flag is set in the
           record to indicate if a callback was actually received.
      
       (2) These records are passed into the RPC functions to be filled in.  The
           afs_decode_status() and yfs_decode_status() functions are removed and
           the cb_lock is no longer taken.
      
       (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
           update the vnode.
      
       (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
           the vnode.
      
       (5) vnodes, expected data-version numbers and callback break counters
           (cb_break) no longer need to be passed to the reply delivery
           functions.
      
           Note that, for the moment, the file locking functions still need
           access to both the call and the vnode at the same time.
      
       (6) afs_vnode_commit_status() is now given the cb_break value and the
           expected data_version and the task of applying the status and the
           callback to the vnode are now done here.
      
           This is done under a single taking of vnode->cb_lock.
      
       (7) afs_pages_written_back() is now called by afs_store_data() rather than
           by the reply delivery function.
      
           afs_pages_written_back() has been moved to before the call point and
           is now given the first and last page numbers rather than a pointer to
           the call.
      
       (8) The indicator from YFS.RemoveFile2 as to whether the target file
           actually got removed (status.abort_code == VNOVNODE) rather than
           merely dropping a link is now checked in afs_unlink rather than in
           xdr_decode_YFSFetchStatus().
      
      Supplementary fixes:
      
       (*) afs_cache_permit() now gets the caller_access mask from the
           afs_status_cb object rather than picking it out of the vnode's status
           record.  afs_fetch_status() returns caller_access through its argument
           list for this purpose also.
      
       (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
           than a read lock and now sets the callback inside the same critical
           section.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a58823ac
    • D
      afs: Make some RPC operations non-interruptible · 20b8391f
      David Howells 提交于
      Make certain RPC operations non-interruptible, including:
      
       (*) Set attributes
       (*) Store data
      
           We don't want to get interrupted during a flush on close, flush on
           unlock, writeback or an inode update, leaving us in a state where we
           still need to do the writeback or update.
      
       (*) Extend lock
       (*) Release lock
      
           We don't want to get lock extension interrupted as the file locks on
           the server are time-limited.  Interruption during lock release is less
           of an issue since the lock is time-limited, but it's better to
           complete the release to avoid a several-minute wait to recover it.
      
           *Setting* the lock isn't a problem if it's interrupted since we can
            just return to the user and tell them they were interrupted - at
            which point they can elect to retry.
      
       (*) Silly unlink
      
           We want to remove silly unlink files if we can, rather than leaving
           them for the salvager to clear up.
      
      Note that whilst these calls are no longer interruptible, they do have
      timeouts on them, so if the server stops responding the call will fail with
      something like ETIME or ECONNRESET.
      
      Without this, the following:
      
      	kAFS: Unexpected error from FS.StoreData -512
      
      appears in dmesg when a pending store data gets interrupted and some
      processes may just hang.
      
      Additionally, make the code that checks/updates the server record ignore
      failure due to interruption if the main call is uninterruptible and if the
      server has an address list.  The next op will check it again since the
      expiration time on the old list has past.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: NJonathan Billings <jsbillings@jsbillings.org>
      Reported-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      20b8391f
    • D
      afs: Fix afs_xattr_get_yfs() to not try freeing an error value · 773e0c40
      David Howells 提交于
      afs_xattr_get_yfs() tries to free yacl, which may hold an error value (say
      if yfs_fs_fetch_opaque_acl() failed and returned an error).
      
      Fix this by allocating yacl up front (since it's a fixed-length struct,
      unlike afs_acl) and passing it in to the RPC function.  This also allows
      the flags to be placed in the object rather than passing them through to
      the RPC function.
      
      Fixes: ae46578b ("afs: Get YFS ACLs and information through xattrs")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      773e0c40
    • D
      afs: Fix incorrect error handling in afs_xattr_get_acl() · cc1dd5c8
      David Howells 提交于
      Fix incorrect error handling in afs_xattr_get_acl() where there appears to
      be a redundant assignment before return, but in fact the return should be a
      goto to the error handling at the end of the function.
      
      Fixes: 260f082b ("afs: Get an AFS3 ACL as an xattr")
      Addresses-Coverity: ("Unused Value")
      Reported-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Joe Perches <joe@perches.com>
      cc1dd5c8
  7. 07 5月, 2019 6 次提交
    • D
      afs: Implement YFS ACL setting · f5e45463
      David Howells 提交于
      Implement the setting of YFS ACLs in AFS through the interface of setting
      the afs.yfs.acl extended attribute on the file.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f5e45463
    • D
      afs: Get YFS ACLs and information through xattrs · ae46578b
      David Howells 提交于
      The YFS/AuriStor variant of AFS provides more capable ACLs and provides
      per-volume ACLs and per-file ACLs as well as per-directory ACLs.  It also
      provides some extra information that can be retrieved through four ACLs:
      
       (1) afs.yfs.acl
      
           The YFS file ACL (not the same format as afs.acl).
      
       (2) afs.yfs.vol_acl
      
           The YFS volume ACL.
      
       (3) afs.yfs.acl_inherited
      
           "1" if a file's ACL is inherited from its parent directory, "0"
           otherwise.
      
       (4) afs.yfs.acl_num_cleaned
      
           The number of of ACEs removed from the ACL by the server because the
           PT entries were removed from the PTS database (ie. the subject is no
           longer known).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      ae46578b
    • J
      afs: implement acl setting · b10494af
      Joe Gorse 提交于
      Implements the setting of ACLs in AFS by means of setting the
      afs.acl extended attribute on the file.
      Signed-off-by: NJoe Gorse <jhgorse@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      b10494af
    • D
      afs: Get an AFS3 ACL as an xattr · 260f082b
      David Howells 提交于
      Implement an xattr on AFS files called "afs.acl" that retrieves a file's
      ACL.  It returns the raw AFS3 ACL from the result of calling FS.FetchACL,
      leaving any interpretation to userspace.
      
      Note that whilst YFS servers will respond to FS.FetchACL, this will render
      a more-advanced YFS ACL down.  Use "afs.yfs.acl" instead for that.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      260f082b
    • D
      afs: Fix getting the afs.fid xattr · a2f611a3
      David Howells 提交于
      The AFS3 FID is three 32-bit unsigned numbers and is represented as three
      up-to-8-hex-digit numbers separated by colons to the afs.fid xattr.
      However, with the advent of support for YFS, the FID is now a 64-bit volume
      number, a 96-bit vnode/inode number and a 32-bit uniquifier (as before).
      Whilst the sprintf in afs_xattr_get_fid() has been partially updated (it
      currently ignores the upper 32 bits of the 96-bit vnode number), the size
      of the stack-based buffer has not been increased to match, thereby allowing
      stack corruption to occur.
      
      Fix this by increasing the buffer size appropriately and conditionally
      including the upper part of the vnode number if it is non-zero.  The latter
      requires the lower part to be zero-padded if the upper part is non-zero.
      
      Fixes: 3b6492df ("afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a2f611a3
    • D
      afs: Fix the afs.cell and afs.volume xattr handlers · c73aa410
      David Howells 提交于
      Fix the ->get handlers for the afs.cell and afs.volume xattrs to pass the
      source data size to memcpy() rather than target buffer size.
      
      Overcopying the source data occasionally causes the kernel to oops.
      
      Fixes: d3e3b7ea ("afs: Add metadata xattrs")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      c73aa410
  8. 24 10月, 2018 1 次提交
    • D
      afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS · 3b6492df
      David Howells 提交于
      Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
      number equivalent) to 96 bits to allow the support of YFS.
      
      This requires the iget comparator to check the vnode->fid rather than i_ino
      and i_generation as i_ino is not sufficiently capacious.  It also requires
      this data to be placed into the vnode cache key for fscache.
      
      For the moment, just discard the top 32 bits of the vnode ID when returning
      it though stat.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      3b6492df
  9. 13 11月, 2017 2 次提交
    • D
      afs: Overhaul volume and server record caching and fileserver rotation · d2ddc776
      David Howells 提交于
      The current code assumes that volumes and servers are per-cell and are
      never shared, but this is not enforced, and, indeed, public cells do exist
      that are aliases of each other.  Further, an organisation can, say, set up
      a public cell and a private cell with overlapping, but not identical, sets
      of servers.  The difference is purely in the database attached to the VL
      servers.
      
      The current code will malfunction if it sees a server in two cells as it
      assumes global address -> server record mappings and that each server is in
      just one cell.
      
      Further, each server may have multiple addresses - and may have addresses
      of different families (IPv4 and IPv6, say).
      
      To this end, the following structural changes are made:
      
       (1) Server record management is overhauled:
      
           (a) Server records are made independent of cell.  The namespace keeps
           	 track of them, volume records have lists of them and each vnode
           	 has a server on which its callback interest currently resides.
      
           (b) The cell record no longer keeps a list of servers known to be in
           	 that cell.
      
           (c) The server records are now kept in a flat list because there's no
           	 single address to sort on.
      
           (d) Server records are now keyed by their UUID within the namespace.
      
           (e) The addresses for a server are obtained with the VL.GetAddrsU
           	 rather than with VL.GetEntryByName, using the server's UUID as a
           	 parameter.
      
           (f) Cached server records are garbage collected after a period of
           	 non-use and are counted out of existence before purging is allowed
           	 to complete.  This protects the work functions against rmmod.
      
           (g) The servers list is now in /proc/fs/afs/servers.
      
       (2) Volume record management is overhauled:
      
           (a) An RCU-replaceable server list is introduced.  This tracks both
           	 servers and their coresponding callback interests.
      
           (b) The superblock is now keyed on cell record and numeric volume ID.
      
           (c) The volume record is now tied to the superblock which mounts it,
           	 and is activated when mounted and deactivated when unmounted.
           	 This makes it easier to handle the cache cookie without causing a
           	 double-use in fscache.
      
           (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
           	 to get the server UUID list.
      
           (e) The volume name is updated if it is seen to have changed when the
           	 volume is updated (the update is keyed on the volume ID).
      
       (3) The vlocation record is got rid of and VLDB records are no longer
           cached.  Sufficient information is stored in the volume record, though
           an update to a volume record is now no longer shared between related
           volumes (volumes come in bundles of three: R/W, R/O and backup).
      
      and the following procedural changes are made:
      
       (1) The fileserver cursor introduced previously is now fleshed out and
           used to iterate over fileservers and their addresses.
      
       (2) Volume status is checked during iteration, and the server list is
           replaced if a change is detected.
      
       (3) Server status is checked during iteration, and the address list is
           replaced if a change is detected.
      
       (4) The abort code is saved into the address list cursor and -ECONNABORTED
           returned in afs_make_call() if a remote abort happened rather than
           translating the abort into an error message.  This allows actions to
           be taken depending on the abort code more easily.
      
           (a) If a VMOVED abort is seen then this is handled by rechecking the
           	 volume and restarting the iteration.
      
           (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
               handled by sleeping for a short period and retrying and/or trying
               other servers that might serve that volume.  A message is also
               displayed once until the condition has cleared.
      
           (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
           	 moment.
      
           (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
           	 see if it has been deleted; if not, the fileserver is probably
           	 indicating that the volume couldn't be attached and needs
           	 salvaging.
      
           (e) If statfs() sees one of these aborts, it does not sleep, but
           	 rather returns an error, so as not to block the umount program.
      
       (5) The fileserver iteration functions in vnode.c are now merged into
           their callers and more heavily macroised around the cursor.  vnode.c
           is removed.
      
       (6) Operations on a particular vnode are serialised on that vnode because
           the server will lock that vnode whilst it operates on it, so a second
           op sent will just have to wait.
      
       (7) Fileservers are probed with FS.GetCapabilities before being used.
           This is where service upgrade will be done.
      
       (8) A callback interest on a fileserver is set up before an FS operation
           is performed and passed through to afs_make_call() so that it can be
           set on the vnode if the operation returns a callback.  The callback
           interest is passed through to afs_iget() also so that it can be set
           there too.
      
      In general, record updating is done on an as-needed basis when we try to
      access servers, volumes or vnodes rather than offloading it to work items
      and special threads.
      
      Notes:
      
       (1) Pre AFS-3.4 servers are no longer supported, though this can be added
           back if necessary (AFS-3.4 was released in 1998).
      
       (2) VBUSY is retried forever for the moment at intervals of 1s.
      
       (3) /proc/fs/afs/<cell>/servers no longer exists.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      d2ddc776
    • D
      afs: Overhaul cell database management · 989782dc
      David Howells 提交于
      Overhaul the way that the in-kernel AFS client keeps track of cells in the
      following manner:
      
       (1) Cells are now held in an rbtree to make walking them quicker and RCU
           managed (though this is probably overkill).
      
       (2) Cells now have a manager work item that:
      
           (A) Looks after fetching and refreshing the VL server list.
      
           (B) Manages cell record lifetime, including initialising and
           	 destruction.
      
           (B) Manages cell record caching whereby threads are kept around for a
           	 certain time after last use and then destroyed.
      
           (C) Manages the FS-Cache index cookie for a cell.  It is not permitted
           	 for a cookie to be in use twice, so we have to be careful to not
           	 allow a new cell record to exist at the same time as an old record
           	 of the same name.
      
       (3) Each AFS network namespace is given a manager work item that manages
           the cells within it, maintaining a single timer to prod cells into
           updating their DNS records.
      
           This uses the reduce_timer() facility to make the timer expire at the
           soonest timed event that needs happening.
      
       (4) When a module is being unloaded, cells and cell managers are now
           counted out using dec_after_work() to make sure the module text is
           pinned until after the data structures have been cleaned up.
      
       (5) Each cell's VL server list is now protected by a seqlock rather than a
           semaphore.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      989782dc
  10. 10 7月, 2017 1 次提交
    • D
      afs: Add metadata xattrs · d3e3b7ea
      David Howells 提交于
      Add xattrs to allow the user to get/set metadata in lieu of having pioctl()
      available.  The following xattrs are now available:
      
       - "afs.cell"
      
         The name of the cell in which the vnode's volume resides.
      
       - "afs.fid"
      
         The volume ID, vnode ID and vnode uniquifier of the file as three hex
         numbers separated by colons.
      
       - "afs.volume"
      
         The name of the volume in which the vnode resides.
      
      For example:
      
      	# getfattr -d -m ".*" /mnt/scratch
      	getfattr: Removing leading '/' from absolute path names
      	# file: mnt/scratch
      	afs.cell="mycell.myorg.org"
      	afs.fid="10000b:1:1"
      	afs.volume="scratch"
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d3e3b7ea