1. 16 Jun, 2020 (1 commit)
  2. 15 Jun, 2020 (1 commit)
    • afs: Concoct ctimes · da8d0755
      Committed by David Howells
      The in-kernel afs filesystem ignores ctime because the AFS fileserver
      protocol doesn't support ctimes.  This, however, causes various xfstests to
      fail.
      
      Work around this by:
      
       (1) Setting ctime to attr->ia_ctime in afs_setattr().
      
       (2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings.
      
       (3) Setting the ctime on the target file from the server mtime when
           creating a hard link to it.
      
       (4) Setting the ctime on directories from their revised mtimes when
           renaming/moving a file.
      
      Found by the generic/221 and generic/309 xfstests.
      Signed-off-by: David Howells <dhowells@redhat.com>
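The ctime workaround in "afs: Concoct ctimes" can be pictured with a small user-space sketch. The struct and function names below are hypothetical stand-ins, not the kernel's own; the sketch only shows points (3) and (4), where a ctime is borrowed from an mtime because the server supplies no ctime.

```c
#include <time.h>

/* Hypothetical stand-in for the two inode timestamps involved. */
struct sketch_inode {
    time_t mtime;
    time_t ctime;
};

/* Point (3): on hard-link creation, take the target's ctime from the
 * mtime that the server reported for it. */
void sketch_link_update(struct sketch_inode *target, time_t server_mtime)
{
    target->ctime = server_mtime;
}

/* Point (4): after a rename/move, a directory's ctime follows its
 * revised mtime. */
void sketch_rename_update(struct sketch_inode *dir, time_t revised_mtime)
{
    dir->mtime = revised_mtime;
    dir->ctime = revised_mtime;
}
```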
  3. 10 Jun, 2020 (1 commit)
  4. 04 Jun, 2020 (2 commits)
    • afs: Reorganise volume and server trees to be rooted on the cell · 20325960
      Committed by David Howells
      Reorganise afs_volume objects such that they're in a tree keyed on volume
      ID, rooted on an afs_cell object rather than being in multiple trees,
      each of which is rooted on an afs_server object.
      
      afs_server structs become per-cell and acquire a pointer to the cell.
      
      The process of breaking a callback then starts with finding the server by
      its network address, following that to the cell and then looking up each
      volume ID in the volume tree.
      
      This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web
      and allows those structs and the code for maintaining them to be simplified
      or removed.
      
      It does make a couple of things a bit more tricky, though:
      
       (1) Operations now start with a volume, not a server, so there can be more
           than one answer as to whether or not the server we'll end up using
           supports the FS.InlineBulkStatus RPC.
      
       (2) CB RPC operations that specify the server UUID.  There's still a tree
           of servers by UUID on the afs_net struct, but the UUIDs in it aren't
           guaranteed unique.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Build an abstraction around an "operation" concept · e49c7b2f
      Committed by David Howells
      Turn the afs_operation struct into the main way that most fileserver
      operations are managed.  Various things are added to the struct, including
      the following:
      
       (1) All the parameters and results of the relevant operations are moved
           into it, removing corresponding fields from the afs_call struct.
           afs_call gets a pointer to the op.
      
       (2) The target volume is made the main focus of the operation, rather than
           the target vnode(s), and a bunch of op->vnode->volume are made
           op->volume instead.
      
       (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
           in most operations.  The vnode record (struct afs_vnode_param)
           contains:
      
              - The vnode pointer.

              - The fid of the vnode to be included in the parameters or that was
                returned in the reply (eg. FS.MakeDir).

              - The status and callback information that may be returned in the
                reply about the vnode.

              - Callback break and data version tracking for detecting
                simultaneous third-party changes.
      
       (4) Pointers to dentries to be updated with new inodes.
      
       (5) An operations table pointer.  The table includes pointers to functions
           for issuing AFS and YFS-variant RPCs, handling the success and abort
           of an operation and handling post-I/O-lock local editing of a
           directory.
      
      To make this work, the following function restructuring is made:
      
       (A) The rotation loop that issues calls to fileservers that can be found
           in each function that wants to issue an RPC (such as afs_mkdir()) is
           extracted out into common code, in a new file called fs_operation.c.
      
       (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
           a much smaller piece of code that allocates an operation, sets the
           parameters and then calls out to the common code to do the actual
           work.
      
       (C) The code for handling the success and failure of an operation is
           moved into operation functions (as (5) above) and these are called
           from the core code at appropriate times.
      
       (D) The pseudo inode getting stuff used by the dynamic root code is moved
           over into dynroot.c.
      
       (E) struct afs_iget_data is absorbed into the operation struct and
           afs_iget() expects to be given an op pointer and a vnode record.
      
       (F) Point (E) doesn't work for the root dir of a volume, but we know the
           FID in advance (it's always vnode 1, unique 1), so a separate inode
           getter, afs_root_iget(), is provided to special-case that.
      
       (G) The inode status init/update functions now also take an op and a vnode
           record.
      
       (H) The RPC marshalling functions now, for the most part, just take an
           afs_operation struct as their only argument.  All the data they need
           is held there.  The result delivery functions write their answers
           there as well.
      
       (I) The call is attached to the operation and then the operation core does
           the waiting.
      
      And then the new operation code is, for the moment, made to just initialise
      the operation, get the appropriate vnode I/O locks and do the same rotation
      loop as before.
      
      This lays the foundation for the following changes in the future:
      
       (*) Overhauling the rotation (again).
      
       (*) Support for asynchronous I/O, where the fileserver rotation must be
           done asynchronously also.
      Signed-off-by: David Howells <dhowells@redhat.com>
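The shape described in the "operation" commit (a struct carrying parameters and results, an operations table, and one shared rotation loop) might look roughly like the user-space sketch below. Every name here is hypothetical and the struct is far smaller than the kernel's afs_operation; it only illustrates how a caller such as a mkdir path shrinks to "fill in the op, call the common code".

```c
#include <stddef.h>

struct sketch_op;

/* Hypothetical operations table: an RPC issuer plus outcome handlers. */
struct sketch_op_table {
    int  (*issue_rpc)(struct sketch_op *op, int server);
    void (*success)(struct sketch_op *op);
    void (*aborted)(struct sketch_op *op);
};

struct sketch_op {
    const struct sketch_op_table *ops;
    int volume_id;    /* the volume is the focus of the operation */
    int nr_servers;   /* candidate fileservers for that volume */
    int error;        /* result of the last attempt */
    int used_server;  /* index of the server that answered, or -1 */
};

/* Common "rotation" code: try each server in turn until one succeeds. */
int sketch_do_op(struct sketch_op *op)
{
    op->used_server = -1;
    op->error = -1;
    for (int i = 0; i < op->nr_servers; i++) {
        op->error = op->ops->issue_rpc(op, i);
        if (op->error == 0) {
            op->used_server = i;
            op->ops->success(op);
            return 0;
        }
    }
    op->ops->aborted(op);
    return op->error;
}

/* Example per-operation pieces: a fake mkdir whose first server is down. */
static int sketch_successes;

static int sketch_mkdir_rpc(struct sketch_op *op, int server)
{
    (void)op;
    return server == 0 ? -1 : 0;
}

static void sketch_mkdir_success(struct sketch_op *op)
{
    (void)op;
    sketch_successes++;
}

static void sketch_mkdir_aborted(struct sketch_op *op)
{
    (void)op;
}

const struct sketch_op_table sketch_mkdir_ops = {
    .issue_rpc = sketch_mkdir_rpc,
    .success   = sketch_mkdir_success,
    .aborted   = sketch_mkdir_aborted,
};
```

Keeping the loop in one place is what would later allow the rotation policy to be overhauled, or made asynchronous, without touching each caller.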
  5. 31 May, 2020 (2 commits)
  6. 13 Apr, 2020 (3 commits)
    • afs: Fix afs_d_validate() to set the right directory version · 40fc8102
      Committed by David Howells
      If a dentry's version is somewhere between invalid_before and the current
      directory version, we should be setting it forward to the current version,
      not backwards to the invalid_before version.  Note that we're only doing
      this at all because dentry::d_fsdata isn't large enough on a 32-bit system.
      
      Fix this by using a separate variable for invalid_before so that we don't
      accidentally clobber the current dir version.
      
      Fixes: a4ff7401 ("afs: Keep track of invalid-before version for dentry coherency")
      Signed-off-by: David Howells <dhowells@redhat.com>
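The version rule from the afs_d_validate() fix can be sketched as follows. This is a simplified user-space model with hypothetical names, assuming plain long version numbers; it keeps invalid_before in its own variable so the current directory version is never overwritten by it.

```c
#include <stdbool.h>

/* Returns true if the dentry may be kept. When it is kept, *de_version is
 * moved forward to dir_version, never backwards to invalid_before. */
bool sketch_validate(long *de_version, long invalid_before, long dir_version)
{
    if (*de_version == dir_version)
        return true;                /* already current */
    if (*de_version >= invalid_before) {
        *de_version = dir_version;  /* set forward to the current version */
        return true;
    }
    return false;                   /* too old: needs a real revalidation */
}
```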
    • afs: Fix race between post-modification dir edit and readdir/d_revalidate · 2105c282
      Committed by David Howells
      AFS directories are retained locally as a structured file, with lookup
      being effected by a local search of the file contents.  When a modification
      (such as mkdir) happens, the dir file content is modified locally rather
      than redownloading the directory.
      
      The directory contents are accessed in a number of ways, with a number of
      different locking schemes:
      
       (1) Download of contents - dvnode->validate_lock/write in afs_read_dir().
      
       (2) Lookup and readdir - dvnode->validate_lock/read in afs_dir_iterate(),
           downgrading from (1) if necessary.
      
       (3) d_revalidate of child dentry - dvnode->validate_lock/read in
           afs_do_lookup_one() downgrading from (1) if necessary.
      
       (4) Edit of dir after modification - page locks on individual dir pages.
      
      Unfortunately, because (4) uses a different locking scheme to (1) - (3),
      nothing protects against the page being scanned whilst the edit is
      underway.  Even download is not safe as it doesn't lock the pages - relying
      instead on the validate_lock to serialise as a whole (the theory being that
      directory contents are treated as a block and always downloaded as a
      block).
      
      Fix this by write-locking dvnode->validate_lock around the edits.  Care
      must be taken in the rename case as there may be two different dirs - but
      they need not be locked at the same time.  In any case, once the lock is
      taken, the directory version must be rechecked, and the edit skipped if a
      later version has been downloaded by revalidation (there can't have been
      any local changes because the VFS holds the inode lock, but there can have
      been remote changes).
      
      Fixes: 63a4681f ("afs: Locally edit directory data for mkdir/create/unlink/...")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix rename operation status delivery · b98f0ec9
      Committed by David Howells
      The afs_deliver_fs_rename() and yfs_deliver_fs_rename() functions both only
      decode the second file status returned unless the parent directories are
      different - unfortunately, this means that the xdr pointer isn't advanced
      and the volsync record will be read incorrectly in such an instance.
      
      Fix this by always decoding the second status into the second
      status/callback block which wasn't being used if the dirs were the same.
      
      The afs_update_dentry_version() calls that update the directory data
      version numbers on the dentries can then unconditionally use the second
      status record as this will always reflect the state of the destination dir
      (the two records will be identical if the destination dir is the same as
      the source dir).
      
      Fixes: 260a9803 ("[AFS]: Add "directory write" support.")
      Fixes: 30062bd1 ("afs: Implement YFS support in the fs client")
      Signed-off-by: David Howells <dhowells@redhat.com>
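Why skipping the second status record corrupts the volsync read can be seen in a toy decoder. The record size and reply layout below are assumptions made purely for illustration (they are not the real AFS wire format), and XDR byte-order conversion is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STATUS_WORDS 3  /* hypothetical record size */

struct sketch_status {
    uint32_t w[SKETCH_STATUS_WORDS];
};

/* Decode one status record and return the advanced read pointer. */
static const uint32_t *sketch_decode_status(const uint32_t *bp,
                                            struct sketch_status *out)
{
    for (size_t i = 0; i < SKETCH_STATUS_WORDS; i++)
        out->w[i] = *bp++;
    return bp;
}

/* Assumed reply layout: [old dir status][new dir status][volsync word].
 * Returns the volsync word. */
uint32_t sketch_deliver_rename(const uint32_t *bp,
                               struct sketch_status *old_dir,
                               struct sketch_status *new_dir)
{
    bp = sketch_decode_status(bp, old_dir);
    /* Always decode the second record, even for a same-directory rename,
     * so that bp ends up pointing at the trailing volsync data. */
    bp = sketch_decode_status(bp, new_dir);
    return *bp;
}
```

If the second decode were conditional, the final read would land on the first word of the second status record instead of the volsync word.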
  7. 15 Jan, 2020 (2 commits)
  8. 16 Nov, 2019 (1 commit)
    • afs: Fix race in commit bulk status fetch · a28f239e
      Committed by David Howells
      When a lookup is done, the afs filesystem will perform a bulk status-fetch
      operation on the requested vnode (file) plus the next 49 other vnodes from
      the directory list (in AFS, directory contents are downloaded as blobs and
      parsed locally).  When the results are received, it will speculatively
      populate the inode cache from the extra data.
      
      However, if the lookup races with another lookup on the same directory, but
      for a different file - one that's in the 49 extra fetches, then if the bulk
      status-fetch operation finishes first, it will try and update the inode
      from the other lookup.
      
      If this other inode is still in the throes of being created, however, this
      will cause an assertion failure in afs_apply_status():
      
      	BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags));
      
      on or about fs/afs/inode.c:175 because it expects data to be there already
      that it can compare to.
      
      Fix this by skipping the update if the inode is being created as the
      creator will presumably set up the inode with the same information.
      
      Fixes: 39db9815 ("afs: Fix application of the results of a inline bulk status fetch")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
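The fix for the bulk status fetch race amounts to a guard in front of the speculative update. A minimal sketch, assuming a hypothetical flag bit and struct in place of AFS_VNODE_UNSET and the real vnode:

```c
#include <stdbool.h>

#define SKETCH_VNODE_UNSET 0x1u  /* hypothetical: inode not yet set up */

struct sketch_vnode {
    unsigned int flags;
    long data_version;
};

/* Apply speculatively fetched status. Returns true if it was applied and
 * false if the inode is still being created by another lookup. */
bool sketch_apply_bulk_status(struct sketch_vnode *vnode, long fetched_version)
{
    if (vnode->flags & SKETCH_VNODE_UNSET)
        return false;  /* nothing to compare against; the creator fills it in */
    vnode->data_version = fetched_version;
    return true;
}
```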
  9. 02 Sep, 2019 (1 commit)
    • afs: Support RCU pathwalk · a0753c29
      Committed by David Howells
      Make afs_permission() and afs_d_revalidate() do initial checks in RCU-mode
      pathwalk to reduce latency in pathwalk elements that get done multiple
      times.  We don't need to query the server unless we've received a
      notification from it that something has changed or the callback has
      expired.
      
      This requires that we can request a key and check permits under RCU
      conditions if we need to.
      Signed-off-by: David Howells <dhowells@redhat.com>
  10. 22 Aug, 2019 (1 commit)
    • afs: Fix possible oops in afs_lookup trace event · c4c613ff
      Committed by Marc Dionne
      The afs_lookup trace event can cause the following:
      
      [  216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b
      [  216.576803] #PF: supervisor read access in kernel mode
      [  216.576813] #PF: error_code(0x0000) - not-present page
      ...
      [  216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs]
      
      If the inode from afs_do_lookup() is an error other than ENOENT, or if it
      is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will
      try to dereference the error pointer as a valid pointer.
      
      Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL.
      
      Ideally the trace would include the error value, but for now just avoid
      the oops.
      
      Fixes: 80548b03 ("afs: Add more tracepoints")
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
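The error-pointer problem behind the afs_lookup trace oops can be reproduced in user space. The helpers below are re-creations of the kernel's ERR_PTR() and IS_ERR_OR_NULL() idiom written for this sketch (the sketch_ names and the inode struct are hypothetical); they show how an errno encoded in a pointer is filtered out before anything dereferences it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, as ERR_PTR() does. */
static inline void *sketch_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True for NULL and for pointers in the top SKETCH_MAX_ERRNO addresses. */
static inline int sketch_is_err_or_null(const void *p)
{
    return !p || (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_lookup_inode {
    unsigned long ino;
};

/* Hand the tracepoint either a usable inode pointer or NULL, never an
 * error pointer that it might try to dereference. */
const struct sketch_lookup_inode *
sketch_trace_arg(const struct sketch_lookup_inode *inode)
{
    return sketch_is_err_or_null(inode) ? NULL : inode;
}
```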
  11. 30 Jul, 2019 (3 commits)
    • afs: Fix missing dentry data version updating · 9dd0b82e
      Committed by David Howells
      In the in-kernel afs filesystem, the d_fsdata dentry field is used to hold
      the data version of the parent directory when it was created or when
      d_revalidate() last caused it to be updated.  This is compared to the
      ->invalid_before field in the directory inode, rather than the actual data
      version number, thereby allowing changes due to local edits to be ignored.
      Only if the server data version gets bumped unexpectedly (eg. by a
      competing client), do we need to revalidate stuff.
      
      However, the d_fsdata field should also be updated if an rpc op is
      performed that modifies that particular dentry.  Such ops return the
      revised data version of the directory(ies) involved, so we should use that.
      
      This is particularly problematic for rename, since a dentry from one
      directory may be moved directly into another directory (ie. mv a/x b/x).
      It would then be sporting the wrong data version - and if this is in the
      future, for the destination directory, revalidations would be missed,
      leading to foreign renames and hard-link deletion being missed.
      
      Fix this by the following means:
      
       (1) Return the data version number from operations that read the directory
           contents - if they issue the read.  This starts in afs_dir_iterate()
           and is used, ignored or passed back by its callers.
      
       (2) In afs_lookup*(), set the dentry version to the version returned by
           (1) before d_splice_alias() is called and the dentry published.
      
       (3) In afs_d_revalidate(), set the dentry version to that returned from
           (1) if an rpc call was issued.  This means that if a parallel
           procedure, such as mkdir(), modifies the directory, we won't
           accidentally use the data version from that.
      
       (4) In afs_{mkdir,create,link,symlink}(), set the new dentry's version to
           the directory data version before d_instantiate() is called.
      
       (5) In afs_{rmdir,unlink}, update the target dentry's version to the
           directory data version as soon as we've updated the directory inode.
      
       (6) In afs_rename(), we need to unhash the old dentry before we start so
           that we don't get afs_d_revalidate() reverting the version change in
           cross-directory renames.
      
           We then need to set both the old and the new dentry versions to the
           data version of the new directory before we call d_move() as
           d_move() will rehash them.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Only update d_fsdata if different in afs_d_revalidate() · 5dc84855
      Committed by David Howells
      In the in-kernel afs filesystem, d_fsdata is set with the data version of
      the parent directory.  afs_d_revalidate() will update this to the current
      directory version, but it shouldn't do this if the value it read from
      d_fsdata is already the same, as no lock is held and cmpxchg() is not used.
      
      Fix the code to only change the value if it is different from the current
      directory version.
      
      Fixes: 260a9803 ("[AFS]: Add "directory write" support.")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix off-by-one in afs_rename() expected data version calculation · 37c0bbb3
      Committed by David Howells
      When afs_rename() calculates the expected data version of the target
      directory in a cross-directory rename, it doesn't increment it as it
      should, so it always thinks that the target inode is unexpectedly modified
      on the server.
      
      Fixes: a58823ac ("afs: Fix application of status and callback to be under same lock")
      Signed-off-by: David Howells <dhowells@redhat.com>
  12. 21 Jun, 2019 (3 commits)
  13. 31 May, 2019 (1 commit)
  14. 17 May, 2019 (4 commits)
    • afs: Fix application of the results of a inline bulk status fetch · 39db9815
      Committed by David Howells
      Fix afs_do_lookup() such that when it does an inline bulk status fetch op,
      it will update inodes that are already extant (something that afs_iget()
      doesn't do) and to cache permits for each inode created (thereby avoiding a
      follow up FS.FetchStatus call to determine this).
      
      Extant inodes need looking up in advance so that their cb_break counters
      before and after the operation can be compared.  To this end, the inode
      pointers are cached so that they don't need looking up again after the op.
      
      Fixes: 5cf9dd55 ("afs: Prospectively look up extra files when doing a single lookup")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Pass pre-fetch server and volume break counts into afs_iget5_set() · b8359153
      Committed by David Howells
      Pass the server and volume break counts from before the status fetch
      operation that queried the attributes of a file into afs_iget5_set() so
      that the new vnode's break counters can be initialised appropriately.
      
      This allows detection of a volume or server break that happened whilst we
      were fetching the status or setting up the vnode.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix unlink to handle YFS.RemoveFile2 better · a38a7558
      Committed by David Howells
      Make use of the status update for the target file that the YFS.RemoveFile2
      RPC op returns to correctly update the vnode as to whether the file was
      actually deleted or just had nlink reduced.
      
      Fixes: 30062bd1 ("afs: Implement YFS support in the fs client")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Make vnode->cb_interest RCU safe · f642404a
      Committed by David Howells
      Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
      vnode->cb_interest.  Use that change to allow afs_check_validity() to use
      read_seqbegin_or_lock() instead of read_seqlock_excl().
      
      This also requires the caller of afs_check_validity() to hold the RCU read
      lock across the call.
      Signed-off-by: David Howells <dhowells@redhat.com>
  15. 16 May, 2019 (3 commits)
    • afs: Fix application of status and callback to be under same lock · a58823ac
      Committed by David Howells
      When applying the status and callback in the response of an operation,
      apply them in the same critical section so that there's no race between
      checking the callback state and checking status-dependent state (such as
      the data version).
      
      Fix this by:
      
       (1) Allocating a joint {status,callback} record (afs_status_cb) before
           calling the RPC function for each vnode for which the RPC reply
           contains a status or a status plus a callback.  A flag is set in the
           record to indicate if a callback was actually received.
      
       (2) These records are passed into the RPC functions to be filled in.  The
           afs_decode_status() and yfs_decode_status() functions are removed and
           the cb_lock is no longer taken.
      
       (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
           update the vnode.
      
       (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
           the vnode.
      
       (5) vnodes, expected data-version numbers and callback break counters
           (cb_break) no longer need to be passed to the reply delivery
           functions.
      
           Note that, for the moment, the file locking functions still need
           access to both the call and the vnode at the same time.
      
       (6) afs_vnode_commit_status() is now given the cb_break value and the
           expected data_version and the task of applying the status and the
           callback to the vnode are now done here.
      
           This is done under a single taking of vnode->cb_lock.
      
       (7) afs_pages_written_back() is now called by afs_store_data() rather than
           by the reply delivery function.
      
           afs_pages_written_back() has been moved to before the call point and
           is now given the first and last page numbers rather than a pointer to
           the call.
      
       (8) The indicator from YFS.RemoveFile2 as to whether the target file
           actually got removed (status.abort_code == VNOVNODE) rather than
           merely dropping a link is now checked in afs_unlink rather than in
           xdr_decode_YFSFetchStatus().
      
      Supplementary fixes:
      
       (*) afs_cache_permit() now gets the caller_access mask from the
           afs_status_cb object rather than picking it out of the vnode's status
           record.  afs_fetch_status() returns caller_access through its argument
           list for this purpose also.
      
       (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
           than a read lock and now sets the callback inside the same critical
           section.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
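The joint {status,callback} record from points (1) and (6) of the "same lock" commit can be pictured as below. This is a loose user-space sketch: the field names are invented, and the lock is modelled as a simple counter of critical sections rather than a real cb_lock, only to make visible that both halves are applied in one section.

```c
#include <stdbool.h>

/* Hypothetical joint record filled in by the RPC reply decoder. */
struct sketch_status_cb {
    long data_version;   /* from the status part of the reply */
    long cb_expires_at;  /* from the callback part, if one was sent */
    bool have_cb;        /* set if a callback was actually received */
};

struct sketch_cb_vnode {
    long data_version;
    long cb_expires_at;
    bool cb_promised;
    int  lock_taken;     /* counts critical sections; stands in for cb_lock */
};

/* Apply status and callback together so that no reader can observe one
 * without the other. */
void sketch_commit_status(struct sketch_cb_vnode *vnode,
                          const struct sketch_status_cb *scb)
{
    vnode->lock_taken++;  /* "lock" */
    vnode->data_version = scb->data_version;
    if (scb->have_cb) {
        vnode->cb_expires_at = scb->cb_expires_at;
        vnode->cb_promised = true;
    }
    /* "unlock" */
}
```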
    • afs: Fix order-1 allocation in afs_do_lookup() · 87182759
      Committed by David Howells
      afs_do_lookup() will do an order-1 allocation to allocate status records if
      there are more than 39 vnodes to stat.
      
      Fix this by allocating an array of {status,callback} records for each vnode
      we want to examine using vmalloc() if larger than a page.
      
      This not only gets rid of the order-1 allocation, but makes it easier to
      grow beyond 50 records for YFS servers.  It also allows us to move to
      {status,callback} tuples for other calls too and makes it easier to lock
      across the application of the status and the callback to the vnode.
      
      Fixes: 5cf9dd55 ("afs: Prospectively look up extra files when doing a single lookup")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Make some RPC operations non-interruptible · 20b8391f
      Committed by David Howells
      Make certain RPC operations non-interruptible, including:
      
       (*) Set attributes
       (*) Store data
      
           We don't want to get interrupted during a flush on close, flush on
           unlock, writeback or an inode update, leaving us in a state where we
           still need to do the writeback or update.
      
       (*) Extend lock
       (*) Release lock
      
           We don't want to get lock extension interrupted as the file locks on
           the server are time-limited.  Interruption during lock release is less
           of an issue since the lock is time-limited, but it's better to
           complete the release to avoid a several-minute wait to recover it.
      
           *Setting* the lock isn't a problem if it's interrupted since we can
            just return to the user and tell them they were interrupted - at
            which point they can elect to retry.
      
       (*) Silly unlink
      
           We want to remove silly unlink files if we can, rather than leaving
           them for the salvager to clear up.
      
      Note that whilst these calls are no longer interruptible, they do have
      timeouts on them, so if the server stops responding the call will fail with
      something like ETIME or ECONNRESET.
      
      Without this, the following:
      
      	kAFS: Unexpected error from FS.StoreData -512
      
      appears in dmesg when a pending store data gets interrupted and some
      processes may just hang.
      
      Additionally, make the code that checks/updates the server record ignore
      failure due to interruption if the main call is uninterruptible and if the
      server has an address list.  The next op will check it again since the
      expiration time on the old list has passed.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
  16. 07 May, 2019 (1 commit)
  17. 25 Apr, 2019 (4 commits)
    • afs: Add more tracepoints · 80548b03
      Committed by David Howells
      Add four more tracepoints:
      
       (1) afs_make_fs_call1 - Split from afs_make_fs_call but takes a filename
           to log also.
      
       (2) afs_make_fs_call2 - Like the above but takes two filenames to log.
      
       (3) afs_lookup - Log the result of doing a successful lookup, including a
           negative result (fid 0:0).
      
       (4) afs_get_tree - Log the set up of a volume for mounting.
      
      It also extends the name buffer on the afs_edit_dir tracepoint to 24 chars
      and puts quotes around the filename in the text representation.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Implement sillyrename for unlink and rename · 79ddbfa5
      Committed by David Howells
      Implement sillyrename for AFS unlink and rename, using the NFS variant
      implementation as a basis.
      
      Note that the asynchronous file locking extender/releaser has to be
      notified with a state change to stop it complaining if there's a race
      between that and the actual file deletion.
      
      A tracepoint, afs_silly_rename, is also added to note the silly rename and
      the cleanup.  The afs_edit_dir tracepoint is given some extra reason
      indicators and the afs_flock_ev tracepoint is given a silly-delete file
      lock cancellation indicator.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Add directory reload tracepoint · 99987c56
      Committed by David Howells
      Add a tracepoint (afs_reload_dir) to indicate when a directory is being
      reloaded.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Improve dir check failure reports · 445b1028
      Committed by David Howells
      Improve the content of directory check failure reports from:
      
      	kAFS: afs_dir_check_page(6d57): bad magic 1/2 is 0000
      
      to dump more information about the individual blocks in a directory page.
      Signed-off-by: David Howells <dhowells@redhat.com>
  18. 30 Nov, 2018 (1 commit)
  19. 24 Oct, 2018 (5 commits)