1. 23 4月, 2021 3 次提交
  2. 24 3月, 2021 1 次提交
  3. 15 11月, 2020 1 次提交
  4. 29 10月, 2020 7 次提交
    • D
      afs: Fix dirty-region encoding on ppc32 with 64K pages · 2d9900f2
      David Howells 提交于
      The dirty region bounds stored in page->private on an afs page are 15 bits
      on a 32-bit box and can, at most, represent a range of up to 32K within a
      32K page with a resolution of 1 byte.  This is a problem for powerpc32 with
      64K pages enabled.
      
      Further, transparent huge pages may get up to 2M, which will be a problem
      for the afs filesystem on all 32-bit arches in the future.
      
      Fix this by decreasing the resolution.  For the moment, a 64K page will
      have a resolution determined from PAGE_SIZE.  In the future, the page will
      need to be passed in to the helper functions so that the page size can be
      assessed and the resolution determined dynamically.
      
      Note that this might not be the ideal way to handle this, since it may
      allow some leakage of undirtied zero bytes to the server's copy in the case
      of a 3rd-party conflict.  Fixing that would require a separately allocated
      record and is a more complicated fix.
      
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      2d9900f2
    • D
      afs: Fix afs_invalidatepage to adjust the dirty region · f86726a6
      David Howells 提交于
      Fix afs_invalidatepage() to adjust the dirty region recorded in
      page->private when truncating a page.  If the dirty region is entirely
      removed, then the private data is cleared and the page dirty state is
      cleared.
      
      Without this, if the page is truncated and then expanded again by truncate,
      zeros from the expanded, but no-longer dirty region may get written back to
      the server if the page gets laundered due to a conflicting 3rd-party write.
      
      It mustn't, however, shorten the dirty region of the page if that page is
      still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is
      stored in page->private to record this.
      
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f86726a6
    • D
      afs: Alter dirty range encoding in page->private · 65dd2d60
      David Howells 提交于
      Currently, page->private on an afs page is used to store the range of
      dirtied data within the page, where the range includes the lower bound, but
      excludes the upper bound (e.g. 0-1 is a range covering a single byte).
      
      This, however, requires a superfluous bit for the last-byte bound so that
      on a 4KiB page, it can say 0-4096 to indicate the whole page, the idea
      being that having both numbers the same would indicate an empty range.
      This is unnecessary as the PG_private bit is clear if it's an empty range
      (as is PG_dirty).
      
      Alter the way the dirty range is encoded in page->private such that the
      upper bound is reduced by 1 (e.g. 0-0 is then specified the same single
      byte range mentioned above).
      
      Applying this to both bounds frees up two bits, one of which can be used in
      a future commit.
      
      This allows the afs filesystem to be compiled on ppc32 with 64K pages;
      without this, the following warnings are seen:
      
      ../fs/afs/internal.h: In function 'afs_page_dirty_to':
      ../fs/afs/internal.h:881:15: warning: right shift count >= width of type [-Wshift-count-overflow]
        881 |  return (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK;
            |               ^~
      ../fs/afs/internal.h: In function 'afs_page_dirty':
      ../fs/afs/internal.h:886:28: warning: left shift count >= width of type [-Wshift-count-overflow]
        886 |  return ((unsigned long)to << __AFS_PAGE_PRIV_SHIFT) | from;
            |                            ^~
      
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      65dd2d60
    • D
      afs: Wrap page->private manipulations in inline functions · 185f0c70
      David Howells 提交于
      The afs filesystem uses page->private to store the dirty range within a
      page such that in the event of a conflicting 3rd-party write to the server,
      we write back just the bits that got changed locally.
      
      However, there are a couple of problems with this:
      
       (1) I need a bit to note if the page might be mapped so that partial
           invalidation doesn't shrink the range.
      
       (2) There aren't necessarily sufficient bits to store the entire range of
           data altered (say it's a 32-bit system with 64KiB pages or transparent
           huge pages are in use).
      
      So wrap the accesses in inline functions so that future commits can change
      how this works.
      
      Also move them out of the tracing header into the in-directory header.
      There's not really any need for them to be in the tracing header.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      185f0c70
    • D
      afs: Fix where page->private is set during write · f792e3ac
      David Howells 提交于
      In afs, page->private is set to indicate the dirty region of a page.  This
      is done in afs_write_begin(), but that can't take account of whether the
      copy into the page actually worked.
      
      Fix this by moving the change of page->private into afs_write_end().
      
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f792e3ac
    • D
      afs: Fix page leak on afs_write_begin() failure · 21db2cdc
      David Howells 提交于
      Fix the leak of the target page in afs_write_begin() when it fails.
      
      Fixes: 15b4650e ("afs: convert to new aops")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Nick Piggin <npiggin@gmail.com>
      21db2cdc
    • D
      afs: Fix to take ref on page when PG_private is set · fa04a40b
      David Howells 提交于
      Fix afs to take a ref on a page when it sets PG_private on it and to drop
      the ref when removing the flag.
      
      Note that in afs_write_begin(), a lot of the time, PG_private is already
      set on a page to which we're going to add some data.  In such a case, we
      leave the bit set and mustn't increment the page count.
      
      As suggested by Matthew Wilcox, use attach/detach_page_private() where
      possible.
      
      Fixes: 31143d5d ("AFS: implement basic file write support")
      Reported-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      fa04a40b
  5. 28 10月, 2020 1 次提交
  6. 09 10月, 2020 1 次提交
    • D
      afs: Fix deadlock between writeback and truncate · ec0fa0b6
      David Howells 提交于
      The afs filesystem has a lock[*] that it uses to serialise I/O operations
      going to the server (vnode->io_lock), as the server will only perform one
      modification operation at a time on any given file or directory.  This
      prevents the the filesystem from filling up all the call slots to a server
      with calls that aren't going to be executed in parallel anyway, thereby
      allowing operations on other files to obtain slots.
      
        [*] Note that is probably redundant for directories at least since
            i_rwsem is used to serialise directory modifications and
            lookup/reading vs modification.  The server does allow parallel
            non-modification ops, however.
      
      When a file truncation op completes, we truncate the in-memory copy of the
      file to match - but we do it whilst still holding the io_lock, the idea
      being to prevent races with other operations.
      
      However, if writeback starts in a worker thread simultaneously with
      truncation (whilst notify_change() is called with i_rwsem locked, writeback
      pays it no heed), it may manage to set PG_writeback bits on the pages that
      will get truncated before afs_setattr_success() manages to call
      truncate_pagecache().  Truncate will then wait for those pages - whilst
      still inside io_lock:
      
          # cat /proc/8837/stack
          [<0>] wait_on_page_bit_common+0x184/0x1e7
          [<0>] truncate_inode_pages_range+0x37f/0x3eb
          [<0>] truncate_pagecache+0x3c/0x53
          [<0>] afs_setattr_success+0x4d/0x6e
          [<0>] afs_wait_for_operation+0xd8/0x169
          [<0>] afs_do_sync_operation+0x16/0x1f
          [<0>] afs_setattr+0x1fb/0x25d
          [<0>] notify_change+0x2cf/0x3c4
          [<0>] do_truncate+0x7f/0xb2
          [<0>] do_sys_ftruncate+0xd1/0x104
          [<0>] do_syscall_64+0x2d/0x3a
          [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The writeback operation, however, stalls indefinitely because it needs to
      get the io_lock to proceed:
      
          # cat /proc/5940/stack
          [<0>] afs_get_io_locks+0x58/0x1ae
          [<0>] afs_begin_vnode_operation+0xc7/0xd1
          [<0>] afs_store_data+0x1b2/0x2a3
          [<0>] afs_write_back_from_locked_page+0x418/0x57c
          [<0>] afs_writepages_region+0x196/0x224
          [<0>] afs_writepages+0x74/0x156
          [<0>] do_writepages+0x2d/0x56
          [<0>] __writeback_single_inode+0x84/0x207
          [<0>] writeback_sb_inodes+0x238/0x3cf
          [<0>] __writeback_inodes_wb+0x68/0x9f
          [<0>] wb_writeback+0x145/0x26c
          [<0>] wb_do_writeback+0x16a/0x194
          [<0>] wb_workfn+0x74/0x177
          [<0>] process_one_work+0x174/0x264
          [<0>] worker_thread+0x117/0x1b9
          [<0>] kthread+0xec/0xf1
          [<0>] ret_from_fork+0x1f/0x30
      
      and thus deadlock has occurred.
      
      Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact
      that the caller is holding i_rwsem doesn't preclude more pages being
      dirtied through an mmap'd region.
      
      Fix this by:
      
       (1) Use the vnode validate_lock to mediate access between afs_setattr()
           and afs_writepages():
      
           (a) Exclusively lock validate_lock in afs_setattr() around the whole
           	 RPC operation.
      
           (b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), trying to
           	 shared-lock validate_lock and returning immediately if we couldn't
           	 get it.
      
           (c) If WB_SYNC_ALL is set, wait for the lock.
      
           The validate_lock is also used to validate a file and to zap its cache
           if the file was altered by a third party, so it's probably a good fit
           for this.
      
       (2) Move the truncation outside of the io_lock in setattr, using the same
           hook as is used for local directory editing.
      
           This requires the old i_size to be retained in the operation record as
           we commit the revised status to the inode members inside the io_lock
           still, but we still need to know if we reduced the file size.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ec0fa0b6
  7. 24 8月, 2020 1 次提交
  8. 16 7月, 2020 1 次提交
    • D
      afs: Fix interruption of operations · 811f04ba
      David Howells 提交于
      The afs filesystem driver allows unstarted operations to be cancelled by
      signal, but most of these can easily be restarted (mkdir for example).  The
      primary culprits for reproducing this are those applications that use
      SIGALRM to display a progress counter.
      
      File lock-extension operation is marked uninterruptible as we have a
      limited time in which to do it, and the release op is marked
      uninterruptible also as if we fail to unlock a file, we'll have to wait 20
      mins before anyone can lock it again.
      
      The store operation logs a warning if it gets interruption, e.g.:
      
      	kAFS: Unexpected error from FS.StoreData -4
      
      because it's run from the background - but it can also be run from
      fdatasync()-type things.  However, store options aren't marked
      interruptible at the moment.
      
      Fix this in the following ways:
      
       (1) Mark store operations as uninterruptible.  It might make sense to
           relax this for certain situations, but I'm not sure how to make sure
           that background store ops aren't affected by signals to foreground
           processes that happen to trigger them.
      
       (2) In afs_get_io_locks(), where we're getting the serialisation lock for
           talking to the fileserver, return ERESTARTSYS rather than EINTR
           because a lot of the operations (e.g. mkdir) are restartable if we
           haven't yet started sending the op to the server.
      
      Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      811f04ba
  9. 15 6月, 2020 4 次提交
    • D
      afs: Fix truncation issues and mmap writeback size · 793fe82e
      David Howells 提交于
      Fix the following issues:
      
       (1) Fix writeback to reduce the size of a store operation to i_size,
           effectively discarding the extra data.
      
           The problem comes when afs_page_mkwrite() records that a page is about
           to be modified by mmap().  It doesn't know what bits of the page are
           going to be modified, so it records the whole page as being dirty
           (this is stored in page->private as start and end offsets).
      
           Without this, the marshalling for the store to the server extends the
           size of the file to the end of the page (in afs_fs_store_data() and
           yfs_fs_store_data()).
      
       (2) Fix setattr to actually truncate the pagecache, thereby clearing
           the discarded part of a file.
      
       (3) Fix setattr to check that the new size is okay and to disable
           ATTR_SIZE if i_size wouldn't change.
      
       (4) Force i_size to be updated as the result of a truncate.
      
       (5) Don't truncate if ATTR_SIZE is not set.
      
       (6) Call pagecache_isize_extended() if the file was enlarged.
      
      Note that truncate_set_size() isn't used because the setting of i_size is
      done inside afs_vnode_commit_status() under the vnode->cb_lock.
      
      Found with the generic/029 and generic/393 xfstests.
      
      Fixes: 31143d5d ("AFS: implement basic file write support")
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      793fe82e
    • D
      afs: Concoct ctimes · da8d0755
      David Howells 提交于
      The in-kernel afs filesystem ignores ctime because the AFS fileserver
      protocol doesn't support ctimes.  This, however, causes various xfstests to
      fail.
      
      Work around this by:
      
       (1) Setting ctime to attr->ia_ctime in afs_setattr().
      
       (2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings.
      
       (3) Setting the ctime from the server mtime when on the target file when
           creating a hard link to it.
      
       (4) Setting the ctime on directories from their revised mtimes when
           renaming/moving a file.
      
      Found by the generic/221 and generic/309 xfstests.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      da8d0755
    • D
      afs: afs_write_end() should change i_size under the right lock · 1f32ef79
      David Howells 提交于
      Fix afs_write_end() to change i_size under vnode->cb_lock rather than
      ->wb_lock so that it doesn't race with afs_vnode_commit_status() and
      afs_getattr().
      
      The ->wb_lock is only meant to guard access to ->wb_keys which isn't
      accessed by that piece of code.
      
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      1f32ef79
    • D
      afs: Fix non-setting of mtime when writing into mmap · bb413489
      David Howells 提交于
      The mtime on an inode needs to be updated when a write is made into an
      mmap'ed section.  There are three ways in which this could be done: update
      it when page_mkwrite is called, update it when a page is changed from dirty
      to writeback or leave it to the server and fix the mtime up from the reply
      to the StoreData RPC.
      
      Found with the generic/215 xfstest.
      
      Fixes: 1cf7a151 ("afs: Implement shared-writeable mmap")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      bb413489
  10. 12 6月, 2020 1 次提交
  11. 04 6月, 2020 1 次提交
    • D
      afs: Build an abstraction around an "operation" concept · e49c7b2f
      David Howells 提交于
      Turn the afs_operation struct into the main way that most fileserver
      operations are managed.  Various things are added to the struct, including
      the following:
      
       (1) All the parameters and results of the relevant operations are moved
           into it, removing corresponding fields from the afs_call struct.
           afs_call gets a pointer to the op.
      
       (2) The target volume is made the main focus of the operation, rather than
           the target vnode(s), and a bunch of op->vnode->volume are made
           op->volume instead.
      
       (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
           in most operations.  The vnode record (struct afs_vnode_param)
           contains:
      
      	- The vnode pointer.
      
      	- The fid of the vnode to be included in the parameters or that was
                returned in the reply (eg. FS.MakeDir).
      
      	- The status and callback information that may be returned in the
           	  reply about the vnode.
      
      	- Callback break and data version tracking for detecting
                simultaneous third-parth changes.
      
       (4) Pointers to dentries to be updated with new inodes.
      
       (5) An operations table pointer.  The table includes pointers to functions
           for issuing AFS and YFS-variant RPCs, handling the success and abort
           of an operation and handling post-I/O-lock local editing of a
           directory.
      
      To make this work, the following function restructuring is made:
      
       (A) The rotation loop that issues calls to fileservers that can be found
           in each function that wants to issue an RPC (such as afs_mkdir()) is
           extracted out into common code, in a new file called fs_operation.c.
      
       (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
           a much smaller piece of code that allocates an operation, sets the
           parameters and then calls out to the common code to do the actual
           work.
      
       (C) The code for handling the success and failure of an operation are
           moved into operation functions (as (5) above) and these are called
           from the core code at appropriate times.
      
       (D) The pseudo inode getting stuff used by the dynamic root code is moved
           over into dynroot.c.
      
       (E) struct afs_iget_data is absorbed into the operation struct and
           afs_iget() expects to be given an op pointer and a vnode record.
      
       (F) Point (E) doesn't work for the root dir of a volume, but we know the
           FID in advance (it's always vnode 1, unique 1), so a separate inode
           getter, afs_root_iget(), is provided to special-case that.
      
       (G) The inode status init/update functions now also take an op and a vnode
           record.
      
       (H) The RPC marshalling functions now, for the most part, just take an
           afs_operation struct as their only argument.  All the data they need
           is held there.  The result delivery functions write their answers
           there as well.
      
       (I) The call is attached to the operation and then the operation core does
           the waiting.
      
      And then the new operation code is, for the moment, made to just initialise
      the operation, get the appropriate vnode I/O locks and do the same rotation
      loop as before.
      
      This lays the foundation for the following changes in the future:
      
       (*) Overhauling the rotation (again).
      
       (*) Support for asynchronous I/O, where the fileserver rotation must be
           done asynchronously also.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      e49c7b2f
  12. 31 5月, 2020 1 次提交
  13. 21 6月, 2019 1 次提交
  14. 31 5月, 2019 1 次提交
  15. 16 5月, 2019 2 次提交
    • D
      afs: Fix application of status and callback to be under same lock · a58823ac
      David Howells 提交于
      When applying the status and callback in the response of an operation,
      apply them in the same critical section so that there's no race between
      checking the callback state and checking status-dependent state (such as
      the data version).
      
      Fix this by:
      
       (1) Allocating a joint {status,callback} record (afs_status_cb) before
           calling the RPC function for each vnode for which the RPC reply
           contains a status or a status plus a callback.  A flag is set in the
           record to indicate if a callback was actually received.
      
       (2) These records are passed into the RPC functions to be filled in.  The
           afs_decode_status() and yfs_decode_status() functions are removed and
           the cb_lock is no longer taken.
      
       (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
           update the vnode.
      
       (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
           the vnode.
      
       (5) vnodes, expected data-version numbers and callback break counters
           (cb_break) no longer need to be passed to the reply delivery
           functions.
      
           Note that, for the moment, the file locking functions still need
           access to both the call and the vnode at the same time.
      
       (6) afs_vnode_commit_status() is now given the cb_break value and the
           expected data_version and the task of applying the status and the
           callback to the vnode are now done here.
      
           This is done under a single taking of vnode->cb_lock.
      
       (7) afs_pages_written_back() is now called by afs_store_data() rather than
           by the reply delivery function.
      
           afs_pages_written_back() has been moved to before the call point and
           is now given the first and last page numbers rather than a pointer to
           the call.
      
       (8) The indicator from YFS.RemoveFile2 as to whether the target file
           actually got removed (status.abort_code == VNOVNODE) rather than
           merely dropping a link is now checked in afs_unlink rather than in
           xdr_decode_YFSFetchStatus().
      
      Supplementary fixes:
      
       (*) afs_cache_permit() now gets the caller_access mask from the
           afs_status_cb object rather than picking it out of the vnode's status
           record.  afs_fetch_status() returns caller_access through its argument
           list for this purpose also.
      
       (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
           than a read lock and now sets the callback inside the same critical
           section.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a58823ac
    • D
      afs: Make some RPC operations non-interruptible · 20b8391f
      David Howells 提交于
      Make certain RPC operations non-interruptible, including:
      
       (*) Set attributes
       (*) Store data
      
           We don't want to get interrupted during a flush on close, flush on
           unlock, writeback or an inode update, leaving us in a state where we
           still need to do the writeback or update.
      
       (*) Extend lock
       (*) Release lock
      
           We don't want to get lock extension interrupted as the file locks on
           the server are time-limited.  Interruption during lock release is less
           of an issue since the lock is time-limited, but it's better to
           complete the release to avoid a several-minute wait to recover it.
      
           *Setting* the lock isn't a problem if it's interrupted since we can
            just return to the user and tell them they were interrupted - at
            which point they can elect to retry.
      
       (*) Silly unlink
      
           We want to remove silly unlink files if we can, rather than leaving
           them for the salvager to clear up.
      
      Note that whilst these calls are no longer interruptible, they do have
      timeouts on them, so if the server stops responding the call will fail with
      something like ETIME or ECONNRESET.
      
      Without this, the following:
      
      	kAFS: Unexpected error from FS.StoreData -512
      
      appears in dmesg when a pending store data gets interrupted and some
      processes may just hang.
      
      Additionally, make the code that checks/updates the server record ignore
      failure due to interruption if the main call is uninterruptible and if the
      server has an address list.  The next op will check it again since the
      expiration time on the old list has past.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: NJonathan Billings <jsbillings@jsbillings.org>
      Reported-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      20b8391f
  16. 13 4月, 2019 1 次提交
  17. 24 10月, 2018 3 次提交
  18. 24 8月, 2018 1 次提交
  19. 14 5月, 2018 1 次提交
    • D
      afs: Fix whole-volume callback handling · 68251f0a
      David Howells 提交于
      It's possible for an AFS file server to issue a whole-volume notification
      that callbacks on all the vnodes in the file have been broken.  This is
      done for R/O and backup volumes (which don't have per-file callbacks) and
      for things like a volume being taken offline.
      
      Fix callback handling to detect whole-volume notifications, to track it
      across operations and to check it during inode validation.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      68251f0a
  20. 12 4月, 2018 1 次提交
  21. 10 4月, 2018 3 次提交
    • D
      afs: Do better accretion of small writes on newly created content · 5a813276
      David Howells 提交于
      Processes like ld that do lots of small writes that aren't necessarily
      contiguous result in a lot of small StoreData operations to the server, the
      idea being that if someone else changes the data on the server, we only
      write our changes over that and not the space between.  Further, we don't
      want to write back empty space if we can avoid it to make it easier for the
      server to do sparse files.
      
      However, making lots of tiny RPC ops is a lot less efficient for the server
      than one big one because each op requires allocation of resources and the
      taking of locks, so we want to compromise a bit.
      
      Reduce the load by the following:
      
       (1) If a file is just created locally or has just been truncated with
           O_TRUNC locally, allow subsequent writes to the file to be merged with
           intervening space if that space doesn't cross an entire intervening
           page.
      
       (2) Don't flush the file on ->flush() but rather on ->release() if the
           file was open for writing.
      
      Just linking vmlinux.o, without this patch, looking in /proc/fs/afs/stats:
      
      	file-wr : n=441 nb=513581204
      
      and after the patch:
      
      	file-wr : n=62 nb=513668555
      
      there were 379 fewer StoreData RPC operations at the expense of an extra
      87K being written.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5a813276
    • D
      afs: Add stats for data transfer operations · 76a5cb6f
      David Howells 提交于
      Add statistics to /proc/fs/afs/stats for data transfer RPC operations.  New
      lines are added that look like:
      
      	file-rd : n=55794 nb=10252282150
      	file-wr : n=9789 nb=3247763645
      
      where n= indicates the number of ops completed and nb= indicates the number
      of bytes successfully transferred.  file-rd is the counts for read/fetch
      operations and file-wr the counts for write/store operations.
      
      Note that directory and symlink downloading are included in the file-rd
      stats at the moment.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      76a5cb6f
    • D
      afs: Fix directory handling · f3ddee8d
      David Howells 提交于
      AFS directories are structured blobs that are downloaded just like files
      and then parsed by the lookup and readdir code and, as such, are currently
      handled in the pagecache like any other file, with the entire directory
      content being thrown away each time the directory changes.
      
      However, since the blob is a known structure and since the data version
      counter on a directory increases by exactly one for each change committed
      to that directory, we can actually edit the directory locally rather than
      fetching it from the server after each locally-induced change.
      
      What we can't do, though, is mix data from the server and data from the
      client since the server is technically at liberty to rearrange or compress
      a directory if it sees fit, provided it updates the data version number
      when it does so and breaks the callback (ie. sends a notification).
      
      Further, lookup with lookup-ahead, readdir and, when it arrives, local
      editing are likely want to scan the whole of a directory.
      
      So directory handling needs to be improved to maintain the coherency of the
      directory blob prior to permitting local directory editing.
      
      To this end:
      
       (1) If any directory page gets discarded, invalidate and reread the entire
           directory.
      
       (2) If readpage notes that if when it fetches a single page that the
           version number has changed, the entire directory is flagged for
           invalidation.
      
       (3) Read as much of the directory in one go as we can.
      
      Note that this removes local caching of directories in fscache for the
      moment as we can't pass the pages to fscache_read_or_alloc_pages() since
      page->lru is in use by the LRU.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f3ddee8d
  22. 02 1月, 2018 1 次提交
  23. 24 11月, 2017 1 次提交
    • D
      afs: Make afs_write_begin() avoid writing to a page that's being stored · 5a039c32
      David Howells 提交于
      Make afs_write_begin() wait for a page that's marked PG_writeback because:
      
       (1) We need to avoid interference with the data being stored so that the
           data on the server ends up in a defined state.
      
       (2) page->private is used to track the window of dirty data within a page,
           but it's also used by the storage code to track what's being written,
           being cleared by the completion notification.  Ownership can't be
           relinquished by the storage code until completion because it a store
           fails, the data must be remarked dirty.
      
      Tracing shows something like the following (edited):
      
       x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-125
          kworker/u8:3-114   [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store+ 0-125
       x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-2052
          kworker/u8:3-114   [2] afs_page_dirty: vn=ffff8800bef33800 9c75 clear 0-2052
          kworker/u8:3-114   [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store 0-0
          kworker/u8:3-114   [2] afs_page_dirty: vn=ffff8800bef33800 9c75 WARN 0-0
      
      The clear (completion) corresponding to the store+ (store continuation from
      a previous page) happens between the second begin (afs_write_begin) and the
      store corresponding to that.  This results in the second store not seeing
      any data to write back, leading to the following warning:
      
      WARNING: CPU: 2 PID: 114 at ../fs/afs/write.c:403 afs_write_back_from_locked_page+0x19d/0x76c [kafs]
      Modules linked in: kafs(E)
      CPU: 2 PID: 114 Comm: kworker/u8:3 Tainted: G            E   4.14.0-fscache+ #242
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Workqueue: writeback wb_workfn (flush-afs-2)
      task: ffff8800cad72600 task.stack: ffff8800cad44000
      RIP: 0010:afs_write_back_from_locked_page+0x19d/0x76c [kafs]
      RSP: 0018:ffff8800cad47aa0 EFLAGS: 00010246
      RAX: 0000000000000001 RBX: ffff8800bef33a20 RCX: 0000000000000000
      RDX: 000000000000000f RSI: ffffffff81c5d0e0 RDI: ffff8800cad72e78
      RBP: ffff8800d31ea1e8 R08: ffff8800c1358000 R09: ffff8800ca00e400
      R10: ffff8800cad47a38 R11: ffff8800c5d9e400 R12: 0000000000000000
      R13: ffffea0002d9df00 R14: ffffffffa0023c1c R15: 0000000000007fdf
      FS:  0000000000000000(0000) GS:ffff8800ca700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f85ac6c4000 CR3: 0000000001c10001 CR4: 00000000001606e0
      Call Trace:
       ? clear_page_dirty_for_io+0x23a/0x267
       afs_writepages_region+0x1be/0x286 [kafs]
       afs_writepages+0x60/0x127 [kafs]
       do_writepages+0x36/0x70
       __writeback_single_inode+0x12f/0x635
       writeback_sb_inodes+0x2cc/0x452
       __writeback_inodes_wb+0x68/0x9f
       wb_writeback+0x208/0x470
       ? wb_workfn+0x22b/0x565
       wb_workfn+0x22b/0x565
       ? worker_thread+0x230/0x2ac
       process_one_work+0x2cc/0x517
       ? worker_thread+0x230/0x2ac
       worker_thread+0x1d4/0x2ac
       ? rescuer_thread+0x29b/0x29b
       kthread+0x15d/0x165
       ? kthread_create_on_node+0x3f/0x3f
       ? call_usermodehelper_exec_async+0x118/0x11f
       ret_from_fork+0x24/0x30
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5a039c32
  24. 16 11月, 2017 1 次提交