1. 29 Oct 2020 (2 commits)
  2. 28 Oct 2020 (1 commit)
  3. 09 Oct 2020 (1 commit)
    • afs: Fix deadlock between writeback and truncate · ec0fa0b6
      David Howells committed
      The afs filesystem has a lock[*] that it uses to serialise I/O operations
      going to the server (vnode->io_lock), as the server will only perform one
      modification operation at a time on any given file or directory.  This
      prevents the filesystem from filling up all the call slots to a server
      with calls that aren't going to be executed in parallel anyway, thereby
      allowing operations on other files to obtain slots.
      
        [*] Note that this is probably redundant for directories at least since
            i_rwsem is used to serialise directory modifications and
            lookup/reading vs modification.  The server does allow parallel
            non-modification ops, however.
      
      When a file truncation op completes, we truncate the in-memory copy of the
      file to match - but we do it whilst still holding the io_lock, the idea
      being to prevent races with other operations.
      
      However, if writeback starts in a worker thread simultaneously with
      truncation (whilst notify_change() is called with i_rwsem locked, writeback
      pays it no heed), it may manage to set PG_writeback bits on the pages that
      will get truncated before afs_setattr_success() manages to call
      truncate_pagecache().  Truncate will then wait for those pages - whilst
      still inside io_lock:
      
          # cat /proc/8837/stack
          [<0>] wait_on_page_bit_common+0x184/0x1e7
          [<0>] truncate_inode_pages_range+0x37f/0x3eb
          [<0>] truncate_pagecache+0x3c/0x53
          [<0>] afs_setattr_success+0x4d/0x6e
          [<0>] afs_wait_for_operation+0xd8/0x169
          [<0>] afs_do_sync_operation+0x16/0x1f
          [<0>] afs_setattr+0x1fb/0x25d
          [<0>] notify_change+0x2cf/0x3c4
          [<0>] do_truncate+0x7f/0xb2
          [<0>] do_sys_ftruncate+0xd1/0x104
          [<0>] do_syscall_64+0x2d/0x3a
          [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The writeback operation, however, stalls indefinitely because it needs to
      get the io_lock to proceed:
      
          # cat /proc/5940/stack
          [<0>] afs_get_io_locks+0x58/0x1ae
          [<0>] afs_begin_vnode_operation+0xc7/0xd1
          [<0>] afs_store_data+0x1b2/0x2a3
          [<0>] afs_write_back_from_locked_page+0x418/0x57c
          [<0>] afs_writepages_region+0x196/0x224
          [<0>] afs_writepages+0x74/0x156
          [<0>] do_writepages+0x2d/0x56
          [<0>] __writeback_single_inode+0x84/0x207
          [<0>] writeback_sb_inodes+0x238/0x3cf
          [<0>] __writeback_inodes_wb+0x68/0x9f
          [<0>] wb_writeback+0x145/0x26c
          [<0>] wb_do_writeback+0x16a/0x194
          [<0>] wb_workfn+0x74/0x177
          [<0>] process_one_work+0x174/0x264
          [<0>] worker_thread+0x117/0x1b9
          [<0>] kthread+0xec/0xf1
          [<0>] ret_from_fork+0x1f/0x30
      
      and thus deadlock has occurred.
      
      Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact
      that the caller is holding i_rwsem doesn't preclude more pages being
      dirtied through an mmap'd region.
      
      Fix this by:
      
       (1) Use the vnode validate_lock to mediate access between afs_setattr()
           and afs_writepages():
      
           (a) Exclusively lock validate_lock in afs_setattr() around the whole
           	 RPC operation.
      
            (b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), try to
                shared-lock validate_lock and return immediately if we can't
                get it (see the sketch after this list).
      
           (c) If WB_SYNC_ALL is set, wait for the lock.
      
           The validate_lock is also used to validate a file and to zap its cache
           if the file was altered by a third party, so it's probably a good fit
           for this.
      
       (2) Move the truncation outside of the io_lock in setattr, using the same
           hook as is used for local directory editing.
      
           This requires the old i_size to be retained in the operation record as
           we commit the revised status to the inode members inside the io_lock
           still, but we still need to know if we reduced the file size.
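
      A minimal sketch of the writeback side of the fix, assuming validate_lock
      is an rw_semaphore and using only the generic writeback definitions
      (illustrative, not the actual fs/afs code):

          #include <linux/rwsem.h>
          #include <linux/writeback.h>

          static int example_writepages(struct rw_semaphore *validate_lock,
                                        struct writeback_control *wbc)
          {
                  if (wbc->sync_mode == WB_SYNC_ALL) {
                          /* (c) data-integrity writeback: wait for setattr */
                          down_read(validate_lock);
                  } else if (!down_read_trylock(validate_lock)) {
                          /* (b) setattr holds it exclusively: try again later */
                          return 0;
                  }

                  /* ... issue the StoreData RPCs here ... */

                  up_read(validate_lock);
                  return 0;
          }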
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 24 Aug 2020 (1 commit)
  5. 16 Jul 2020 (1 commit)
    • afs: Fix interruption of operations · 811f04ba
      David Howells committed
      The afs filesystem driver allows unstarted operations to be cancelled by
      signal, but most of these can easily be restarted (mkdir for example).  The
      primary culprits for reproducing this are those applications that use
      SIGALRM to display a progress counter.
      
      The file lock-extension operation is marked uninterruptible as we have a
      limited time in which to do it, and the release op is also marked
      uninterruptible since, if we fail to unlock a file, we'll have to wait 20
      minutes before anyone can lock it again.
      
      The store operation logs a warning if it gets interrupted, e.g.:
      
      	kAFS: Unexpected error from FS.StoreData -4
      
      because it's run in the background - but it can also be run from
      fdatasync()-type things.  However, store operations aren't marked
      uninterruptible at the moment.
      
      Fix this in the following ways:
      
       (1) Mark store operations as uninterruptible.  It might make sense to
           relax this for certain situations, but I'm not sure how to make sure
           that background store ops aren't affected by signals to foreground
           processes that happen to trigger them.
      
       (2) In afs_get_io_locks(), where we're getting the serialisation lock for
           talking to the fileserver, return ERESTARTSYS rather than EINTR
           because a lot of the operations (e.g. mkdir) are restartable if we
           haven't yet started sending the op to the server.
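
      A minimal sketch of (2), assuming the per-vnode serialisation lock is a
      mutex (illustrative only, not the actual afs_get_io_locks()):

          #include <linux/errno.h>
          #include <linux/mutex.h>
          #include <linux/types.h>

          static bool example_get_io_lock(struct mutex *io_lock, int *op_error)
          {
                  if (mutex_lock_interruptible(io_lock) < 0) {
                          *op_error = -ERESTARTSYS;  /* restartable, unlike -EINTR */
                          return false;
                  }
                  return true;
          }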
      
      Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 15 Jun 2020 (4 commits)
    • afs: Fix truncation issues and mmap writeback size · 793fe82e
      David Howells committed
      Fix the following issues:
      
       (1) Fix writeback to reduce the size of a store operation to i_size,
           effectively discarding the extra data.
      
           The problem comes when afs_page_mkwrite() records that a page is about
           to be modified by mmap().  It doesn't know what bits of the page are
           going to be modified, so it records the whole page as being dirty
           (this is stored in page->private as start and end offsets).
      
           Without this, the marshalling for the store to the server extends the
           size of the file to the end of the page (in afs_fs_store_data() and
           yfs_fs_store_data()).
      
       (2) Fix setattr to actually truncate the pagecache, thereby clearing
           the discarded part of a file.
      
       (3) Fix setattr to check that the new size is okay and to disable
           ATTR_SIZE if i_size wouldn't change.
      
       (4) Force i_size to be updated as the result of a truncate.
      
       (5) Don't truncate if ATTR_SIZE is not set.
      
       (6) Call pagecache_isize_extended() if the file was enlarged.
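
      A condensed sketch of the size handling in (2)-(6), using the generic
      pagecache helpers (illustrative only, not the actual afs_setattr()):

          #include <linux/fs.h>
          #include <linux/mm.h>

          /* (3)/(5): only truncate when ATTR_SIZE is set and the size changes. */
          static bool example_size_change_wanted(struct inode *inode,
                                                 struct iattr *attr)
          {
                  if (!(attr->ia_valid & ATTR_SIZE))
                          return false;
                  if (attr->ia_size == i_size_read(inode)) {
                          attr->ia_valid &= ~ATTR_SIZE;
                          return false;
                  }
                  return true;
          }

          /* (2)/(4)/(6): after the RPC, i_size has been set from the reply. */
          static void example_finish_resize(struct inode *inode, loff_t old_size,
                                            loff_t new_size)
          {
                  if (new_size < old_size)
                          truncate_pagecache(inode, new_size);
                  else if (new_size > old_size)
                          pagecache_isize_extended(inode, old_size, new_size);
          }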
      
      Note that truncate_setsize() isn't used because the setting of i_size is
      done inside afs_vnode_commit_status() under the vnode->cb_lock.
      
      Found with the generic/029 and generic/393 xfstests.
      
      Fixes: 31143d5d ("AFS: implement basic file write support")
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Concoct ctimes · da8d0755
      David Howells committed
      The in-kernel afs filesystem ignores ctime because the AFS fileserver
      protocol doesn't support ctimes.  This, however, causes various xfstests to
      fail.
      
      Work around this by:
      
       (1) Setting ctime to attr->ia_ctime in afs_setattr().
      
       (2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings.
      
       (3) Setting the ctime on the target file from the server mtime when
           creating a hard link to it.
      
       (4) Setting the ctime on directories from their revised mtimes when
           renaming/moving a file.
      
      Found by the generic/221 and generic/309 xfstests.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: afs_write_end() should change i_size under the right lock · 1f32ef79
      David Howells committed
      Fix afs_write_end() to change i_size under vnode->cb_lock rather than
      ->wb_lock so that it doesn't race with afs_vnode_commit_status() and
      afs_getattr().
      
      The ->wb_lock is only meant to guard access to ->wb_keys which isn't
      accessed by that piece of code.
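
      A minimal sketch of the intended pattern, assuming cb_lock is a seqlock
      (an assumption made here for illustration):

          #include <linux/fs.h>
          #include <linux/seqlock.h>

          static void example_update_i_size(struct inode *inode, seqlock_t *cb_lock,
                                            loff_t maybe_i_size)
          {
                  write_seqlock(cb_lock);                 /* not ->wb_lock */
                  if (maybe_i_size > i_size_read(inode))
                          i_size_write(inode, maybe_i_size);
                  write_sequnlock(cb_lock);
          }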
      
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix non-setting of mtime when writing into mmap · bb413489
      David Howells committed
      The mtime on an inode needs to be updated when a write is made into an
      mmap'ed section.  There are three ways in which this could be done: update
      it when page_mkwrite is called, update it when a page is changed from dirty
      to writeback or leave it to the server and fix the mtime up from the reply
      to the StoreData RPC.
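
      As a hedged illustration of the first option (not necessarily what kAFS
      ends up doing), a page_mkwrite handler can bump the times itself:

          #include <linux/fs.h>
          #include <linux/mm.h>
          #include <linux/pagemap.h>

          static vm_fault_t example_page_mkwrite(struct vm_fault *vmf)
          {
                  struct page *page = vmf->page;

                  file_update_time(vmf->vma->vm_file);    /* refresh mtime/ctime */
                  lock_page(page);
                  set_page_dirty(page);
                  return VM_FAULT_LOCKED;                 /* page is returned locked */
          }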
      
      Found with the generic/215 xfstest.
      
      Fixes: 1cf7a151 ("afs: Implement shared-writeable mmap")
      Signed-off-by: David Howells <dhowells@redhat.com>
  7. 12 Jun 2020 (1 commit)
  8. 04 Jun 2020 (1 commit)
    • afs: Build an abstraction around an "operation" concept · e49c7b2f
      David Howells committed
      Turn the afs_operation struct into the main way that most fileserver
      operations are managed.  Various things are added to the struct, including
      the following:
      
       (1) All the parameters and results of the relevant operations are moved
           into it, removing corresponding fields from the afs_call struct.
           afs_call gets a pointer to the op.
      
       (2) The target volume is made the main focus of the operation, rather than
           the target vnode(s), and a bunch of op->vnode->volume are made
           op->volume instead.
      
       (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
           in most operations.  The vnode record (struct afs_vnode_param)
           contains:
      
      	- The vnode pointer.
      
      	- The fid of the vnode to be included in the parameters or that was
                returned in the reply (eg. FS.MakeDir).
      
      	- The status and callback information that may be returned in the
           	  reply about the vnode.
      
      	- Callback break and data version tracking for detecting
                simultaneous third-party changes.
      
       (4) Pointers to dentries to be updated with new inodes.
      
       (5) An operations table pointer.  The table includes pointers to functions
           for issuing AFS and YFS-variant RPCs, handling the success and abort
           of an operation and handling post-I/O-lock local editing of a
           directory.
      
      To make this work, the following function restructuring is made:
      
       (A) The rotation loop that issues calls to fileservers that can be found
           in each function that wants to issue an RPC (such as afs_mkdir()) is
           extracted out into common code, in a new file called fs_operation.c.
      
       (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
           a much smaller piece of code that allocates an operation, sets the
           parameters and then calls out to the common code to do the actual
           work.
      
       (C) The code for handling the success and failure of an operation are
           moved into operation functions (as (5) above) and these are called
           from the core code at appropriate times.
      
       (D) The pseudo inode getting stuff used by the dynamic root code is moved
           over into dynroot.c.
      
       (E) struct afs_iget_data is absorbed into the operation struct and
           afs_iget() expects to be given an op pointer and a vnode record.
      
       (F) Point (E) doesn't work for the root dir of a volume, but we know the
           FID in advance (it's always vnode 1, unique 1), so a separate inode
           getter, afs_root_iget(), is provided to special-case that.
      
       (G) The inode status init/update functions now also take an op and a vnode
           record.
      
       (H) The RPC marshalling functions now, for the most part, just take an
           afs_operation struct as their only argument.  All the data they need
           is held there.  The result delivery functions write their answers
           there as well.
      
       (I) The call is attached to the operation and then the operation core does
           the waiting.
      
      And then the new operation code is, for the moment, made to just initialise
      the operation, get the appropriate vnode I/O locks and do the same rotation
      loop as before.
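
      As a rough illustration of the caller shape described in (B);
      afs_do_sync_operation() appears in the stack traces earlier in this log,
      while the other helper names here are assumed and simplified:

          static int example_mkdir(struct afs_vnode *dvnode, struct key *key)
          {
                  struct afs_operation *op;

                  op = afs_alloc_operation(key, dvnode->volume);
                  if (IS_ERR(op))
                          return PTR_ERR(op);

                  afs_op_set_vnode(op, 0, dvnode);   /* becomes op->file[0], see (3) */
                  op->ops = &afs_mkdir_operation;    /* the operations table of (5) */
                  return afs_do_sync_operation(op);  /* lock, rotate, wait, commit */
          }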
      
      This lays the foundation for the following changes in the future:
      
       (*) Overhauling the rotation (again).
      
       (*) Support for asynchronous I/O, where the fileserver rotation must be
           done asynchronously also.
      Signed-off-by: David Howells <dhowells@redhat.com>
  9. 31 May 2020 (1 commit)
  10. 21 Jun 2019 (1 commit)
  11. 31 May 2019 (1 commit)
  12. 16 May 2019 (2 commits)
    • afs: Fix application of status and callback to be under same lock · a58823ac
      David Howells committed
      When applying the status and callback in the response of an operation,
      apply them in the same critical section so that there's no race between
      checking the callback state and checking status-dependent state (such as
      the data version).
      
      Fix this by:
      
       (1) Allocating a joint {status,callback} record (afs_status_cb) before
           calling the RPC function for each vnode for which the RPC reply
           contains a status or a status plus a callback.  A flag is set in the
           record to indicate if a callback was actually received.
      
       (2) These records are passed into the RPC functions to be filled in.  The
           afs_decode_status() and yfs_decode_status() functions are removed and
           the cb_lock is no longer taken.
      
       (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
           update the vnode.
      
       (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
           the vnode.
      
       (5) vnodes, expected data-version numbers and callback break counters
           (cb_break) no longer need to be passed to the reply delivery
           functions.
      
           Note that, for the moment, the file locking functions still need
           access to both the call and the vnode at the same time.
      
       (6) afs_vnode_commit_status() is now given the cb_break value and the
           expected data_version and the task of applying the status and the
           callback to the vnode are now done here.
      
           This is done under a single taking of vnode->cb_lock.
      
       (7) afs_pages_written_back() is now called by afs_store_data() rather than
           by the reply delivery function.
      
           afs_pages_written_back() has been moved to before the call point and
           is now given the first and last page numbers rather than a pointer to
           the call.
      
       (8) The indicator from YFS.RemoveFile2 as to whether the target file
           actually got removed (status.abort_code == VNOVNODE) rather than
           merely dropping a link is now checked in afs_unlink rather than in
           xdr_decode_YFSFetchStatus().
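
      The joint record of (1) is roughly the following shape (field names
      assumed for illustration):

          struct afs_status_cb {
                  struct afs_file_status  status;         /* decoded status record */
                  struct afs_callback     callback;       /* decoded callback, if any */
                  bool                    have_status;    /* a status was returned */
                  bool                    have_cb;        /* a callback was returned */
          };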
      
      Supplementary fixes:
      
       (*) afs_cache_permit() now gets the caller_access mask from the
           afs_status_cb object rather than picking it out of the vnode's status
           record.  afs_fetch_status() returns caller_access through its argument
           list for this purpose also.
      
       (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
           than a read lock and now sets the callback inside the same critical
           section.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Make some RPC operations non-interruptible · 20b8391f
      David Howells committed
      Make certain RPC operations non-interruptible, including:
      
       (*) Set attributes
       (*) Store data
      
           We don't want to get interrupted during a flush on close, flush on
           unlock, writeback or an inode update, leaving us in a state where we
           still need to do the writeback or update.
      
       (*) Extend lock
       (*) Release lock
      
           We don't want to get lock extension interrupted as the file locks on
           the server are time-limited.  Interruption during lock release is less
           of an issue since the lock is time-limited, but it's better to
           complete the release to avoid a several-minute wait to recover it.
      
           *Setting* the lock isn't a problem if it's interrupted since we can
            just return to the user and tell them they were interrupted - at
            which point they can elect to retry.
      
       (*) Silly unlink
      
           We want to remove silly unlink files if we can, rather than leaving
           them for the salvager to clear up.
      
      Note that whilst these calls are no longer interruptible, they do have
      timeouts on them, so if the server stops responding the call will fail with
      something like ETIME or ECONNRESET.
      
      Without this, the following:
      
      	kAFS: Unexpected error from FS.StoreData -512
      
      appears in dmesg when a pending store data gets interrupted and some
      processes may just hang.
      
      Additionally, make the code that checks/updates the server record ignore
      failure due to interruption if the main call is uninterruptible and if the
      server has an address list.  The next op will check it again since the
      expiration time on the old list has passed.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
  13. 13 Apr 2019 (1 commit)
  14. 24 Oct 2018 (3 commits)
  15. 24 Aug 2018 (1 commit)
  16. 14 May 2018 (1 commit)
    • afs: Fix whole-volume callback handling · 68251f0a
      David Howells committed
      It's possible for an AFS file server to issue a whole-volume notification
      that callbacks on all the vnodes in the volume have been broken.  This is
      done for R/O and backup volumes (which don't have per-file callbacks) and
      for things like a volume being taken offline.
      
      Fix callback handling to detect whole-volume notifications, to track it
      across operations and to check it during inode validation.
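
      A sketch of the kind of validity check this implies (the counter and
      flag names are assumed, illustrative only):

          static bool example_cb_still_valid(struct afs_vnode *vnode)
          {
                  /* A whole-volume break invalidates every vnode in the volume. */
                  if (vnode->cb_v_break != vnode->volume->cb_v_break)
                          return false;

                  return test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags);
          }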
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
  17. 12 Apr 2018 (1 commit)
  18. 10 Apr 2018 (3 commits)
    • afs: Do better accretion of small writes on newly created content · 5a813276
      David Howells committed
      Processes like ld that do lots of small writes that aren't necessarily
      contiguous result in a lot of small StoreData operations to the server, the
      idea being that if someone else changes the data on the server, we only
      write our changes over that and not the space between.  Further, we don't
      want to write back empty space if we can avoid it to make it easier for the
      server to do sparse files.
      
      However, making lots of tiny RPC ops is a lot less efficient for the server
      than one big one because each op requires allocation of resources and the
      taking of locks, so we want to compromise a bit.
      
      Reduce the load by the following:
      
       (1) If a file is just created locally or has just been truncated with
           O_TRUNC locally, allow subsequent writes to the file to be merged with
           intervening space if that space doesn't cross an entire intervening
           page.
      
       (2) Don't flush the file on ->flush() but rather on ->release() if the
           file was open for writing.
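
      A minimal sketch of (2) (illustrative; locking, key handling and error
      paths are omitted):

          #include <linux/fs.h>

          /* ->flush() no longer writes anything back; the dirty data goes out
           * when the last reference to the file is dropped instead. */
          static int example_release(struct inode *inode, struct file *file)
          {
                  if (file->f_mode & FMODE_WRITE)
                          return filemap_write_and_wait(file->f_mapping);
                  return 0;
          }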
      
      Just linking vmlinux.o, without this patch, looking in /proc/fs/afs/stats:
      
      	file-wr : n=441 nb=513581204
      
      and after the patch:
      
      	file-wr : n=62 nb=513668555
      
      there were 379 fewer StoreData RPC operations at the expense of an extra
      87K being written.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Add stats for data transfer operations · 76a5cb6f
      David Howells committed
      Add statistics to /proc/fs/afs/stats for data transfer RPC operations.  New
      lines are added that look like:
      
      	file-rd : n=55794 nb=10252282150
      	file-wr : n=9789 nb=3247763645
      
      where n= indicates the number of ops completed and nb= indicates the number
      of bytes successfully transferred.  file-rd is the counts for read/fetch
      operations and file-wr the counts for write/store operations.
      
      Note that directory and symlink downloading are included in the file-rd
      stats at the moment.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix directory handling · f3ddee8d
      David Howells committed
      AFS directories are structured blobs that are downloaded just like files
      and then parsed by the lookup and readdir code and, as such, are currently
      handled in the pagecache like any other file, with the entire directory
      content being thrown away each time the directory changes.
      
      However, since the blob is a known structure and since the data version
      counter on a directory increases by exactly one for each change committed
      to that directory, we can actually edit the directory locally rather than
      fetching it from the server after each locally-induced change.
      
      What we can't do, though, is mix data from the server and data from the
      client since the server is technically at liberty to rearrange or compress
      a directory if it sees fit, provided it updates the data version number
      when it does so and breaks the callback (ie. sends a notification).
      
      Further, lookup with lookup-ahead, readdir and, when it arrives, local
      editing are likely to want to scan the whole of a directory.
      
      So directory handling needs to be improved to maintain the coherency of the
      directory blob prior to permitting local directory editing.
      
      To this end:
      
       (1) If any directory page gets discarded, invalidate and reread the entire
           directory.
      
       (2) If readpage notes, when it fetches a single page, that the version
           number has changed, the entire directory is flagged for
           invalidation.
      
       (3) Read as much of the directory in one go as we can.
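
      A sketch of the version check in (2) (the flag name is assumed,
      illustrative only):

          static void example_check_dir_version(struct afs_vnode *vnode,
                                                u64 fetched_data_version)
          {
                  /* A newer data version than the copy we hold means the cached
                   * blob as a whole can no longer be trusted. */
                  if (fetched_data_version != vnode->status.data_version)
                          clear_bit(AFS_VNODE_DIR_VALID, &vnode->flags);
          }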
      
      Note that this removes local caching of directories in fscache for the
      moment as we can't pass the pages to fscache_read_or_alloc_pages() since
      page->lru is in use by the LRU.
      Signed-off-by: David Howells <dhowells@redhat.com>
  19. 02 Jan 2018 (1 commit)
  20. 24 Nov 2017 (1 commit)
    • afs: Make afs_write_begin() avoid writing to a page that's being stored · 5a039c32
      David Howells committed
      Make afs_write_begin() wait for a page that's marked PG_writeback because:
      
       (1) We need to avoid interference with the data being stored so that the
           data on the server ends up in a defined state.
      
       (2) page->private is used to track the window of dirty data within a page,
           but it's also used by the storage code to track what's being written,
           being cleared by the completion notification.  Ownership can't be
           relinquished by the storage code until completion because, if a store
           fails, the data must be remarked dirty.
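
      The wait itself can be done with the standard page-flag helpers, roughly
      as follows (illustrative only):

          #include <linux/mm.h>
          #include <linux/pagemap.h>

          /* In write_begin, before reusing a page that is still being stored: */
          static void example_wait_for_store(struct page *page)
          {
                  if (PageWriteback(page)) {
                          unlock_page(page);
                          wait_on_page_writeback(page);   /* let the store finish */
                          lock_page(page);
                  }
          }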
      
      Tracing shows something like the following (edited):
      
       x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-125
          kworker/u8:3-114   [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store+ 0-125
       x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-2052
          kworker/u8:3-114   [2] afs_page_dirty: vn=ffff8800bef33800 9c75 clear 0-2052
          kworker/u8:3-114   [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store 0-0
          kworker/u8:3-114   [2] afs_page_dirty: vn=ffff8800bef33800 9c75 WARN 0-0
      
      The clear (completion) corresponding to the store+ (store continuation from
      a previous page) happens between the second begin (afs_write_begin) and the
      store corresponding to that.  This results in the second store not seeing
      any data to write back, leading to the following warning:
      
      WARNING: CPU: 2 PID: 114 at ../fs/afs/write.c:403 afs_write_back_from_locked_page+0x19d/0x76c [kafs]
      Modules linked in: kafs(E)
      CPU: 2 PID: 114 Comm: kworker/u8:3 Tainted: G            E   4.14.0-fscache+ #242
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Workqueue: writeback wb_workfn (flush-afs-2)
      task: ffff8800cad72600 task.stack: ffff8800cad44000
      RIP: 0010:afs_write_back_from_locked_page+0x19d/0x76c [kafs]
      RSP: 0018:ffff8800cad47aa0 EFLAGS: 00010246
      RAX: 0000000000000001 RBX: ffff8800bef33a20 RCX: 0000000000000000
      RDX: 000000000000000f RSI: ffffffff81c5d0e0 RDI: ffff8800cad72e78
      RBP: ffff8800d31ea1e8 R08: ffff8800c1358000 R09: ffff8800ca00e400
      R10: ffff8800cad47a38 R11: ffff8800c5d9e400 R12: 0000000000000000
      R13: ffffea0002d9df00 R14: ffffffffa0023c1c R15: 0000000000007fdf
      FS:  0000000000000000(0000) GS:ffff8800ca700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f85ac6c4000 CR3: 0000000001c10001 CR4: 00000000001606e0
      Call Trace:
       ? clear_page_dirty_for_io+0x23a/0x267
       afs_writepages_region+0x1be/0x286 [kafs]
       afs_writepages+0x60/0x127 [kafs]
       do_writepages+0x36/0x70
       __writeback_single_inode+0x12f/0x635
       writeback_sb_inodes+0x2cc/0x452
       __writeback_inodes_wb+0x68/0x9f
       wb_writeback+0x208/0x470
       ? wb_workfn+0x22b/0x565
       wb_workfn+0x22b/0x565
       ? worker_thread+0x230/0x2ac
       process_one_work+0x2cc/0x517
       ? worker_thread+0x230/0x2ac
       worker_thread+0x1d4/0x2ac
       ? rescuer_thread+0x29b/0x29b
       kthread+0x15d/0x165
       ? kthread_create_on_node+0x3f/0x3f
       ? call_usermodehelper_exec_async+0x118/0x11f
       ret_from_fork+0x24/0x30
      Signed-off-by: David Howells <dhowells@redhat.com>
  21. 16 Nov 2017 (2 commits)
  22. 13 Nov 2017 (5 commits)
    • afs: Trace page dirty/clean · 13524ab3
      David Howells committed
      Add a trace event that logs the dirtying and cleaning of pages attached to
      AFS inodes.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Implement shared-writeable mmap · 1cf7a151
      David Howells committed
      Implement shared-writeable mmap for AFS.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Get rid of the afs_writeback record · 4343d008
      David Howells committed
      Get rid of the afs_writeback record that kAFS is using to match keys with
      writes made by that key.
      
      Instead, keep a list of keys that have a file open for writing and/or
      sync'ing and iterate through those.
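
      The per-vnode list entry is roughly this shape (field names assumed for
      illustration):

          struct afs_wb_key {
                  refcount_t              usage;
                  struct key              *key;           /* key the writes were made with */
                  struct list_head        vnode_link;     /* link in the vnode's key list */
          };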
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Introduce a file-private data record · 215804a9
      David Howells committed
      Introduce a file-private data record for kAFS and put the key into it
      rather than storing the key in file->private_data.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Overhaul volume and server record caching and fileserver rotation · d2ddc776
      David Howells committed
      The current code assumes that volumes and servers are per-cell and are
      never shared, but this is not enforced, and, indeed, public cells do exist
      that are aliases of each other.  Further, an organisation can, say, set up
      a public cell and a private cell with overlapping, but not identical, sets
      of servers.  The difference is purely in the database attached to the VL
      servers.
      
      The current code will malfunction if it sees a server in two cells as it
      assumes global address -> server record mappings and that each server is in
      just one cell.
      
      Further, each server may have multiple addresses - and may have addresses
      of different families (IPv4 and IPv6, say).
      
      To this end, the following structural changes are made:
      
       (1) Server record management is overhauled:
      
           (a) Server records are made independent of cell.  The namespace keeps
           	 track of them, volume records have lists of them and each vnode
           	 has a server on which its callback interest currently resides.
      
           (b) The cell record no longer keeps a list of servers known to be in
           	 that cell.
      
           (c) The server records are now kept in a flat list because there's no
           	 single address to sort on.
      
           (d) Server records are now keyed by their UUID within the namespace.
      
           (e) The addresses for a server are obtained with the VL.GetAddrsU
           	 rather than with VL.GetEntryByName, using the server's UUID as a
           	 parameter.
      
           (f) Cached server records are garbage collected after a period of
           	 non-use and are counted out of existence before purging is allowed
           	 to complete.  This protects the work functions against rmmod.
      
           (g) The servers list is now in /proc/fs/afs/servers.
      
       (2) Volume record management is overhauled:
      
           (a) An RCU-replaceable server list is introduced.  This tracks both
                servers and their corresponding callback interests.
      
           (b) The superblock is now keyed on cell record and numeric volume ID.
      
           (c) The volume record is now tied to the superblock which mounts it,
           	 and is activated when mounted and deactivated when unmounted.
           	 This makes it easier to handle the cache cookie without causing a
           	 double-use in fscache.
      
           (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
           	 to get the server UUID list.
      
           (e) The volume name is updated if it is seen to have changed when the
           	 volume is updated (the update is keyed on the volume ID).
      
       (3) The vlocation record is got rid of and VLDB records are no longer
           cached.  Sufficient information is stored in the volume record, though
           an update to a volume record is now no longer shared between related
           volumes (volumes come in bundles of three: R/W, R/O and backup).
      
      and the following procedural changes are made:
      
       (1) The fileserver cursor introduced previously is now fleshed out and
           used to iterate over fileservers and their addresses.
      
       (2) Volume status is checked during iteration, and the server list is
           replaced if a change is detected.
      
       (3) Server status is checked during iteration, and the address list is
           replaced if a change is detected.
      
       (4) The abort code is saved into the address list cursor and -ECONNABORTED
           returned in afs_make_call() if a remote abort happened rather than
           translating the abort into an error message.  This allows actions to
           be taken depending on the abort code more easily.
      
           (a) If a VMOVED abort is seen then this is handled by rechecking the
           	 volume and restarting the iteration.
      
           (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
               handled by sleeping for a short period and retrying and/or trying
               other servers that might serve that volume.  A message is also
               displayed once until the condition has cleared.
      
           (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
           	 moment.
      
           (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
           	 see if it has been deleted; if not, the fileserver is probably
           	 indicating that the volume couldn't be attached and needs
           	 salvaging.
      
           (e) If statfs() sees one of these aborts, it does not sleep, but
           	 rather returns an error, so as not to block the umount program.
      
       (5) The fileserver iteration functions in vnode.c are now merged into
           their callers and more heavily macroised around the cursor.  vnode.c
           is removed.
      
       (6) Operations on a particular vnode are serialised on that vnode because
           the server will lock that vnode whilst it operates on it, so a second
           op sent will just have to wait.
      
       (7) Fileservers are probed with FS.GetCapabilities before being used.
           This is where service upgrade will be done.
      
       (8) A callback interest on a fileserver is set up before an FS operation
           is performed and passed through to afs_make_call() so that it can be
           set on the vnode if the operation returns a callback.  The callback
           interest is passed through to afs_iget() also so that it can be set
           there too.
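
      The rotation gives call sites roughly the following shape (simplified;
      afs_begin_vnode_operation() appears in the stack traces earlier in this
      log, the other names and example_issue_rpc() are assumed placeholders):

          static int example_fetch(struct afs_vnode *vnode, struct key *key)
          {
                  struct afs_fs_cursor fc;
                  int ret = -ERESTARTSYS;

                  if (afs_begin_vnode_operation(&fc, vnode, key)) {
                          while (afs_select_fileserver(&fc)) {
                                  /* issue the RPC to the server selected in fc */
                                  example_issue_rpc(&fc);  /* placeholder */
                          }
                          ret = afs_end_vnode_operation(&fc);
                  }
                  return ret;
          }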
      
      In general, record updating is done on an as-needed basis when we try to
      access servers, volumes or vnodes rather than offloading it to work items
      and special threads.
      
      Notes:
      
       (1) Pre AFS-3.4 servers are no longer supported, though this can be added
           back if necessary (AFS-3.4 was released in 1998).
      
       (2) VBUSY is retried forever for the moment at intervals of 1s.
      
       (3) /proc/fs/afs/<cell>/servers no longer exists.
      Signed-off-by: David Howells <dhowells@redhat.com>
  23. 01 Aug 2017 (1 commit)
    • fs: convert a pile of fsync routines to errseq_t based reporting · 3b49c9a1
      Jeff Layton committed
      This patch converts most of the in-kernel filesystems that do writeback
      out of the pagecache to report errors using the errseq_t-based
      infrastructure that was recently added. This allows them to report
      errors once for each open file description.
      
      Most filesystems have a fairly straightforward fsync operation. They
      call filemap_write_and_wait_range to write back all of the data and
      wait on it, and then (sometimes) sync out the metadata.
      
      For those filesystems this is a straightforward conversion from calling
      filemap_write_and_wait_range in their fsync operation to calling
      file_write_and_wait_range.
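
      For a filesystem whose fsync just writes back the pagecache, the
      conversion is roughly (illustrative):

          #include <linux/fs.h>

          static int example_fsync(struct file *file, loff_t start, loff_t end,
                                   int datasync)
          {
                  /* was: filemap_write_and_wait_range(file->f_mapping, start, end) */
                  return file_write_and_wait_range(file, start, end);
          }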
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
  24. 17 Mar 2017 (3 commits)