1. 29 October 2020, 1 commit
    • afs: Wrap page->private manipulations in inline functions · 185f0c70
      David Howells committed
      The afs filesystem uses page->private to store the dirty range within a
      page such that in the event of a conflicting 3rd-party write to the server,
      we write back just the bits that got changed locally.
      
      However, there are a couple of problems with this:
      
       (1) I need a bit to note if the page might be mapped so that partial
           invalidation doesn't shrink the range.
      
       (2) There aren't necessarily sufficient bits to store the entire range of
           data altered (say it's a 32-bit system with 64KiB pages or transparent
           huge pages are in use).
      
      So wrap the accesses in inline functions so that future commits can change
      how this works.
      
      Also move them out of the tracing header into the in-directory header.
      There's not really any need for them to be in the tracing header.
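
      As a rough illustration of the kind of wrapper this describes, the
      helpers below pack a dirty range into page->private.  The names mirror
      the commit text, but the 16-bit packing is an assumption for example
      purposes, not the kernel's final encoding.

          /* Illustrative sketch only: hide the page->private encoding behind
           * helpers so the representation can be changed later. */
          static inline unsigned long afs_page_dirty(unsigned int from, unsigned int to)
          {
          	return ((unsigned long)to << 16) | from;
          }

          static inline unsigned int afs_page_dirty_from(unsigned long priv)
          {
          	return priv & 0xffff;
          }

          static inline unsigned int afs_page_dirty_to(unsigned long priv)
          {
          	return priv >> 16;
          }

      A writeback path would then read priv = page_private(page) and use
      afs_page_dirty_from()/afs_page_dirty_to() rather than open-coding the
      shifts at each call site.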
      Signed-off-by: David Howells <dhowells@redhat.com>
  2. 28 October 2020, 1 commit
  3. 16 October 2020, 4 commits
    • afs: Add tracing for cell refcount and active user count · dca54a7b
      David Howells committed
      Add a tracepoint to log the cell refcount and active user count and pass in
      a reason code through various functions that manipulate these counters.
      
      Additionally, a helper function, afs_see_cell(), is provided to log
      interesting places that deal with a cell without actually doing any
      accounting directly.
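
      A minimal sketch of what such a helper might look like; the tracepoint
      argument list here is an assumption based on the description above, not
      the exact kernel signature.

          void afs_see_cell(struct afs_cell *cell, enum afs_cell_trace reason)
          {
          	/* Log the current counts without changing them */
          	trace_afs_cell(cell->debug_id, atomic_read(&cell->ref),
          		       atomic_read(&cell->active), reason);
          }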
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix cell removal · 1d0e850a
      David Howells committed
      Fix cell removal by inserting a more final state than AFS_CELL_FAILED that
      indicates the cell has been unpublished, in case the manager has already
      been requeued and will run again.  In the new AFS_CELL_REMOVED state, the
      manager function just returns immediately.
      
      Going through a second time in the AFS_CELL_FAILED state will cause it to
      try to remove the cell again, potentially leading to the proc list being
      removed.
      
      Fixes: 989782dc ("afs: Overhaul cell database management")
      Reported-by: syzbot+b994ecf2b023f14832c1@syzkaller.appspotmail.com
      Reported-by: syzbot+0e0db88e1eb44a91ae8d@syzkaller.appspotmail.com
      Reported-by: syzbot+2d0585e5efcd43d113c2@syzkaller.appspotmail.com
      Reported-by: syzbot+1ecc2f9d3387f1d79d42@syzkaller.appspotmail.com
      Reported-by: syzbot+18d51774588492bf3f69@syzkaller.appspotmail.com
      Reported-by: syzbot+a5e4946b04d6ca8fa5f3@syzkaller.appspotmail.com
      Suggested-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hillf Danton <hdanton@sina.com>
    • afs: Fix cell refcounting by splitting the usage counter · 88c853c3
      David Howells committed
      Management of the lifetime of the afs_cell struct has some problems because
      the usage counter is used both to determine whether objects of that type
      are in use and to indicate whether anyone might still be interested in the
      structure.
      
      This is made trickier by cell objects being cached for a period of time in
      case they're quickly reused as they hold the result of a setup process that
      may be slow (DNS lookups, AFS RPC ops).
      
      Problems include the cached root volume from alias resolution pinning its
      parent cell record, rmmod occasionally hanging and occasionally producing
      assertion failures.
      
      Fix this by splitting the count of active users from the struct reference
      count.  Things then work as follows:
      
       (1) The cell cache keeps +1 on the cell's activity count and this has to
           be dropped before the cell can be removed.  afs_manage_cell() tries to
           exchange the 1 for a 0 with the cells_lock write-locked, and if
           successful, the record is removed from net->cells.
      
       (2) One struct ref is 'owned' by the activity count.  That is put when the
           active count is reduced to 0 (final_destruction label).
      
       (3) A ref can be held on a cell whilst it is queued for management on a
           work queue without confusing the active count.  afs_queue_cell() is
           added to wrap this.
      
       (4) The queue's ref is dropped at the end of the management.  This is
           split out into a separate function, afs_manage_cell_work().
      
       (5) The root volume record is put after a cell is removed (at the
           final_destruction label) rather than in the RCU destruction routine.
      
       (6) Volumes hold struct refs, but aren't active users.
      
       (7) Both counts are displayed in /proc/net/afs/cells.
      
      There are some management function changes:
      
       (*) afs_put_cell() now just decrements the refcount and triggers the RCU
           destruction if it becomes 0.  It no longer sets a timer to have the
           manager do this.
      
       (*) afs_use_cell() and afs_unuse_cell() are added to increase and decrease
           the active count.  afs_unuse_cell() sets the management timer.
      
       (*) afs_queue_cell() is added to queue a cell with appropriate refs (see
           the sketch after these lists).
      
      There are also some other fixes:
      
       (*) Don't let /proc/net/afs/cells access a cell's vllist if it's NULL.
      
       (*) Make sure that candidate cells in lookups are properly destroyed
           rather than being simply kfree'd.  This ensures the bits it points to
           are destroyed also.
      
       (*) afs_dec_cells_outstanding() is now called in cell destruction rather
           than at "final_destruction".  This ensures that cell->net is still
           valid to the end of the destructor.
      
       (*) As a consequence of the previous two changes, move the increment of
           net->cells_outstanding that was at the point of insertion into the
           tree to the allocation routine to correctly balance things.
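
      Illustratively, the split and the helpers named above come down to
      something like the following; the bodies are simplified assumptions, and
      afs_set_cell_timer(), afs_put_cell() and afs_wq are used here with
      simplified signatures rather than the exact kernel ones.

          static struct afs_cell *afs_use_cell(struct afs_cell *cell)
          {
          	atomic_inc(&cell->active);		/* another active user */
          	return cell;
          }

          static void afs_unuse_cell(struct afs_net *net, struct afs_cell *cell)
          {
          	if (atomic_dec_and_test(&cell->active))
          		afs_set_cell_timer(net, cell);	/* let the manager expire it */
          }

          static void afs_queue_cell(struct afs_cell *cell)
          {
          	/* The queued work owns a struct ref, not an active count */
          	if (atomic_inc_not_zero(&cell->ref) &&
          	    !queue_work(afs_wq, &cell->manager))
          		afs_put_cell(cell);		/* already queued: drop our ref */
          }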
      
      Fixes: 989782dc ("afs: Overhaul cell database management")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix rapid cell addition/removal by not using RCU on cells tree · 92e3cc91
      David Howells committed
      A number of problems are being seen when rapidly mounting and unmounting an
      afs dynamic root with an explicit cell and volume specified (which should
      probably be rejected, but that's a separate issue):
      
      What the tests are doing is to look up/create a cell record for the name
      given and then tear it down again without actually using it to try to talk
      to a server.  This is repeated endlessly, very fast, and the new cell
      collides with the old one if it's not quick enough to reuse it.
      
      It appears (as suggested by Hillf Danton) that the search through the RB
      tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
      that it's not blocking the write_seqlock(), despite taking two passes at
      it.  He suggested that the code should take a ref on the cell it's
      attempting to look at - but this shouldn't be necessary until we've
      compared the cell names.  It's possible that I'm missing a barrier
      somewhere.
      
      However, using an RCU search for this is overkill, really - we only need to
      access the cell name in a few places, and they're places where we may end
      up sleeping anyway.
      
      Fix this by switching to an R/W semaphore instead.
      
      Additionally, pull the down_read() call inside the function (renamed to
      afs_find_cell()) since all the callers were taking the RCU read lock (or
      should've been[*]).
      
      [*] afs_probe_cell_name() should have been, but that doesn't appear to be
      involved in the bug reports.
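
      For illustration, a lookup of this shape under an R/W semaphore might
      look like the sketch below; field names such as cells_lock, net_node and
      name_len are taken from the description or assumed, and this is not the
      exact kernel code.

          static struct afs_cell *afs_find_cell(struct afs_net *net,
          				      const char *name, unsigned int namesz)
          {
          	struct afs_cell *cell;
          	struct rb_node *p;
          	int n;

          	down_read(&net->cells_lock);	/* may sleep - that's fine here */

          	for (p = net->cells.rb_node; p; ) {
          		cell = rb_entry(p, struct afs_cell, net_node);
          		n = strncasecmp(cell->name, name,
          				min_t(size_t, cell->name_len, namesz));
          		if (n == 0)
          			n = cell->name_len - namesz;
          		if (n < 0) {
          			p = p->rb_right;
          		} else if (n > 0) {
          			p = p->rb_left;
          		} else {
          			afs_use_cell(cell);	/* take a usage count */
          			up_read(&net->cells_lock);
          			return cell;
          		}
          	}

          	up_read(&net->cells_lock);
          	return ERR_PTR(-ENOENT);
          }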
      
      The symptoms of this look like:
      
      	general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
      	KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
      	...
      	RIP: 0010:strncasecmp lib/string.c:52 [inline]
      	RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
      	 afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
      	 afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
      	 afs_parse_source fs/afs/super.c:290 [inline]
      	...
      
      Fixes: 989782dc ("afs: Overhaul cell database management")
      Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hillf Danton <hdanton@sina.com>
      cc: syzkaller-bugs@googlegroups.com
  4. 09 October 2020, 1 commit
    • afs: Fix deadlock between writeback and truncate · ec0fa0b6
      David Howells committed
      The afs filesystem has a lock[*] that it uses to serialise I/O operations
      going to the server (vnode->io_lock), as the server will only perform one
      modification operation at a time on any given file or directory.  This
      prevents the filesystem from filling up all the call slots to a server
      with calls that aren't going to be executed in parallel anyway, thereby
      allowing operations on other files to obtain slots.
      
        [*] Note that this is probably redundant for directories at least, since
            i_rwsem is used to serialise directory modifications and
            lookup/reading vs modification.  The server does allow parallel
            non-modification ops, however.
      
      When a file truncation op completes, we truncate the in-memory copy of the
      file to match - but we do it whilst still holding the io_lock, the idea
      being to prevent races with other operations.
      
      However, if writeback starts in a worker thread simultaneously with
      truncation (whilst notify_change() is called with i_rwsem locked, writeback
      pays it no heed), it may manage to set PG_writeback bits on the pages that
      will get truncated before afs_setattr_success() manages to call
      truncate_pagecache().  Truncate will then wait for those pages - whilst
      still inside io_lock:
      
          # cat /proc/8837/stack
          [<0>] wait_on_page_bit_common+0x184/0x1e7
          [<0>] truncate_inode_pages_range+0x37f/0x3eb
          [<0>] truncate_pagecache+0x3c/0x53
          [<0>] afs_setattr_success+0x4d/0x6e
          [<0>] afs_wait_for_operation+0xd8/0x169
          [<0>] afs_do_sync_operation+0x16/0x1f
          [<0>] afs_setattr+0x1fb/0x25d
          [<0>] notify_change+0x2cf/0x3c4
          [<0>] do_truncate+0x7f/0xb2
          [<0>] do_sys_ftruncate+0xd1/0x104
          [<0>] do_syscall_64+0x2d/0x3a
          [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The writeback operation, however, stalls indefinitely because it needs to
      get the io_lock to proceed:
      
          # cat /proc/5940/stack
          [<0>] afs_get_io_locks+0x58/0x1ae
          [<0>] afs_begin_vnode_operation+0xc7/0xd1
          [<0>] afs_store_data+0x1b2/0x2a3
          [<0>] afs_write_back_from_locked_page+0x418/0x57c
          [<0>] afs_writepages_region+0x196/0x224
          [<0>] afs_writepages+0x74/0x156
          [<0>] do_writepages+0x2d/0x56
          [<0>] __writeback_single_inode+0x84/0x207
          [<0>] writeback_sb_inodes+0x238/0x3cf
          [<0>] __writeback_inodes_wb+0x68/0x9f
          [<0>] wb_writeback+0x145/0x26c
          [<0>] wb_do_writeback+0x16a/0x194
          [<0>] wb_workfn+0x74/0x177
          [<0>] process_one_work+0x174/0x264
          [<0>] worker_thread+0x117/0x1b9
          [<0>] kthread+0xec/0xf1
          [<0>] ret_from_fork+0x1f/0x30
      
      and thus deadlock has occurred.
      
      Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact
      that the caller is holding i_rwsem doesn't preclude more pages being
      dirtied through an mmap'd region.
      
      Fix this by:
      
       (1) Use the vnode validate_lock to mediate access between afs_setattr()
           and afs_writepages():
      
           (a) Exclusively lock validate_lock in afs_setattr() around the whole
           	 RPC operation.
      
           (b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), try to
           	 shared-lock validate_lock and return immediately if we can't
           	 get it.
      
           (c) If WB_SYNC_ALL is set, wait for the lock.
      
           The validate_lock is also used to validate a file and to zap its cache
           if the file was altered by a third party, so it's probably a good fit
           for this.
      
       (2) Move the truncation outside of the io_lock in setattr, using the same
           hook as is used for local directory editing.
      
           This requires the old i_size to be retained in the operation record,
           since we still commit the revised status to the inode members inside
           the io_lock, but we need to know whether we reduced the file size.
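
      A sketch of the afs_writepages() entry check described in (1)(b) and
      (1)(c) - illustrative only, with the real region walk stood in for by
      generic_writepages().

          static int afs_writepages(struct address_space *mapping,
          			  struct writeback_control *wbc)
          {
          	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
          	int ret;

          	/* Don't block background writeback behind a truncate that
          	 * holds validate_lock exclusively - just skip this pass. */
          	if (wbc->sync_mode == WB_SYNC_ALL)
          		down_read(&vnode->validate_lock);
          	else if (!down_read_trylock(&vnode->validate_lock))
          		return 0;

          	ret = generic_writepages(mapping, wbc);	/* stands in for the region walk */

          	up_read(&vnode->validate_lock);
          	return ret;
          }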
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 21 August 2020, 3 commits
  6. 28 June 2020, 1 commit
  7. 21 June 2020, 1 commit
    • afs: Fix hang on rmmod due to outstanding timer · 5481fc6e
      David Howells committed
      The fileserver probe timer, net->fs_probe_timer, isn't cancelled when
      the kafs module is being removed and so the count it holds on
      net->servers_outstanding doesn't get dropped.
      
      This causes rmmod to wait forever.  The hung process shows a stack like:
      
      	afs_purge_servers+0x1b5/0x23c [kafs]
      	afs_net_exit+0x44/0x6e [kafs]
      	ops_exit_list+0x72/0x93
      	unregister_pernet_operations+0x14c/0x1ba
      	unregister_pernet_subsys+0x1d/0x2a
      	afs_exit+0x29/0x6f [kafs]
      	__do_sys_delete_module.isra.0+0x1a2/0x24b
      	do_syscall_64+0x51/0x95
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix this by:
      
       (1) Attempting to cancel the probe timer and, if successful, drop the
           count that the timer was holding.
      
       (2) Make the timer function just drop the count and not schedule the
           prober if the afs portion of net namespace is being destroyed.
      
      Also, whilst we're at it, make the following changes:
      
       (3) Initialise net->servers_outstanding to 1 and decrement it before
           waiting on it so that it doesn't generate wake up events by being
           decremented to 0 until we're cleaning up.
      
       (4) Switch the atomic_dec() on ->servers_outstanding for ->fs_timer in
           afs_purge_servers() to use the helper function for that.
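
      Points (1) and (2) come down to roughly the following; this is
      illustrative, and afs_stop_fs_probe_timer() is a hypothetical helper
      name used for the sketch.

          static void afs_fs_probe_timer(struct timer_list *timer)
          {
          	struct afs_net *net = container_of(timer, struct afs_net, fs_probe_timer);

          	/* (2) If the namespace is dying, don't schedule the prober -
          	 * just hand back the count the timer was holding. */
          	if (!net->live || !queue_work(afs_wq, &net->fs_prober))
          		afs_dec_servers_outstanding(net);
          }

          static void afs_stop_fs_probe_timer(struct afs_net *net)
          {
          	/* (1) A timer cancelled while still pending never ran and never
          	 * passed its count on, so drop it here. */
          	if (del_timer_sync(&net->fs_probe_timer))
          		afs_dec_servers_outstanding(net);
          }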
      
      Fixes: f6cbb368 ("afs: Actively poll fileservers to maintain NAT or firewall openings")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 17 June 2020, 1 commit
    • afs: Fix silly rename · b6489a49
      David Howells committed
      Fix AFS's silly rename by the following means:
      
       (1) Set the destination directory in afs_do_silly_rename() so as to avoid
           misbehaviour and indicate that the directory data version will
           increment by 1 so as to avoid warnings about unexpected changes in the
           DV.  Also indicate that the ctime should be updated to avoid xfstest
           grumbling.
      
       (2) Note when the server indicates that a directory changed more than we
           expected (AFS_OPERATION_DIR_CONFLICT), indicating a conflict with a
           third party change.  This is checked on successful completion of
           unlink and rename.
      
           The problem is that the FS.RemoveFile RPC op doesn't report the status
           of the unlinked file, though YFS.RemoveFile2 does.  This can be
           mitigated by the assumption that if the directory DV cranked by
           exactly 1, we can be sure we removed one link from the file; further,
           ordinarily in AFS, files cannot be hardlinked across directories, so
           if we reduce nlink to 0, the file is deleted.
      
           However, if the directory DV jumps by more than 1, we cannot know if a
           third party intervened by adding or removing a link on the file we
           just removed a link from.
      
           The same also goes for any vnode that is at the destination of the
           FS.Rename RPC op.
      
       (3) Make afs_vnode_commit_status() apply the nlink drop inside the cb_lock
           section along with the other attribute updates if ->op_unlinked is set
           on the descriptor for the appropriate vnode.
      
       (4) Issue a follow up status fetch to the unlinked file in the event of a
           third party conflict that makes it impossible for us to know if we
           actually deleted the file or not.
      
       (5) Provide a flag, AFS_VNODE_SILLY_DELETED, to make afs_getattr() lie to
           the user about the nlink of a silly deleted file so that it appears as
           0, not 1.
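
      Point (5) amounts to something like the following in afs_getattr(); a
      sketch only, with the flag test and adjustment simplified.

          int afs_getattr(const struct path *path, struct kstat *stat,
          		u32 request_mask, unsigned int query_flags)
          {
          	struct inode *inode = d_inode(path->dentry);
          	struct afs_vnode *vnode = AFS_FS_I(inode);

          	generic_fillattr(inode, stat);

          	/* A silly-deleted file should look gone to userspace */
          	if (test_bit(AFS_VNODE_SILLY_DELETED, &vnode->flags) &&
          	    stat->nlink > 0)
          		stat->nlink -= 1;

          	return 0;
          }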
      
      Found with the generic/035 and generic/084 xfstests.
      
      Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
  9. 16 June 2020, 3 commits
  10. 15 June 2020, 2 commits
    • afs: Fix truncation issues and mmap writeback size · 793fe82e
      David Howells committed
      Fix the following issues:
      
       (1) Fix writeback to reduce the size of a store operation to i_size,
           effectively discarding the extra data.
      
           The problem comes when afs_page_mkwrite() records that a page is about
           to be modified by mmap().  It doesn't know what bits of the page are
           going to be modified, so it records the whole page as being dirty
           (this is stored in page->private as start and end offsets).
      
           Without this, the marshalling for the store to the server extends the
           size of the file to the end of the page (in afs_fs_store_data() and
           yfs_fs_store_data()).
      
       (2) Fix setattr to actually truncate the pagecache, thereby clearing
           the discarded part of a file.
      
       (3) Fix setattr to check that the new size is okay and to disable
           ATTR_SIZE if i_size wouldn't change.
      
       (4) Force i_size to be updated as the result of a truncate.
      
       (5) Don't truncate if ATTR_SIZE is not set.
      
       (6) Call pagecache_isize_extended() if the file was enlarged.
      
      Note that truncate_setsize() isn't used because the setting of i_size is
      done inside afs_vnode_commit_status() under the vnode->cb_lock.
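
      Taken together, the size-related parts of setattr behave roughly like
      the sketch below; afs_apply_setattr_size() is a hypothetical helper name
      and the real code folds this into the operation's success handling.

          static void afs_apply_setattr_size(struct afs_vnode *vnode,
          				   loff_t old_size, loff_t new_size)
          {
          	struct inode *inode = &vnode->vfs_inode;

          	if (new_size < old_size)
          		truncate_pagecache(inode, new_size);	/* (2) drop discarded pages */
          	else if (new_size > old_size)
          		pagecache_isize_extended(inode, old_size, new_size);	/* (6) */
          }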
      
      Found with the generic/029 and generic/393 xfstests.
      
      Fixes: 31143d5d ("AFS: implement basic file write support")
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Concoct ctimes · da8d0755
      David Howells committed
      The in-kernel afs filesystem ignores ctime because the AFS fileserver
      protocol doesn't support ctimes.  This, however, causes various xfstests to
      fail.
      
      Work around this by:
      
       (1) Setting ctime to attr->ia_ctime in afs_setattr().
      
       (2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings.
      
       (3) Setting the ctime on the target file from the server mtime when
           creating a hard link to it.
      
       (4) Setting the ctime on directories from their revised mtimes when
           renaming/moving a file.
      
      Found by the generic/221 and generic/309 xfstests.
      Signed-off-by: David Howells <dhowells@redhat.com>
  11. 10 June 2020, 1 commit
  12. 09 June 2020, 1 commit
  13. 04 June 2020, 11 commits
    • afs: Adjust the fileserver rotation algorithm to reprobe/retry more quickly · 8409f67b
      David Howells committed
      Adjust the fileserver rotation algorithm so that when we've tried all the
      addresses on a server (cumulatively over multiple operations) and run out
      of untried addresses, we immediately reprobe all of that server's
      interfaces and retry the op at least once before moving on to the next
      server.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Show a bit more server state in /proc/net/afs/servers · 32275d3f
      David Howells committed
      Display more information about the state of a server record, including the
      flags, rtt and break counter plus the probe state for each server in
      /proc/net/afs/servers.
      
      Rearrange the server flags a bit to make them easier to read at a glance in
      the proc file.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Don't use probe running state to make decisions outside probe code · f3c130e6
      David Howells committed
      Don't use the running state for fileserver probes to make decisions about
      which server to use as the state is cleared at the start of a probe and
      also intermediate values might be misleading.
      
      Instead, add a separate 'latest known' rtt in the afs_server struct and a
      flag to indicate if the server is known to be responding and update these
      as and when we know what to change them to.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix the by-UUID server tree to allow servers with the same UUID · 3c4c4075
      David Howells committed
      Whilst it shouldn't happen, it is possible for multiple fileservers to
      share a UUID, particularly if an entire cell has been duplicated, UUIDs and
      all.  In such a case, it's not necessarily possible to map the effect of
      the CB.InitCallBackState3 incoming RPC to a specific server unambiguously
      by UUID and thus to a specific cell.
      
      Indeed, there's a problem whereby multiple server records may need to
      occupy the same spot in the rb_tree rooted in the afs_net struct.
      
      Fix this by allowing servers to form a list, with the head of the list in
      the tree.  When the front entry in the list is removed, the second in the
      list just replaces it.  afs_init_callback_state() then just goes down the
      line, poking each server in the list.
      
      This means that some servers will be unnecessarily poked, unfortunately.
      An alternative would be to route by call parameters.
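
      Structurally, that means the by-UUID tree node carries a small chain of
      same-UUID servers, something like the sketch below; the field names are
      illustrative assumptions.

          struct afs_server {
          	struct rb_node		uuid_rb;	/* Link in the by-UUID tree (chain head only) */
          	struct afs_server	*uuid_next;	/* Next server sharing this UUID */
          	struct afs_server	*uuid_prev;
          	uuid_t			uuid;
          	/* ... */
          };

      afs_init_callback_state() would then walk uuid_next from the tree node
      it finds, poking each server in turn.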
      Reported-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
    • afs: Reorganise volume and server trees to be rooted on the cell · 20325960
      David Howells committed
      Reorganise afs_volume objects such that they're in a tree keyed on volume
      ID, rooted on an afs_cell object rather than being in multiple trees,
      each of which is rooted on an afs_server object.
      
      afs_server structs become per-cell and acquire a pointer to the cell.
      
      The process of breaking a callback then starts with finding the server by
      its network address, following that to the cell and then looking up each
      volume ID in the volume tree.
      
      This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web
      and allows those structs and the code for maintaining them to be simplified
      or removed.
      
      It does make a couple of things a bit more tricky, though:
      
       (1) Operations now start with a volume, not a server, so there can be more
           than one answer as to whether or not the server we'll end up using
           supports the FS.InlineBulkStatus RPC.
      
       (2) CB RPC operations that specify the server UUID.  There's still a tree
           of servers by UUID on the afs_net struct, but the UUIDs in it aren't
           guaranteed unique.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Add a tracepoint to track the lifetime of the afs_volume struct · cca37d45
      David Howells committed
      Add a tracepoint to track the lifetime of the afs_volume struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Detect cell aliases 1 - Cells with root volumes · 8a070a96
      David Howells committed
      Put in the first phase of cell alias detection.  This part handles alias
      detection for cells that have root.cell volumes (which is expected to be
      likely).
      
      When a cell becomes newly active, it is probed for its root.cell volume,
      and if it has one, this volume is compared against other root.cell volumes
      to find out whether the lists of fileserver UUIDs have any entries in
      common - and, if so, whether the address lists of those fileservers have
      any addresses in common.  If they do, the new cell is adjudged to be an
      alias of the old cell and the old cell is used instead.
      
      Comparing is aided by the server list in struct afs_server_list being
      sorted in UUID order and the addresses in the fileserver address lists
      being sorted in address order.
      
      The cell then retains the afs_volume object for the root.cell volume, even
      if it's not mounted, for future alias checking.
      
      This is necessary because:
      
       (1) Whilst fileservers have UUIDs that are meant to be globally unique, in
           practice they are not because cells get cloned without changing the
           UUIDs - so afs_server records need to be per cell.
      
       (2) Sometimes the DNS is used to make cell aliases - but if we don't know
           they're the same, we may end up with multiple superblocks and multiple
           afs_server records for the same thing, impairing our ability to
           deliver callback notifications of third party changes.
      
       (3) The fileserver RPC API doesn't contain the cell name, so it can't tell
           us which cell it's notifying and can't see that a change made to one
           cell should notify the same client that's also accessed as the other
           cell.
      Reported-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Implement client support for the YFSVL.GetCellName RPC op · c3e9f888
      David Howells committed
      Implement client support for the YFSVL.GetCellName RPC operation by which
      YFS permits the canonical cell name to be queried from a VL server.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Retain more of the VLDB record for alias detection · 194d28cf
      David Howells committed
      Save more bits from the volume location database record obtained for a
      server so that we can use this information in cell alias detection.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Don't get epoch from a server because it may be ambiguous · 44746355
      David Howells committed
      Don't get the epoch from a server, particularly one that we're looking up
      by UUID, as UUIDs may be ambiguous and may map to more than one server - so
      we can't draw any conclusions from it.
      Reported-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Build an abstraction around an "operation" concept · e49c7b2f
      David Howells committed
      Turn the afs_operation struct into the main way that most fileserver
      operations are managed.  Various things are added to the struct, including
      the following:
      
       (1) All the parameters and results of the relevant operations are moved
           into it, removing corresponding fields from the afs_call struct.
           afs_call gets a pointer to the op.
      
       (2) The target volume is made the main focus of the operation, rather than
           the target vnode(s), and a bunch of op->vnode->volume references
           become op->volume instead.
      
       (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
           in most operations.  The vnode record (struct afs_vnode_param)
           contains:
      
      	- The vnode pointer.
      
      	- The fid of the vnode to be included in the parameters or that was
                returned in the reply (eg. FS.MakeDir).
      
      	- The status and callback information that may be returned in the
           	  reply about the vnode.
      
      	- Callback break and data version tracking for detecting
                simultaneous third-party changes.
      
       (4) Pointers to dentries to be updated with new inodes.
      
       (5) An operations table pointer.  The table includes pointers to functions
           for issuing AFS and YFS-variant RPCs, handling the success and abort
           of an operation and handling post-I/O-lock local editing of a
           directory.
      
      To make this work, the following function restructuring is made:
      
       (A) The rotation loop that issues calls to fileservers that can be found
           in each function that wants to issue an RPC (such as afs_mkdir()) is
           extracted out into common code, in a new file called fs_operation.c.
      
       (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
           a much smaller piece of code that allocates an operation, sets the
           parameters and then calls out to the common code to do the actual
           work.
      
       (C) The code for handling the success and failure of an operation is
           moved into operation functions (as (5) above) and these are called
           from the core code at appropriate times.
      
       (D) The pseudo inode getting stuff used by the dynamic root code is moved
           over into dynroot.c.
      
       (E) struct afs_iget_data is absorbed into the operation struct and
           afs_iget() expects to be given an op pointer and a vnode record.
      
       (F) Point (E) doesn't work for the root dir of a volume, but we know the
           FID in advance (it's always vnode 1, unique 1), so a separate inode
           getter, afs_root_iget(), is provided to special-case that.
      
       (G) The inode status init/update functions now also take an op and a vnode
           record.
      
       (H) The RPC marshalling functions now, for the most part, just take an
           afs_operation struct as their only argument.  All the data they need
           is held there.  The result delivery functions write their answers
           there as well.
      
       (I) The call is attached to the operation and then the operation core does
           the waiting.
      
      And then the new operation code is, for the moment, made to just initialise
      the operation, get the appropriate vnode I/O locks and do the same rotation
      loop as before.
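
      In outline, the descriptor and its ops table look something like this;
      the member names follow the text above, but the exact layout is an
      assumption rather than the kernel's definition.

          struct afs_operation_ops {
          	void (*issue_afs_rpc)(struct afs_operation *op);
          	void (*issue_yfs_rpc)(struct afs_operation *op);
          	void (*success)(struct afs_operation *op);
          	void (*aborted)(struct afs_operation *op);
          	void (*edit_dir)(struct afs_operation *op);
          };

          struct afs_operation {
          	struct afs_volume		*volume;	/* (2) main focus of the op */
          	struct afs_vnode_param		file[2];	/* (3) vnode(s) involved */
          	const struct afs_operation_ops	*ops;		/* (5) per-op function table */
          	struct afs_call			*call;		/* (I) attached call */
          	int				error;
          };

      A caller such as afs_mkdir() then just allocates an operation, fills in
      op->file[0] and the parameters, and hands it to the common code to
      rotate through the fileservers and wait.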
      
      This lays the foundation for the following changes in the future:
      
       (*) Overhauling the rotation (again).
      
       (*) Support for asynchronous I/O, where the fileserver rotation must be
           done asynchronously also.
      Signed-off-by: David Howells <dhowells@redhat.com>
  14. 31 May 2020, 7 commits
    • afs: Rename struct afs_fs_cursor to afs_operation · a310082f
      David Howells committed
      As a prelude to implementing asynchronous fileserver operations in the afs
      filesystem, rename struct afs_fs_cursor to afs_operation.
      
      This struct is going to form the core of the operation management and is
      going to acquire more members later.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Remove the error argument from afs_protocol_error() · 7126ead9
      David Howells committed
      Remove the error argument from afs_protocol_error() as it's always
      -EBADMSG.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Set error flag rather than return error from file status decode · 38355eec
      David Howells committed
      Set a flag in the call struct to indicate an unmarshalling error rather
      than return and handle an error from the decoding of file statuses.  This
      flag is checked on a successful return from the delivery function.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Make callback processing more efficient. · 8230fd82
      David Howells committed
      afs_vol_interest objects represent the volume IDs currently being accessed
      from a fileserver.  These hold lists of afs_cb_interest objects that
      represent the superblocks using that volume ID on that server.
      
      When a callback notification from the server telling of a modification by
      another client arrives, the volume ID specified in the notification is
      looked up in the server's afs_vol_interest list.  Through the
      afs_cb_interest list, the relevant superblocks can be iterated over and the
      specific inode looked up and marked in each one.
      
      Make the following efficiency improvements:
      
       (1) Hold rcu_read_lock() over the entire processing rather than locking it
           each time.
      
       (2) Do all the callbacks for each vid together rather than individually.
           Each volume then only needs to be looked up once.
      
       (3) afs_vol_interest objects are now stored in an rb_tree rather than a
           flat list to reduce the lookup step count.
      
       (4) afs_vol_interest lookup is now done with RCU, but because it's in an
           rb_tree which may rotate under us, a seqlock is used so that if it
           changes during the walk, we repeat the walk with a lock held.
      
      With this and the preceding patch which adds RCU-based lookups in the inode
      cache, target volumes/vnodes can be taken without the need to take any
      locks, except on the target itself.
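
      Point (4) combines a seqlock with the RCU walk; a simplified shape of
      that pattern is sketched below (the caller holds rcu_read_lock() per
      point (1), and the lock and field names here are assumptions).

          static struct afs_vol_interest *afs_find_vol_interest(struct afs_server *server,
          						      afs_volid_t vid)
          {
          	struct afs_vol_interest *vi = NULL;
          	struct rb_node *p;
          	int seq = 0;

          	do {
          		/* First pass is lockless; retry with the lock held if
          		 * the tree changed under us. */
          		read_seqbegin_or_lock(&server->cb_lock, &seq);
          		vi = NULL;
          		for (p = server->cb_volumes.rb_node; p; ) {
          			struct afs_vol_interest *x =
          				rb_entry(p, struct afs_vol_interest, srv_node);
          			if (vid < x->vid) {
          				p = p->rb_left;
          			} else if (vid > x->vid) {
          				p = p->rb_right;
          			} else {
          				vi = x;
          				break;
          			}
          		}
          	} while (need_seqretry(&server->cb_lock, seq));
          	done_seqretry(&server->cb_lock, seq);
          	return vi;
          }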
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Actively poll fileservers to maintain NAT or firewall openings · f6cbb368
      David Howells committed
      When an AFS client accesses a file, it receives a limited-duration callback
      promise that the server will notify it if another client changes a file.
      This callback duration can be a few hours in length.
      
      If a client mounts a volume and then an application prevents it from being
      unmounted, say by chdir'ing into it, but then does nothing for some time,
      the rxrpc_peer record will expire and rxrpc-level keepalive will cease.
      
      If there is NAT or a firewall between the client and the server, the route
      back for the server may close after a comparatively short duration, meaning
      that attempts by the server to notify the client may then bounce.
      
      The client, however, may (so far as it knows) still have a valid unexpired
      promise and will then rely on its cached data and will not see changes made
      on the server by a third party until it incidentally rechecks the status or
      the promise needs renewal.
      
      To deal with this, the client needs to regularly probe the server.  This
      has two effects: firstly, it keeps a route open back for the server, and
      secondly, it causes the server to disgorge any notifications that got
      queued up because they couldn't be sent.
      
      Fix this by adding a mechanism to emit regular probes.
      
      Two levels of probing are made available: Under normal circumstances the
      'slow' queue will be used for a fileserver - this just probes the preferred
      address once every 5 mins or so; however, if a server fails to respond to any
      probes, the server will shift to the 'fast' queue from which all its
      interfaces will be probed every 30s.  When it finally responds, the record
      will switch back to the slow queue.
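
      The slow/fast split boils down to picking a reprobe interval based on
      whether the server has been responding, roughly as below; the flag name
      and helper are assumptions for illustration.

          static unsigned long afs_probe_reschedule_delay(struct afs_server *server)
          {
          	/* Responsive servers get the slow queue (~5 min on the preferred
          	 * address); unresponsive ones are probed on all interfaces every
          	 * 30 seconds until one answers. */
          	return test_bit(AFS_SERVER_FL_RESPONDING, &server->flags) ?
          		5 * 60 * HZ : 30 * HZ;
          }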
      
      Further notes:
      
       (1) Probing is now no longer driven from the fileserver rotation
           algorithm.
      
       (2) Probes are dispatched to all interfaces on a fileserver when an
           afs_server object is set up to record it.
      
       (3) The afs_server object is removed from the probe queues when we start
           to probe it.  afs_is_probing_server() returns true if it's not listed
           - ie. it's undergoing probing.
      
       (4) The afs_server object is added back on to the probe queue when the
           final outstanding probe completes, but the probed_at time is set when
           we're about to launch a probe so that it's not dependent on the probe
           duration.
      
       (5) The timer and the work item added for this must be handed a count on
           net->servers_outstanding, which they hand on or release.  This makes
           sure that network namespace cleanup waits for them.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Split the usage count on struct afs_server · 977e5f8e
      David Howells committed
      Split the usage count on the afs_server struct to have an active count that
      registers who's actually using it separately from the reference count on
      the object.
      
      This allows a future patch to dispatch polling probes without advancing the
      "unuse" time into the future each time we emit a probe, which would
      otherwise prevent unused server records from expiring.
      
      Included in this:
      
       (1) The latter part of afs_destroy_server() in which the RCU destruction
           of afs_server objects is invoked and the outstanding server count is
           decremented is split out into __afs_put_server().
      
       (2) afs_put_server() now calls __afs_put_server() rather than setting the
           management timer.
      
       (3) The calls begun by afs_fs_give_up_all_callbacks() and
           afs_fs_get_capabilities() can now take a ref on the server record, so
           afs_destroy_server() can just drop its ref and needn't wait for the
           completion of these calls.  They'll put the ref when they're done.
      
       (4) Because of (3), afs_fs_probe_done() no longer needs to wake up
           afs_destroy_server() with server->probe_outstanding.
      
       (5) afs_gc_servers can be simplified.  It only needs to check if
           server->active is 0 rather than playing games with the refcount.
      
       (6) afs_manage_servers() can propose a server for gc if usage == 0 rather
           than if ref == 1.  The gc is effected by (5).
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Use the serverUnique field in the UVLDB record to reduce rpc ops · 81006805
      David Howells committed
      The U-version VLDB volume record retrieved by the VL.GetEntryByNameU rpc op
      carries a change counter (the serverUnique field) for each fileserver
      listed in the record as backing that volume.  This is incremented whenever
      the registration details for a fileserver change (such as its address
      list).  Note that the same value will be seen in all UVLDB records that
      refer to that fileserver.
      
      This should be checked before calling the VL server to re-query the address
      list for a fileserver.  If it's the same, there's no point doing the query.
      Reported-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
  15. 24 April 2020, 2 commits