Commits · 0552210997badb6a60740a26ff9d976a416510f0 · openanolis / cloud-kernel

29 May, 2018 23 commits

Btrfs: don't BUG_ON() in btrfs_truncate_inode_items() · 05522109

Committed by Omar Sandoval on May 11, 2018

btrfs_free_extent() can fail because of ENOMEM. There's no reason to
panic here, we can just abort the transaction.
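
The pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel change itself: `fake_trans`, `fake_free_extent()`, `fake_abort_transaction()` and `truncate_step()` are hypothetical stand-ins invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transaction handle; not a kernel type. */
struct fake_trans {
        bool aborted;
        int abort_err;
};

/* Hypothetical helper: record the failure instead of panicking. */
static void fake_abort_transaction(struct fake_trans *trans, int err)
{
        assert(err < 0);
        trans->aborted = true;
        trans->abort_err = err;
}

/* Hypothetical fallible operation: fails with -ENOMEM when asked to. */
static int fake_free_extent(bool simulate_enomem)
{
        return simulate_enomem ? -ENOMEM : 0;
}

/* Abort the transaction and return the error rather than crashing. */
static int truncate_step(struct fake_trans *trans, bool simulate_enomem)
{
        int ret = fake_free_extent(simulate_enomem);

        if (ret) {
                fake_abort_transaction(trans, ret);
                return ret;
        }
        return 0;
}
```

Calling `truncate_step()` with `simulate_enomem` set marks the handle as aborted and hands `-ENOMEM` back to the caller, which is the recoverable behaviour the message argues for.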

Fixes: f4b9aa8d ("btrfs_truncate")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

05522109

Btrfs: fix error handling in btrfs_truncate_inode_items() · fd86a3a3

Committed by Omar Sandoval on May 11, 2018

btrfs_truncate_inode_items() uses two variables for error handling, ret
and err. These are not handled consistently, leading to a couple of
bugs.

- Errors from btrfs_del_items() are handled but not propagated to the
  caller
- If btrfs_run_delayed_refs() fails and aborts the transaction, we
  continue running

Just use ret everywhere and simplify things a bit, fixing both of these
issues.
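
As a rough userspace sketch of the "single status variable" idea (the names `step_fn`, `run_steps()`, `step_ok()` and `step_fail()` are hypothetical and do not come from the btrfs sources), the first failure both stops the sequence and reaches the caller:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical step callback: returns 0 on success or a negative errno. */
typedef int (*step_fn)(void *ctx);

/* Example steps that count how often they were called through ctx. */
static int step_ok(void *ctx)
{
        ++*(int *)ctx;
        return 0;
}

static int step_fail(void *ctx)
{
        ++*(int *)ctx;
        return -EIO;
}

/*
 * Run the steps in order using a single status variable. The first failure
 * stops the sequence and is returned to the caller unchanged.
 */
static int run_steps(const step_fn *steps, size_t nr_steps, void *ctx)
{
        int ret = 0;
        size_t i;

        assert(steps != NULL || nr_steps == 0);
        for (i = 0; i < nr_steps; i++) {
                ret = steps[i](ctx);
                if (ret < 0)
                        break;
        }
        return ret;
}
```

With the steps `{ step_ok, step_fail, step_ok }`, `run_steps()` returns `-EIO` after two calls; the third step never runs.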

Fixes: 79787eaa ("btrfs: replace many BUG_ONs with proper error handling")
Fixes: 1262133b ("Btrfs: account for crcs in delayed ref processing")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

fd86a3a3

Btrfs: update stale comments referencing vmtruncate() · d1342aad

Committed by Omar Sandoval on May 11, 2018

Commit a41ad394 ("Btrfs: convert to the new truncate sequence")
changed btrfs_setsize() to call truncate_setsize() instead of
vmtruncate() but didn't update the comment above it. truncate_setsize()
never fails (the IS_SWAPFILE() check happens elsewhere), so remove the
comment.

Additionally, the comment above btrfs_page_mkwrite() references
vmtruncate(), but truncate_setsize() does the size write and page
locking now.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

d1342aad

btrfs: rename btrfs_update_iflags to reflect which flags it touches · 7b6a221e

Committed by David Sterba on Mar 26, 2018

The btrfs inode flag flavour is now simply called 'inode flags' and the
VFS inode flags are i_flags.
Signed-off-by: David Sterba <dsterba@suse.com>

7b6a221e

btrfs: Unexport and rename btrfs_invalidate_inodes · 20a68004

Committed by Nikolay Borisov on Apr 27, 2018

This function is no longer used outside of inode.c so just make it
static. At the same time give it a more fitting name, since it's not
really invalidating the inodes but just calling d_prune_aliases. Last,
but not least - move the function above the sole caller to avoid
introducing yet-another-pointless forward declaration.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

20a68004

btrfs: replace waitqueue_active with cond_wake_up · 093258e6

Committed by David Sterba on Feb 26, 2018

Use the wrappers and reduce the amount of low-level details about the
waitqueue management.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

093258e6

btrfs: take the last remnants of ->d_fsdata use out · 7a1b1e70

Committed by Al Viro on May 13, 2018

[spotted while going through ->d_fsdata handling around d_splice_alias();
don't really care which tree that goes through]

The only thing even looking at ->d_fsdata in there (since 2012)
had been kfree(dentry->d_fsdata) in btrfs_dentry_delete().  Which,
incidentally, is all btrfs_dentry_delete() does.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

7a1b1e70

btrfs: Add assert in __btrfs_del_delalloc_inode · 7c8a0d36

Committed by Nikolay Borisov on Apr 27, 2018

The invariant is that when nr_delalloc_inodes is 0 then the root
mustn't have any inodes on its delalloc inodes list.
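
A minimal sketch of such an invariant check, using a hypothetical miniature of the data structure (`fake_root`, `node`, `fake_add()` and `fake_del_first()` are invented for this example and are not kernel code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node and root; the real kernel types differ. */
struct node {
        struct node *next;
};

struct fake_root {
        struct node *delalloc_head; /* singly linked list, NULL when empty */
        unsigned int nr_delalloc;   /* number of entries on the list */
};

/* Push an entry onto the list and bump the counter. */
static void fake_add(struct fake_root *root, struct node *n)
{
        n->next = root->delalloc_head;
        root->delalloc_head = n;
        root->nr_delalloc++;
}

/* Remove the head entry and check the counter/list invariant. */
static struct node *fake_del_first(struct fake_root *root)
{
        struct node *n = root->delalloc_head;

        if (!n)
                return NULL;
        root->delalloc_head = n->next;
        n->next = NULL;
        root->nr_delalloc--;
        /* Invariant: a zero count implies an empty list. */
        assert(root->nr_delalloc != 0 || root->delalloc_head == NULL);
        return n;
}
```

The assertion costs nothing when the bookkeeping is correct and fails loudly the moment the counter and the list disagree.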
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

7c8a0d36

btrfs: Unexport btrfs_alloc_delalloc_work · 3a2f8c07

Committed by Nikolay Borisov on Apr 24, 2018

It's used only in inode.c so it makes no sense to have it exported. Also
move the definition of btrfs_delalloc_work to inode.c since it's used
only in this file.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

3a2f8c07

btrfs: Remove delayed_iput member from btrfs_delalloc_work · 076da91c

Committed by Nikolay Borisov on Apr 23, 2018

When allocating a delalloc work we are always setting the delayed_iput
to 0. So remove the delay_iput member of btrfs_delalloc_work, as a
result also remove it as a parameter from btrfs_alloc_delalloc_work
since it's not used anymore.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

076da91c

btrfs: Remove delay_iput parameter from __start_delalloc_inodes · 4fbb5147

Committed by Nikolay Borisov on Apr 23, 2018

It's always set to 0 so remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ rename to start_delalloc_inodes ]
Signed-off-by: David Sterba <dsterba@suse.com>

4fbb5147

btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes · 76f32e24

Committed by Nikolay Borisov on Apr 23, 2018

It's always set to 0, so just remove it and collapse the constant value
to the only function we are passing it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

76f32e24

btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots · 82b3e53b

Committed by Nikolay Borisov on Apr 23, 2018

This parameter was introduced alongside the function in
eb73c1b7 ("Btrfs: introduce per-subvolume delalloc inode list") to
avoid deadlocks since this function was used in the transaction commit
path. However, commit 8d875f95 ("btrfs: disable strict file flushes
for renames and truncates") removed that usage, rendering the parameter
obsolete.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

82b3e53b

btrfs: Remove btrfs_wait_and_free_delalloc_work · 40012f96

Committed by Nikolay Borisov on Apr 19, 2018

This function is called from only 1 place and is effectively a wrapper
over wait_completion/kfree. It doesn't really bring any value having
those two calls in a separate function. Just open code it and remove it.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

40012f96

btrfs: Remove tree argument from extent_writepages · 8ae225a8

Committed by Nikolay Borisov on Apr 19, 2018

It can be directly referenced from the passed address_space so do that.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

8ae225a8

btrfs: Use list_empty instead of list_empty_careful · 81f1d390

Committed by Nikolay Borisov on Apr 19, 2018

list_empty_careful usually is a signal of something tricky going on. Its
usage in btrfs is actually not needed since both lists it's used on are
local to a function and cannot be modified concurrently. So switch to
plain list_empty. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

81f1d390

btrfs: Remove redundant tree argument from extent_readpages · 2a3ff0ad

Committed by Nikolay Borisov on Apr 19, 2018

This function is called only from btrfs_readpage and is already passed
the mapping. Simplify its signature by moving the code obtaining
reference to the extent tree in the function. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2a3ff0ad

btrfs: Sink extent_tree arguments in try_release_extent_mapping · 477a30ba

Committed by Nikolay Borisov on Apr 19, 2018

This function already gets the page from which the two extent trees
are referenced. Simplify its signature by moving the code getting the
trees inside the function. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

477a30ba

btrfs: Allow rmdir(2) to delete an empty subvolume · a79a464d

Committed by Misono Tomohiro on Apr 18, 2018

Change the behavior of rmdir(2) and allow it to delete an empty
subvolume by using btrfs_delete_subvolume() which is used by
btrfs_ioctl_snap_destroy().

This is a change in behaviour and has been requested by users. Deleting
the subvolume by ioctl requires root permissions, while the rmdir way
works with standard tools and syscalls for all users that can access
the subvolume.

The main usecase is to allow 'rm -rf /path/with/subvols' to simply work.
We were not able to find any nasty usability surprises, the intention is
to do the destructive rm. Without allowing rmdir, this would have to be
followed by the ioctl subvolume deletion, which is more of an annoyance.

Implementation details:

The required lock for @dir and inode of @dentry is already acquired in
vfs layer.

We need some check before deleting a subvolume. Permission check is done
in vfs layer, emptiness check is in btrfs_rmdir() and additional check
(i.e. neither the subvolume is a default subvolume nor send is in progress)
is in btrfs_delete_subvolume().

Note that in btrfs_ioctl_snap_destroy(), d_delete() is called after
btrfs_delete_subvolume(). For rmdir(2), d_delete() is called in vfs
layer later.
Tested-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>

a79a464d

btrfs: Factor out the main deletion process from btrfs_ioctl_snap_destroy() · f60a2364

Committed by Misono Tomohiro on Apr 18, 2018

Factor out the second half of btrfs_ioctl_snap_destroy() as
btrfs_delete_subvolume(), which performs some subvolume specific checks
before deletion:

1. send is not in progress
2. the subvolume is not the default subvolume
3. the subvolume does not contain other subvolumes

and the actual deletion process. btrfs_delete_subvolume() requires
inode_lock for both @dir and inode of @dentry. The remaining part of
btrfs_ioctl_snap_destroy() is mainly permission checks.

Note that call of d_delete() is not included in btrfs_delete_subvolume()
as this function will also be used by btrfs_rmdir() to delete an empty
subvolume and in that case d_delete() is called in VFS layer.

As a result, btrfs_unlink_subvol() and may_destroy_subvol()
become static functions. No functional changes.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor comment updates ]
Signed-off-by: David Sterba <dsterba@suse.com>

f60a2364

btrfs: Move may_destroy_subvol() from ioctl.c to inode.c · ec42f167

Committed by Misono Tomohiro on Apr 18, 2018

This is preparatory work to refactor btrfs_ioctl_snap_destroy()
and to allow rmdir(2) to delete an empty subvolume.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor update of the function comment ]
Signed-off-by: David Sterba <dsterba@suse.com>

ec42f167

btrfs: use fs_info for btrfs_handle_em_exist tracepoint · f46b24c9

Committed by David Sterba on Apr 03, 2018

We really want to know to which filesystem the extent map events belong,
but as it cannot be reached from the extent_map pointers, we need to
pass it down the callchain.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f46b24c9

btrfs: Use while loop instead of labels in __endio_write_update_ordered · b25f0d00

Committed by Nikolay Borisov on Apr 11, 2018

Currently __endio_write_update_ordered uses labels to implement
what is essentially a simple while loop. This makes the code more
cumbersome to follow than it actually has to be. No functional
changes. No xfstest regressions were found during testing.
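
To illustrate the kind of rewrite described, here is a small userspace sketch that is not taken from `__endio_write_update_ordered`: the functions `count_chunks_goto()` and `count_chunks_while()` and the `CHUNK` constant are hypothetical. Both forms walk a range in chunks and return the same count.

```c
#include <assert.h>

/* Hypothetical chunk size used only for this illustration. */
#define CHUNK 4096UL

/* Label-based form: the control flow is a loop in disguise. */
static unsigned long count_chunks_goto(unsigned long start, unsigned long end)
{
        unsigned long nr = 0;

        assert(start <= end);
again:
        if (start >= end)
                goto out;
        nr++;
        start += (end - start < CHUNK) ? end - start : CHUNK;
        goto again;
out:
        return nr;
}

/* Equivalent while loop: same behaviour, easier to follow. */
static unsigned long count_chunks_while(unsigned long start, unsigned long end)
{
        unsigned long nr = 0;

        assert(start <= end);
        while (start < end) {
                nr++;
                start += (end - start < CHUNK) ? end - start : CHUNK;
        }
        return nr;
}
```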
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

b25f0d00

24 May, 2018 1 commit

Btrfs: fix error handling in btrfs_truncate() · d5014738

Committed by Omar Sandoval on May 22, 2018

Jun Wu at Facebook reported that an internal service was seeing a return
value of 1 from ftruncate() on Btrfs in some cases. This is coming from
the NEED_TRUNCATE_BLOCK return value from btrfs_truncate_inode_items().

btrfs_truncate() uses two variables for error handling, ret and err.
When btrfs_truncate_inode_items() returns non-zero, we set err to the
return value. However, NEED_TRUNCATE_BLOCK is not an error. Make sure we
only set err if ret is an error (i.e., negative).
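
The difference between the two checks can be sketched in isolation. This is a simplified userspace illustration, not the btrfs_truncate() code: `final_status_buggy()` and `final_status_fixed()` are hypothetical names, and the value 1 for NEED_TRUNCATE_BLOCK is the one reported in this message.

```c
#include <assert.h>
#include <errno.h>

/* Value reported in the commit message: a positive, non-error return. */
#define NEED_TRUNCATE_BLOCK 1

static_assert(NEED_TRUNCATE_BLOCK > 0, "must not look like a negative errno");

/* Buggy pattern: any non-zero return is recorded as an error. */
static int final_status_buggy(int ret)
{
        int err = 0;

        if (ret)
                err = ret;
        return err;
}

/* Fixed pattern: only negative returns are treated as errors. */
static int final_status_fixed(int ret)
{
        int err = 0;

        if (ret < 0)
                err = ret;
        return err;
}
```

With the buggy check, NEED_TRUNCATE_BLOCK leaks out as a return value of 1; with the fixed check it is reported as success, while real errors such as `-ENOMEM` still propagate.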

To reproduce the issue: mount a filesystem with -o compress-force=zstd
and the following program will encounter return value of 1 from
ftruncate:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
        char buf[256] = { 0 };
        int ret;
        int fd;

        fd = open("test", O_CREAT | O_WRONLY | O_TRUNC, 0666);
        if (fd == -1) {
                perror("open");
                return EXIT_FAILURE;
        }

        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                perror("write");
                close(fd);
                return EXIT_FAILURE;
        }

        if (fsync(fd) == -1) {
                perror("fsync");
                close(fd);
                return EXIT_FAILURE;
        }

        ret = ftruncate(fd, 128);
        if (ret) {
                printf("ftruncate() returned %d\n", ret);
                close(fd);
                return EXIT_FAILURE;
        }

        close(fd);
        return EXIT_SUCCESS;
}

Fixes: ddfae63c ("btrfs: move btrfs_truncate_block out of trans handle")
CC: stable@vger.kernel.org # 4.15+
Reported-by: Jun Wu <quark@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

d5014738

17 May, 2018 1 commit

btrfs: Split btrfs_del_delalloc_inode into 2 functions · 2b877331

Committed by Nikolay Borisov on Apr 27, 2018

This is in preparation of fixing delalloc inodes leakage on transaction
abort. Also export the new function.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2b877331

12 May, 2018 1 commit

do d_instantiate/unlock_new_inode combinations safely · 1e2e547a

Committed by Al Viro on May 04, 2018

For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
	lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex.  Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.

	Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode().  All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.

Cc: stable@kernel.org	# 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1e2e547a

19 Apr, 2018 1 commit

btrfs: fix unaligned access in readdir · 92d32170

Committed by David Sterba on Apr 16, 2018

The last update to readdir introduced a temporary buffer to store the
emitted readdir data, but as there are file names of variable length,
there's a lot of unaligned access.

This was observed on a sparc64 machine:

  Kernel unaligned access at TPC[102f3080] btrfs_real_readdir+0x51c/0x718 [btrfs]
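
Why variable-length names lead to unaligned fields, and one portable way to access them, can be sketched in userspace C. This is not the kernel fix: the record layout and the helpers `put_u64()`, `get_u64()` and `pack_entry()` are hypothetical, and `memcpy()` is used here as the generic userspace idiom for unaligned loads and stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value at an arbitrary (possibly unaligned) address. */
static void put_u64(void *dst, uint64_t val)
{
        memcpy(dst, &val, sizeof(val));
}

/* Load a 64-bit value from an arbitrary (possibly unaligned) address. */
static uint64_t get_u64(const void *src)
{
        uint64_t val;

        memcpy(&val, src, sizeof(val));
        return val;
}

/*
 * Hypothetical record layout: an 8-byte number followed by a name of
 * name_len bytes. Returns the offset just past the record.
 */
static size_t pack_entry(unsigned char *buf, size_t off, uint64_t ino,
                         const char *name, size_t name_len)
{
        assert(buf != NULL && name != NULL);
        put_u64(buf + off, ino);
        memcpy(buf + off + sizeof(ino), name, name_len);
        return off + sizeof(ino) + name_len;
}
```

After packing a record with a 3-byte name, the next record starts at offset 11, so its 8-byte field is misaligned; dereferencing it as a `uint64_t *` would trap on strict-alignment machines, whereas `get_u64()` reads it safely.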

Fixes: 23b5ec74 ("btrfs: fix readdir deadlock with pagefault")
CC: stable@vger.kernel.org # 4.14+
Reported-and-tested-by: René Rebe <rene@exactcode.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>

92d32170

12 Apr, 2018 1 commit

btrfs: replace GPL boilerplate by SPDX -- sources · c1d7c514

Committed by David Sterba on Apr 03, 2018

Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.
Signed-off-by: David Sterba <dsterba@suse.com>

c1d7c514

31 Mar, 2018 9 commits

btrfs: qgroup: Use separate meta reservation type for delalloc · 43b18595

Committed by Qu Wenruo on Dec 12, 2017

Before this patch, btrfs qgroup is mixing per-transaction meta rsv with
preallocated meta rsv, making it quite easy to underflow qgroup meta
reservation.

Since we have the new qgroup meta rsv types, apply it to delalloc
reservation.

Now for delalloc, most of its reserved space will use META_PREALLOC qgroup
rsv type.

And for callers reducing outstanding extent like btrfs_finish_ordered_io(),
they will convert corresponding META_PREALLOC reservation to
META_PERTRANS.

This is mainly due to the fact that current qgroup numbers will only be
updated in btrfs_commit_transaction(), that's to say if we don't keep
such placeholder reservation, we can exceed qgroup limitation.

And for callers freeing outstanding extent in error handler, we will
just free META_PREALLOC bytes.

This behavior requires callers of btrfs_qgroup_release_meta() or
btrfs_qgroup_convert_meta() to be aware of which type they are.
So in this patch, btrfs_delalloc_release_metadata() and its callers get
an extra parameter to inform qgroup to do the correct meta convert/release.

The good news is, even if we use the wrong type (convert or free), it won't
cause an obvious bug, as the prealloc type is always in good shape, and the
type only affects whether per-trans meta is increased or not.

So in the worst case, the metadata limit can sometimes be exceeded
(no convert at all) or is reached too soon (no free at all).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

43b18595

Btrfs: delete dead code in btrfs_orphan_add() · 0a0d4415

Committed by Omar Sandoval on Jan 25, 2018

btrfs_orphan_add() has had this case commented out since it was first
introduced in commit d68fc57b ("Btrfs: Metadata reservation for
orphan inodes"). Most of the orphan cleanup code has been rewritten
since then, so it's safe to say that this code isn't needed.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ switch to bool ]
Signed-off-by: David Sterba <dsterba@suse.com>

0a0d4415

btrfs: drop fs_info parameter from btrfs_run_delayed_refs · c79a70b1

Committed by Nikolay Borisov on Mar 15, 2018

It's provided by the transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

c79a70b1

btrfs: Remove unused root var from relink_file_extents · 8535dc19

Committed by Nikolay Borisov on Mar 15, 2018

Added in 38c227d8 ("Btrfs: snapshot-aware defrag") but subsequently
made redundant by 0b246afa ("btrfs: root->fs_info cleanup, add
fs_info convenience variables").
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

8535dc19

btrfs: rename submit callbacks and drop double underscores · d0ee3934

Committed by David Sterba on Mar 08, 2018

Signed-off-by: David Sterba <dsterba@suse.com>

d0ee3934

btrfs: remove unused parameters from extent_submit_bio_done_t · 6c553435

Committed by David Sterba on Mar 08, 2018

Remove parameters not used by any of the callbacks.
Signed-off-by: David Sterba <dsterba@suse.com>

6c553435

btrfs: remove unused parameters from extent_submit_bio_start_t · d0779291

Committed by David Sterba on Mar 08, 2018

Remove parameters not used by any of the callbacks.
Signed-off-by: David Sterba <dsterba@suse.com>

d0779291

btrfs: open code trivial helper btrfs_page_exists_in_range · 051c98eb

Committed by David Sterba on Mar 07, 2018

The called function name is self explanatory.
Signed-off-by: David Sterba <dsterba@suse.com>

051c98eb

btrfs: Use filemap_range_has_page() · 965aab1c

Committed by Matthew Wilcox on Mar 06, 2018

The current implementation of btrfs_page_exists_in_range() gives the
wrong answer if the workingset code has stored a shadow entry in the
page cache.  The filemap_range_has_page() function does not have this
problem, and it's shared code, so use it instead.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

965aab1c

26 Mar, 2018 3 commits

btrfs: adjust return values of btrfs_inode_by_name · 005d6712

Committed by Su Yue on Mar 05, 2018

Previously, btrfs_inode_by_name() returned 0, which left the caller to
check the objectid of location even if the type of location was invalid.

Let btrfs_inode_by_name() return -EUCLEAN if a corrupted location of a
dir entry is found.  Removal of label out_err also simplifies the
function.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ drop unlikely ]
Signed-off-by: David Sterba <dsterba@suse.com>

005d6712

btrfs: Remove root argument from cow_file_range_inline · d02c0e20

Committed by Nikolay Borisov on Mar 02, 2018

This argument is always set to the root of the inode, which is also
passed. So let's get a reference inside the function and simplify
the arg list.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

d02c0e20

Btrfs: skip writeback of last page when truncating file to same size · 213e8c55

Committed by Filipe Manana on Feb 06, 2018

When we truncate a file to the same size and that size is not aligned
with the sector size, we end up triggering writeback (and wait for it to
complete) of the last page. This is unnecessary as we can not have delayed
allocation beyond the inode's i_size and the goal of truncating a file
to its own size is to discard prealloc extents (allocated via the
fallocate(2) system call). Besides the unnecessary IO start and wait, it
also breaks the opportunity for larger contiguous extents on disk, as
before the last dirty page there might be other dirty pages.

This scenario is probably not very common in general, however it is
common for btrfs receive implementations because currently the send
stream always issues a truncate operation for each processed inode as
the last operation for that inode (this truncate operation is not
always needed and the send implementation will be addressed to avoid
them).

So improve this by not starting and waiting for writeback of the inode's
last page when we are truncating to exactly the same size.

The following script was used to quickly measure the time a receive
operation takes:

 $ cat test_send.sh
 #!/bin/bash

 SRC_DEV=/dev/sdc
 DST_DEV=/dev/sdd
 SRC_MNT=/mnt/sdc
 DST_MNT=/mnt/sdd

 mkfs.btrfs -f $SRC_DEV >/dev/null
 mkfs.btrfs -f $DST_DEV >/dev/null
 mount $SRC_DEV $SRC_MNT
 mount $DST_DEV $DST_MNT

 echo "Creating source filesystem"
 for ((t = 0; t < 10; t++)); do
     (
         for ((i = 1; i <= 20000; i++)); do
             xfs_io -f -c "pwrite -S 0xab 0 5000" \
                $SRC_MNT/file_$i > /dev/null
         done
     ) &
     worker_pids[$t]=$!
 done
 wait ${worker_pids[@]}

 echo "Creating and sending snapshot"
 btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
 /usr/bin/time -f "send took %e seconds"    \
     btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
 /usr/bin/time -f "receive took %e seconds" \
     btrfs receive -f $SRC_MNT/send_file $DST_MNT

 umount $SRC_MNT
 umount $DST_MNT

The results for 5 runs were the following:

* Without this change

average receive time was 26.49 seconds
standard deviation of 2.53 seconds

* With this change

average receive time was 12.51 seconds
standard deviation of 0.32 seconds
Reported-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

213e8c55

openanolis / cloud-kernel: last synced successfully more than 1 year ago