Commits · b1a09f1ec540408abf3a50d15dff5d9506932693 · openanolis / cloud-kernel

31 Mar, 2018 · 25 commits

btrfs: remove trivial locking wrappers of tree mod log · b1a09f1e

Committed by David Sterba on Mar 05, 2018

The wrappers are trivial and do not bring any extra value on top of the
plain locking primitives.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop fs_info parameter from __tree_mod_log_oldest_root · bcd24dab

Committed by David Sterba on Mar 05, 2018

It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: embed tree_mod_move structure to tree_mod_elem · b6dfa35b

Committed by David Sterba on Mar 05, 2018

The tree_mod_move structure is not used anywhere else and can be
embedded as an anonymous structure.
Signed-off-by: David Sterba <dsterba@suse.com>

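In C terms, embedding means the move description becomes an untagged struct declared directly inside the log element, so no separately named type is needed while its fields stay reachable through a member. The sketch below is a simplified illustration; the type, field, and function names are hypothetical and do not reproduce the actual btrfs definitions.

```c
#include <stdint.h>

/* Hypothetical, simplified modification-log element. Instead of a
 * separately named move structure, the move description is an untagged
 * struct declared directly inside the element. */
struct tree_mod_elem_sketch {
	uint64_t logical;
	int slot;
	/* only meaningful for "move keys" operations */
	struct {
		int dst_slot;
		int nr_items;
	} move;
};

/* Record a move of nr_items items from slot to dst_slot. */
static void record_move(struct tree_mod_elem_sketch *tm, int slot,
			int dst_slot, int nr_items)
{
	tm->slot = slot;
	tm->move.dst_slot = dst_slot;
	tm->move.nr_items = nr_items;
}
```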

btrfs: drop unused fs_info parameter from tree_mod_log_eb_move · a446a979

Committed by David Sterba on Mar 05, 2018

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop fs_info parameter from tree_mod_log_free_eb · 95b757c1

Committed by David Sterba on Mar 05, 2018

It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop fs_info parameter from tree_mod_log_free_eb · db7279a2

Committed by David Sterba on Mar 05, 2018

It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop fs_info parameter from tree_mod_log_insert_key · e09c2efe

Committed by David Sterba on Mar 05, 2018

It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop fs_info parameter from tree_mod_log_insert_move · 6074d45f

Committed by David Sterba on Mar 05, 2018

It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop fs_info parameter from tree_mod_log_set_node_key · 3ac6de1a

Committed by David Sterba on Mar 05, 2018

It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: document more parameters of submit_extent_page · b8b3d625

Committed by David Sterba on Jun 12, 2017

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: cleanup merging conditions in submit_extent_page · 0c8508a6

Committed by David Sterba on Jun 12, 2017

The merge call was factored out to a separate helper, but it's a trivial
one, and arguably we can open-code it and cache the value.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove redundant variable in __do_readpage · 8eec8296

Committed by David Sterba on Jun 06, 2017

The value of page_end is only copied to end and has no other use.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: assume that bio_ret is always valid in submit_extent_page · 5c2b1fd7

Committed by David Sterba on Jun 06, 2017

All callers pass a valid pointer, so we can drop the redundant checks.
The call to submit_one_bio never happened and can be removed.
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: scrub: batch rebuild for raid56 · 6ca1765b

Committed by Liu Bo on Mar 07, 2018

In case of raid56, writes and rebuilds always take BTRFS_STRIPE_LEN (64K)
as the unit; however, scrub_extent() uses the blocksize as its unit, so the
rebuild process may be triggered for every block in the same stripe.

A typical example: when we're replacing a missing disk, all reads on
that disk get -EIO, and every block (4K in size if the blocksize is 4K)
would go through these steps:

scrub_handle_errored_block
  scrub_recheck_block # re-read pages one by one
  scrub_recheck_block # rebuild by calling raid56_parity_recover()
                        page by page

Although with the raid56 stripe cache most of the reads during a rebuild
can be avoided, the parity recovery calculation (xor or raid6 algorithms)
still needs to be done BTRFS_STRIPE_LEN / blocksize times.

This patch makes it smarter by doing raid56 scrub/replace in units of the
stripe length.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

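The per-block cost described above is plain arithmetic: without batching, one stripe costs BTRFS_STRIPE_LEN / blocksize recovery calculations; with batching it costs one. A minimal sketch, taking the 64K stripe length from the message; the constant and helper name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* BTRFS_STRIPE_LEN is 64K according to the message above; a local
 * constant is used so the sketch compiles on its own. */
#define STRIPE_LEN_SKETCH (64u * 1024u)

/* Hypothetical helper: how many parity recovery calculations (xor or
 * raid6 algorithms) one stripe costs when every block in it fails. */
static uint32_t parity_recover_calls(uint32_t blocksize, bool batched)
{
	if (batched)
		return 1; /* one rebuild covering the whole stripe */
	return STRIPE_LEN_SKETCH / blocksize; /* one rebuild per block */
}
```

With a 4K blocksize this gives 16 calculations per stripe before the change and 1 after it.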

btrfs: sort and group mount option definitions · 416a7202

Committed by David Sterba on Mar 09, 2018

Sort mount options by the primary name, followed by the 'no-'
counterpart if it exists. Group the deprecated and debugging options.
Enum and token definitions are synced.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Add nossd_spread mount option · 62b8e077

Committed by Howard McLauchlan on Mar 08, 2018

Btrfs has two mount options for SSD optimizations: ssd and ssd_spread.
Presently there is an option to disable all SSD optimizations, but there
isn't an option to disable just ssd_spread.

This patch adds a mount option nossd_spread that disables ssd_spread
only.
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

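A simplified model of how the four SSD-related options could combine, assuming (as the btrfs mount option documentation describes) that ssd_spread also turns on ssd. The option bits, tokens, and function are hypothetical names, not the kernel's option parser.

```c
#include <stdint.h>

/* Hypothetical option bits; the real btrfs flags are named differently. */
#define OPT_SSD        (1u << 0)
#define OPT_SSD_SPREAD (1u << 1)

enum ssd_token { TOK_SSD, TOK_SSD_SPREAD, TOK_NOSSD, TOK_NOSSD_SPREAD };

/* Apply one mount option token to the current option bits. */
static uint32_t apply_ssd_option(uint32_t opts, enum ssd_token tok)
{
	switch (tok) {
	case TOK_SSD:
		return opts | OPT_SSD;
	case TOK_SSD_SPREAD:
		/* assumption: ssd_spread implies ssd */
		return opts | OPT_SSD | OPT_SSD_SPREAD;
	case TOK_NOSSD:
		/* disable all SSD optimizations */
		return opts & ~(OPT_SSD | OPT_SSD_SPREAD);
	case TOK_NOSSD_SPREAD:
		/* the new option: disable only ssd_spread, keep ssd */
		return opts & ~OPT_SSD_SPREAD;
	}
	return opts;
}
```

The only difference between the two "no" options is which bits get cleared.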

btrfs: Remove btrfs_fs_info::open_ioctl_trans · 92e2f7e3

Committed by Nikolay Borisov on Feb 05, 2018

Since userspace transactions have been removed, we no longer have any
use for this field, so delete it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove code referencing unused TRANS_USERSPACE · bcf3a3e7

Committed by Nikolay Borisov on Feb 05, 2018

Now that the userspace transaction ioctls have been removed,
TRANS_USERSPACE is no longer used, hence we can remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove btrfs_file_private::trans · 859e682d

Committed by Nikolay Borisov on Feb 05, 2018

Now that the userspace transaction ioctls have been removed, this member
is no longer used, so just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove userspace transaction ioctls · 7a5a07a8

Committed by Nikolay Borisov on Feb 05, 2018

Commit 3558d4f8 ("btrfs: Deprecate userspace transaction ioctls")
marked the beginning of the end of userspace transactions. This commit
finishes the job! There are no known users and ceph does not use the
ioctl anymore.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Sage Weil <sage@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled · 4d31778a

Committed by Qu Wenruo on Dec 19, 2017

When multiple pending snapshots referring to the same source subvolume
are executed with quota enabled, the root items get corrupted: they
still use an old bytenr (with no backref in the extent tree).

This can be triggered by fstests btrfs/152.

The cause is that when the source subvolume is still dirty, the extra
commit (a simplified transaction commit) in qgroup_account_snapshot()
can skip dirty roots not recorded in the current transaction, so the
root item of the source subvolume is not updated.

Fix it by forcing the source subvolume to be recorded in the current
transaction before the qgroup sub-transaction commit.
Reported-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Relax memory barrier in btrfs_tree_unlock · 2e32ef87

Committed by Nikolay Borisov on Feb 14, 2018

When performing an unlock on an extent buffer we'd like to order the
decrement of extent_buffer::blocking_writers with waking up any
waiters. In such situations it's sufficient to use smp_mb__after_atomic
rather than the heavy smp_mb. On architectures where atomic operations
are fully ordered (such as x86 or s390), unconditionally executing
a heavyweight smp_mb instruction causes a severe performance hit
while bringing no improvement in terms of correctness.

The better thing is to use the appropriate smp_mb__after_atomic routine,
which will do the correct thing (invoke a full smp_mb or, in the case
of ordered atomics, insert a compiler barrier). Put another way,
an RMW atomic op + smp_mb__after_atomic is semantically equivalent
to a full smp_mb. This ensures that none of the problems
described in the accompanying comment of waitqueue_active occur.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

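A user-space analogue of the unlock ordering, written with C11 atomics rather than the kernel primitives: a relaxed atomic decrement followed by a sequentially consistent fence plays the role of an atomic decrement plus smp_mb__after_atomic(), ordering the decrement before the check for waiters. The structure and function names are hypothetical, and the waiter counter only stands in for waitqueue_active().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant extent_buffer fields. */
struct eb_sketch {
	atomic_int blocking_writers;
	atomic_int waiters; /* plays the role of waitqueue_active() */
};

/* Drop one blocking writer and report whether waiters must be woken. */
static bool unlock_blocking_sketch(struct eb_sketch *eb)
{
	/* the RMW itself requests no ordering */
	atomic_fetch_sub_explicit(&eb->blocking_writers, 1,
				  memory_order_relaxed);
	/* analogue of smp_mb__after_atomic(): a full fence after the RMW,
	 * so the decrement is ordered before the waiter check below */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&eb->waiters, memory_order_relaxed) > 0;
}
```

On the waiting side a matching full barrier is still required; one fence only orders this thread's own accesses.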

btrfs: add define for oldest generation · 7c829b72

Committed by Anand Jain on Mar 07, 2018

Some functions can filter metadata by the generation. Add a define that
will annotate such arguments.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>

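A minimal sketch of such a generation filter, assuming the new define stands for generation 0, i.e. no lower bound, so every item passes. The helper and the local constant are hypothetical, not btrfs code.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the new define; assumed to be 0, meaning "do not
 * filter anything out by generation". */
#define OLDEST_GENERATION_SKETCH 0ULL

/* Count items whose generation is at least min_gen, the way a search
 * with a minimum-generation argument would skip older metadata. */
static size_t count_min_generation(const uint64_t *gens, size_t n,
				   uint64_t min_gen)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (gens[i] >= min_gen)
			count++;
	return count;
}
```

Passing a named constant instead of a bare 0 is the point of the change: the call site then states that no generation filtering is intended.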

btrfs: open code trivial helper btrfs_page_exists_in_range · 051c98eb

Committed by David Sterba on Mar 07, 2018

The called function name is self-explanatory.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Use filemap_range_has_page() · 965aab1c

Committed by Matthew Wilcox on Mar 06, 2018

The current implementation of btrfs_page_exists_in_range() gives the
wrong answer if the workingset code has stored a shadow entry in the
page cache. The filemap_range_has_page() function does not have this
problem, and it's shared code, so use it instead.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

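The difference can be modeled with a toy page-cache index in which each slot is empty, holds a page, or holds a shadow entry left behind by the workingset code after eviction. A check that treats every non-empty slot as a page answers wrongly for shadow entries; a check that only counts real pages does not. This is a hypothetical model, not the kernel's page cache or the real filemap_range_has_page() implementation.

```c
#include <stdbool.h>
#include <stddef.h>

enum slot_kind { SLOT_EMPTY, SLOT_PAGE, SLOT_SHADOW };

/* Buggy variant: any non-empty slot in [start, end] counts as a page. */
static bool range_has_entry(const enum slot_kind *slots, size_t start,
			    size_t end)
{
	for (size_t i = start; i <= end; i++)
		if (slots[i] != SLOT_EMPTY)
			return true;
	return false;
}

/* Fixed variant: shadow entries are skipped, only real pages count. */
static bool range_has_page(const enum slot_kind *slots, size_t start,
			   size_t end)
{
	for (size_t i = start; i <= end; i++)
		if (slots[i] == SLOT_PAGE)
			return true;
	return false;
}
```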

26 Mar, 2018 · 15 commits

Btrfs: dev-replace: make sure target is identical to source when raid56 rebuild fails · 4759700a

Committed by Liu Bo on Mar 02, 2018

In the last step of scrub_handle_errored_block, we try to combine good
copies from all possible mirrors. This works fine for raid1 and raid10,
but not for raid56, which does a parity rebuild.

If the parity rebuild doesn't produce correct data that matches its
checksum, then in the case of replace we'd rather write what is stored
on the source device than the data calculated from parity.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: raid56: remove redundant async_missing_raid56 · d6a69135

Committed by Liu Bo on Mar 02, 2018

async_missing_raid56() is identical to async_read_rebuild().
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: adjust return values of btrfs_inode_by_name · 005d6712

Committed by Su Yue on Mar 05, 2018

Previously, btrfs_inode_by_name() returned 0 even if the type of the
location was invalid, which left the caller to check the objectid of
the location.

Let btrfs_inode_by_name() return -EUCLEAN if a corrupted location of a
dir entry is found. Removal of the label out_err also simplifies the
function.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ drop unlikely ]
Signed-off-by: David Sterba <dsterba@suse.com>

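A minimal sketch of the validation, assuming (as in btrfs) that a directory entry's location key should be either an inode item or a root item (for subvolumes). The struct, enum, and function names are hypothetical; the key type values mirror BTRFS_INODE_ITEM_KEY (1) and BTRFS_ROOT_ITEM_KEY (132).

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror BTRFS_INODE_ITEM_KEY and BTRFS_ROOT_ITEM_KEY. */
enum { KEY_INODE_ITEM = 1, KEY_ROOT_ITEM = 132 };

struct dir_location_sketch {
	uint64_t objectid;
	uint8_t type;
};

/* Return 0 for a usable location and -EUCLEAN for a corrupted dir entry,
 * so callers no longer have to inspect the location themselves. */
static int check_dir_location(const struct dir_location_sketch *loc)
{
	if (loc->type != KEY_INODE_ITEM && loc->type != KEY_ROOT_ITEM)
		return -EUCLEAN;
	return 0;
}
```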

btrfs: rename btrfs_close_extra_device to btrfs_free_extra_devids · 9b99b115

Committed by Anand Jain on Feb 27, 2018

The function btrfs_close_extra_devices() frees extra devids that may
once have belonged to this filesystem, so rename it and add a comment.
The _devids suffix is appropriate as this function won't handle devices
which are outside of the filesystem being mounted.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove root argument from cow_file_range_inline · d02c0e20

Committed by Nikolay Borisov on Mar 02, 2018

This argument is always set to the root of the inode, which is also
passed. So let's get a reference inside the function and simplify
the argument list.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: send: fix typo in TLV_PUT · 895a72be

Committed by Liu Bo on Mar 02, 2018

According to tlv_put()'s prototype, data and attrlen need to be
swapped in the macro, but it seems all callers are already aware of
this misordering and are therefore not affected.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

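A simplified TLV writer showing why the macro's argument order matters: a wrapper macro should forward its arguments in the same order as the function it wraps, assumed here to be attribute type, then data, then length. The buffer layout (a 16-bit type and a 16-bit length in host byte order, followed by the payload) and all names are assumptions for illustration; the real send stream code also deals with endianness and its own buffer management.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical writer: appends [type:u16][len:u16][payload] at *pos. */
static int tlv_put_sketch(uint8_t *buf, size_t cap, size_t *pos,
			  uint16_t attr, const void *data, uint16_t attrlen)
{
	size_t need = 2 * sizeof(uint16_t) + attrlen;

	if (*pos > cap || cap - *pos < need)
		return -1; /* no room left */
	memcpy(buf + *pos, &attr, sizeof(attr));
	memcpy(buf + *pos + sizeof(attr), &attrlen, sizeof(attrlen));
	memcpy(buf + *pos + 2 * sizeof(uint16_t), data, attrlen);
	*pos += need;
	return 0;
}

/* The macro forwards data and attrlen in the same order as the prototype. */
#define TLV_PUT_SKETCH(buf, cap, pos, attr, data, attrlen) \
	tlv_put_sketch((buf), (cap), (pos), (attr), (data), (attrlen))
```

If the macro swapped data and attrlen internally, every caller would have to pass them in swapped order too, which is the situation the message describes.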

btrfs: Remove root argument from btrfs_log_dentry_safe · e5b84f7a

Committed by Nikolay Borisov on Feb 27, 2018

Now that nothing uses the root argument of btrfs_log_dentry_safe, it can
be safely removed. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove root arg from btrfs_log_inode_parent · f882274b

Committed by Nikolay Borisov on Feb 27, 2018

btrfs_log_inode_parent is called from 2 places (btrfs_log_dentry_safe
and btrfs_log_new_name), both of which pass inode->root as the root
argument along with the inode itself. Remove the redundant root argument
and get a reference to the root directly from the inode; also remove the
redundant root != inode->root check from the same function. No
functional change.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

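The refactoring pattern used here can be sketched as follows: when every caller passes inode->root alongside the inode, the callee can derive the root itself, which removes both the parameter and the consistency check. The types and functions are hypothetical simplifications, and returning NULL on a mismatch is an arbitrary choice for the sketch.

```c
#include <stddef.h>

struct root_sketch { unsigned long long objectid; };
struct inode_sketch { struct root_sketch *root; };

/* Before: the root is passed separately and has to be checked. */
static struct root_sketch *log_root_before(struct root_sketch *root,
					   struct inode_sketch *inode)
{
	if (root != inode->root)
		return NULL; /* redundant: callers always pass inode->root */
	return root;
}

/* After: the root is taken from the inode, so it cannot disagree. */
static struct root_sketch *log_root_after(struct inode_sketch *inode)
{
	return inode->root;
}
```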

btrfs: Remove redundant comment from btrfs_search_forward · 448f3a17

Committed by Nikolay Borisov on Feb 27, 2018

This function saves the old value of keep_locks, always sets keep_locks
to 1, and restores the old value at the end. So there is no way it can
be called without keep_locks being set. Remove the comment imposing a
redundant requirement on callers.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

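The save, force, and restore pattern that makes the removed comment redundant looks roughly like this; the path structure and function are hypothetical stand-ins for btrfs_path and btrfs_search_forward().

```c
#include <stdbool.h>

struct path_sketch {
	bool keep_locks;
};

/* Hypothetical search: whatever the caller set, keep_locks is forced on
 * for the duration of the search and the caller's value is restored.
 * Returns the value that was in effect during the search. */
static bool search_forward_sketch(struct path_sketch *path)
{
	bool saved = path->keep_locks;
	bool during;

	path->keep_locks = true;
	during = path->keep_locks; /* the search itself would run here */
	path->keep_locks = saved;
	return during;
}
```

Because the function sets the flag itself, a caller-side requirement to set it beforehand adds nothing.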

btrfs: move btrfs_listxattr prototype to xattr.h · 738c93d4

Committed by David Sterba on Feb 27, 2018

There's a proper header for xattr handlers.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: adjust return type of btrfs_getxattr · bcadd705

Committed by David Sterba on Feb 27, 2018

The xattr_handler::get prototype returns int, so use it. The only ssize_t
exception is the per-inode listxattr handler.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop extern from function declarations · ab0d0936

Committed by David Sterba on Feb 27, 2018

The extern keyword makes no difference for function declarations. There
are only a few, so let's remove them before it's too late.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop underscores from exported xattr functions · 7852781d

Committed by David Sterba on Feb 27, 2018

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: send, do not issue unnecessary truncate operations · ffa7c429

Committed by Filipe Manana on Feb 06, 2018

When send finishes processing an inode representing a regular file, it
always issues a truncate operation for that file, even if its size did
not change or the last write sets the file size correctly. In the most
common cases, the issued write operations set the file to the correct size
(for either full or incremental sends) or the file size did not change (for
incremental sends), so the only case where a truncate operation is needed
is when a file's size becomes smaller in the send snapshot compared
to the parent snapshot.

By not issuing unnecessary truncate operations we reduce the stream size
and save time in the receiver. Currently, truncating a file to the same
size triggers writeback of its last page (if it's dirty) and waits for it
to complete (only if the file size is not aligned with the filesystem's
sector size). This is being fixed by another patch and is independent of
this change (that patch's title is "Btrfs: skip writeback of last page
when truncating file to same size").

The following script was used to measure time spent by a receiver without
this change applied, with this change applied, and without this change and
with the truncate fix applied (the fix to not make it start and wait for
writeback to complete).

  $ cat test_send.sh
  #!/bin/bash

  SRC_DEV=/dev/sdc
  DST_DEV=/dev/sdd
  SRC_MNT=/mnt/sdc
  DST_MNT=/mnt/sdd

  mkfs.btrfs -f $SRC_DEV >/dev/null
  mkfs.btrfs -f $DST_DEV >/dev/null
  mount $SRC_DEV $SRC_MNT
  mount $DST_DEV $DST_MNT

  echo "Creating source filesystem"
  for ((t = 0; t < 10; t++)); do
      (
          for ((i = 1; i <= 20000; i++)); do
              xfs_io -f -c "pwrite -S 0xab 0 5000" \
                  $SRC_MNT/file_$i > /dev/null
          done
      ) &
      worker_pids[$t]=$!
  done
  wait ${worker_pids[@]}

  echo "Creating and sending snapshot"
  btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
  /usr/bin/time -f "send took %e seconds"    \
         btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
  /usr/bin/time -f "receive took %e seconds" \
         btrfs receive -f $SRC_MNT/send_file $DST_MNT

  umount $SRC_MNT
  umount $DST_MNT

The results, which are averages of 5 runs for each case, were the
following:

* Without this change

average receive time was 26.49 seconds
standard deviation of 2.53 seconds

* Without this change and with the truncate fix

average receive time was 12.51 seconds
standard deviation of 0.32 seconds

* With this change and without the truncate fix

average receive time was 10.02 seconds
standard deviation of 1.11 seconds
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

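The rule from the first paragraph can be written as a small predicate. This is a hypothetical sketch that models only the size comparison described above, not the actual send code: an inode with no counterpart in a parent snapshot (always the case for a full send) never needs a truncate, and an existing inode needs one only when it shrank.

```c
#include <stdbool.h>
#include <stdint.h>

/* has_parent_inode: the inode also exists in the parent snapshot
 * (always false for a full send). */
static bool send_needs_truncate(bool has_parent_inode, uint64_t parent_size,
				uint64_t new_size)
{
	if (!has_parent_inode)
		return false; /* per the message, the writes set the size */
	return new_size < parent_size; /* only a shrunk file needs it */
}
```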

Btrfs: skip writeback of last page when truncating file to same size · 213e8c55

Committed by Filipe Manana on Feb 06, 2018

When we truncate a file to the same size and that size is not aligned
with the sector size, we end up triggering writeback (and waiting for it
to complete) of the last page. This is unnecessary, as we cannot have
delayed allocation beyond the inode's i_size, and the goal of truncating
a file to its own size is to discard prealloc extents (allocated via the
fallocate(2) system call). Besides the unnecessary IO start and wait, it
also breaks the opportunity for larger contiguous extents on disk, as
before the last dirty page there might be other dirty pages.

This scenario is probably not very common in general; however, it is
common for btrfs receive implementations because currently the send
stream always issues a truncate operation for each processed inode as
the last operation for that inode (this truncate operation is not
always needed, and the send implementation will be changed to avoid
it).

So improve this by not starting and waiting for writeback of the inode's
last page when we are truncating to exactly the same size.

The following script was used to quickly measure the time a receive
operation takes:

 $ cat test_send.sh
 #!/bin/bash

 SRC_DEV=/dev/sdc
 DST_DEV=/dev/sdd
 SRC_MNT=/mnt/sdc
 DST_MNT=/mnt/sdd

 mkfs.btrfs -f $SRC_DEV >/dev/null
 mkfs.btrfs -f $DST_DEV >/dev/null
 mount $SRC_DEV $SRC_MNT
 mount $DST_DEV $DST_MNT

 echo "Creating source filesystem"
 for ((t = 0; t < 10; t++)); do
     (
         for ((i = 1; i <= 20000; i++)); do
             xfs_io -f -c "pwrite -S 0xab 0 5000" \
                $SRC_MNT/file_$i > /dev/null
         done
     ) &
     worker_pids[$t]=$!
 done
 wait ${worker_pids[@]}

 echo "Creating and sending snapshot"
 btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
 /usr/bin/time -f "send took %e seconds"    \
     btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
 /usr/bin/time -f "receive took %e seconds" \
     btrfs receive -f $SRC_MNT/send_file $DST_MNT

 umount $SRC_MNT
 umount $DST_MNT

The results for 5 runs were the following:

* Without this change

average receive time was 26.49 seconds
standard deviation of 2.53 seconds

* With this change

average receive time was 12.51 seconds
standard deviation of 0.32 seconds
Reported-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

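A hypothetical predicate capturing the condition described above: before this change, truncating to the same size still started and waited for writeback of the last page when that size was not sector aligned; with the change, the same-size case is skipped. It deliberately ignores everything else the real truncate path does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does a truncate from old_size to new_size start and wait for writeback
 * of the last page?  skip_same_size selects the behaviour after the fix. */
static bool truncate_waits_last_page(uint64_t old_size, uint64_t new_size,
				     uint32_t sectorsize, bool skip_same_size)
{
	if (skip_same_size && new_size == old_size)
		return false; /* no delalloc can exist beyond i_size */
	return (new_size % sectorsize) != 0; /* unaligned last page */
}
```

The 5000-byte files in the test script above fall exactly into the unaligned, same-size case.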

openanolis / cloud-kernel · synced successfully more than 1 year ago
