提交 · f5c8daa5b2ae6de4baa18a95002271cd7f90be90 · openeuler / Kernel

30 4月, 2019 2 次提交

Btrfs: do not start a transaction at iterate_extent_inodes() · bfc61c36

由 Filipe Manana 提交于 4月 17, 2019

When finding out which inodes have references on a particular extent, done
by backref.c:iterate_extent_inodes(), from the BTRFS_IOC_LOGICAL_INO (both
v1 and v2) ioctl and from scrub we use the transaction join API to grab a
reference on the currently running transaction, since in order to give
accurate results we need to inspect the delayed references of the currently
running transaction.

However, if there is currently no running transaction, the join operation
will create a new transaction. This is inefficient as the transaction will
eventually be committed, doing unnecessary IO and introducing a potential
point of failure that will lead to a transaction abort due to -ENOSPC, as
recently reported [1].

That's because the join, creates the transaction but does not reserve any
space, so when attempting to update the root item of the root passed to
btrfs_join_transaction(), during the transaction commit, we can end up
failling with -ENOSPC. Users of a join operation are supposed to actually
do some filesystem changes and reserve space by some means, which is not
the case of iterate_extent_inodes(), it is a read-only operation for all
contextes from which it is called.

The reported [1] -ENOSPC failure stack trace is the following:

 heisenberg kernel: ------------[ cut here ]------------
 heisenberg kernel: BTRFS: Transaction aborted (error -28)
 heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
(...)
 heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
 heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
(...)
 heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
 heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
 heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
 heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
 heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
 heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
 heisenberg kernel: FS:  0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
 heisenberg kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
 heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 heisenberg kernel: Call Trace:
 heisenberg kernel:  commit_fs_roots+0x166/0x1d0 [btrfs]
 heisenberg kernel:  ? _cond_resched+0x15/0x30
 heisenberg kernel:  ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
 heisenberg kernel:  btrfs_commit_transaction+0x2bd/0x870 [btrfs]
 heisenberg kernel:  ? start_transaction+0x9d/0x3f0 [btrfs]
 heisenberg kernel:  transaction_kthread+0x147/0x180 [btrfs]
 heisenberg kernel:  ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
 heisenberg kernel:  kthread+0x112/0x130
 heisenberg kernel:  ? kthread_bind+0x30/0x30
 heisenberg kernel:  ret_from_fork+0x35/0x40
 heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---

So fix that by using the attach API, which does not create a transaction
when there is currently no running transaction.

[1] https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4.camel@scientia.net/Reported-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

bfc61c36

btrfs: use BUG() instead of BUG_ON(1) · 290342f6

由 Arnd Bergmann 提交于 3月 25, 2019

BUG_ON(1) leads to bogus warnings from clang when
CONFIG_PROFILE_ANNOTATED_BRANCHES is set:

fs/btrfs/volumes.c:5041:3: error: variable 'max_chunk_size' is used uninitialized whenever 'if' condition is false
      [-Werror,-Wsometimes-uninitialized]
                BUG_ON(1);
                ^~~~~~~~~
include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON'
 #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                   ^~~~~~~~~~~~~~~~~~~
include/linux/compiler.h:48:23: note: expanded from macro 'unlikely'
 #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/volumes.c:5046:9: note: uninitialized use occurs here
                             max_chunk_size);
                             ^~~~~~~~~~~~~~
include/linux/kernel.h:860:36: note: expanded from macro 'min'
 #define min(x, y)       __careful_cmp(x, y, <)
                                         ^
include/linux/kernel.h:853:17: note: expanded from macro '__careful_cmp'
                __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
                              ^
include/linux/kernel.h:847:25: note: expanded from macro '__cmp_once'
                typeof(y) unique_y = (y);               \
                                      ^
fs/btrfs/volumes.c:5041:3: note: remove the 'if' if its condition is always true
                BUG_ON(1);
                ^
include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON'
 #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                               ^
fs/btrfs/volumes.c:4993:20: note: initialize the variable 'max_chunk_size' to silence this warning
        u64 max_chunk_size;
                          ^
                           = 0

Change it to BUG() so clang can see that this code path can never
continue.
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

290342f6

25 2月, 2019 2 次提交

btrfs: honor path->skip_locking in backref code · 38e3eebf

由 Josef Bacik 提交于 1月 16, 2019

Qgroups will do the old roots lookup at delayed ref time, which could be
while walking down the extent root while running a delayed ref.  This
should be fine, except we specifically lock eb's in the backref walking
code irrespective of path->skip_locking, which deadlocks the system.
Fix up the backref code to honor path->skip_locking, nobody will be
modifying the commit_root when we're searching so it's completely safe
to do.

This happens since fb235dc0 ("btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans"), kernel may lockup with quota
enabled.

There is one backref trace triggered by snapshot dropping along with
write operation in the source subvolume.  The example can be reliably
reproduced:

  btrfs-cleaner   D    0  4062      2 0x80000000
  Call Trace:
   schedule+0x32/0x90
   btrfs_tree_read_lock+0x93/0x130 [btrfs]
   find_parent_nodes+0x29b/0x1170 [btrfs]
   btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
   btrfs_find_all_roots+0x57/0x70 [btrfs]
   btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
   btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
   btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
   do_walk_down+0x541/0x5e3 [btrfs]
   walk_down_tree+0xab/0xe7 [btrfs]
   btrfs_drop_snapshot+0x356/0x71a [btrfs]
   btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
   cleaner_kthread+0x12b/0x160 [btrfs]
   kthread+0x112/0x130
   ret_from_fork+0x27/0x50

When dropping snapshots with qgroup enabled, we will trigger backref
walk.

However such backref walk at that timing is pretty dangerous, as if one
of the parent nodes get WRITE locked by other thread, we could cause a
dead lock.

For example:

           FS 260     FS 261 (Dropped)
            node A        node B
           /      \      /      \
       node C      node D      node E
      /   \         /  \        /     \
  leaf F|leaf G|leaf H|leaf I|leaf J|leaf K

The lock sequence would be:

      Thread A (cleaner)             |       Thread B (other writer)
-----------------------------------------------------------------------
write_lock(B)                        |
write_lock(D)                        |
^^^ called by walk_down_tree()       |
                                     |       write_lock(A)
                                     |       write_lock(D) << Stall
read_lock(H) << for backref walk     |
read_lock(D) << lock owner is        |
                the same thread A    |
                so read lock is OK   |
read_lock(A) << Stall                |

So thread A hold write lock D, and needs read lock A to unlock.
While thread B holds write lock A, while needs lock D to unlock.

This will cause a deadlock.

This is not only limited to snapshot dropping case.  As the backref
walk, even only happens on commit trees, is breaking the normal top-down
locking order, makes it deadlock prone.

Fixes: fb235dc0 ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
CC: stable@vger.kernel.org # 4.14+
Reported-and-tested-by: NDavid Sterba <dsterba@suse.com>
Reported-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NFilipe Manana <fdmanana@suse.com>
[ rebase to latest branch and fix lock assert bug in btrfs/007 ]
Signed-off-by: NQu Wenruo <wqu@suse.com>
[ copy logs and deadlock analysis from Qu's patch ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

38e3eebf

btrfs: replace btrfs_set_lock_blocking_rw with appropriate helpers · 300aa896

由 David Sterba 提交于 4月 04, 2018

We can use the right helper where the lock type is a fixed parameter.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

300aa896

17 12月, 2018 3 次提交

btrfs: Fix typos in comments and strings · 52042d8e

由 Andrea Gelmini 提交于 11月 28, 2018

The typos accumulate over time so once in a while time they get fixed in
a large patch.
Signed-off-by: NAndrea Gelmini <andrea.gelmini@gelma.net>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

52042d8e

btrfs: Remove needless tree locking in iterate_inode_extrefs · 5c623d33

由 Nikolay Borisov 提交于 8月 15, 2018

In iterate_inode_exrefs the eb is cloned via btrfs_clone_extent_buffer
which creates a private extent buffer with the dummy flag set and ref
count of 1. Then this buffer is locked for reading and its ref count is
incremented by 1. Finally it's fed to the passed iterate_irefs_t
function. The actual iterate call back is inode_to_path (coming from
paths_from_inode) which feeds the eb to btrfs_ref_to_path. In this final
function the passed eb is only read by first assigning it to the local
eb variable. This variable is only modified in the case another eb was
referenced from the passed path that is eb != eb_in check triggers.

Considering this there is no point in locking the cloned eb in
iterate_inode_refs since it's never being modified and is not published
anywhere. Furthermore the cloned eb is completely fine having its ref
count be 1.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

5c623d33

btrfs: Remove needless tree locking in iterate_inode_refs · e5bba0b0

由 Nikolay Borisov 提交于 8月 15, 2018

In iterate_inode_refs the eb is cloned via btrfs_clone_extent_buffer
which creates a private extent buffer with the dummy flag set and ref
count of 1. Then this buffer is locked for reading and its ref count is
incremented by 1. Finally it's fed to the passed iterate_irefs_t
function. The actual iterate call back is inode_to_path (coming from
paths_from_inode) which feeds the eb to btrfs_ref_to_path. In this final
function the passed eb is only read by first assigning it to the local
eb variable. This variable is only modified in the case another eb was
referenced from the passed path that is eb != eb_in check triggers.

Considering this there is no point in locking the cloned eb in
iterate_inode_refs since it's never being modified and is not published
anywhere. Furthermore the cloned eb is completely fine having its ref
count be 1.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e5bba0b0

15 10月, 2018 3 次提交

Btrfs: preftree: use rb_first_cached · ecf160b4

由 Liu Bo 提交于 8月 23, 2018

rb_first_cached() trades an extra pointer "leftmost" for doing the same
job as rb_first() but in O(1).

While resolving indirect refs and missing refs, it always looks for the
first rb entry in a while loop, it's helpful to use rb_first_cached
instead.

For more details about the optimization see patch "Btrfs: delayed-refs:
use rb_first_cached for href_root".
Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ecf160b4

Btrfs: delayed-refs: use rb_first_cached for ref_tree · e3d03965

由 Liu Bo 提交于 8月 23, 2018

rb_first_cached() trades an extra pointer "leftmost" for doing the same
job as rb_first() but in O(1).

Functions manipulating href->ref_tree need to get the first entry, this
converts href->ref_tree to use rb_first_cached().

For more details about the optimization see patch "Btrfs: delayed-refs:
use rb_first_cached for href_root".
Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e3d03965

btrfs: Remove 'objectid' member from struct btrfs_root · 4fd786e6

由 Misono Tomohiro 提交于 8月 06, 2018

There are two members in struct btrfs_root which indicate root's
objectid: objectid and root_key.objectid.

They are both set to the same value in __setup_root():

  static void __setup_root(struct btrfs_root *root,
                           struct btrfs_fs_info *fs_info,
                           u64 objectid)
  {
    ...
    root->objectid = objectid;
    ...
    root->root_key.objectid = objecitd;
    ...
  }

and not changed to other value after initialization.

grep in btrfs directory shows both are used in many places:
  $ grep -rI "root->root_key.objectid" | wc -l
  133
  $ grep -rI "root->objectid" | wc -l
  55
 (4.17, inc. some noise)

It is confusing to have two similar variable names and it seems
that there is no rule about which should be used in a certain case.

Since ->root_key itself is needed for tree reloc tree, let's remove
'objecitd' member and unify code to use ->root_key.objectid in all places.
Signed-off-by: NMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4fd786e6

06 8月, 2018 2 次提交

btrfs: backref: Use ERR_CAST to return error code · afc6961f

由 Misono Tomohiro 提交于 7月 26, 2018

Use ERR_CAST() instead of void * to make meaning clear.
Signed-off-by: NMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

afc6961f

btrfs: return EUCLEAN if extent_inline_ref type is invalid · af431dcb

由 Su Yue 提交于 6月 22, 2018

If type of extent_inline_ref found is not expected, filesystem may have
been corrupted, should return EUCLEAN instead of EINVAL.
Signed-off-by: NSu Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

af431dcb

12 4月, 2018 1 次提交

btrfs: replace GPL boilerplate by SPDX -- sources · c1d7c514

由 David Sterba 提交于 4月 03, 2018

Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c1d7c514

31 3月, 2018 2 次提交

btrfs: Validate child tree block's level and first key · 581c1760

由 Qu Wenruo 提交于 3月 29, 2018

We have several reports about node pointer points to incorrect child
tree blocks, which could have even wrong owner and level but still with
valid generation and checksum.

Although btrfs check could handle it and print error message like:
leaf parent key incorrect 60670574592

Kernel doesn't have enough check on this type of corruption correctly.
At least add such check to read_tree_block() and btrfs_read_buffer(),
where we need two new parameters @level and @first_key to verify the
child tree block.

The new @level check is mandatory and all call sites are already
modified to extract expected level from its call chain.

While @first_key is optional, the following call sites are skipping such
check:
1) Root node/leaf
   As ROOT_ITEM doesn't contain the first key, skip @first_key check.
2) Direct backref
   Only parent bytenr and level is known and we need to resolve the key
   all by ourselves, skip @first_key check.

Another note of this verification is, it needs extra info from nodeptr
or ROOT_ITEM, so it can't fit into current tree-checker framework, which
is limited to node/leaf boundary.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

581c1760

btrfs: Remove unused op_key var from add_delayed_refs · a6dbceaf

由 Nikolay Borisov 提交于 3月 15, 2018

Added as part of 86d5f994 ("btrfs: convert prelimary reference
tracking to use rbtrees") but never used. tmp_op_key essentially
subsumed that variable.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

a6dbceaf

26 3月, 2018 1 次提交

btrfs: add more __cold annotations · e67c718b

由 David Sterba 提交于 2月 19, 2018

The __cold functions are placed to a special section, as they're
expected to be called rarely. This could help i-cache prefetches or help
compiler to decide which branches are more/less likely to be taken
without any other annotations needed.

Though we can't add more __exit annotations, it's still possible to add
__cold (that's also added with __exit). That way the following function
categories are tagged:

- printf wrappers, error messages
- exit helpers
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e67c718b

15 3月, 2018 1 次提交

btrfs: add missing initialization in btrfs_check_shared · 18bf591b

由 Edmund Nadolski 提交于 3月 14, 2018

This patch addresses an issue that causes fiemap to falsely
report a shared extent.  The test case is as follows:

xfs_io -f -d -c "pwrite -b 16k 0 64k" -c "fiemap -v" /media/scratch/file5
sync
xfs_io  -c "fiemap -v" /media/scratch/file5

which gives the resulting output:

wrote 65536/65536 bytes at offset 0
64 KiB, 4 ops; 0.0000 sec (121.359 MiB/sec and 7766.9903 ops/sec)
/media/scratch/file5:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        24576..24703       128 0x2001
/media/scratch/file5:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        24576..24703       128   0x1

This is because btrfs_check_shared calls find_parent_nodes
repeatedly in a loop, passing a share_check struct to report
the count of shared extent. But btrfs_check_shared does not
re-initialize the count value to zero for subsequent calls
from the loop, resulting in a false share count value. This
is a regressive behavior from 4.13.

With proper re-initialization the test result is as follows:

wrote 65536/65536 bytes at offset 0
64 KiB, 4 ops; 0.0000 sec (110.035 MiB/sec and 7042.2535 ops/sec)
/media/scratch/file5:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        24576..24703       128   0x1
/media/scratch/file5:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        24576..24703       128   0x1

which corrects the regression.

Fixes: 3ec4d323 ("btrfs: allow backref search checks for shared extents")
Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
[ add text from cover letter to changelog ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

18bf591b

02 2月, 2018 1 次提交

btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes · c8195a7b

由 Zygo Blaxell 提交于 1月 23, 2018

Until v4.14, this warning was very infrequent:

	WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 find_parent_nodes+0xc41/0x14e0
	Modules linked in: [...]
	CPU: 3 PID: 18172 Comm: bees Tainted: G      D W    L  4.11.9-zb64+ #1
	Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101    12/02/2014
	Call Trace:
	 dump_stack+0x85/0xc2
	 __warn+0xd1/0xf0
	 warn_slowpath_null+0x1d/0x20
	 find_parent_nodes+0xc41/0x14e0
	 __btrfs_find_all_roots+0xad/0x120
	 ? extent_same_check_offsets+0x70/0x70
	 iterate_extent_inodes+0x168/0x300
	 iterate_inodes_from_logical+0x87/0xb0
	 ? iterate_inodes_from_logical+0x87/0xb0
	 ? extent_same_check_offsets+0x70/0x70
	 btrfs_ioctl+0x8ac/0x2820
	 ? lock_acquire+0xc2/0x200
	 do_vfs_ioctl+0x91/0x700
	 ? __fget+0x112/0x200
	 SyS_ioctl+0x79/0x90
	 entry_SYSCALL_64_fastpath+0x23/0xc6
	 ? trace_hardirqs_off_caller+0x1f/0x140

Starting with v4.14 (specifically 86d5f994 ("btrfs: convert prelimary
reference tracking to use rbtrees")) the WARN_ON occurs three orders of
magnitude more frequently--almost once per second while running workloads
like bees.

Replace the WARN_ON() with a comment rationale for its removal.
The rationale is paraphrased from an explanation by Edmund Nadolski
<enadolski@suse.de> on the linux-btrfs mailing list.

Fixes: 8da6d581 ("Btrfs: added btrfs_find_all_roots()")
Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c8195a7b

22 1月, 2018 1 次提交

btrfs: make function update_share_count static · ccc8dc75

由 Colin Ian King 提交于 11月 30, 2017

The function update_share_count is local to the source and does
not need to be in global scope, so make it static.

Cleans up sparse warning:
fs/btrfs/backref.c:219:6: warning: symbol 'update_share_count' was not
declared. Should it be static?
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ccc8dc75

02 11月, 2017 2 次提交

btrfs: track refs in a rb_tree instead of a list · 0e0adbcf

由 Josef Bacik 提交于 10月 19, 2017

If we get a significant amount of delayed refs for a single block (think
modifying multiple snapshots) we can end up spending an ungodly amount
of time looping through all of the entries trying to see if they can be
merged. This is because we only add them to a list, so we have O(2n)
for every ref head. This doesn't make any sense as we likely have refs
for different roots, and so they cannot be merged. Tracking in a tree
will allow us to break as soon as we hit an entry that doesn't match,
making our worst case O(n).

With this we can also merge entries more easily. Before we had to hope
that matching refs were on the ends of our list, but with the tree we
can search down to exact matches and merge them at insert time.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

0e0adbcf

btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents · c995ab3c

由 Zygo Blaxell 提交于 9月 22, 2017

The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
offset (encoded as a single logical address) to a list of extent refs.
LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
(extent ref -> extent bytenr and offset, or logical address).  These are
useful capabilities for programs that manipulate extents and extent
references from userspace (e.g. dedup and defrag utilities).

When the extents are uncompressed (and not encrypted and not other),
check_extent_in_eb performs filtering of the extent refs to remove any
extent refs which do not contain the same extent offset as the 'logical'
parameter's extent offset.  This prevents LOGICAL_INO from returning
references to more than a single block.

To find the set of extent references to an uncompressed extent from [a, b),
userspace has to run a loop like this pseudocode:

	for (i = a; i < b; ++i)
		extent_ref_set += LOGICAL_INO(i);

At each iteration of the loop (up to 32768 iterations for a 128M extent),
data we are interested in is collected in the kernel, then deleted by
the filter in check_extent_in_eb.

When the extents are compressed (or encrypted or other), the 'logical'
parameter must be an extent bytenr (the 'a' parameter in the loop).
No filtering by extent offset is done (or possible?) so the result is
the complete set of extent refs for the entire extent.  This removes
the need for the loop, since we get all the extent refs in one call.

Add an 'ignore_offset' argument to iterate_inodes_from_logical,
[...several levels of function call graph...], and check_extent_in_eb, so
that we can disable the extent offset filtering for uncompressed extents.
This flag can be set by an improved version of the LOGICAL_INO ioctl to
get either behavior as desired.

There is no functional change in this patch.  The new flag is always
false.
Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
[ minor coding style fixes ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c995ab3c

30 10月, 2017 1 次提交

btrfs: remove delayed_ref_node from ref_head · d278850e

由 Josef Bacik 提交于 9月 29, 2017

This is just excessive information in the ref_head, and makes the code
complicated.  It is a relic from when we had the heads and the refs in
the same tree, which is no longer the case.  With this removal I've
cleaned up a bunch of the cruft around this old assumption as well.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

d278850e

21 8月, 2017 1 次提交

Btrfs: convert to use btrfs_get_extent_inline_ref_type · 3de28d57

由 Liu Bo 提交于 8月 18, 2017

Since we have a helper which can do sanity check, this converts all
btrfs_extent_inline_ref_type to it.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

3de28d57

16 8月, 2017 11 次提交

btrfs: clean up extraneous computations in add_delayed_refs · 01747e92

由 Edmund Nadolski 提交于 7月 12, 2017

Repeating the same computation in multiple places is not
necessary.
Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

01747e92

btrfs: allow backref search checks for shared extents · 3ec4d323

由 Edmund Nadolski 提交于 7月 12, 2017

When called with a struct share_check, find_parent_nodes()
will detect a shared extent and immediately return with
BACKREF_SHARED_FOUND.
Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

3ec4d323

btrfs: add cond_resched() calls when resolving backrefs · 9dd14fd6

由 Edmund Nadolski 提交于 7月 12, 2017

Since backref resolution is CPU-intensive, the cond_resched calls
should help alleviate soft lockup occurences.
Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

9dd14fd6

btrfs: backref, add tracepoints for prelim_ref insertion and merging · 00142756

由 Jeff Mahoney 提交于 7月 12, 2017

This patch adds a tracepoint event for prelim_ref insertion and
merging.  For each, the ref being inserted or merged and the count
of tree nodes is issued.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

00142756

btrfs: add a node counter to each of the rbtrees · 6c336b21

由 Jeff Mahoney 提交于 7月 12, 2017

This patch adds counters to each of the rbtrees so that we can tell
how large they are growing for a given workload.  These counters
will be exported by tracepoints in the next patch.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6c336b21

btrfs: convert prelimary reference tracking to use rbtrees · 86d5f994

由 Edmund Nadolski 提交于 7月 12, 2017

It's been known for a while that the use of multiple lists
that are periodically merged was an algorithmic problem within
btrfs.  There are several workloads that don't complete in any
reasonable amount of time (e.g. btrfs/130) and others that cause
soft lockups.

The solution is to use a set of rbtrees that do insertion merging
for both indirect and direct refs, with the former converting
refs into the latter.  The result is a btrfs/130 workload that
used to take several hours now takes about half of that. This
runtime still isn't acceptable and a future patch will address that
by moving the rbtrees higher in the stack so the lookups can be
shared across multiple calls to find_parent_nodes.
Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

86d5f994

btrfs: remove ref_tree implementation from backref.c · f6954245

由 Edmund Nadolski 提交于 6月 28, 2017

Commit afce772e ("btrfs: fix check_shared for fiemap ioctl") added
the ref_tree code in backref.c to reduce backref searching for
shared extents under the FIEMAP ioctl. This code will not be
compatible with the upcoming rbtree changes for improved backref
searching, so this patch removes the ref_tree code.  The rbtree
changes will provide the equivalent functionality for FIEMAP.

The above commit also introduced transaction semantics around calls to
btrfs_check_shared() in order to accurately account for delayed refs.
This functionality needs to be retained, so a complete revert of the
above commit is not desirable. This patch therefore removes the
ref_tree portion of the commit as above, however it does not remove
the transaction portion.
Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f6954245

btrfs: btrfs_check_shared should manage its own transaction · bb739cf0

由 Edmund Nadolski 提交于 6月 28, 2017

Commit afce772e ("btrfs: fix check_shared for fiemap ioctl") added
transaction semantics around calls to btrfs_check_shared() in order to
provide accurate accounting of delayed refs. The transaction management
should be done inside btrfs_check_shared(), so that callers do not need
to manage transactions individually.
Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

bb739cf0

btrfs: backref, cleanup __ namespace abuse · e0c476b1

由 Jeff Mahoney 提交于 6月 28, 2017

We typically use __ to indicate a helper routine that shouldn't be
called directly without understanding the proper context required
to do so. We use static functions to indicate that a function is
private to a particular C file. The backref code uses static
function and __ prefixes on nearly everything, which makes the code
difficult to read and establishes a pattern for future code that
shouldn't be followed. This patch drops all the unnecessary prefixes.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e0c476b1

btrfs: backref, add unode_aux_to_inode_list helper · 4dae077a

由 Jeff Mahoney 提交于 6月 28, 2017

Replacing the double cast and ternary conditional with a helper makes
the code easier on the eyes.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4dae077a

btrfs: backref, constify some arguments · 73980bec

由 Jeff Mahoney 提交于 6月 28, 2017

This constifies a few buffers used in the backref code.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

73980bec

20 6月, 2017 1 次提交

btrfs: use GFP_KERNEL in init_ipath · f54de068

由 David Sterba 提交于 5月 31, 2017

Now that init_ipath is called either from a safe context or with
memalloc_nofs protection, we can switch to GFP_KERNEL allocations in
init_path and init_data_container.
Reviewed-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f54de068

18 4月, 2017 3 次提交

btrfs: replace hardcoded value with SEQ_LAST macro · de47c9d3

由 Edmund Nadolski 提交于 3月 16, 2017

Define the SEQ_LAST macro to replace (u64)-1 in places where said
value triggers a special-case ref search behavior.
Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
Reviewed-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

de47c9d3

btrfs: provide enumeration for __merge_refs mode argument · f58d88b3

由 Edmund Nadolski 提交于 3月 16, 2017

Replace hardcoded numeric values for __merge_refs 'mode' argument
with descriptive constants.
Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
Reviewed-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f58d88b3

btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t · 6df8cdf5

由 Elena Reshetova 提交于 3月 03, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6df8cdf5

17 2月, 2017 1 次提交
- D
  btrfs: remove unused parameter from __add_inline_refs · eeac44cb
  由 David Sterba 提交于 2月 10, 2017
```
Never used.
Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
```
  eeac44cb
14 2月, 2017 1 次提交

Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head · f72ad18e

由 Liu Bo 提交于 1月 30, 2017

All we need is @delayed_refs, all callers have get it ahead of calling
btrfs_find_delayed_ref_head since lock needs to be acquired firstly,
there is no reason to deference it again inside the function.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f72ad18e

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功