提交 · 98ff7b94e4daca8d7ea4352a9a7905cced597a4c · openeuler / Kernel

15 10月, 2018 7 次提交

btrfs: relocation: Cleanup while loop using rbtree_postorder_for_each_entry_safe · 98ff7b94

由 Qu Wenruo 提交于 9月 21, 2018

And add one line comment explaining what we're doing for each loop.
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

98ff7b94

btrfs: qgroup: Only trace data extents in leaves if we're relocating data block group · 3d0174f7

由 Qu Wenruo 提交于 9月 27, 2018

For qgroup_trace_extent_swap(), if we find one leaf that needs to be
traced, we will also iterate all file extents and trace them.

This is OK if we're relocating data block groups, but if we're
relocating metadata block groups, balance code itself has ensured that
both subtree of file tree and reloc tree contain the same contents.

That's to say, if we're relocating metadata block groups, all file
extents in reloc and file tree should match, thus no need to trace them.
This should reduce the total number of dirty extents processed in metadata
block group balance.

[[Benchmark]] (with all previous enhancement)
Hardware:
	VM 4G vRAM, 8 vCPUs,
	disk is using 'unsafe' cache mode,
	backing device is SAMSUNG 850 evo SSD.
	Host has 16G ram.

Mkfs parameter:
	--nodesize 4K (To bump up tree size)

Initial subvolume contents:
	4G data copied from /usr and /lib.
	(With enough regular small files)

Snapshots:
	16 snapshots of the original subvolume.
	each snapshot has 3 random files modified.

balance parameter:
	-m

So the content should be pretty similar to a real world root fs layout.

                     | v4.19-rc1    | w/ patchset    | diff (*)
---------------------------------------------------------------
relocated extents    | 22929        | 22851          | -0.3%
qgroup dirty extents | 227757       | 140886         | -38.1%
time (sys)           | 65.253s      | 37.464s        | -42.6%
time (real)          | 74.032s      | 44.722s        | -39.6%
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

3d0174f7

btrfs: qgroup: Use generation-aware subtree swap to mark dirty extents · 5f527822

由 Qu Wenruo 提交于 9月 27, 2018

Before this patch, with quota enabled during balance, we need to mark
the whole subtree dirty for quota.

E.g.
OO = Old tree blocks (from file tree)
NN = New tree blocks (from reloc tree)

        File tree (src)		          Reloc tree (dst)
            OO (a)                              NN (a)
           /  \                                /  \
     (b) OO    OO (c)                    (b) NN    NN (c)
        /  \  /  \                          /  \  /  \
       OO  OO OO OO (d)                    OO  OO OO NN (d)

For old balance + quota case, quota will mark the whole src and dst tree
dirty, including all the 3 old tree blocks in reloc tree.

It's doable for small file tree or new tree blocks are all located at
lower level.

But for large file tree or new tree blocks are all located at higher
level, this will lead to mark the whole tree dirty, and be unbelievably
slow.

This patch will change how we handle such balance with quota enabled
case.

Now we will search from (b) and (c) for any new tree blocks whose
generation is equal to @last_snapshot, and only mark them dirty.

In above case, we only need to trace tree blocks NN(b), NN(c) and NN(d).
(NN(a) will be traced when COW happens for nodeptr modification).  And
also for tree blocks OO(b), OO(c), OO(d). (OO(a) will be traced when COW
happens for nodeptr modification.)

For above case, we could skip 3 tree blocks, but for larger tree, we can
skip tons of unmodified tree blocks, and hugely speed up balance.

This patch will introduce a new function,
btrfs_qgroup_trace_subtree_swap(), which will do the following main
work:

1) Read out real root eb
   And setup basic dst_path for later calls
2) Call qgroup_trace_new_subtree_blocks()
   To trace all new tree blocks in reloc tree and their counter
   parts in the file tree.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

5f527822

btrfs: relocation: Add basic extent backref related comments for build_backref_tree · fa6ac715

由 Qu Wenruo 提交于 9月 25, 2018

fs/btrfs/relocation.c:build_backref_tree() is some code from 2009 era,
although it works pretty fine, it's not that easy to understand.
Especially combined with the complex btrfs backref format.

This patch adds some basic comment for the backref build part of the
code, making it less hard to read, at least for backref searching part.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

fa6ac715

btrfs: Handle owner mismatch gracefully when walking up tree · 65c6e82b

由 Qu Wenruo 提交于 8月 21, 2018

[BUG]
When mounting certain crafted image, btrfs will trigger kernel BUG_ON()
when trying to recover balance:

  kernel BUG at fs/btrfs/extent-tree.c:8956!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 1 PID: 662 Comm: mount Not tainted 4.18.0-rc1-custom+ #10
  RIP: 0010:walk_up_proc+0x336/0x480 [btrfs]
  RSP: 0018:ffffb53540c9b890 EFLAGS: 00010202
  Call Trace:
   walk_up_tree+0x172/0x1f0 [btrfs]
   btrfs_drop_snapshot+0x3a4/0x830 [btrfs]
   merge_reloc_roots+0xe1/0x1d0 [btrfs]
   btrfs_recover_relocation+0x3ea/0x420 [btrfs]
   open_ctree+0x1af3/0x1dd0 [btrfs]
   btrfs_mount_root+0x66b/0x740 [btrfs]
   mount_fs+0x3b/0x16a
   vfs_kern_mount.part.9+0x54/0x140
   btrfs_mount+0x16d/0x890 [btrfs]
   mount_fs+0x3b/0x16a
   vfs_kern_mount.part.9+0x54/0x140
   do_mount+0x1fd/0xda0
   ksys_mount+0xba/0xd0
   __x64_sys_mount+0x21/0x30
   do_syscall_64+0x60/0x210
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

[CAUSE]
Extent tree corruption.  In this particular case, reloc tree root's
owner is DATA_RELOC_TREE (should be TREE_RELOC), thus its backref is
corrupted and we failed the owner check in walk_up_tree().

[FIX]
It's pretty hard to take care of every extent tree corruption, but at
least we can remove such BUG_ON() and exit more gracefully.

And since in this particular image, DATA_RELOC_TREE and TREE_RELOC share
the same root (which is obviously invalid), we needs to make
__del_reloc_root() more robust to detect such invalid sharing to avoid
possible NULL dereference as root->node can be NULL in this case.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200411Reported-by: NXu Wen <wen.xu@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

65c6e82b

btrfs: Remove 'objectid' member from struct btrfs_root · 4fd786e6

由 Misono Tomohiro 提交于 8月 06, 2018

There are two members in struct btrfs_root which indicate root's
objectid: objectid and root_key.objectid.

They are both set to the same value in __setup_root():

  static void __setup_root(struct btrfs_root *root,
                           struct btrfs_fs_info *fs_info,
                           u64 objectid)
  {
    ...
    root->objectid = objectid;
    ...
    root->root_key.objectid = objecitd;
    ...
  }

and not changed to other value after initialization.

grep in btrfs directory shows both are used in many places:
  $ grep -rI "root->root_key.objectid" | wc -l
  133
  $ grep -rI "root->objectid" | wc -l
  55
 (4.17, inc. some noise)

It is confusing to have two similar variable names and it seems
that there is no rule about which should be used in a certain case.

Since ->root_key itself is needed for tree reloc tree, let's remove
'objecitd' member and unify code to use ->root_key.objectid in all places.
Signed-off-by: NMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4fd786e6

btrfs: switch update_size to bool in btrfs_block_rsv_migrate and btrfs_rsv_add_bytes · 3a584174

由 Lu Fengqi 提交于 8月 04, 2018

Using true and false here is closer to the expected semantic than using
0 and 1.  No functional change.
Signed-off-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

3a584174

06 8月, 2018 9 次提交

btrfs: btrfs_iget never returns an is_bad_inode inode · 2e19f1f9

由 Al Viro 提交于 7月 29, 2018

Just get rid of pointless checks.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

2e19f1f9

btrfs: qgroup: Drop root parameter from btrfs_qgroup_trace_subtree · deb40627

由 Lu Fengqi 提交于 7月 18, 2018

The fs_info can be fetched from the transaction handle directly.
Signed-off-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

deb40627

btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized · 389305b2

由 Qu Wenruo 提交于 7月 03, 2018

Invalid reloc tree can cause kernel NULL pointer dereference when btrfs
does some cleanup of the reloc roots.

It turns out that fs_info::reloc_ctl can be NULL in
btrfs_recover_relocation() as we allocate relocation control after all
reloc roots have been verified.
So when we hit: note, we haven't called set_reloc_control() thus
fs_info::reloc_ctl is still NULL.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199833Reported-by: NXu Wen <wen.xu@gatech.edu>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Tested-by: NGu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

389305b2

btrfs: annotate unlikely branches after V0 extent type removal · 6d8ff4e4

由 David Sterba 提交于 6月 26, 2018

The v0 extent type checks are the right case for the unlikely
annotations as we don't expect to ever see them, so let's give the
compiler some hint.
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6d8ff4e4

btrfs: Add graceful handling of V0 extents · ba3c2b19

由 Nikolay Borisov 提交于 6月 26, 2018

Following the removal of the v0 handling code let's be courteous and
print an error message when such extents are handled. In the cases
where we have a transaction just abort it, otherwise just call
btrfs_handle_fs_error. Both cases result in the FS being re-mounted RO.

In case the error handling would be too intrusive, leave the BUG_ON in
place, like extent_data_ref_count, other proper handling would catch
that earlier.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ba3c2b19

btrfs: Remove V0 extent support · a79865c6

由 Nikolay Borisov 提交于 6月 21, 2018

The v0 compat code was introduced in commit 5d4f98a2
("Btrfs: Mixed back reference  (FORWARD ROLLING FORMAT CHANGE)") 9
years ago, which was merged in 2.6.31. This means that the code is
there to support filesystems which are _VERY_ old and if you are using
btrfs on such an old kernel, you have much bigger problems. This coupled
with the fact that no one is likely testing/maintining this code likely
means it has bugs lurking. All things considered I think 43 kernel
releases later it's high time this remnant of the past got removed.

This patch removes all code wrapped in #ifdefs but leaves the BUG_ONs in case
we have a v0 with no support intact as a sort of safety-net.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

a79865c6

btrfs: return EUCLEAN if extent_inline_ref type is invalid · af431dcb

由 Su Yue 提交于 6月 22, 2018

If type of extent_inline_ref found is not expected, filesystem may have
been corrupted, should return EUCLEAN instead of EINVAL.
Signed-off-by: NSu Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

af431dcb

btrfs: Remove fs_info from btrfs_force_chunk_alloc · 43a7e99d

由 Nikolay Borisov 提交于 6月 20, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

43a7e99d

btrfs: Remove fs_info from btrfs_inc_block_group_ro · c83488af

由 Nikolay Borisov 提交于 6月 20, 2018

It can be referenced from the passed bg cache.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c83488af

29 5月, 2018 2 次提交

btrfs: fix describe_relocation when printing unknown flags · b78e2b78

由 Anand Jain 提交于 5月 17, 2018

Looks like the original idea was to print the hex of the flags which is
not coded with their flag name. So use the current buf pointer bp
instead of buf.

Reaching the uknown flags should never happen, it's there just in case.

Fixes: ebce0e01 ("btrfs: make block group flags in balance printks human-readable")
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

b78e2b78

btrfs: remove unused fs_info parameter · 9113493e

由 Gu Jinxiang 提交于 4月 26, 2018

Since the commit c6100a4b ("Btrfs: replace tree->mapping with
tree->private_data"), parameter fs_info in alloc_reloc_control is
not used. So remove it.
Signed-off-by: NGu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

9113493e

26 4月, 2018 1 次提交

btrfs: Fix wrong first_key parameter in replace_path · 17515f1b

由 Qu Wenruo 提交于 4月 23, 2018

Commit 581c1760 ("btrfs: Validate child tree block's level and first
key") introduced new @first_key parameter for read_tree_block(), however
caller in replace_path() is parasing wrong key to read_tree_block().

It should use parameter @first_key other than @key.

Normally it won't expose problem as @key is normally initialzied to the
same value of @first_key we expect.
However in relocation recovery case, @key can be set to (0, 0, 0), and
since no valid key in relocation tree can be (0, 0, 0), it will cause
read_tree_block() to return -EUCLEAN and interrupt relocation recovery.

Fix it by setting @first_key correctly.

Fixes: 581c1760 ("btrfs: Validate child tree block's level and first key")
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

17515f1b

12 4月, 2018 1 次提交

btrfs: replace GPL boilerplate by SPDX -- sources · c1d7c514

由 David Sterba 提交于 4月 03, 2018

Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c1d7c514

31 3月, 2018 2 次提交

btrfs: Validate child tree block's level and first key · 581c1760

由 Qu Wenruo 提交于 3月 29, 2018

We have several reports about node pointer points to incorrect child
tree blocks, which could have even wrong owner and level but still with
valid generation and checksum.

Although btrfs check could handle it and print error message like:
leaf parent key incorrect 60670574592

Kernel doesn't have enough check on this type of corruption correctly.
At least add such check to read_tree_block() and btrfs_read_buffer(),
where we need two new parameters @level and @first_key to verify the
child tree block.

The new @level check is mandatory and all call sites are already
modified to extract expected level from its call chain.

While @first_key is optional, the following call sites are skipping such
check:
1) Root node/leaf
   As ROOT_ITEM doesn't contain the first key, skip @first_key check.
2) Direct backref
   Only parent bytenr and level is known and we need to resolve the key
   all by ourselves, skip @first_key check.

Another note of this verification is, it needs extra info from nodeptr
or ROOT_ITEM, so it can't fit into current tree-checker framework, which
is limited to node/leaf boundary.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

581c1760

btrfs: qgroup: Use separate meta reservation type for delalloc · 43b18595

由 Qu Wenruo 提交于 12月 12, 2017

Before this patch, btrfs qgroup is mixing per-transcation meta rsv with
preallocated meta rsv, making it quite easy to underflow qgroup meta
reservation.

Since we have the new qgroup meta rsv types, apply it to delalloc
reservation.

Now for delalloc, most of its reserved space will use META_PREALLOC qgroup
rsv type.

And for callers reducing outstanding extent like btrfs_finish_ordered_io(),
they will convert corresponding META_PREALLOC reservation to
META_PERTRANS.

This is mainly due to the fact that current qgroup numbers will only be
updated in btrfs_commit_transaction(), that's to say if we don't keep
such placeholder reservation, we can exceed qgroup limitation.

And for callers freeing outstanding extent in error handler, we will
just free META_PREALLOC bytes.

This behavior makes callers of btrfs_qgroup_release_meta() or
btrfs_qgroup_convert_meta() to be aware of which type they are.
So in this patch, btrfs_delalloc_release_metadata() and its callers get
an extra parameter to info qgroup to do correct meta convert/release.

The good news is, even we use the wrong type (convert or free), it won't
cause obvious bug, as prealloc type is always in good shape, and the
type only affects how per-trans meta is increased or not.

So the worst case will be at most metadata limitation can be sometimes
exceeded (no convert at all) or metadata limitation is reached too soon
(no free at all).
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

43b18595

01 3月, 2018 1 次提交

btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster · 765f3ceb

由 Nikolay Borisov 提交于 1月 31, 2018

Essentially duplicate the error handling from the above block which
handles the !PageUptodate(page) case and additionally clear
EXTENT_BOUNDARY.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

765f3ceb

16 11月, 2017 1 次提交

Btrfs: fix reported number of inode blocks after buffered append writes · e3b8a485

由 Filipe Manana 提交于 11月 04, 2017

The patch from commit a7e3b975 ("Btrfs: fix reported number of inode
blocks") introduced a regression where if we do a buffered write starting
at position equal to or greater than the file's size and then stat(2) the
file before writeback is triggered, the number of used blocks does not
change (unless there's a prealloc/unwritten extent). Example:

  $ xfs_io -f -c "pwrite -S 0xab 0 64K" foobar
  $ du -h foobar
  0	foobar
  $ sync
  $ du -h foobar
  64K	foobar

The first version of that patch didn't had this regression and the second
version, which was the one committed, was made only to address some
performance regression detected by the intel test robots using fs_mark.

This fixes the regression by setting the new delaloc bit in the range, and
doing it at btrfs_dirty_pages() while setting the regular dealloc bit as
well, so that this way we set both bits at once avoiding navigation of the
inode's io tree twice. Doing it at btrfs_dirty_pages() is also the most
meaninful place, as we should set the new dellaloc bit when if we set the
delalloc bit, which happens only if we copied bytes into the pages at
__btrfs_buffered_write().

This was making some of LTP's du tests fail, which can be quickly run
using a command line like the following:

  $ ./runltp -q -p -l /ltp.log -f commands -s du -d /mnt

Fixes: a7e3b975 ("Btrfs: fix reported number of inode blocks")
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e3b8a485

02 11月, 2017 1 次提交

Btrfs: rework outstanding_extents · 8b62f87b

由 Josef Bacik 提交于 10月 19, 2017

Right now we do a lot of weird hoops around outstanding_extents in order
to keep the extent count consistent.  This is because we logically
transfer the outstanding_extent count from the initial reservation
through the set_delalloc_bits.  This makes it pretty difficult to get a
handle on how and when we need to mess with outstanding_extents.

Fix this by revamping the rules of how we deal with outstanding_extents.
Now instead everybody that is holding on to a delalloc extent is
required to increase the outstanding extents count for itself.  This
means we'll have something like this

btrfs_delalloc_reserve_metadata	- outstanding_extents = 1
 btrfs_set_extent_delalloc	- outstanding_extents = 2
btrfs_release_delalloc_extents	- outstanding_extents = 1

for an initial file write.  Now take the append write where we extend an
existing delalloc range but still under the maximum extent size

btrfs_delalloc_reserve_metadata - outstanding_extents = 2
  btrfs_set_extent_delalloc
    btrfs_set_bit_hook		- outstanding_extents = 3
    btrfs_merge_extent_hook	- outstanding_extents = 2
btrfs_delalloc_release_extents	- outstanding_extnets = 1

In order to make the ordered extent transition we of course must now
make ordered extents carry their own outstanding_extent reservation, so
for cow_file_range we end up with

btrfs_add_ordered_extent	- outstanding_extents = 2
clear_extent_bit		- outstanding_extents = 1
btrfs_remove_ordered_extent	- outstanding_extents = 0

This makes all manipulations of outstanding_extents much more explicit.
Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
combined with btrfs_release_delalloc_extents, even in the error case, as
that is the only function that actually modifies the
outstanding_extents counter.

The drawback to this is now we are much more likely to have transient
cases where outstanding_extents is much larger than it actually should
be.  This could happen before as we manipulated the delalloc bits, but
now it happens basically at every write.  This may put more pressure on
the ENOSPC flushing code, but I think making this code simpler is worth
the cost.  I have another change coming to mitigate this side-effect
somewhat.

I also added trace points for the counter manipulation.  These were used
by a bpf script I wrote to help track down leak issues.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

8b62f87b

30 10月, 2017 1 次提交

btrfs: pass root to various extent ref mod functions · 84f7d8e6

由 Josef Bacik 提交于 9月 29, 2017

We need the actual root for the ref verifier tool to work, so change
these functions to pass the root around instead.  This will be used in
a subsequent patch.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

84f7d8e6

26 9月, 2017 1 次提交

btrfs: fix NULL pointer dereference from free_reloc_roots() · bb166d72

由 Naohiro Aota 提交于 8月 25, 2017

__del_reloc_root should be called before freeing up reloc_root->node.
If not, calling __del_reloc_root() dereference reloc_root->node, causing
the system BUG.

Fixes: 6bdf131f ("Btrfs: don't leak reloc root nodes on error")
Cc: <stable@vger.kernel.org> # 4.9
Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

bb166d72

21 8月, 2017 3 次提交

Btrfs: remove BUG_ON in __add_tree_block · cdccee99

由 Liu Bo 提交于 8月 18, 2017

The BUG_ON() can be triggered when the caller is processing an invalid
extent inline ref, e.g.

a shared data ref is offered instead of an extent data ref, such that
it tries to find a non-existent tree block and then btrfs_search_slot
returns 1 for no such item.

This replaces the BUG_ON() with a WARN() followed by calling
btrfs_print_leaf() to show more details about what's going on and
returning -EINVAL to upper callers.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

cdccee99

Btrfs: remove BUG() in add_data_reference · b14c55a1

由 Liu Bo 提交于 8月 18, 2017

Now that we have a helper to report invalid value of extent inline ref
type, we need to quit gracefully instead of throwing out a kernel panic.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

b14c55a1

Btrfs: convert to use btrfs_get_extent_inline_ref_type · 3de28d57

由 Liu Bo 提交于 8月 18, 2017

Since we have a helper which can do sanity check, this converts all
btrfs_extent_inline_ref_type to it.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

3de28d57

16 8月, 2017 1 次提交

btrfs: Remove never reached error handling code in __add_reloc_root · e3f3ad12

由 Nikolay Borisov 提交于 7月 13, 2017

One of the error handling paths in __add_reloc_root contains btrfs_panic()
followed by some other code. As the name implies what it does is print
some error message and call BUG, naturally what follow afterwards is not
invoked. So remove this extra code.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e3f3ad12

30 6月, 2017 3 次提交

btrfs: fix integer overflow in calc_reclaim_items_nr · 6374e57a

由 Chris Mason 提交于 6月 23, 2017

Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
v4.12-rc6.  This was because commit 70e7af24 made it possible for
calc_reclaim_items_nr() to return a negative number.  It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64->32 bit overflows.

This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.
Reported-by: NDave Jones <davej@codemonkey.org.uk>
Fixes: 70e7af24 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: NChris Mason <clm@fb.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6374e57a

btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges · bc42bda2

由 Qu Wenruo 提交于 2月 27, 2017

[BUG]
For the following case, btrfs can underflow qgroup reserved space
at an error path:
(Page size 4K, function name without "btrfs_" prefix)

         Task A                  |             Task B
----------------------------------------------------------------------
Buffered_write [0, 2K)           |
|- check_data_free_space()       |
|  |- qgroup_reserve_data()      |
|     Range aligned to page      |
|     range [0, 4K)          <<< |
|     4K bytes reserved      <<< |
|- copy pages to page cache      |
                                 | Buffered_write [2K, 4K)
                                 | |- check_data_free_space()
                                 | |  |- qgroup_reserved_data()
                                 | |     Range alinged to page
                                 | |     range [0, 4K)
                                 | |     Already reserved by A <<<
                                 | |     0 bytes reserved      <<<
                                 | |- delalloc_reserve_metadata()
                                 | |  And it *FAILED* (Maybe EQUOTA)
                                 | |- free_reserved_data_space()
                                      |- qgroup_free_data()
                                         Range aligned to page range
                                         [0, 4K)
                                         Freeing 4K
(Special thanks to Chandan for the detailed report and analyse)

[CAUSE]
Above Task B is freeing reserved data range [0, 4K) which is actually
reserved by Task A.

And at writeback time, page dirty by Task A will go through writeback
routine, which will free 4K reserved data space at file extent insert
time, causing the qgroup underflow.

[FIX]
For btrfs_qgroup_free_data(), add @reserved parameter to only free
data ranges reserved by previous btrfs_qgroup_reserve_data().
So in above case, Task B will try to free 0 byte, so no underflow.
Reported-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Tested-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

bc42bda2

btrfs: qgroup: Introduce extent changeset for qgroup reserve functions · 364ecf36

由 Qu Wenruo 提交于 2月 27, 2017

Introduce a new parameter, struct extent_changeset for
btrfs_qgroup_reserved_data() and its callers.

Such extent_changeset was used in btrfs_qgroup_reserve_data() to record
which range it reserved in current reserve, so it can free it in error
paths.

The reason we need to export it to callers is, at buffered write error
path, without knowing what exactly which range we reserved in current
allocation, we can free space which is not reserved by us.

This will lead to qgroup reserved space underflow.
Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

364ecf36

20 6月, 2017 1 次提交

Btrfs: replace tree->mapping with tree->private_data · c6100a4b

由 Josef Bacik 提交于 5月 05, 2017

For extent_io tree's we have carried the address_mapping of the inode
around in the io tree in order to pull the inode back out for calling
into various tree ops hooks.  This works fine when everything that has
an extent_io_tree has an inode.  But we are going to remove the
btree_inode, so we need to change this.  Instead just have a generic
void * for private data that we can initialize with, and have all the
tree ops use that instead.  This had a lot of cascading changes but
should be relatively straightforward.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
[ minor reordering of the callback prototypes ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c6100a4b

28 2月, 2017 4 次提交
- N
  btrfs: Make btrfs_orphan_add take btrfs_inode · 73f2e545
  由 Nikolay Borisov 提交于 2月 20, 2017
```
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
```
  73f2e545
- N
  btrfs: Make btrfs_drop_extent_cache take btrfs_inode · dcdbc059
  由 Nikolay Borisov 提交于 2月 20, 2017
```
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
```
  dcdbc059
- N
  btrfs: all btrfs_delalloc_release_metadata take btrfs_inode · 691fa059
  由 Nikolay Borisov 提交于 2月 20, 2017
```
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
```
  691fa059
- N
  btrfs: Make btrfs_delalloc_reserve_metadata take btrfs_inode · 9f3db423
  由 Nikolay Borisov 提交于 2月 20, 2017
```
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
```
  9f3db423
17 2月, 2017 1 次提交

btrfs: free-space-cache, clean up unnecessary root arguments · 77ab86bf

由 Jeff Mahoney 提交于 2月 15, 2017

The free space cache APIs accept a root but always use the tree root.

Also, btrfs_truncate_free_space_cache accepts a root AND an inode but
the inode always points to the root anyway, so let's just pass the inode.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

77ab86bf

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功