提交 · 21633fc6037f8ceb2bb927dacc3f9ef46ccae891 · openanolis / cloud-kernel

11 6月, 2015 5 次提交

btrfs: qgroup: Add new function to record old_roots. · 3b7d00f9

由 Qu Wenruo 提交于 4月 16, 2015

Add function btrfs_qgroup_prepare_account_extents() to get old_roots
which are needed for qgroup.

We do it in commit_transaction() and before switch_roots(), and only
search commit_root, so it gives a quite accurate view for previous
transaction.

With old_roots from previous transaction, we can use it to do accurate
account with current transaction.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

3b7d00f9

btrfs: qgroup: Record possible quota-related extent for qgroup. · 3368d001

由 Qu Wenruo 提交于 4月 16, 2015

Add hook in add_delayed_ref_head() to record quota-related extent record
into delayed_ref_root->dirty_extent_record rb-tree for later qgroup
accounting.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

3368d001

btrfs: qgroup: Add function qgroup_update_counters(). · 823ae5b8

由 Qu Wenruo 提交于 4月 12, 2015

Add function qgroup_update_counters(), which will update related
qgroups' rfer/excl according to old/new_roots.

This is one of the two core functions for the new qgroup implement.

This is based on btrfs_adjust_coutners() but with clearer logic and
comment.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

823ae5b8

btrfs: qgroup: Add function qgroup_update_refcnt(). · d810ef2b

由 Qu Wenruo 提交于 4月 12, 2015

This function is used to update refcnt for qgroups.
And is one of the two core functions used in the new qgroup implement.

This is based on the old update_old/new_refcnt, but provides a unified
logic and behavior.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

d810ef2b

btrfs: qgroup: Cleanup open-coded old/new_refcnt update and read. · 9c542136

由 Qu Wenruo 提交于 3月 12, 2015

Use inline functions to do such things, to improve readability.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Acked-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

9c542136

03 6月, 2015 1 次提交

btrfs: qgroup: Fix possible leak in btrfs_add_qgroup_relation() · ab3680dd

由 Christian Engelmayer 提交于 5月 02, 2015

Commit 9c8b35b1 ("btrfs: quota: Automatically update related qgroups or
mark INCONSISTENT flags when assigning/deleting a qgroup relations.")
introduced the allocation of a temporary ulist in function
btrfs_add_qgroup_relation() and added the corresponding cleanup to the out
path. However, the allocation was introduced before the src/dst level check
that directly returns. Fix the possible leakage of the ulist by moving the
allocation after the input validation. Detected by Coverity CID 1295988.
Signed-off-by: NChristian Engelmayer <cengelma@gmx.at>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

ab3680dd

13 4月, 2015 15 次提交

btrfs: quota: Automatically update related qgroups or mark INCONSISTENT flags... · 9c8b35b1

由 Qu Wenruo 提交于 2月 27, 2015

btrfs: quota: Automatically update related qgroups or mark INCONSISTENT flags when assigning/deleting a qgroup relations.

Operation like qgroups assigning/deleting qgroup relations will mostly
cause qgroup data inconsistent, since it needs to do the full rescan to
determine whether shared extents are exclusive or still shared in
parent qgroups.

But there are some exceptions, like qgroup with only exclusive extents
(qgroup->excl == qgroup->rfer), in that case, we only needs to
modify all its parents' excl and rfer.

So this patch adds a quick path for such qgroup in qgroup
assign/remove routine, and if quick path failed, the qgroup status will
be marked INCONSISTENT, and return 1 to info user-land.

BTW since the quick path is much the same of qgroup_excl_accounting(),
so move the core of it to __qgroup_excl_accounting() and reuse it.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

9c8b35b1

btrfs: qgroup: clear STATUS_FLAG_ON in disabling quota. · 8ea0ec9e

由 Dongsheng Yang 提交于 2月 27, 2015

we forgot to clear STATUS_FLAG_ON in quota_disable(), it
will cause a problem shown as below:

	# mount /dev/sdc /mnt
	# btrfs quota enable /mnt
	# btrfs quota disable /mnt
	# btrfs quota rescan /mnt
	quota rescan started <--- expecting it fail here.
	# echo $?
	0
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

8ea0ec9e

btrfs: Update btrfs qgroup status item when rescan is done. · 53b7cde9

由 Qu Wenruo 提交于 2月 27, 2015

Update qgroup status when rescan is done.

Before this patch, status item is not updated on rescan finish, which
causing the RESCAN and INCONSISTENT flags never cleared.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

53b7cde9

btrfs: qgroup: Fix dead judgement on qgroup_rescan_leaf() return value. · 3393168d

由 Qu Wenruo 提交于 2月 27, 2015

Old qgroup_rescan_leaf() comment indicates ret == 2 as complete and
cleared INCONSISTENT flag.

This is not true since it will never return 2, and inside it no codes
will clear INCONSISTENT flag.
The flag clearance is done in btrfs_qgroup_rescan_work().
This caused the bug that INCONSISTENT flag is never cleared.

So change the comment and fix the dead judgment.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

3393168d

btrfs: Check qgroup level in kernel qgroup assign. · 8465ecec

由 Qu Wenruo 提交于 2月 27, 2015

Although we have qgroup level check in btrfs-progs, it's not enough
since other programe may still call ioctl directly not using
btrfs-progs. For example, systemd.

But it's btrfs-progs to be blame since we don't provide a
full-function(like subvolume create things) btrfs library with enough
check, and only rely on kernel ioctl.

So Add level checks in kernel too.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

8465ecec

btrfs: qgroup: allow to remove qgroup which has parent but no child. · f5a6b1c5

由 Dongsheng Yang 提交于 11月 24, 2014

When a qgroup has parents but no child, it should be removable in
Theory I think. But currently, we can not remove it when it has
either parent or child.

Example:
	# btrfs quota enable /mnt
	# btrfs qgroup create 1/0 /mnt
	# btrfs qgroup create 2/0 /mnt
	# btrfs qgroup assign 1/0 2/0 /mnt
	# btrfs qgroup show -pcre /mnt
qgroupid rfer  excl  max_rfer max_excl parent  child
-------- ----  ----  -------- -------- ------  -----
0/5      16384 16384 0        0        ---     ---
1/0      0     0     0        0        2/0     ---
2/0      0     0     0        0        ---     1/0

At this time, there is no subvol or qgroup depending on it.
Just a qgroup 2/0 is its parent, but 2/0 can work well without
1/0. So I think 1/0 should be removalbe. But:
	# btrfs qgroup destroy 1/0 /mnt
ERROR: unable to destroy quota group: Device or resource busy

This patch remove the check of qgroup->parent in removing it,
then we can remove a qgroup when it has a parent.
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

f5a6b1c5

btrfs: qgroup: return EINVAL if level of parent is not higher than child's. · 09870d27

由 Dongsheng Yang 提交于 11月 11, 2014

When we create a subvol inheriting a qgroup, we need to check the level
of them. Otherwise, there is a chance a qgroup can inherit another qgroup
at the same level.
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

09870d27

btrfs: qgroup: do a reservation in a higher level. · e2d1f923

由 Dongsheng Yang 提交于 2月 06, 2015

There are two problems in qgroup:

a). The PAGE_CACHE is 4K, even when we are writing a data of 1K,
qgroup will reserve a 4K size. It will cause the last 3K in a qgroup
is not available to user.

b). When user is writing a inline data, qgroup will not reserve it,
it means this is a window we can exceed the limit of a qgroup.

The main idea of this patch is reserving the data size of write_bytes
rather than the reserve_bytes. It means qgroup will not care about
the data size btrfs will reserve for user, but only care about the
data size user is going to write. Then reserve it when user want to
write and release it in transaction committed.

In this way, qgroup can be released from the complex procedure in
btrfs and only do the reserve when user want to write and account
when the data is written in commit_transaction().
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

e2d1f923

Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use. · 31193213

由 Dongsheng Yang 提交于 12月 12, 2014

Currently, for pre_alloc or delay_alloc, the bytes will be accounted
in space_info by the three guys.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
But on the other hand, in qgroup, there are only two counters to account the
bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
bytes in space_info->bytes_may_use and qg->excl accounts bytes in
space_info->used. So the bytes in space_info->reserved is not accounted
in qgroup. If so, there is a window we can exceed the quota limit when
bytes is in space_info->reserved.

Example:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer     excl     max_rfer max_excl parent  child
-------- ----     ----     -------- -------- ------  -----
0/5      20987904 20987904 0        10485760 ---     ---

qg->excl is 20987904 larger than max_excl 10485760.

This patch introduce a new counter named may_use to qgroup, then
there are three counters in qgroup to account bytes in space_info
as below.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
qgroup->may_use           --- qgroup->reserved     --- qgroup->excl

With this patch applied:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer    excl    max_rfer max_excl parent  child
-------- ----    ----    -------- -------- ------  -----
0/5      9453568 9453568 0        10485760 ---     ---
Reported-by: NCyril SCETBON <cyril.scetbon@free.fr>
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

31193213

D
Btrfs: qgroup: cleanup, remove an unsued parameter in btrfs_create_qgroup(). · 4087cf24
由 Dongsheng Yang 提交于 1月 18, 2015
```
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>
```
4087cf24

btrfs: qgroup: fix limit args override whole limit struct · 03477d94

由 Dongsheng Yang 提交于 2月 06, 2015

btrfs_limit_group use arg limit to override the old qgroup_limit of
corresponding qgroup. However, we should override part of old qgroup_limit
according to the bit which has been set in arg limit.
Signed-off-by: NFan Chengniang <fancn.fnst@cn.fujitsu.com>
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

03477d94

btrfs: qgroup: update limit info in function btrfs_run_qgroups(). · d3001ed3

由 Dongsheng Yang 提交于 11月 20, 2014

When we commit_transaction(), qgroups in btree should be updated.
But, limit info is not considered currently. It will cause a problem
when a qgroup of a snapshot inherit the limit info from srcqgroup,
then there is an inconsistency.
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

d3001ed3

btrfs: qgroup: consolidate the parameter of fucntion update_qgroup_limit_item(). · 1510e71c

由 Dongsheng Yang 提交于 11月 20, 2014

Cleanup: Change the parameter of update_qgroup_limit_item() to the family of
update_qgroup_xxx_item().
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

1510e71c

btrfs: qgroup: update qgroup in memory at the same time when we update it in btree. · e8c8541a

由 Dongsheng Yang 提交于 11月 20, 2014

When we call btrfs_qgroup_inherit() with BTRFS_QGROUP_INHERIT_SET_LIMITS,
btrfs will update the limit info of qgroup in btree but forget to update
the qgroup in rbtree at the same time. It obviousely will cause an inconsistency.

This patch fix it by updating the rbtree at the same time.
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

e8c8541a

btrfs: qgroup: inherit limit info from srcgroup in creating snapshot. · 3eeb4d59

由 Dongsheng Yang 提交于 11月 20, 2014

Currently, when we snapshot a subvol, snapshot will not copy the limits
from srcqgroup.

This patch make the qgroup in snapshot inherit the limit info when create
a snapshot.
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

3eeb4d59

27 3月, 2015 1 次提交

Btrfs: change the insertion criteria for the qgroup operations rbtree · bf691960

由 Filipe Manana 提交于 3月 14, 2015

After looking at Liu Bo's recent patch (titled
"Btrfs: fix comp_oper to get right order") I realized the search made by
qgroup_oper_exists() was buggy because its rbtree navigation comparison
function, comp_oper_exist(), only looks at the fields bytenr and ref_root
of a tree node, ignoring the seq field completely. This was wrong because
when we insert a node into the rbtree we use comp_oper(), which takes a
decision based first on bytenr, then on seq and then on the ref_root field.
That means qgroup_oper_exists() could miss the fact that at least one
operation with given bytenr and ref_root exists.

Consider the following simple example of a 3 nodes qgroup operations
rbtree (created using comp_oper before this patch), where each node's key
is a tuple with the shape (bytenr, seq, ref_root, op):

                          [ (4096, 2, 20, op X) ]
                         /                       \
                        /                         \
   [ (4096, 1, 5, op Y) ]                         [ (4096, 3, 10, op Z) ]

qgroup_oper_exists() when called to search for an existing operation for
bytenr 4096 and ref root 10 wouldn't find anything because it would go to
the left subtree instead of the right subtree, since comp_oper_exits()
ignores the seq field completely.

Fix this by changing the insertion navigation function to use the ref_root
field right after using the bytenr field and before using the seq field,
so that qgroup_oper_exists() / comp_oper_exist() work as expected.

This patch applies on top of the patch mentioned above from Liu.
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

bf691960

14 3月, 2015 1 次提交

Btrfs: fix comp_oper to get right order · 48da5f0a

由 Liu Bo 提交于 3月 13, 2015

Case (oper1->seq > oper2->seq) should differ with case (oper1->seq < oper2->seq).
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Reviewed-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

48da5f0a

04 3月, 2015 1 次提交

btrfs: use explicit initializer for seq_elem · 3284da7b

由 David Sterba 提交于 2月 25, 2015

Using {} as initializer for struct seq_elem does not properly initialize
the list_head member, but it currently works because it gets set through
btrfs_get_tree_mod_seq if 'seq' is 0.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

3284da7b

17 2月, 2015 1 次提交

Btrfs: disk-io: replace root args iff only fs_info used · 01d58472

由 Daniel Dressler 提交于 11月 21, 2014

This is the 3rd independent patch of a larger project to cleanup btrfs's
internal usage of btrfs_root. Many functions take btrfs_root only to
grab the fs_info struct.

By requiring a root these functions cause programmer overhead. That
these functions can accept any valid root is not obvious until
inspection.

This patch reduces the specificity of such functions to accept the
fs_info directly.

These patches can be applied independently and thus are not being
submitted as a patch series. There should be about 26 patches by the
project's completion. Each patch will cleanup between 1 and 34 functions
apiece.  Each patch covers a single file's functions.

This patch affects the following function(s):
  1) csum_tree_block
  2) csum_dirty_buffer
  3) check_tree_block_fsid
  4) btrfs_find_tree_block
  5) clean_tree_block
Signed-off-by: NDaniel Dressler <danieru.dressler@gmail.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

01d58472

22 1月, 2015 1 次提交

btrfs: qgroup: move WARN_ON() to the correct location. · 0ee13fe2

由 Yang Dongsheng 提交于 1月 06, 2015

In function qgroup_excl_accounting(), we need to WARN when
qg->excl is less than what we want to free, same to child
and parents. But currently, for parent qgroup, the WARN_ON()
is located after freeing qg->excl. It will WARN out even we
free it normally.

This patch move this WARN_ON() before freeing qg->excl.
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
Reviewed-by: NSatoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

0ee13fe2

02 10月, 2014 1 次提交
- D
  btrfs: move checks for DUMMY_ROOT into a helper · fccb84c9
  由 David Sterba 提交于 9月 29, 2014
```
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
  fccb84c9
18 9月, 2014 3 次提交

btrfs: don't go readonly on existing qgroup items · 0b4699dc

由 Mark Fasheh 提交于 8月 18, 2014

btrfs_drop_snapshot() leaves subvolume qgroup items on disk after
completion. This can cause problems with snapshot creation. If a new
snapshot tries to claim the deleted subvolumes id, btrfs will get -EEXIST
from add_qgroup_item() and go read-only. The following commands will
reproduce this problem (assume btrfs is on /dev/sda and is mounted at
/btrfs)

mkfs.btrfs -f /dev/sda
mount -t btrfs /dev/sda /btrfs/
btrfs quota enable /btrfs/
btrfs su sna /btrfs/ /btrfs/snap
btrfs su de /btrfs/snap
sleep 45
umount /btrfs/
mount -t btrfs /dev/sda /btrfs/

We can fix this by catching -EEXIST in add_qgroup_item() and
initializing the existing items. We have the problem of orphaned
relation items being on disk from an old snapshot but that is outside
the scope of this patch.
Signed-off-by: NMark Fasheh <mfasheh@suse.de>
Signed-off-by: NChris Mason <clm@fb.com>

0b4699dc

btrfs: add trace for qgroup accounting · d3982100

由 Mark Fasheh 提交于 7月 17, 2014

We want this to debug qgroup changes on live systems.
Signed-off-by: NMark Fasheh <mfasheh@suse.de>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

d3982100

btrfs: use nodesize everywhere, kill leafsize · 707e8a07

由 David Sterba 提交于 6月 04, 2014

The nodesize and leafsize were never of different values. Unify the
usage and make nodesize the one. Cleanup the redundant checks and
helpers.

Shaves a few bytes from .text:

  text    data     bss     dec     hex filename
852418   24560   23112  900090   dbbfa btrfs.ko.before
851074   24584   23112  898770   db6d2 btrfs.ko.after
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

707e8a07

24 8月, 2014 1 次提交

Btrfs: fix task hang under heavy compressed write · 9e0af237

由 Liu Bo 提交于 8月 15, 2014

This has been reported and discussed for a long time, and this hang occurs in
both 3.15 and 3.16.

Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.

Btrfs has a kind of work queued as an ordered way, which means that its
ordered_func() must be processed in the way of FIFO, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func() <---- (we name it work X)
    for ordered_work in wq->ordered_list
            ordered_work->ordered_func()
            ordered_work->ordered_free()

The hang is a rare case, first when we find free space, we get an uncached block
group, then we go to read its free space cache inode for free space information,
so it will

file a readahead request
    btrfs_readpages()
         for page that is not in page cache
                __do_readpage()
                     submit_extent_page()
                           btrfs_submit_bio_hook()
                                 btrfs_bio_wq_end_io()
                                 submit_bio()
                                 end_workqueue_bio() <--(ret by the 1st endio)
                                      queue a work(named work Y) for the 2nd
                                      also the real endio()

So the hang occurs when work Y's work_struct and work X's work_struct happens
to share the same address.

A bit more explanation,

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
	worker->current_work = arg;  <-- arg is A->normal_work
	worker->current_func(arg)
		normal_work_helper(arg)
		     A = container_of(arg, struct btrfs_work, normal_work);

		     A->func()
		     A->ordered_func()
		     A->ordered_free()  <-- A gets freed

		     B->ordered_func()
			  submit_compressed_extents()
			      find_free_extent()
				  load_free_space_inode()
				      ...   <-- (the above readhead stack)
				      end_workqueue_bio()
					   btrfs_queue_work(work C)
		     B->ordered_free()

As if work A has a high priority in wq->ordered_list and there are more ordered
works queued after it, such as B->ordered_func(), its memory could have been
freed before normal_work_helper() returns, which means that kernel workqueue
code worker_thread() still has worker->current_work pointer to be work
A->normal_work's, ie. arg's address.

Meanwhile, work C is allocated after work A is freed, work C->normal_work
and work A->normal_work are likely to share the same address(I confirmed this
with ftrace output, so I'm not just guessing, it's rare though).

When another kthread picks up work C->normal_work to process, and finds our
kthread is processing it(see find_worker_executing_work()), it'll think
work C as a collision and skip then, which ends up nobody processing work C.

So the situation is that our kthread is waiting forever on work C.

Besides, there're other cases that can lead to deadlock, but the real problem
is that all btrfs workqueue shares one work->func, -- normal_work_helper,
so this makes each workqueue to have its own helper function, but only a
wraper pf normal_work_helper.

With this patch, I no long hit the above hang.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <clm@fb.com>

9e0af237

21 8月, 2014 1 次提交

btrfs: fix leak in qgroup_subtree_accounting() error path · a3c10895

由 Eric Sandeen 提交于 8月 17, 2014

Coverity pointed this out; in the newly added
qgroup_subtree_accounting(), if btrfs_find_all_roots()
returns an error, we leak at least the parents pointer,
and possibly the roots pointer, depending on what failure
occurs.

If btrfs_find_all_roots() returns an error, we need to
free up all allocations before we return.  "roots" is
initialized to NULL, so it should be safe to free
it unconditionally (ulist_free() handles that case).

Cc: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NMark Fasheh <mfasheh@suse.de>
Signed-off-by: NChris Mason <clm@fb.com>

a3c10895

15 8月, 2014 2 次提交

btrfs: correctly handle return from ulist_add · f90e579c

由 Mark Fasheh 提交于 7月 17, 2014

ulist_add() can return '1' on sucess, which qgroup_subtree_accounting()
doesn't take into account. As a result, that value can be bubbled up to
callers, causing an error to be printed. Fix this by only returning the
value of ulist_add() when it indicates an error.
Signed-off-by: NMark Fasheh <mfasheh@suse.de>
Signed-off-by: NChris Mason <clm@fb.com>

f90e579c

btrfs: qgroup: account shared subtrees during snapshot delete · 1152651a

由 Mark Fasheh 提交于 7月 17, 2014

During its tree walk, btrfs_drop_snapshot() will skip any shared
subtrees it encounters. This is incorrect when we have qgroups
turned on as those subtrees need to have their contents
accounted. In particular, the case we're concerned with is when
removing our snapshot root leaves the subtree with only one root
reference.

In those cases we need to find the last remaining root and add
each extent in the subtree to the corresponding qgroup exclusive
counts.

This patch implements the shared subtree walk and a new qgroup
operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
this type is encountered during qgroup accounting, we search for
any root references to that extent and in the case that we find
only one reference left, we go ahead and do the math on it's
exclusive counts.
Signed-off-by: NMark Fasheh <mfasheh@suse.de>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

1152651a

14 6月, 2014 1 次提交

btrfs: free ulist in qgroup_shared_accounting() error path · d7372780

由 Eric Sandeen 提交于 6月 12, 2014

If tmp = ulist_alloc(GFP_NOFS) fails, we return without
freeing the previously allocated qgroups = ulist_alloc(GFP_NOFS)
and cause a memory leak.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NChris Mason <clm@fb.com>

d7372780

10 6月, 2014 3 次提交

Btrfs: free tmp ulist for qgroup rescan · 2a108409

由 Josef Bacik 提交于 5月 20, 2014

Memory leaks are bad mmkay?
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

2a108409

Btrfs: add sanity tests for new qgroup accounting code · faa2dbf0

由 Josef Bacik 提交于 5月 07, 2014

This exercises the various parts of the new qgroup accounting code. We do some
basic stuff and do some things with the shared refs to make sure all that code
works. I had to add a bunch of infrastructure because I needed to be able to
insert items into a fake tree without having to do all the hard work myself,
hopefully this will be usefull in the future. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

faa2dbf0

Btrfs: rework qgroup accounting · fcebe456

由 Josef Bacik 提交于 5月 13, 2014

Currently qgroups account for space by intercepting delayed ref updates to fs
trees. It does this by adding sequence numbers to delayed ref updates so that
it can figure out how the tree looked before the update so we can adjust the
counters properly. The problem with this is that it does not allow delayed refs
to be merged, so if you say are defragging an extent with 5k snapshots pointing
to it we will thrash the delayed ref lock because we need to go back and
manually merge these things together. Instead we want to process quota changes
when we know they are going to happen, like when we first allocate an extent, we
free a reference for an extent, we add new references etc. This patch
accomplishes this by only adding qgroup operations for real ref changes. We
only modify the sequence number when we need to lookup roots for bytenrs, this
reduces the amount of churn on the sequence number and allows us to merge
delayed refs as we add them most of the time. This patch encompasses a bunch of
architectural changes

1) qgroup ref operations: instead of tracking qgroup operations through the
delayed refs we simply add new ref operations whenever we notice that we need to
when we've modified the refs themselves.

2) tree mod seq: we no longer have this separation of major/minor counters.
this makes the sequence number stuff much more sane and we can remove some
locking that was needed to protect the counter.

3) delayed ref seq: we now read the tree mod seq number and use that as our
sequence. This means each new delayed ref doesn't have it's own unique sequence
number, rather whenever we go to lookup backrefs we inc the sequence number so
we can make sure to keep any new operations from screwing up our world view at
that given point. This allows us to merge delayed refs during runtime.

With all of these changes the delayed ref stuff is a little saner and the qgroup
accounting stuff no longer goes negative in some cases like it was before.
Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

fcebe456

11 3月, 2014 2 次提交

btrfs: Cleanup the "_struct" suffix in btrfs_workequeue · d458b054

由 Qu Wenruo 提交于 2月 28, 2014

Since the "_struct" suffix is mainly used for distinguish the differnt
btrfs_work between the original and the newly created one,
there is no need using the suffix since all btrfs_workers are changed
into btrfs_workqueue.

Also this patch fixed some codes whose code style is changed due to the
too long "_struct" suffix.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

d458b054

btrfs: Replace fs_info->qgroup_rescan_worker workqueue with btrfs_workqueue. · fc97fab0

由 Qu Wenruo 提交于 2月 28, 2014

Replace the fs_info->qgroup_rescan_worker with the newly created
btrfs_workqueue.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

fc97fab0

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功