提交 · 720f1e2060138855b4a1b1e8aa642f9c7feb6750 · openeuler / Kernel

15 3月, 2013 5 次提交

Btrfs: return as soon as possible when edquot happens · 720f1e20

由 Wang Shilong 提交于 3月 06, 2013

If one of qgroup fails to reserve firstly, we should return immediately,
it is unnecessary to continue check.
Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

720f1e20

Btrfs: return EIO if we have extent tree corruption · 492104c8

由 Josef Bacik 提交于 3月 08, 2013

The callers of lookup_inline_extent_info all handle getting an error back
properly, so return an error if we have corruption instead of being a jerk and
panicing.  Still WARN_ON() since this is kind of crucial and I've been seeing it
a bit too much recently for my taste, I think we're doing something wrong
somewhere.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

492104c8

btrfs: use rcu_barrier() to wait for bdev puts at unmount · bc178622

由 Eric Sandeen 提交于 3月 09, 2013

Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
	__btrfs_close_devices
		call_rcu(&device->rcu, free_device);
			free_device
				INIT_WORK(&device->rcu_work, __free_device);
				schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.

Cc: stable@vger.kernel.org
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

bc178622

Btrfs: remove btrfs_try_spin_lock · d340d247

由 Liu Bo 提交于 3月 11, 2013

Remove a useless function declaration
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

d340d247

Btrfs: get better concurrency for snapshot-aware defrag work · a09a0a70

由 Liu Bo 提交于 3月 11, 2013

Using spinning case instead of blocking will result in better concurrency
overall.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

a09a0a70

07 3月, 2013 3 次提交

Btrfs: improve the delayed inode throttling · de3cb945

由 Chris Mason 提交于 3月 04, 2013

The delayed inode code batches up changes to the btree in hopes of doing
them in bulk.  As the changes build up, processes kick off worker
threads and wait for them to make progress.

The current code kicks off an async work queue item for each delayed
node, which creates a lot of churn.  It also uses a fixed 1 HZ waiting
period for the throttle, which allows us to build a lot of pending
work and can slow down the commit.

This changes us to watch a sequence counter as it is bumped during the
operations.  We kick off fewer work items and have each work item do
more work.
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

de3cb945

Btrfs: fix a mismerge in btrfs_balance() · 3a01aa7a

由 Ilya Dryomov 提交于 3月 06, 2013

Raid56 merge (merge commit e942f883) had mistakenly removed a call to
__cancel_balance(), which resulted in balance not cleaning up after itself
after a successful finish.  (Cleanup includes switching the state, removing
the balance item and releasing mut_ex_op testnset lock.)  Bring it back.
Reported-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

3a01aa7a

Merge branch 'master' of... · 2cc65e3e

由 Chris Mason 提交于 3月 06, 2013

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9

2cc65e3e

06 3月, 2013 1 次提交

Btrfs: enforce min_bytes parameter during extent allocation · 154ea289

由 Chris Mason 提交于 3月 05, 2013

Commit 24542bf7 changed preallocation of
extents to cap the max size we try to allocate.  It's a valid change,
but the extent reservation code is also used by balance, and that
can't tolerate a smaller extent being allocated.

__btrfs_prealloc_file_range already has a min_size parameter, which is
used by relocation to request a specific extent size.  This commit
adds an extra check to enforce that minimum extent size.
Signed-off-by: NChris Mason <chris.mason@fusionio.com>
Reported-by: NStefan Behrens <sbehrens@giantdisaster.de>

154ea289

05 3月, 2013 10 次提交

Btrfs: allow running defrag in parallel to administrative tasks · 9b53157a

由 Stefan Behrens 提交于 3月 04, 2013

Commit 5ac00add added a testnset mutex and code that disallows
running administrative tasks in parallel. It is prevented that
the device add/delete/balance/replace/resize operations are
started in parallel. By mistake, the defragmentation operation
was included in the check for mutually exclusiveness as well.
This is fixed with this commit.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

9b53157a

Btrfs: avoid deadlock on transaction waiting list · 66b6135b

由 Liu Bo 提交于 3月 04, 2013

Only let one trans handle to wait for other handles, otherwise we
will get ABBA issues.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

66b6135b

Btrfs: do not BUG_ON on aborted situation · 0f788c58

由 Liu Bo 提交于 3月 04, 2013

Btrfs balance can easily hit BUG_ON in these places, but we want
to it bail out gracefully after we force the whole filesystem to
readonly.  So we use btrfs_std_error hook in place of BUG_ON.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

0f788c58

Btrfs: do not BUG_ON in prepare_to_reloc · 28818947

由 Liu Bo 提交于 3月 04, 2013

We can bail out from here gracefully instead of a cold BUG_ON.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

28818947

Btrfs: free all recorded tree blocks on error · e1a12670

由 Liu Bo 提交于 3月 04, 2013

We've missed the 'free blocks' part on ENOMEM error.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

e1a12670

Btrfs: build up error handling for merge_reloc_roots · aca1bba6

由 Liu Bo 提交于 3月 04, 2013

We first use btrfs_std_error hook to replace with BUG_ON, and we
also need to cleanup what is left, including reloc roots rbtree
and reloc roots list.
Here we use a helper function to cleanup both rbtree and list, and
since this function can also be used in the balance recover path,
we also make the change as well to keep code simple.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

aca1bba6

Btrfs: check for NULL pointer in updating reloc roots · 8f71f3e0

由 Liu Bo 提交于 3月 04, 2013

Add a check for NULL pointer to avoid invalid reference.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

8f71f3e0

Btrfs: fix unclosed transaction handler when the async transaction commitment fails · 00d71c9c

由 Miao Xie 提交于 3月 04, 2013

If the async transaction commitment failed, we need close the
current transaction handler, or the current transaction will be
blocked to commit because of this orphan handler.

We fix the problem by doing sync transaction commitment, that is
to invoke btrfs_commit_transaction().
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

00d71c9c

Btrfs: fix wrong handle at error path of create_snapshot() when the commit fails · aec8030a

由 Miao Xie 提交于 3月 04, 2013

There are several bugs at error path of create_snapshot() when the
transaction commitment failed.
- access the freed transaction handler. At the end of the
  transaction commitment, the transaction handler was freed, so we
  should not access it after the transaction commitment.
- we were not aware of the error which happened during the snapshot
  creation if we submitted a async transaction commitment.
- pending snapshot access vs pending snapshot free. when something
  wrong happened after we submitted a async transaction commitment,
  the transaction committer would cleanup the pending snapshots and
  free them. But the snapshot creators were not aware of it, they
  would access the freed pending snapshots.

This patch fixes the above problems by:
- remove the dangerous code that accessed the freed handler
- assign ->error if the error happens during the snapshot creation
- the transaction committer doesn't free the pending snapshots,
  just assigns the error number and evicts them before we unblock
  the transaction.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

aec8030a

Btrfs: use set_nlink if our i_nlink is 0 · 9bf7a489

由 Josef Bacik 提交于 3月 01, 2013

We need to inc the nlink of deleted entries when running replay so we can do the
unlink on the fs_root and get everything cleaned up and then have the orphan
cleanup do the right thing. The problem is inc_nlink complains about this, even
thought it still does the right thing. So use set_nlink() if our i_nlink is 0
to keep users from seeing the warnings during log replay. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

9bf7a489

03 3月, 2013 1 次提交

btrfs/raid56: Add missing #include <linux/vmalloc.h> · d7011f5b

由 Geert Uytterhoeven 提交于 3月 03, 2013

tilegx_defconfig:

fs/btrfs/raid56.c: In function 'btrfs_alloc_stripe_hash_table':
fs/btrfs/raid56.c:206:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
fs/btrfs/raid56.c:206:9: warning: assignment makes pointer from integer without a cast [enabled by default]
fs/btrfs/raid56.c:226:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

d7011f5b

02 3月, 2013 2 次提交

btrfs: fixup/remove module.h usage as required · 180e001c

由 Paul Gortmaker 提交于 2月 14, 2013

We want to avoid module.h where posible, since it in turn includes
nearly all of header space.  This means removing it where it is not
required, and using export.h where we are only exporting symbols via
EXPORT_SYMBOL and friends.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

180e001c

Btrfs: delete inline extents when we find them during logging · 124fe663

由 Josef Bacik 提交于 3月 01, 2013

Apparently when we do inline extents we allow the data to overlap the last chunk
of the btrfs_file_extent_item, which means that we can possibly have a
btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
This messes with us when we try to overwrite the extent when logging new extents
since we expect for it to be the right size. To fix this just delete the item
and try to do the insert again which will give us the proper sized
btrfs_file_extent_item. This fixes a panic where map_private_extent_buffer
would blow up because we're trying to write past the end of the leaf. Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

124fe663

01 3月, 2013 14 次提交

btrfs: try harder to allocate raid56 stripe cache · 83c8266a

由 David Sterba 提交于 3月 01, 2013

The stripe hash table is large, starting with allocation order 4 and can go as
high as order 7 in case lock debugging is turned on and structure padding
happens.

Observed mount failure:

mount: page allocation failure: order:7, mode:0x200050
Pid: 8234, comm: mount Tainted: G        W    3.8.0-default+ #267
Call Trace:
 [<ffffffff81114353>] warn_alloc_failed+0xf3/0x140
 [<ffffffff811171d2>] ? __alloc_pages_direct_compact+0x92/0x250
 [<ffffffff81117ac3>] __alloc_pages_nodemask+0x733/0x9d0
 [<ffffffff81152878>] ? cache_alloc_refill+0x3f8/0x840
 [<ffffffff811528bc>] cache_alloc_refill+0x43c/0x840
 [<ffffffff811302eb>] ? is_kernel_percpu_address+0x4b/0x90
 [<ffffffffa00a00ac>] ? btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
 [<ffffffff811531d7>] kmem_cache_alloc_trace+0x247/0x270
 [<ffffffffa00a00ac>] btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
 [<ffffffffa003133f>] open_ctree+0xb2f/0x1f90 [btrfs]
 [<ffffffff81397289>] ? string+0x49/0xe0
 [<ffffffff813987b3>] ? vsnprintf+0x443/0x5d0
 [<ffffffffa0007cb6>] btrfs_mount+0x526/0x600 [btrfs]
 [<ffffffff8115127c>] ? cache_alloc_debugcheck_after+0x4c/0x200
 [<ffffffff81162b90>] mount_fs+0x20/0xe0
 [<ffffffff8117db26>] vfs_kern_mount+0x76/0x120
 [<ffffffff811801b6>] do_mount+0x386/0x980
 [<ffffffff8112a5cb>] ? strndup_user+0x5b/0x80
 [<ffffffff81180840>] sys_mount+0x90/0xe0
 [<ffffffff81962e99>] system_call_fastpath+0x16/0x1b
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

83c8266a

Btrfs: cleanup to make the function btrfs_delalloc_reserve_metadata more logic · 88e081bf

由 Wang Shilong 提交于 3月 01, 2013

The original code is a little confusing and not clear, The right
way to deal with the kernel code like this:
		[...]
		if (ret)
			goto out;
		[...]

So i move the common clean_up code to the place labeled with
out_fail, this will be easier to maintain.
Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

88e081bf

Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve fails · a9870c0e

由 Wang Shilong 提交于 3月 01, 2013

commit eb6b88d9 leads into another bug.
If it is just because qgroup_reserve fails, the function btrfs_qgroup_free
should not be called, otherwise, it will cause the wrong quota accounting.
Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

a9870c0e

Btrfs: remove reduplicate check about root in the function btrfs_clean_quota_tree · 235fdb8e

由 Wang Shilong 提交于 2月 27, 2013

The check work has been done just before the function btrfs_clean_quota_tree
is called, it is not necessary to check it again, remove it.
Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

235fdb8e

Btrfs: return ENOMEM rather than use BUG_ON when btrfs_alloc_path fails · 84cbe2f7

由 Wang Shilong 提交于 2月 27, 2013

Return ENOMEM rather trigger BUG_ON, fix it.
Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
Reviewed-by: NZach Brown <zab@redhat.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

84cbe2f7

Btrfs: fix missing deleted items in btrfs_clean_quota_tree · 06b3a860

由 Wang Shilong 提交于 2月 27, 2013

Steps to reproduce:

	i=0
	ncases=100

	mkfs.btrfs <disk>
	mount <disk> <mnt>
	btrfs quota enable <mnt>
	btrfs qgroup create 2/1 <mnt>
	while [ $i -le $ncases ]
	do
		btrfs qgroup create 1/$i <mnt>
		btrfs qgroup assign 1/$i 2/1 <mnt>
		i=$(($i+1))
	done

	btrfs quota disable <mnt>
	umount <mnt>
	btrfsck <mnt>

You can also use the commands:
	btrfs-debug-tree <disk> | grep QGROUP

You will find there are still items existed.The reasons why this happens
is because the original code just checks slots[0]==0 and returns.
We try to fix it by deleting the leaf one by one.
Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

06b3a860

btrfs: use only inline_pages from extent buffer · b8dae313

由 David Sterba 提交于 2月 28, 2013

The nodesize is capped at 64k and there are enough pages preallocated in
extent_buffer::inline_pages. The fallback to kmalloc never happened
because even on the smallest page size considered (4k) inline_pages
covered the needs.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

b8dae313

Btrfs: fix wrong reserved space when deleting a snapshot/subvolume · c58aaad2

由 Miao Xie 提交于 2月 28, 2013

When deleting a snapshot/subvolume, we need remove root ref/backref,
dir entries and update the dir inode, so we must reserve free space
for those operations.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

c58aaad2

Btrfs: fix wrong reserved space in qgroup during snap/subv creation · d5c12070

由 Miao Xie 提交于 2月 28, 2013

There are two problems in the space reservation of the snapshot/
subvolume creation.
- don't reserve the space for the root item insertion
- the space which is reserved in the qgroup is different with
  the free space reservation. we need reserve free space for
  7 items, but in qgroup reservation, we need reserve space only
  for 3 items.

So we implement new metadata reservation functions for the
snapshot/subvolume creation.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

d5c12070

Btrfs: remove unnecessary dget_parent/dput when creating the pending snapshot · e9662f70

由 Miao Xie 提交于 2月 28, 2013

Since we have grabbed the parent inode at the beginning of the
snapshot creation, and both sync and async snapshot creation
release it after the pending snapshots are actually created,
it is safe to access the parent inode directly during the snapshot
creation, we needn't use dget_parent/dput to fix the parent dentry
and get the dir inode.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

e9662f70

btrfs: remove a printk from scan_one_device · 2d8946c5

由 David Sterba 提交于 2月 27, 2013

Dave pointed out that he saw messages from btrfs although there was no
such filesystem on his computers. The automatic device scan is called on
every new blockdevice if the usual distro udev rule set is used. The
printk introduced in 6f60cbd3 was a remainder from copying
portions of code from btrfs_get_bdev_and_sb which is used under
different conditions and the warning makes sense there.
Reported-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

2d8946c5

Btrfs: fix NULL pointer after aborting a transaction · f094ac32

由 Liu Bo 提交于 2月 27, 2013

While doing cleanup work on an aborted transaction, we've set
the global running transaction pointer to NULL _before_ waiting all
other transaction handles to finish, so others'd hit NULL pointer
crash when referencing the global running transaction pointer.

This first sets a hint to avoid new transaction handle joining, then
waits other existing handles to abort or finish so that we can safely
set the above global pointer to NULL.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

f094ac32

Btrfs: fix memory leak of log roots · 3321719e

由 Liu Bo 提交于 2月 27, 2013

When we abort a transaction while fsyncing, we'll skip freeing log roots
part of committing a transaction, which leads to memory leak.

This adds a 'free log roots' in putting super when no more users hold
references on log roots, so it's safe and clean.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

3321719e

Btrfs: copy everything if we've created an inline extent · bdc20e67

由 Josef Bacik 提交于 2月 28, 2013

I noticed while looking into a tree logging bug that we aren't logging inline
extents properly.  Since this requires copying and it shouldn't happen too often
just force us to copy everything for the inode into the tree log when we have an
inline extent.  With this patch we have valid data after a crash when we write
an inline extent.  Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

bdc20e67

27 2月, 2013 4 次提交

btrfs: cleanup for open-coded alignment · fda2832f

由 Qu Wenruo 提交于 2月 26, 2013

Though most of the btrfs codes are using ALIGN macro for page alignment,
there are still some codes using open-coded alignment like the
following:
------
        u64 mask = ((u64)root->stripesize - 1);
        u64 ret = (val + mask) & ~mask;
------
Or even hidden one:
------
        num_bytes = (end - start + blocksize) & ~(blocksize - 1);
------

Sometimes these open-coded alignment is not so easy to understand for
newbie like me.

This commit changes the open-coded alignment to the ALIGN macro for a
better readability.

Also there is a previous patch from David Sterba with similar changes,
but the patch is for 3.2 kernel and seems not merged.
http://www.spinics.net/lists/linux-btrfs/msg12747.html

Cc: David Sterba <dave@jikos.cz>
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

fda2832f

Btrfs: do not change inode flags in rename · 8c4ce81e

由 Liu Bo 提交于 2月 25, 2013

Before we forced to change a file's NOCOW and COMPRESS flag due to
the parent directory's, but this ends up a bad idea, because it
confuses end users a lot about file's NOCOW status, eg. if someone
change a file to NOCOW via 'chattr' and then rename it in the current
directory which is without NOCOW attribute, the file will lose the
NOCOW flag silently.

This diables 'change flags in rename', so from now on we'll only
inherit flags from the parent directory on creation stage while in
other places we can use 'chattr' to set NOCOW or COMPRESS flags.
Reported-by: NMarios Titas <redneb8888@gmail.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

8c4ce81e

Btrfs: use reserved space for creating a snapshot · 2382c5cc

由 Liu Bo 提交于 2月 22, 2013

While inserting dir index and updating inode for a snapshot, we'd
add delayed items which consume trans->block_rsv, if we don't have
any space reserved in this trans handle, we either just return or
reserve space again.

But before creating pending snapshots during committing transaction,
we've done a release on this trans handle, so we don't have space reserved
in it at this stage.

What we're using is block_rsv of pending snapshots which has already
reserved well enough space for both inserting dir index and updating
inode, so we need to set trans handle to indicate that we have space
now.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

2382c5cc

clear chunk_alloc flag on retryable failure · a81cb9a2

由 Alexandre Oliva 提交于 2月 21, 2013

I've experienced filesystem freezes with permanent spikes in the active
process count for quite a while, particularly on filesystems whose
available raw space has already been fully allocated to chunks.

While looking into this, I found a pretty obvious error in
do_chunk_alloc: it sets space_info->chunk_alloc, but if
btrfs_alloc_chunk returns an error other than ENOSPC, it returns leaving
that flag set, which causes any other threads waiting for
space_info->chunk_alloc to become zero to spin indefinitely.

I haven't double-checked that this patch fixes the failure I've observed
fully (it's not exactly trivial to trigger), but it surely is a bug and
the fix is trivial, so... Please put it in :-)

What I saw in that function also happens to explain why in some cases I
see filesystems allocate a huge number of chunks that remain unused
(leading to the scenario above, of not having more chunks to allocate).
It happens for data and metadata, but not necessarily both. I'm
guessing some thread sets the force_alloc flag on the corresponding
space_info, and then several threads trying to get disk space end up
attempting to allocate a new chunk concurrently. All of them will see
the force_alloc flag and bump their local copy of force up to the level
they see first, and they won't clear it even if another thread succeeds
in allocating a chunk, thus clearing the force flag. Then each thread
that observed the force flag will, on its turn, force the allocation of
a new chunk. And any threads that come in while it does that will see
the force flag still set and pick it up, and so on. This sounds like a
problem to me, but... what should the correct behavior be? Clear
force_flag once we copy it to a local force? Reset force to the
incoming value on every loop? Set the flag to our incoming force if we
have it at first, clear our local flag, and move it from the space_info
when we determined that we are the thread that's going to perform the
allocation?

btrfs: clear chunk_alloc flag on retryable failure

From: Alexandre Oliva <oliva@gnu.org>

If btrfs_alloc_chunk fails with e.g. ENOMEM, we exit do_chunk_alloc
without clearing chunk_alloc in space_info. As a result, any further
calls to do_chunk_alloc on that filesystem will start busy-waiting for
chunk_alloc to be cleared, but it never will be. This patch adjusts
do_chunk_alloc so that it clears this flag in case of an error.
Signed-off-by: NAlexandre Oliva <oliva@gnu.org>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

a81cb9a2

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功