- 13 4月, 2015 7 次提交
-
-
由 Dongsheng Yang 提交于
When we exceed quota limit in writing, we will free some reserved extent when we need to drop but not free account in qgroup. It means, each time we exceed quota in writing, there will be some remain space in qg->reserved we can not use any more. If things go on like this, the all space will be ate up. Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhao Lei 提交于
Reproduce: while true; do dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size] rm /mnt/btrfs/file done Then we can see above loop failed on NO_SPACE. It it long-term problem since very beginning, because delayed-iput after rm are not run. We already have commit_transaction() in alloc_space code, but it is not triggered in above case. This patch trigger commit_transaction() to run delayed-iput and reflash pinned-space to to make write success. It is based on previous fix of delayed-iput in commit_transaction(), need to be applied on top of: btrfs: Fix NO_SPACE bug caused by delayed-iput Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhao Lei 提交于
Steps to reproduce: while true; do dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%] rm /btrfs_dir/file sync done And we'll see dd failed because btrfs return NO_SPACE. Reason: Normally, btrfs_commit_transaction() call btrfs_run_delayed_iputs() in end to free fs space for next write, but sometimes it hadn't done work on time, because btrfs-cleaner thread get delayed-iputs from list before, but do iput() after next write. This is log: [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs() [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs() [ 2569.087554] comm=sync func=btrfs_commit_transaction() end [ 2569.191081] comm=dd begin [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28 [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888 [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512 ... [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496 [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end Fix: Make btrfs_commit_transaction() wait current running btrfs-cleaner's delayed-iputs() done in end. Test: Use script similar to above(more complex), before patch: 7 failed in 100 * 20 loop. after patch: 0 failed in 100 * 20 loop. Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhao Lei 提交于
space_info's value calculation is some complex and easy to cause bug, add WARN_ON() to help debug. Changelog v1->v2: Put WARN_ON()s under the ENOSPC_DEBUG mount option. Suggested by: David Sterba <dsterba@suse.cz> Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhao Lei 提交于
Bug1: space_info->bytes_readonly was set to very large(negative) value in btrfs_remove_block_group(). Reason: Current code set block_group_cache->pinned = 0 in btrfs_delete_unused_bgs(), but above space was not counted to space_info->bytes_readonly. Then in btrfs_remove_block_group(): block_group->space_info->bytes_readonly -= block_group->key.offset; We can see following value in trace: btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728 Bug2: space_info->total_bytes_pinned grow to value larger than fs size. In a 1.2G fs, we can get following trace log: at first: ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496 after some op: ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648 after some op: ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064 ... Reason: Similar to bug1, we also need to adjust space_info->total_bytes_pinned in above code block. Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhao Lei 提交于
If we have any chance to make a successful write, we should not give up. This patch adjust commit-transaction condition from: pinned >= wanted to left + pinned >= wanted Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhao Lei 提交于
Old code bypass commit transaction when we don't have enough pinned space, but another case is there exist freed bgs in current transction, it have possibility to make alloc_chunk success. This patch modify the condition to: if (have_free_bg || have_pinned_space) commit_transaction() Confirmed above action by printk before and after patch. Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 11 4月, 2015 7 次提交
-
-
由 Chris Mason 提交于
Near the end of close_ctree, we're calling btrfs_free_block_rsv to free up the orphan rsv. The problem is this call updates the space_info, which has already been freed. This adds a new __ function that directly calls kfree instead of trying to update the space infos. Signed-off-by: NChris Mason <clm@fb.com>
-
由 Chris Mason 提交于
We loop through all of the dirty block groups during commit and write the free space cache. In order to make sure the cache is currect, we do this while no other writers are allowed in the commit. If a large number of block groups are dirty, this can introduce long stalls during the final stages of the commit, which can block new procs trying to change the filesystem. This commit changes the block group cache writeout to take appropriate locks and allow it to run earlier in the commit. We'll still have to redo some of the block groups, but it means we can get most of the work out of the way without blocking the entire FS. Signed-off-by: NChris Mason <clm@fb.com>
-
由 Chris Mason 提交于
Block group cache writeout is currently waiting on the pages for each block group cache before moving on to writing the next one. This commit switches things around to send down all the caches and then wait on them in batches. The end result is much faster, since we're keeping the disk pipeline full. Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
We're triggering a huge number of commits from btrfs_async_reclaim_metadata_space. These aren't really requried, because everyone calling the async reclaim code is going to end up triggering a commit on their own. Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
This changes our delayed refs calculations to include the space needed to write back dirty block groups. Signed-off-by: NChris Mason <clm@fb.com>
-
由 Chris Mason 提交于
When truncate starts, it allocates some space in the block reserves so that we'll have enough to update metadata along the way. For very large files, we can easily go through all of that space as we loop through the extents. This changes truncate to refill the space reservation as it progresses through the file. Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
As we delete large extents, we end up doing huge amounts of COW in order to delete the corresponding crcs. This adds accounting so that we keep track of that space and flushing of delayed refs so that we don't build up too much delayed crc work. This helps limit the delayed work that must be done at commit time and tries to avoid ENOSPC aborts because the crcs eat all the global reserves. Signed-off-by: NChris Mason <clm@fb.com>
-
- 01 4月, 2015 1 次提交
-
-
由 Chris Mason 提交于
The error handling path for alloc_reserved_tree_block is calling btrfs_free_and_pin_reserved_extent with a spinning tree lock held. This might sleep as we allocate extent_state objects: BUG: sleeping function called from invalid context at mm/slub.c:1268 in_atomic(): 1, irqs_disabled(): 0, pid: 11093, name: kworker/u4:7 5 locks held by kworker/u4:7/11093: #0: ("%s-%s""btrfs", name){++++.+}, at: [<ffffffff81091d51>] process_one_work+0x151/0x520 #1: ((&work->normal_work)){+.+.+.}, at: [<ffffffff81091d51>] process_one_work+0x151/0x520 #2: (sb_internal){++++.+}, at: [<ffffffffa003a70e>] start_transaction+0x43e/0x590 [btrfs] #3: (&head_ref->mutex){+.+...}, at: [<ffffffffa0089f8c>] btrfs_delayed_ref_lock+0x4c/0x240 [btrfs] #4: (btrfs-extent-00){++++..}, at: [<ffffffffa007697b>] btrfs_clear_lock_blocking_rw+0x9b/0x150 [btrfs] CPU: 0 PID: 11093 Comm: kworker/u4:7 Tainted: G W 4.0.0-rc6-default+ #246 Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06 Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] 00000000000004f4 ffff88006dd17848 ffffffff81ab0e3b ffff88006dd17848 ffff88007a944760 ffff88006dd17868 ffffffff8109d516 ffff88006dd17898 0000000000000000 ffff88006dd17898 ffffffff8109d5b2 ffffffff81aba2bb Call Trace: [<ffffffff81ab0e3b>] dump_stack+0x4f/0x6c [<ffffffff8109d516>] ___might_sleep+0xf6/0x140 [<ffffffff8109d5b2>] __might_sleep+0x52/0x90 [<ffffffff81aba2bb>] ? ftrace_call+0x5/0x34 [<ffffffff81196363>] kmem_cache_alloc+0x163/0x1b0 [<ffffffffa0056f31>] ? alloc_extent_state+0x31/0x150 [btrfs] [<ffffffffa0056f20>] ? alloc_extent_state+0x20/0x150 [btrfs] [<ffffffffa0056f31>] alloc_extent_state+0x31/0x150 [btrfs] [<ffffffffa005805b>] __set_extent_bit+0x37b/0x5d0 [btrfs] [<ffffffff81aba2bb>] ? ftrace_call+0x5/0x34 [<ffffffffa005888d>] ? set_extent_bit+0xd/0x30 [btrfs] [<ffffffffa00588a3>] set_extent_bit+0x23/0x30 [btrfs] [<ffffffffa0058e80>] set_extent_dirty+0x20/0x30 [btrfs] [<ffffffffa00195ba>] pin_down_extent+0xaa/0x170 [btrfs] [<ffffffffa001d8ef>] __btrfs_free_reserved_extent+0xcf/0x160 [btrfs] [<ffffffffa0023856>] btrfs_free_and_pin_reserved_extent+0x16/0x20 [btrfs] [<ffffffffa002482a>] __btrfs_run_delayed_refs+0xfca/0x1290 [btrfs] [<ffffffffa0026eae>] btrfs_run_delayed_refs+0x6e/0x2e0 [btrfs] [<ffffffffa0027378>] delayed_ref_async_start+0x48/0xb0 [btrfs] [<ffffffffa006c883>] normal_work_helper+0x83/0x350 [btrfs] [<ffffffffa006cd79>] ? btrfs_extent_refs_helper+0x9/0x20 [btrfs] [<ffffffffa006cd82>] btrfs_extent_refs_helper+0x12/0x20 [btrfs] [<ffffffff81091dcb>] process_one_work+0x1cb/0x520 [<ffffffff81091d51>] ? process_one_work+0x151/0x520 [<ffffffff811c7abf>] ? seq_read+0x3f/0x400 [<ffffffff8109260b>] worker_thread+0x5b/0x4e0 [<ffffffff81097be2>] ? __kthread_parkme+0x12/0xa0 [<ffffffff810925b0>] ? rescuer_thread+0x450/0x450 [<ffffffff81098686>] kthread+0xf6/0x120 [<ffffffff81098590>] ? flush_kthread_worker+0x1b0/0x1b0 [<ffffffff81ab8088>] ret_from_fork+0x58/0x90 [<ffffffff81098590>] ? flush_kthread_worker+0x1b0/0x1b0 ------------[ cut here ]------------ This changes things to free the path first, which will also unlock the extent buffer. Signed-off-by: NChris Mason <clm@fb.com> Reported-by: NDave Sterba <dsterba@suse.cz> Tested-by: NDave Sterba <dsterba@suse.cz>
-
- 27 3月, 2015 1 次提交
-
-
由 Filipe Manana 提交于
While committing a transaction we free the log roots before we write the new super block. Freeing the log roots implies marking the disk location of every node/leaf (metadata extent) as pinned before the new super block is written. This is to prevent the disk location of log metadata extents from being reused before the new super block is written, otherwise we would have a corrupted log tree if before the new super block is written a crash/reboot happens and the location of any log tree metadata extent ended up being reused and rewritten. Even though we pinned the log tree's metadata extents, we were issuing a discard against them if the fs was mounted with the -o discard option, resulting in corruption of the log tree if a crash/reboot happened before writing the new super block - the next time the fs was mounted, during the log replay process we would find nodes/leafs of the log btree with a content full of zeroes, causing the process to fail and require the use of the tool btrfs-zero-log to wipeout the log tree (and all data previously fsynced becoming lost forever). Fix this by not doing a discard when pinning an extent. The discard will be done later when it's safe (after the new super block is committed) at extent-tree.c:btrfs_finish_extent_commit(). Fixes: e688b725 (Btrfs: fix extent pinning bugs in the tree log) CC: <stable@vger.kernel.org> Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 18 3月, 2015 1 次提交
-
-
由 Josef Bacik 提交于
I introduced a regression wrt outstanding_extents accounting. These are tricky areas that aren't easily covered by xfstests as we could change MAX_EXTENT_SIZE at any time. So add sanity tests to cover the various conditions that are tricky in order to make sure we don't introduce regressions in the future. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
- 17 3月, 2015 1 次提交
-
-
由 Josef Bacik 提交于
Writing the block group cache will modify the extent tree quite a bit because it truncates the old space cache and pre-allocates new stuff. To try and cut down on the churn lets do the setup dance first, then later on hopefully we can avoid looping with newly dirtied roots. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
- 14 3月, 2015 1 次提交
-
-
由 Josef Bacik 提交于
Direct IO can easily pass in an buffer that is greater than BTRFS_MAX_EXTENT_SIZE, so take this into account when reserving extents in the delalloc reservation code. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 04 3月, 2015 5 次提交
-
-
由 David Sterba 提交于
Switch to div_u64_rem that does type checking and has more obvious semantics than do_div. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
The divisor is derived from nodesize or PAGE_SIZE, fits into 32bit type. Get rid of a few more do_div instances. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
Switch to div_u64 if the divisor is a numeric constant or sum of sizeof()s. We can remove a few instances of do_div that has the hidden semtantics of changing the 1st argument. Small power-of-two divisors are converted to bitshifts, large values are kept intact for clarity. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
Switch to div_u64_rem that does type checking and has more obvious semantics than do_div. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
The divisor is derived from nodesize or PAGE_SIZE, fits into 32bit type. Get rid of a few more do_div instances. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 03 3月, 2015 1 次提交
-
-
由 Josef Bacik 提交于
Our gluster boxes were hitting a problem where they'd run out of space when updating the block group cache and therefore wouldn't be able to update the free space inode. This is a problem because this is how we invalidate the cache and protect ourselves from errors further down the stack, so if this fails we have to abort the transaction so we make sure we don't end up with stale free space cache. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 21 2月, 2015 1 次提交
-
-
由 David Sterba 提交于
Switch to div_u64 if the divisor is a numeric constant or sum of sizeof()s. We can remove a few instances of do_div that has the hidden semtantics of changing the 1st argument. Small power-of-two divisors are converted to bitshifts, large values are kept intact for clarity. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 17 2月, 2015 2 次提交
-
-
由 Zhao Lei 提交于
int alloc_chunk is never used in this function, remove it. Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 Daniel Dressler 提交于
This is the 3rd independent patch of a larger project to cleanup btrfs's internal usage of btrfs_root. Many functions take btrfs_root only to grab the fs_info struct. By requiring a root these functions cause programmer overhead. That these functions can accept any valid root is not obvious until inspection. This patch reduces the specificity of such functions to accept the fs_info directly. These patches can be applied independently and thus are not being submitted as a patch series. There should be about 26 patches by the project's completion. Each patch will cleanup between 1 and 34 functions apiece. Each patch covers a single file's functions. This patch affects the following function(s): 1) csum_tree_block 2) csum_dirty_buffer 3) check_tree_block_fsid 4) btrfs_find_tree_block 5) clean_tree_block Signed-off-by: NDaniel Dressler <danieru.dressler@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 15 2月, 2015 2 次提交
-
-
由 Forrest Liu 提交于
Removing large amount of block group in a transaction may encounters BUG_ON() in btrfs_orphan_add(). That is because btrfs_orphan_reserve_metadata() will grab metadata reservation from transaction handle, and btrfs_delete_unused_bgs() didn't reserve metadata for trnasaction handle when delete unused block group. The problem can be reproduce by following script mntpath=/btrfs loopdev=/dev/loop0 filepath=/home/forrest/image umount $mntpath losetup -d $loopdev truncate --size 1000g $filepath losetup $loopdev $filepath mkfs.btrfs -f $loopdev mount $loopdev $mntpath for j in `seq 1 1 1000`; do fallocate -l 1g $mntpath/$j done # wait cleaner thread remove unused block group sleep 300 The call trace that results from the BUG_ON() is: [ 613.093084] ------------[ cut here ]------------ [ 613.097928] kernel BUG at fs/btrfs/inode.c:3142! [ 613.105855] invalid opcode: 0000 [#1] SMP [ 613.112702] Modules linked in: coretemp(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) snd_ens1371(E) snd_ac97_codec(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ppdev(E) ac97_bus(E) ablk_helper(E) gameport(E) cryptd(E) snd_rawmidi(E) snd_seq_device(E) snd_pcm(E) vmw_balloon(E) snd_timer(E) snd(E) soundcore(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) drm(E) vmw_vmci(E) parport_pc(E) shpchp(E) i2c_piix4(E) mac_hid(E) lp(E) parport(E) btrfs(E) xor(E) raid6_pq(E) hid_generic(E) usbhid(E) hid(E) psmouse(E) ahci(E) libahci(E) e1000(E) mptspi(E) mptscsih(E) mptbase(E) floppy(E) vmw_pvscsi(E) vmxnet3(E) [ 613.144196] CPU: 0 PID: 1480 Comm: btrfs-cleaner Tainted: G E 3.19.0-rc7-custom #2 [ 613.148501] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 613.152694] task: ffff880035cdb1a0 ti: ffff880039cf4000 task.ti: ffff880039cf4000 [ 613.154969] RIP: 0010:[<ffffffffa01441c2>] [<ffffffffa01441c2>] btrfs_orphan_add+0x1d2/0x1e0 [btrfs] [ 613.157780] RSP: 0018:ffff880039cf7c48 EFLAGS: 00010286 [ 613.159560] RAX: 00000000ffffffe4 RBX: ffff88003bd981a0 RCX: ffff88003c9e4000 [ 613.161904] RDX: 0000000000002244 RSI: 0000000000040000 RDI: ffff88003c9e4138 [ 613.164264] RBP: ffff880039cf7c88 R08: 000060ffc0000850 R09: 0000000000000000 [ 613.166507] R10: ffff88003bc4b7a0 R11: ffffea0000eb6740 R12: ffff88003c9c0000 [ 613.168681] R13: ffff88003c102160 R14: ffff88003c9c0458 R15: 0000000000000001 [ 613.170932] FS: 0000000000000000(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000 [ 613.173316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 613.175227] CR2: 00007f6343537000 CR3: 0000000036329000 CR4: 00000000000407f0 [ 613.177554] Stack: [ 613.178712] ffff880039cf7c88 ffffffffa0182a54 ffff88003c9e4b04 ffff88003c9c7800 [ 613.181297] ffff88003bc4b7a0 ffff88003bd981a0 ffff88003c8db200 ffff88003c2fcc60 [ 613.183782] ffff880039cf7d18 ffffffffa012da97 ffff88003bc4b7a4 ffff88003bc4b7a0 [ 613.186171] Call Trace: [ 613.187493] [<ffffffffa0182a54>] ? lookup_free_space_inode+0x44/0x100 [btrfs] [ 613.189801] [<ffffffffa012da97>] btrfs_remove_block_group+0x137/0x740 [btrfs] [ 613.192126] [<ffffffffa0166912>] btrfs_remove_chunk+0x672/0x780 [btrfs] [ 613.194267] [<ffffffffa012e2ff>] btrfs_delete_unused_bgs+0x25f/0x280 [btrfs] [ 613.196567] [<ffffffffa0135e4c>] cleaner_kthread+0x12c/0x190 [btrfs] [ 613.198687] [<ffffffffa0135d20>] ? check_leaf+0x350/0x350 [btrfs] [ 613.200758] [<ffffffff8108f232>] kthread+0xd2/0xf0 [ 613.202616] [<ffffffff8108f160>] ? kthread_create_on_node+0x180/0x180 [ 613.204738] [<ffffffff8175dabc>] ret_from_fork+0x7c/0xb0 [ 613.206652] [<ffffffff8108f160>] ? kthread_create_on_node+0x180/0x180 [ 613.208741] Code: ff ff 0f 1f 80 00 00 00 00 89 45 c8 3e 80 63 80 fd 48 89 df e8 d0 23 fe ff 8b 45 c8 e9 14 ff ff ff b8 f4 ff ff ff e9 12 ff ff ff <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 [ 613.216562] RIP [<ffffffffa01441c2>] btrfs_orphan_add+0x1d2/0x1e0 [btrfs] [ 613.218828] RSP <ffff880039cf7c48> [ 613.220382] ---[ end trace 71073106deb8a457 ]--- This patch replace btrfs_join_transaction() with btrfs_start_transaction() in btrfs_delete_unused_bgs() to revent BUG_ON() in btrfs_orphan_add() Signed-off-by: NForrest Liu <forrestl@synology.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
On our gluster boxes we stream large tar balls of backups onto our fses. With 160gb of ram this means we get really large contiguous ranges of dirty data, but the way our ENOSPC stuff works is that as long as it's contiguous we only hold metadata reservation for one extent. The problem is we limit our extents to 128mb, so we'll end up with at least 800 extents so our enospc accounting is quite a bit lower than what we need. To keep track of this make sure we increase outstanding_extents for every multiple of the max extent size so we can be sure to have enough reserved metadata space. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 03 2月, 2015 2 次提交
-
-
由 Shaohua Li 提交于
Below test will fail currently: mkfs.ext4 -F /dev/sda btrfs-convert /dev/sda mount /dev/sda /mnt btrfs device add -f /dev/sdb /mnt btrfs balance start -v -dconvert=raid1 -mconvert=raid1 /mnt The reason is there are some block groups with usage 0, but the whole disk hasn't free space to allocate new chunk, so we even can't set such block group readonly. This patch deletes the chunk allocation when setting block group ro. For META, we already have reserve. But for SYSTEM, we don't have, so the check_system_chunk is still required. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Filipe Manana 提交于
Committing a transaction can race with automatic removal of empty block groups (cleaner kthread), leading to a BUG_ON() in the transaction commit code while running btrfs_finish_extent_commit(). The following sequence diagram shows how it can happen: CPU 1 CPU 2 btrfs_commit_transaction() fs_info->running_transaction = NULL btrfs_finish_extent_commit() find_first_extent_bit() -> found range for block group X in fs_info->freed_extents[] btrfs_delete_unused_bgs() -> found block group X Removed block group X's range from fs_info->freed_extents[] btrfs_remove_chunk() btrfs_remove_block_group(bg X) unpin_extent_range(bg X range) btrfs_lookup_block_group(bg X) -> returns NULL -> BUG_ON() The trace that results from the BUG_ON() is: [48665.187808] ------------[ cut here ]------------ [48665.188032] kernel BUG at fs/btrfs/extent-tree.c:5675! [48665.188032] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [48665.188032] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc evdev microcode [48665.197388] CPU: 2 PID: 31211 Comm: kworker/u32:16 Tainted: G W 3.19.0-rc5-btrfs-next-4+ #1 [48665.197388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [48665.197388] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] [48665.197388] task: ffff880222011810 ti: ffff8801b56a4000 task.ti: ffff8801b56a4000 [48665.197388] RIP: 0010:[<ffffffffa0350d05>] [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs] [48665.197388] RSP: 0018:ffff8801b56a7b88 EFLAGS: 00010246 [48665.197388] RAX: 0000000000000000 RBX: ffff8802143a6000 RCX: ffff8802220120c8 [48665.197388] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8800a3c140b0 [48665.197388] RBP: ffff8801b56a7bd8 R08: 0000000000000003 R09: 0000000000000000 [48665.197388] R10: 0000000000000000 R11: 000000000000bbac R12: 0000000012e8e000 [48665.197388] R13: ffff8800a3c14000 R14: 0000000000000000 R15: 0000000000000000 [48665.197388] FS: 0000000000000000(0000) GS:ffff88023ec40000(0000) knlGS:0000000000000000 [48665.197388] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [48665.197388] CR2: 00007f065e42f270 CR3: 0000000206f70000 CR4: 00000000000006e0 [48665.197388] Stack: [48665.197388] ffff8801b56a7bd8 0000000012ea0000 01ff8800a3c14138 0000000012e9ffff [48665.197388] ffff880141df3dd8 ffff8802143a6000 ffff8800a3c14138 ffff880141df3df0 [48665.197388] ffff880141df3dd8 0000000000000000 ffff8801b56a7c08 ffffffffa0354227 [48665.197388] Call Trace: [48665.197388] [<ffffffffa0354227>] btrfs_finish_extent_commit+0xb0/0xd9 [btrfs] [48665.197388] [<ffffffffa0366b4b>] btrfs_commit_transaction+0x791/0x92c [btrfs] [48665.197388] [<ffffffffa0352432>] flush_space+0x43d/0x452 [btrfs] [48665.197388] [<ffffffff814295c3>] ? _raw_spin_unlock+0x28/0x33 [48665.197388] [<ffffffffa035255f>] btrfs_async_reclaim_metadata_space+0x118/0x164 [btrfs] [48665.197388] [<ffffffff81059917>] ? process_one_work+0x14b/0x3ab [48665.197388] [<ffffffff810599ac>] process_one_work+0x1e0/0x3ab [48665.197388] [<ffffffff81079fa9>] ? trace_hardirqs_off+0xd/0xf [48665.197388] [<ffffffff8105a55b>] worker_thread+0x210/0x2d0 [48665.197388] [<ffffffff8105a34b>] ? rescuer_thread+0x2c3/0x2c3 [48665.197388] [<ffffffff8105e5c0>] kthread+0xef/0xf7 [48665.197388] [<ffffffff81429682>] ? _raw_spin_unlock_irq+0x2d/0x39 [48665.197388] [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad [48665.197388] [<ffffffff81429dec>] ret_from_fork+0x7c/0xb0 [48665.197388] [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad [48665.197388] Code: 85 f6 74 14 49 8b 06 49 03 46 09 49 39 c4 72 1d 4c 89 f7 e8 83 ec ff ff 4c 89 e6 4c 89 ef e8 1e f1 ff ff 48 85 c0 49 89 c6 75 02 <0f> 0b 49 8b 1e 49 03 5e 09 48 8b [48665.197388] RIP [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs] [48665.197388] RSP <ffff8801b56a7b88> [48665.272246] ---[ end trace b9c6ab9957521376 ]--- Fix this by ensuring that unpining the block group's range in btrfs_finish_extent_commit() is done in a synchronized fashion with removing the block group's range from freed_extents[] in btrfs_delete_unused_bgs() This race got introduced with the change: Btrfs: remove empty block groups automatically commit 47ab2a6cSigned-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 22 1月, 2015 4 次提交
-
-
由 Liu Bo 提交于
"run_most" is not used anymore. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NSatoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhao Lei 提交于
1: ref_count is simple than current RBIO_HOLD_BBIO_MAP_BIT flag to keep btrfs_bio's memory in raid56 recovery implement. 2: free function for bbio will make code clean and flexible, plus forced data type checking in compile. Changelog v1->v2: Rename following by David Sterba's suggestion: put_btrfs_bio() -> btrfs_put_bio() get_btrfs_bio() -> btrfs_get_bio() bbio->ref_count -> bbio->refs Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Filipe Manana 提交于
Very often our extent buffer's header generation doesn't match the current transaction's id or it is also referenced by other trees (snapshots), so we don't need the corresponding block group cache object. Therefore only search for it if we are going to use it, so we avoid an unnecessary search in the block groups rbtree (and acquiring and releasing its spinlock). Freeing a tree block is performed when COWing or deleting a node/leaf, which implies we are holding the node/leaf's parent node lock, therefore reducing the amount of time spent when freeing a tree block helps reducing the amount of time we are holding the parent node's lock. For example, for a run of xfstests/generic/083, the block group cache object was needed only 682 times for a total of 226691 calls to free a tree block. Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
Currently any time we try to update the block groups on disk we will walk _all_ block groups and check for the ->dirty flag to see if it is set. This function can get called several times during a commit. So if you have several terabytes of data you will be a very sad panda as we will loop through _all_ of the block groups several times, which makes the commit take a while which slows down the rest of the file system operations. This patch introduces a dirty list for the block groups that we get added to when we dirty the block group for the first time. Then we simply update any block groups that have been dirtied since the last time we called btrfs_write_dirty_block_groups. This allows us to clean up how we write the free space cache out so it is much cleaner. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 20 1月, 2015 1 次提交
-
-
由 Filipe Manana 提交于
When removing a block group we were deleting it from its space_info's ro_bgs list without the correct protection - the space info's spinlock. Fix this by doing the list delete while holding the spinlock of the corresponding space info, which is the correct lock for any operation on that list. This issue was introduced in the 3.19 kernel by the following change: Btrfs: move read only block groups onto their own list V2 commit 633c0aad I ran into a kernel crash while a task was running statfs, which iterates the space_info->ro_bgs list while holding the space info's spinlock, and another task was deleting it from the same list, without holding that spinlock, as part of the block group remove operation (while running the function btrfs_remove_block_group). This happened often when running the stress test xfstests/generic/038 I recently made. Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 03 1月, 2015 1 次提交
-
-
由 Josef Bacik 提交于
We shouldn't BUG_ON() if there is corruption. I hit this while testing my block group patch and the abort worked properly. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 13 12月, 2014 2 次提交
-
-
由 David Sterba 提交于
Finally it's clear that the requested blocksize is always equal to nodesize, with one exception, the superblock. Superblock has fixed size regardless of the metadata block size, but uses the same helpers to initialize sys array/chunk tree and to work with the chunk items. So it pretends to be an extent_buffer for a moment, btrfs_read_sys_array is full of special cases, we're adding one more. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-