- 30 4月, 2019 6 次提交
-
-
由 Qu Wenruo 提交于
There is a report in kernel bugzilla about mismatch file type in dir item and inode item. This inspires us to check inode mode in inode item. This patch will check the following members: - inode key objectid Should be ROOT_DIR_DIR or [256, (u64)-256] or FREE_INO. - inode key offset Should be 0 - inode item generation - inode item transid No newer than sb generation + 1. The +1 is for log tree. - inode item mode No unknown bits. No invalid S_IF* bit. NOTE: S_IFMT check is not enough, need to check every know type. - inode item nlink Dir should have no more link than 1. - inode item flags Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The commit fcebe456 ("Btrfs: rework qgroup accounting") reworked qgroups and added some new structures. Another rework of qgroup mechanics e69bcee3 ("btrfs: qgroup: Cleanup the old ref_node-oriented mechanism.") stopped using them and left uncleaned. Reviewed-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
From the introduction of btrfs_(set|clear)_header_flag, there is no usage of its return value. So just make it return void. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 08 3月, 2019 1 次提交
-
-
由 Rasmus Villemoes 提交于
First, the btrfs_debug macros open-code (one possible definition of) DYNAMIC_DEBUG_BRANCH, so they don't benefit from the CONFIG_JUMP_LABEL optimization. Second, a planned change of struct _ddebug (to reduce its size on 64 bit machines) requires that all descriptors in a translation unit use distinct identifiers. Using the new _dynamic_func_call_no_desc helper macro from dynamic_debug.h takes care of both of these. No functional change. Link: http://lkml.kernel.org/r/20190212214150.4807-12-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: NDavid Sterba <dsterba@suse.com> Acked-by: NJason Baron <jbaron@akamai.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 2月, 2019 1 次提交
-
-
由 Josef Bacik 提交于
There's a bug in snapshot deletion where we won't update the drop_progress key if we're in the UPDATE_BACKREF stage. This is a problem because we could drop refs for blocks we know don't belong to ours. If we crash or umount at the right time we could experience messages such as the following when snapshot deletion resumes BTRFS error (device dm-3): unable to find ref byte nr 66797568 parent 0 root 258 owner 1 offset 0 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 16052 at fs/btrfs/extent-tree.c:7108 __btrfs_free_extent.isra.78+0x62c/0xb30 [btrfs] CPU: 3 PID: 16052 Comm: umount Tainted: G W OE 5.0.0-rc4+ #147 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 RIP: 0010:__btrfs_free_extent.isra.78+0x62c/0xb30 [btrfs] RSP: 0018:ffffc90005cd7b18 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff88842fade680 RSI: ffff88842fad6b18 RDI: ffff88842fad6b18 RBP: ffffc90005cd7bc8 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000001 R11: ffffffff822696b8 R12: 0000000003fb4000 R13: 0000000000000001 R14: 0000000000000102 R15: ffff88819c9d67e0 FS: 00007f08bb138fc0(0000) GS:ffff88842fac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f5d861ea0 CR3: 00000003e99fe000 CR4: 00000000000006e0 Call Trace: ? _raw_spin_unlock+0x27/0x40 ? btrfs_merge_delayed_refs+0x356/0x3e0 [btrfs] __btrfs_run_delayed_refs+0x75a/0x13c0 [btrfs] ? join_transaction+0x2b/0x460 [btrfs] btrfs_run_delayed_refs+0xf3/0x1c0 [btrfs] btrfs_commit_transaction+0x52/0xa50 [btrfs] ? start_transaction+0xa6/0x510 [btrfs] btrfs_sync_fs+0x79/0x1c0 [btrfs] sync_filesystem+0x70/0x90 generic_shutdown_super+0x27/0x120 kill_anon_super+0x12/0x30 btrfs_kill_super+0x16/0xa0 [btrfs] deactivate_locked_super+0x43/0x70 deactivate_super+0x40/0x60 cleanup_mnt+0x3f/0x80 __cleanup_mnt+0x12/0x20 task_work_run+0x8b/0xc0 exit_to_usermode_loop+0xce/0xd0 do_syscall_64+0x20b/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe To fix this simply mark dead roots we read from disk as DEAD and then set the walk_control->restarted flag so we know we have a restarted deletion. From here whenever we try to drop refs for blocks we check to verify our ref is set on them, and if it is not we skip it. Once we find a ref that is set we unset walk_control->restarted since the tree should be in a normal state from then on, and any problems we run into from there are different issues. I tested this with an existing broken fs and my reproducer that creates a broken fs and it fixed both file systems. Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 25 2月, 2019 10 次提交
-
-
由 David Sterba 提交于
The member btrfs_fs_info::scrub_nocow_workers is unused since the nocow optimization was removed from scrub in 9bebe665 ("btrfs: scrub: Remove unused copy_nocow_pages and its callchain"). Reviewed-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The scrub worker pointers are not NULL iff the scrub is running, so reset them back once the last reference is dropped. Add assertions to the initial phase of scrub to verify that. Reviewed-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Use the refcount_t for fs_info::scrub_workers_refcnt instead of int so we get the extra checks. All reference changes are still done under scrub_lock. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
We've done this forever because of the voodoo around knowing how much space we have. However, we have better ways of doing this now, and on normal file systems we'll easily have a global reserve of 512MiB, and since metadata chunks are usually 1GiB that means we'll allocate metadata chunks more readily. Instead use the actual used amount when determining if we need to allocate a chunk or not. This has a side effect for mixed block group fs'es where we are no longer allocating enough chunks for the data/metadata requirements. To deal with this add a ALLOC_CHUNK_FORCE step to the flushing state machine. This will only get used if we've already made a full loop through the flushing machinery and tried committing the transaction. If we have then we can try and force a chunk allocation since we likely need it to make progress. This resolves issues I was seeing with the mixed bg tests in xfstests without the new flushing state. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> [ merged with patch "add ALLOC_CHUNK_FORCE to the flushing code" ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
The throttle path doesn't take cleaner_delayed_iput_mutex, which means we could think we're done flushing iputs in the data space reservation path when we could have a throttler doing an iput. There's no real reason to serialize the delayed iput flushing, so instead of taking the cleaner_delayed_iput_mutex whenever we flush the delayed iputs just replace it with an atomic counter and a waitqueue. This removes the short (or long depending on how big the inode is) window where we think there are no more pending iputs when there really are some. The waiting is killable as it could be indirectly called from user operations like fallocate or zero-range. Such call sites should handle the error but otherwise it's not necessary. Eg. flush_space just needs to attempt to make space by waiting on iputs. Signed-off-by: NJosef Bacik <josef@toxicpanda.com> [ add killable comment and changelog parts ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anders Roxell 提交于
A compiler warning (in a patch in development) pointed to a variable that was used only inside and ASSERT: u64 root_objectid = root->root_key.objectid; ASSERT(root_objectid == ...); fs/btrfs/relocation.c: In function ‘insert_dirty_subv’: fs/btrfs/relocation.c:2138:6: warning: unused variable ‘root_objectid’ [-Wunused-variable] u64 root_objectid = root->root_key.objectid; ^~~~~~~~~~~~~ When CONFIG_BRTFS_ASSERT isn't enabled, variable root_objectid isn't used. Rework the assertion helper by adding a runtime check instead of the '#ifdef CONFIG_BTRFS_ASSERT #else ...", so the compiler sees the condition being passed into an inline function after preprocessing. Signed-off-by: NAnders Roxell <anders.roxell@linaro.org> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
To allow delayed subtree swap rescan, btrfs needs to record per-root information about which tree blocks get swapped. This patch introduces the required infrastructure. The designed workflow will be: 1) Record the subtree root block that gets swapped. During subtree swap: O = Old tree blocks N = New tree blocks reloc tree subvolume tree X Root Root / \ / \ NA OB OA OB / | | \ / | | \ NC ND OE OF OC OD OE OF In this case, NA and OA are going to be swapped, record (NA, OA) into subvolume tree X. 2) After subtree swap. reloc tree subvolume tree X Root Root / \ / \ OA OB NA OB / | | \ / | | \ OC OD OE OF NC ND OE OF 3a) COW happens for OB If we are going to COW tree block OB, we check OB's bytenr against tree X's swapped_blocks structure. If it doesn't fit any, nothing will happen. 3b) COW happens for NA Check NA's bytenr against tree X's swapped_blocks, and get a hit. Then we do subtree scan on both subtrees OA and NA. Resulting 6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND). Then no matter what we do to subvolume tree X, qgroup numbers will still be correct. Then NA's record gets removed from X's swapped_blocks. 4) Transaction commit Any record in X's swapped_blocks gets removed, since there is no modification to swapped subtrees, no need to trigger heavy qgroup subtree rescan for them. This will introduce 128 bytes overhead for each btrfs_root even qgroup is not enabled. This is to reduce memory allocations and potential failures. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Relocation code will drop btrfs_root::reloc_root as soon as merge_reloc_root() finishes. However later qgroup code will need to access btrfs_root::reloc_root after merge_reloc_root() for delayed subtree rescan. So alter the timming of resetting btrfs_root:::reloc_root, make it happens after transaction commit. With this patch, we will introduce a new btrfs_root::state, BTRFS_ROOT_DEAD_RELOC_TREE, to info part of btrfs_root::reloc_tree user that although btrfs_root::reloc_tree is still non-NULL, but still it's not used any more. The lifespan of btrfs_root::reloc tree will become: Old behavior | New ------------------------------------------------------------------------ btrfs_init_reloc_root() --- | btrfs_init_reloc_root() --- set reloc_root | | set reloc_root | | | | | | | merge_reloc_root() | | merge_reloc_root() | |- btrfs_update_reloc_root() --- | |- btrfs_update_reloc_root() -+- clear btrfs_root::reloc_root | set ROOT_DEAD_RELOC_TREE | | record root into dirty | | roots rbtree | | | | reloc_block_group() Or | | btrfs_recover_relocation() | | | After transaction commit | | |- clean_dirty_subvols() --- | clear btrfs_root::reloc_root During ROOT_DEAD_RELOC_TREE set lifespan, the only user of btrfs_root::reloc_tree should be qgroup. Since reloc root needs a longer life-span, this patch will also delay btrfs_drop_snapshot() call. Now btrfs_drop_snapshot() is called in clean_dirty_subvols(). This patch will increase the size of btrfs_root by 16 bytes. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This function is a simple wrapper over btrfs_get_extent that returns either: a) A real extent in the passed range or b) Adjusted extent based on whether delalloc bytes are found backing up a hole. To support these semantics it doesn't need the page/pg_offset/create arguments which are passed to btrfs_get_extent in case an extent is to be created. So simplify the function by removing the unused arguments. No functional changes. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Since this function is no longer a callback there is no need to have its first argument obfuscated with a void *. Change it directly to a pointer to an inode. No functional changes. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 19 1月, 2019 2 次提交
-
-
由 Josef Bacik 提交于
The cleaner thread usually takes care of delayed iputs, with the exception of the btrfs_end_transaction_throttle path. Delaying iputs means we are potentially delaying the eviction of an inode and it's respective space. The cleaner thread only gets woken up every 30 seconds, or when we require space. If there are a lot of inodes that need to be deleted we could induce a serious amount of latency while we wait for these inodes to be evicted. So instead wakeup the cleaner if it's not already awake to process any new delayed iputs we add to the list. If we suddenly need space we will less likely be backed up behind a bunch of inodes that are waiting to be deleted, and we could possibly free space before we need to get into the flushing logic which will save us some latency. Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
We weren't doing any of the accounting cleanup when we aborted transactions. Fix this by making cleanup_ref_head_accounting global and calling it from the abort code, this fixes the issue where our accounting was all wrong after the fs aborts. The test generic/475 on a 2G VM can trigger the problems eg.: [ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs] [ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394 [ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014 [ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs] [ 8502.160623] RSP: 0018:ffffb1ab84b93de8 EFLAGS: 00010206 [ 8502.161906] RAX: 0000000001000000 RBX: ffff9f34b1756400 RCX: 0000000000000000 [ 8502.163448] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff9f34b1755400 [ 8502.164906] RBP: ffff9f34b7e8c000 R08: 0000000000000001 R09: 0000000000000000 [ 8502.166716] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f34b7e8c108 [ 8502.168498] R13: ffff9f34b7e8c158 R14: 0000000000000000 R15: dead000000000100 [ 8502.170296] FS: 00007fb1cf15ffc0(0000) GS:ffff9f34bd400000(0000) knlGS:0000000000000000 [ 8502.172439] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8502.173669] CR2: 00007fb1ced507b0 CR3: 000000002f7a6000 CR4: 00000000000006f0 [ 8502.175094] Call Trace: [ 8502.175759] close_ctree+0x17f/0x350 [btrfs] [ 8502.176721] generic_shutdown_super+0x64/0x100 [ 8502.177702] kill_anon_super+0x14/0x30 [ 8502.178607] btrfs_kill_super+0x12/0xa0 [btrfs] [ 8502.179602] deactivate_locked_super+0x29/0x60 [ 8502.180595] cleanup_mnt+0x3b/0x70 [ 8502.181406] task_work_run+0x98/0xc0 [ 8502.182255] exit_to_usermode_loop+0x83/0x90 [ 8502.183113] do_syscall_64+0x15b/0x180 [ 8502.183919] entry_SYSCALL_64_after_hwframe+0x49/0xbe Corresponding to release_global_block_rsv() { ... WARN_ON(fs_info->delayed_refs_rsv.reserved > 0); CC: stable@vger.kernel.org Signed-off-by: NJosef Bacik <josef@toxicpanda.com> [ add log dump ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 22 12月, 2018 1 次提交
-
-
由 Al Viro 提交于
1) keeping a copy in btrfs_fs_info is completely pointless - we never use it for anything. Getting rid of that allows for simpler calling conventions for setup_security_options() (caller is responsible for freeing mnt_opts in all cases). 2) on remount we want to use ->sb_remount(), not ->sb_set_mnt_opts(), same as we would if not for FS_BINARY_MOUNTDATA. Behaviours *are* close (in fact, selinux sb_set_mnt_opts() ought to punt to sb_remount() in "already initialized" case), but let's handle that uniformly. And the only reason why the original btrfs changes didn't go for security_sb_remount() in btrfs_remount() case is that it hadn't been exported. Let's export it for a while - it'll be going away soon anyway. Reviewed-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 12月, 2018 19 次提交
-
-
由 Josef Bacik 提交于
When debugging some weird extent reference bug I suspected that we were changing a snapshot while we were deleting it, which could explain my bug. This was indeed what was happening, and this patch helped me verify my theory. It is never correct to modify the snapshot once it's being deleted, so mark the root when we are deleting it and make sure we complain about it when it happens. Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
Now with the delayed_refs_rsv we can now know exactly how much pending delayed refs space we need. This means we can drastically simplify btrfs_check_space_for_delayed_refs by simply checking how much space we have reserved for the global rsv (which acts as a spill over buffer) and the delayed refs rsv. If our total size is beyond that amount then we know it's time to commit the transaction and stop any more delayed refs from being generated. With the introduction of dealyed_refs_rsv infrastructure, namely btrfs_update_delayed_refs_rsv we now know exactly how much pending delayed refs space is required. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
A nice thing we gain with the delayed refs rsv is the ability to flush the delayed refs on demand to deal with enospc pressure. Add states to flush delayed refs on demand, and this will allow us to remove a lot of ad-hoc work around checking to see if we should commit the transaction to run our delayed refs. Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
Traditionally we've had voodoo in btrfs to account for the space that delayed refs may take up by having a global_block_rsv. This works most of the time, except when it doesn't. We've had issues reported and seen in production where sometimes the global reserve is exhausted during transaction commit before we can run all of our delayed refs, resulting in an aborted transaction. Because of this voodoo we have equally dubious flushing semantics around throttling delayed refs which we often get wrong. So instead give them their own block_rsv. This way we can always know exactly how much outstanding space we need for delayed refs. This allows us to make sure we are constantly filling that reservation up with space, and allows us to put more precise pressure on the enospc system. Instead of doing math to see if its a good time to throttle, the normal enospc code will be invoked if we have a lot of delayed refs pending, and they will be run via the normal flushing mechanism. For now the delayed_refs_rsv will hold the reservations for the delayed refs, the block group updates, and deleting csums. We could have a separate rsv for the block group updates, but the csum deletion stuff is still handled via the delayed_refs so that will stay there. Historical background: The global reserve has grown to cover everything we don't reserve space explicitly for, and we've grown a lot of weird ad-hoc heuristics to know if we're running short on space and when it's time to force a commit. A failure rate of 20-40 file systems when we run hundreds of thousands of them isn't super high, but cleaning up this code will make things less ugly and more predictible. Thus the delayed refs rsv. We always know how many delayed refs we have outstanding, and although running them generates more we can use the global reserve for that spill over, which fits better into it's desired use than a full blown reservation. This first approach is to simply take how many times we're reserving space for and multiply that by 2 in order to save enough space for the delayed refs that could be generated. This is a niave approach and will probably evolve, but for now it works. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> # high-level review [ added background notes from the cover letter ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
After the rw semaphore has been added, the custom blocking using ::blocking_readers and ::read_lock_wq is redundant. The blocking logic in __btrfs_map_block is replaced by extending the time the semaphore is held, that has the same blocking effect on writes as the previous custom scheme that waited until ::blocking_readers was zero. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
This is the first part of removing the custom locking and waiting scheme used for device replace. It was probably copied from extent buffer locking, but there's nothing that would require more than is provided by the common locking primitives. The rw spinlock protects waiting tasks counter in case of incompatible locks and the waitqueue. Same as rw semaphore. This patch only switches the locking primitive, for better bisectability. There should be no functional change other than the overhead of the locking and potential sleeping instead of spinning when the lock is contended. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The first auto-assigned value to enum is 0, we can use that and not initialize all members where the auto-increment does the same. This is used for values that are not part of on-disk format. Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We can use simple enum for values that are not part of on-disk format: root tree flags. Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We can use simple enum for values that are not part of on-disk format: internal filesystem states. Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We can use simple enum for values that are not part of on-disk format: block reserve types. Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We can use simple enum for values that are not part of on-disk format: global filesystem states. Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This function really checks whether adding more data to the bio will straddle a stripe/chunk. So first let's give it a more appropraite name - btrfs_bio_fits_in_stripe. Secondly, the offset parameter was never used to just remove it. Thirdly, pages are submitted to either btree or data inodes so it's guaranteed that tree->ops is set so replace the check with an ASSERT. Finally, document the parameters of the function. No functional changes. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Currently btrfs_fs_info structure contains a copy of the fsid/metadata_uuid fields. Same values are also contained in the btrfs_fs_devices structure which fs_info has a reference to. Let's reduce duplication by removing the fields from fs_info and always refer to the ones in fs_devices. No functional changes. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This field is going to be used when the user wants to change the UUID of the filesystem without having to rewrite all metadata blocks. This field adds another level of indirection such that when the FSID is changed what really happens is the current UUID (the one with which the fs was created) is copied to the 'metadata_uuid' field in the superblock as well as a new incompat flag is set METADATA_UUID. When the kernel detects this flag is set it knows that the superblock in fact has 2 UUIDs: 1. Is the UUID which is user-visible, currently known as FSID. 2. Metadata UUID - this is the UUID which is stamped into all on-disk datastructures belonging to this file system. When the new incompat flag is present device scanning checks whether both fsid/metadata_uuid of the scanned device match any of the registered filesystems. When the flag is not set then both UUIDs are equal and only the FSID is retained on disk, metadata_uuid is set only in-memory during mount. Additionally a new metadata_uuid field is also added to the fs_info struct. It's initialised either with the FSID in case METADATA_UUID incompat flag is not set or with the metdata_uuid of the superblock otherwise. This commit introduces the new fields as well as the new incompat flag and switches all users of the fsid to the new logic. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ minor updates in comments ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Johannes Thumshirn 提交于
Depending on whether CONFIG_BTRFS_FS_RUN_SANITY_TESTS is set, some BTRFS functions are either local to the file they are implemented in and thus should be declared static or are called from within the test implementation defined in a different file. Introduce an EXPORT_FOR_TESTS macro which depending on CONFIG_BTRFS_FS_RUN_SANITY_TESTS either adds the 'static' keyword to a function or not. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Ethan Lien 提交于
Snapshot is expected to be fast. But if there are writers steadily creating dirty pages in our subvolume, the snapshot may take a very long time to complete. To fix the problem, we use tagged writepage for snapshot flusher as we do in the generic write_cache_pages(), so we can omit pages dirtied after the snapshot command. This does not change the semantics regarding which data get to the snapshot, if there are pages being dirtied during the snapshotting operation. There's a sync called before snapshot is taken in old/new case, any IO in flight just after that may be in the snapshot but this depends on other system effects that might still sync the IO. We do a simple snapshot speed test on a Intel D-1531 box: fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120 --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio original: 1m58sec patched: 6.54sec This is the best case for this patch since for a sequential write case, we omit nearly all pages dirtied after the snapshot command. For a multi writers, random write test: fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120 --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio original: 15.83sec patched: 10.35sec The improvement is smaller compared to the sequential write case, since we omit only half of the pages dirtied after snapshot command. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NEthan Lien <ethanlien@synology.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This parameter was never used, yet was part of the interface of the function ever since its introduction as extent_io_ops::writepage_end_io_hook in e6dcd2dc ("Btrfs: New data=ordered implementation"). Now that NULL is passed everywhere as a value for this parameter let's remove it for good. No functional changes. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Omar Sandoval 提交于
A later patch will implement swap file support for Btrfs, but before we do that, we need to make sure that the various Btrfs ioctls cannot change a swap file. When a swap file is active, we must make sure that the extents of the file are not moved and that they don't become shared. That means that the following are not safe: - chattr +c (enable compression) - reflink - dedupe - snapshot - defrag Don't allow those to happen on an active swap file. Additionally, balance, resize, device remove, and device replace are also unsafe if they affect an active swapfile. Add a red-black tree of block groups and devices which contain an active swapfile. Relocation checks each block group against this tree and skips it or errors out for balance or resize, respectively. Device remove and device replace check the tree for the device they will operate on. Note that we don't have to worry about chattr -C (disable nocow), which we ignore for non-empty files, because an active swapfile must be non-empty and can't be truncated. We also don't have to worry about autodefrag because it's only done on COW files. Truncate and fallocate are already taken care of by the generic code. Device add doesn't do relocation so it's not an issue, either. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This is the counterpart to merge_extent_hook, similarly, it's used only for data/freespace inodes so let's remove it, rename it and call it directly where necessary. No functional changes. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-