- 18 5月, 2013 1 次提交
-
-
由 Chris Mason 提交于
Btrfs has been pointer tagging bi_private and using bi_bdev to store the stripe index and mirror number of failed IOs. As bios bubble back up through the call chain, we use these to decide if and how to retry our IOs. They are also used to count IO failures on a per device basis. Recently a bio tracepoint was added lead to crashes because we were abusing bi_bdev. This commit adds a btrfs bioset, and creates explicit fields for the mirror number and stripe index. The plan is to extend this structure for all of the fields currently in struct btrfs_bio, which will mean one less kmalloc in our IO path. Signed-off-by: NChris Mason <chris.mason@fusionio.com> Reported-by: NTejun Heo <tj@kernel.org>
-
- 07 5月, 2013 39 次提交
-
-
由 Chris Mason 提交于
We've added new checks to make sure the super block crc is correct during mount. A fresh filesystem from an older mkfs won't have the crc set. This adds a warning when it finds a newly created filesystem but doesn't fail the mount. Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 David Sterba 提交于
The superblock checksum is not verified upon mount. <awkward silence> Add that check and also reorder existing checks to a more logical order. Current mkfs.btrfs does not calculate the correct checksum of super_block and thus a freshly created filesytem will fail to mount when this patch is applied. First transaction commit calculates correct superblock checksum and saves it to disk. Reproducer: $ mfks.btrfs /dev/sda $ mount /dev/sda /mnt $ btrfs scrub start /mnt $ sleep 5 $ btrfs scrub status /mnt ... super:2 ... Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 David Sterba 提交于
The variable was named 'data' in btrfs_reserve_extent and that's the only function that actually uses it to let btrfs_get_alloc_profile know what profile we want. Then it's passed down as u64 flags. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
1) Right now scrub_stripe() is looping in some unnecessary cases: * when the found extent item's objectid has been out of the dev extent's range but we haven't finish scanning all the range within the dev extent * when all the items has been processed but we haven't finish scanning all the range within the dev extent In both cases, we can just finish the loop to save costs. 2) Besides, when the found extent item's length is larger than the stripe len(64k), we don't have to release the path and search again as it'll get at the same key used in the last loop, we can instead increase the logical cursor in place till all space of the extent is scanned. 3) And we use 0 as the key's offset to search btree, then get to previous item to find a smaller item, and again have to move to the next one to get the right item. Setting offset=-1 and previous_item() is the correct way. 4) As we won't find any checksum at offset unless this 'offset' is in a data extent, we can just find checksum when we're really going to scrub an extent. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 David Sterba 提交于
There's a theoretical possibility of reading stale (or even more theoretically, freed) data from DEV_INFO ioctl when the device would disappear between an early mutex unlock and data being copied from the device structure. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 David Sterba 提交于
It's unused since 0b32f4bb. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.cz> Reviewed-by: NZach Brown <zab@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Eric Sandeen 提交于
Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
If you try to mount -o loop a restored file system it will panic if the file ends up being smaller than the original disk. This is because we go to try and get a block for a super that may be past the EOF which makes __getblk return NULL for a buffer head when we aren't expecting it to. Fix this by dealing with this case and just jacking up the errors count. With this patch we no longer panic when mounting a restored file system loopback. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
There were a whole bunch and I was doing it for other things. I haven't tested these error paths but at the very least this is better than panicing. I've only left 2 BUG_ON()'s since they are logic errors and I want to replace them with a ASSERT framework that we can compile out for production users. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
So everybody who got hit by my fsync bug will still continue to hit this BUG_ON() in the free space cache, which is pretty heavy handed. So I took a file system that had this bug and fixed up all the BUG_ON()'s and leaks that popped up when I tried to mount a broken file system like this. With this patch we just fail to mount instead of panicing. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Jan Schmidt 提交于
When qgroup tracking is enabled, we do an automatic cycle of the new rescan mechanism. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Jan Schmidt 提交于
If qgroup tracking is out of sync, a rescan operation can be started. It iterates the complete extent tree and recalculates all qgroup tracking data. This is an expensive operation and should not be used unless required. A filesystem under rescan can still be umounted. The rescan continues on the next mount. Status information is provided with a separate ioctl while a rescan operation is in progress. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Jan Schmidt 提交于
The function is separated into a preparation part and the three accounting steps mentioned in the qgroups documentation. The goal is to make steps two and three usable by the rescan functionality. A side effect is that the function is restructured into readable subunits. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
When running the 208th of xfstests, the fs returned the enospc error when there was lots of free space in the disk. By bisect debug, we found it was introduced by commit 96f1bb57. This commit makes the space check for the global reservation in can_overcommit() be inconsistent with should_alloc_chunk(). can_overcommit() requires that the free space is 2 times the size of the global reservation, or we can't do overcommit. And instead, we need reclaim some reserved space, and if we still don't have enough free space, we need allocate a new chunk. But unfortunately, should_alloc_chunk() just requires that the free space is 1 time the size of the global reservation, that is we would not try to allocate a new chunk if the free space size is in the middle of these two requires, and just return the enospc error. Fix it. Cc: Jim Schutt <jaschut@sandia.gov> Cc: Josef Bacik <jbacik@fusionio.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Jan Schmidt 提交于
Sequence numbers for delayed refs have been introduced in the first version of the qgroup patch set. To solve the problem of find_all_roots on a busy file system, the tree mod log was introduced. The sequence numbers for that were simply shared between those two users. However, at one point in qgroup's quota accounting, there's a statement accessing the previous sequence number, that's still just doing (seq - 1) just as it would have to in the very first version. To satisfy that requirement, this patch makes the sequence number counter 64 bit and splits it into a major part (used for qgroup sequence number counting) and a minor part (incremented for each tree modification in the log). This enables us to go exactly one major step backwards, as required for qgroups, while still incrementing the sequence counter for tree mod log insertions to keep track of their order. Keeping them in a single variable means there's no need to change all the code dealing with comparisons of two sequence numbers. The sequence number is reset to 0 on commit (not new in this patch), which ensures we won't overflow the two 32 bit counters. Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs from the tree mod log code may happen. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Eric Sandeen 提交于
Clean up the leak debugging in extent_io.c by moving the debug code into functions. This also removes the list_heads used for debugging from the extent_buffer and extent_state structures when debug is not enabled. Since we need a global debug config to do that last part, implement CONFIG_BTRFS_DEBUG to accommodate. Thanks to Dave Sterba for the Kconfig bit. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
Replace some BUG_ONs with proper handling and take allocated space back to free space cache for later use. We don't have to worry about extent maps since they'd be freed in releasepage path. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Stefan Behrens 提交于
It is a rare exception that a new tree is created, like the qgroups tree. So far these new trees have an all-zero UUID in their root items. All trees that mkfs.btrfs has created get an UUID during the first mount when btrfs_read_root_item() rewrites the root_item to the v2 structure style. These UUID are never used so far, but anyway, since it is better to have it uniform for all trees, this commit adds some lines that generate and write an UUID for newly created trees. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Stefan Behrens 提交于
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Tsutomu Itoh 提交于
fget() returns NULL if error. So, we should check NULL or not. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Tsutomu Itoh 提交于
Variable 'p' is not used any more. So, remove it. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
I have a broken file system that when it aborts leaves all sorts of accounting things wrong and gives you lots of WARN_ON()'s other than the abort. This is because we're not cleaning up various parts of the file system when we abort. The first chunks are specific to mount failures, we weren't cleaning up the block group cached inodes and we weren't cleaning up any transactions that had been aborted, which leaves a bunch of things laying around. The second half of this are related to the cleanup parts. First we don't need to release space for the dirty pages from the trans_block_rsv, that's all handled by the trans handles so this is just plain wrong. The other thing is we need to pin down extents that were set ->must_insert_reserved for delayed refs. This isn't so much for the pinning but more for the cleaning up the cache->reserved counter since we are no longer going to use those reserved bytes. With this patch I no longer see a bunch of WARN_ON()'s when I try to mount this broken file system, just the initial one from the abort. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
We can just look up the extent_buffers for the range and free stuff that way. This makes the cleanup a bit cleaner and we can make sure to evict the extent_buffers pretty quickly by marking them as stale. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
We need to check the return value of the commit in case something goes wrong, otherwise we could end up going down the line and doing more stuff (like orphan cleanup) before we notice we should have errored out. We need to do this before we free up the log_tree_root since the caller will handle all of that. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
This is just obnoxious. Just print a message, abort the transaction, and return an error. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
We can run the tree logging recovery or the orphan cleanup on mount, so we'll end up looking up a random fs tree in the meantime. So we need to clean this up so we don't leave extent buffers hanging around on the cache. With this patch we no longer leak extent buffers on failure to mount. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
This is the same as the fix from commit Btrfs: fix bad extent logging but for O_DIRECT. I missed this when I fixed the problem originally, we were still using the em for the orig_start and orig_block_len, which would be the merged extent. We need to use the actual extent from the on disk file extent item, which we have to lookup to make sure it's ok to nocow anyway so just pass in some pointers to hold this info. Thanks, Cc: stable@vger.kernel.org Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
We kept leaking extent buffers when mounting a broken file system and it turns out it's because not everybody uses read_tree_block properly. You need to check and make sure the extent_buffer is uptodate before you use it. This patch fixes everybody who calls read_tree_block directly to make sure they check that it is uptodate and free it and return an error if it is not. With this we no longer leak EB's when things go horribly wrong. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
If we fail to load block groups halfway through we can leave extent_state's on the excluded tree. This is because we just lookup the supers and add them to the excluded tree regardless of which block group we are looking at currently. This is a problem because we remove the excluded extents for the range of the block group only, so if we don't ever load a block group for one of the excluded extents we won't ever free it. This fixes the problem by only adding excluded extents if it falls in the block group range we care about. With this patch we're no longer leaking space when we fail to read all of the block groups. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
With a users corrupted fs I was getting weird behavior and panics and it turns out it was because one of his tree blocks had a bogus header level. So add this to the sanity checks in the endio handler for tree blocks. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
This work is done by btrfs_free_path() anyway so there's no need for this duplicate work. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
A user sent me a btrfs-image that was panicing because of some corruption. This is because we pass in a bogus value to btrfs_num_copies, and it panics. Instead just return 1. We only call btrfs_num_copies to see if there are other copies to try and read for things, so if we just return 1 it will make the callers exit out with an appropriate error value. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
Martin Steigerwald reported a BUG_ON() where we were given a bogus bytenr to map. Turns out he is using > PAGESIZE leafsizes. The readahead stuff is called every time we do a completion, but we may not have finished reading in all the pages, so the bytenr we read off the node could be completely bogus. Fix this by only calling the readahead hook once all pages have been read in. Thanks, Reported-by: NMartin Steigerwald <Martin@lichtvoll.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
Martin Steigerwald reported a BUG_ON() in btrfs_map_block where we didn't find a chunk for a particular block we were trying to map. This happened because the block was bogus. We shouldn't be BUG_ON()'ing in this case, just print a message and return an error. This came from reada_add_block and it appears to deal with an error fine so we should be good there. Thanks, Reported-by: NMartin Steigerwald <Martin@lichtvoll.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
We need to tag metadata io with REQ_META to avoid priority inversion when using io throttling cqroups. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
So I noticed there is an infinite loop in the slow caching code. If we return 1 when we hit the end of the tree, so we could end up caching the last block group the slow way and suddenly we're looping forever because we just keep re-searching and trying again. Fix this by only doing btrfs_next_leaf() if we don't need_resched(). Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
The locking order for stuff is __sb_start_write ordered_mutex but with sync() we don't do __sb_start_write for some strange reason, which means that our iput in wait_ordered_extents could start a transaction which does the __sb_start_write while we're holding the ordered_mutex. Fix this by using delayed iput in sync. Thanks, Reported-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-