- 02 11月, 2017 5 次提交
-
-
由 Zygo Blaxell 提交于
The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and offset (encoded as a single logical address) to a list of extent refs. LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping (extent ref -> extent bytenr and offset, or logical address). These are useful capabilities for programs that manipulate extents and extent references from userspace (e.g. dedup and defrag utilities). When the extents are uncompressed (and not encrypted and not other), check_extent_in_eb performs filtering of the extent refs to remove any extent refs which do not contain the same extent offset as the 'logical' parameter's extent offset. This prevents LOGICAL_INO from returning references to more than a single block. To find the set of extent references to an uncompressed extent from [a, b), userspace has to run a loop like this pseudocode: for (i = a; i < b; ++i) extent_ref_set += LOGICAL_INO(i); At each iteration of the loop (up to 32768 iterations for a 128M extent), data we are interested in is collected in the kernel, then deleted by the filter in check_extent_in_eb. When the extents are compressed (or encrypted or other), the 'logical' parameter must be an extent bytenr (the 'a' parameter in the loop). No filtering by extent offset is done (or possible?) so the result is the complete set of extent refs for the entire extent. This removes the need for the loop, since we get all the extent refs in one call. Add an 'ignore_offset' argument to iterate_inodes_from_logical, [...several levels of function call graph...], and check_extent_in_eb, so that we can disable the extent offset filtering for uncompressed extents. This flag can be set by an improved version of the LOGICAL_INO ioctl to get either behavior as desired. There is no functional change in this patch. The new flag is always false. Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ minor coding style fixes ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This code was first introduced in 31db9f7c ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive") and it was not functional, then it got slightly refactored in e938c8ad ("Btrfs: code cleanups for send/receive"), alas it was still dead. So let's remove it for good! Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
That was only an extra check to tackle a few bugs around this area, now its safe to remove it. Replace it by an ASSERT. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Adam Borowski 提交于
This is bikeshedding, but it seems people are drastically more likely to understand "zlib:9" as compression level rather than an algorithm version compared to "zlib9". Based on feedback on the mailinglist, the ":9" will be the only accepted syntax. The level must be a single digit. Unrecognized format will result to the default, for forward compatibility in a similar way the compression algorithm specifier was relaxed in commit a7164fa4 ("btrfs: prepare for extensions in compression options"). Signed-off-by: NAdam Borowski <kilobyte@angband.pl> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ tighten the accepted format ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Preliminary support for setting compression level for zlib, the following works: $ mount -o compess=zlib # default $ mount -o compess=zlib0 # same $ mount -o compess=zlib9 # level 9, slower sync, less data $ mount -o compess=zlib1 # level 1, faster sync, more data $ mount -o remount,compress=zlib3 # level set by remount The compress-force works the same as compress'. The level is visible in the same format in /proc/mounts. Level set via file property does not work yet. Required patch: "btrfs: prepare for extensions in compression options" Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 30 10月, 2017 35 次提交
-
-
由 Nikolay Borisov 提交于
Currently btrfs' code uses a mix of opencoded sizes and defines from sizes.h. Let's unifiy the code base to always use the symbolic constants. No functional changes Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Gu JinXiang 提交于
Fix missing change from commit f8f84b2d ("btrfs: index check-integrity state hash by a dev_t"). Function btrfsic_dev_state_hashtable_lookup uses dev_t to generate hashval when look in up a btrfsic_dev_state in hash table. So when we add a btrfsic_dev_state into the hash table, it should also use dev_t. Reproducer of this bug: Use MOUNT_OPTIONS="-o check_int" when running xfstest, device can not be mounted successfully. So xfstest can not run. Signed-off-by: NGu JinXiang <gujx@cn.fujitsu.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
When one of the device is missing, bbio_error() takes care of setting the error status. And if its only IO that is pending in that stripe, it fails to check the status of the other IO at %bbio_error before setting the error %bi_status for the %orig_bio. Fix this by checking if %bbio->error has exceeded the %bbio->max_errors. Reproducer as below fdatasync error is seen intermittently. mount -o degraded /dev/sdc /btrfs dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error The reason for the intermittences of the problem is because the following conditions have to be met, which depends on timing: In btrfs_map_bio() - the RAID1 the missing device has to be at %dev_nr = 1 In bbio_error() . before bbio_error() is called the bio of the not-missing device at %dev_nr = 0 must be completed so that the below condition is true if (atomic_dec_and_test(&bbio->stripes_pending)) { Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
A cleanup patch, use need_full_stripe() to replace the open code. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NQu Wenruo <quwenruo.btrfs@gmx.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Goldwyn Rodrigues 提交于
Code cleanup for better understanding: Variable needs_unlock to be called extent_locked to show state as opposed to action. Changed the type to int, to reduce code in the critical path. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
At few places we could use BLK_STS_OK and BLK_STS_NOSUPP. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NSatoru Taekeuchi <satoru.takeuchi@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ dropped first hunk btrfs_endio_direct_read ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
These are useful for debugging problems where we mess with trans->block_rsv to make sure we're not screwing something up. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
We can get this from the ref we've passed in. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
This is just excessive information in the ref_head, and makes the code complicated. It is a relic from when we had the heads and the refs in the same tree, which is no longer the case. With this removal I've cleaned up a bunch of the cruft around this old assumption as well. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
We do a couple different cleanup operations on the ref head. We adjust counters, we'll free any reserved space if we didn't end up using the ref, and we clear the pending csum bytes. Move all these disparate things into cleanup_ref_head and clean up the logic in __btrfs_run_delayed_refs so that it handles the !ref case a lot cleaner, as well as making run_one_delayed_ref() only deal with real refs and not the ref head. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
We only use this logic if our ref isn't a ref_head, so move it up into the if (ref) case since we know that this is a normal ref and not a delayed ref head. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
Move this code out to a helper function to further simplivy __btrfs_run_delayed_refs. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
Move the extent_op cleanup for an empty head ref to a helper function to help simplify __btrfs_run_delayed_refs. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
Simplify the error handling in __btrfs_run_delayed_refs by breaking out the code used to return a head back to the delayed_refs tree for processing into a helper function. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
We were only doing btrfs_check_space_for_delayed_refs() if the metadata space was full, ie we couldn't allocate chunks. This assumes we'll be able to allocate chunks during transaction commit, but since nothing does a LIMIT flush during the transaction commit this won't actually happen unless we happen to run shy of actual space. We already take into account a full fs in btrfs_check_space_for_delayed_refs() so just kill this extra check to make sure we're ending the transaction when we need to. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
We were having corruption issues that were tied back to problems with the extent tree. In order to track them down I built this tool to try and find the culprit, which was pretty successful. If you compile with this tool on it will live verify every ref update that the fs makes and make sure it is consistent and valid. I've run this through with xfstests and haven't gotten any false positives. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ update error messages, add fixup from Dan Carpenter to handle errors of read_tree_block ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
We need the actual root for the ref verifier tool to work, so change these functions to pass the root around instead. This will be used in a subsequent patch. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
This adds the infrastructure for turning ref verify on and off for a mount, to be used by a later patch. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ enhnance btrfs_print_mod_info to print if ref-verify is compiled in ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The use of sector_t in the callchain of submit_extent_page is not necessary. Switch to u64 and rename the variable and use byte units instead of 512b, ie. dropping the >> 9 shifts and avoiding the con(tro)versions of sector_t. Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We're going to remove sector_t and will use 'offset', so this patch frees the name. Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The use of sector_t is not necessry, it's just for a warning. Switch to u64 and rename the variable and use byte units instead of 512b, ie. dropping the >> 9 shifts. The messages are adjusted as well. Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
We pass in a pointer in our send arg struct, this means the struct size doesn't match with 32bit user space and 64bit kernel space. Fix this by adding a compat mode and doing the appropriate conversion. Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ move structure to the beginning, next to receive 32bit compat ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
When device is missing without the -o degraded option then its an error so report it as an error instead of a warning. And when -o degraded option is provided, log the missing device as warning. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ switch error to bool ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
EIO is only for the IO failure to the device, avoid it. Use ENOENT as that's the closest error code describing what happened. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
add_missing_dev() can return device pointer so that IS_ERR/PTR_ERR can be used to check for the actual error that occurred in the function. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> [ minor error message adjustment ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christos Gkekas 提交于
Remove variables 'start' and 'end', which are set but never used. Signed-off-by: NChristos Gkekas <chris.gekas@gmail.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Arnd Bergmann 提交于
We now get a harmless compile-time on 32-bit architectures: fs/btrfs/tree-checker.c: In function 'check_extent_data_item': fs/btrfs/tree-checker.c:189:70: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'unsigned int' [-Werror=format=] This changes the format string to use %zu instead of %lu for size_t. Fixes: c1f6520bf360 ("btrfs: tree-checker: Enhance output for check_extent_data_item") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
Now that we have the combo of flushing twice, which can make sure IO have started since the second flush will wait for page lock which won't be unlocked unless setting page writeback and queuing ordered extents, we don't need %async_submit_draining, %async_delalloc_pages and %nr_async_submits to tell whether the IO has actually started. Moreover, all the flushers in use are followed by functions that wait for ordered extents to complete, so %nr_async_submits, which tracks whether bio's async submit has made progress, doesn't really make sense. However, %async_delalloc_pages is still required by shrink_delalloc() as that function doesn't flush twice in the normal case (just issues a writeback with WB_REASON_FS_FREE_SPACE). Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
By setting compression for a defrag task, the task will start IO at the end of defrag. After the combo of filemap_flush(), we've already made sure that dirty pages have made progress via async compress thread because the second filemap_flush() will wait for page lock, which won't be unlocked until those pages have been marked as writeback and ordered extents have been queued. And this is for per-inode defrag, it's not helpful to wait on a global %async_delalloc_pages and %nr_async_submits from fs_info. Although waiting on %nr_async_submits means that all bios are submitted down to per-device schedule IO lists, it doesn't wait for their completions, thus users still need to do fsync/sync to make sure the data is on disk. While with this change, it makes sure that pages are marked with writeback bits and will be submitted asynchronously shortly, therefore, the behavior of defrag option '-c' remains unchanged. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
This was intended to congest higher layers to not send bios, but as 1) the congested bit has been taken by writeback Async bios come from buffered writes and DIO writes. For DIO writes, we want to submit them ASAP, while for buffered writes, writeback uses balance_dirty_pages() to throttle how much dirty pages we can have. 2) and no one is waiting for %nr_async_bios down to zero, Historically, it was introduced along with changes which let checksumming workload spread accross different cpus. And at that time, pdflush was used instead of per-bdi flushing, perhaps pdflush did not have the necessary information for writeback to do throttling. We can safely remove them now. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> [ additional explanation from mails, removed unused variable 'limit' ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Output the invalid member name and its bad value, along with its expected value range or alignment. Signed-off-by: NQu Wenruo <quwenruo.btrfs@gmx.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Output the bad value and expected good value (or its alignment). Signed-off-by: NQu Wenruo <quwenruo.btrfs@gmx.com> [ unindent long strings ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Enhance the output to print: 1) the eason 2) the ad value, if reason is not sufficient 3) good value (range) Signed-off-by: NQu Wenruo <quwenruo.btrfs@gmx.com> [ wording, unidented long strings ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Use inline function to replace macro since we don't need stringification. (Macro still exists until all callers get updated) And add more info about the error, and replace EIO with EUCLEAN. For nr_items error, report if it's too large or too small, and output the valid value range. For node block pointer, added a new alignment checker. For key order, also output the next key to make the problem more obvious. Signed-off-by: NQu Wenruo <quwenruo.btrfs@gmx.com> [ wording adjustments, unindented long strings ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-