- 02 Jul 2019, 4 commits
-
-
By David Sterba
The write_locks counter is either 0 or 1 and always updated under the lock, so we don't need the atomic_t semantics. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
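A minimal sketch of the pattern these three counter conversions apply (struct and function names here are illustrative stand-ins, not the actual extent buffer code): a counter that is only ever read and written while a lock is held does not need atomic_t, a plain int is enough.

  #include <linux/spinlock.h>

  /* Illustrative stand-in, not the real extent_buffer layout. */
  struct demo_buffer {
      spinlock_t lock;
      int write_locks;    /* was atomic_t; only ever changed under ->lock */
  };

  static void demo_write_lock(struct demo_buffer *b)
  {
      spin_lock(&b->lock);
      b->write_locks++;   /* plain increment, serialized by b->lock */
      spin_unlock(&b->lock);
  }

  static void demo_write_unlock(struct demo_buffer *b)
  {
      spin_lock(&b->lock);
      b->write_locks--;
      spin_unlock(&b->lock);
  }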
-
By David Sterba
The spinning_writers counter is either 0 or 1 and always updated under the lock, so we don't need the atomic_t semantics. Signed-off-by: David Sterba <dsterba@suse.com>
-
By David Sterba
The blocking_writers counter is either 0 or 1 and always updated under the lock, so we don't need the atomic_t semantics. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By David Sterba
Turn the comment about the required lock into an assertion. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
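A hedged sketch of the comment-to-assertion change (type names are hypothetical; the real patch targets the extent map tree): instead of a comment saying the caller must hold a lock, assert it so lockdep flags any violation at runtime.

  #include <linux/spinlock.h>
  #include <linux/lockdep.h>

  struct demo_tree {
      rwlock_t lock;
  };

  /* Before: a comment "caller must hold tree->lock".  After: enforce it. */
  static void demo_insert_mapping(struct demo_tree *tree)
  {
      lockdep_assert_held(&tree->lock);
      /* ... insertion that relies on the lock being held ... */
  }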
-
- 01 Jul 2019, 36 commits
-
-
By David Sterba
There are no concerns about locking during the selftests, so the locks are not strictly necessary, but the following patches will add lockdep assertions to add_extent_mapping, so this is needed in the tests too. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Nikolay Borisov
The function has a lot of return values and specific conventions, making it cumbersome to understand what's returned. Have a go at documenting its parameters and return values. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
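Documenting a kernel function's return conventions usually means a kernel-doc comment; the shape of such a comment is sketched below (the function name and parameters are purely illustrative, not the signature actually documented by this commit).

  /**
   * demo_lookup_csums - find checksums for a byte range (illustrative)
   * @start: logical start of the range
   * @end:   logical end of the range, inclusive
   * @list:  list head the found checksum items are added to
   *
   * Return:
   * * 0       - checksums for the whole range were found
   * * -ENOENT - part of the range has no checksum items
   * * <0      - other error (search failure, -ENOMEM, ...)
   */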
-
By Nikolay Borisov
Currently find_first_clear_extent_bit always returns a range whose starting value is >= the passed 'start'. This implicit trimming behavior is somewhat subtle and an implementation detail. Instead, this patch modifies the function so that it now always returns the range which contains the passed 'start' and has the given bits unset. This range could either come from existing records which contain 'start' but have the bits unset, or exist because there are no records that contain the given starting offset. This patch also adds test cases covering find_first_clear_extent_bit, since they were missing up until now. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Nikolay Borisov
Currently the first megabyte on a device housing a btrfs filesystem is exempt from allocation and trimming. Currently this is not a problem since 'start' is set to 1M at the beginning of btrfs_trim_free_extents and find_first_clear_extent_bit always returns a range that is >= start. However, in a follow-up patch find_first_clear_extent_bit will be changed such that it returns a range containing 'start', and this range may very well begin at 0 and extend past 1M, overlapping the reserved area. Future proof the sole user of find_first_clear_extent_bit by adjusting 'start' after the function is called. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
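A sketch of the future-proofing described above, under the assumption that the lookup may now hand back a range beginning below the reserved first megabyte (helper and macro names are illustrative):

  #include <linux/types.h>
  #include <linux/kernel.h>
  #include <linux/sizes.h>

  /* Illustrative: the first 1M of each device stays off limits for trimming. */
  #define DEMO_RESERVED_START SZ_1M

  static u64 demo_clamp_trim_start(u64 range_start)
  {
      /*
       * find_first_clear_extent_bit() may now return a range that begins
       * below 1M, so clamp 'start' after the call instead of relying on
       * the lookup to trim it for us.
       */
      return max_t(u64, range_start, DEMO_RESERVED_START);
  }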
-
By David Sterba
The filename:line format is commonly understood by editors and can be copy-pasted more easily than the current format. Signed-off-by: David Sterba <dsterba@suse.com>
-
By Johannes Thumshirn
btrfs_print_data_csum_error() still assumes checksums to be 32 bits in size. Make it size agnostic. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Johannes Thumshirn
Currently btrfs_csum_data() relies on the crc32c() wrapper around the crypto framework for calculating the CRCs. As we now have our own crypto_shash structure in the fs_info, we can call directly into the crypto framework without going through the wrapper. This way we can even remove the btrfs_csum_data() and btrfs_csum_final() wrappers. The module dependency on crc32c is preserved via MODULE_SOFTDEP("pre: crc32c"), which was previously provided by the LIBCRC32C config option doing the same. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
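A hedged sketch of what calling into the crypto framework through a crypto_shash handle looks like (error handling trimmed; per the commit text the handle would live in fs_info, but the names below are illustrative):

  #include <crypto/hash.h>

  /* Allocate the transform once, e.g. at mount time. */
  static struct crypto_shash *demo_alloc_csum(void)
  {
      return crypto_alloc_shash("crc32c", 0, 0);
  }

  /* Compute a checksum of @len bytes at @data into @out. */
  static int demo_csum(struct crypto_shash *tfm, const void *data,
                       unsigned int len, u8 *out)
  {
      SHASH_DESC_ON_STACK(desc, tfm);
      int ret;

      desc->tfm = tfm;
      ret = crypto_shash_digest(desc, data, len, out);
      shash_desc_zero(desc);
      return ret;
  }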
-
By Johannes Thumshirn
Add boilerplate code for directly including the crypto framework. This helps us flip the switch for new algorithms. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Johannes Thumshirn
Now that we have already checked for a valid checksum type before calling btrfs_check_super_csum(), it can be simplified even further. While at it, get rid of the implicit size assumption of the resulting checksum as well. This is a preparation for changing all checksum functionality to use the crypto layer later. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Johannes Thumshirn
Now that we have factored out the superblock checksum type validation, we can check for supported superblock checksum types before doing the actual validation of the superblock read from disk. This paves the way for further simplifications of btrfs_check_super_csum() later on. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment ] Signed-off-by: David Sterba <dsterba@suse.com>
-
By Johannes Thumshirn
Currently btrfs only supports CRC32C as its checksumming algorithm. As this is about to change, provide a function to validate the checksum type in the superblock against all possible algorithms. This makes adding new algorithms easier, as there are fewer places to adjust. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
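A sketch of a table-driven superblock checksum-type validator along the lines described (names and table contents are illustrative; only CRC32C exists at this point in the series):

  #include <linux/types.h>
  #include <linux/kernel.h>

  /* Illustrative: one entry per supported on-disk checksum type. */
  static const struct {
      u16 type;
      const char *name;
  } demo_csums[] = {
      { 0 /* e.g. the CRC32C type value */, "crc32c" },
  };

  static bool demo_supported_super_csum(u16 csum_type)
  {
      int i;

      for (i = 0; i < ARRAY_SIZE(demo_csums); i++)
          if (demo_csums[i].type == csum_type)
              return true;
      return false;
  }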
-
By Johannes Thumshirn
Add a small helper for btrfs_print_data_csum_error() which formats the checksum according to its type for pretty printing. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> [ shorten macro name ] Signed-off-by: David Sterba <dsterba@suse.com>
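A sketch of such a pretty-printing helper, assuming the kernel's %*phN hex-dump format specifier (the macro names below are placeholders, not the final ones):

  #include <linux/printk.h>

  /* Print a checksum of @size bytes as one contiguous hex string. */
  #define DEMO_CSUM_FMT               "0x%*phN"
  #define DEMO_CSUM_ARGS(csum, size)  (int)(size), (csum)

  /*
   * Usage sketch:
   * pr_warn("csum mismatch, have " DEMO_CSUM_FMT " expected " DEMO_CSUM_FMT "\n",
   *         DEMO_CSUM_ARGS(found, csum_size),
   *         DEMO_CSUM_ARGS(expected, csum_size));
   */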
-
By Johannes Thumshirn
BTRFS has the implicit assumption that a checksum in compressed_bio is 4 bytes. While this is true for CRC32C, it is not for any other checksum. Change the data type to a byte array and adjust the loop index calculation accordingly. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Johannes Thumshirn
BTRFS has the implicit assumption that a checksum in btrfs_ordered_sum is 4 bytes. While this is true for CRC32C, it is not for any other checksum. Change the data type to a byte array and adjust the loop index calculation accordingly. This includes moving the adjustment of 'index' by 'ins_size' in btrfs_csum_file_blocks() before dividing 'ins_size' by the checksum size, because before this patch the 'sums' member of 'struct btrfs_ordered_sum' was 4 bytes in size and afterwards it is only one byte. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
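A sketch of the data-type change: checksums stored as a raw byte array and addressed by multiplying the block index with the checksum size, instead of indexing a u32 array (the struct layout here is illustrative):

  #include <linux/types.h>

  struct demo_ordered_sum {
      u64 bytenr;
      int len;
      u8 sums[];    /* was: u32 sums[]; now csum_size bytes per block */
  };

  /* Return a pointer to the checksum of block @index inside @sum. */
  static u8 *demo_csum_ptr(struct demo_ordered_sum *sum, u32 csum_size,
                           unsigned int index)
  {
      return sum->sums + (size_t)index * csum_size;
  }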
-
By Johannes Thumshirn
The CRC checksum in the free space cache does not depend on the super block's csum_type field but is always CRC32C. So use btrfs_crc32c() and btrfs_crc32c_final() instead of btrfs_csum_data() and btrfs_csum_final() for computing these checksums. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Johannes Thumshirn
Commit 9678c543 ("btrfs: Remove custom crc32c init code") removed the btrfs_crc32c() function because it was a duplicate of the crc32c() library function we already have in the kernel. Resurrect it as a shim wrapper over crc32c() to make the following transformations of the checksumming code in btrfs easier. Also provide a btrfs_crc32c_final() to ease those transformations. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
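The wrappers described above are small; a sketch of the shim over crc32c() and of the "final" helper that inverts and stores the result (this is only the shape, not the actual btrfs header):

  #include <linux/crc32c.h>
  #include <asm/unaligned.h>

  static inline u32 demo_crc32c(u32 crc, const void *address, unsigned int length)
  {
      return crc32c(crc, address, length);
  }

  static inline void demo_crc32c_final(u32 crc, u8 *result)
  {
      /* CRC32C convention: invert before storing, little-endian on disk. */
      put_unaligned_le32(~crc, result);
  }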
-
By Johannes Thumshirn
btrfsic_test_for_metadata() directly calls the crc32c() library function for calculating the CRC32C checksum, but then uses btrfs_csum_final() to invert the result. To ease further refactoring and development around checksumming in BTRFS, convert it to calling btrfs_csum_data(), which is a wrapper around crc32c(). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Qu Wenruo
btrfs: Flush before reflinking any extent to prevent NOCOW write falling back to COW without data reservation

[BUG] The following script can cause an unexpected fsync failure:

  #!/bin/bash
  dev=/dev/test/test
  mnt=/mnt/btrfs
  mkfs.btrfs -f $dev -b 512M > /dev/null
  mount $dev $mnt -o nospace_cache
  # Prealloc one extent
  xfs_io -f -c "falloc 8k 64m" $mnt/file1
  # Fill the remaining data space
  xfs_io -f -c "pwrite 0 -b 4k 512M" $mnt/padding
  sync
  # Write into the prealloc extent
  xfs_io -c "pwrite 1m 16m" $mnt/file1
  # Reflink then fsync, fsync would fail due to ENOSPC
  xfs_io -c "reflink $mnt/file1 8k 0 4k" -c "fsync" $mnt/file1
  umount $dev

The fsync fails with ENOSPC, and the last page of the buffered write is lost.

[CAUSE] Btrfs' back references only have extent-level granularity, so a write into a shared extent must be COWed even if only part of the extent is shared. For the above script we have:
- fallocate: create a preallocated extent where we can do NOCOW writes
- fill all the remaining data and unallocated space
- buffered write into the preallocated space: as we don't have enough space available for data and the extent is not shared (yet), we fall into NOCOW mode
- reflink: now part of the large preallocated extent is shared, so later writes into that extent must be COWed
- fsync triggers writeback: but now the extent is shared and therefore we must fall back to COW mode, which fails with ENOSPC since there's not enough space to allocate data extents

[WORKAROUND] The workaround is to ensure any buffered write in the related extents (not just the reflink source range) gets flushed before reflink/dedupe, so that NOCOW writes that happened before the reflink can succeed. The workaround is expensive; we could do better by only flushing the NOCOW range, but that needs extra accounting for NOCOW ranges. For now, fix the possible data loss first. A sketch of the workaround's shape follows. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
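A hedged sketch of that shape: write back and wait on both files before the remap proceeds, so earlier NOCOW writes reach disk while their no-reservation path is still valid (the function name and scope are illustrative, not the exact patch, which flushes via btrfs' own ordered-extent helpers):

  #include <linux/fs.h>
  #include <linux/limits.h>

  /* Flush both files fully before remapping shared extents. */
  static int demo_remap_flush(struct inode *src, struct inode *dst)
  {
      int ret;

      ret = filemap_write_and_wait_range(src->i_mapping, 0, LLONG_MAX);
      if (ret)
          return ret;
      return filemap_write_and_wait_range(dst->i_mapping, 0, LLONG_MAX);
  }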
-
By Nikolay Borisov
The first thing the code does in check_can_nocow is try to block concurrent snapshots. If this fails (due to a snapshot already being in progress), the function returns ENOSPC, which makes no sense. Instead return EAGAIN. Despite this return value not being propagated to callers, it's good practice to return the error code closest in terms of semantics. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Nikolay Borisov
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Nikolay Borisov
In case no cached_state argument is passed to btrfs_lock_and_flush_ordered_range, use one locally in the function. This optimises the case when an ordered extent is found, since the unlock function will be able to unlock that state directly without searching for it again. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Nikolay Borisov
There are several functions which open code btrfs_lock_and_flush_ordered_range; just replace them with a call to the function. No functional changes. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Nikolay Borisov
There is a certain idiom used in multiple places in btrfs' codebase, dealing with flushing an ordered range. Factor it into a separate function that can be reused. Future patches will replace the existing code with that function. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
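A sketch of the idiom being factored out, with the btrfs extent-io and ordered-extent helpers replaced by illustrative stand-ins (the real function also takes the io tree and an optional cached state): lock the extent range, look for a pending ordered extent, and if one is found unlock, wait for it, and retry.

  #include <linux/types.h>

  struct demo_inode;
  struct demo_ordered_extent;

  /* Stand-ins for the btrfs extent-io and ordered-extent helpers. */
  void demo_lock_extent(struct demo_inode *inode, u64 start, u64 end);
  void demo_unlock_extent(struct demo_inode *inode, u64 start, u64 end);
  struct demo_ordered_extent *demo_lookup_ordered_range(struct demo_inode *inode,
                                                        u64 start, u64 len);
  void demo_wait_ordered_extent(struct demo_ordered_extent *ordered);
  void demo_put_ordered_extent(struct demo_ordered_extent *ordered);

  /* Lock [start, end], flushing any ordered extent that still covers it. */
  void demo_lock_and_flush(struct demo_inode *inode, u64 start, u64 end)
  {
      struct demo_ordered_extent *ordered;

      while (1) {
          demo_lock_extent(inode, start, end);
          ordered = demo_lookup_ordered_range(inode, start,
                                              end - start + 1);
          if (!ordered)
              break;    /* range is clean and stays locked */
          demo_unlock_extent(inode, start, end);
          demo_wait_ordered_extent(ordered);
          demo_put_ordered_extent(ordered);
      }
  }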
-
By Qu Wenruo
In the context of btrfs_run_delalloc_range() we haven't started/joined a transaction, thus even if something goes wrong we can't and won't abort the transaction, so there is no way to make the fs RO. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Qu Wenruo
Add trace events for update_bytes_pinned() and update_bytes_may_use() to detect underflows better. The output would look something like this (only showing the data part):

  ## Buffered write start, 16K total ##
  2255.954 xfs_io/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=0 diff=4096
  2257.169 sudo/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=4096 diff=4096
  2257.346 sudo/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=8192 diff=4096
  2257.542 sudo/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=12288 diff=4096
  ## Delalloc start ##
  3727.853 kworker/u8:3-e/700 btrfs:update_bytes_may_use:(nil)U: type=DATA old=16384 diff=-16384
  ## Space cache update ##
  3733.132 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=0 diff=65536
  3733.169 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=65536 diff=-65536
  3739.868 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=0 diff=65536
  3739.891 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=65536 diff=-65536

These two trace events will allow bcc tools to probe btrfs_space_info changes and detect underflows with more detail (e.g. a backtrace for each update). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
By Qu Wenruo
Just add a safety net for btrfs_space_info member updates. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
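A sketch of what such a safety net around a space-info update helper can look like, with the trace point from the previous commit indicated as a comment (all names are illustrative; the real code uses dedicated btrfs trace events):

  #include <linux/types.h>
  #include <linux/bug.h>

  struct demo_space_info {
      u64 flags;
      u64 bytes_may_use;
  };

  static void demo_update_bytes_may_use(struct demo_space_info *sinfo, s64 diff)
  {
      /* trace_demo_update_bytes_may_use(sinfo->flags, sinfo->bytes_may_use, diff); */

      /* Safety net: catch underflow of the unsigned counter. */
      WARN_ON_ONCE(diff < 0 && sinfo->bytes_may_use < (u64)-diff);
      sinfo->bytes_may_use += diff;
  }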
-
By David Sterba
There are several places that call nr_data_stripes, but this value does not change. Signed-off-by: David Sterba <dsterba@suse.com>
-
By David Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
-
By David Sterba
The helper lacks the btrfs_ prefix and the parameter is the raw block group type, so none of the callers has to do the flags -> index conversion. Signed-off-by: David Sterba <dsterba@suse.com>
-
By David Sterba
Merge the repeated code before the if-else block. Signed-off-by: David Sterba <dsterba@suse.com>
-
By David Sterba
The raid_attr table is now 7 * 56 = 392 bytes long, consisting of just small numbers, so we don't have to use ints. The new size is 7 * 32 = 224, saving 3 cachelines. Signed-off-by: David Sterba <dsterba@suse.com>
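A self-contained sketch of why narrowing the member types shrinks each entry; the field names loosely follow the raid_attr description but are not the exact btrfs layout, and the assumption is a common LP64 build, where the two structs below work out to 56 and 32 bytes per entry.

  #include <stdio.h>

  /* Wide version: small values stored as int. */
  struct raid_attr_wide {
      int sub_stripes, dev_stripes, devs_max, devs_min;
      int tolerated_failures, devs_increment, ncopies, nparity;
      const char *raid_name;
      unsigned long long bg_flag;
      int mindev_error;
  };

  /* Narrow version: the same small values fit in a byte each. */
  struct raid_attr_narrow {
      unsigned char sub_stripes, dev_stripes, devs_max, devs_min;
      unsigned char tolerated_failures, devs_increment, ncopies, nparity;
      const char *raid_name;
      unsigned long long bg_flag;
      int mindev_error;
  };

  int main(void)
  {
      printf("wide:   %zu bytes/entry\n", sizeof(struct raid_attr_wide));
      printf("narrow: %zu bytes/entry\n", sizeof(struct raid_attr_narrow));
      return 0;
  }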
-
By David Sterba
Factor the sequence of ifs into a helper; 'data stripes' here means the number of stripes without redundancy and parity. Signed-off-by: David Sterba <dsterba@suse.com>
-
By David Sterba
The factor is the number of copies. Signed-off-by: David Sterba <dsterba@suse.com>
-
By David Sterba
Replace the open coded list of profiles by selecting them from the raid_attr table. The criteria are now more explicit: we need profiles that have more than 1 copy of the data or can reconstruct the data with a missing device. Signed-off-by: David Sterba <dsterba@suse.com>
-
By David Sterba
Iterate over the table and gather all allowed profiles for a given number of devices, instead of open coding it. Signed-off-by: David Sterba <dsterba@suse.com>
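A sketch of the table-driven selection replacing the open-coded list (the table values and flag bits below are placeholders): walk the attribute table and OR in every profile whose minimum device count is satisfied.

  #include <linux/types.h>
  #include <linux/kernel.h>

  struct demo_raid_attr {
      u8 devs_min;    /* minimum number of devices for this profile */
      u64 bg_flag;    /* block group flag bit identifying the profile */
  };

  /* Placeholder table; the real raid_attr has one entry per btrfs profile. */
  static const struct demo_raid_attr demo_raid_array[] = {
      { .devs_min = 2, .bg_flag = 1ULL << 0 },    /* profile A */
      { .devs_min = 3, .bg_flag = 1ULL << 1 },    /* profile B */
      { .devs_min = 4, .bg_flag = 1ULL << 2 },    /* profile C */
  };

  static u64 demo_allowed_profiles(int num_devices)
  {
      u64 allowed = 0;
      int i;

      for (i = 0; i < ARRAY_SIZE(demo_raid_array); i++)
          if (num_devices >= demo_raid_array[i].devs_min)
              allowed |= demo_raid_array[i].bg_flag;
      return allowed;
  }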
-
By David Sterba
The number of tolerated failures is stored in the raid_attr table, use it. Signed-off-by: David Sterba <dsterba@suse.com>
-