- 22 10月, 2015 34 次提交
-
-
由 Qu Wenruo 提交于
Add new version of btrfs_delalloc_reserve_space() and btrfs_delalloc_release_space() functions, which supports accurate qgroup reserve. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
Use new reserve/free for buffered write and inode cache. For buffered write case, as nodatacow write won't increase quota account, so unlike old behavior which does reserve before check nocow, now we check nocow first and then only reserve data if we can't do nocow write. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space. Add new functions __btrfs_check_data_free_space() and __btrfs_free_reserved_data_space() to work with new accurate qgroup reserved space framework. The new function will replace old btrfs_check_data_free_space() and btrfs_free_reserved_data_space() respectively, but until all the change is done, let's just use the new name. Also, export internal use function btrfs_alloc_data_chunk_ondemand(), as now qgroup reserve requires precious bytes, some operation can't get the accurate number in advance(like fallocate). But data space info check and data chunk allocate doesn't need to be that accurate, and can be called at the beginning. So export it for later operations. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
As we have the new metadata reservation functions, use them to replace the old btrfs_qgroup_reserve() call for metadata. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
Introduce new functions btrfs_qgroup_reserve/free_meta() to reserve/free metadata reserved space. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
Qgroup reserved space needs to be released from inode dirty map and get freed at different timing: 1) Release when the metadata is written into tree After corresponding metadata is written into tree, any newer write will be COWed(don't include NOCOW case yet). So we must release its range from inode dirty range map, or we will forget to reserve needed range, causing accounting exceeding the limit. 2) Free reserved bytes when delayed ref is run When delayed refs are run, qgroup accounting will follow soon and turn the reserved bytes into rfer/excl numbers. As run_delayed_refs and qgroup accounting are all done at commit_transaction() time, we are safe to free reserved space in run_delayed_ref time(). With these timing to release/free reserved space, we should be able to resolve the long existing qgroup reserve space leak problem. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
Add new function btrfs_add_delayed_qgroup_reserve() function to record how much space is reserved for that extent. As btrfs only accounts qgroup at run_delayed_refs() time, so newly allocated extent should keep the reserved space until then. So add needed function with related members to do it. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
space Introduce functions btrfs_qgroup_release/free_data() to release/free reserved data range. Release means, just remove the data range from io_tree, but doesn't free the reserved space. This is for normal buffered write case, when data is written into disc and its metadata is added into tree, its reserved space should still be kept until commit_trans(). So in that case, we only release dirty range, but keep the reserved space recorded some other place until commit_tran(). Free means not only remove data range, but also free reserved space. This is used for case for cleanup and invalidate page. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
Introduce a new function, btrfs_qgroup_reserve_data(), which will use io_tree to accurate qgroup reserve, to avoid reserved space leaking. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
Introduce new function clear_record_extent_bits(), which will clear bits for given range and record the details about which ranges are cleared and how many bytes in total it changes. This provides the basis for later qgroup reserve codes. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
Introduce new function set_record_extent_bits(), which will not only set given bits, but also record how many bytes are changed, and detailed range info. This is quite important for later qgroup reserve framework. The number of bytes will be used to do qgroup reserve, and detailed range info will be used to cleanup for EQUOT case. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
Add a new structure, extent_change_set, to record how many bytes are changed in one set/clear_extent_bits() operation, with detailed changed ranges info. This provides the needed facilities for later qgroup reserve framework. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Chris Mason 提交于
Merge branch 'integration-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.4
-
由 Chris Mason 提交于
Merge branch 'cleanups/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4
-
由 Luis de Bethencourt 提交于
reada is using -1 instead of the -ENOMEM defined macro to specify that a buffer allocation failed. Since the error number is propagated, the caller will get a -EPERM which is the wrong error condition. Also, updating the caller to return the exact value from reada_add_block. Smatch tool warning: reada_add_block() warn: returning -1 instead of -ENOMEM is sloppy Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NLuis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Luis de Bethencourt 提交于
check-integrity is using -1 instead of the -ENOMEM defined macro to specify that a buffer allocation failed. Since the error number is propagated, the caller will get a -EPERM which is the wrong error condition. Also, the smatch tool complains with the following warnings: btrfsic_process_superblock() warn: returning -1 instead of -ENOMEM is sloppy btrfsic_read_block() warn: returning -1 instead of -ENOMEM is sloppy Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NLuis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Byongho Lee 提交于
Below variables are defined per compress type. - struct list_head comp_idle_workspace[BTRFS_COMPRESS_TYPES] - spinlock_t comp_workspace_lock[BTRFS_COMPRESS_TYPES] - int comp_num_workspace[BTRFS_COMPRESS_TYPES] - atomic_t comp_alloc_workspace[BTRFS_COMPRESS_TYPES] - wait_queue_head_t comp_workspace_wait[BTRFS_COMPRESS_TYPES] BTW, while accessing one compress type of these variables, the next or before address is other compress types of it. So this patch puts these variables in a struct to make cache friendly. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NByongho Lee <bhlee.kernel@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Byongho Lee 提交于
This patch eliminates the last item of prop_handlers array which is used to check end of array and instead uses ARRAY_SIZE macro. Though this is a very tiny optimization, using ARRAY_SIZE macro is a good practice to iterate array. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NByongho Lee <bhlee.kernel@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Geliang Tang 提交于
Just fix a typo in the code comment. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGeliang Tang <geliangtang@163.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
rsv_count ultimately gets passed to start_transaction() which now takes an unsigned int as its num_items parameter. The value of rsv_count should always be positive so declare it as being unsigned. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
The value of num_items that start_transaction() ultimately always takes is a small one, so a 64 bit integer is overkill. Also change num_items for btrfs_start_transaction() and btrfs_start_transaction_lflush() as well. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
Improve readability by generalizing the profile validity checks. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Shan Hai 提交于
The commit b37392ea ("Btrfs: cleanup unnecessary parameter and variant of prepare_pages()") makes it redundant. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NShan Hai <haishan.bai@hotmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Zhao Lei 提交于
btrfs_raid_array[] holds attributes of all raid types. Use btrfs_raid_array[].devs_min is best way for request in btrfs_reduce_alloc_profile(), instead of use complex condition of each raid types. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Zhao Lei 提交于
btrfs_raid_array[] is used to define all raid attributes, use it to get tolerated_failures in btrfs_get_num_tolerated_disk_barrier_failures(), instead of complex condition in function. It can make code simple and auto-support other possible raid-type in future. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Zhao Lei 提交于
This array is used to record attributes of each raid type, make it public, and many functions will benifit with this array. For example, num_tolerated_disk_barrier_failures(), we can avoid complex conditions in this function, and get raid attribute simply by accessing above array. It can also make code logic simple, reduce duplication code, and increase maintainability. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
Rather than have three separate if() statements for the same outcome we should just OR them together in the same if() statement. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
Use memset() to null out the btrfs_delayed_ref_root of btrfs_transaction instead of setting all the members to 0 by hand. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Byongho Lee 提交于
We can safely iterate whole list items, without using list_del macro. So remove the list_del call. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NByongho Lee <bhlee.kernel@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Byongho Lee 提交于
There is no removing list element while iterating over list. So, replace list_for_each_entry_safe to list_for_each_entry. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NByongho Lee <bhlee.kernel@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
Just call kmem_cache_zalloc() instead of calling kmem_cache_alloc(). We're just initializing most fields to 0, false and NULL later on _anyway_, so to make the code mode readable and potentially gain a bit of performance (completely untested claim), we should fill our btrfs_trans_handle with zeros on allocation then just initialize those five remaining fields (not counting the list_heads) as normal. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
old_len is used to store the return value of btrfs_item_size_nr(). The return value of btrfs_item_size_nr() is of type u32. To improve code correctness and avoid mixing signed and unsigned integers I've changed old_len to be of type u32 as well. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
The return values of btrfs_item_offset_nr and btrfs_item_size_nr are of type u32. To avoid mixing signed and unsigned integers we should also declare dsize and last_off to be of type u32. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Chandan Rajendra 提交于
btrfs_submit_bio_hook() uses integer constants instead of values from "enum btrfs_wq_endio_type". Fix this. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 17 10月, 2015 1 次提交
-
-
由 Filipe Manana 提交于
When truncating a file to a smaller size which consists of an inline extent that is compressed, we did not discard (or made unusable) the data between the new file size and the old file size, wasting metadata space and allowing for the truncated data to be leaked and the data corruption/loss mentioned below. We were also not correctly decrementing the number of bytes used by the inode, we were setting it to zero, giving a wrong report for callers of the stat(2) syscall. The fsck tool also reported an error about a mismatch between the nbytes of the file versus the real space used by the file. Now because we weren't discarding the truncated region of the file, it was possible for a caller of the clone ioctl to actually read the data that was truncated, allowing for a security breach without requiring root access to the system, using only standard filesystem operations. The scenario is the following: 1) User A creates a file which consists of an inline and compressed extent with a size of 2000 bytes - the file is not accessible to any other users (no read, write or execution permission for anyone else); 2) The user truncates the file to a size of 1000 bytes; 3) User A makes the file world readable; 4) User B creates a file consisting of an inline extent of 2000 bytes; 5) User B issues a clone operation from user A's file into its own file (using a length argument of 0, clone the whole range); 6) User B now gets to see the 1000 bytes that user A truncated from its file before it made its file world readbale. User B also lost the bytes in the range [1000, 2000[ bytes from its own file, but that might be ok if his/her intention was reading stale data from user A that was never supposed to be public. Note that this contrasts with the case where we truncate a file from 2000 bytes to 1000 bytes and then truncate it back from 1000 to 2000 bytes. In this case reading any byte from the range [1000, 2000[ will return a value of 0x00, instead of the original data. This problem exists since the clone ioctl was added and happens both with and without my recent data loss and file corruption fixes for the clone ioctl (patch "Btrfs: fix file corruption and data loss after cloning inline extents"). So fix this by truncating the compressed inline extents as we do for the non-compressed case, which involves decompressing, if the data isn't already in the page cache, compressing the truncated version of the extent, writing the compressed content into the inline extent and then truncate it. The following test case for fstests reproduces the problem. In order for the test to pass both this fix and my previous fix for the clone ioctl that forbids cloning a smaller inline extent into a larger one, which is titled "Btrfs: fix file corruption and data loss after cloning inline extents", are needed. Without that other fix the test fails in a different way that does not leak the truncated data, instead part of destination file gets replaced with zeroes (because the destination file has a larger inline extent than the source). seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _need_to_be_root _supported_fs btrfs _supported_os Linux _require_scratch _require_cloner rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount "-o compress" # Create our test files. File foo is going to be the source of a clone operation # and consists of a single inline extent with an uncompressed size of 512 bytes, # while file bar consists of a single inline extent with an uncompressed size of # 256 bytes. For our test's purpose, it's important that file bar has an inline # extent with a size smaller than foo's inline extent. $XFS_IO_PROG -f -c "pwrite -S 0xa1 0 128" \ -c "pwrite -S 0x2a 128 384" \ $SCRATCH_MNT/foo | _filter_xfs_io $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 256" $SCRATCH_MNT/bar | _filter_xfs_io # Now durably persist all metadata and data. We do this to make sure that we get # on disk an inline extent with a size of 512 bytes for file foo. sync # Now truncate our file foo to a smaller size. Because it consists of a # compressed and inline extent, btrfs did not shrink the inline extent to the # new size (if the extent was not compressed, btrfs would shrink it to 128 # bytes), it only updates the inode's i_size to 128 bytes. $XFS_IO_PROG -c "truncate 128" $SCRATCH_MNT/foo # Now clone foo's inline extent into bar. # This clone operation should fail with errno EOPNOTSUPP because the source # file consists only of an inline extent and the file's size is smaller than # the inline extent of the destination (128 bytes < 256 bytes). However the # clone ioctl was not prepared to deal with a file that has a size smaller # than the size of its inline extent (something that happens only for compressed # inline extents), resulting in copying the full inline extent from the source # file into the destination file. # # Note that btrfs' clone operation for inline extents consists of removing the # inline extent from the destination inode and copy the inline extent from the # source inode into the destination inode, meaning that if the destination # inode's inline extent is larger (N bytes) than the source inode's inline # extent (M bytes), some bytes (N - M bytes) will be lost from the destination # file. Btrfs could copy the source inline extent's data into the destination's # inline extent so that we would not lose any data, but that's currently not # done due to the complexity that would be needed to deal with such cases # (specially when one or both extents are compressed), returning EOPNOTSUPP, as # it's normally not a very common case to clone very small files (only case # where we get inline extents) and copying inline extents does not save any # space (unlike for normal, non-inlined extents). $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar # Now because the above clone operation used to succeed, and due to foo's inline # extent not being shinked by the truncate operation, our file bar got the whole # inline extent copied from foo, making us lose the last 128 bytes from bar # which got replaced by the bytes in range [128, 256[ from foo before foo was # truncated - in other words, data loss from bar and being able to read old and # stale data from foo that should not be possible to read anymore through normal # filesystem operations. Contrast with the case where we truncate a file from a # size N to a smaller size M, truncate it back to size N and then read the range # [M, N[, we should always get the value 0x00 for all the bytes in that range. # We expected the clone operation to fail with errno EOPNOTSUPP and therefore # not modify our file's bar data/metadata. So its content should be 256 bytes # long with all bytes having the value 0xbb. # # Without the btrfs bug fix, the clone operation succeeded and resulted in # leaking truncated data from foo, the bytes that belonged to its range # [128, 256[, and losing data from bar in that same range. So reading the # file gave us the following content: # # 0000000 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 # * # 0000200 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a # * # 0000400 echo "File bar's content after the clone operation:" od -t x1 $SCRATCH_MNT/bar # Also because the foo's inline extent was not shrunk by the truncate # operation, btrfs' fsck, which is run by the fstests framework everytime a # test completes, failed reporting the following error: # # root 5 inode 257 errors 400, nbytes wrong status=0 exit Cc: stable@vger.kernel.org Signed-off-by: NFilipe Manana <fdmanana@suse.com>
-
- 14 10月, 2015 3 次提交
-
-
由 Filipe Manana 提交于
If when reading a page we find a hole and our caller had already locked the range (bio flags has the bit EXTENT_BIO_PARENT_LOCKED set), we end up unlocking the hole's range and then later our caller unlocks it again, which might have already been locked by some other task once the first unlock happened. Currently this can only happen during a call to the extent_same ioctl, as it's the only caller of __do_readpage() that sets the bit EXTENT_BIO_PARENT_LOCKED for bio flags. Fix this by leaving the unlock exclusively to the caller. Signed-off-by: NFilipe Manana <fdmanana@suse.com>
-
由 Filipe Manana 提交于
Currently the clone ioctl allows to clone an inline extent from one file to another that already has other (non-inlined) extents. This is a problem because btrfs is not designed to deal with files having inline and regular extents, if a file has an inline extent then it must be the only extent in the file and must start at file offset 0. Having a file with an inline extent followed by regular extents results in EIO errors when doing reads or writes against the first 4K of the file. Also, the clone ioctl allows one to lose data if the source file consists of a single inline extent, with a size of N bytes, and the destination file consists of a single inline extent with a size of M bytes, where we have M > N. In this case the clone operation removes the inline extent from the destination file and then copies the inline extent from the source file into the destination file - we lose the M - N bytes from the destination file, a read operation will get the value 0x00 for any bytes in the the range [N, M] (the destination inode's i_size remained as M, that's why we can read past N bytes). So fix this by not allowing such destructive operations to happen and return errno EOPNOTSUPP to user space. Currently the fstest btrfs/035 tests the data loss case but it totally ignores this - i.e. expects the operation to succeed and does not check the we got data loss. The following test case for fstests exercises all these cases that result in file corruption and data loss: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _need_to_be_root _supported_fs btrfs _supported_os Linux _require_scratch _require_cloner _require_btrfs_fs_feature "no_holes" _require_btrfs_mkfs_feature "no-holes" rm -f $seqres.full test_cloning_inline_extents() { local mkfs_opts=$1 local mount_opts=$2 _scratch_mkfs $mkfs_opts >>$seqres.full 2>&1 _scratch_mount $mount_opts # File bar, the source for all the following clone operations, consists # of a single inline extent (50 bytes). $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 50" $SCRATCH_MNT/bar \ | _filter_xfs_io # Test cloning into a file with an extent (non-inlined) where the # destination offset overlaps that extent. It should not be possible to # clone the inline extent from file bar into this file. $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 16K" $SCRATCH_MNT/foo \ | _filter_xfs_io $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo # Doing IO against any range in the first 4K of the file should work. # Due to a past clone ioctl bug which allowed cloning the inline extent, # these operations resulted in EIO errors. echo "File foo data after clone operation:" # All bytes should have the value 0xaa (clone operation failed and did # not modify our file). od -t x1 $SCRATCH_MNT/foo $XFS_IO_PROG -c "pwrite -S 0xcc 0 100" $SCRATCH_MNT/foo | _filter_xfs_io # Test cloning the inline extent against a file which has a hole in its # first 4K followed by a non-inlined extent. It should not be possible # as well to clone the inline extent from file bar into this file. $XFS_IO_PROG -f -c "pwrite -S 0xdd 4K 12K" $SCRATCH_MNT/foo2 \ | _filter_xfs_io $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo2 # Doing IO against any range in the first 4K of the file should work. # Due to a past clone ioctl bug which allowed cloning the inline extent, # these operations resulted in EIO errors. echo "File foo2 data after clone operation:" # All bytes should have the value 0x00 (clone operation failed and did # not modify our file). od -t x1 $SCRATCH_MNT/foo2 $XFS_IO_PROG -c "pwrite -S 0xee 0 90" $SCRATCH_MNT/foo2 | _filter_xfs_io # Test cloning the inline extent against a file which has a size of zero # but has a prealloc extent. It should not be possible as well to clone # the inline extent from file bar into this file. $XFS_IO_PROG -f -c "falloc -k 0 1M" $SCRATCH_MNT/foo3 | _filter_xfs_io $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo3 # Doing IO against any range in the first 4K of the file should work. # Due to a past clone ioctl bug which allowed cloning the inline extent, # these operations resulted in EIO errors. echo "First 50 bytes of foo3 after clone operation:" # Should not be able to read any bytes, file has 0 bytes i_size (the # clone operation failed and did not modify our file). od -t x1 $SCRATCH_MNT/foo3 $XFS_IO_PROG -c "pwrite -S 0xff 0 90" $SCRATCH_MNT/foo3 | _filter_xfs_io # Test cloning the inline extent against a file which consists of a # single inline extent that has a size not greater than the size of # bar's inline extent (40 < 50). # It should be possible to do the extent cloning from bar to this file. $XFS_IO_PROG -f -c "pwrite -S 0x01 0 40" $SCRATCH_MNT/foo4 \ | _filter_xfs_io $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo4 # Doing IO against any range in the first 4K of the file should work. echo "File foo4 data after clone operation:" # Must match file bar's content. od -t x1 $SCRATCH_MNT/foo4 $XFS_IO_PROG -c "pwrite -S 0x02 0 90" $SCRATCH_MNT/foo4 | _filter_xfs_io # Test cloning the inline extent against a file which consists of a # single inline extent that has a size greater than the size of bar's # inline extent (60 > 50). # It should not be possible to clone the inline extent from file bar # into this file. $XFS_IO_PROG -f -c "pwrite -S 0x03 0 60" $SCRATCH_MNT/foo5 \ | _filter_xfs_io $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo5 # Reading the file should not fail. echo "File foo5 data after clone operation:" # Must have a size of 60 bytes, with all bytes having a value of 0x03 # (the clone operation failed and did not modify our file). od -t x1 $SCRATCH_MNT/foo5 # Test cloning the inline extent against a file which has no extents but # has a size greater than bar's inline extent (16K > 50). # It should not be possible to clone the inline extent from file bar # into this file. $XFS_IO_PROG -f -c "truncate 16K" $SCRATCH_MNT/foo6 | _filter_xfs_io $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo6 # Reading the file should not fail. echo "File foo6 data after clone operation:" # Must have a size of 16K, with all bytes having a value of 0x00 (the # clone operation failed and did not modify our file). od -t x1 $SCRATCH_MNT/foo6 # Test cloning the inline extent against a file which has no extents but # has a size not greater than bar's inline extent (30 < 50). # It should be possible to clone the inline extent from file bar into # this file. $XFS_IO_PROG -f -c "truncate 30" $SCRATCH_MNT/foo7 | _filter_xfs_io $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo7 # Reading the file should not fail. echo "File foo7 data after clone operation:" # Must have a size of 50 bytes, with all bytes having a value of 0xbb. od -t x1 $SCRATCH_MNT/foo7 # Test cloning the inline extent against a file which has a size not # greater than the size of bar's inline extent (20 < 50) but has # a prealloc extent that goes beyond the file's size. It should not be # possible to clone the inline extent from bar into this file. $XFS_IO_PROG -f -c "falloc -k 0 1M" \ -c "pwrite -S 0x88 0 20" \ $SCRATCH_MNT/foo8 | _filter_xfs_io $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo8 echo "File foo8 data after clone operation:" # Must have a size of 20 bytes, with all bytes having a value of 0x88 # (the clone operation did not modify our file). od -t x1 $SCRATCH_MNT/foo8 _scratch_unmount } echo -e "\nTesting without compression and without the no-holes feature...\n" test_cloning_inline_extents echo -e "\nTesting with compression and without the no-holes feature...\n" test_cloning_inline_extents "" "-o compress" echo -e "\nTesting without compression and with the no-holes feature...\n" test_cloning_inline_extents "-O no-holes" "" echo -e "\nTesting with compression and with the no-holes feature...\n" test_cloning_inline_extents "-O no-holes" "-o compress" status=0 exit Cc: stable@vger.kernel.org Signed-off-by: NFilipe Manana <fdmanana@suse.com>
-
由 Robin Ruede 提交于
This fixes a regression introduced by 37b8d27d between v4.1 and v4.2. When a snapshot is received, its received_uuid is set to the original uuid of the subvolume. When that snapshot is then resent to a third filesystem, it's received_uuid is set to the second uuid instead of the original one. The same was true for the parent_uuid. This behaviour was partially changed in 37b8d27d, but in that patch only the parent_uuid was taken from the real original, not the uuid itself, causing the search for the parent to fail in the case below. This happens for example when trying to send a series of linked snapshots (e.g. created by snapper) from the backup file system back to the original one. The following commands reproduce the issue in v4.2.1 (no error in 4.1.6) # setup three test file systems for i in 1 2 3; do truncate -s 50M fs$i mkfs.btrfs fs$i mkdir $i mount fs$i $i done echo "content" > 1/testfile btrfs su snapshot -r 1/ 1/snap1 echo "changed content" > 1/testfile btrfs su snapshot -r 1/ 1/snap2 # works fine: btrfs send 1/snap1 | btrfs receive 2/ btrfs send -p 1/snap1 1/snap2 | btrfs receive 2/ # ERROR: could not find parent subvolume btrfs send 2/snap1 | btrfs receive 3/ btrfs send -p 2/snap1 2/snap2 | btrfs receive 3/ Signed-off-by: NRobin Ruede <rruede+git@gmail.com> Fixes: 37b8d27d ("Btrfs: use received_uuid of parent during send") Cc: stable@vger.kernel.org # v4.2+ Reviewed-by: NFilipe Manana <fdmanana@suse.com> Tested-by: NEd Tomlinson <edt@aei.ca>
-
- 13 10月, 2015 2 次提交
-
-
由 Filipe Manana 提交于
If we have a file that shares an extent with other files, when processing the extent item relative to a shared extent, we blindly issue a clone operation that will target a length matching the length in the extent item and uses as a source some other file the receiver already has and points to the same extent. However that range in the other file might not exclusively point only to the shared extent, and so using that length will result in the receiver getting a file with different data from the one in the send snapshot. This issue happened both for incremental and full send operations. So fix this by issuing clone operations with lengths that don't cover regions of the source file that point to different extents (or have holes). The following test case for fstests reproduces the problem. seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { rm -fr $send_files_dir rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _supported_fs btrfs _supported_os Linux _require_scratch _need_to_be_root _require_cp_reflink _require_xfs_io_command "fpunch" send_files_dir=$TEST_DIR/btrfs-test-$seq rm -f $seqres.full rm -fr $send_files_dir mkdir $send_files_dir _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount # Create our test file with a single 100K extent. $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 100K" \ $SCRATCH_MNT/foo | _filter_xfs_io # Clone our file into a new file named bar. cp --reflink=always $SCRATCH_MNT/foo $SCRATCH_MNT/bar # Now overwrite parts of our foo file. $XFS_IO_PROG -c "pwrite -S 0xbb 50K 10K" \ -c "pwrite -S 0xcc 90K 10K" \ -c "fpunch 70K 10k" \ $SCRATCH_MNT/foo | _filter_xfs_io _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \ $SCRATCH_MNT/snap echo "File digests in the original filesystem:" md5sum $SCRATCH_MNT/snap/foo | _filter_scratch md5sum $SCRATCH_MNT/snap/bar | _filter_scratch _run_btrfs_util_prog send $SCRATCH_MNT/snap -f $send_files_dir/1.snap # Now recreate the filesystem by receiving the send stream and verify # we get the same file contents that the original filesystem had. _scratch_unmount _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap # We expect the destination filesystem to have exactly the same file # data as the original filesystem. # The btrfs send implementation had a bug where it sent a clone # operation from file foo into file bar covering the whole [0, 100K[ # range after creating and writing the file foo. This was incorrect # because the file bar now included the updates done to file foo after # we cloned foo to bar, breaking the COW nature of reflink copies # (cloned extents). echo "File digests in the new filesystem:" md5sum $SCRATCH_MNT/snap/foo | _filter_scratch md5sum $SCRATCH_MNT/snap/bar | _filter_scratch status=0 exit Another test case that reproduces the problem when we have compressed extents: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { rm -fr $send_files_dir rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _supported_fs btrfs _supported_os Linux _require_scratch _need_to_be_root _require_cp_reflink send_files_dir=$TEST_DIR/btrfs-test-$seq rm -f $seqres.full rm -fr $send_files_dir mkdir $send_files_dir _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount "-o compress" # Create our file with an extent of 100K starting at file offset 0K. $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 100K" \ -c "fsync" \ $SCRATCH_MNT/foo | _filter_xfs_io # Rewrite part of the previous extent (its first 40K) and write a new # 100K extent starting at file offset 100K. $XFS_IO_PROG -c "pwrite -S 0xbb 0K 40K" \ -c "pwrite -S 0xcc 100K 100K" \ $SCRATCH_MNT/foo | _filter_xfs_io # Our file foo now has 3 file extent items in its metadata: # # 1) One covering the file range 0 to 40K; # 2) One covering the file range 40K to 100K, which points to the first # extent we wrote to the file and has a data offset field with value # 40K (our file no longer uses the first 40K of data from that # extent); # 3) One covering the file range 100K to 200K. # Now clone our file foo into file bar. cp --reflink=always $SCRATCH_MNT/foo $SCRATCH_MNT/bar # Create our snapshot for the send operation. _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \ $SCRATCH_MNT/snap echo "File digests in the original filesystem:" md5sum $SCRATCH_MNT/snap/foo | _filter_scratch md5sum $SCRATCH_MNT/snap/bar | _filter_scratch _run_btrfs_util_prog send $SCRATCH_MNT/snap -f $send_files_dir/1.snap # Now recreate the filesystem by receiving the send stream and verify we # get the same file contents that the original filesystem had. # Btrfs send used to issue a clone operation from foo's range # [80K, 140K[ to bar's range [40K, 100K[ when cloning the extent pointed # to by foo's second file extent item, this was incorrect because of bad # accounting of the file extent item's data offset field. The correct # range to clone from should have been [40K, 100K[. _scratch_unmount _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount "-o compress" _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap echo "File digests in the new filesystem:" # Must match the digests we got in the original filesystem. md5sum $SCRATCH_MNT/snap/foo | _filter_scratch md5sum $SCRATCH_MNT/snap/bar | _filter_scratch status=0 exit Signed-off-by: NFilipe Manana <fdmanana@suse.com>
-
由 Chris Mason 提交于
Merge branch 'fix/waitqueue-barriers' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4
-