- 30 10月, 2017 32 次提交
-
-
由 Qu Wenruo 提交于
Add extra checks for item with EXTENT_DATA type. This checks the following thing: 0) Key offset All key offsets must be aligned to sectorsize. Inline extent must have 0 for key offset. 1) Item size Uncompressed inline file extent size must match item size. (Compressed inline file extent has no information about its on-disk size.) Regular/preallocated file extent size must be a fixed value. 2) Every member of regular file extent item Including alignment for bytenr and offset, possible value for compression/encryption/type. 3) Type/compression/encode must be one of the valid values. This should be the most comprehensive and strict check in the context of btrfs_item for EXTENT_DATA. Signed-off-by: NQu Wenruo <quwenruo.btrfs@gmx.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ switch to BTRFS_FILE_EXTENT_TYPES, similar to what BTRFS_COMPRESS_TYPES does ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Function check_leaf() checks if any item pointer points outside of the leaf, but it doesn't check if the pointer overlaps with the item itself. Normally only the last item may be the victim, but adding such check is never a bad idea anyway. Signed-off-by: NQu Wenruo <quwenruo.btrfs@gmx.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Current check_leaf() function does a good job checking key order and item offset/size. However it only checks from slot 0 to the last but one slot, this is good but makes later expansion hard. So this refactoring iterates from slot 0 to the last slot. For key comparison, it uses a key with all 0 as initial key, so all valid keys should be larger than that. And for item size/offset checks, it compares current item end with previous item offset. For slot 0, use leaf end as a special case. This makes later item/key offset checks and item size checks easier to be implemented. Also, makes check_leaf() to return -EUCLEAN other than -EIO to indicate error. Signed-off-by: NQu Wenruo <quwenruo.btrfs@gmx.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Timofey Titovets 提交于
Was added in: c8b97818 "Btrfs: Add zlib compression support" Survive to near time (from 08.10.2008). Because 'start' checked for zero before branch, so it's safe to remove that subtraction. Signed-off-by: NTimofey Titovets <nefelim4ag@gmail.com> Reviewed-by: NSatoru Takeuchi <satoru.takeuchi@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
Nikolay reported that generic/273 was failing currently with ENOSPC. Turns out this is because we get to the point where the outstanding reservations are greater than the pinned space on the fs. This is a mistake, previously we used the current reservation amount in may_commit_transaction, not the entire outstanding reservation amount. Fix this to find the minimum byte size needed to make progress in flushing, and pass that into may_commit_transaction. From there we can make a smarter decision on whether to commit the transaction or not. This fixes the failure in generic/273. From Nikolai, IOW: when we go to the final stage of deciding whether to do trans commit, instead of passing all the reservations from all tickets we just pass the reservation for the current ticket. Otherwise, in case all reservations exceed pinned space, then we don't commit transaction and fail prematurely. Before we passed num_bytes from flush_space, where num_bytes was the sum of all pending reserverations, but now all we do is take the first ticket and commit the trans if we can satisfy that. Fixes: 957780eb ("Btrfs: introduce ticketed enospc infrastructure") Cc: stable@vger.kernel.org # 4.8 Reported-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Tested-by: NNikolay Borisov <nborisov@suse.com> [ added Nikolai's comment ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Kuanling Huang 提交于
By analyzing the perf on btrfs send, we found it take large amount of cpu time on page_cache_sync_readahead. This effort can be reduced after switching to asynchronous one. Overall performance gain on HDD and SSD were 9 and 15 percent if simply send a large file. Signed-off-by: NKuanling Huang <peterh@synology.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
The local bio_list may have pending bios when doing cleanup, it can end up with memory leak if they don't get freed. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Colin Ian King 提交于
Don't populate the read-only array types on the stack, instead make it static const. Makes the object code smaller by nearly 60 bytes: Before: text data bss dec hex filename 90536 6552 64 97152 17b80 fs/btrfs/ioctl.o After: text data bss dec hex filename 90414 6616 64 97094 17b46 fs/btrfs/ioctl.o Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Allen Pais 提交于
Forward the correct return value -ENOMEM from btrfsic_dev_state_alloc() too. Signed-off-by: NAllen Pais <allen.lkml@gmail.com> Reviewed-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ adjust changelog ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
We've seen the following backtrace stack in ftrace or dmesg log, kworker/u16:10-4244 [000] 241942.480955: function: btrfs_put_ordered_extent kworker/u16:10-4244 [000] 241942.480956: kernel_stack: <stack trace> => finish_ordered_fn (ffffffffa0384475) => btrfs_scrubparity_helper (ffffffffa03ca577) <-----"incorrect" => btrfs_freespace_write_helper (ffffffffa03ca98e) <-----"correct" => process_one_work (ffffffff81117b2f) => worker_thread (ffffffff81118c2a) => kthread (ffffffff81121de0) => ret_from_fork (ffffffff81d7087a) btrfs_freespace_write_helper is actually calling normal_worker_helper instead of btrfs_scrubparity_helper, so somehow kernel has parsed the incorrect function address while unwinding the stack, btrfs_scrubparity_helper really shouldn't be shown up. It's caused by compiler doing inline for our helper function, adding a noinline tag can fix that. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ use noinline_for_stack ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
Since both committing transaction and writing log-tree are doing plugging on metadata IO, we can unify to use %sync_writers to benefit both cases, instead of checking bio_flags while writing meta blocks of log-tree. We can remove this bio_flags because in order to write dirty blocks, log tree also uses btrfs_write_marked_extents(), inside which we have enabled %sync_writers, therefore, every write goes in a synchronous way, so does checksuming. Please also note that, bio_flags is applied per-context while %sync_writers is applied per-inode, so this might incur some overhead, ie. 1) while log tree is flushing its dirty blocks via btrfs_write_marked_extents(), in which %sync_writers is increased by one. 2) in the meantime, some writeback operations may happen upon btrfs's metadata inode, so these writes go synchronously, too. However, AFAICS, the overhead is not a big one while the win is that we unify the two places that needs synchronous way and remove a special hack/flag. This removes the bio_flags related stuff for writing log-tree. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
We have started plug in btrfs_write_and_wait_marked_extents() but the generated IOs actually go to device's schedule IO list where the work is doing in another task, thus the started plug doesn't make any sense. And since we wait for IOs immediately after writing meta blocks, it's the same case as writing log tree, doing sync submit can merge more IOs. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Satoru Takeuchi 提交于
Signed-off-by: NSatoru Takeuchi <satoru.takeuchi@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Colin Ian King 提交于
There are checks on fs_info in __btrfs_panic to avoid dereferencing a null fs_info, however, there is a call to btrfs_crit that may also dereference a null fs_info. Fix this by adding a check to see if fs_info is null and only print the s_id if fs_info is non-null. Detected by CoverityScan CID#401973 ("Dereference after null check") Fixes: efe120a0 ("Btrfs: convert printk to btrfs_ and fix BTRFS prefix") Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christos Gkekas 提交于
The value of variable 'can_recover' is never used after being set, thus it should be removed, as it was never used since the first commit 68a7342c ("Btrfs: cleanup orphaned root orphan item"). Signed-off-by: NChristos Gkekas <chris.gekas@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christophe JAILLET 提交于
If 'btrfs_alloc_path()' fails, we must free the resources already allocated, as done in the other error handling paths in this function. Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: NQu Wenruo <quwenruo.btrfs@gmx.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
__link_block_group is called from only 2 places and at each call site the space_info being passed is the same as the space info assigned to the passed cache struct. Let's remove the redundant argument and make the function reference the space_info from the passed block_group_cache. No functional changes Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ renamed to link_block_group ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Currently the code executes add_extent_mapping and if it is successful it links the new mapping, it then proceeds to unlock the extent mapping tree and check for failure and handle them. Instead, rework the code to only perform a single check if add_extent_mapping has failed and handle it, otherwise the code continues in a linear fashion. No functional changes Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Introduced by 5a5f79b5 ("Btrfs: allow unaligned DIO") and never used. The buffered fallback from unaligned DIO works as expected. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NTimofey Titovets <nefelim4ag@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
btrfs_changed_cb_t represents the signature of the callback being passed to btrfs_compare_trees. Currently there is only one such callback, namely changed_cb in send.c. This function doesn't really uses the first 2 parameters, i.e. the roots. Since there are not other functions implementing the btrfs_changed_cb_t let's remove the unused parameters from the prototype and implementation. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
iterate_dir_item:found_key - introduced in 31db9f7c ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive"), yet never used. record_ref:num - ditto This is a first pass with the low-hanging fruit. There are still quite a few unsued parameters in some function which have to abide by a callback interface. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Src was initially part of 31ff1cd2 ("Btrfs: Copy into the log tree in big batches"), however 16e7549f ("Btrfs: incompatible format change to remove hole extents") changed parameters passed to copy_items which made the src variable redundant. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NTimofey Titovets <nefelim4ag@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
While we submit direct writes, if the inode is flagged with nodatasum, there's no benefit to submit asynchronously, because a) we don't have to calculate checksum across processors, b) and direct IO has started a plug, but async submit makes us queue IO on each device's scheduled IO list instead of DIO's plug list, so that IOs get much less merges in general. Lets use sync submit for nodatasum inodes. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
After mapping block with BTRFS_MAP_WRITE, parities have been sorted to the end position, so this search can start from the first parity stripe. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ copied changelog as a comment ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
We didn't copy fsid to struct super_block.s_uuid so Overlay disables index feature with btrfs as the lower FS. kernel: overlayfs: fs on '/lower' does not support file handles, falling back to index=off. Fix this by publishing the fsid through struct super_block.s_uuid. [ dsterba: I think that setting s_uuid is the last missing bit. Overlay needs the file handle encoding support from the lower filesystem, which is supported. Filling the whole filesystem id is correct, the subvolume id is encoded in the file handle buffer from inside btrfs_encode_fh. ] Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Omar Sandoval 提交于
Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Omar Sandoval 提交于
These aren't used outside of volumes.c. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Some static functions are needlessly forward declared. Let's remove those declarations since they add no value. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
Both wait_for_commit() and wait_for_writer() are checking the condition out of the mutex lock. This refactors code a bit to be lock safe. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
Since TASK_UNINTERRUPTIBLE has been used here, wait_event() can do the same job. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
If we're still going to wait after schedule(), we don't have to do finish_wait() to remove our %wait_queue_entry since prepare_to_wait() won't add the same %wait_queue_entry twice. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
Block layer has a limit on plug, ie. BLK_MAX_REQUEST_COUNT == 16, so we don't gain benefits by batching 64 bios here. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 19 10月, 2017 1 次提交
-
-
由 Matthew Garrett 提交于
[AV: in addition to the fix in previous commit] Signed-off-by: NMatthew Garrett <mjg59@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Reviewed-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 10月, 2017 2 次提交
-
-
由 Tsutomu Itoh 提交于
Because the values of BTRFS_FS_EXCL_OP and BTRFS_FS_QUOTA_OVERRIDE overlap, we should change the value. First, BTRFS_FS_EXCL_OP was set to 14. commit 171938e5 ("btrfs: track exclusive filesystem operation in flags") Next, the value of BTRFS_FS_QUOTA_OVERRIDE was set to 14. commit f29efe29 ("btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE") As a result, the value 14 overlapped, by accident. This problem is solved by defining the value of BTRFS_FS_EXCL_OP as 16, the flags are internal. Fixes: f29efe29 ("btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE") CC: stable@vger.kernel.org # 4.13+ Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ minimize the change, update only BTRFS_FS_EXCL_OP ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Goffredo Baroncelli 提交于
Jean-Denis Girard noticed commit c821e7f3 "pass bytes to btrfs_bio_alloc" (https://patchwork.kernel.org/patch/9763081/) introduces a regression on 32 bit machines. When CONFIG_LBDAF is _not_ defined (CONFIG_LBDAF == Support for large (2TB+) block devices and files) sector_t is 32 bit on 32bit machines. In the function submit_extent_page, 'sector' (which is sector_t type) is multiplied by 512 to convert it from sectors to bytes, leading to an overflow when the disk is bigger than 4GB (!). I added a cast to u64 to avoid overflow. Fixes: c821e7f3 ("btrfs: pass bytes to btrfs_bio_alloc") CC: stable@vger.kernel.org # 4.13+ Signed-off-by: NGoffredo Baroncelli <kreijack@inwind.it> Tested-by: NJean-Denis Girard <jd.girard@sysnux.pf> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 26 9月, 2017 5 次提交
-
-
由 Josef Bacik 提交于
Amir reported a bug discovered by his cleaned up version of my dm-log-writes xfstests where we were missing csums at certain replay points. This is because fsx was doing an msync(), which essentially fsync()'s a specific range of a file. We will log all modified extents, but only search for the checksums in the range we are being asked to sync. We cannot simply log the extents in the range we're being asked because we are logging the inode item as it is currently, which if it has had a i_size update before the msync means we will miss extents when replaying. We could possibly get around this by marking the inode with the transaction that extended the i_size to see if we have this case, but this would be racy and we'd have to lock the whole range of the inode to make sure we didn't have an ordered extent outside of our range that was in the middle of completing. Fix this simply by keeping track of the modified extents range and logging the csums for the entire range of extents that we are logging. This makes the xfstest pass. Reported-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
commit 4246a0b6 ("block: add a bi_error field to struct bio") changed the logic of how dio read endio reports errors. For single stripe dio read, %bio->bi_status reflects the error before verifying checksum, and now we're updating it when data block matches with its checksum, while in the mismatching case, %bio->bi_status is not updated to relfect that. When some blocks in a file have been corrupted on disk, reading such a file ends up with 1) checksum errors are reported in kernel log 2) read(2) returns successfully with some content being 0x01. In order to fix it, we need to report its checksum mismatch error to the upper layer (dio layer in this case) as well. Fixes: 4246a0b6 ("block: add a bi_error field to struct bio") Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reported-by: NGoffredo Baroncelli <kreijack@inwind.it> Tested-by: NGoffredo Baroncelli <kreijack@inwind.it> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Sargun Dhillon 提交于
Previously, we were calling del_qgroup_item, and ignoring the return code resulting in a potential to have divergent in-memory state without an error. Perhaps, it makes sense to handle this error code, and put the filesystem into a read only, or similar state. This patch only adds reporting of the error if the error is fatal, (any error other than qgroup not found). Signed-off-by: NSargun Dhillon <sargun@sargun.me> Reviewed-by: NQu Wenruo <quwenruo.btrfs@gmx.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
Currently even if the underlying disk reports failure on IO, compressed read endio still gets to verify checksum and reports it as a checksum error. In fact, if some IO have failed during reading a compressed data extent , there's no way the checksum could match, therefore, we can skip that in order to return error quickly to the upper layer. Please note that we need to do this after recording the failed mirror index so that read-repair in the upper layer's endio can work properly. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Tested-by: NPaul Jones <paul@pauljones.id.au> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
The kernel oops happens at kernel BUG at fs/btrfs/extent_io.c:2104! ... RIP: clean_io_failure+0x263/0x2a0 [btrfs] It's showing that read-repair code is using an improper mirror index. This is due to the fact that compression read's endio hasn't recorded the failed mirror index in %cb->orig_bio. With this, btrfs's read-repair can work properly on reading compressed data. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reported-by: NPaul Jones <paul@pauljones.id.au> Tested-by: NPaul Jones <paul@pauljones.id.au> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-