- 06 12月, 2016 1 次提交
-
-
由 Jeff Mahoney 提交于
There are 11 functions that accept a root parameter and immediately overwrite it. We can pass those an fs_info pointer instead. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 30 11月, 2016 39 次提交
-
-
由 Wang Xiaoguang 提交于
This issue was found when I tried to delete a heavily reflinked file, when deleting such files, other transaction operation will not have a chance to make progress, for example, start_transaction() will blocked in wait_current_trans(root) for long time, sometimes it even triggers soft lockups, and the time taken to delete such heavily reflinked file is also very large, often hundreds of seconds. Using perf top, it reports that: PerfTop: 7416 irqs/sec kernel:99.8% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) --------------------------------------------------------------------------------------- 84.37% [btrfs] [k] __btrfs_run_delayed_refs.constprop.80 11.02% [kernel] [k] delay_tsc 0.79% [kernel] [k] _raw_spin_unlock_irq 0.78% [kernel] [k] _raw_spin_unlock_irqrestore 0.45% [kernel] [k] do_raw_spin_lock 0.18% [kernel] [k] __slab_alloc It seems __btrfs_run_delayed_refs() took most cpu time, after some debug work, I found it's select_delayed_ref() causing this issue, for a delayed head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but select_delayed_ref() will firstly try to iterate node list to find BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and waste much time. To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head, then in select_delayed_ref(), if this list is not empty, we can directly use nodes in this list. With this patch, it just took about 10~15 seconds to delte the same file. Now using perf top, it reports that: PerfTop: 2734 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) ---------------------------------------------------------------------------------------- 20.74% [kernel] [k] _raw_spin_unlock_irqrestore 16.33% [kernel] [k] __slab_alloc 5.41% [kernel] [k] lock_acquired 4.42% [kernel] [k] lock_acquire 4.05% [kernel] [k] lock_release 3.37% [kernel] [k] _raw_spin_unlock_irq For normal files, this patch also gives help, at least we do not need to iterate whole list to found BTRFS_ADD_DELAYED_REF nodes. Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Commit 62b99540 (btrfs: relocation: Fix leaking qgroups numbers on data extents) only fixes the problem partly. The previous fix is to trace all new data extents at transaction commit time when balance finishes. However balance is not done in a large transaction, every path replacement can happen in its own transaction. This makes the fix useless if transaction commits during relocation. For example: relocate_block_group() |-merge_reloc_roots() | |- merge_reloc_root() | |- btrfs_start_transaction() <- Trans X | |- replace_path() <- Cause leak | |- btrfs_end_transaction_throttle() <- Trans X commits here | | Leak not fixed | | | |- btrfs_start_transaction() <- Trans Y | |- replace_path() <- Cause leak | |- btrfs_end_transaction_throttle() <- Trans Y ends | but not committed |-btrfs_join_transaction() <- Still trans Y |-qgroup_fix() <- Only fixes data leak | in trans Y |-btrfs_commit_transaction() <- Trans Y commits In that case, qgroup fixup can only fix data leak in trans Y, data leak in trans X is out of fix. So the correct fix should happen in the same transaction of replace_path(). This patch fixes it by tracing both subtrees of tree block swap, so it can fix the problem and ensure all leaking and fix are in the same transaction, so no leak again. Reported-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-and-Tested-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Move account_shared_subtree() to qgroup.c and rename it to btrfs_qgroup_trace_subtree(). Do the same thing for account_leaf_items() and rename it to btrfs_qgroup_trace_leaf_items(). Since all these functions are only for qgroup, move them to qgroup.c and export them is more appropriate. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-and-Tested-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Rename btrfs_qgroup_insert_dirty_extent(_nolock) to btrfs_qgroup_trace_extent(_nolock), according to the new reserve/trace/account naming schema. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-and-Tested-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Add explaination how btrfs qgroups work. Qgroup is split into 3 main phrases: 1) Reserve To ensure qgroup doesn't exceed its limit 2) Trace To info qgroup to trace which extent 3) Account Calculate qgroup number change for each traced extent. This should save quite some time for new developers. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
And remove the bogus check for a NULL return value from kmap, which can't happen. While we're at it: I don't think that kmapping up to 256 will work without deadlocks on highmem machines, a better idea would be to use vm_map_ram to map all of them into a single virtual address range. Incidentally that would also simplify the code a lot. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Rework the loop a little bit to use the generic bio_for_each_segment_all helper for iterating over the bio. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Use the bvec offset and len members to prepare for multipage bvecs. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Instead of using bi_vcnt to calculate it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Use bio_for_each_segment_all to iterate over the segments instead. This requires a bit of reshuffling so that we only lookup up the ordered item once inside the bio_for_each_segment_all loop. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Just use bio_for_each_segment_all to iterate over all segments. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Just use bio_for_each_segment_all to iterate over all segments. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Pass the full bio to the decompression routines and use bio iterators to iterate over the data in the bio. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
This fixes the WARN_ON on BTRFS_I(inode)->reserved_extents in btrfs_destroy_inode and the WARN_ON on nonzero delalloc bytes on umount with qgroups enabled. I was able to reproduce this by setting up a small (~500kb) quota limit and writing a file one byte at a time until I hit the limit. The warnings would all hit on umount. The root cause is that we would reserve a block-sized range in both the reservation and the quota in btrfs_check_data_free_space, but if we encountered a problem (like e.g. EDQUOT), we would only release the single byte in the qgroup reservation. That caused an iotree state split, which increased the number of outstanding extents, in turn disallowing releasing the metadata reservation. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
At this point we will have dropped extent entries from the file, so if we fail to insert the new hole entries then we are leaving the fs in a corrupt state (albeit an easily fixed one). Abort the transaciton if this happens so we can avoid corrupting the fs. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
In order to do hole punching we have a block reserve to hold the reservation we need to drop the extents in our range. Since we could end up dropping a lot of extents we set rsv->failfast so we can just loop around again and drop the remaining of the range. Unfortunately we unconditionally fill the hole extents in and start from the last extent we encountered, which we may or may not have dropped. So this can result in overlapping file extent entries, which can be tripped over in a variety of ways, either by hitting BUG_ON(!ret) in fill_holes() after the search, or in btrfs_set_item_key_safe() in btrfs_drop_extent() at a later time by an unrelated task. Fix this by only setting drop_end to the last extent we did actually drop. This way our holes are filled in properly for the range that we did drop, and the rest of the range that remains to be dropped is actually dropped. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
If we process the last item in the leaf and hit an I/O error while reading the next leaf, we return -EIO without having adjusted the position. Since we have emitted dirents, getdents() will return the byte count to the user instead of the error. Subsequent callers will emit the last successful dirent again, and return -EIO again, with the same result. Callers loop forever. Instead, if we always increment ctx->pos after emitting or skipping the dirent, we'll be sure that we won't hit the same one again. When we go to process the next leaf, we won't have emitted any dirents and the -EIO will be returned to the user properly. We also don't need to track if we've emitted a dirent already or if we've changed the position yet. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
Commit 3de4586c (Btrfs: Allow subvolumes and snapshots anywhere in the directory tree) introduced the current system of placing snapshots in the directory tree. It also introduced the behavior of creating the snapshot and then creating the directory entries for it. We've kept this code around for compatibility reasons, but it turns out that no file systems with the old tree_root based snapshots can be mounted on newer (>= 2009) kernels anyway. About a month after the above commit, commit 2a7108ad (Btrfs: rev the disk format for the inode compat and csum selection changes) landed, changing the superblock magic number. As a result, we know that we'll never encounter tree_root-based dirents or have to deal with skipping our own snapshot dirents. Since that also means that we're now only iterating over DIR_INDEX items, which only contain one directory entry per leaf item, we don't need to loop over the leaf item contents anymore either. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nick Terrell 提交于
If zlib_inflateInit2 fails, the input page is never unmapped. Add a call to kunmap when it fails. Signed-off-by: NNick Terrell <nickrterrell@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The balance status item contains currently known filter values, but the stripes filter was unintentionally not among them. This would mean, that interrupted and automatically restarted balance does not apply the stripe filters. Fixes: dee32d0a CC: stable@vger.kernel.org # 4.4+ Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christophe JAILLET 提交于
'btrfs_iget()' can not return NULL, so this test can be removed. Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Domagoj Tršan 提交于
csum member of struct btrfs_super_block has array type of u8. It makes sense that function btrfs_csum_final should be also declared to accept u8 *. I changed the declaration of method void btrfs_csum_final(u32 crc, char *result); to void btrfs_csum_final(u32 crc, u8 *result); Signed-off-by: NDomagoj Tršan <domagoj.trsan@gmail.com> [ changed cast to u8 at several call sites ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
If we have |0--hole--4095||4096--preallocate--12287| instead of using preallocated space, a 8K direct write will just create a new 8K extent and it'll end up with |0--new extent--8191||8192--preallocate--12287| It's because we find a hole em and then go to create a new 8K extent directly without adjusting @len. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NChris Mason <clm@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Shailendra Verma 提交于
There is no need to call kfree() if memdup_user() fails, as no memory was allocated and the error in the error-valued pointer should be returned. Signed-off-by: NShailendra Verma <shailendra.v@samsung.com> [ edit subject ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Using copy_extent_buffer is suitable for copying betwenn buffers from an arbitrary offset and deals with page boundaries. This is not necessary when doing a full extent_buffer-to-extent_buffer copy. We can utilize the copy_page helper as well. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The only memset we do is to 0, so sink the parameter to the function and simplify all calls. Rename the function to reflect the behaviour. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The copy_page is usually optimized and can be faster than memcpy. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The fsid and chunk tree uuid are always located in the first page, we don't need the to use write_extent_buffer. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
__bdev' has never been used since 0b86a832 (2008). Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
During the time, the function has been shrunk to the point that it just calls find_extent_buffer, just passing the parameters. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We dereference fs_info several times, besides that post-mount functions should never see a NULL fs_info. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The lock is held, we make the same lookup that previously failed with EEXIST and we don't insert NULL pointers. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Originally, the eb and start were passed separately in case eb is NULL. Since the readahead has been refactored in 4.6, this is not true anymore and we can get rid of the parameter. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
'start' is not used since "btrfs: reada: Pass reada_extent into __readahead_hook directly" (6e39dbe8). Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We can't touch the eb directly in case the function is called with a non-zero error, so we can read the eb level when needed. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The helpers are not meant to be generic, the name is misleading. Convert them to static inlines for type checking. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.com>
-