- 14 2月, 2017 19 次提交
-
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This function is internal to btrfs and doesn't really deal with any VFS members, as such it needn't take a struct inode refrence but btrfs_inode. Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Currently btrfs_ino takes a struct inode and this causes a lot of internal btrfs functions which consume this ino to take a VFS inode, rather than btrfs' own struct btrfs_inode. In order to fix this "leak" of VFS structs into the internals of btrfs first it's necessary to eliminate all uses of struct inode for the purpose of inode. This patch does that by using BTRFS_I to convert an inode to btrfs_inode. With this problem eliminated subsequent patches will start eliminating the passing of struct inode altogether, eventually resulting in a lot cleaner code. Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> [ fix btrfs_get_extent tracepoint prototype ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The expression is open-coded in several places, this asks for a wrapper. As we know the MAX_EXTENT fits to u32, we can use the appropirate division helper. This cascades to the result type updates. Compiler is clever enough to use shift instead of integer division, so there's no change in the generated assembly. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
A proposed patch in https://marc.info/?l=linux-btrfs&m=147859791003837 pointed out bad limit threshold in cow_file_range_async, but it turned out that the whole logic is not necessary and is done by writeback. We agreed to remove it. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
As of now writes smaller than 64k for non compressed extents and 16k for compressed extents inside eof are considered as candidate for auto defrag, put them together at a place. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Since btrfs_defrag_leaves() does not support extent_root, remove its corresponding call. The user can use the file based defrag to defrag extents as of now. No change in behaviour as extent_root is explicitly skipped in btrfs_defrag_leaves and this has never worked as expected. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ ehnance changelong ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
btrfs_add_delayed_data_ref is always called with a NULL extent_op, so let's drop the argument. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Colin Ian King 提交于
The check for a null inode is redundant since the function is a callback for exportfs, which will itself crash if dentry->d_inode or parent->d_inode is NULL. Removing the null check makes this consistent with other file systems. Also remove the redundant null dir check too. Found with static analysis by CoverityScan, CID 1389472 Kudos to Jeff Mahoney for reviewing and explaining the error in my original patch (most of this explanation went into the above commit message) and David Sterba for pointing out that the dir check is also redundant. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Seraphime Kirkovski 提交于
This replaces ACCESS_ONCE macro with the corresponding READ|WRITE macros Signed-off-by: NSeraphime Kirkovski <kirkseraph@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Seraphime Kirkovski 提交于
This cleans up the cases where the min/max macros were used with a cast rather than using directly min_t/max_t. Signed-off-by: NSeraphime Kirkovski <kirkseraph@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Geliang Tang 提交于
To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Michal Hocko 提交于
try_release_extent_state reduces the gfp mask to GFP_NOFS if it is compatible. This is true for GFP_KERNEL as well. There is no real reason to do that though. There is no new lock taken down the the only consumer of the gfp mask which is try_release_extent_state clear_extent_bit __clear_extent_bit alloc_extent_state So this seems just unnecessary and confusing. Signed-off-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Michal Hocko 提交于
b335b003 ("Btrfs: Avoid using __GFP_HIGHMEM with slab allocator") has reduced the allocation mask in btrfs_releasepage to GFP_NOFS just to prevent from giving an unappropriate gfp mask to the slab allocator deeper down the callchain (in alloc_extent_state). This is wrong for two reasons a) GFP_NOFS might be just too restrictive for the calling context b) it is better to tweak the gfp mask down when it needs that. So just remove the mask tweaking from btrfs_releasepage and move it down to alloc_extent_state where it is needed. Signed-off-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Goldwyn Rodrigues has exposed and fixed a bug which underflows btrfs qgroup reserved space, and leads to non-writable fs. This reminds us that we don't have enough underflow check for qgroup reserved space. For underflow case, we should not really underflow the numbers but warn and keeps qgroup still work. So add more check on qgroup reserved space and add WARN_ON() and btrfs_warn() for any underflow case. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Reviewed-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 11 2月, 2017 1 次提交
-
-
由 Omar Sandoval 提交于
If btrfs_decompress_buf2page() is handed a bio with its page in the middle of the working buffer, then we adjust the offset into the working buffer. After we copy into the bio, we advance the iterator by the number of bytes we copied. Then, we have some logic to handle the case of discontiguous pages and adjust the offset into the working buffer again. However, if we didn't advance the bio to a new page, we may enter this case in error, essentially repeating the adjustment that we already made when we entered the function. The end result is bogus data in the bio. Previously, we only checked for this case when we advanced to a new page, but the conversion to bio iterators changed that. This restores the old, correct behavior. A case I saw when testing with zlib was: buf_start = 42769 total_out = 46865 working_bytes = total_out - buf_start = 4096 start_byte = 45056 The condition (total_out > start_byte && buf_start < start_byte) is true, so we adjust the offset: buf_offset = start_byte - buf_start = 2287 working_bytes -= buf_offset = 1809 current_buf_start = buf_start = 42769 Then, we copy bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809 buf_offset += bytes = 4096 working_bytes -= bytes = 0 current_buf_start += bytes = 44578 After bio_advance(), we are still in the same page, so start_byte is the same. Then, we check (total_out > start_byte && current_buf_start < start_byte), which is true! So, we adjust the values again: buf_offset = start_byte - buf_start = 2287 working_bytes = total_out - start_byte = 1809 current_buf_start = buf_start + buf_offset = 45056 But note that working_bytes was already zero before this, so we should have stopped copying. Fixes: 974b1adc ("btrfs: use bio iterators for the decompression handlers") Reported-by: NPat Erley <pat-lkml@erley.org> Reviewed-by: NChris Mason <clm@fb.com> Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NChris Mason <clm@fb.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Tested-by: NLiu Bo <bo.li.liu@oracle.com>
-
- 09 2月, 2017 1 次提交
-
-
由 Jeff Mahoney 提交于
Commit 4c63c245 incorrectly assumed that returning -ENOIOCTLCMD would cause the native ioctl to be called. The ->compat_ioctl callback is expected to handle all ioctls, not just compat variants. As a result, when using 32-bit userspace on 64-bit kernels, everything except those three ioctls would return -ENOTTY. Fixes: 4c63c245 ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl") Cc: stable@vger.kernel.org Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 27 1月, 2017 3 次提交
-
-
由 Omar Sandoval 提交于
Subvolume directory inodes can't have ACLs. Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Omar Sandoval 提交于
When you snapshot a subvolume containing a subvolume, you get a placeholder directory where the subvolume would be. These directory inodes have ->i_ops set to btrfs_dir_ro_inode_operations. Previously, these i_ops didn't include the xattr operation callbacks. The conversion to xattr_handlers missed this case, leading to bogus attempts to set xattrs on these inodes. This manifested itself as failures when running delayed inodes. To fix this, clear IOP_XATTR in ->i_opflags on these inodes. Fixes: 6c6ef9f2 ("xattr: Stop calling {get,set,remove}xattr inode operations") Cc: Andreas Gruenbacher <agruenba@redhat.com> Reported-by: NChris Murphy <lists@colorremedies.com> Tested-by: NChris Murphy <lists@colorremedies.com> Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Omar Sandoval 提交于
As Jeff explained in c2951f32 ("btrfs: remove old tree_root dirent processing in btrfs_real_readdir()"), supporting this old format is no longer necessary since the Btrfs magic number has been updated since we changed to the current format. There are other places where we still handle this old format, but since this is part of a fix that is going to stable, I'm only removing this one for now. Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 20 1月, 2017 3 次提交
-
-
由 Liu Bo 提交于
For such a file mapping, [0-4k][hole][8k-12k] In NO_HOLES mode, we don't have the [hole] extent any more. Commit c1aa4575 ("Btrfs: fix shrinking truncate when the no_holes feature is enabled") fixed disk isize not being updated in NO_HOLES mode when data is not flushed. However, even if data has been flushed, we can still have trouble in updating disk isize since we updated disk isize to 'start' of the last evicted extent. Reviewed-by: NChris Mason <clm@fb.com> Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Chandan Rajendra 提交于
The following deadlock is seen when executing generic/113 test, ---------------------------------------------------------+---------------------------------------------------- Direct I/O task Fast fsync task ---------------------------------------------------------+---------------------------------------------------- btrfs_direct_IO __blockdev_direct_IO do_blockdev_direct_IO do_direct_IO btrfs_get_blocks_direct while (blocks needs to written) get_more_blocks (first iteration) btrfs_get_blocks_direct btrfs_create_dio_extent down_read(&BTRFS_I(inode) >dio_sem) Create and add extent map and ordered extent up_read(&BTRFS_I(inode) >dio_sem) btrfs_sync_file btrfs_log_dentry_safe btrfs_log_inode_parent btrfs_log_inode btrfs_log_changed_extents down_write(&BTRFS_I(inode) >dio_sem) Collect new extent maps and ordered extents wait for ordered extent completion get_more_blocks (second iteration) btrfs_get_blocks_direct btrfs_create_dio_extent down_read(&BTRFS_I(inode) >dio_sem) -------------------------------------------------------------------------------------------------------------- In the above description, Btrfs direct I/O code path has not yet started submitting bios for file range covered by the initial ordered extent. Meanwhile, The fast fsync task obtains the write semaphore and waits for I/O on the ordered extent to get completed. However, the Direct I/O task is now blocked on obtaining the read semaphore. To resolve the deadlock, this commit modifies the Direct I/O code path to obtain the read semaphore before invoking __blockdev_direct_IO(). The semaphore is then given up after __blockdev_direct_IO() returns. This allows the Direct I/O code to complete I/O on all the ordered extents it creates. Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Wang Xiaoguang 提交于
Below test script can reveal this bug: dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100 dev=$(losetup --show -f fs.img) mkdir -p /mnt/mntpoint mkfs.btrfs -f $dev mount $dev /mnt/mntpoint cd /mnt/mntpoint echo "workdir is: /mnt/mntpoint" blocksize=$((128 * 1024)) dd if=/dev/zero of=testfile bs=$blocksize count=1 sync count=$((17*1024*1024*1024/blocksize)) echo "file size is:" $((count*blocksize)) for ((i = 1; i <= $count; i++)); do dst_offset=$((blocksize * i)) xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\ testfile > /dev/null done sync truncate --size 0 testfile The last truncate operation will fail for ENOSPC reason, but indeed it should not fail. In btrfs_truncate(), we use a temporary block_rsv to do truncate operation. With every btrfs_truncate_inode_items() call, we migrate space to this block_rsv, but forget to cleanup previous reservation, which will make this block_rsv's reserved bytes keep growing, and this reserved space will only be released in the end of btrfs_truncate(), this metadata leak will impact other's metadata reservation. In this case, it's "btrfs_start_transaction(root, 2);" fails for enospc error, which make this truncate operation fail. Call btrfs_block_rsv_release() to fix this bug. Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 09 1月, 2017 2 次提交
-
-
由 Liu Bo 提交于
'inode' is an important field for btrfs_get_extent, lets trace it. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Enabling btrfs tracepoints leads to instant crash, as reported. The wq callbacks could free the memory and the tracepoints started to dereference the members to get to fs_info. The proposed fix https://marc.info/?l=linux-btrfs&m=148172436722606&w=2 removed the tracepoints but we could preserve them by passing only the required data in a safe way. Fixes: bc074524 ("btrfs: prefix fsid to all trace events") CC: stable@vger.kernel.org # 4.8+ Reported-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 04 1月, 2017 1 次提交
-
-
由 Liu Bo 提交于
Currently how btrfs dio deals with split dio write is not good enough if dio write is split into several segments due to the lack of contiguous space, a large dio write like 'dd bs=1G count=1' can end up with incorrect outstanding_extents counter and endio would complain loudly with an assertion. This fixes the problem by compensating the outstanding_extents counter in inode if a large dio write gets split. Reported-by: NAnand Jain <anand.jain@oracle.com> Tested-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 03 1月, 2017 4 次提交
-
-
由 Liu Bo 提交于
While checking INODE_REF/INODE_EXTREF for a corner case, we may acquire a different inode's log_mutex with holding the current inode's log_mutex, and lockdep has complained this with a possilble deadlock warning. Fix this by using mutex_lock_nested() when processing the other inode's log_mutex. Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
If @block_group is not @used_bg, it'll try to get @used_bg's lock without droping @block_group 's lock and lockdep has throwed a scary deadlock warning about it. Fix it by using down_read_nested. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
In __btrfs_run_delayed_refs, when we put back a delayed ref that's too new, we have already dropped the lock on locked_ref when we set ->processing = 0. This patch keeps the lock to cover that assignment. Fixes: d7df2c79 (Btrfs: attach delayed ref updates to delayed ref heads) Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
In __btrfs_run_delayed_refs, the error path when run_delayed_extent_op fails sets locked_ref->processing = 0 but doesn't re-increment delayed_refs->num_heads_ready. As a result, we end up triggering the WARN_ON in btrfs_select_ref_head. Fixes: d7df2c79 (Btrfs: attach delayed ref updates to delayed ref heads) Reported-by: NJon Nelson <jnelson-suse@jamponi.net> Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 20 12月, 2016 1 次提交
-
-
由 Pan Bian 提交于
In function btrfs_uuid_tree_iterate(), errno is assigned to variable ret on errors. However, it directly returns 0. It may be better to return ret. This patch also removes the warning, because the caller already prints a warning. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188731Signed-off-by: NPan Bian <bianpan2016@163.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> [ edited subject ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 15 12月, 2016 3 次提交
-
-
由 Matthew Wilcox 提交于
This fixes several interlinked problems with the iterators in the presence of multiorder entries. 1. radix_tree_iter_next() would only advance by one slot, which would result in the iterators returning the same entry more than once if there were sibling entries. 2. radix_tree_next_slot() could return an internal pointer instead of a user pointer if a tagged multiorder entry was immediately followed by an entry of lower order. 3. radix_tree_next_slot() expanded to a lot more code than it used to when multiorder support was compiled in. And I wasn't comfortable with entry_to_node() being in a header file. Fixing radix_tree_iter_next() for the presence of sibling entries necessarily involves examining the contents of the radix tree, so we now need to pass 'slot' to radix_tree_iter_next(), and we need to change the calling convention so it is called *before* dropping the lock which protects the tree. Also rename it to radix_tree_iter_resume(), as some people thought it was necessary to call radix_tree_iter_next() each time around the loop. radix_tree_next_slot() becomes closer to how it looked before multiorder support was introduced. It only checks to see if the next entry in the chunk is a sibling entry or a pointer to a node; this should be rare enough that handling this case out of line is not a performance impact (and such impact is amortised by the fact that the entry we just processed was a multiorder entry). Also, radix_tree_next_slot() used to force a new chunk lookup for untagged entries, which is more expensive than the out of line sibling entry skipping. Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Tested-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
We drop the lock which protects the radix tree, so we must call radix_tree_iter_next() in order to avoid a modification to the tree invalidating the iterator state. Link: http://lkml.kernel.org/r/1480369871-5271-54-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Tested-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Petr Mladek 提交于
Commit 262c5e86 ("printk/btrfs: handle more message headers") triggers: warning: `ratelimit' may be used uninitialized in this function with gcc (4.1.2) and probably many other versions. The code actually is correct but a bit twisted. Let's make it more straightforward and set the default values at the beginning. Link: http://lkml.kernel.org/r/20161213135246.GQ3506@pathway.suse.czSigned-off-by: NPetr Mladek <pmladek@suse.com> Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: NGeert Uytterhoeven <geert@linux-m68k.org> Acked-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 12月, 2016 1 次提交
-
-
由 Maxim Patlasov 提交于
Problem statement: unprivileged user who has read-write access to more than one btrfs subvolume may easily consume all kernel memory (eventually triggering oom-killer). Reproducer (./mkrmdir below essentially loops over mkdir/rmdir): [root@kteam1 ~]# cat prep.sh DEV=/dev/sdb mkfs.btrfs -f $DEV mount $DEV /mnt for i in `seq 1 16` do mkdir /mnt/$i btrfs subvolume create /mnt/SV_$i ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2` mount -t btrfs -o subvolid=$ID $DEV /mnt/$i chmod a+rwx /mnt/$i done [root@kteam1 ~]# sh prep.sh [maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done [root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done kmalloc-128 10144 10144 128 32 1 : tunables 0 0 0 : slabdata 317 317 0 kmalloc-128 9992352 9992352 128 32 1 : tunables 0 0 0 : slabdata 312261 312261 0 kmalloc-128 24226752 24226752 128 32 1 : tunables 0 0 0 : slabdata 757086 757086 0 kmalloc-128 42754240 42754240 128 32 1 : tunables 0 0 0 : slabdata 1336070 1336070 0 The huge numbers above come from insane number of async_work-s allocated and queued by btrfs_wq_run_delayed_node. The problem is caused by btrfs_wq_run_delayed_node() queuing more and more works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The worker func (btrfs_async_run_delayed_root) processes at least BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery works as expected while the list is almost empty. As soon as it is getting bigger, worker func starts to process more than one item at a time, it takes longer, and the chances to have async_works queued more than needed is getting higher. The problem above is worsened by another flaw of delayed-inode implementation: if async_work was queued in a throttling branch (number of items >= BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that the func occupies CPU infinitely (up to 30sec in my experiments): while the func is trying to drain the list, the user activity may add more and more items to the list. The patch fixes both problems in straightforward way: refuse queuing too many works in btrfs_wq_run_delayed_node and bail out of worker func if at least BTRFS_DELAYED_WRITEBACK items are processed. Changed in v2: remove support of thresh == NO_THRESHOLD. Signed-off-by: NMaxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: NChris Mason <clm@fb.com> Cc: stable@vger.kernel.org # v3.15+
-
- 13 12月, 2016 1 次提交
-
-
由 Petr Mladek 提交于
Commit 4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines") allows to define more message headers for a single message. The motivation is that continuous lines might get mixed. Therefore it make sense to define the right log level for every piece of a cont line. The current btrfs_printk() macros do not support continuous lines at the moment. But better be prepared for a custom messages and avoid potential "lvl" buffer overflow. This patch iterates over the entire message header. It is interested only into the message level like the original code. This patch also introduces PRINTK_MAX_SINGLE_HEADER_LEN. Three bytes are enough for the message level header at the moment. But it used to be three, see the commit 04d2c8c8 ("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern"). Also I fixed the default ratelimit level. It looked very strange when it was different from the default log level. [pmladek@suse.com: Fix a check of the valid message level] Link: http://lkml.kernel.org/r/20161111183236.GD2145@dhcp128.suse.cz Link: http://lkml.kernel.org/r/1478695291-12169-4-git-send-email-pmladek@suse.comSigned-off-by: NPetr Mladek <pmladek@suse.com> Acked-by: NDavid Sterba <dsterba@suse.com> Cc: Joe Perches <joe@perches.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-