- 18 1月, 2020 3 次提交
-
-
由 Eric Biggers 提交于
Since allocating an object from a mempool never fails when __GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check for failure to allocate a bio_post_read_ctx is unnecessary. Remove it. Signed-off-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
As Youling reported in mailing list: https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/ https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/ There is a test case can corrupt f2fs image: - dd if=/dev/zero of=/swapfile bs=1M count=4096 - chmod 600 /swapfile - mkswap /swapfile - swapon --discard /swapfile The root cause is f2fs_swap_activate() intends to return zero value to setup_swap_extents() to enable SWP_FS mode (swap file goes through fs), in this flow, setup_swap_extents() setups swap extent with wrong block address range, result in discard_swap() erasing incorrect address. Because f2fs_swap_activate() has pinned swapfile, its data block address will not change, it's safe to let swap to handle IO through raw device, so we can get rid of SWAP_FS mode and initial swap extents inside f2fs_swap_activate(), by this way, later discard_swap() can trim in right address range. Fixes: 4969c06a ("f2fs: support swap file w/ DIO") Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
This patch tries to support compression in f2fs. - New term named cluster is defined as basic unit of compression, file can be divided into multiple clusters logically. One cluster includes 4 << n (n >= 0) logical pages, compression size is also cluster size, each of cluster can be compressed or not. - In cluster metadata layout, one special flag is used to indicate cluster is compressed one or normal one, for compressed cluster, following metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores data including compress header and compressed data. - In order to eliminate write amplification during overwrite, F2FS only support compression on write-once file, data can be compressed only when all logical blocks in file are valid and cluster compress ratio is lower than specified threshold. - To enable compression on regular inode, there are three ways: * chattr +c file * chattr +c dir; touch dir/file * mount w/ -o compress_extension=ext; touch file.ext Compress metadata layout: [Dnode Structure] +-----------------------------------------------+ | cluster 1 | cluster 2 | ......... | cluster N | +-----------------------------------------------+ . . . . . . . . . Compressed Cluster . . Normal Cluster . +----------+---------+---------+---------+ +---------+---------+---------+---------+ |compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 | +----------+---------+---------+---------+ +---------+---------+---------+---------+ . . . . . . +-------------+-------------+----------+----------------------------+ | data length | data chksum | reserved | compressed data | +-------------+-------------+----------+----------------------------+ Changelog: 20190326: - fix error handling of read_end_io(). - remove unneeded comments in f2fs_encrypt_one_page(). 20190327: - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages(). - don't jump into loop directly to avoid uninitialized variables. - add TODO tag in error path of f2fs_write_cache_pages(). 20190328: - fix wrong merge condition in f2fs_read_multi_pages(). - check compressed file in f2fs_post_read_required(). 20190401 - allow overwrite on non-compressed cluster. - check cluster meta before writing compressed data. 20190402 - don't preallocate blocks for compressed file. - add lz4 compress algorithm - process multiple post read works in one workqueue Now f2fs supports processing post read work in multiple workqueue, it shows low performance due to schedule overhead of multiple workqueue executing orderly. 20190921 - compress: support buffered overwrite C: compress cluster flag V: valid block address N: NEW_ADDR One cluster contain 4 blocks before overwrite after overwrite - VVVV -> CVNN - CVNN -> VVVV - CVNN -> CVNN - CVNN -> CVVV - CVVV -> CVNN - CVVV -> CVVV 20191029 - add kconfig F2FS_FS_COMPRESSION to isolate compression related codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm. note that: will remove lzo backend if Jaegeuk agreed that too. - update codes according to Eric's comments. 20191101 - apply fixes from Jaegeuk 20191113 - apply fixes from Jaegeuk - split workqueue for fsverity 20191216 - apply fixes from Jaegeuk 20200117 - fix to avoid NULL pointer dereference [Jaegeuk Kim] - add tracepoint for f2fs_{,de}compress_pages() - fix many bugs and add some compression stats - fix overwrite/mmap bugs - address 32bit build error, reported by Geert. - bug fixes when handling errors and i_compressed_blocks Reported-by: <noreply@ellerman.id.au> Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 16 1月, 2020 1 次提交
-
-
由 Chao Yu 提交于
In low memory scenario, we can allocate multiple bios without submitting any of them. - f2fs_write_checkpoint() - block_operations() - f2fs_sync_node_pages() step 1) flush cold nodes, allocate new bio from mempool - bio_alloc() - mempool_alloc() step 2) flush hot nodes, allocate a bio from mempool - bio_alloc() - mempool_alloc() step 3) flush warm nodes, be stuck in below call path - bio_alloc() - mempool_alloc() - loop to wait mempool element release, as we only reserved memory for two bio allocation, however above allocated two bios may never be submitted. So we need avoid using default bioset, in this patch we introduce a private bioset, in where we enlarg mempool element count to total number of log header, so that we can make sure we have enough backuped memory pool in scenario of allocating/holding multiple bios. Signed-off-by: NGao Xiang <gaoxiang25@huawei.com> Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 13 12月, 2019 1 次提交
-
-
由 Jaegeuk Kim 提交于
This patch avoids some unnecessary locks for quota files when write_begin fails. Reviewed-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 10 12月, 2019 1 次提交
-
-
由 Jaegeuk Kim 提交于
The previous preallocation and DIO decision like below. allow_outplace_dio !allow_outplace_dio f2fs_force_buffered_io (*) No_Prealloc / Buffered_IO Prealloc / Buffered_IO !f2fs_force_buffered_io No_Prealloc / DIO Prealloc / DIO But, Javier reported Case (*) where zoned device bypassed preallocation but fell back to buffered writes in f2fs_direct_IO(), resulting in stale data being read. In order to fix the issue, actually we need to preallocate blocks whenever we fall back to buffered IO like this. No change is made in the other cases. allow_outplace_dio !allow_outplace_dio f2fs_force_buffered_io (*) Prealloc / Buffered_IO Prealloc / Buffered_IO !f2fs_force_buffered_io No_Prealloc / DIO Prealloc / DIO Reported-and-tested-by: NJavier Gonzalez <javier@javigon.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Tested-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: NChao Yu <yuchao0@huawei.com> Reviewed-by: NJavier González <javier@javigon.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 20 11月, 2019 1 次提交
-
-
由 Chao Yu 提交于
As Eric mentioned, bare printk{,_ratelimited} won't show which filesystem instance these message is coming from, this patch tries to show fs instance with sb->s_id field in all places we missed before. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 08 11月, 2019 1 次提交
-
-
由 Chao Yu 提交于
We expect 64-bit calculation result from below statement, however in 32-bit machine, looped left shift operation on pgoff_t type variable may cause overflow issue, fix it by forcing type cast. page->index << PAGE_SHIFT; Fixes: 26de9b11 ("f2fs: avoid unnecessary updating inode during fsync") Fixes: 0a2aa8fb ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 26 10月, 2019 1 次提交
-
-
由 Chao Yu 提交于
In commit 8648de2c ("f2fs: add bio cache for IPU"), we added f2fs_submit_ipu_bio() in __write_data_page() as below: __write_data_page() if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) { f2fs_submit_ipu_bio(sbi, bio, page); .... } in order to avoid below deadlock: Thread A Thread B - __write_data_page (inode x, page y) - f2fs_do_write_data_page - set_page_writeback ---- set writeback flag in page y - f2fs_inplace_write_data - f2fs_balance_fs - lock gc_mutex - lock gc_mutex - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_page - f2fs_wait_on_page_writeback - wait_on_page_writeback --- wait writeback of page y However, the bio submission breaks the merge of IPU IOs. So in this patch let's add a global bio cache for merged IPU pages, then f2fs_wait_on_page_writeback() is able to submit bio if a writebacked page is cached in global bio cache. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 16 9月, 2019 3 次提交
-
-
由 Chao Yu 提交于
In f2fs_allocate_data_block(), we will reset fio.retry for IO alignment feature instead of IO serialization feature. In addition, spread F2FS_IO_ALIGNED() to check IO alignment feature status explicitly. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
In f2fs_map_blocks(), we should bail out once __allocate_data_block() failed. Fixes: f847c699 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
In LFS mode, por_fsstress testcase reports a bug as below: [ASSERT] (fsck_chk_inode_blk: 931) --> ino: 0x12fe has wrong ext: [pgofs:142, blk:215424, len:16] Since commit f847c699 ("f2fs: allow out-place-update for direct IO in LFS mode"), we start to allow OPU mode for direct IO, however, we missed to update extent cache in __allocate_data_block(), finally, it cause extent field being inconsistent with physical block address, fix it. Fixes: f847c699 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 07 9月, 2019 2 次提交
-
-
由 Chao Yu 提交于
This patch changes sematics of f2fs_is_checkpoint_ready()'s return value as: return true when checkpoint is ready, other return false, it can improve readability of below conditions. f2fs_submit_page_write() ... if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) || !f2fs_is_checkpoint_ready(sbi)) __submit_merged_bio(io); f2fs_balance_fs() ... if (!f2fs_is_checkpoint_ready(sbi)) return; Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
Just cleanup, no logic change. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 23 8月, 2019 3 次提交
-
-
由 Chao Yu 提交于
Adjust f2fs_fiemap() to support fiemap() on directory inode. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
Since 07173c3e ("block: enable multipage bvecs"), one bio vector can store multi pages, so that we can not calculate max IO size of bio as PAGE_SIZE * bio->bi_max_vecs. However IO alignment feature of f2fs always has that assumption, so finally, it may cause panic during IO submission as below stack. kernel BUG at fs/f2fs/data.c:317! RIP: 0010:__submit_merged_bio+0x8b0/0x8c0 Call Trace: f2fs_submit_page_write+0x3cd/0xdd0 do_write_page+0x15d/0x360 f2fs_outplace_write_data+0xd7/0x210 f2fs_do_write_data_page+0x43b/0xf30 __write_data_page+0xcf6/0x1140 f2fs_write_cache_pages+0x3ba/0xb40 f2fs_write_data_pages+0x3dd/0x8b0 do_writepages+0xbb/0x1e0 __writeback_single_inode+0xb6/0x800 writeback_sb_inodes+0x441/0x910 wb_writeback+0x261/0x650 wb_workfn+0x1f9/0x7a0 process_one_work+0x503/0x970 worker_thread+0x7d/0x820 kthread+0x1ad/0x210 ret_from_fork+0x35/0x40 This patch adds one extra condition to check left space in bio while trying merging page to bio, to avoid panic. This bug was reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204043Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
Wrap merge condition into function for readability, no logic change. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 17 8月, 2019 1 次提交
-
-
由 Jaegeuk Kim 提交于
This patch fixes livelock in the below call path when writing swap pages. [46374.617256] c2 701 __switch_to+0xe4/0x100 [46374.617265] c2 701 __schedule+0x80c/0xbc4 [46374.617273] c2 701 schedule+0x74/0x98 [46374.617281] c2 701 rwsem_down_read_failed+0x190/0x234 [46374.617291] c2 701 down_read+0x58/0x5c [46374.617300] c2 701 f2fs_map_blocks+0x138/0x9a8 [46374.617310] c2 701 get_data_block_dio_write+0x74/0x104 [46374.617320] c2 701 __blockdev_direct_IO+0x1350/0x3930 [46374.617331] c2 701 f2fs_direct_IO+0x55c/0x8bc [46374.617341] c2 701 __swap_writepage+0x1d0/0x3e8 [46374.617351] c2 701 swap_writepage+0x44/0x54 [46374.617360] c2 701 shrink_page_list+0x140/0xe80 [46374.617371] c2 701 shrink_inactive_list+0x510/0x918 [46374.617381] c2 701 shrink_node_memcg+0x2d4/0x804 [46374.617391] c2 701 shrink_node+0x10c/0x2f8 [46374.617400] c2 701 do_try_to_free_pages+0x178/0x38c [46374.617410] c2 701 try_to_free_pages+0x348/0x4b8 [46374.617419] c2 701 __alloc_pages_nodemask+0x7f8/0x1014 [46374.617429] c2 701 pagecache_get_page+0x184/0x2cc [46374.617438] c2 701 f2fs_new_node_page+0x60/0x41c [46374.617449] c2 701 f2fs_new_inode_page+0x50/0x7c [46374.617460] c2 701 f2fs_init_inode_metadata+0x128/0x530 [46374.617472] c2 701 f2fs_add_inline_entry+0x138/0xd64 [46374.617480] c2 701 f2fs_do_add_link+0xf4/0x178 [46374.617488] c2 701 f2fs_create+0x1e4/0x3ac [46374.617497] c2 701 path_openat+0xdc0/0x1308 [46374.617507] c2 701 do_filp_open+0x78/0x124 [46374.617516] c2 701 do_sys_open+0x134/0x248 [46374.617525] c2 701 SyS_openat+0x14/0x20 Reviewed-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 13 8月, 2019 1 次提交
-
-
由 Eric Biggers 提交于
Add fs-verity support to f2fs. fs-verity is a filesystem feature that enables transparent integrity protection and authentication of read-only files. It uses a dm-verity like mechanism at the file level: a Merkle tree is used to verify any block in the file in log(filesize) time. It is implemented mainly by helper functions in fs/verity/. See Documentation/filesystems/fsverity.rst for the full documentation. The f2fs support for fs-verity consists of: - Adding a filesystem feature flag and an inode flag for fs-verity. - Implementing the fsverity_operations to support enabling verity on an inode and reading/writing the verity metadata. - Updating ->readpages() to verify data as it's read from verity files and to support reading verity metadata pages. - Updating ->write_begin(), ->write_end(), and ->writepages() to support writing verity metadata pages. - Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl(). Like ext4, f2fs stores the verity metadata (Merkle tree and fsverity_descriptor) past the end of the file, starting at the first 64K boundary beyond i_size. This approach works because (a) verity files are readonly, and (b) pages fully beyond i_size aren't visible to userspace but can be read/written internally by f2fs with only some relatively small changes to f2fs. Extended attributes cannot be used because (a) f2fs limits the total size of an inode's xattr entries to 4096 bytes, which wouldn't be enough for even a single Merkle tree block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity metadata *must* be encrypted when the file is because it contains hashes of the plaintext data. Acked-by: NJaegeuk Kim <jaegeuk@kernel.org> Acked-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NEric Biggers <ebiggers@google.com>
-
- 19 7月, 2019 1 次提交
-
-
由 Keith Busch 提交于
migrate_page_move_mapping() doesn't use the mode argument. Remove it and update callers accordingly. Link: http://lkml.kernel.org/r/20190508210301.8472-1-keith.busch@intel.comSigned-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NZi Yan <ziy@nvidia.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 7月, 2019 1 次提交
-
-
由 Tejun Heo 提交于
wbc_account_io() does a very specific job - try to see which cgroup is actually dirtying an inode and transfer its ownership to the majority dirtier if needed. The name is too generic and confusing. Let's rename it to something more specific. Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 7月, 2019 2 次提交
-
-
由 Jaegeuk Kim 提交于
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
f2fs uses EFAULT as error number to indicate filesystem is corrupted all the time, but generic filesystems use EUCLEAN for such condition, we need to change to follow others. This patch adds two new macros as below to wrap more generic error code macros, and spread them in code. EFSBADCRC EBADMSG /* Bad CRC detected */ EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ Reported-by: NPavel Machek <pavel@ucw.cz> Signed-off-by: NChao Yu <yuchao0@huawei.com> Acked-by: NPavel Machek <pavel@ucw.cz> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 29 5月, 2019 2 次提交
-
-
由 Eric Biggers 提交于
Rename fscrypt_encrypt_page() to fscrypt_encrypt_pagecache_blocks() and redefine its behavior to encrypt all filesystem blocks from the given region of the given page, rather than assuming that the region consists of just one filesystem block. Also remove the 'inode' and 'lblk_num' parameters, since they can be retrieved from the page as it's already assumed to be a pagecache page. This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. This is based on work by Chandan Rajendra. Reviewed-by: NChandan Rajendra <chandan@linux.ibm.com> Signed-off-by: NEric Biggers <ebiggers@google.com>
-
由 Eric Biggers 提交于
Currently, bounce page handling for writes to encrypted files is unnecessarily complicated. A fscrypt_ctx is allocated along with each bounce page, page_private(bounce_page) points to this fscrypt_ctx, and fscrypt_ctx::w::control_page points to the original pagecache page. However, because writes don't use the fscrypt_ctx for anything else, there's no reason why page_private(bounce_page) can't just point to the original pagecache page directly. Therefore, this patch makes this change. In the process, it also cleans up the API exposed to filesystems that allows testing whether a page is a bounce page, getting the pagecache page from a bounce page, and freeing a bounce page. Reviewed-by: NChandan Rajendra <chandan@linux.ibm.com> Signed-off-by: NEric Biggers <ebiggers@google.com>
-
- 23 5月, 2019 2 次提交
-
-
由 Chao Yu 提交于
As Hagbard Celine reported: [ 615.697824] INFO: task kworker/u16:5:344 blocked for more than 120 seconds. [ 615.697825] Not tainted 5.0.15-gentoo-f2fslog #4 [ 615.697826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 615.697827] kworker/u16:5 D 0 344 2 0x80000000 [ 615.697831] Workqueue: writeback wb_workfn (flush-259:0) [ 615.697832] Call Trace: [ 615.697836] ? __schedule+0x2c5/0x8b0 [ 615.697839] schedule+0x32/0x80 [ 615.697841] schedule_preempt_disabled+0x14/0x20 [ 615.697842] __mutex_lock.isra.8+0x2ba/0x4d0 [ 615.697845] ? log_store+0xf5/0x260 [ 615.697848] f2fs_write_data_pages+0x133/0x320 [ 615.697851] ? trace_hardirqs_on+0x2c/0xe0 [ 615.697854] do_writepages+0x41/0xd0 [ 615.697857] __filemap_fdatawrite_range+0x81/0xb0 [ 615.697859] f2fs_sync_dirty_inodes+0x1dd/0x200 [ 615.697861] f2fs_balance_fs_bg+0x2a7/0x2c0 [ 615.697863] ? up_read+0x5/0x20 [ 615.697865] ? f2fs_do_write_data_page+0x2cb/0x940 [ 615.697867] f2fs_balance_fs+0xe5/0x2c0 [ 615.697869] __write_data_page+0x1c8/0x6e0 [ 615.697873] f2fs_write_cache_pages+0x1e0/0x450 [ 615.697878] f2fs_write_data_pages+0x14b/0x320 [ 615.697880] ? trace_hardirqs_on+0x2c/0xe0 [ 615.697883] do_writepages+0x41/0xd0 [ 615.697885] __filemap_fdatawrite_range+0x81/0xb0 [ 615.697887] f2fs_sync_dirty_inodes+0x1dd/0x200 [ 615.697889] f2fs_balance_fs_bg+0x2a7/0x2c0 [ 615.697891] f2fs_write_node_pages+0x51/0x220 [ 615.697894] do_writepages+0x41/0xd0 [ 615.697897] __writeback_single_inode+0x3d/0x3d0 [ 615.697899] writeback_sb_inodes+0x1e8/0x410 [ 615.697902] __writeback_inodes_wb+0x5d/0xb0 [ 615.697904] wb_writeback+0x28f/0x340 [ 615.697906] ? cpumask_next+0x16/0x20 [ 615.697908] wb_workfn+0x33e/0x420 [ 615.697911] process_one_work+0x1a1/0x3d0 [ 615.697913] worker_thread+0x30/0x380 [ 615.697915] ? process_one_work+0x3d0/0x3d0 [ 615.697916] kthread+0x116/0x130 [ 615.697918] ? kthread_create_worker_on_cpu+0x70/0x70 [ 615.697921] ret_from_fork+0x3a/0x50 There is still deadloop in below condition: d A - do_writepages - f2fs_write_node_pages - f2fs_balance_fs_bg - f2fs_sync_dirty_inodes - f2fs_write_cache_pages - mutex_lock(&sbi->writepages) -- lock once - __write_data_page - f2fs_balance_fs_bg - f2fs_sync_dirty_inodes - f2fs_write_data_pages - mutex_lock(&sbi->writepages) -- lock again Thread A Thread B - do_writepages - f2fs_write_node_pages - f2fs_balance_fs_bg - f2fs_sync_dirty_inodes - .cp_task = current - f2fs_sync_dirty_inodes - .cp_task = current - filemap_fdatawrite - .cp_task = NULL - filemap_fdatawrite - f2fs_write_cache_pages - enter f2fs_balance_fs_bg since .cp_task is NULL - .cp_task = NULL Change as below to avoid this: - add condition to avoid holding .writepages mutex lock in path of data flush - introduce mutex lock sbi.flush_lock to exclude concurrent data flush in background. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
SQLite in Wal mode may trigger sequential IPU write in db-wal file, after commit d1b3e72d ("f2fs: submit bio of in-place-update pages"), we lost the chance of merging page in inner managed bio cache, result in submitting more small-sized IO. So let's add temporary bio in writepages() to cache mergeable write IO as much as possible. Test case: 1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync" 2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync" Before: f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096 After: f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096 Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 09 5月, 2019 4 次提交
-
-
由 Chao Yu 提交于
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check whether @blkaddr locates in main area or not. That check is weak, since the block address in range of main area can point to the address which is not valid in segment info table, and we can not detect such condition, we may suffer worse corruption as system continues running. So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check which trigger SIT bitmap check rather than only range check. This patch did below changes as wel: - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr(). - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid. - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check. - spread blkaddr check in: * f2fs_get_node_info() * __read_out_blkaddrs() * f2fs_submit_page_read() * ra_data_block() * do_recover_data() This patch can fix bug reported from bugzilla below: https://bugzilla.kernel.org/show_bug.cgi?id=203215 https://bugzilla.kernel.org/show_bug.cgi?id=203223 https://bugzilla.kernel.org/show_bug.cgi?id=203231 https://bugzilla.kernel.org/show_bug.cgi?id=203235 https://bugzilla.kernel.org/show_bug.cgi?id=203241 = Update by Jaegeuk Kim = DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths. But, xfstest/generic/446 compalins some generated kernel messages saying invalid bitmap was detected when reading a block. The reaons is, when we get the block addresses from extent_cache, there is no lock to synchronize it from truncating the blocks in parallel. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
This patch introduces f2fs_read_single_page() to wrap core operations of reading one page in f2fs_mpage_readpages(). In addition, if we failed in f2fs_mpage_readpages(), propagate error number to f2fs_read_data_page(), for f2fs_read_data_pages() path, always return success. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
This patch fixes to set FI_UPDATE_WRITE only if in-place IO was issued. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
This patch changes codes as below: - don't use is_read_io() as a condition to judge the meta IO. - use .is_por to replace .is_meta to indicate IO is from recovery explicitly. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 30 4月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
We only have two callers that need the integer loop iterator, and they can easily maintain it themselves. Suggested-by: NMatthew Wilcox <willy@infradead.org> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Acked-by: NDavid Sterba <dsterba@suse.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Acked-by: NColy Li <colyli@suse.de> Reviewed-by: NMatthew Wilcox <willy@infradead.org> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 4月, 2019 1 次提交
-
-
由 Hariprasad Kelam 提交于
changed passing function argument "0 to NULL" to fix below sparse warning fs/f2fs/data.c:426:47: warning: Using plain integer as NULL pointer Signed-off-by: NHariprasad Kelam <hariprasad.kelam@gmail.com> Reviewed-by: NChao Yu <yuchao0@huawei.com> Reviewed-by: NMukesh Ojha <mojha@codeaurora.org> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 06 4月, 2019 2 次提交
-
-
由 Chao Yu 提交于
As Hagbard Celine reported: Hi, this is a long standing bug that I've hit before on older kernels, but I was not able to get the syslog saved because of the nature of the bug. This time I had booted form a pen-drive, and was able to save the log to it's efi-partition. What i did to trigger it was to create a partition and format it f2fs, then mount it with options: "rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict". Then I unpacked a big .tar.xz to the partition (I used a gentoo-stage3-tarball as I was in process of installing Gentoo). Same options just without data_flush gives no problems. Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not properly unmounted. Some data may be corrupt. Please run fsck. Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest stack depth: 12064 bytes left Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at 00000000a4b0733c (stack is 0000000056016422..0000000096e7463f) Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow ...... Mar 20 21:06:40 usbgentoo kernel: Call Trace: Mar 20 21:06:40 usbgentoo kernel: read_node_page+0x71/0xf0 Mar 20 21:06:40 usbgentoo kernel: ? xas_load+0x8/0x50 Mar 20 21:06:40 usbgentoo kernel: __get_node_page+0x73/0x2a0 Mar 20 21:06:40 usbgentoo kernel: f2fs_get_dnode_of_data+0x34e/0x580 Mar 20 21:06:40 usbgentoo kernel: f2fs_write_inline_data+0x5e/0x2a0 Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x421/0x690 Mar 20 21:06:40 usbgentoo kernel: f2fs_write_cache_pages+0x1cf/0x460 Mar 20 21:06:40 usbgentoo kernel: f2fs_write_data_pages+0x2b3/0x2e0 Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_chksum_verify+0x1d/0xc0 Mar 20 21:06:40 usbgentoo kernel: ? read_node_page+0x71/0xf0 Mar 20 21:06:40 usbgentoo kernel: do_writepages+0x3c/0xd0 Mar 20 21:06:40 usbgentoo kernel: __filemap_fdatawrite_range+0x7c/0xb0 Mar 20 21:06:40 usbgentoo kernel: f2fs_sync_dirty_inodes+0xf2/0x200 Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs_bg+0x2a3/0x2c0 Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_dirtied+0x21/0xc0 Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs+0xd6/0x2b0 Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x4fb/0x690 ...... Mar 20 21:06:40 usbgentoo kernel: __writeback_single_inode+0x2a1/0x340 Mar 20 21:06:40 usbgentoo kernel: ? soft_cursor+0x1b4/0x220 Mar 20 21:06:40 usbgentoo kernel: writeback_sb_inodes+0x1d5/0x3e0 Mar 20 21:06:40 usbgentoo kernel: __writeback_inodes_wb+0x58/0xa0 Mar 20 21:06:40 usbgentoo kernel: wb_writeback+0x250/0x2e0 Mar 20 21:06:40 usbgentoo kernel: ? 0xffffffff8c000000 Mar 20 21:06:40 usbgentoo kernel: ? cpumask_next+0x16/0x20 Mar 20 21:06:40 usbgentoo kernel: wb_workfn+0x2f6/0x3b0 Mar 20 21:06:40 usbgentoo kernel: ? __switch_to_asm+0x40/0x70 Mar 20 21:06:40 usbgentoo kernel: process_one_work+0x1f5/0x3f0 Mar 20 21:06:40 usbgentoo kernel: worker_thread+0x28/0x3c0 Mar 20 21:06:40 usbgentoo kernel: ? rescuer_thread+0x330/0x330 Mar 20 21:06:40 usbgentoo kernel: kthread+0x10e/0x130 Mar 20 21:06:40 usbgentoo kernel: ? kthread_create_on_node+0x60/0x60 Mar 20 21:06:40 usbgentoo kernel: ret_from_fork+0x35/0x40 The root cause is that we run into an infinite recursive calling in between f2fs_balance_fs_bg and writepage() as described below: - f2fs_write_data_pages --- A - __write_data_page - f2fs_balance_fs - f2fs_balance_fs_bg --- B - f2fs_sync_dirty_inodes - filemap_fdatawrite - f2fs_write_data_pages --- A ... - f2fs_balance_fs_bg --- B ... In order to fix this issue, let's detect such condition in __write_data_page() and just skip calling f2fs_balance_fs() recursively. Reported-by: NHagbard Celine <hagbardcelin@gmail.com> Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Damien Le Moal 提交于
For a single device mount using a zoned block device, the zone information for the device is stored in the sbi->devs single entry array and sbi->s_ndevs is set to 1. This differs from a single device mount using a regular block device which does not allocate sbi->devs and sets sbi->s_ndevs to 0. However, sbi->s_devs == 0 condition is used throughout the code to differentiate a single device mount from a multi-device mount where sbi->s_ndevs is always larger than 1. This results in problems with single zoned block device volumes as these are treated as multi-device mounts but do not have the start_blk and end_blk information set. One of the problem observed is skipping of zone discard issuing resulting in write commands being issued to full zones or unaligned to a zone write pointer. Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single regular block device mount) and sbi->s_ndevs == 1 (single zoned block device mount) in the same manner. This is done by introducing the helper function f2fs_is_multi_device() and using this helper in place of direct tests of sbi->s_ndevs value, improving code readability. Fixes: 7bb3a371 ("f2fs: Fix zoned block device support") Cc: <stable@vger.kernel.org> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Reviewed-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 13 3月, 2019 5 次提交
-
-
由 Chao Yu 提交于
In f2fs_mpage_readpages(), if page is beyond EOF, we should just zero out it, but previously, before checking previous mapping info, we missed to check filesize boundary, fix it. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
As Gao Xiang reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202749 f2fs may skip pageout() due to incorrect page reference count. The problem here is that MM defined the rule [1] very clearly that once page was set with PG_private flag, we should increment the refcount in that page, also main flows like pageout(), migrate_page() will assume there is one additional page reference count if page_has_private() returns true. But currently, f2fs won't add/del refcount when changing PG_private flag. Anyway, f2fs should follow MM's rule to make MM's related flows running as expected. [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/Reported-by: NGao Xiang <gaoxiang25@huawei.com> Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
Since 8c242db9 ("f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer"), we've started to not skip clear private flag for atomic_write page truncation, so removing old wrong comment in f2fs_invalidate_page(). Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
For IPU path of f2fs_do_write_data_page(), in its error path, we need to release encrypted page and fscrypt context, otherwise it will cause memory leak. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Gao Xiang 提交于
Note that __GFP_ZERO is not supported for mempool_alloc, which also documented in the mempool_alloc comments. Signed-off-by: NGao Xiang <gaoxiang25@huawei.com> Reviewed-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-