1. 18 Apr, 2020 1 commit
  2. 17 Apr, 2020 1 commit
  3. 04 Apr, 2020 3 commits
  4. 31 Mar, 2020 4 commits
    • f2fs: fix potential .flags overflow on 32bit architecture · 7653b9d8
      Committed by Chao Yu
      f2fs_inode_info.flags is an unsigned long variable, so it has only
      32 bits on a 32-bit architecture. Since we introduced the
      FI_MMAP_FILE flag when adding data compression support, we may
      access memory beyond the boundary of the .flags field, corrupting
      the .i_sem field and resulting in the deadlock below.

      To fix this issue, let's expand .flags into an array to reserve
      enough space to store new flags.
      
      Call Trace:
       __schedule+0x8d0/0x13fc
       ? mark_held_locks+0xac/0x100
       schedule+0xcc/0x260
       rwsem_down_write_slowpath+0x3ab/0x65d
       down_write+0xc7/0xe0
       f2fs_drop_nlink+0x3d/0x600 [f2fs]
       f2fs_delete_inline_entry+0x300/0x440 [f2fs]
       f2fs_delete_entry+0x3a1/0x7f0 [f2fs]
       f2fs_unlink+0x500/0x790 [f2fs]
       vfs_unlink+0x211/0x490
       do_unlinkat+0x483/0x520
       sys_unlink+0x4a/0x70
       do_fast_syscall_32+0x12b/0x683
       entry_SYSENTER_32+0xaa/0x102
      
      Fixes: 4c8ff709 ("f2fs: support data compression")
      Tested-by: Ondrej Jirman <megous@megous.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
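
The .flags fix described in the commit above can be illustrated with a small userspace sketch. It is not the kernel code: the flag numbers, the `demo_` names, and the struct layout are assumptions made only to show why a flag index of 32 or more needs an array of longs rather than a single `unsigned long`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical flag numbers: the point is only that one index is >= 32. */
enum {
	DEMO_FLAG_FIRST = 0,
	DEMO_FLAG_MMAP_FILE = 32,
	DEMO_FLAG_MAX = 33,
};

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_LONGS_FOR(nbits) \
	(((nbits) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Stand-in for the inode info: a flag array followed by a neighbouring field. */
struct demo_inode_info {
	unsigned long flags[DEMO_LONGS_FOR(DEMO_FLAG_MAX)];
	int i_sem; /* stand-in for the field that was being corrupted */
};

/* Set one flag; the word index keeps the write inside the flags array. */
static void demo_set_flag(struct demo_inode_info *fi, unsigned int flag)
{
	assert(flag < DEMO_FLAG_MAX);
	fi->flags[flag / DEMO_BITS_PER_LONG] |= 1UL << (flag % DEMO_BITS_PER_LONG);
}

/* Test one flag using the same word/bit split as demo_set_flag(). */
static bool demo_test_flag(const struct demo_inode_info *fi, unsigned int flag)
{
	assert(flag < DEMO_FLAG_MAX);
	return (fi->flags[flag / DEMO_BITS_PER_LONG] >>
		(flag % DEMO_BITS_PER_LONG)) & 1UL;
}
```

With a 32-bit `unsigned long`, `DEMO_LONGS_FOR(33)` evaluates to 2, so setting flag 32 lands in the second array word instead of spilling into the field that follows.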
    • f2fs: don't trigger data flush in foreground operation · 7bcd0cfa
      Committed by Chao Yu
      Data flush can generate heavy IO and cause long latency, so it is
      not appropriate to trigger it in a foreground operation.

      In addition, we may hit the potential deadlock below during data flush:
      - f2fs_write_multi_pages
       - f2fs_write_raw_pages
        - f2fs_write_single_data_page
         - f2fs_balance_fs
          - f2fs_balance_fs_bg
           - f2fs_sync_dirty_inodes
            - filemap_fdatawrite   -- stuck flushing the same cluster
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up f2fs_may_encrypt() · 8c7d4b57
      Committed by Chao Yu
      Merge the two conditions below into f2fs_may_encrypt() as a cleanup:
      - IS_ENCRYPTED()
      - DUMMY_ENCRYPTION_ENABLED()

      Checking the IS_ENCRYPTED(inode) condition in f2fs_init_inode_metadata()
      is enough, since the encrypt flag has already been set in f2fs_new_inode().
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't mark compressed inode dirty during f2fs_iget() · 530e0704
      Committed by Chao Yu
      - f2fs_iget
       - do_read_inode
        - set_inode_flag(, FI_COMPRESSED_FILE)
         - __mark_inode_dirty_flag(, true)
      
      This is unnecessary, so let's mark the compressed inode dirty only
      during compressed inode conversion.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  5. 25 Mar, 2020 1 commit
  6. 23 Mar, 2020 1 commit
  7. 20 Mar, 2020 8 commits
  8. 11 Mar, 2020 4 commits
    • f2fs: fix to check dirty pages during compressed inode conversion · 6cfdf15f
      Committed by Chao Yu
      A compressed cluster can be generated during dirty data writeback,
      so if there are dirty pages on a compressed inode, converting it
      to a non-compressed inode must be disallowed.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to account compressed inode correctly · 96f5b4fa
      Committed by Chao Yu
      stat_inc_compr_inode() needs to check FI_COMPRESSED_FILE flag, so
      in f2fs_disable_compressed_file(), we should call stat_dec_compr_inode()
      before clearing FI_COMPRESSED_FILE flag.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
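
The ordering issue in the commit above can be sketched in a few lines. This is a simplified model, not the f2fs source: the `demo_` types and helpers are hypothetical, and the sketch assumes the decrement helper consults the same flag as the increment helper, which is why it has to run before the flag is cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-filesystem stats and the inode. */
struct demo_stats { int compr_inode; };
struct demo_inode { bool compressed; /* models FI_COMPRESSED_FILE */ };

/* Assumed behaviour: accounting helpers only act on flagged inodes. */
static void demo_stat_dec_compr_inode(struct demo_stats *st,
				      const struct demo_inode *inode)
{
	if (inode->compressed)
		st->compr_inode--;
}

/* Decrement first, then clear the flag; the reverse order would skip
 * the decrement and leave the counter one too high. */
static void demo_disable_compressed_file(struct demo_stats *st,
					 struct demo_inode *inode)
{
	demo_stat_dec_compr_inode(st, inode);
	inode->compressed = false;
	assert(st->compr_inode >= 0);
}
```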
    • f2fs: fix inconsistent comments · 7a88ddb5
      Committed by Chao Yu
      Lack of maintenance on comments may mislead developers, fix them.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: cover last_disk_size update with spinlock · c10c9820
      Committed by Chao Yu
      This change fixes the hung task issue below:
      
      INFO: task kworker/u16:1:58 blocked for more than 122 seconds.
            Not tainted 5.6.0-rc2-00590-g9983bdae4974e #11
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kworker/u16:1   D    0    58      2 0x00000000
      Workqueue: writeback wb_workfn (flush-179:0)
      Backtrace:
       (__schedule) from [<c0913234>] (schedule+0x78/0xf4)
       (schedule) from [<c017ec74>] (rwsem_down_write_slowpath+0x24c/0x4c0)
       (rwsem_down_write_slowpath) from [<c0915f2c>] (down_write+0x6c/0x70)
       (down_write) from [<c0435b80>] (f2fs_write_single_data_page+0x608/0x7ac)
       (f2fs_write_single_data_page) from [<c0435fd8>] (f2fs_write_cache_pages+0x2b4/0x7c4)
       (f2fs_write_cache_pages) from [<c043682c>] (f2fs_write_data_pages+0x344/0x35c)
       (f2fs_write_data_pages) from [<c0267ee8>] (do_writepages+0x3c/0xd4)
       (do_writepages) from [<c0310cbc>] (__writeback_single_inode+0x44/0x454)
       (__writeback_single_inode) from [<c03112d0>] (writeback_sb_inodes+0x204/0x4b0)
       (writeback_sb_inodes) from [<c03115cc>] (__writeback_inodes_wb+0x50/0xe4)
       (__writeback_inodes_wb) from [<c03118f4>] (wb_writeback+0x294/0x338)
       (wb_writeback) from [<c0312dac>] (wb_workfn+0x35c/0x54c)
       (wb_workfn) from [<c014f2b8>] (process_one_work+0x214/0x544)
       (process_one_work) from [<c014f634>] (worker_thread+0x4c/0x574)
       (worker_thread) from [<c01564fc>] (kthread+0x144/0x170)
       (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
      Reported-and-tested-by: Ondřej Jirman <megi@xff.cz>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  9. 28 Feb, 2020 3 commits
  10. 24 Jan, 2020 1 commit
    • f2fs: Add f2fs stats to sysfs · fc7100ea
      Committed by Hridya Valsaraju
      Currently f2fs stats are only available from /d/f2fs/status. This patch
      adds some of the f2fs stats to sysfs so that they are accessible even
      when debugfs is not mounted.
      
      The following sysfs nodes are added:
      -/sys/fs/f2fs/<disk>/free_segments
      -/sys/fs/f2fs/<disk>/cp_foreground_calls
      -/sys/fs/f2fs/<disk>/cp_background_calls
      -/sys/fs/f2fs/<disk>/gc_foreground_calls
      -/sys/fs/f2fs/<disk>/gc_background_calls
      -/sys/fs/f2fs/<disk>/moved_blocks_foreground
      -/sys/fs/f2fs/<disk>/moved_blocks_background
      -/sys/fs/f2fs/<disk>/avg_vblocks
      Signed-off-by: Hridya Valsaraju <hridya@google.com>
      [Jaegeuk Kim: allow STAT_FS without DEBUG_FS]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  11. 18 Jan, 2020 3 commits
    • f2fs: change to use rwsem for gc_mutex · fb24fea7
      Committed by Chao Yu
      A mutex does not serve its callers in order, so to avoid starving an
      unlucky caller, let's use an rwsem instead.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: convert inline_dir early before starting rename · b06af2af
      Committed by Jaegeuk Kim
      If we hit an error during rename, we'll get two dentries in different
      directories.

      Chao added a check for room in inline_dir, which can avoid needless
      conversion. This should be done under inode_lock(&old_dir).
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support data compression · 4c8ff709
      Committed by Chao Yu
      This patch tries to support compression in f2fs.
      
      - A new term, cluster, is defined as the basic unit of compression; a file
      can be logically divided into multiple clusters. One cluster includes 4 << n
      (n >= 0) logical pages, the compression size is also the cluster size, and
      each cluster can be either compressed or not.

      - In the cluster metadata layout, one special flag indicates whether a
      cluster is compressed or normal. For a compressed cluster, the following
      metadata maps the cluster to [1, (4 << n) - 1] physical blocks, in which f2fs
      stores data including the compress header and the compressed data.

      - In order to eliminate write amplification during overwrite, F2FS only
      supports compression on write-once files. Data can be compressed only when
      all logical blocks in the file are valid and the cluster compress ratio is
      lower than the specified threshold.
      
      - To enable compression on regular inode, there are three ways:
      * chattr +c file
      * chattr +c dir; touch dir/file
      * mount w/ -o compress_extension=ext; touch file.ext
      
      Compress metadata layout:
                                   [Dnode Structure]
                   +-----------------------------------------------+
                   | cluster 1 | cluster 2 | ......... | cluster N |
                   +-----------------------------------------------+
                   .           .                       .           .
             .                       .                .                      .
        .         Compressed Cluster       .        .        Normal Cluster            .
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
      |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
                 .                             .
               .                                           .
             .                                                           .
            +-------------+-------------+----------+----------------------------+
            | data length | data chksum | reserved |      compressed data       |
            +-------------+-------------+----------+----------------------------+
      
      Changelog:
      
      20190326:
      - fix error handling of read_end_io().
      - remove unneeded comments in f2fs_encrypt_one_page().
      
      20190327:
      - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
      - don't jump into loop directly to avoid uninitialized variables.
      - add TODO tag in error path of f2fs_write_cache_pages().
      
      20190328:
      - fix wrong merge condition in f2fs_read_multi_pages().
      - check compressed file in f2fs_post_read_required().
      
      20190401
      - allow overwrite on non-compressed cluster.
      - check cluster meta before writing compressed data.
      
      20190402
      - don't preallocate blocks for compressed file.
      
      - add lz4 compress algorithm
      - process multiple post read works in one workqueue
        Now f2fs supports processing post read work in multiple workqueue,
        it shows low performance due to schedule overhead of multiple
        workqueue executing orderly.
      
      20190921
      - compress: support buffered overwrite
      C: compress cluster flag
      V: valid block address
      N: NEW_ADDR
      
      One cluster contains 4 blocks
      
       before overwrite   after overwrite
      
      - VVVV		->	CVNN
      - CVNN		->	VVVV
      
      - CVNN		->	CVNN
      - CVNN		->	CVVV
      
      - CVVV		->	CVNN
      - CVVV		->	CVVV
      
      20191029
      - add kconfig F2FS_FS_COMPRESSION to isolate compression related
      codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
      note that: will remove lzo backend if Jaegeuk agreed that too.
      - update codes according to Eric's comments.
      
      20191101
      - apply fixes from Jaegeuk
      
      20191113
      - apply fixes from Jaegeuk
      - split workqueue for fsverity
      
      20191216
      - apply fixes from Jaegeuk
      
      20200117
      - fix to avoid NULL pointer dereference
      
      [Jaegeuk Kim]
      - add tracepoint for f2fs_{,de}compress_pages()
      - fix many bugs and add some compression stats
      - fix overwrite/mmap bugs
      - address 32bit build error, reported by Geert.
      - bug fixes when handling errors and i_compressed_blocks
      
      Reported-by: <noreply@ellerman.id.au>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
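
The cluster arithmetic from the compression commit above can be written out as a short sketch. The helper names are hypothetical and the code is not taken from f2fs; it only encodes the two facts stated in the message, namely that a cluster spans 4 << n logical pages and that a compressed cluster occupies between 1 and (4 << n) - 1 physical blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of logical pages in one cluster for a given n (n >= 0). */
static unsigned int demo_cluster_pages(unsigned int n)
{
	assert(n < 16); /* arbitrary sanity bound for this sketch */
	return 4u << n;
}

/* Index of the cluster that contains a given logical page. */
static unsigned long demo_cluster_index(unsigned long page_idx, unsigned int n)
{
	return page_idx / demo_cluster_pages(n);
}

/* A compressed cluster must fit in [1, (4 << n) - 1] physical blocks,
 * i.e. it has to save at least one block to be stored compressed. */
static bool demo_fits_compressed(unsigned int compressed_blocks, unsigned int n)
{
	return compressed_blocks >= 1 &&
	       compressed_blocks <= demo_cluster_pages(n) - 1;
}
```

Note the explicit parentheses: in C, `4 << n - 1` parses as `4 << (n - 1)`, which is not the intended upper bound.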
  12. 16 Jan, 2020 4 commits
    • f2fs: introduce private bioset · f543805f
      Committed by Chao Yu
      In low memory scenario, we can allocate multiple bios without
      submitting any of them.
      
      - f2fs_write_checkpoint()
       - block_operations()
        - f2fs_sync_node_pages()
         step 1) flush cold nodes, allocate new bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 2) flush hot nodes, allocate a bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 3) flush warm nodes, be stuck in below call path
         - bio_alloc()
          - mempool_alloc()
           - loop to wait mempool element release, as we only
             reserved memory for two bio allocation, however above
             allocated two bios may never be submitted.
      
      So we need to avoid using the default bioset. This patch introduces a
      private bioset in which the mempool element count is enlarged to the
      total number of log headers, so that enough backup memory is reserved
      for scenarios that allocate and hold multiple bios.
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: cleanup duplicate stats for atomic files · 0e6d0164
      Committed by Sahitya Tummala
      Remove the duplicate sbi->aw_cnt stats counter that tracks
      the number of atomic files currently opened (it also sometimes
      shows an incorrect value). Use the more reliable sbi->atomic_files
      counter to show in the stats.
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Check write pointer consistency of non-open zones · d508c94e
      Committed by Shin'ichiro Kawasaki
      To catch f2fs bugs in write pointer handling code for zoned block
      devices, check write pointers of non-open zones that current segments do
      not point to. Do this check at mount time, after the fsync data recovery
      and current segments' write pointer consistency fix. Or when fsync data
      recovery is disabled by mount option, do the check when there is no fsync
      data.
      
      Check two items by comparing write pointers with the valid block maps in SIT.
      The first item is a check for zones with no valid blocks. When there are no
      valid blocks in a zone, the write pointer should be at the start of the
      zone. If not, the next write operation to the zone will cause an unaligned
      write error. If the write pointer is not at the zone start, reset the write
      pointer to place it at the zone start.

      The second item is a check between the write pointer position and the last
      valid block in the zone. It is unexpected that the last valid block
      position is beyond the write pointer. In such a case, report it as a bug.
      A fix is not required for such a zone, because the zone is not selected for
      the next write operation until the zone gets discarded.
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
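
The two checks for non-open zones described in the commit above can be modelled as a small decision function. This is a sketch with hypothetical names, not the f2fs implementation. It assumes block positions are plain integers and that the write pointer names the next block to be written, so a valid block at or past it counts as "beyond the write pointer".

```c
#include <assert.h>

/* Possible outcomes of the mount-time check for one non-open zone. */
enum demo_wp_action {
	DEMO_WP_OK,         /* nothing to do */
	DEMO_WP_RESET,      /* empty zone with a stray write pointer */
	DEMO_WP_REPORT_BUG, /* valid block found beyond the write pointer */
};

static enum demo_wp_action demo_check_zone(unsigned long zone_start,
					   unsigned long wp,
					   unsigned int valid_blocks,
					   unsigned long last_valid_block)
{
	assert(wp >= zone_start);

	/* First item: a zone without valid blocks must have its write
	 * pointer at the zone start, otherwise reset it. */
	if (valid_blocks == 0)
		return wp == zone_start ? DEMO_WP_OK : DEMO_WP_RESET;

	/* Second item: the last valid block must lie before the write
	 * pointer (assumption: wp is the next block to be written). */
	if (last_valid_block >= wp)
		return DEMO_WP_REPORT_BUG;

	return DEMO_WP_OK;
}
```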
    • f2fs: Check write pointer consistency of open zones · c426d991
      Committed by Shin'ichiro Kawasaki
      On sudden f2fs shutdown, write pointers of zoned block devices can go
      further but f2fs meta data keeps current segments at positions before the
      write operations. After remounting the f2fs, this inconsistency causes
      write operations not at write pointers and "Unaligned write command"
      error is reported.
      
      To avoid the error, compare current segments with write pointers of open
      zones the current segments point to, during mount operation. If the write
      pointer position is not aligned with the current segment position, assign
      a new zone to the current segment. Also check the newly assigned zone has
      write pointer at zone start. If not, reset write pointer of the zone.
      
      Perform the consistency check during fsync recovery. To avoid losing the
      fsync data, do the check after fsync data gets restored and before the
      checkpoint commit which flushes data at current segment positions. To avoid
      conflicting with kworker's dirty data/node flush, do the fix within
      SBI_POR_DOING protection.
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  13. 15 Jan, 2020 1 commit
    • fs-verity: implement readahead of Merkle tree pages · fd39073d
      Committed by Eric Biggers
      When fs-verity verifies data pages, currently it reads each Merkle tree
      page synchronously using read_mapping_page().
      
      Therefore, when the Merkle tree pages aren't already cached, fs-verity
      causes an extra 4 KiB I/O request for every 512 KiB of data (assuming
      that the Merkle tree uses SHA-256 and 4 KiB blocks).  This results in
      more I/O requests and performance loss than is strictly necessary.
      
      Therefore, implement readahead of the Merkle tree pages.
      
      For simplicity, we take advantage of the fact that the kernel already
      does readahead of the file's *data*, just like it does for any other
      file.  Due to this, we don't really need a separate readahead state
      (struct file_ra_state) just for the Merkle tree, but rather we just need
      to piggy-back on the existing data readahead requests.
      
      We also only really need to bother with the first level of the Merkle
      tree, since the usual fan-out factor is 128, so normally over 99% of
      Merkle tree I/O requests are for the first level.
      
      Therefore, make fsverity_verify_bio() enable readahead of the first
      Merkle tree level, for up to 1/4 the number of pages in the bio, when it
      sees that the REQ_RAHEAD flag is set on the bio.  The readahead size is
      then passed down to ->read_merkle_tree_page() for the filesystem to
      (optionally) implement if it sees that the requested page is uncached.
      
      While we're at it, also make build_merkle_tree_level() set the Merkle
      tree readahead size, since it's easy to do there.
      
      However, for now don't set the readahead size in fsverity_verify_page(),
      since currently it's only used to verify holes on ext4 and f2fs, and it
      would need parameters added to know how much to read ahead.
      
      This patch significantly improves fs-verity sequential read performance.
      Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches:
      
          On an ARM64 phone (using sha256-ce):
              Before: 217 MB/s
              After: 263 MB/s
              (compare to sha256sum of non-verity file: 357 MB/s)
      
          In an x86_64 VM (using sha256-avx2):
              Before: 173 MB/s
              After: 215 MB/s
              (compare to sha256sum of non-verity file: 223 MB/s)
      
      Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
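
The figures quoted in the fs-verity commit above follow from simple arithmetic, sketched here with hypothetical helper names. A SHA-256 digest is 32 bytes, so a 4 KiB Merkle tree block holds 128 hashes and therefore covers 128 data blocks (512 KiB). The readahead cap is modelled as one quarter of the pages in the bio, as the message describes.

```c
#include <assert.h>

/* Hashes stored in one Merkle tree block (the tree's fan-out). */
static unsigned int demo_hashes_per_block(unsigned int block_size,
					  unsigned int digest_size)
{
	assert(digest_size > 0 && block_size % digest_size == 0);
	return block_size / digest_size;
}

/* Bytes of file data covered by one first-level Merkle tree block. */
static unsigned long demo_data_bytes_per_tree_block(unsigned int block_size,
						    unsigned int digest_size)
{
	return (unsigned long)demo_hashes_per_block(block_size, digest_size) *
	       block_size;
}

/* Upper bound on Merkle tree readahead for a readahead bio:
 * up to 1/4 of the number of pages in the bio. */
static unsigned int demo_max_tree_ra_pages(unsigned int bio_pages)
{
	return bio_pages / 4;
}
```

For 4 KiB blocks and 32-byte digests this gives a fan-out of 128 and 524288 bytes (512 KiB) of data per tree block, matching the "extra 4 KiB I/O request for every 512 KiB of data" in the message.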
  14. 20 Nov, 2019 2 commits
  15. 08 Nov, 2019 1 commit
  16. 26 Oct, 2019 1 commit
    • f2fs: cache global IPU bio · 0b20fcec
      Committed by Chao Yu
      In commit 8648de2c ("f2fs: add bio cache for IPU"), we added
      f2fs_submit_ipu_bio() in __write_data_page() as below:
      
      __write_data_page()
      
      	if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
      		f2fs_submit_ipu_bio(sbi, bio, page);
      		....
      	}
      
      in order to avoid below deadlock:
      
      Thread A				Thread B
      - __write_data_page (inode x, page y)
       - f2fs_do_write_data_page
        - set_page_writeback        ---- set writeback flag in page y
        - f2fs_inplace_write_data
       - f2fs_balance_fs
      					 - lock gc_mutex
       - lock gc_mutex
      					  - f2fs_gc
      					   - do_garbage_collect
      					    - gc_data_segment
      					     - move_data_page
      					      - f2fs_wait_on_page_writeback
      					       - wait_on_page_writeback  --- wait writeback of page y
      
      However, the bio submission breaks the merge of IPU IOs.
      
      So in this patch let's add a global bio cache for merged IPU pages,
      then f2fs_wait_on_page_writeback() is able to submit bio if a
      writebacked page is cached in global bio cache.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  17. 05 Oct, 2019 1 commit
    • f2fs: fix to update time in lazytime mode · fe1897ea
      Committed by Chao Yu
      generic/018 reports an inconsistent status of atime, the
      testcase is as below:
      - open file with O_SYNC
      - write file to construct fraged space
      - calc md5 of file
      - record {a,c,m}time
      - defrag file --- do nothing
      - umount & mount
      - check {a,c,m}time
      
      The root cause is that, as f2fs enables lazytime by default, an atime
      update dirties the VFS inode rather than the f2fs inode (by setting
      FI_DIRTY_INODE), so a later f2fs_write_inode() called from VFS will
      fail to update the inode page due to our skip:

      f2fs_write_inode()
      	if (!is_inode_flag_set(inode, FI_DIRTY_INODE))
      		return 0;

      So eventually, after evict(), we lose the last atime forever.
      
      To fix this issue, we need to check whether {a,c,m,cr}time is
      consistent in between inode cache and inode page, and only skip
      f2fs_update_inode() if f2fs inode is not dirty and time is
      consistent as well.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
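
The skip condition described in the lazytime commit above can be sketched as a pure predicate. The struct and function names are hypothetical, and timestamps are reduced to plain integers. The sketch only encodes the stated rule: skip the inode update only when the f2fs inode is not dirty and the {a,c,m,cr}time values in the inode cache match those in the inode page.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified timestamps; real inodes carry seconds and nanoseconds. */
struct demo_times {
	long atime, ctime, mtime, crtime;
};

/* True when the cached inode times match the on-page copy. */
static bool demo_times_consistent(const struct demo_times *cached,
				  const struct demo_times *on_page)
{
	return cached->atime == on_page->atime &&
	       cached->ctime == on_page->ctime &&
	       cached->mtime == on_page->mtime &&
	       cached->crtime == on_page->crtime;
}

/* Skip the update only if the f2fs inode is clean AND times agree. */
static bool demo_can_skip_update(bool f2fs_inode_dirty,
				 const struct demo_times *cached,
				 const struct demo_times *on_page)
{
	return !f2fs_inode_dirty && demo_times_consistent(cached, on_page);
}
```

In the lazytime case from the message, the f2fs inode is clean but the cached atime is newer than the on-page one, so the predicate returns false and the update is not skipped.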