1. 18 Jan, 2020 3 commits
    • C
      f2fs: fix to add swap extent correctly · 3e5e479a
      Chao Yu committed
      As Youling reported on the mailing list:
      
      https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/
      
      https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/
      
      There is a test case that can corrupt an f2fs image:
      - dd if=/dev/zero of=/swapfile bs=1M count=4096
      - chmod 600 /swapfile
      - mkswap /swapfile
      - swapon --discard /swapfile
      
      The root cause is that f2fs_swap_activate() intends to return zero
      to setup_swap_extents() to enable SWP_FS mode (swap file goes through
      the fs). In this flow, setup_swap_extents() sets up the swap extent with
      the wrong block address range, which results in discard_swap() erasing
      incorrect addresses.
      
      Because f2fs_swap_activate() has pinned the swapfile, its data block
      addresses will not change, so it is safe to let swap handle IO through
      the raw device. We can therefore get rid of SWP_FS mode and initialize the
      swap extents inside f2fs_swap_activate(); this way, a later discard_swap()
      trims the right address range.
      
      Fixes: 4969c06a ("f2fs: support swap file w/ DIO")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: run fsck when getting bad inode during GC · 4eea93e3
      Jaegeuk Kim committed
      This is to avoid infinite GC when trying to disable checkpoint.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: support data compression · 4c8ff709
      Chao Yu committed
      This patch tries to support compression in f2fs.
      
      - A new term, cluster, is defined as the basic unit of compression; a file
      can be divided logically into multiple clusters. One cluster includes 4 << n
      (n >= 0) logical pages, the compression size is also the cluster size, and
      each cluster can be compressed or not.
      
      - In the cluster metadata layout, one special flag indicates whether a
      cluster is a compressed one or a normal one. For a compressed cluster, the
      following metadata maps the cluster to [1, (4 << n) - 1] physical blocks, in
      which f2fs stores data including the compress header and compressed data.
      
      - In order to eliminate write amplification during overwrite, F2FS only
      supports compression on write-once files; data can be compressed only when
      all logical blocks in the file are valid and the cluster compress ratio is
      lower than the specified threshold.
      
      - To enable compression on regular inode, there are three ways:
      * chattr +c file
      * chattr +c dir; touch dir/file
      * mount w/ -o compress_extension=ext; touch file.ext
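
The cluster definition above (4 << n logical pages per cluster) implies simple index arithmetic for finding which cluster a logical page belongs to. The following is a minimal sketch of that arithmetic only; the helper names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Pages per cluster for exponent n, following the "4 << n" rule in the text. */
static inline uint64_t cluster_pages(unsigned int n)
{
	return (uint64_t)4 << n;
}

/* Index of the cluster that contains logical page page_idx (hypothetical helper). */
static inline uint64_t cluster_index(uint64_t page_idx, unsigned int n)
{
	return page_idx / cluster_pages(n);
}

/* Position of the page inside its cluster (hypothetical helper). */
static inline uint64_t page_in_cluster(uint64_t page_idx, unsigned int n)
{
	return page_idx % cluster_pages(n);
}
```

With n = 0 a cluster spans 4 pages, so logical page 9 falls in cluster 2 at offset 1.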
      
      Compress metadata layout:
                                   [Dnode Structure]
                   +-----------------------------------------------+
                   | cluster 1 | cluster 2 | ......... | cluster N |
                   +-----------------------------------------------+
                   .           .                       .           .
             .                       .                .                      .
        .         Compressed Cluster       .        .        Normal Cluster            .
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
      |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
                 .                             .
               .                                           .
             .                                                           .
            +-------------+-------------+----------+----------------------------+
            | data length | data chksum | reserved |      compressed data       |
            +-------------+-------------+----------+----------------------------+
      
      Changelog:
      
      20190326:
      - fix error handling of read_end_io().
      - remove unneeded comments in f2fs_encrypt_one_page().
      
      20190327:
      - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
      - don't jump into loop directly to avoid uninitialized variables.
      - add TODO tag in error path of f2fs_write_cache_pages().
      
      20190328:
      - fix wrong merge condition in f2fs_read_multi_pages().
      - check compressed file in f2fs_post_read_required().
      
      20190401
      - allow overwrite on non-compressed cluster.
      - check cluster meta before writing compressed data.
      
      20190402
      - don't preallocate blocks for compressed file.
      
      - add lz4 compress algorithm
      - process multiple post read works in one workqueue
        Now f2fs supports processing post read work in multiple workqueues,
        but this shows low performance due to the scheduling overhead of
        multiple workqueues executing in order.
      
      20190921
      - compress: support buffered overwrite
      C: compress cluster flag
      V: valid block address
      N: NEW_ADDR
      
      One cluster contains 4 blocks
      
       before overwrite   after overwrite
      
      - VVVV		->	CVNN
      - CVNN		->	VVVV
      
      - CVNN		->	CVNN
      - CVNN		->	CVVV
      
      - CVVV		->	CVNN
      - CVVV		->	CVVV
      
      20191029
      - add kconfig F2FS_FS_COMPRESSION to isolate compression related
      code, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
      note: the lzo backend will be removed if Jaegeuk agrees to that too.
      - update code according to Eric's comments.
      
      20191101
      - apply fixes from Jaegeuk
      
      20191113
      - apply fixes from Jaegeuk
      - split workqueue for fsverity
      
      20191216
      - apply fixes from Jaegeuk
      
      20200117
      - fix to avoid NULL pointer dereference
      
      [Jaegeuk Kim]
      - add tracepoint for f2fs_{,de}compress_pages()
      - fix many bugs and add some compression stats
      - fix overwrite/mmap bugs
      - address 32bit build error, reported by Geert.
      - bug fixes when handling errors and i_compressed_blocks
      
      Reported-by: <noreply@ellerman.id.au>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 16 Jan, 2020 9 commits
    • J
      f2fs: free sysfs kobject · 820d3667
      Jaegeuk Kim committed
      Detected by kmemleak.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: declare nested quota_sem and remove unnecessary sems · 2c4e0c52
      Jaegeuk Kim committed
      1.
      f2fs_quota_sync
       -> down_read(&sbi->quota_sem)
       -> dquot_writeback_dquots
        -> f2fs_dquot_commit
         -> down_read(&sbi->quota_sem)
      
      2.
      f2fs_quota_sync
       -> down_read(&sbi->quota_sem)
        -> f2fs_write_data_pages
         -> f2fs_write_single_data_page
          -> down_write(&F2FS_I(inode)->i_sem)
      
      f2fs_mkdir
       -> f2fs_do_add_link
         -> down_write(&F2FS_I(inode)->i_sem)
         -> f2fs_init_inode_metadata
          -> f2fs_new_node_page
           -> dquot_alloc_inode
            -> f2fs_dquot_mark_dquot_dirty
             -> down_read(&sbi->quota_sem)
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: don't put new_page twice in f2fs_rename · 762e4db5
      Jaegeuk Kim committed
      In f2fs_rename(), new_page is gone after f2fs_set_link(), but the code tries
      to put it again when the whiteout fails and jumps to put_out_dir.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: set I_LINKABLE early to avoid wrong access by vfs · 5b1dbb08
      Jaegeuk Kim committed
      This patch moves setting I_LINKABLE early in rename2(whiteout) to avoid the
      below warning.
      
      [ 3189.163385] WARNING: CPU: 3 PID: 59523 at fs/inode.c:358 inc_nlink+0x32/0x40
      [ 3189.246979] Call Trace:
      [ 3189.248707]  f2fs_init_inode_metadata+0x2d6/0x440 [f2fs]
      [ 3189.251399]  f2fs_add_inline_entry+0x162/0x8c0 [f2fs]
      [ 3189.254010]  f2fs_add_dentry+0x69/0xe0 [f2fs]
      [ 3189.256353]  f2fs_do_add_link+0xc5/0x100 [f2fs]
      [ 3189.258774]  f2fs_rename2+0xabf/0x1010 [f2fs]
      [ 3189.261079]  vfs_rename+0x3f8/0xaa0
      [ 3189.263056]  ? tomoyo_path_rename+0x44/0x60
      [ 3189.265283]  ? do_renameat2+0x49b/0x550
      [ 3189.267324]  do_renameat2+0x49b/0x550
      [ 3189.269316]  __x64_sys_renameat2+0x20/0x30
      [ 3189.271441]  do_syscall_64+0x5a/0x230
      [ 3189.273410]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 3189.275848] RIP: 0033:0x7f270b4d9a49
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • E
      f2fs: don't keep META_MAPPING pages used for moving verity file blocks · 542989b6
      Eric Biggers committed
      META_MAPPING is used to move blocks for both encrypted and verity files.
      So the META_MAPPING invalidation condition in do_checkpoint() should
      consider verity too, not just encrypt.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: introduce private bioset · f543805f
      Chao Yu committed
      In low memory scenario, we can allocate multiple bios without
      submitting any of them.
      
      - f2fs_write_checkpoint()
       - block_operations()
        - f2fs_sync_node_pages()
         step 1) flush cold nodes, allocate new bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 2) flush hot nodes, allocate a bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 3) flush warm nodes, be stuck in below call path
         - bio_alloc()
          - mempool_alloc()
           - loop to wait mempool element release, as we only
             reserved memory for two bio allocation, however above
             allocated two bios may never be submitted.
      
      So we need to avoid using the default bioset. In this patch we introduce a
      private bioset, in which we enlarge the mempool element count to the total
      number of log headers, so that we have a large enough backup memory pool
      when allocating/holding multiple bios.
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • S
      f2fs: cleanup duplicate stats for atomic files · 0e6d0164
      Sahitya Tummala committed
      Remove the duplicate sbi->aw_cnt stats counter that tracks
      the number of atomic files currently opened (it also sometimes shows
      an incorrect value). Use the more reliable sbi->atomic_files
      to show in the stats.
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • S
      f2fs: Check write pointer consistency of non-open zones · d508c94e
      Shin'ichiro Kawasaki committed
      To catch f2fs bugs in write pointer handling code for zoned block
      devices, check write pointers of non-open zones that current segments do
      not point to. Do this check at mount time, after the fsync data recovery
      and current segments' write pointer consistency fix. Or when fsync data
      recovery is disabled by mount option, do the check when there is no fsync
      data.
      
      Check two items, comparing write pointers with the valid block maps in SIT.
      The first item is a check for zones with no valid blocks. When there are no
      valid blocks in a zone, the write pointer should be at the start of the
      zone. If not, the next write operation to the zone will cause an unaligned
      write error. If the write pointer is not at the zone start, reset the write
      pointer to place it at the zone start.
      
      The second item is a check between the write pointer position and the last
      valid block in the zone. It is unexpected that the last valid block
      position is beyond the write pointer. In such a case, report it as a bug.
      A fix is not required for such a zone, because the zone is not selected for
      the next write operation until the zone gets discarded.
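
The two checks described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the struct, the enum, and the function name are hypothetical, and treating a valid block located at the write pointer itself as "beyond" it is an assumption made here, not something the commit text states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Possible outcomes of the check (hypothetical names). */
enum wp_action { WP_OK, WP_RESET, WP_REPORT_BUG };

/* Simplified view of one non-open zone (hypothetical layout). */
struct zone_state {
	uint64_t start;      /* first block address of the zone */
	uint64_t wp;         /* write pointer, as a block address */
	bool has_valid;      /* does the SIT map show any valid block here? */
	uint64_t last_valid; /* address of the last valid block, if has_valid */
};

static enum wp_action check_non_open_zone(const struct zone_state *z)
{
	/* Item 1: no valid blocks, so the write pointer must sit at zone start. */
	if (!z->has_valid)
		return z->wp == z->start ? WP_OK : WP_RESET;

	/* Item 2: a valid block at or past the write pointer is unexpected. */
	if (z->last_valid >= z->wp)
		return WP_REPORT_BUG;

	return WP_OK;
}
```

A zone with no valid blocks and a moved write pointer yields WP_RESET, while a zone whose last valid block lies past the write pointer is only reported, matching the note that no fix is needed in that case.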
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • S
      f2fs: Check write pointer consistency of open zones · c426d991
      Shin'ichiro Kawasaki committed
      On sudden f2fs shutdown, write pointers of zoned block devices can go
      further but f2fs meta data keeps current segments at positions before the
      write operations. After remounting the f2fs, this inconsistency causes
      write operations not at write pointers and "Unaligned write command"
      error is reported.
      
      To avoid the error, compare current segments with write pointers of open
      zones the current segments point to, during mount operation. If the write
      pointer position is not aligned with the current segment position, assign
      a new zone to the current segment. Also check the newly assigned zone has
      write pointer at zone start. If not, reset write pointer of the zone.
      
      Perform the consistency check during fsync recovery. So as not to lose the
      fsync data, do the check after fsync data gets restored and before the
      checkpoint commit which flushes data at current segment positions. So as not
      to conflict with the kworker's dirty data/node flush, do the fix within
      SBI_POR_DOING protection.
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 13 Dec, 2019 3 commits
  4. 11 Dec, 2019 1 commit
  5. 10 Dec, 2019 1 commit
  6. 26 Nov, 2019 3 commits
    • J
      f2fs: stop GC when the victim becomes fully valid · 803e74be
      Jaegeuk Kim committed
      We must stop GC once the segment becomes fully valid. Otherwise, it can
      produce more dirty segments by partially moving valid blocks in the segment.
      
      Ramon sometimes hit a no-free-segment panic and saw this case happen when
      validating the reliable file pinning feature.
      Signed-off-by: Ramon Pantin <pantin@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: expose main_blkaddr in sysfs · a4db59ac
      Jaegeuk Kim committed
      Expose in /sys/fs/f2fs/<blockdev>/main_blkaddr the block address where the
      main area starts. This allows user mode programs to determine:
      
      - That pinned files that are made exclusively of fully allocated 2MB
        segments will never be unpinned by the file system.
      
      - Where the main area starts. This is required by programs that want to
        verify if a file is made exclusively of 2MB f2fs segments, the alignment
        boundary for segments starts at this address. Testing for 2MB alignment
        relative to the start of the device is incorrect, because for some
        filesystems main_blkaddr is not at a 2MB boundary relative to the start
        of the device.
      
      The entry will be used when validating reliable pinning file feature proposed
      by "f2fs: support aligned pinned file".
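
The alignment rule above (segment boundaries are counted from main_blkaddr, not from the start of the device) can be sketched as follows. This is a minimal illustration assuming 4KB blocks, so that one 2MB segment is 512 blocks; the constant and function names are hypothetical, and reading main_blkaddr from sysfs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4KB blocks, so a 2MB segment holds 512 blocks. */
#define SKETCH_BLOCKS_PER_SEG 512u

/*
 * Return true if blkaddr sits on a segment boundary, measured from
 * main_blkaddr rather than from block 0 of the device.
 */
static bool is_segment_aligned(uint64_t blkaddr, uint64_t main_blkaddr)
{
	if (blkaddr < main_blkaddr)
		return false; /* address lies before the main area */
	return (blkaddr - main_blkaddr) % SKETCH_BLOCKS_PER_SEG == 0;
}
```

With a main area starting at block 4700, block 5212 is segment-aligned even though it is not a multiple of 512, while block 5120 is not.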
      Signed-off-by: Ramon Pantin <pantin@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project() · 909110c0
      Chengguang Xu committed
      Setting softlimit larger than hardlimit seems meaningless
      for disk quota but currently it is allowed. In this case,
      there may be a bit of confusion for users when they run the
      df command on a directory which has project quota.
      
      For example, we set 20M softlimit and 10M hardlimit of
      block usage limit for project quota of test_dir(project id 123).
      
      [root@hades f2fs]# repquota -P -a
      *** Report for project quotas on device /dev/nvme0n1p8
      Block grace time: 7days; Inode grace time: 7days
                              Block limits                File limits
      Project         used    soft    hard  grace    used  soft  hard  grace
      ----------------------------------------------------------------------
      0        --       4       0       0              1     0     0
      123      +-   10248   20480   10240              2     0     0
      
      The result of df command as below:
      
      [root@hades f2fs]# df -h /mnt/f2fs/test
      Filesystem Size Used Avail Use% Mounted on
      /dev/nvme0n1p8 20M 11M 10M 51% /mnt/f2fs
      
      Even though it looks like there is another 10M of free space to use,
      if we write new data to directory test (which inherits the project id),
      the write will fail with errno(-EDQUOT).
      
      After this patch, the df result looks like below.
      
      [root@hades f2fs]# df -h /mnt/f2fs/test
      Filesystem Size Used Avail Use% Mounted on
      /dev/nvme0n1p8 10M 10M 0 100% /mnt/f2fs
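
The rule in the commit title can be sketched as a small limit-selection helper. This is an illustration only, not the code from the patch: the function name is hypothetical, and treating a limit of 0 as "not set" is an assumption based on the usual quota convention.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the limit that df should report (hypothetical helper).
 * Start from the soft limit, then prefer the hard limit when it is set
 * and either no soft limit exists or the hard limit is the smaller one.
 * Assumption: 0 means the limit is not set.
 */
static uint64_t effective_limit(uint64_t soft, uint64_t hard)
{
	uint64_t limit = soft;

	if (hard && (!limit || hard < limit))
		limit = hard;
	return limit;
}
```

For the example above (soft 20480, hard 10240) this selects 10240, which matches the 10M size shown by df after the patch.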
      Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  7. 20 Nov, 2019 2 commits
  8. 14 Nov, 2019 1 commit
  9. 13 Nov, 2019 1 commit
  10. 08 Nov, 2019 4 commits
    • C
      f2fs: fix potential overflow · 1f0d5c91
      Chao Yu committed
      We expect a 64-bit calculation result from the statement below. However,
      on a 32-bit machine, a looped left shift operation on a pgoff_t type
      variable may overflow; fix it by forcing a type cast.
      
      page->index << PAGE_SHIFT;
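
The overflow can be reproduced outside the kernel with a 32-bit stand-in for pgoff_t. This is a minimal sketch, assuming a page shift of 12 (4KB pages); the typedef, macro, and function names are hypothetical and only mimic the statement above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4KB pages */

typedef uint32_t pgoff32_t;  /* stand-in for pgoff_t on a 32-bit machine */

/* Shift performed in 32 bits: the high bits are lost before widening. */
static uint64_t offset_truncated(pgoff32_t index)
{
	uint32_t shifted = (uint32_t)(index << SKETCH_PAGE_SHIFT);

	return shifted;
}

/* Cast first, then shift: the full 64-bit byte offset is kept. */
static uint64_t offset_widened(pgoff32_t index)
{
	return (uint64_t)index << SKETCH_PAGE_SHIFT;
}
```

For page index 0x100000 (a 4GB offset) the truncated version wraps to 0, while the widened version yields 0x100000000.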
      
      Fixes: 26de9b11 ("f2fs: avoid unnecessary updating inode during fsync")
      Fixes: 0a2aa8fb ("f2fs: refactor __exchange_data_block for speed up")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to update dir's i_pino during cross_rename · 2a60637f
      Chao Yu committed
      As Eric reported:
      
      RENAME_EXCHANGE support was just added to fsstress in xfstests:
      
      	commit 65dfd40a97b6bbbd2a22538977bab355c5bc0f06
      	Author: kaixuxia <xiakaixu1987@gmail.com>
      	Date:   Thu Oct 31 14:41:48 2019 +0800
      
      	    fsstress: add EXCHANGE renameat2 support
      
      This is causing xfstest generic/579 to fail due to fsck.f2fs reporting errors.
      I'm not sure what the problem is, but it still happens even with all the
      fs-verity stuff in the test commented out, so that the test just runs fsstress.
      
      generic/579 23s ... 	[10:02:25]
      [    7.745370] run fstests generic/579 at 2019-11-04 10:02:25
      _check_generic_filesystem: filesystem on /dev/vdc is inconsistent
      (see /results/f2fs/results-default/generic/579.full for details)
       [10:02:47]
      Ran: generic/579
      Failures: generic/579
      Failed 1 of 1 tests
      Xunit report: /results/f2fs/results-default/result.xml
      
      Here's the contents of 579.full:
      
      _check_generic_filesystem: filesystem on /dev/vdc is inconsistent
      *** fsck.f2fs output ***
      [ASSERT] (__chk_dots_dentries:1378)  --> Bad inode number[0x24] for '..', parent parent ino is [0xd10]
      
      The root cause is that we forgot to update directory's i_pino during
      cross_rename, fix it.
      
      Fixes: 32f9bc25 ("f2fs: support ->rename2()")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Tested-by: Eric Biggers <ebiggers@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: support aligned pinned file · f5a53edc
      Jaegeuk Kim committed
      This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
      by allocating fully valid 2MB segment.
      
      Check free segments by has_not_enough_free_secs() with large budget.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: avoid kernel panic on corruption test · bc005a4d
      Jaegeuk Kim committed
      xfstests/generic/475 complains about a kernel warn/panic while testing a corrupted disk.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  11. 07 Nov, 2019 2 commits
  12. 04 Nov, 2019 1 commit
  13. 26 Oct, 2019 1 commit
    • C
      f2fs: cache global IPU bio · 0b20fcec
      Chao Yu committed
      In commit 8648de2c ("f2fs: add bio cache for IPU"), we added
      f2fs_submit_ipu_bio() in __write_data_page() as below:
      
      __write_data_page()
      
      	if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
      		f2fs_submit_ipu_bio(sbi, bio, page);
      		....
      	}
      
      in order to avoid below deadlock:
      
      Thread A				Thread B
      - __write_data_page (inode x, page y)
       - f2fs_do_write_data_page
        - set_page_writeback        ---- set writeback flag in page y
        - f2fs_inplace_write_data
       - f2fs_balance_fs
      					 - lock gc_mutex
       - lock gc_mutex
      					  - f2fs_gc
      					   - do_garbage_collect
      					    - gc_data_segment
      					     - move_data_page
      					      - f2fs_wait_on_page_writeback
      					       - wait_on_page_writeback  --- wait writeback of page y
      
      However, the bio submission breaks the merge of IPU IOs.
      
      So in this patch let's add a global bio cache for merged IPU pages,
      then f2fs_wait_on_page_writeback() is able to submit bio if a
      writebacked page is cached in global bio cache.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  14. 23 Oct, 2019 5 commits
  15. 05 Oct, 2019 1 commit
    • C
      f2fs: fix to update time in lazytime mode · fe1897ea
      Chao Yu committed
      generic/018 reports an inconsistent status of atime, the
      testcase is as below:
      - open file with O_SYNC
      - write file to construct fraged space
      - calc md5 of file
      - record {a,c,m}time
      - defrag file --- do nothing
      - umount & mount
      - check {a,c,m}time
      
      The root cause is, as f2fs enables lazytime by default, atime
      update will dirty vfs inode, rather than dirtying f2fs inode (by set
      with FI_DIRTY_INODE), so later f2fs_write_inode() called from VFS will
      fail to update inode page due to our skip:
      
      f2fs_write_inode()
      	if (!is_inode_flag_set(inode, FI_DIRTY_INODE))
      		return 0;
      
      So eventually, after evict(), we lose the last atime forever.
      
      To fix this issue, we need to check whether {a,c,m,cr}time is
      consistent in between inode cache and inode page, and only skip
      f2fs_update_inode() if f2fs inode is not dirty and time is
      consistent as well.
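
The skip condition described above can be sketched as a pure function. This is a simplified illustration rather than the patch itself: the struct and both function names are hypothetical, and timestamps are reduced to plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified timestamp set, one copy per location (hypothetical layout). */
struct sketch_times {
	int64_t atime, ctime, mtime, crtime;
};

/* True when the inode cache and the inode page hold the same times. */
static bool times_consistent(const struct sketch_times *cache,
			     const struct sketch_times *page)
{
	return cache->atime == page->atime && cache->ctime == page->ctime &&
	       cache->mtime == page->mtime && cache->crtime == page->crtime;
}

/* The update may be skipped only if the f2fs inode is clean AND times match. */
static bool can_skip_inode_update(bool f2fs_inode_dirty,
				  const struct sketch_times *cache,
				  const struct sketch_times *page)
{
	return !f2fs_inode_dirty && times_consistent(cache, page);
}
```

In the generic/018 scenario the f2fs inode is clean but the cached atime is newer than the one in the inode page, so the function returns false and the update is not skipped.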
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  16. 18 Sep, 2019 1 commit
  17. 16 Sep, 2019 1 commit