1. 19 Jun 2020, 1 commit
  2. 30 May 2020, 1 commit
    • C
      f2fs: fix wrong discard space · ca7f76e6
      Chao Yu authored
      Under heavy fsstress, we may trigger a panic while issuing discard,
      because __check_sit_bitmap() detects that a discard command may erase
      valid data blocks. The root cause is the race described in the stack
      below: since we removed the lock when flushing quota data, quota data
      writeback may race with write_checkpoint(), causing an inconsistency
      between the cached discard entry and the segment bitmap.
      
      - f2fs_write_checkpoint
       - block_operations
        - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
       - f2fs_flush_sit_entries
        - add_discard_addrs
         - __set_bit_le(i, (void *)de->discard_map);
      						- f2fs_write_data_pages
      						 - f2fs_write_single_data_page
      						   : inode is quota one, cp_rwsem won't be locked
      						  - f2fs_do_write_data_page
      						   - f2fs_allocate_data_block
      						    - f2fs_wait_discard_bio
      						      : discard entry has not been added yet.
      						    - update_sit_entry
       - f2fs_clear_prefree_segments
        - f2fs_issue_discard
        : add discard entry
      
      In order to fix this, this patch uses node_write to serialize
      f2fs_allocate_data_block() and checkpoint.
      
      Fixes: 435cbab9 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      ca7f76e6
  3. 29 May 2020, 1 commit
  4. 18 Apr 2020, 3 commits
  5. 04 Apr 2020, 1 commit
  6. 31 Mar 2020, 1 commit
    • C
      f2fs: don't trigger data flush in foreground operation · 7bcd0cfa
      Chao Yu authored
      Data flush can generate heavy IO and cause long latency during
      flush, so it is not appropriate to trigger it in a foreground
      operation.
      
      In addition, we may face the below potential deadlock during data flush:
      - f2fs_write_multi_pages
       - f2fs_write_raw_pages
        - f2fs_write_single_data_page
         - f2fs_balance_fs
          - f2fs_balance_fs_bg
           - f2fs_sync_dirty_inodes
            - filemap_fdatawrite   -- stuck on flush same cluster
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      7bcd0cfa
  7. 20 Mar 2020, 4 commits
  8. 18 Jan 2020, 3 commits
    • C
      f2fs: change to use rwsem for gc_mutex · fb24fea7
      Chao Yu authored
      A mutex lock won't serialize callers fairly; in order to avoid starving an
      unlucky caller, let's use an rwsem lock instead.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      fb24fea7
    • J
      f2fs: add a way to turn off ipu bio cache · 0e7f4197
      Jaegeuk Kim authored
      Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off
      the bio cache, which is useful to check whether the block layer, when
      using a hardware encryption engine, merges IOs correctly.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0e7f4197
    • C
      f2fs: support data compression · 4c8ff709
      Chao Yu authored
      This patch adds support for compression in f2fs.
      
      - A new term, cluster, is defined as the basic unit of compression; a file
      can be logically divided into multiple clusters. One cluster includes
      4 << n (n >= 0) logical pages, the compression unit equals the cluster
      size, and each cluster can be compressed or left uncompressed.
      
      - In the cluster metadata layout, one special flag indicates whether a
      cluster is a compressed one or a normal one; for a compressed cluster, the
      following metadata maps the cluster to [1, 4 << n - 1] physical blocks, in
      which f2fs stores the compress header and the compressed data.
      
      - In order to eliminate write amplification during overwrite, f2fs only
      supports compression on write-once files; data can be compressed only when
      all logical blocks in the file are valid and the cluster's compression
      ratio is lower than a specified threshold.
      
      - To enable compression on regular inode, there are three ways:
      * chattr +c file
      * chattr +c dir; touch dir/file
      * mount w/ -o compress_extension=ext; touch file.ext
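The cluster-sizing rule above can be sketched as follows; the block size, header size, and helper names here are illustrative assumptions, not f2fs's actual code. A cluster of (4 << n) pages is worth compressing only when the compressed payload plus its header fits in at most (4 << n) - 1 physical blocks:

```c
#include <stddef.h>

#define BLKSZ        4096u
#define COMPRESS_HDR 16u   /* assumed: data length + checksum + reserved */

/* number of logical pages (and physical blocks) in a cluster, n >= 0 */
static unsigned int cluster_blocks(unsigned int n)
{
    return 4u << n;
}

/* physical blocks needed to store the compressed form, header included */
static unsigned int compressed_blocks(size_t clen)
{
    return (unsigned int)((COMPRESS_HDR + clen + BLKSZ - 1) / BLKSZ);
}

/* compress only when the result fits in [1, (4 << n) - 1] blocks */
static int worth_compressing(unsigned int n, size_t clen)
{
    unsigned int used = compressed_blocks(clen);

    return used >= 1 && used <= cluster_blocks(n) - 1;
}
```

For a default cluster (n = 0, 4 pages), compressed data of up to 3 blocks saves space, while anything larger is left uncompressed.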
      
      Compress metadata layout:
                                   [Dnode Structure]
                   +-----------------------------------------------+
                   | cluster 1 | cluster 2 | ......... | cluster N |
                   +-----------------------------------------------+
                   .           .                       .           .
             .                       .                .                      .
        .         Compressed Cluster       .        .        Normal Cluster            .
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
      |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
                 .                             .
               .                                           .
             .                                                           .
            +-------------+-------------+----------+----------------------------+
            | data length | data chksum | reserved |      compressed data       |
            +-------------+-------------+----------+----------------------------+
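The first physical block of a compressed cluster carries the header drawn above. A struct sketch is below; the field widths are an assumption for illustration (f2fs's real on-disk structure may use different widths and reserved space):

```c
#include <stdint.h>

/* hypothetical layout of the compressed-cluster header shown above */
struct compress_hdr {
    uint32_t clen;        /* data length: bytes of compressed data */
    uint32_t chksum;      /* data chksum over the compressed payload */
    uint32_t reserved[2]; /* reserved for future use */
    /* compressed data follows immediately after the header */
};
```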
      
      Changelog:
      
      20190326:
      - fix error handling of read_end_io().
      - remove unneeded comments in f2fs_encrypt_one_page().
      
      20190327:
      - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
      - don't jump into loop directly to avoid uninitialized variables.
      - add TODO tag in error path of f2fs_write_cache_pages().
      
      20190328:
      - fix wrong merge condition in f2fs_read_multi_pages().
      - check compressed file in f2fs_post_read_required().
      
      20190401
      - allow overwrite on non-compressed cluster.
      - check cluster meta before writing compressed data.
      
      20190402
      - don't preallocate blocks for compressed file.
      
      - add lz4 compress algorithm
      - process multiple post read works in one workqueue
        Previously f2fs processed post read work in multiple workqueues,
        which showed low performance due to the scheduling overhead of
        multiple workqueues executing in order.
      
      20190921
      - compress: support buffered overwrite
      C: compress cluster flag
      V: valid block address
      N: NEW_ADDR
      
      One cluster contains 4 blocks
      
       before overwrite   after overwrite
      
      - VVVV		->	CVNN
      - CVNN		->	VVVV
      
      - CVNN		->	CVNN
      - CVNN		->	CVVV
      
      - CVVV		->	CVNN
      - CVVV		->	CVVV
      
      20191029
      - add kconfig F2FS_FS_COMPRESSION to isolate compression related
      code, and kconfig F2FS_FS_{LZO,LZ4} to cover the backend algorithms.
      Note: the lzo backend will be removed if Jaegeuk agrees.
      - update code according to Eric's comments.
      
      20191101
      - apply fixes from Jaegeuk
      
      20191113
      - apply fixes from Jaegeuk
      - split workqueue for fsverity
      
      20191216
      - apply fixes from Jaegeuk
      
      20200117
      - fix to avoid NULL pointer dereference
      
      [Jaegeuk Kim]
      - add tracepoint for f2fs_{,de}compress_pages()
      - fix many bugs and add some compression stats
      - fix overwrite/mmap bugs
      - address 32bit build error, reported by Geert.
      - bug fixes when handling errors and i_compressed_blocks
      
      Reported-by: <noreply@ellerman.id.au>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      4c8ff709
  9. 16 Jan 2020, 3 commits
    • S
      f2fs: cleanup duplicate stats for atomic files · 0e6d0164
      Sahitya Tummala authored
      Remove the duplicate sbi->aw_cnt stats counter that tracks
      the number of atomic files currently opened (it also shows an
      incorrect value sometimes). Use the more reliable sbi->atomic_files
      to show in the stats.
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0e6d0164
    • S
      f2fs: Check write pointer consistency of non-open zones · d508c94e
      Shin'ichiro Kawasaki authored
      To catch f2fs bugs in write pointer handling code for zoned block
      devices, check write pointers of non-open zones that current segments do
      not point to. Do this check at mount time, after the fsync data recovery
      and current segments' write pointer consistency fix. Or when fsync data
      recovery is disabled by mount option, do the check when there is no fsync
      data.
      
      Check two items, comparing write pointers with the valid block maps in SIT.
      The first item is a check for zones with no valid blocks. When there are no
      valid blocks in a zone, the write pointer should be at the start of the
      zone. If not, the next write operation to the zone will cause an unaligned
      write error. If the write pointer is not at the zone start, reset the
      write pointer to the zone start.
      
      The second item is a check between the write pointer position and the last
      valid block in the zone. It is unexpected that the last valid block
      position is beyond the write pointer. In such a case, report it as a bug.
      No fix is required for such a zone, because the zone is not selected for
      the next write operation until it gets discarded.
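The two checks can be sketched as a single classifier; the names and the representation (wp and last_valid as block offsets inside the zone, last_valid = -1 when the SIT bitmap shows no valid blocks) are illustrative, not the kernel's:

```c
/* verdicts for a non-open zone at mount time (illustrative) */
enum zone_verdict { ZONE_OK, ZONE_RESET_WP, ZONE_BUG };

static enum zone_verdict check_zone(long wp, long last_valid)
{
    if (last_valid < 0)                       /* zone holds no valid blocks */
        return wp == 0 ? ZONE_OK : ZONE_RESET_WP;  /* wp must sit at zone start */
    if (last_valid >= wp)                     /* valid data beyond the write pointer */
        return ZONE_BUG;                      /* report; no fix until the zone is discarded */
    return ZONE_OK;                           /* wp strictly past the last valid block */
}
```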
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      d508c94e
    • S
      f2fs: Check write pointer consistency of open zones · c426d991
      Shin'ichiro Kawasaki authored
      On sudden f2fs shutdown, write pointers of zoned block devices can go
      further, but f2fs metadata keeps current segments at positions before the
      write operations. After remounting f2fs, this inconsistency causes
      write operations that are not at the write pointers, and an "Unaligned
      write command" error is reported.
      
      To avoid the error, compare current segments with write pointers of open
      zones the current segments point to, during mount operation. If the write
      pointer position is not aligned with the current segment position, assign
      a new zone to the current segment. Also check the newly assigned zone has
      write pointer at zone start. If not, reset write pointer of the zone.
      
      Perform the consistency check during fsync recovery. In order not to lose
      the fsync data, do the check after the fsync data gets restored and before
      the checkpoint commit, which flushes data at the current segment
      positions. To avoid conflicts with kworker's dirty data/node flush, do
      the fix within SBI_POR_DOING protection.
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      c426d991
  10. 20 Nov 2019, 2 commits
  11. 08 Nov 2019, 1 commit
  12. 07 Nov 2019, 1 commit
  13. 26 Oct 2019, 1 commit
    • C
      f2fs: cache global IPU bio · 0b20fcec
      Chao Yu authored
      In commit 8648de2c ("f2fs: add bio cache for IPU"), we added
      f2fs_submit_ipu_bio() in __write_data_page() as below:
      
      __write_data_page()
      
      	if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
      		f2fs_submit_ipu_bio(sbi, bio, page);
      		....
      	}
      
      in order to avoid below deadlock:
      
      Thread A				Thread B
      - __write_data_page (inode x, page y)
       - f2fs_do_write_data_page
        - set_page_writeback        ---- set writeback flag in page y
        - f2fs_inplace_write_data
       - f2fs_balance_fs
      					 - lock gc_mutex
       - lock gc_mutex
      					  - f2fs_gc
      					   - do_garbage_collect
      					    - gc_data_segment
      					     - move_data_page
      					      - f2fs_wait_on_page_writeback
      					       - wait_on_page_writeback  --- wait writeback of page y
      
      However, the early bio submission breaks the merging of IPU IOs.
      
      So this patch adds a global bio cache for merged IPU pages; then
      f2fs_wait_on_page_writeback() is able to submit the bio if a
      page under writeback is cached in the global bio cache.
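A minimal user-space sketch of the idea, with made-up names and a flat array in place of a real bio: IPU pages are merged into one cached "bio", and before anyone waits on a page under writeback the cache is flushed if it holds that page, so the waiter can never block on a bio that was never submitted:

```c
#include <stddef.h>

#define CACHE_SLOTS 16

static int cached_pages[CACHE_SLOTS];  /* page ids merged into the cached bio */
static size_t ncached;
static int submitted;                  /* how many times the bio was sent down */

/* merge an IPU page into the cached bio instead of submitting at once */
static void ipu_cache_add(int page)
{
    cached_pages[ncached++] = page;
}

static void ipu_cache_submit(void)
{
    if (ncached) {
        submitted++;
        ncached = 0;
    }
}

/* stand-in for f2fs_wait_on_page_writeback() */
static void wait_on_page_writeback(int page)
{
    for (size_t i = 0; i < ncached; i++)
        if (cached_pages[i] == page) {  /* page is stuck in the cached bio */
            ipu_cache_submit();         /* submit it so writeback can finish */
            break;
        }
    /* ... then wait for the writeback flag to clear ... */
}
```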
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0b20fcec
  14. 16 Sep 2019, 2 commits
    • C
      f2fs: fix to add missing F2FS_IO_ALIGNED() condition · 8223ecc4
      Chao Yu authored
      In f2fs_allocate_data_block(), reset fio.retry for the IO
      alignment feature instead of the IO serialization feature.
      
      In addition, spread F2FS_IO_ALIGNED() to check the IO alignment
      feature's status explicitly.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      8223ecc4
    • J
      f2fs: avoid infinite GC loop due to stale atomic files · 743b620c
      Jaegeuk Kim authored
      If committing atomic pages fails in f2fs_do_sync_file(), we can
      end up with committed pages while atomic_file is still set, like:
      
      - inmem:    0, atomic IO:    4 (Max.   10), volatile IO:    0 (Max.    0)
      
      If GC selects this block, we can get an infinite loop like this:
      
      f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
      f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
      f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
      f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
      f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
      f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
      f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
      f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
      
      At that moment, we can observe:
      
      [Before]
      Try to move 5084219 blocks (BG: 384508)
        - data blocks : 4962373 (274483)
        - node blocks : 121846 (110025)
      Skipped : atomic write 4534686 (10)
      
      [After]
      Try to move 5088973 blocks (BG: 384508)
        - data blocks : 4967127 (274483)
        - node blocks : 121846 (110025)
      Skipped : atomic write 4539440 (10)
      
      So, refactor atomic_write flow like this:
      1. start_atomic_write
       - add inmem_list and set atomic_file
      
      2. write()
       - register it in inmem_pages
      
      3. commit_atomic_write
       - if no error, f2fs_drop_inmem_pages()
       - if f2fs_commit_inmem_pages() failed
         : __revoke_inmem_pages() was done
       - if f2fs_do_sync_file failed
         : abort_atomic_write later
      
      4. abort_atomic_write
       - f2fs_drop_inmem_pages
      
      5. f2fs_drop_inmem_pages
       - clear atomic_file
       - remove inmem_list
      
      Based on this change, when GC fails to move a block in an atomic_file,
      f2fs_drop_inmem_pages_all() can call f2fs_drop_inmem_pages().
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      743b620c
  15. 07 Sep 2019, 1 commit
  16. 23 Aug 2019, 6 commits
    • S
      f2fs: Fix indefinite loop in f2fs_gc() · bbf9f7d9
      Sahitya Tummala authored
      Policy - Foreground GC, LFS and greedy GC mode.
      
      Under this policy, f2fs_gc() loops forever as it doesn't have
      enough free segments to proceed, and thus it keeps calling gc_more
      for the same victim segment.  This can happen if the selected victim
      segment could not be GC'd due to a failed blkaddr validity check, i.e.
      is_alive() returns false for the blocks set in the current validity map.
      
      Fix this by keeping track of such invalid segments and skipping them
      during victim selection in get_victim_by_default(), to avoid an endless
      GC loop under such error scenarios. For now, add this logic under
      CONFIG_F2FS_CHECK_FS to be able to root-cause the issue in the debug
      version.
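The skip logic can be sketched like this; the names, bitmap representation, and cost model are illustrative stand-ins, not the kernel's victim-selection code. A bitmap remembers segments whose is_alive() check failed, and the selector passes over them so foreground GC cannot spin on the same unGCable segment:

```c
#include <limits.h>

#define NSEGS 64

static unsigned long long invalid_map;  /* 1 bit per segment */

/* remember a segment whose is_alive() check failed */
static void mark_invalid(unsigned int segno)
{
    invalid_map |= 1ULL << segno;
}

/* stand-in for get_victim_by_default(): cheapest segment not marked invalid */
static int pick_victim(const unsigned int cost[NSEGS])
{
    int victim = -1;
    unsigned int best = UINT_MAX;

    for (unsigned int s = 0; s < NSEGS; s++) {
        if (invalid_map & (1ULL << s))  /* previously failed validity check */
            continue;                   /* skip: would loop forever on it */
        if (cost[s] < best) {
            best = cost[s];
            victim = (int)s;
        }
    }
    return victim;  /* -1 if every segment is marked invalid */
}
```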
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix wrong bitmap size]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      bbf9f7d9
    • C
      f2fs: allocate memory in batch in build_sit_info() · 2fde3dd1
      Chao Yu authored
      build_sit_info() allocates all the bitmaps for each segment one by one,
      which is quite inefficient. This patch changes it to allocate one large
      contiguous region at a time, then divide it up and assign a piece to
      each segment's bitmaps. For large images, this can be expected to
      improve mount speed.
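A user-space sketch of the batching, with illustrative names and sizes (the real build_sit_info() manages more bitmaps and uses kernel allocators): one contiguous allocation backs all per-segment bitmaps, and each segment entry is pointed at its slice:

```c
#include <stdlib.h>

/* hypothetical per-segment entry with three bitmaps */
struct seg_entry {
    unsigned char *cur_valid_map;
    unsigned char *ckpt_valid_map;
    unsigned char *discard_map;
};

/* one calloc() for everything; returns the backing buffer (caller frees) */
static unsigned char *alloc_seg_bitmaps(struct seg_entry *se, size_t nsegs,
                                        size_t bitmap_sz)
{
    unsigned char *buf = calloc(nsegs, 3 * bitmap_sz);

    if (!buf)
        return NULL;
    for (size_t i = 0; i < nsegs; i++) {
        unsigned char *p = buf + i * 3 * bitmap_sz;

        se[i].cur_valid_map  = p;                  /* carve the region up ... */
        se[i].ckpt_valid_map = p + bitmap_sz;      /* ... one slice ... */
        se[i].discard_map    = p + 2 * bitmap_sz;  /* ... per bitmap */
    }
    return buf;
}
```

One allocation instead of 3 * nsegs allocations is where the mount-time saving on large images comes from.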
      Signed-off-by: Chen Gong <gongchen4@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      2fde3dd1
    • C
      f2fs: fix to avoid data corruption by forbidding SSR overwrite · 899fee36
      Chao Yu authored
      There is one case that can cause data corruption:
      
      - write 4k to fileA
      - fsync fileA, the 4k data is written back to lbaA
      - write 4k to fileA
      - kworker flushes the 4k to lbaB; the dnode containing lbaB has not
      been persisted yet
      - write 4k to fileB
      - kworker flushes the 4k to lbaA due to SSR
      - SPOR -> the dnode with lbaA will be recovered, however lbaA contains
      fileB's data
      
      One solution is to track every fsynced file's block history and
      disallow SSR overwrite of newly invalidated blocks of that file.
      
      However, during recovery, no matter whether the dnode was flushed or
      fsynced, all previous dnodes up to the last fsynced one in the node
      chain can be recovered. That means we would need to record every block
      change in flushed dnodes, which would be costly, so let's just use the
      simple fix of forbidding SSR overwrite directly.
      
      Fixes: 5b6c6be2 ("f2fs: use SSR for warm node as well")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      899fee36
    • C
      Revert "f2fs: avoid out-of-range memory access" · a37d0862
      Chao Yu authored
      As Pavel Machek reported:
      
      "We normally use -EUCLEAN to signal filesystem corruption. Plus, it is
      good idea to report it to the syslog and mark filesystem as "needing
      fsck" if filesystem can do that."
      
      Following that, we would need to improve the original patch to:
      - use the unlikely keyword
      - add a message print
      - return EUCLEAN
      
      However, after rethinking this patch, I don't think we should add such
      a condition check here, for the below reasons:
      - We have already checked the field in f2fs_sanity_check_ckpt().
      - If there is fs corruption or a security vulnerability, there is
      nothing to guarantee the field stays intact after the check, unless we
      redo the check before each of its uses; however, no filesystem does
      that.
      - We only have a similar check for bitmaps, which was added because
      bitmap corruption happened at runtime on f2fs in production.
      - There are many key fields in SB/CP/NAT that have no such check
      after f2fs_sanity_check_{sb,cp,..}.
      
      So I propose to revert this unneeded check.
      
      This reverts commit 56f3ce67.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      a37d0862
    • L
      f2fs: cleanup the code in build_sit_entries. · 290c30d4
      Lihong Kou authored
      We do not need to set the SBI_NEED_FSCK flag in the error paths: if we
      return an error here, we will not update the checkpoint flag, so the
      code is useless; just remove it.
      Signed-off-by: Lihong Kou <koulihong@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      290c30d4
    • C
      f2fs: fix to avoid discard command leak · 04f9287a
      Chao Yu authored
       =============================================================================
       BUG discard_cmd (Tainted: G    B      OE  ): Objects remaining in discard_cmd on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       INFO: Slab 0xffffe1ac481d22c0 objects=36 used=2 fp=0xffff936b4748bf50 flags=0x2ffff0000000100
       Call Trace:
        dump_stack+0x63/0x87
        slab_err+0xa1/0xb0
        __kmem_cache_shutdown+0x183/0x390
        shutdown_cache+0x14/0x110
        kmem_cache_destroy+0x195/0x1c0
        f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
        exit_f2fs_fs+0x35/0x641 [f2fs]
        SyS_delete_module+0x155/0x230
        ? vtime_user_exit+0x29/0x70
        do_syscall_64+0x6e/0x160
        entry_SYSCALL64_slow_path+0x25/0x25
      
       INFO: Object 0xffff936b4748b000 @offset=0
       INFO: Object 0xffff936b4748b070 @offset=112
       kmem_cache_destroy discard_cmd: Slab cache still has objects
       Call Trace:
        dump_stack+0x63/0x87
        kmem_cache_destroy+0x1b4/0x1c0
        f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
        exit_f2fs_fs+0x35/0x641 [f2fs]
        SyS_delete_module+0x155/0x230
        do_syscall_64+0x6e/0x160
        entry_SYSCALL64_slow_path+0x25/0x25
      
      Recovery can cache discard commands, so in the error path of
      fill_super() we need to give them a chance to be handled; otherwise
      this leads to a leak of the discard_cmd slab cache.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      04f9287a
  17. 11 Jul 2019, 2 commits
  18. 03 Jul 2019, 4 commits
    • C
      f2fs: use generic EFSBADCRC/EFSCORRUPTED · 10f966bb
      Chao Yu authored
      f2fs has always used EFAULT as the error number to indicate that the
      filesystem is corrupted, but generic filesystems use EUCLEAN for such
      a condition, so we need to change f2fs to follow the others.
      
      This patch adds two new macros as below to wrap more generic error
      code macros, and spread them in code.
      
      EFSBADCRC	EBADMSG		/* Bad CRC detected */
      EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
      Reported-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      10f966bb
    • C
      f2fs: print kernel message if filesystem is inconsistent · 2d821c12
      Chao Yu authored
      As Pavel reported, once we detect filesystem inconsistency in
      f2fs_inplace_write_data(), it is better to print a kernel message as
      we do in other places.
      Reported-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      2d821c12
    • J
      f2fs: introduce f2fs_<level> macros to wrap f2fs_printk() · dcbb4c10
      Joe Perches authored
      - Add and use f2fs_<level> macros
      - Convert f2fs_msg to f2fs_printk
      - Remove level from f2fs_printk and embed the level in the format
      - Coalesce formats and align multi-line arguments
      - Remove the unnecessary duplicate extern declaration of f2fs_msg from f2fs.h
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      dcbb4c10
    • Q
      f2fs: ioctl for removing a range from F2FS · 04f0b2ea
      Qiuyang Sun authored
      This ioctl shrinks a given length (aligned to sections) from the end of
      the main area. Any cursegs and valid blocks will be moved out before
      the range is invalidated.
      
      This feature can be used for adjusting partition sizes online.
      
      History of the patch:
      
      Sahitya Tummala:
       - Add this ioctl for f2fs_compat_ioctl() as well.
       - Fix debugfs status to reflect the online resize changes.
       - Fix potential race between online resize path and allocate new data
         block path or gc path.
      
      Others:
       - Rename some identifiers.
       - Add some error handling branches.
       - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
       - Implement this interface as ext4's, and change the parameter from shrunk
      bytes to new block count of F2FS.
       - During resizing, force to empty sit_journal and forbid adding new
         entries to it, in order to avoid invalid segno in journal after resize.
       - Reduce sbi->user_block_count before resize starts.
       - Commit the updated superblock first, and then update in-memory metadata
         only when the former succeeds.
       - Target block count must align to sections.
       - Write checkpoint before and after committing the new superblock, w/o
      CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
      resize fails after the new superblock is committed.
       - In free_segment_range(), reduce granularity of gc_mutex.
       - Add protection on curseg migration.
       - Add freeze_bdev() and thaw_bdev() for resize fs.
       - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
       - Recover super_block and FS metadata when resize fails.
       - No need to clear CP_FSCK_FLAG in update_ckpt_flags().
       - Clean up the sb and fs metadata update functions for resize_fs.
      
      Geert Uytterhoeven:
       - Use div_u64*() for 64-bit divisions
      
      Arnd Bergmann:
       - Not all architectures support get_user() with a 64-bit argument:
          ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
          Use copy_from_user() here, this will always work.
      Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      04f0b2ea
  19. 04 Jun 2019, 2 commits
    • D
      f2fs: Add option to limit required GC for checkpoint=disable · 4d3aed70
      Daniel Rosenberg authored
      This extends the checkpoint option to allow checkpoint=disable:%u[%].
      This allows you to specify how much of the disk you are willing
      to lose access to while mounting with checkpoint=disable. If the amount
      lost would be higher, the mount will return -EAGAIN. This can be given
      as a percent of total space, or in blocks.
      
      Currently, we need to run garbage collection until the amount of holes
      is smaller than the OVP space. With the new option, f2fs can mark
      space as unusable up front instead of requiring garbage collection until
      the number of holes is small enough.
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      4d3aed70
    • D
      f2fs: Lower threshold for disable_cp_again · ae4ad7ea
      Daniel Rosenberg authored
      The existing threshold for allowable holes at checkpoint=disable time is
      too high. The OVP space contains reserved segments, which are always in
      the form of free segments. These must be subtracted from the OVP value.
      
      The current threshold is meant to be the maximum value of holes of a
      single type we can have and still guarantee that we can fill the disk
      without failing to find space for a block of a given type.
      
      If the disk is full, ignoring current reserved, which only helps us,
      the amount of unused blocks is equal to the OVP area. Of that, there
      are reserved segments, which must be free segments, and the rest of the
      ovp area, which can come from either free segments or holes. The maximum
      possible amount of holes is OVP-reserved.
      
      Now, consider the disk when mounting with checkpoint=disable.
      We must be able to fill all available free space with either data or
      node blocks. When we start with checkpoint=disable, holes are locked to
      their current type. Say we have H of one type of hole, and H+X of the
      other. We can fill H of that space with arbitrary typed blocks via SSR.
      For the remaining H+X blocks, we may not have any of a given block type
      left at all. For instance, if we were to fill the disk entirely with
      blocks of the type with fewer holes, the H+X blocks of the opposite type
      would not be used. If H+X > OVP-reserved, there would be more holes than
      could possibly exist, and we would have failed to find a suitable block
      earlier on, leading to a crash in update_sit_entry.
      
      If H+X <= OVP-reserved, then the holes end up effectively masked by the OVP
      region in this case.
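The threshold argument above reduces to a small inequality, sketched here with illustrative names and arbitrary numbers. With H holes of one type and H+X of the other, the mount is safe only while the larger hole count stays within OVP minus the reserved segments:

```c
/* illustrative check of the disable_cp threshold argument:
 * h_small holes of one type are always fillable via SSR; the
 * h_large holes of the other type must fit in OVP - reserved. */
static int holes_fit(unsigned long long ovp, unsigned long long reserved,
                     unsigned long long h_small, unsigned long long h_large)
{
    unsigned long long limit = ovp - reserved;  /* max holes of a single type */

    (void)h_small;  /* the smaller count is covered by SSR, per the argument above */
    return h_large <= limit;
}
```

For example, with OVP = 100 blocks and 30 of them reserved segments, up to 70 holes of the larger type are tolerable; a 71st would mean a block type can run out, crashing in update_sit_entry as described.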
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      ae4ad7ea