1. 12 Sep, 2020 1 commit
    • f2fs: support age threshold based garbage collection · 093749e2
      Chao Yu committed
      There are several issues in the current background GC algorithm:
      - The number of valid blocks is one of the key factors in the cost
      calculation, so if a segment has few valid blocks, the CB algorithm
      will still choose it as a victim even if it is young or is a hot
      segment, which is not appropriate.
      - GCed data/node blocks go to the existing logs regardless of whether
      the data already there has the same update frequency, so hot and
      cold data may be mixed again.
      - The GC allocator mainly uses LFS type segments, so it consumes free
      segments more quickly.
      
      This patch introduces a new algorithm named age threshold based
      garbage collection to solve the above issues. There are three main
      steps:

      1. select a source victim:
      - set an age threshold, and select candidates based on the threshold:
      e.g.
       0 means youngest, 100 means oldest, if we set age threshold to 80
       then select dirty segments which have an age in the range [80, 100]
       as candidates;
      - set a candidate_ratio threshold, and select candidates based on the
      ratio, so that we can shrink the candidates to the oldest segments;
      - select the target segment with the fewest valid blocks in order to
      migrate blocks with minimum cost;

      2. select a target victim:
      - select candidates based on the age threshold;
      - set a candidate_radius threshold, and search for candidates whose
      age is around the source victim's; the search radius should be less
      than the radius threshold;
      - select the target segment with the most valid blocks in order to
      avoid migrating the current target segment.

      3. merge valid blocks from the source victim into the target victim
      with the SSR allocator.
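
The selection steps above can be sketched in user-space C as follows. This is only a simplified illustration under stated assumptions, not the kernel implementation: `struct seg`, `pick_source()` and `pick_target()` are hypothetical names, ages are assumed to be already normalized to 0..100, and the candidate_ratio filter is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified segment descriptor (not a kernel structure). */
struct seg {
	unsigned int age;   /* 0 = youngest, 100 = oldest */
	unsigned int valid; /* number of valid blocks in the segment */
};

/*
 * Step 1: among segments whose age is at least age_threshold, pick the
 * one with the fewest valid blocks. Returns its index, or -1 if no
 * segment qualifies.
 */
static int pick_source(const struct seg *s, size_t n, unsigned int age_threshold)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (s[i].age < age_threshold)
			continue;
		if (best < 0 || s[i].valid < s[best].valid)
			best = (int)i;
	}
	return best;
}

/*
 * Step 2: among the other old-enough segments whose age differs from the
 * source's age by less than radius, pick the one with the most valid
 * blocks. Returns its index, or -1 if no segment qualifies.
 */
static int pick_target(const struct seg *s, size_t n, int src,
		       unsigned int age_threshold, unsigned int radius)
{
	int best = -1;

	assert(src >= 0 && (size_t)src < n);

	for (size_t i = 0; i < n; i++) {
		unsigned int diff;

		if ((int)i == src || s[i].age < age_threshold)
			continue;
		diff = s[i].age > s[src].age ? s[i].age - s[src].age
					     : s[src].age - s[i].age;
		if (diff >= radius)
			continue;
		if (best < 0 || s[i].valid > s[best].valid)
			best = (int)i;
	}
	return best;
}
```

Step 3 would then move the valid blocks of the chosen source into the free slots of the chosen target, which is what the SSR allocator is used for in the patch.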
      
      Test steps:
      - create 160 dirty segments:
       * half of them have 128 valid blocks per segment
       * the rest have 384 valid blocks per segment
      - run background GC
      
      Benefit: GC count and block movement count both decrease significantly:
      
      - Before:
        - Valid: 86
        - Dirty: 1
        - Prefree: 11
        - Free: 6001 (6001)
      
      GC calls: 162 (BG: 220)
        - data segments : 160 (160)
        - node segments : 2 (2)
      Try to move 41454 blocks (BG: 41454)
        - data blocks : 40960 (40960)
        - node blocks : 494 (494)
      
      IPU: 0 blocks
      SSR: 0 blocks in 0 segments
      LFS: 41364 blocks in 81 segments
      
      - After:
      
        - Valid: 87
        - Dirty: 0
        - Prefree: 4
        - Free: 6008 (6008)
      
      GC calls: 75 (BG: 76)
        - data segments : 74 (74)
        - node segments : 1 (1)
      Try to move 12813 blocks (BG: 12813)
        - data blocks : 12544 (12544)
        - node blocks : 269 (269)
      
      IPU: 0 blocks
      SSR: 12032 blocks in 77 segments
      LFS: 855 blocks in 2 segments
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 11 Sep, 2020 1 commit
    • f2fs: introduce inmem curseg · d0b9e42a
      Chao Yu committed
      The previous implementation of aligned pinfile allocation will:
      - allocate a new segment on the cold data log no matter whether the
      last used segment is partially used or not, which makes IOs more
      random;
      - force concurrent cold data/GCed IO to go into the warm data area,
      which can have a bad effect on hot/cold data separation;

      In this patch, we introduce a new type of log named 'inmem curseg'.
      The differences from a normal curseg are:
      - it reuses an existing segment type (CURSEG_XXX_NODE/DATA);
      - it only exists in memory; its segno, blkofs and summary will not be
       persisted into the checkpoint area;

      With this new feature, we can enhance the scalability of logs, and
      special allocators can be created for specific purposes:
      - pure lfs allocator for aligned pinfile allocation or file
      defragmentation
      - pure ssr allocator for later feature

      So let's update aligned pinfile allocation to use this new
      inmem curseg framework.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 04 Aug, 2020 3 commits
  4. 26 Jul, 2020 1 commit
  5. 09 Jul, 2020 1 commit
  6. 09 Jun, 2020 1 commit
    • f2fs: don't return vmalloc() memory from f2fs_kmalloc() · 0b6d4ca0
      Eric Biggers committed
      kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
      kmalloc'ed or vmalloc'ed memory.  But the f2fs wrappers, f2fs_kmalloc()
      and f2fs_kvmalloc(), both return both kinds of memory.
      
      It's redundant to have two functions that do the same thing, and also
      breaking the standard naming convention is causing bugs since people
      assume it's safe to kfree() memory allocated by f2fs_kmalloc().  See
      e.g. the various allocations in fs/f2fs/compress.c.
      
      Fix this by making f2fs_kmalloc() just use kmalloc().  And to avoid
      re-introducing the allocation failures that the vmalloc fallback was
      intended to fix, convert the largest allocations to use f2fs_kvmalloc().
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  7. 19 May, 2020 1 commit
  8. 12 May, 2020 3 commits
  9. 18 Apr, 2020 2 commits
  10. 23 Mar, 2020 1 commit
  11. 20 Mar, 2020 1 commit
    • f2fs: introduce DEFAULT_IO_TIMEOUT · 5df7731f
      Chao Yu committed
      As Geert Uytterhoeven reported:
      
      for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);
      
      On some platforms, HZ can be less than 50, in which case an unexpected
      timeout of 0 jiffies will be passed to congestion_wait().

      This patch introduces a macro DEFAULT_IO_TIMEOUT that wraps a fixed
      value, msecs_to_jiffies(20), and uses it instead of HZ/50 to avoid
      this issue.
      
      Quoted from Geert Uytterhoeven:
      
      "A timeout of HZ means 1 second.
      HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.
      
      If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
      as that takes care of the special cases, and never returns 0."
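
The arithmetic behind this report can be illustrated with a small user-space sketch. It is a model under stated assumptions, not kernel code: `model_msecs_to_jiffies()` and `old_timeout()` are hypothetical helpers, and the round-up formula only mimics the property quoted above (a non-zero duration never converts to 0 ticks).

```c
#include <assert.h>

/*
 * User-space model only: the real msecs_to_jiffies() lives in the kernel.
 * Rounding up means any non-zero duration maps to at least one tick.
 */
static unsigned long model_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}

/* The old expression: integer division truncates toward zero. */
static unsigned long old_timeout(unsigned long hz)
{
	return hz / 50;
}
```

With an illustrative tick rate of 24, `old_timeout(24)` evaluates to 0 while `model_msecs_to_jiffies(20, 24)` evaluates to 1; for tick rates of 100 and 1000 both expressions agree (2 and 20 ticks respectively).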
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  12. 11 Mar, 2020 1 commit
  13. 28 Feb, 2020 1 commit
    • f2fs: fix the panic in do_checkpoint() · bf22c3cc
      Sahitya Tummala committed
      There could be a scenario where f2fs_sync_meta_pages() will not
      ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus,
      resulting in the below panic in do_checkpoint() -
      
      f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
      				!f2fs_cp_error(sbi));
      
      This can happen in a low-memory condition, where shrinker could
      also be doing the writepage operation (stack shown below)
      at the same time when checkpoint is running on another core.
      
      schedule
      down_write
      f2fs_submit_page_write -> by this time, this page in page cache is tagged
      			as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY
      			is cleared, due to which f2fs_sync_meta_pages()
      			cannot sync this page in do_checkpoint() path.
      f2fs_do_write_meta_page
      __f2fs_write_meta_page
      f2fs_write_meta_page
      shrink_page_list
      shrink_inactive_list
      shrink_node_memcg
      shrink_node
      kswapd
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  14. 16 Jan, 2020 1 commit
  15. 20 Nov, 2019 1 commit
  16. 03 Jul, 2019 4 commits
    • f2fs: add a rw_sem to cover quota flag changes · db6ec53b
      Jaegeuk Kim committed
      Two paths to update quota and f2fs_lock_op:
      
      1.
       - lock_op
       |  - quota_update
       `- unlock_op
      
      2.
       - quota_update
       - lock_op
       `- unlock_op
      
      But, we need to make a transaction on quota_update + lock_op in #2 case.
      So, this patch introduces:
      1. lock_op
      2. down_write
      3. check __need_flush
      4. up_write
      5. if there are dirty quota entries, flush them
      6. otherwise, good to go
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use generic EFSBADCRC/EFSCORRUPTED · 10f966bb
      Chao Yu committed
      f2fs has always used EFAULT as the error number to indicate that the
      filesystem is corrupted, but generic filesystems use EUCLEAN for this
      condition, so we need to change to follow the others.

      This patch adds the two new macros below to wrap the more generic
      error code macros, and uses them throughout the code.
      
      EFSBADCRC	EBADMSG		/* Bad CRC detected */
      EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
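
A minimal user-space sketch of how such wrapper macros might be used is shown below. The two `#define` lines follow the mapping given above; `check_block()` and its parameters are hypothetical and only illustrate returning the negated error codes, and `EUCLEAN` is assumed to be available from `<errno.h>` as it is on Linux.

```c
#include <assert.h>
#include <errno.h>

#define EFSBADCRC	EBADMSG	/* Bad CRC detected */
#define EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */

/* The two conditions must stay distinguishable for callers. */
static_assert(EFSBADCRC != EFSCORRUPTED, "error codes must differ");

/*
 * Hypothetical validation helper: returns 0 on success or a negative
 * error code, following the usual kernel convention.
 */
static int check_block(int crc_ok, int layout_ok)
{
	if (!crc_ok)
		return -EFSBADCRC;
	if (!layout_ok)
		return -EFSCORRUPTED;
	return 0;
}
```

Callers can then tell a checksum mismatch apart from structural corruption without either case being reported as EFAULT.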
      Reported-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce f2fs_<level> macros to wrap f2fs_printk() · dcbb4c10
      Joe Perches committed
      - Add and use f2fs_<level> macros
      - Convert f2fs_msg to f2fs_printk
      - Remove level from f2fs_printk and embed the level in the format
      - Coalesce formats and align multi-line arguments
      - Remove unnecessary duplicate extern f2fs_msg f2fs.h
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: ioctl for removing a range from F2FS · 04f0b2ea
      Qiuyang Sun committed
      This ioctl shrinks a given length (aligned to sections) from end of the
      main area. Any cursegs and valid blocks will be moved out before
      invalidating the range.
      
      This feature can be used for adjusting partition sizes online.
      
      History of the patch:
      
      Sahitya Tummala:
       - Add this ioctl for f2fs_compat_ioctl() as well.
       - Fix debugfs status to reflect the online resize changes.
       - Fix potential race between online resize path and allocate new data
         block path or gc path.
      
      Others:
       - Rename some identifiers.
       - Add some error handling branches.
       - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
       - Implement this interface as ext4's, and change the parameter from shrunk
      bytes to new block count of F2FS.
       - During resizing, force to empty sit_journal and forbid adding new
         entries to it, in order to avoid invalid segno in journal after resize.
       - Reduce sbi->user_block_count before resize starts.
       - Commit the updated superblock first, and then update in-memory metadata
         only when the former succeeds.
       - Target block count must align to sections.
       - Write checkpoint before and after committing the new superblock, w/o
      CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
      resize fails after the new superblock is committed.
       - In free_segment_range(), reduce granularity of gc_mutex.
       - Add protection on curseg migration.
       - Add freeze_bdev() and thaw_bdev() for resize fs.
       - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
       - Recover super_block and FS metadata when resize fails.
       - No need to clear CP_FSCK_FLAG in update_ckpt_flags().
       - Clean up the sb and fs metadata update functions for resize_fs.
      
      Geert Uytterhoeven:
       - Use div_u64*() for 64-bit divisions
      
      Arnd Bergmann:
       - Not all architectures support get_user() with a 64-bit argument:
          ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
          Use copy_from_user() here, this will always work.
      Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  17. 23 May, 2019 2 commits
    • f2fs: fix to check layout on last valid checkpoint park · 5dae2d39
      Chao Yu committed
      As Ju Hyung reported:
      
      "
      I was semi-forced today to use the new kernel and test f2fs.
      
      My Ubuntu initramfs got a bit wonky and I had to boot into live CD and
      fix some stuffs. The live CD was using 4.15 kernel, and just mounting
      the f2fs partition there corrupted f2fs and my 4.19(with 5.1-rc1-4.19
      f2fs-stable merged) refused to mount with "SIT is corrupted node"
      message.
      
      I used the latest f2fs-tools sent by Chao including "fsck.f2fs: fix to
      repair cp_loads blocks at correct position"
      
      It spit out 140M worth of output, but at least I didn't have to run it
      twice. Everything returned "Ok" in the 2nd run.
      The new log is at
      http://arter97.com/f2fs/final
      
      After fixing the image, I used my 4.19 kernel with 5.2-rc1-4.19
      f2fs-stable merged and it mounted.
      
      But, I got this:
      [    1.047791] F2FS-fs (nvme0n1p3): layout of large_nat_bitmap is
      deprecated, run fsck to repair, chksum_offset: 4092
      [    1.081307] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
      [    1.161520] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
      [    1.162418] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7e00
      
      But after doing a reboot, the message is gone:
      [    1.098423] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
      [    1.177771] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
      [    1.178365] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7eda
      
      I'm not exactly sure why the kernel detected that I'm still using the
      old layout on the first boot. Maybe fsck didn't fix it properly, or
      the check from the kernel is improper.
      "
      
      Although we rebuild the old deprecated checkpoint with the new layout
      during repair, we only repair the last checkpoint park; the other,
      older one remains.

      Once the image is mounted, we will 1) sanity check the layout and 2)
      decide which checkpoint park to use according to cp_ver. As a result,
      we print the reported message unnecessarily at step 1). To avoid
      this, we simply move the layout check into f2fs_sanity_check_ckpt()
      after step 2).
      Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: link f2fs quota ops for sysfile · bc88ac96
      Jaegeuk Kim committed
      This patch reverts:
      commit fb40d618 ("f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG").
      
      We were missing error handlers used in f2fs quota ops.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  18. 09 May, 2019 7 commits
    • f2fs: fix to avoid potential race on sbi->unusable_block_count access/update · c9c8ed50
      Chao Yu committed
      Use sbi->stat_lock to protect sbi->unusable_block_count access/update, in
      order to avoid a potential race on it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce DATA_GENERIC_ENHANCE · 93770ab7
      Chao Yu committed
      Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
      whether @blkaddr locates in main area or not.
      
      That check is weak, since a block address within the main area can
      point to an address which is not valid in the segment info table, and
      we cannot detect this condition; we may suffer worse corruption as
      the system continues running.

      So this patch introduces DATA_GENERIC_ENHANCE to enhance the sanity
      check, which triggers a SIT bitmap check rather than only a range
      check.
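
The difference between a range-only check and a range-plus-bitmap check can be sketched in user-space C. This is a simplified illustration under stated assumptions: `struct layout`, `in_main_area()` and `is_valid_enhanced()` are hypothetical names, and the bitmap here is a plain least-significant-bit-first byte array rather than the on-disk SIT format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the main area and its validity bitmap. */
struct layout {
	uint32_t main_start;      /* first block address of the main area */
	uint32_t main_end;        /* one past the last block address */
	const uint8_t *valid_map; /* one bit per block, bit 0 of byte 0 first */
};

/* Weak check: only verifies that the address falls inside the main area. */
static bool in_main_area(const struct layout *l, uint32_t blkaddr)
{
	return blkaddr >= l->main_start && blkaddr < l->main_end;
}

/* Stronger check: range check first, then consult the validity bitmap. */
static bool is_valid_enhanced(const struct layout *l, uint32_t blkaddr)
{
	uint32_t off;

	assert(l->valid_map != NULL);

	if (!in_main_area(l, blkaddr))
		return false;
	off = blkaddr - l->main_start;
	return (l->valid_map[off / 8] >> (off % 8)) & 1u;
}
```

An address can pass `in_main_area()` and still be rejected by `is_valid_enhanced()` when its bit is clear, which is the class of corruption the range-only check could not detect.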
      
      This patch also makes the changes below:
      - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
      - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
      - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
      - spread blkaddr check in:
       * f2fs_get_node_info()
       * __read_out_blkaddrs()
       * f2fs_submit_page_read()
       * ra_data_block()
       * do_recover_data()
      
      This patch can fix bug reported from bugzilla below:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203215
      https://bugzilla.kernel.org/show_bug.cgi?id=203223
      https://bugzilla.kernel.org/show_bug.cgi?id=203231
      https://bugzilla.kernel.org/show_bug.cgi?id=203235
      https://bugzilla.kernel.org/show_bug.cgi?id=203241
      
      = Update by Jaegeuk Kim =
      
      DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
      But xfstest/generic/446 complains about some generated kernel messages saying an
      invalid bitmap was detected when reading a block. The reason is that, when we get
      the block addresses from extent_cache, there is no lock to synchronize it against
      truncating the blocks in parallel.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to be aware of readonly device in write_checkpoint() · f5a131bb
      Chao Yu committed
      As Park Ju Hyung reported:
      
      Probably unrelated but a similar issue:
      Warning appears upon unmounting a corrupted R/O f2fs loop image.
      
      Should be a trivial issue to fix as well :)
      
      [ 2373.758424] ------------[ cut here ]------------
      [ 2373.758428] generic_make_request: Trying to write to read-only
      block-device loop1 (partno 0)
      [ 2373.758455] WARNING: CPU: 1 PID: 13950 at block/blk-core.c:2174
      generic_make_request_checks+0x590/0x630
      [ 2373.758556] CPU: 1 PID: 13950 Comm: umount Tainted: G           O
         4.19.35-zen+ #1
      [ 2373.758558] Hardware name: System manufacturer System Product
      Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
      [ 2373.758564] RIP: 0010:generic_make_request_checks+0x590/0x630
      [ 2373.758567] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd
      36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5
      9b a7 ff <0f> 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b
      16 89
      [ 2373.758570] RSP: 0018:ffff8882bdb43950 EFLAGS: 00010282
      [ 2373.758573] RAX: 0000000000000050 RBX: ffff8887244c6700 RCX: 0000000000000006
      [ 2373.758575] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340
      [ 2373.758577] RBP: ffff888849c426c0 R08: 0000000000000004 R09: 00000000000003ba
      [ 2373.758579] R10: 0000000000000001 R11: 0000000000000029 R12: 0000000000001000
      [ 2373.758581] R13: 0000000000000000 R14: ffff888844a2e800 R15: ffff8882bdb43ac0
      [ 2373.758584] FS:  00007fc0d114f8c0(0000) GS:ffff88884ec40000(0000)
      knlGS:0000000000000000
      [ 2373.758586] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2373.758588] CR2: 00007fc0d1ad12c0 CR3: 00000002bdb82003 CR4: 00000000003606e0
      [ 2373.758590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 2373.758592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 2373.758593] Call Trace:
      [ 2373.758602]  ? generic_make_request+0x46/0x3d0
      [ 2373.758608]  ? wait_woken+0x80/0x80
      [ 2373.758612]  ? mempool_alloc+0xb7/0x1a0
      [ 2373.758618]  ? submit_bio+0x30/0x110
      [ 2373.758622]  ? bvec_alloc+0x7c/0xd0
      [ 2373.758628]  ? __submit_merged_bio+0x68/0x390
      [ 2373.758633]  ? f2fs_submit_page_write+0x1bb/0x7f0
      [ 2373.758638]  ? f2fs_do_write_meta_page+0x7f/0x160
      [ 2373.758642]  ? __f2fs_write_meta_page+0x70/0x140
      [ 2373.758647]  ? f2fs_sync_meta_pages+0x140/0x250
      [ 2373.758653]  ? f2fs_write_checkpoint+0x5c5/0x17b0
      [ 2373.758657]  ? f2fs_sync_fs+0x9c/0x110
      [ 2373.758664]  ? sync_filesystem+0x66/0x80
      [ 2373.758667]  ? generic_shutdown_super+0x1d/0x100
      [ 2373.758670]  ? kill_block_super+0x1c/0x40
      [ 2373.758674]  ? kill_f2fs_super+0x64/0xb0
      [ 2373.758678]  ? deactivate_locked_super+0x2d/0xb0
      [ 2373.758682]  ? cleanup_mnt+0x65/0xa0
      [ 2373.758688]  ? task_work_run+0x7f/0xa0
      [ 2373.758693]  ? exit_to_usermode_loop+0x9c/0xa0
      [ 2373.758698]  ? do_syscall_64+0xc7/0xf0
      [ 2373.758703]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 2373.758706] ---[ end trace 5d3639907c56271b ]---
      [ 2373.758780] print_req_error: I/O error, dev loop1, sector 143048
      [ 2373.758800] print_req_error: I/O error, dev loop1, sector 152200
      [ 2373.758808] print_req_error: I/O error, dev loop1, sector 8192
      [ 2373.758819] print_req_error: I/O error, dev loop1, sector 12272
      
      This patch adds detection of a readonly device in write_checkpoint() to
      avoid triggering write IOs on it.
      Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to skip recovery on readonly device · b61af314
      Chao Yu committed
      As Park Ju Hyung reported in mailing list:
      
      https://sourceforge.net/p/linux-f2fs/mailman/message/36639787/
      
      generic_make_request: Trying to write to read-only block-device loop0 (partno 0)
      WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630
      
       generic_make_request+0x46/0x3d0
       submit_bio+0x30/0x110
       __submit_merged_bio+0x68/0x390
       f2fs_submit_page_write+0x1bb/0x7f0
       f2fs_do_write_meta_page+0x7f/0x160
       __f2fs_write_meta_page+0x70/0x140
       f2fs_sync_meta_pages+0x140/0x250
       f2fs_write_checkpoint+0x5c5/0x17b0
       f2fs_sync_fs+0x9c/0x110
       sync_filesystem+0x66/0x80
       f2fs_recover_fsync_data+0x790/0xa30
       f2fs_fill_super+0xe4e/0x1980
       mount_bdev+0x518/0x610
       mount_fs+0x34/0x13f
       vfs_kern_mount.part.11+0x4f/0x120
       do_mount+0x2d1/0xe40
       __x64_sys_mount+0xbf/0xe0
       do_syscall_64+0x4a/0xf0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      print_req_error: I/O error, dev loop0, sector 4096
      
      If the block device is readonly, we should never trigger write IO from
      the filesystem layer, but previously, orphan and journal recovery didn't
      consider this condition, which triggered the above warning. Fix it.
      Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: relocate chksum_offset for large_nat_bitmap feature · b471eb99
      Chao Yu committed
      For large_nat_bitmap feature, there is a design flaw:
      
      Previous:
      
      struct f2fs_checkpoint layout:
      +--------------------------+  0x0000
      | checkpoint_ver           |
      | ......                   |
      | checksum_offset          |------+
      | ......                   |      |
      | sit_nat_version_bitmap[] |<-----|-------+
      | ......                   |      |       |
      | checksum_value           |<-----+       |
      +--------------------------+  0x1000      |
      |                          |      nat_bitmap + sit_bitmap
      | payload blocks           |              |
      |                          |              |
      +--------------------------|<-------------+
      
      Obviously, if nat_bitmap size + sit_bitmap size is larger than
      MAX_BITMAP_SIZE_IN_CKPT, nat_bitmap or sit_bitmap may overlap
      checkpoint checksum's position, once checkpoint() is triggered
      from kernel, nat or sit bitmap will be damaged by checksum field.
      
      In order to fix this, let's relocate checksum_value's position
      to the head of sit_nat_version_bitmap as below, then nat/sit
      bitmap and chksum value update will become safe.
      
      After:
      
      struct f2fs_checkpoint layout:
      +--------------------------+  0x0000
      | checkpoint_ver           |
      | ......                   |
      | checksum_offset          |------+
      | ......                   |      |
      | sit_nat_version_bitmap[] |<-----+
      | ......                   |<-------------+
      |                          |              |
      +--------------------------+  0x1000      |
      |                          |      nat_bitmap + sit_bitmap
      | payload blocks           |              |
      |                          |              |
      +--------------------------|<-------------+
      
      Related report and discussion:
      
      https://sourceforge.net/p/linux-f2fs/mailman/message/36642346/
      Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: allow unfixed f2fs_checkpoint.checksum_offset · d7eb8f1c
      Chao Yu committed
      Previously, f2fs_checkpoint.checksum_offset pointed to a fixed position
      in the f2fs_checkpoint structure:
      
      "#define CP_CHKSUM_OFFSET	4092"
      
      This is unnecessary, and it breaks the contiguity of the nat and sit
      bitmaps stored across the checkpoint park block and payload blocks.

      This patch allows f2fs to handle an unfixed .checksum_offset.

      In addition, for the case where the checksum value is stored in the
      middle of the checkpoint park, calculate the checksum value with the
      superposition method, as we did for inode_checksum.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix wrong __is_meta_io() macro · 6dc3a126
      Chao Yu committed
      This patch changes the code as below:
      - don't use is_read_io() as a condition to judge the meta IO.
      - use .is_por to replace .is_meta to indicate IO is from recovery explicitly.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  19. 06 Apr, 2019 1 commit
    • f2fs: fix potential recursive call when enabling data_flush · 186857c5
      Chao Yu committed
      As Hagbard Celine reported:
      
      Hi, this is a long standing bug that I've hit before on older kernels,
      but I was not able to get the syslog saved because of the nature of
      the bug. This time I had booted form a pen-drive, and was able to save
      the log to it's efi-partition.
      What i did to trigger it was to create a partition and format it f2fs,
      then mount it with options:
      "rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict".
      Then I unpacked a big .tar.xz to the partition (I used a
      gentoo-stage3-tarball as I was in process of installing Gentoo).
      
      Same options just without data_flush gives no problems.
      
      Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not
      properly unmounted. Some data may be corrupt. Please run fsck.
      Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest
      stack depth: 12064 bytes left
      Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at
      00000000a4b0733c (stack is 0000000056016422..0000000096e7463f)
      Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel: Call Trace:
      Mar 20 21:06:40 usbgentoo kernel:  read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  ? xas_load+0x8/0x50
      Mar 20 21:06:40 usbgentoo kernel:  __get_node_page+0x73/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_get_dnode_of_data+0x34e/0x580
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_inline_data+0x5e/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x421/0x690
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_cache_pages+0x1cf/0x460
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_data_pages+0x2b3/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_chksum_verify+0x1d/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  ? read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  do_writepages+0x3c/0xd0
      Mar 20 21:06:40 usbgentoo kernel:  __filemap_fdatawrite_range+0x7c/0xb0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_sync_dirty_inodes+0xf2/0x200
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs_bg+0x2a3/0x2c0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_dirtied+0x21/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs+0xd6/0x2b0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x4fb/0x690
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_single_inode+0x2a1/0x340
      Mar 20 21:06:40 usbgentoo kernel:  ? soft_cursor+0x1b4/0x220
      Mar 20 21:06:40 usbgentoo kernel:  writeback_sb_inodes+0x1d5/0x3e0
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_inodes_wb+0x58/0xa0
      Mar 20 21:06:40 usbgentoo kernel:  wb_writeback+0x250/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? 0xffffffff8c000000
      Mar 20 21:06:40 usbgentoo kernel:  ? cpumask_next+0x16/0x20
      Mar 20 21:06:40 usbgentoo kernel:  wb_workfn+0x2f6/0x3b0
      Mar 20 21:06:40 usbgentoo kernel:  ? __switch_to_asm+0x40/0x70
      Mar 20 21:06:40 usbgentoo kernel:  process_one_work+0x1f5/0x3f0
      Mar 20 21:06:40 usbgentoo kernel:  worker_thread+0x28/0x3c0
      Mar 20 21:06:40 usbgentoo kernel:  ? rescuer_thread+0x330/0x330
      Mar 20 21:06:40 usbgentoo kernel:  kthread+0x10e/0x130
      Mar 20 21:06:40 usbgentoo kernel:  ? kthread_create_on_node+0x60/0x60
      Mar 20 21:06:40 usbgentoo kernel:  ret_from_fork+0x35/0x40
      
      The root cause is that we run into infinite recursion between
      f2fs_balance_fs_bg and writepage(), as described below:
      
      - f2fs_write_data_pages		--- A
       - __write_data_page
        - f2fs_balance_fs
         - f2fs_balance_fs_bg		--- B
          - f2fs_sync_dirty_inodes
           - filemap_fdatawrite
            - f2fs_write_data_pages	--- A
      ...
                - f2fs_balance_fs_bg	--- B
      ...
      
      In order to fix this issue, let's detect such condition in __write_data_page()
      and just skip calling f2fs_balance_fs() recursively.
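
The idea of detecting re-entry and skipping the nested call can be sketched with a toy user-space model. This is an assumption-laden illustration and not the patch itself: `struct ctx`, its fields, and the two functions are hypothetical stand-ins for the write path (A) and the background balance path (B) in the call graph above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the toy model. */
struct ctx {
	bool in_bg_flush;  /* set while the background flush is running */
	int depth;         /* current nesting depth of the write path */
	int max_depth;     /* deepest nesting observed */
	int balance_calls; /* how many times the balance path was entered */
};

static void write_data_page(struct ctx *c);

/* Path B: flushing dirty inodes re-enters the write path. */
static void balance_fs_bg(struct ctx *c)
{
	c->in_bg_flush = true;
	write_data_page(c);
	c->in_bg_flush = false;
}

/* Path A: only call into the balance path when not already inside it. */
static void write_data_page(struct ctx *c)
{
	assert(c != NULL);

	c->depth++;
	if (c->depth > c->max_depth)
		c->max_depth = c->depth;
	if (!c->in_bg_flush) {
		c->balance_calls++;
		balance_fs_bg(c);
	}
	c->depth--;
}
```

Without the `in_bg_flush` test, the two functions would call each other until the stack is exhausted; with it, the nesting stops after one re-entry.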
      Reported-by: Hagbard Celine <hagbardcelin@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  20. 13 Mar, 2019 1 commit
  21. 06 Mar, 2019 1 commit
  22. 16 Feb, 2019 2 commits
    • f2fs: sync filesystem after roll-forward recovery · 812a9597
      Jaegeuk Kim committed
      Some work after roll-forward recovery can hit an error which will release
      all the data structures. Let's flush them in order to make it clean.
      
      One possible corruption came from:
      
      [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
      [   90.675349] Call trace:
      [   90.677869]  __list_del_entry_valid+0x94/0xb4
      [   90.682351]  remove_dirty_inode+0xac/0x114
      [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
      [   90.691302]  f2fs_write_data_pages+0x40/0x4c
      [   90.695695]  do_writepages+0x80/0xf0
      [   90.699372]  __writeback_single_inode+0xdc/0x4ac
      [   90.704113]  writeback_sb_inodes+0x280/0x440
      [   90.708501]  wb_writeback+0x1b8/0x3d0
      [   90.712267]  wb_workfn+0x1a8/0x4d4
      [   90.715765]  process_one_work+0x1c0/0x3d4
      [   90.719883]  worker_thread+0x224/0x344
      [   90.723739]  kthread+0x120/0x130
      [   90.727055]  ret_from_fork+0x10/0x18
      Reported-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add quick mode of checkpoint=disable for QA · db610a64
      Jaegeuk Kim committed
      This mode returns mount() quickly with EAGAIN. We can trigger this by
      shutdown(F2FS_GOING_DOWN_NEED_FSCK).
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  23. 27 Dec, 2018 2 commits