1. 11 3月, 2020 1 次提交
  2. 28 2月, 2020 2 次提交
  3. 04 2月, 2020 1 次提交
  4. 20 11月, 2019 1 次提交
  5. 08 11月, 2019 1 次提交
  6. 07 9月, 2019 2 次提交
    • J
      f2fs: fix flushing node pages when checkpoint is disabled · 100c0655
      Jaegeuk Kim 提交于
      This patch fixes skipping node page writes when checkpoint is disabled.
      In this period, we can't rely on checkpoint to flush node pages.
      
      Fixes: fd8c8caf ("f2fs: let checkpoint flush dnode page of regular")
      Fixes: 4354994f ("f2fs: checkpoint disabling")
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      100c0655
    • C
      f2fs: fix to writeout dirty inode during node flush · 052a82d8
      Chao Yu 提交于
      As Eric reported:
      
      On xfstest generic/204 on f2fs, I'm getting a kernel BUG.
      
       allocate_segment_by_default+0x9d/0x100 [f2fs]
       f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
       do_write_page+0x62/0x110 [f2fs]
       f2fs_do_write_node_page+0x2b/0xa0 [f2fs]
       __write_node_page+0x2ec/0x590 [f2fs]
       f2fs_sync_node_pages+0x756/0x7e0 [f2fs]
       block_operations+0x25b/0x350 [f2fs]
       f2fs_write_checkpoint+0x104/0x1150 [f2fs]
       f2fs_sync_fs+0xa2/0x120 [f2fs]
       f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
       f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
       do_writepages+0x1c/0x70
       __writeback_single_inode+0x45/0x320
       writeback_sb_inodes+0x273/0x5c0
       wb_writeback+0xff/0x2e0
       wb_workfn+0xa1/0x370
       process_one_work+0x138/0x350
       worker_thread+0x4d/0x3d0
       kthread+0x109/0x140
      
      The root cause of this issue is, in a very small partition, e.g.
      in generic/204 testcase of fstest suit, filesystem's free space
      is 50MB, so at most we can write 12800 inline inode with command:
      `echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i`,
      then filesystem will have:
      - 12800 dirty inline data page
      - 12800 dirty inode page
      - and 12800 dirty imeta (dirty inode)
      
      When we flush node-inode's page cache, we can also flush inline
      data with each inode page, however it will run out-of-free-space
      in device, then once it triggers checkpoint, there is no room for
      huge number of imeta, at this time, GC is useless, as there is no
      dirty segment at all.
      
      In order to fix this, we try to recognize inode page during
      node_inode's page flushing, and update inode page from dirty inode,
      so that later another imeta (dirty inode) flush can be avoided.
      Reported-and-tested-by: NEric Biggers <ebiggers@kernel.org>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      052a82d8
  7. 23 8月, 2019 1 次提交
  8. 03 7月, 2019 2 次提交
  9. 31 5月, 2019 1 次提交
  10. 09 5月, 2019 5 次提交
    • C
      f2fs: introduce DATA_GENERIC_ENHANCE · 93770ab7
      Chao Yu 提交于
      Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
      whether @blkaddr locates in main area or not.
      
      That check is weak, since the block address in range of main area can
      point to the address which is not valid in segment info table, and we
      can not detect such condition, we may suffer worse corruption as system
      continues running.
      
      So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
      which trigger SIT bitmap check rather than only range check.
      
      This patch did below changes as wel:
      - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
      - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
      - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
      - spread blkaddr check in:
       * f2fs_get_node_info()
       * __read_out_blkaddrs()
       * f2fs_submit_page_read()
       * ra_data_block()
       * do_recover_data()
      
      This patch can fix bug reported from bugzilla below:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203215
      https://bugzilla.kernel.org/show_bug.cgi?id=203223
      https://bugzilla.kernel.org/show_bug.cgi?id=203231
      https://bugzilla.kernel.org/show_bug.cgi?id=203235
      https://bugzilla.kernel.org/show_bug.cgi?id=203241
      
      = Update by Jaegeuk Kim =
      
      DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
      But, xfstest/generic/446 compalins some generated kernel messages saying invalid
      bitmap was detected when reading a block. The reaons is, when we get the
      block addresses from extent_cache, there is no lock to synchronize it from
      truncating the blocks in parallel.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      93770ab7
    • C
      f2fs: allow address pointer number of dnode aligning to specified size · d02a6e61
      Chao Yu 提交于
      This patch expands scalability of dnode layout, it allows address pointer
      number of dnode aligning to specified size (now, the size is one byte by
      default), and later the number can align to compress cluster size
      (1 << n bytes, n=[2,..)), it can avoid cluster acrossing two dnode, making
      design of compress meta layout simple.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      d02a6e61
    • C
      f2fs: fix to do sanity check on free nid · 626bcf2b
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203225
      
      - Overview
      When mounting the attached crafted image and unmounting it, following errors are reported.
      Additionally, it hangs on sync after unmounting.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      touch test/t
      umount test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:3073!
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
       Call Trace:
        f2fs_put_super+0xf4/0x270
        generic_shutdown_super+0x62/0x110
        kill_block_super+0x1c/0x50
        kill_f2fs_super+0xad/0xd0
        deactivate_locked_super+0x35/0x60
        cleanup_mnt+0x36/0x70
        task_work_run+0x75/0x90
        exit_to_usermode_loop+0x93/0xa0
        do_syscall_64+0xba/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
      
      NAT table is corrupted, so reserved meta/node inode ids were added into
      free list incorrectly, during file creation, since reserved id has cached
      in inode hash, so it fails the creation and preallocated nid can not be
      released later, result in kernel panic.
      
      To fix this issue, let's do nid boundary check during free nid loading.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      626bcf2b
    • C
      f2fs: fix to do checksum even if inode page is uptodate · b42b179b
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203221
      
      - Overview
      When mounting the attached crafted image and running program, this error is reported.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_07.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1279!
       RIP: 0010:read_node_page+0xcf/0xf0
       Call Trace:
        __get_node_page+0x6b/0x2f0
        f2fs_iget+0x8f/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_fchmodat+0x3e/0xa0
        __x64_sys_chmod+0x12/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      On below paths, we can have opportunity to readahead inode page
      - gc_node_segment -> f2fs_ra_node_page
      - gc_data_segment -> f2fs_ra_node_page
      - f2fs_fill_dentries -> f2fs_ra_node_page
      
      Unlike synchronized read, on readahead path, we can set page uptodate
      before verifying page's checksum, then read_node_page() will trigger
      kernel panic once it encounters a uptodated page w/ incorrect checksum.
      
      So considering readahead scenario, we have to do checksum each time
      when loading inode page even if it is uptodated.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b42b179b
    • C
      f2fs: fix to avoid panic in f2fs_remove_inode_page() · 8b6810f8
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203219
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_06.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1183!
       RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
       Call Trace:
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_remove_inode_page() will trigger kernel panic due to
      inconsistent i_blocks value of inode.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing of potential image corruption.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      8b6810f8
  11. 09 4月, 2019 1 次提交
    • G
      fs: mark expected switch fall-throughs · 0a4c9265
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enabling
      -Wimplicit-fallthrough.
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      0a4c9265
  12. 13 3月, 2019 1 次提交
  13. 16 2月, 2019 1 次提交
    • J
      f2fs: sync filesystem after roll-forward recovery · 812a9597
      Jaegeuk Kim 提交于
      Some works after roll-forward recovery can get an error which will release
      all the data structures. Let's flush them in order to make it clean.
      
      One possible corruption came from:
      
      [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
      [   90.675349] Call trace:
      [   90.677869]  __list_del_entry_valid+0x94/0xb4
      [   90.682351]  remove_dirty_inode+0xac/0x114
      [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
      [   90.691302]  f2fs_write_data_pages+0x40/0x4c
      [   90.695695]  do_writepages+0x80/0xf0
      [   90.699372]  __writeback_single_inode+0xdc/0x4ac
      [   90.704113]  writeback_sb_inodes+0x280/0x440
      [   90.708501]  wb_writeback+0x1b8/0x3d0
      [   90.712267]  wb_workfn+0x1a8/0x4d4
      [   90.715765]  process_one_work+0x1c0/0x3d4
      [   90.719883]  worker_thread+0x224/0x344
      [   90.723739]  kthread+0x120/0x130
      [   90.727055]  ret_from_fork+0x10/0x18
      Reported-by: NSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      812a9597
  14. 27 12月, 2018 2 次提交
    • C
      f2fs: check PageWriteback flag for ordered case · bae0ee7a
      Chao Yu 提交于
      For all ordered cases in f2fs_wait_on_page_writeback(), we need to
      check PageWriteback status, so let's clean up to relocate the check
      into f2fs_wait_on_page_writeback().
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      bae0ee7a
    • J
      f2fs: use kvmalloc, if kmalloc is failed · 5222595d
      Jaegeuk Kim 提交于
      One report says memalloc failure during mount.
      
       (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
       (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
       (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
       (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
       (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
       (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
       (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
       (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
       (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
       (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
       (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
       (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
       (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
       (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5222595d
  15. 14 12月, 2018 1 次提交
  16. 27 11月, 2018 2 次提交
  17. 21 10月, 2018 1 次提交
  18. 17 10月, 2018 3 次提交
    • C
      f2fs: remove unneeded disable_nat_bits() · 3b30eb19
      Chao Yu 提交于
      Commit 7735730d ("f2fs: fix to propagate error from __get_meta_page()")
      added disable_nat_bits() in error path of __get_nat_bitmaps(), but it's
      unneeded, beause we will fail mount, we won't have chance to change nid
      usage status w/o nat full/empty bitmaps.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      3b30eb19
    • C
      f2fs: fix to recover cold bit of inode block during POR · ef2a0071
      Chao Yu 提交于
      Testcase to reproduce this bug:
      1. mkfs.f2fs /dev/sdd
      2. mount -t f2fs /dev/sdd /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. sync
      5. chattr +A /mnt/f2fs/file
      6. xfs_io -f /mnt/f2fs/file -c "fsync"
      7. godown /mnt/f2fs
      8. umount /mnt/f2fs
      9. mount -t f2fs /dev/sdd /mnt/f2fs
      10. chattr -A /mnt/f2fs/file
      11. xfs_io -f /mnt/f2fs/file -c "fsync"
      12. umount /mnt/f2fs
      13. mount -t f2fs /dev/sdd /mnt/f2fs
      14. lsattr /mnt/f2fs/file
      
      -----------------N- /mnt/f2fs/file
      
      But actually, we expect the corrct result is:
      
      -------A---------N- /mnt/f2fs/file
      
      The reason is in step 9) we missed to recover cold bit flag in inode
      block, so later, in fsync, we will skip write inode block due to below
      condition check, result in lossing data in another SPOR.
      
      f2fs_fsync_node_pages()
      	if (!IS_DNODE(page) || !is_cold_node(page))
      		continue;
      
      Note that, I guess that some non-dir inode has already lost cold bit
      during POR, so in order to reenable recovery for those inode, let's
      try to recover cold bit in f2fs_iget() to save more fsynced data.
      
      Fixes: c5667575 ("f2fs: remove unneeded set_cold_node()")
      Cc: <stable@vger.kernel.org> 4.17+
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ef2a0071
    • C
      f2fs: submit cached bio to avoid endless PageWriteback · 48018b4c
      Chao Yu 提交于
      When migrating encrypted block from background GC thread, we only add
      them into f2fs inner bio cache, but forget to submit the cached bio, it
      may cause potential deadlock when we are waiting page writebacked, fix
      it.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      48018b4c
  19. 01 10月, 2018 1 次提交
    • C
      Revert: "f2fs: check last page index in cached bio to decide submission" · bab475c5
      Chao Yu 提交于
      There is one case that we can leave bio in f2fs, result in hanging
      page writeback waiter.
      
      Thread A				Thread B
      - f2fs_write_cache_pages
       - f2fs_submit_page_write
       page #0 cached in bio #0 of cold log
       - f2fs_submit_page_write
       page #1 cached in bio #1 of warm log
      					- f2fs_write_cache_pages
      					 - f2fs_submit_page_write
      					 bio is full, submit bio #1 contain page #1
       - f2fs_submit_merged_write_cond(, page #1)
       fail to submit bio #0 due to page #1 is not in any cached bios.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      bab475c5
  20. 29 9月, 2018 2 次提交
  21. 27 9月, 2018 1 次提交
    • C
      f2fs: fix to recover inode's crtime during POR · 5cd1f387
      Chao Yu 提交于
      Testcase to reproduce this bug:
      1. mkfs.f2fs -O extra_attr -O inode_crtime /dev/sdd
      2. mount -t f2fs /dev/sdd /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. xfs_io -f /mnt/f2fs/file -c "fsync"
      5. godown /mnt/f2fs
      6. umount /mnt/f2fs
      7. mount -t f2fs /dev/sdd /mnt/f2fs
      8. xfs_io -f /mnt/f2fs/file -c "statx -r"
      
      stat.btime.tv_sec = 0
      stat.btime.tv_nsec = 0
      
      This patch fixes to recover inode creation time fields during
      mount.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5cd1f387
  22. 21 9月, 2018 1 次提交
  23. 13 9月, 2018 1 次提交
  24. 15 8月, 2018 1 次提交
    • A
      f2fs: rework fault injection handling to avoid a warning · 7fa750a1
      Arnd Bergmann 提交于
      When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
      unused label:
      
      fs/f2fs/segment.c: In function '__submit_discard_cmd':
      fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]
      
      This could be fixed by adding another #ifdef around it, but the more
      reliable way of doing this seems to be to remove the other #ifdefs
      where that is easily possible.
      
      By defining time_to_inject() as a trivial stub, most of the checks for
      CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
      formatting of the code.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7fa750a1
  25. 14 8月, 2018 1 次提交
  26. 11 8月, 2018 2 次提交
    • C
      f2fs: fix to avoid broken of dnode block list · 50fa53ec
      Chao Yu 提交于
      f2fs recovery flow is relying on dnode block link list, it means fsynced
      file recovery depends on previous dnode's persistence in the list, so
      during fsync() we should wait on all regular inode's dnode writebacked
      before issuing flush.
      
      By this way, we can avoid dnode block list being broken by out-of-order
      IO submission due to IO scheduler or driver.
      
      Sheng Yong helps to do the test with this patch:
      
      Target:/data (f2fs, -)
      64MB / 32768KB / 4KB / 8
      
      1 / PERSIST / Index
      
      Base:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	867.82		204.15		41440.03	41370.54	680.8		1025.94		1031.08
      2	871.87		205.87		41370.3		40275.2		791.14		1065.84		1101.7
      3	866.52		205.69		41795.67	40596.16	694.69		1037.16		1031.48
      Avg	868.7366667	205.2366667	41535.33333	40747.3		722.21		1042.98		1054.753333
      
      After:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	798.81		202.5		41143		40613.87	602.71		838.08		913.83
      2	805.79		206.47		40297.2		41291.46	604.44		840.75		924.27
      3	814.83		206.17		41209.57	40453.62	602.85		834.66		927.91
      Avg	806.4766667	205.0466667	40883.25667	40786.31667	603.3333333	837.83		922.0033333
      
      Patched/Original:
      	0.928332713	0.999074239	0.984300676	1.000957528	0.835398753	0.803303994	0.874141189
      
      It looks like atomic write will suffer performance regression.
      
      I suspect that the criminal is that we forcing to wait all dnode being in
      storage cache before we issue PREFLUSH+FUA.
      
      BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
      cause the problem: we will lose data of last transaction after SPO, even if
      atomic write return no error:
      
      - atomic_open();
      - write() P1, P2, P3;
      - atomic_commit();
       - writeback data: P1, P2, P3;
       - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with fsync_mark is
      writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
      last transaction.
       - preflush + fua;
      - power-cut
      
      If we don't wait dnode writeback for atomic_write:
      
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	779.91		206.03		41621.5		40333.16	716.9		1038.21		1034.85
      2	848.51		204.35		40082.44	39486.17	791.83		1119.96		1083.77
      3	772.12		206.27		41335.25	41599.65	723.29		1055.07		971.92
      Avg	800.18		205.55		41013.06333	40472.99333	744.0066667	1071.08		1030.18
      
      Patched/Original:
      	0.92108464	1.001526693	0.987425886	0.993268102	1.030180511	1.026942031	0.976702294
      
      SQLite's performance recovers.
      
      Jaegeuk:
      "Practically, I don't see db corruption becase of this. We can excuse to lose
      the last transaction."
      
      Finally, we decide to keep original implementation of atomic write interface
      sematics that we don't wait all dnode writeback before preflush+fua submission.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      50fa53ec
    • J
      f2fs: avoid f2fs_bug_on() in cp_error case · 8d714f8a
      Jaegeuk Kim 提交于
      There is a subtle race condition to invoke f2fs_bug_on() in shutdown tests. I've
      confirmed that the last checkpoint is preserved in consistent state, so it'd be
      fine to just return error at this moment.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      8d714f8a
  27. 02 8月, 2018 1 次提交
    • C
      f2fs: let checkpoint flush dnode page of regular · fd8c8caf
      Chao Yu 提交于
      Fsyncer will wait on all dnode pages of regular writeback before flushing,
      if there are async dnode pages blocked by IO scheduler, it may decrease
      fsync's performance.
      
      In this patch, we choose to let f2fs_balance_fs_bg() to trigger checkpoint
      to flush these dnode pages of regular, so async IO of dnode page can be
      elimitnated, making fsyncer only need to wait for sync IO.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fd8c8caf