1. 09 5月, 2019 15 次提交
    • C
      f2fs: fix to do sanity check on valid block count of segment · e95bcdb2
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203233
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_13.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Kernel messages
       F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
       kernel BUG at fs/f2fs/segment.c:2102!
       RIP: 0010:update_sit_entry+0x394/0x410
       Call Trace:
        f2fs_allocate_data_block+0x16f/0x660
        do_write_page+0x62/0x170
        f2fs_do_write_node_page+0x33/0xa0
        __write_node_page+0x270/0x4e0
        f2fs_sync_node_pages+0x5df/0x670
        f2fs_write_checkpoint+0x372/0x1400
        f2fs_sync_fs+0xa3/0x130
        f2fs_do_sync_file+0x1a6/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      sit.vblocks and sum valid block count in sit.valid_map may be
      inconsistent, segment w/ zero vblocks will be treated as free
      segment, while allocating in free segment, we may allocate a
      free block, if its bitmap is valid previously, it can cause
      kernel crash due to bitmap verification failure.
      
      Anyway, to avoid further serious metadata inconsistence and
      corruption, it is necessary and worth to detect SIT
      inconsistence. So let's enable check_block_count() to verify
      vblocks and valid_map all the time rather than do it only
      CONFIG_F2FS_CHECK_FS is enabled.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e95bcdb2
    • C
      f2fs: fix to do sanity check on valid node/block count · 7b63f72f
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203229
      
      - Overview
      When mounting the attached crafted image, following errors are reported.
      Additionally, it hangs on sync after trying to mount it.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      sync
      
      - Kernel message
       kernel BUG at fs/f2fs/recovery.c:591!
       RIP: 0010:recover_data+0x12d8/0x1780
       Call Trace:
        f2fs_recover_fsync_data+0x613/0x710
        f2fs_fill_super+0x1043/0x1aa0
        mount_bdev+0x16d/0x1a0
        mount_fs+0x4a/0x170
        vfs_kern_mount+0x5d/0x100
        do_mount+0x200/0xcf0
        ksys_mount+0x79/0xc0
        __x64_sys_mount+0x1c/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      With corrupted image wihch has out-of-range valid node/block count, during
      recovery, once we failed due to no free space, it will trigger kernel
      panic.
      
      Adding sanity check on valid node/block count in f2fs_sanity_check_ckpt()
      to detect such condition, so that potential panic can be avoided.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7b63f72f
    • C
      f2fs: fix to avoid panic in do_recover_data() · 22d61e28
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203227
      
      - Overview
      When mounting the attached crafted image, following errors are reported.
      Additionally, it hangs on sync after trying to mount it.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/recovery.c:549!
       RIP: 0010:recover_data+0x167a/0x1780
       Call Trace:
        f2fs_recover_fsync_data+0x613/0x710
        f2fs_fill_super+0x1043/0x1aa0
        mount_bdev+0x16d/0x1a0
        mount_fs+0x4a/0x170
        vfs_kern_mount+0x5d/0x100
        do_mount+0x200/0xcf0
        ksys_mount+0x79/0xc0
        __x64_sys_mount+0x1c/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During recovery, if ofs_of_node is inconsistent in between recovered
      node page and original checkpointed node page, let's just fail recovery
      instead of making kernel panic.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      22d61e28
    • C
      f2fs: fix to do sanity check on free nid · 626bcf2b
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203225
      
      - Overview
      When mounting the attached crafted image and unmounting it, following errors are reported.
      Additionally, it hangs on sync after unmounting.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      touch test/t
      umount test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:3073!
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
       Call Trace:
        f2fs_put_super+0xf4/0x270
        generic_shutdown_super+0x62/0x110
        kill_block_super+0x1c/0x50
        kill_f2fs_super+0xad/0xd0
        deactivate_locked_super+0x35/0x60
        cleanup_mnt+0x36/0x70
        task_work_run+0x75/0x90
        exit_to_usermode_loop+0x93/0xa0
        do_syscall_64+0xba/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
      
      NAT table is corrupted, so reserved meta/node inode ids were added into
      free list incorrectly, during file creation, since reserved id has cached
      in inode hash, so it fails the creation and preallocated nid can not be
      released later, result in kernel panic.
      
      To fix this issue, let's do nid boundary check during free nid loading.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      626bcf2b
    • C
      f2fs: fix to do checksum even if inode page is uptodate · b42b179b
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203221
      
      - Overview
      When mounting the attached crafted image and running program, this error is reported.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_07.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1279!
       RIP: 0010:read_node_page+0xcf/0xf0
       Call Trace:
        __get_node_page+0x6b/0x2f0
        f2fs_iget+0x8f/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_fchmodat+0x3e/0xa0
        __x64_sys_chmod+0x12/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      On below paths, we can have opportunity to readahead inode page
      - gc_node_segment -> f2fs_ra_node_page
      - gc_data_segment -> f2fs_ra_node_page
      - f2fs_fill_dentries -> f2fs_ra_node_page
      
      Unlike synchronized read, on readahead path, we can set page uptodate
      before verifying page's checksum, then read_node_page() will trigger
      kernel panic once it encounters a uptodated page w/ incorrect checksum.
      
      So considering readahead scenario, we have to do checksum each time
      when loading inode page even if it is uptodated.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b42b179b
    • C
      f2fs: fix to avoid panic in f2fs_remove_inode_page() · 8b6810f8
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203219
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_06.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1183!
       RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
       Call Trace:
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_remove_inode_page() will trigger kernel panic due to
      inconsistent i_blocks value of inode.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing of potential image corruption.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      8b6810f8
    • C
      f2fs: fix to clear dirty inode in error path of f2fs_iget() · 546d22f0
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203217
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_test_05.c
      mkdir test
      mount -t f2fs tmp.img test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/inode.c:707!
       RIP: 0010:f2fs_evict_inode+0x33f/0x3a0
       Call Trace:
        evict+0xba/0x180
        f2fs_iget+0x598/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_readlinkat+0x56/0x110
        __x64_sys_readlink+0x16/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During inode loading, __recover_inline_status() can recovery inode status
      and set inode dirty, once we failed in following process, it will fail
      the check in f2fs_evict_inode, result in trigger BUG_ON().
      
      Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      546d22f0
    • C
      f2fs: remove new blank line of f2fs kernel message · bda52397
      Chao Yu 提交于
      Just removing '\n' in f2fs_msg(, "\n") to avoid redundant new blank line.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      bda52397
    • C
      f2fs: fix wrong __is_meta_io() macro · 6dc3a126
      Chao Yu 提交于
      This patch changes codes as below:
      - don't use is_read_io() as a condition to judge the meta IO.
      - use .is_por to replace .is_meta to indicate IO is from recovery explicitly.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6dc3a126
    • C
      f2fs: fix to avoid panic in dec_valid_node_count() · ea6d7e72
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203213
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:2012!
       RIP: 0010:truncate_node+0x2c9/0x2e0
       Call Trace:
        f2fs_truncate_xattr_node+0xa1/0x130
        f2fs_remove_inode_page+0x82/0x2d0
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_node_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ea6d7e72
    • C
      f2fs: fix to avoid panic in dec_valid_block_count() · 5e159cd3
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203209
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_01.c
      ./run.sh f2fs
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:1788!
       RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
       Call Trace:
        f2fs_truncate_blocks+0x36d/0x3c0
        f2fs_truncate+0x88/0x110
        f2fs_setattr+0x3e1/0x460
        notify_change+0x2da/0x400
        do_truncate+0x6d/0xb0
        do_sys_ftruncate+0xf1/0x160
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_block_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5e159cd3
    • C
      f2fs: fix to use inline space only if inline_xattr is enable · 622927f3
      Chao Yu 提交于
      With below mkfs and mount option:
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
      MOUNT_OPTIONS -- -o noinline_xattr
      
      We may miss xattr data with below testcase:
      - mkdir dir
      - setfattr -n "user.name" -v 0 dir
      - for ((i = 0; i < 190; i++)) do touch dir/$i; done
      - umount
      - mount
      - getfattr -n "user.name" dir
      
      user.name: No such attribute
      
      The root cause is that we persist xattr data into reserved inline xattr
      space, even if inline_xattr is not enable in inline directory inode, after
      inline dentry conversion, reserved space no longer exists, so that xattr
      data missed.
      
      Let's use inline xattr space only if inline_xattr flag is set on inode
      to fix this iusse.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      622927f3
    • C
      f2fs: fix to retrieve inline xattr space · 45a74688
      Chao Yu 提交于
      With below mkfs and mount option, generic/339 of fstest will report that
      scratch image becomes corrupted.
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1
      MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs
      
      [ASSERT] (f2fs_check_dirent_position:1315)  --> Wrong position of dirent pino:1970, name: (...)
      level:8, dir_level:0, pgofs:951, correct range:[900, 901]
      
      In old kernel, inline data and directory always reserved 200 bytes in
      inode layout, even if inline_xattr is disabled, then new kernel tries
      to retrieve that space for non-inline xattr inode, but for inline dentry,
      its layout size should be fixed, so we just keep that reserved space.
      
      But the problem here is that, after inline dentry conversion, inline
      dentry layout no longer exists, if we still reserve inline xattr space,
      after dents updates, there will be a hole in inline xattr space, which
      can break hierarchy hash directory structure.
      
      This patch fixes this issue by retrieving inline xattr space after
      inline dentry conversion.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      45a74688
    • C
      f2fs: fix error path of recovery · 98838579
      Chao Yu 提交于
      There are some places in where we missed to unlock page or unlock page
      incorrectly, fix them.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      98838579
    • C
      f2fs: fix to avoid deadloop in foreground GC · 793ab1c8
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203211
      
      - Overview
      When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cd test
      touch t
      
      - Messages
      [   58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      ... (repeating)
      
      During GC, if segment type stored in SSA and SIT is inconsistent, we just
      skip migrating current segment directly, since we need to know the exact
      type to decide the migration function we use.
      
      So in foreground GC, we will easily run into a infinite loop as we may
      select the same victim segment which has inconsistent type due to greedy
      policy. In order to end up this, we choose to shutdown filesystem. For
      backgrond GC, we need to do that as well, so that we can avoid latter
      potential infinite looped foreground GC.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      793ab1c8
  2. 17 4月, 2019 2 次提交
  3. 06 4月, 2019 5 次提交
    • C
      f2fs: add comment for conditional compilation statement · e1074d4b
      Chao Yu 提交于
      Commit af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      added function is_journalled_quota() in f2fs.h, but it located outside of
      _LINUX_F2FS_H macro coverage, it has been fixed with commit 0af725fc
      ("f2fs: fix wrong #endif").
      
      But anyway, in order to avoid making same mistake latter, let's add single
      line comment to notice which #if the last #endif is corresponding to.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: Remove unnecessary empty EOL]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e1074d4b
    • C
      f2fs: fix potential recursive call when enabling data_flush · 186857c5
      Chao Yu 提交于
      As Hagbard Celine reported:
      
      Hi, this is a long standing bug that I've hit before on older kernels,
      but I was not able to get the syslog saved because of the nature of
      the bug. This time I had booted form a pen-drive, and was able to save
      the log to it's efi-partition.
      What i did to trigger it was to create a partition and format it f2fs,
      then mount it with options:
      "rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict".
      Then I unpacked a big .tar.xz to the partition (I used a
      gentoo-stage3-tarball as I was in process of installing Gentoo).
      
      Same options just without data_flush gives no problems.
      
      Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not
      properly unmounted. Some data may be corrupt. Please run fsck.
      Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest
      stack depth: 12064 bytes left
      Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at
      00000000a4b0733c (stack is 0000000056016422..0000000096e7463f)
      Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel: Call Trace:
      Mar 20 21:06:40 usbgentoo kernel:  read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  ? xas_load+0x8/0x50
      Mar 20 21:06:40 usbgentoo kernel:  __get_node_page+0x73/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_get_dnode_of_data+0x34e/0x580
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_inline_data+0x5e/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x421/0x690
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_cache_pages+0x1cf/0x460
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_data_pages+0x2b3/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_chksum_verify+0x1d/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  ? read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  do_writepages+0x3c/0xd0
      Mar 20 21:06:40 usbgentoo kernel:  __filemap_fdatawrite_range+0x7c/0xb0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_sync_dirty_inodes+0xf2/0x200
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs_bg+0x2a3/0x2c0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_dirtied+0x21/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs+0xd6/0x2b0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x4fb/0x690
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_single_inode+0x2a1/0x340
      Mar 20 21:06:40 usbgentoo kernel:  ? soft_cursor+0x1b4/0x220
      Mar 20 21:06:40 usbgentoo kernel:  writeback_sb_inodes+0x1d5/0x3e0
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_inodes_wb+0x58/0xa0
      Mar 20 21:06:40 usbgentoo kernel:  wb_writeback+0x250/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? 0xffffffff8c000000
      Mar 20 21:06:40 usbgentoo kernel:  ? cpumask_next+0x16/0x20
      Mar 20 21:06:40 usbgentoo kernel:  wb_workfn+0x2f6/0x3b0
      Mar 20 21:06:40 usbgentoo kernel:  ? __switch_to_asm+0x40/0x70
      Mar 20 21:06:40 usbgentoo kernel:  process_one_work+0x1f5/0x3f0
      Mar 20 21:06:40 usbgentoo kernel:  worker_thread+0x28/0x3c0
      Mar 20 21:06:40 usbgentoo kernel:  ? rescuer_thread+0x330/0x330
      Mar 20 21:06:40 usbgentoo kernel:  kthread+0x10e/0x130
      Mar 20 21:06:40 usbgentoo kernel:  ? kthread_create_on_node+0x60/0x60
      Mar 20 21:06:40 usbgentoo kernel:  ret_from_fork+0x35/0x40
      
      The root cause is that we run into an infinite recursive calling in
      between f2fs_balance_fs_bg and writepage() as described below:
      
      - f2fs_write_data_pages		--- A
       - __write_data_page
        - f2fs_balance_fs
         - f2fs_balance_fs_bg		--- B
          - f2fs_sync_dirty_inodes
           - filemap_fdatawrite
            - f2fs_write_data_pages	--- A
      ...
                - f2fs_balance_fs_bg	--- B
      ...
      
      In order to fix this issue, let's detect such condition in __write_data_page()
      and just skip calling f2fs_balance_fs() recursively.
      Reported-by: NHagbard Celine <hagbardcelin@gmail.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      186857c5
    • D
      f2fs: improve discard handling with multi-device volumes · 7f3d7719
      Damien Le Moal 提交于
      f2fs_hw_support_discard() only tests if the super block device supports
      discard. However, for a multi-device volume, not all disks used may
      support discard. Improve the check performed to test all devices of
      the volume and report discard as supported if at least one device of
      the volume supports discard. To implement this, introduce the helper
      function f2fs_bdev_support_discard(), which returns true for zoned block
      devices (where discard is processed as a zone reset) and for regular
      disks supporting the discard command.
      
      f2fs_bdev_support_discard() is also used in __queue_discard_cmd() to
      handle discard command issuing for a particular device of the volume.
      That is, prevent issuing a discard command for block devices that do
      not support it.
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7f3d7719
    • D
      f2fs: Reduce zoned block device memory usage · 95175daf
      Damien Le Moal 提交于
      For zoned block devices, an array of zone types for each device is
      allocated and initialized in order to determine if a section is stored
      on a sequential zone (zone reset needed) or a conventional zone (no
      zone reset needed and regular discard applies). Considering this usage,
      the zone types stored in memory can be replaced with a bitmap to
      indicate an equivalent information, that is, if a zone is sequential or
      not. This reduces the memory usage for each zoned device by roughly 8:
      on a 14TB disk with zones of 256 MB, the zone type array consumes
      13x4KB pages while the bitmap uses only 2x4KB pages.
      
      This patch changes the f2fs_dev_info structure blkz_type field to the
      bitmap blkz_seq. Access to this bitmap is done using the helper
      function f2fs_blkz_is_seq(), which is a rewrite of the function
      get_blkz_type().
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      95175daf
    • D
      f2fs: Fix use of number of devices · 0916878d
      Damien Le Moal 提交于
      For a single device mount using a zoned block device, the zone
      information for the device is stored in the sbi->devs single entry
      array and sbi->s_ndevs is set to 1. This differs from a single device
      mount using a regular block device which does not allocate sbi->devs
      and sets sbi->s_ndevs to 0.
      
      However, sbi->s_devs == 0 condition is used throughout the code to
      differentiate a single device mount from a multi-device mount where
      sbi->s_ndevs is always larger than 1. This results in problems with
      single zoned block device volumes as these are treated as multi-device
      mounts but do not have the start_blk and end_blk information set. One
      of the problem observed is skipping of zone discard issuing resulting in
      write commands being issued to full zones or unaligned to a zone write
      pointer.
      
      Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
      regular block device mount) and sbi->s_ndevs == 1 (single zoned block
      device mount) in the same manner. This is done by introducing the
      helper function f2fs_is_multi_device() and using this helper in place
      of direct tests of sbi->s_ndevs value, improving code readability.
      
      Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0916878d
  4. 04 4月, 2019 1 次提交
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 8ed86627
      Linus Torvalds 提交于
      Pull HID fixes from Jiri Kosina:
      
       - build dependency fix for hid-asus from Arnd Bergmann
      
       - addition of omitted mapping of _ASSISTANT key from Dmitry Torokhov
      
       - race condition fix in hid-debug inftastructure from He, Bo
      
       - fixed support for devices with big maximum report size from Kai-Heng
         Feng
      
       - deadlock fix in hid-steam from Rodrigo Rivas Costa
      
       - quite a few device-specific quirks
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: input: add mapping for Assistant key
        HID: i2c-hid: Disable runtime PM on Synaptics touchpad
        HID: quirks: Fix keyboard + touchpad on Lenovo Miix 630
        HID: logitech: Handle 0 scroll events for the m560
        HID: debug: fix race condition with between rdesc_show() and device removal
        HID: logitech: check the return value of create_singlethread_workqueue
        HID: Increase maximum report size allowed by hid_field_extract()
        HID: steam: fix deadlock with input devices.
        HID: uclogic: remove redudant duplicated null check on ver_ptr
        HID: quirks: Drop misused kernel-doc annotation
        HID: hid-asus: select CONFIG_POWER_SUPPLY
        HID: quirks: use correct format chars in dbg_hid
      8ed86627
  5. 03 4月, 2019 4 次提交
  6. 02 4月, 2019 1 次提交
    • J
      signal: don't silently convert SI_USER signals to non-current pidfd · 556a888a
      Jann Horn 提交于
      The current sys_pidfd_send_signal() silently turns signals with explicit
      SI_USER context that are sent to non-current tasks into signals with
      kernel-generated siginfo.
      This is unlike do_rt_sigqueueinfo(), which returns -EPERM in this case.
      If a user actually wants to send a signal with kernel-provided siginfo,
      they can do that with pidfd_send_signal(pidfd, sig, NULL, 0); so allowing
      this case is unnecessary.
      
      Instead of silently replacing the siginfo, just bail out with an error;
      this is consistent with other interfaces and avoids special-casing behavior
      based on security checks.
      
      Fixes: 3eb39f47 ("signal: add pidfd_send_signal() syscall")
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NChristian Brauner <christian@brauner.io>
      556a888a
  7. 01 4月, 2019 7 次提交
  8. 31 3月, 2019 5 次提交
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 63fc9c23
      Linus Torvalds 提交于
      Pull KVM fixes from Paolo Bonzini:
       "A collection of x86 and ARM bugfixes, and some improvements to
        documentation.
      
        On top of this, a cleanup of kvm_para.h headers, which were exported
        by some architectures even though they not support KVM at all. This is
        responsible for all the Kbuild changes in the diffstat"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
        Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION
        KVM: doc: Document the life cycle of a VM and its resources
        KVM: selftests: complete IO before migrating guest state
        KVM: selftests: disable stack protector for all KVM tests
        KVM: selftests: explicitly disable PIE for tests
        KVM: selftests: assert on exit reason in CR4/cpuid sync test
        KVM: x86: update %rip after emulating IO
        x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init
        kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs
        KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
        kvm: don't redefine flags as something else
        kvm: mmu: Used range based flushing in slot_handle_level_range
        KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported
        KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()
        kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields
        KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
        KVM: Reject device ioctls from processes other than the VM's creator
        KVM: doc: Fix incorrect word ordering regarding supported use of APIs
        KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'
        KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT
        ...
      63fc9c23
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 915ee0da
      Linus Torvalds 提交于
      Pull x86 fixes from Thomas Gleixner:
       "A pile of x86 updates:
      
         - Prevent exceeding he valid physical address space in the /dev/mem
           limit checks.
      
         - Move all header content inside the header guard to prevent compile
           failures.
      
         - Fix the bogus __percpu annotation in this_cpu_has() which makes
           sparse very noisy.
      
         - Disable switch jump tables completely when retpolines are enabled.
      
         - Prevent leaking the trampoline address"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/realmode: Make set_real_mode_mem() static inline
        x86/cpufeature: Fix __percpu annotation in this_cpu_has()
        x86/mm: Don't exceed the valid physical address space
        x86/retpolines: Disable switch jump tables when retpolines are enabled
        x86/realmode: Don't leak the trampoline kernel address
        x86/boot: Fix incorrect ifdeffery scope
        x86/resctrl: Remove unused variable
      915ee0da
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 590627f7
      Linus Torvalds 提交于
      Pull perf tooling fixes from Thomas Gleixner:
       "Core libraries:
         - Fix max perf_event_attr.precise_ip detection.
         - Fix parser error for uncore event alias
         - Fixup ordering of kernel maps after obtaining the main kernel map
           address.
      
        Intel PT:
         - Fix TSC slip where A TSC packet can slip past MTC packets so that
           the timestamp appears to go backwards.
         - Fixes for exported-sql-viewer GUI conversion to python3.
      
        ARM coresight:
         - Fix the build by adding a missing case value for enumeration value
           introduced in newer library, that now is the required one.
      
        tool headers:
         - Syncronize kernel headers with the kernel, getting new io_uring and
           pidfd_send_signal syscalls so that 'perf trace' can handle them"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf pmu: Fix parser error for uncore event alias
        perf scripts python: exported-sql-viewer.py: Fix python3 support
        perf scripts python: exported-sql-viewer.py: Fix never-ending loop
        perf machine: Update kernel map address and re-order properly
        tools headers uapi: Sync powerpc's asm/kvm.h copy with the kernel sources
        tools headers: Update x86's syscall_64.tbl and uapi/asm-generic/unistd
        tools headers uapi: Update drm/i915_drm.h
        tools arch x86: Sync asm/cpufeatures.h with the kernel sources
        tools headers uapi: Sync linux/fcntl.h to get the F_SEAL_FUTURE_WRITE addition
        tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h
        perf evsel: Fix max perf_event_attr.precise_ip detection
        perf intel-pt: Fix TSC slip
        perf cs-etm: Add missing case value
      590627f7
    • L
      Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c29d8541
      Linus Torvalds 提交于
      Pull CPU hotplug fixes from Thomas Gleixner:
       "Two SMT/hotplug related fixes:
      
         - Prevent crash when HOTPLUG_CPU is disabled and the CPU bringup
           aborts. This is triggered with the 'nosmt' command line option, but
           can happen by any abort condition. As the real unplug code is not
           compiled in, prevent the fail by keeping the CPU in zombie state.
      
         - Enforce HOTPLUG_CPU for SMP on x86 to avoid the above situation
           completely. With 'nosmt' being a popular option it's required to
           unplug the half brought up sibling CPUs (due to the MCE wreckage)
           completely"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
        cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
      c29d8541
    • L
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 573efdc5
      Linus Torvalds 提交于
      Pull locking fixlet from Thomas Gleixner:
       "Trivial update to the maintainers file"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MAINTAINERS: Remove deleted file from futex file pattern
      573efdc5