1. 03 Jun, 2021 (1 commit)
    • f2fs: fix to avoid out-of-bounds memory access · 1fcf6d1b
      Chao Yu committed
      stable inclusion
      from stable-5.10.36
      commit 9aa4602237d535b83c579eb752e8fc1c3e7e7055
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit b862676e upstream.
      
      butt3rflyh4ck <butterflyhuangxx@gmail.com> reported a bug found by
      syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]:
      
       dump_stack+0xfa/0x151 lib/dump_stack.c:120
       print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232
       __kasan_report mm/kasan/report.c:399 [inline]
       kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
       f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline]
       current_nat_addr fs/f2fs/node.h:213 [inline]
       get_next_nat_page fs/f2fs/node.c:123 [inline]
       __flush_nat_entry_set fs/f2fs/node.c:2888 [inline]
       f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991
       f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640
       f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807
       f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454
       __sync_filesystem fs/sync.c:39 [inline]
       sync_filesystem fs/sync.c:67 [inline]
       sync_filesystem+0x1b5/0x260 fs/sync.c:48
       generic_shutdown_super+0x70/0x370 fs/super.c:448
       kill_block_super+0x97/0xf0 fs/super.c:1394
      
      The root cause is that if a NAT entry in the checkpoint journal area is
      corrupted, e.g. the nid of a journalled NAT entry exceeds the max nid value,
      then during checkpoint, when it flushes the NAT journal to the NAT area,
      get_next_nat_page() may access out-of-bounds memory in nat_bitmap because
      it uses the invalid nid value as a bitmap offset.
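      A minimal userspace sketch of the missing guard, under a simplified model: the bitmap has one bit per NAT block, bits are numbered MSB-first within each byte (which is how f2fs_test_bit() appears to number them), and a NAT block holds 455 entries (a 4096-byte block of 9-byte entries). ENTRIES_PER_NAT_BLOCK, test_bit_msb() and nat_block_bit() are hypothetical names, not the f2fs code; the point is only that a nid must be checked against max_nid before it becomes a bitmap offset.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t nid_t;

/* Assumed: 4096-byte NAT block / 9-byte NAT entry = 455 entries per block. */
#define ENTRIES_PER_NAT_BLOCK 455u

/* Test bit @nr in a byte array, numbering bits MSB-first within each byte. */
static bool test_bit_msb(const uint8_t *map, size_t nr)
{
    return map[nr / CHAR_BIT] & (0x80u >> (nr % CHAR_BIT));
}

/* Return the bitmap bit (0 or 1) of the NAT block holding @nid, or -1 if
 * @nid is out of range. The bitmap is assumed to cover every NAT block
 * below max_nid, so rejecting nid >= max_nid keeps the access in bounds. */
static int nat_block_bit(const uint8_t *nat_bitmap, nid_t max_nid, nid_t nid)
{
    if (nid >= max_nid)     /* corrupted journal entry: refuse to index */
        return -1;
    size_t block_off = nid / ENTRIES_PER_NAT_BLOCK;
    return test_bit_msb(nat_bitmap, block_off) ? 1 : 0;
}
```

      With a 1-byte bitmap covering 8 NAT blocks (max_nid = 3640), a bogus nid such as 100000 yields -1 instead of a read past the end of the bitmap.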
      
      [1] https://lore.kernel.org/lkml/CAFcO6XOMWdr8pObek6eN6-fs58KG9doRFadgJj-FnF-1x43s2g@mail.gmail.com/T/#u
      Reported-and-tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  2. 18 Jan, 2021 (1 commit)
  3. 12 Jan, 2021 (1 commit)
  4. 14 Oct, 2020 (1 commit)
    • f2fs: handle errors of f2fs_get_meta_page_nofail · 86f33603
      Jaegeuk Kim committed
      The first problem is that we hit BUG_ON() in f2fs_get_sum_page() given EIO on
      f2fs_get_meta_page_nofail().
      
      The quick fix was to not return any error and loop infinitely instead, but
      syzbot caught a case where a fuzzed image drives it into that loop. It turned
      out we abused f2fs_get_meta_page_nofail() as in the call stack below.
      
      - f2fs_fill_super
       - f2fs_build_segment_manager
        - build_sit_entries
         - get_current_sit_page
      
      INFO: task syz-executor178:6870 can't die for more than 143 seconds.
      task:syz-executor178 state:R
       stack:26960 pid: 6870 ppid:  6869 flags:0x00004006
      Call Trace:
      
      Showing all locks held in the system:
      1 lock held by khungtaskd/1179:
       #0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
      1 lock held by systemd-journal/3920:
      1 lock held by in:imklog/6769:
       #0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
      1 lock held by syz-executor178/6870:
       #0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229
      
      Actually, we didn't have to use _nofail in this case, since we could already
      return an error to mount(2) with the error handler.
      
      As a result, this patch tries to 1) remove as many _nofail callers as possible,
      and 2) handle the error case in the last remaining caller, f2fs_get_sum_page().
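      A sketch of the error-propagating pattern the patch moves toward. read_meta_fn, build_entries() and the two example readers are hypothetical stand-ins for a metadata read such as the one behind get_current_sit_page(), not the f2fs API. Instead of retrying forever like a _nofail helper, the loop returns the first error so that the caller (ultimately mount(2)) can fail cleanly.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical metadata reader: sets *out and returns 0, or returns a negative errno. */
typedef int (*read_meta_fn)(unsigned int blkaddr, void **out);

/* Read @nr metadata blocks starting at @first. On the first failure, return
 * the error instead of retrying, so the caller (e.g. the mount path) can bail out. */
static int build_entries(read_meta_fn read_fn, unsigned int first, unsigned int nr)
{
    for (unsigned int i = 0; i < nr; i++) {
        void *page = NULL;
        int err = read_fn(first + i, &page);

        if (err)
            return err;
        (void)page;         /* ... parse and release the page here ... */
    }
    return 0;
}

/* Example readers for exercising the sketch. */
static char dummy_page[4096];

static int read_always_ok(unsigned int blkaddr, void **out)
{
    (void)blkaddr;
    *out = dummy_page;
    return 0;
}

static int read_fails_at_5(unsigned int blkaddr, void **out)
{
    if (blkaddr == 5)
        return -EIO;        /* simulate an unreadable block in a fuzzed image */
    *out = dummy_page;
    return 0;
}
```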
      
      Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  5. 29 Sep, 2020 (1 commit)
  6. 15 Sep, 2020 (1 commit)
  7. 09 Sep, 2020 (1 commit)
    • f2fs: fix indefinite loop scanning for free nid · e2cab031
      Sahitya Tummala committed
      If the sbi->ckpt->next_free_nid is not NAT block aligned and if there
      are free nids in that NAT block between the start of the block and
      next_free_nid, then those free nids will not be scanned in scan_nat_page().
      This results in a mismatch between nm_i->available_nids and the sum of
      nm_i->free_nid_count over all scanned NAT blocks: nm_i->available_nids
      will always be greater than the sum of free nids in all the blocks.
      Under this condition, if we use all the currently scanned free nids,
      then it will loop forever in f2fs_alloc_nid() as nm_i->available_nids
      is still not zero but nm_i->free_nid_count of that partially scanned
      NAT block is zero.
      
      Fix this by aligning nm_i->next_scan_nid to the first nid of the
      corresponding NAT block.
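      The alignment itself is a round-down to a NAT block boundary. A minimal sketch, assuming 455 NAT entries per 4 KiB block (f2fs names this constant NAT_ENTRY_PER_BLOCK); align_scan_start() is a hypothetical helper, not the upstream change:

```c
#include <stdint.h>

typedef uint32_t nid_t;

/* Assumed: 4096-byte NAT block / 9-byte NAT entry = 455 entries per block. */
#define ENTRIES_PER_NAT_BLOCK 455u

/* Round @next_free_nid down to the first nid of its NAT block, so that a scan
 * starting there also visits the free nids that precede next_free_nid. */
static nid_t align_scan_start(nid_t next_free_nid)
{
    return (next_free_nid / ENTRIES_PER_NAT_BLOCK) * ENTRIES_PER_NAT_BLOCK;
}
```

      For example, a next_free_nid of 1000 lies in the third NAT block (nids 910 to 1364), so the scan starts at 910 and also sees free nids 910 to 999.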
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  8. 24 Aug, 2020 (1 commit)
  9. 26 Jul, 2020 (1 commit)
  10. 24 Jul, 2020 (1 commit)
  11. 22 Jul, 2020 (1 commit)
    • f2fs: should avoid inode eviction in synchronous path · b0f3b87f
      Jaegeuk Kim committed
      https://bugzilla.kernel.org/show_bug.cgi?id=208565
      
      PID: 257    TASK: ecdd0000  CPU: 0   COMMAND: "init"
        #0 [<c0b420ec>] (__schedule) from [<c0b423c8>]
        #1 [<c0b423c8>] (schedule) from [<c0b459d4>]
        #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>]
        #3 [<c0b44fa0>] (down_read) from [<c044233c>]
        #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>]
        #5 [<c0442890>] (f2fs_truncate) from [<c044d408>]
        #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>]
        #7 [<c030be18>] (evict) from [<c030a558>]
        #8 [<c030a558>] (iput) from [<c047c600>]
        #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>]
       #10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>]
       #11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>]
       #12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>]
       #13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>]
       #14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>]
       #15 [<c0324294>] (do_fsync) from [<c0324014>]
       #16 [<c0324014>] (sys_fsync) from [<c0108bc0>]
      
      This can be caused by flush_dirty_inode() in f2fs_sync_node_pages(), where
      iput() requires f2fs_lock_op() again, resulting in a livelock.
      Reported-by: Zhiguo Niu <Zhiguo.Niu@unisoc.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  12. 09 Jul, 2020 (1 commit)
  13. 08 Jul, 2020 (2 commits)
  14. 09 Jun, 2020 (1 commit)
    • f2fs: don't return vmalloc() memory from f2fs_kmalloc() · 0b6d4ca0
      Eric Biggers committed
      kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
      kmalloc'ed or vmalloc'ed memory.  But the f2fs wrappers, f2fs_kmalloc()
      and f2fs_kvmalloc(), both return both kinds of memory.
      
      It's redundant to have two functions that do the same thing, and also
      breaking the standard naming convention is causing bugs since people
      assume it's safe to kfree() memory allocated by f2fs_kmalloc().  See
      e.g. the various allocations in fs/f2fs/compress.c.
      
      Fix this by making f2fs_kmalloc() just use kmalloc().  And to avoid
      re-introducing the allocation failures that the vmalloc fallback was
      intended to fix, convert the largest allocations to use f2fs_kvmalloc().
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  15. 25 May, 2020 (1 commit)
  16. 12 May, 2020 (2 commits)
    • f2fs: Avoid double lock for cp_rwsem during checkpoint · 34c061ad
      Sayali Lokhande committed
      There could be a scenario where f2fs_sync_node_pages() gets
      called during checkpoint, which in turn tries to flush
      inline data and calls iput(). This results in a deadlock, as
      iput() tries to take cp_rwsem, which block_operations() already
      holds from the beginning of the checkpoint.
      
      Call stack :
      
      Thread A		Thread B
      f2fs_write_checkpoint()
      - block_operations(sbi)
       - f2fs_lock_all(sbi);
        - down_write(&sbi->cp_rwsem);
      
                              - open()
                               - igrab()
                              - write() write inline data
                              - unlink()
      - f2fs_sync_node_pages()
       - if (is_inline_node(page))
        - flush_inline_data()
         - ilookup()
           page = f2fs_pagecache_get_page()
           if (!page)
            goto iput_out;
           iput_out:
      			-close()
      			-iput()
             iput(inode);
             - f2fs_evict_inode()
              - f2fs_truncate_blocks()
               - f2fs_lock_op()
                 - down_read(&sbi->cp_rwsem);
      
      Fixes: 2049d4fc ("f2fs: avoid multiple node page writes due to inline_data")
      Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: shrink spinlock coverage · 042be373
      Chao Yu committed
      In f2fs_try_to_free_nids(), the .nid_list_lock spinlock critical region
      grows as the expected shrink number increases. To avoid keeping other CPUs
      spinning for a long time, we change to releasing nid caches in small
      batches, taking .nid_list_lock separately for each batch.
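      A userspace sketch of the batching idea, assuming a simple linked-list cache and a pthread mutex in place of the .nid_list_lock spinlock. struct nid_cache, shrink_nid_cache() and SHRINK_BATCH (8 here) are hypothetical; the kernel's own batch size and data structures may differ. Because the lock is dropped after every batch, the longest critical section is bounded by the batch size rather than by the total shrink count.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical free-nid cache: a singly linked list protected by one lock. */
struct nid_entry {
    struct nid_entry *next;
};

struct nid_cache {
    pthread_mutex_t lock;   /* stand-in for the .nid_list_lock spinlock */
    struct nid_entry *head;
    int count;
};

#define SHRINK_BATCH 8      /* illustrative batch size, not taken from f2fs */

/* Free up to @nr_shrink entries, holding the lock for at most SHRINK_BATCH
 * frees at a time so that other waiters can get in between batches. */
static int shrink_nid_cache(struct nid_cache *c, int nr_shrink)
{
    int freed = 0;

    while (freed < nr_shrink) {
        int batch = 0;

        pthread_mutex_lock(&c->lock);
        while (c->head && batch < SHRINK_BATCH && freed < nr_shrink) {
            struct nid_entry *e = c->head;

            c->head = e->next;
            c->count--;
            free(e);
            batch++;
            freed++;
        }
        pthread_mutex_unlock(&c->lock);

        if (batch == 0)     /* cache is already empty */
            break;
    }
    return freed;
}
```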
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  17. 18 Apr, 2020 (1 commit)
  18. 31 Mar, 2020 (1 commit)
    • f2fs: don't trigger data flush in foreground operation · 7bcd0cfa
      Chao Yu committed
      Data flush can generate heavy IO and cause long latency during
      flush, so it's not appropriate to trigger it in foreground
      operation.
      
      We may also hit the following potential deadlock during data flush:
      - f2fs_write_multi_pages
       - f2fs_write_raw_pages
        - f2fs_write_single_data_page
         - f2fs_balance_fs
          - f2fs_balance_fs_bg
           - f2fs_sync_dirty_inodes
            - filemap_fdatawrite   -- stuck on flush same cluster
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  19. 20 Mar, 2020 (3 commits)
  20. 11 Mar, 2020 (1 commit)
  21. 28 Feb, 2020 (2 commits)
  22. 04 Feb, 2020 (1 commit)
  23. 20 Nov, 2019 (1 commit)
  24. 08 Nov, 2019 (1 commit)
  25. 07 Sep, 2019 (2 commits)
    • f2fs: fix flushing node pages when checkpoint is disabled · 100c0655
      Jaegeuk Kim committed
      This patch fixes skipping node page writes when checkpoint is disabled.
      In this period, we can't rely on checkpoint to flush node pages.
      
      Fixes: fd8c8caf ("f2fs: let checkpoint flush dnode page of regular")
      Fixes: 4354994f ("f2fs: checkpoint disabling")
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to writeout dirty inode during node flush · 052a82d8
      Chao Yu committed
      As Eric reported:
      
      On xfstest generic/204 on f2fs, I'm getting a kernel BUG.
      
       allocate_segment_by_default+0x9d/0x100 [f2fs]
       f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
       do_write_page+0x62/0x110 [f2fs]
       f2fs_do_write_node_page+0x2b/0xa0 [f2fs]
       __write_node_page+0x2ec/0x590 [f2fs]
       f2fs_sync_node_pages+0x756/0x7e0 [f2fs]
       block_operations+0x25b/0x350 [f2fs]
       f2fs_write_checkpoint+0x104/0x1150 [f2fs]
       f2fs_sync_fs+0xa2/0x120 [f2fs]
       f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
       f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
       do_writepages+0x1c/0x70
       __writeback_single_inode+0x45/0x320
       writeback_sb_inodes+0x273/0x5c0
       wb_writeback+0xff/0x2e0
       wb_workfn+0xa1/0x370
       process_one_work+0x138/0x350
       worker_thread+0x4d/0x3d0
       kthread+0x109/0x140
      
      The root cause of this issue is that in a very small partition, e.g.
      in the generic/204 testcase of the fstests suite, the filesystem's free
      space is 50MB, so at most 12800 inline inodes can be written with the command:
      `echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i`,
      and the filesystem will then have:
      - 12800 dirty inline data pages
      - 12800 dirty inode pages
      - and 12800 dirty imeta (dirty inodes)
      
      When we flush node_inode's page cache, we also flush inline data with
      each inode page; however, the device then runs out of free space, so
      once a checkpoint is triggered there is no room for the huge number of
      imeta. At this point GC is useless, as there is no dirty segment at all.
      
      To fix this, we recognize inode pages while flushing node_inode's pages
      and update each inode page from its dirty inode, so that a later,
      separate imeta (dirty inode) flush can be avoided.
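      The 12800 figure is consistent with the free space if we assume the default 4 KiB F2FS block size and read 50MB as 50 MiB; with inline data, each small file needs only its inode block:

```latex
\frac{50\ \text{MiB}}{4\ \text{KiB per block}}
  = \frac{50 \times 1024\ \text{KiB}}{4\ \text{KiB}}
  = 12800\ \text{blocks}
```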
      Reported-and-tested-by: Eric Biggers <ebiggers@kernel.org>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  26. 23 Aug, 2019 (1 commit)
  27. 03 Jul, 2019 (2 commits)
  28. 31 May, 2019 (1 commit)
  29. 09 May, 2019 (5 commits)
    • f2fs: introduce DATA_GENERIC_ENHANCE · 93770ab7
      Chao Yu committed
      Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) only checked
      whether @blkaddr is located in the main area.
      
      That check is weak: a block address within the main area can still
      point to a block that is not valid in the segment info table. Since we
      cannot detect such a condition, we may suffer worse corruption as the
      system continues running.
      
      So this patch introduces DATA_GENERIC_ENHANCE to strengthen the sanity
      check by triggering a SIT bitmap check rather than only a range check.
      
      This patch also makes the following changes:
      - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
      - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
      - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
      - spread blkaddr check in:
       * f2fs_get_node_info()
       * __read_out_blkaddrs()
       * f2fs_submit_page_read()
       * ra_data_block()
       * do_recover_data()
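      The difference between the two checks can be sketched under a heavily simplified model in which the main area is one contiguous range with a single validity bitmap (the real SIT keeps a bitmap per segment). struct main_area, blkaddr_in_range() and blkaddr_valid_enhanced() are hypothetical names, not f2fs functions:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t block_t;

/* Hypothetical, simplified view of the main area and its validity bitmap. */
struct main_area {
    block_t start;              /* first block of the main area */
    block_t end;                /* one past the last block */
    const uint8_t *valid_map;   /* 1 bit per block, MSB-first, set if in use */
};

/* DATA_GENERIC-style check: is the address inside the main area at all? */
static bool blkaddr_in_range(const struct main_area *m, block_t blk)
{
    return blk >= m->start && blk < m->end;
}

/* DATA_GENERIC_ENHANCE-style check: in range AND marked valid in the bitmap. */
static bool blkaddr_valid_enhanced(const struct main_area *m, block_t blk)
{
    if (!blkaddr_in_range(m, blk))
        return false;
    block_t off = blk - m->start;
    return m->valid_map[off / 8] & (0x80u >> (off % 8));
}
```

      In this model, a block inside the main area whose bit is clear passes blkaddr_in_range() but fails blkaddr_valid_enhanced(), which is the case a range-only check misses.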
      
      This patch can fix the bugs reported in the following bugzilla entries:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203215
      https://bugzilla.kernel.org/show_bug.cgi?id=203223
      https://bugzilla.kernel.org/show_bug.cgi?id=203231
      https://bugzilla.kernel.org/show_bug.cgi?id=203235
      https://bugzilla.kernel.org/show_bug.cgi?id=203241
      
      = Update by Jaegeuk Kim =
      
      DATA_GENERIC_ENHANCE was enhanced to validate block addresses on read/write paths.
      But xfstest/generic/446 complains about generated kernel messages saying an invalid
      bitmap was detected when reading a block. The reason is that when we get
      block addresses from extent_cache, there is no lock to synchronize against
      truncation of those blocks in parallel.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: allow address pointer number of dnode aligning to specified size · d02a6e61
      Chao Yu committed
      This patch expands the scalability of the dnode layout: it allows the
      number of address pointers in a dnode to be aligned to a specified size
      (now, the size is one byte by default), and later the number can be
      aligned to the compress cluster size (1 << n bytes, n=[2,..)). This avoids
      a cluster spanning two dnodes and keeps the design of the compress meta
      layout simple.
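      Aligning the pointer count amounts to rounding it down to a multiple of the alignment. A minimal sketch, assuming a power-of-two alignment given as its log2 and a count of 1018 addresses per direct node block (the value commonly cited for f2fs, treated here as an assumption); align_addrs() is a hypothetical helper:

```c
/* Round @addrs down to a multiple of (1 << log_align), e.g. a compress
 * cluster size. log_align = 0 (alignment of 1) leaves @addrs unchanged. */
static unsigned int align_addrs(unsigned int addrs, unsigned int log_align)
{
    unsigned int align = 1u << log_align;

    return addrs & ~(align - 1u);
}
```

      With 1018 addresses, a cluster of 4 (log_align = 2) leaves 1016 usable pointers and a cluster of 16 (log_align = 4) leaves 1008, so no cluster straddles two dnodes.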
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on free nid · 626bcf2b
      Chao Yu committed
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203225
      
      - Overview
      When mounting the attached crafted image and unmounting it, the following errors are reported.
      Additionally, it hangs on sync after unmounting.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      touch test/t
      umount test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:3073!
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
       Call Trace:
        f2fs_put_super+0xf4/0x270
        generic_shutdown_super+0x62/0x110
        kill_block_super+0x1c/0x50
        kill_f2fs_super+0xad/0xd0
        deactivate_locked_super+0x35/0x60
        cleanup_mnt+0x36/0x70
        task_work_run+0x75/0x90
        exit_to_usermode_loop+0x93/0xa0
        do_syscall_64+0xba/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
      
      The NAT table is corrupted, so reserved meta/node inode ids were
      incorrectly added to the free list. During file creation, since the
      reserved id is already cached in the inode hash, the creation fails and
      the preallocated nid cannot be released later, resulting in a kernel panic.
      
      To fix this issue, let's do a nid boundary check during free nid loading.
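      A sketch of such a boundary check, assuming the default reserved inode numbers (node inode 1, meta inode 2, root inode 3) and a check of the form root_ino <= nid < max_nid; free_nid_in_range() and ROOT_INO are hypothetical names used for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t nid_t;

/* Assumed default layout: nids 1 and 2 are the reserved node and meta
 * inodes, and nid 3 is the root inode. */
#define ROOT_INO 3u

/* Only nids in [ROOT_INO, max_nid) may enter the free nid list, so the
 * reserved node/meta inode numbers and out-of-range values are rejected
 * even if a corrupted NAT entry marks them as free. */
static bool free_nid_in_range(nid_t nid, nid_t max_nid)
{
    return nid >= ROOT_INO && nid < max_nid;
}
```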
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do checksum even if inode page is uptodate · b42b179b
      Chao Yu committed
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203221
      
      - Overview
      When mounting the attached crafted image and running the program, this error is reported.
      
      The image is intentionally fuzzed from a normal f2fs image for testing, and I enabled the CONFIG_F2FS_CHECK_FS option.
      
      - Reproduces
      cc poc_07.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1279!
       RIP: 0010:read_node_page+0xcf/0xf0
       Call Trace:
        __get_node_page+0x6b/0x2f0
        f2fs_iget+0x8f/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_fchmodat+0x3e/0xa0
        __x64_sys_chmod+0x12/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      On the paths below, we may read ahead inode pages:
      - gc_node_segment -> f2fs_ra_node_page
      - gc_data_segment -> f2fs_ra_node_page
      - f2fs_fill_dentries -> f2fs_ra_node_page
      
      Unlike a synchronous read, the readahead path can set a page uptodate
      before verifying its checksum; read_node_page() will then trigger a
      kernel panic once it encounters an uptodate page with an incorrect checksum.
      
      So, considering the readahead scenario, we have to verify the checksum
      each time we load an inode page, even if it is already uptodate.
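      A sketch of the rule, assuming a toy page structure and a 32-bit FNV-1a checksum in place of the CRC-based checksum f2fs actually uses. struct cached_page, toy_csum() and load_node_page() are hypothetical; the key point is what is absent, namely an early return when the page is already uptodate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached node page: contents, the checksum stored with them,
 * and an uptodate flag that readahead may set before any verification. */
struct cached_page {
    bool uptodate;
    uint32_t stored_csum;
    uint8_t data[16];
};

/* Toy 32-bit FNV-1a checksum, standing in for the real f2fs checksum. */
static uint32_t toy_csum(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns 0 if the page may be used, -1 if its checksum does not match.
 * There is deliberately no "if (pg->uptodate) return 0;" shortcut, because
 * a page made uptodate by readahead has not necessarily been verified.
 * Read I/O for pages that are not yet uptodate is not modeled here. */
static int load_node_page(const struct cached_page *pg)
{
    if (toy_csum(pg->data, sizeof(pg->data)) != pg->stored_csum)
        return -1;
    return 0;
}
```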
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid panic in f2fs_remove_inode_page() · 8b6810f8
      Chao Yu committed
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203219
      
      - Overview
      When mounting the attached crafted image and running the program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing, and I enabled the CONFIG_F2FS_CHECK_FS option.
      
      - Reproduces
      cc poc_06.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1183!
       RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
       Call Trace:
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is that f2fs_remove_inode_page() triggers a kernel panic due
      to an inconsistent i_blocks value in the inode.
      
      To avoid the panic, let's just print a debug message and set SBI_NEED_FSCK
      to give fsck a hint for later repair of the potential image corruption.
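      A sketch of the log-and-flag approach, assuming hypothetical stand-ins: struct sb_state with a need_fsck flag playing the role of SBI_NEED_FSCK, and struct toy_inode. The expected block count is passed in by the caller, because the exact value f2fs checks is not stated here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the superblock flag and the inode field. */
struct sb_state {
    bool need_fsck;
};

struct toy_inode {
    unsigned long long i_blocks;
};

/* On eviction the inode is expected to own exactly @expected blocks.
 * Instead of crashing on a mismatch, log it and flag the filesystem so
 * that fsck can repair it later. */
static void check_blocks_on_evict(struct sb_state *sb, const struct toy_inode *inode,
                                  unsigned long long expected)
{
    if (inode->i_blocks != expected) {
        fprintf(stderr, "inconsistent i_blocks: %llu (expected %llu)\n",
                inode->i_blocks, expected);
        sb->need_fsck = true;   /* analogous to setting SBI_NEED_FSCK */
    }
}
```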
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>