1. 01 6月, 2018 40 次提交
    • Y
      fs: f2fs: insert space around that ':' and ', ' · 1061fd48
      youngjun yoo 提交于
      clean up checkpatch error:
      ERROR: space required after that ':'
      ERROR: space required after that ','
      Signed-off-by: Nyoungjun yoo <youngjun.willow@gmail.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1061fd48
    • Y
      fs: f2fs: add missing blank lines after declarations · f11e98bd
      youngjun yoo 提交于
      clean up checkpatch warning:
      WARNING: Missing a blank line after declarations
      Signed-off-by: Nyoungjun yoo <youngjun.willow@gmail.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f11e98bd
    • Y
      fs: f2fs: changed variable type of offset "unsigned" to "loff_t" · 193bea1d
      youngjun yoo 提交于
      clean up checkpatch warning:
      WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
      Signed-off-by: Nyoungjun yoo <youngjun.willow@gmail.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      193bea1d
    • C
      f2fs: clean up symbol namespace · 4d57b86d
      Chao Yu 提交于
      As Ted reported:
      
      "Hi, I was looking at f2fs's sources recently, and I noticed that there
      is a very large number of non-static symbols which don't have a f2fs
      prefix.  There's well over a hundred (see attached below).
      
      As one example, in fs/f2fs/dir.c there is:
      
      unsigned char get_de_type(struct f2fs_dir_entry *de)
      
      This function is clearly only useful for f2fs, but it has a generic
      name.  This means that if any other file system tries to have the same
      symbol name, there will be a symbol conflict and the kernel would not
      successfully build.  It also means that when someone is looking f2fs
      sources, it's not at all obvious whether a function such as
      read_data_page(), invalidate_blocks(), is a generic kernel function
      found in the fs, mm, or block layers, or a f2fs specific function.
      
      You might want to fix this at some point.  Hopefully Kent's bcachefs
      isn't similarly using genericly named functions, since that might
      cause conflicts with f2fs's functions --- but just as this would be a
      problem that we would rightly insist that Kent fix, this is something
      that we should have rightly insisted that f2fs should have fixed
      before it was integrated into the mainline kernel.
      
      acquire_orphan_inode
      add_ino_entry
      add_orphan_inode
      allocate_data_block
      allocate_new_segments
      alloc_nid
      alloc_nid_done
      alloc_nid_failed
      available_free_memory
      ...."
      
      This patch adds "f2fs_" prefix for all non-static symbols in order to:
      a) avoid conflict with other kernel generic symbols;
      b) to indicate the function is f2fs specific one instead of generic
      one;
      Reported-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4d57b86d
    • C
      f2fs: make set_de_type() static · 2e79d951
      Chao Yu 提交于
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2e79d951
    • C
      f2fs: make __f2fs_write_data_pages() static · fc99fe27
      Chao Yu 提交于
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fc99fe27
    • C
      f2fs: fix to avoid accessing cross the boundary · 9fd62605
      Chao Yu 提交于
      Configure io_bits with 2 and enable LFS mode, generic/017 reports below dmesg:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000039
      *pdpt = 000000002fcb2001 *pde = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: crc32_generic zram f2fs(O) bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi pcbc snd_seq joydev aesni_intel aes_i586 snd_seq_device snd_timer crypto_simd cryptd snd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic usbhid psmouse hid e1000
      CPU: 2 PID: 20779 Comm: xfs_io Tainted: G           O      4.17.0-rc2 #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: is_checkpointed_data+0x84/0xd0 [f2fs]
      EFLAGS: 00010207 CPU: 2
      EAX: 00000000 EBX: f5cd7000 ECX: fffffe32 EDX: 00000039
      ESI: 000001cd EDI: ec95fb6c EBP: e264bd80 ESP: e264bd6c
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      CR0: 80050033 CR2: 00000039 CR3: 2fe55660 CR4: 000406f0
      Call Trace:
       __exchange_data_block+0xb3f/0x1000 [f2fs]
       f2fs_fallocate+0xab9/0x16b0 [f2fs]
       vfs_fallocate+0x17c/0x2d0
       ksys_fallocate+0x42/0x70
       sys_fallocate+0x31/0x40
       do_fast_syscall_32+0xaa/0x22c
       entry_SYSENTER_32+0x4c/0x7b
      EIP: 0xb7f98c51
      EFLAGS: 00000293 CPU: 2
      EAX: ffffffda EBX: 00000003 ECX: 00000008 EDX: 01001000
      ESI: 00000000 EDI: 00001000 EBP: 00000000 ESP: bfc0357c
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Code: 00 00 d3 e8 8b 4d ec 2b 02 8b 55 f0 6b c0 1c 03 41 70 29 d6 8b 93 d0 06 00 00 8b 40 0c 83 ea 01 21 d6 89 f2 89 f1 c1 ea 03 f7 d1 <0f> be 14 10 83 e1 07 b8 01 00 00 00 d3 e0 85 c2 89 f8 0f 95 c3
      EIP: is_checkpointed_data+0x84/0xd0 [f2fs] SS:ESP: 0068:e264bd6c
      CR2: 0000000000000039
      ---[ end trace 9a4d4087cce6080a ]---
      
      This is because in recovery flow of __exchange_data_block, we didn't pass olen to
      __roll_back_blkaddrs, instead we passed len, which indicates wrong array size, result
      in copying random block address into dnode page.
      
      Later, once that random block address was accessed by is_checkpointed_data, it can
      cause NULL pointer dereference.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      9fd62605
    • C
      f2fs: fix to let caller retry allocating block address · fe16efe6
      Chao Yu 提交于
      Configure io_bits with 2 and enable LFS mode, generic/013 reports below dmesg:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000104
      *pdpt = 0000000029b7b001 *pde = 0000000000000000
      Oops: 0002 [#1] PREEMPT SMP
      Modules linked in: crc32_generic zram f2fs(O) rfcomm bnep bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev snd_seq_device aesni_intel snd_timer aes_i586 snd crypto_simd cryptd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000
      CPU: 0 PID: 11161 Comm: fsstress Tainted: G           O      4.17.0-rc2 #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs]
      EFLAGS: 00010206 CPU: 0
      EAX: e863dcd8 EBX: 00000000 ECX: 00000100 EDX: 00000200
      ESI: e863dcf4 EDI: f6f82768 EBP: e863dbb0 ESP: e863db74
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      CR0: 80050033 CR2: 00000104 CR3: 29a62020 CR4: 000406f0
      Call Trace:
       do_write_page+0x6f/0xc0 [f2fs]
       write_data_page+0x4a/0xd0 [f2fs]
       do_write_data_page+0x327/0x630 [f2fs]
       __write_data_page+0x34b/0x820 [f2fs]
       __f2fs_write_data_pages+0x42d/0x8c0 [f2fs]
       f2fs_write_data_pages+0x27/0x30 [f2fs]
       do_writepages+0x1a/0x70
       __filemap_fdatawrite_range+0x94/0xd0
       filemap_write_and_wait_range+0x3d/0xa0
       __generic_file_write_iter+0x11a/0x1f0
       f2fs_file_write_iter+0xdd/0x3b0 [f2fs]
       __vfs_write+0xd2/0x150
       vfs_write+0x9b/0x190
       ksys_write+0x45/0x90
       sys_write+0x16/0x20
       do_fast_syscall_32+0xaa/0x22c
       entry_SYSENTER_32+0x4c/0x7b
      EIP: 0xb7fc8c51
      EFLAGS: 00000246 CPU: 0
      EAX: ffffffda EBX: 00000003 ECX: 09cde000 EDX: 00001000
      ESI: 00000003 EDI: 00001000 EBP: 00000000 ESP: bfbded38
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Code: e8 f9 77 34 c9 8b 45 e0 8b 80 b8 00 00 00 39 45 d8 0f 84 bb 02 00 00 8b 45 e0 8b 80 b8 00 00 00 8d 50 d8 8b 08 89 55 f0 8b 50 04 <89> 51 04 89 0a c7 00 00 01 00 00 c7 40 04 00 02 00 00 8b 45 dc
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] SS:ESP: 0068:e863db74
      CR2: 0000000000000104
      ---[ end trace 4cac79c0d1305ee6 ]---
      
      allocate_data_block will submit all sequential pending IOs sorted by a
      FIFO list, If we failed to submit other user's IO due to unaligned write,
      we will retry to allocate new block address for current IO, then it will
      initialize fio.list again, if fio was in the list before, it can break
      FIFO list, result in above panic.
      
      Thread A			Thread B
      - do_write_page
       - allocate_data_block
        - list_add_tail
        : fioA cached in FIFO list.
      				- do_write_page
      				 - allocate_data_block
      				  - list_add_tail
      				  : fioB cached in FIFO list.
      				 - f2fs_submit_page_write
      				 : fail to submit IO
      				 - allocate_data_block
      				  - INIT_LIST_HEAD
       - f2fs_submit_page_write
        - list_del  <-- NULL pointer dereference
      
      This patch adds fio.retry parameter to indicate failure status for each
      IO, and avoid bailing out if there is still pending IO in FIFO list for
      fixing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fe16efe6
    • A
      disable loading f2fs module on PAGE_SIZE > 4KB · 4071e67c
      Anatoly Pugachev 提交于
      The following patch disables loading of f2fs module on architectures
      which have PAGE_SIZE > 4096 , since it is impossible to mount f2fs on
      such architectures , log messages are:
      
      mount: /mnt: wrong fs type, bad option, bad superblock on
      /dev/vdiskb1, missing codepage or helper program, or other error.
      /dev/vdiskb1: F2FS filesystem,
      UUID=1d8b9ca4-2389-4910-af3b-10998969f09c, volume name ""
      
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
      page_cache_size (8192), supports only 4KB
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
      filesystem in 1th superblock
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
      page_cache_size (8192), supports only 4KB
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
      filesystem in 2th superblock
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
      page_cache_size (8192), supports only 4KB
      
      which was introduced by git commit 5c9b4692
      
      tested on git kernel 4.17.0-rc6-00309-gec30dcf7
      
      with patch applied:
      
      modprobe: ERROR: could not insert 'f2fs': Invalid argument
      May 28 01:40:28 v215 kernel: F2FS not supported on PAGE_SIZE(8192) != 4096
      Signed-off-by: NAnatoly Pugachev <matorola@gmail.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4071e67c
    • C
      f2fs: fix error path of move_data_page · 14a28559
      Chao Yu 提交于
      This patch fixes error path of move_data_page:
      - clear cold data flag if it fails to write page.
      - redirty page for non-ENOMEM case.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      14a28559
    • C
      f2fs: don't drop dentry pages after fs shutdown · 1174abfd
      Chao Yu 提交于
      As description in commit "f2fs: don't drop any page on f2fs_cp_error()
      case":
      
      "We still provide readdir() after shtudown, so we should keep pages to
      avoid additional IOs."
      
      In order to provider lastest directory structure, let's keep dentry
      pages in cache after fs shutdown.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1174abfd
    • C
      f2fs: fix to avoid race during access gc_thread pointer · 250dbf51
      Chao Yu 提交于
      Thread A			Thread B
      - f2fs_remount
       - stop_gc_thread
      				- f2fs_sbi_store
         sbi->gc_thread = NULL;
      				  access sbi->gc_thread->gc_*
      
      Previously, we allocate memory for sbi->gc_thread based on background
      gc thread mount option, the memory can be released if we turn off
      that mount option, but still there are several places access gc_thread
      pointer without considering race condition, result in NULL point
      dereference.
      
      In order to fix this issue, use sb->s_umount to exclude those operations.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      250dbf51
    • C
      f2fs: clean up with clear_radix_tree_dirty_tag · aec2f729
      Chao Yu 提交于
      Introduce clear_radix_tree_dirty_tag to include common codes for cleanup.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      aec2f729
    • C
      f2fs: fix to don't trigger writeback during recovery · 64c74a7a
      Chao Yu 提交于
      - f2fs_fill_super
       - recover_fsync_data
        - recover_data
         - del_fsync_inode
          - iput
           - iput_final
            - write_inode_now
             - f2fs_write_inode
              - f2fs_balance_fs
               - f2fs_balance_fs_bg
                - sync_dirty_inodes
      
      With data_flush mount option, during recovery, in order to avoid entering
      above writeback flow, let's detect recovery status and do skip in
      f2fs_balance_fs_bg.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      64c74a7a
    • S
      f2fs: clear discard_wake earlier · 35a9a766
      Sheng Yong 提交于
      If SBI_NEED_FSCK is set, discard_wake will never be cleared. As a
      result, the condition of wait_event_interruptible_timeout() is always
      true, which gets discard thread run too frequently.
      Signed-off-by: NSheng Yong <shengyong1@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      35a9a766
    • Y
      f2fs: let discard thread wait a little longer if dev is busy · f9d1dced
      Yunlei He 提交于
      This patch modify discard thread wait policy as below:
      	issued       io_interrupted     wait time(ms)
      1.        8                 0               50
      2.      (0,8)               1               50
      3.        0                 1              500 (dev is busy)
      4.        0                 0            60000 (no candidates)
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f9d1dced
    • C
      f2fs: avoid stucking GC due to atomic write · 2ef79ecb
      Chao Yu 提交于
      f2fs doesn't allow abuse on atomic write class interface, so except
      limiting in-mem pages' total memory usage capacity, we need to limit
      atomic-write usage as well when filesystem is seriously fragmented,
      otherwise we may run into infinite loop during foreground GC because
      target blocks in victim segment are belong to atomic opened file for
      long time.
      
      Now, we will detect failure due to atomic write in foreground GC, if
      the count exceeds threshold, we will drop all atomic written data in
      cache, by this, I expect it can keep our system running safely to
      prevent Dos attack.
      
      In addition, his patch adds to show GC skip information in debugfs,
      now it just shows count of skipped caused by atomic write.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2ef79ecb
    • J
      f2fs: introduce sbi->gc_mode to determine the policy · 5b0e9539
      Jaegeuk Kim 提交于
      This is to avoid sbi->gc_thread pointer access.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5b0e9539
    • C
      f2fs: keep migration IO order in LFS mode · 107a805d
      Chao Yu 提交于
      For non-migration IO, we will keep order of data/node blocks' submitting
      as allocation sequence by sorting IOs in per log io_list list, but for
      migration IO, it could be out-of-order.
      
      In LFS mode, we should keep all IOs including migration IO be ordered,
      so that this patch fixes to add an additional lock to keep submitting
      order.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      107a805d
    • C
      f2fs: fix to wait page writeback during revoking atomic write · e5e5732d
      Chao Yu 提交于
      After revoking atomic write, related LBA can be reused by others, so we
      need to wait page writeback before reusing the LBA, in order to avoid
      interference between old atomic written in-flight IO and new IO.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e5e5732d
    • S
      f2fs: Fix deadlock in shutdown ioctl · 60b2b4ee
      Sahitya Tummala 提交于
      f2fs_ioc_shutdown() ioctl gets stuck in the below path
      when issued with F2FS_GOING_DOWN_FULLSYNC option.
      
      __switch_to+0x90/0xc4
      percpu_down_write+0x8c/0xc0
      freeze_super+0xec/0x1e4
      freeze_bdev+0xc4/0xcc
      f2fs_ioctl+0xc0c/0x1ce0
      f2fs_compat_ioctl+0x98/0x1f0
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      60b2b4ee
    • C
      f2fs: detect synchronous writeback more earlier · f8de4331
      Chao Yu 提交于
      This patch changes to detect synchronous writeback more earlier before,
      in order to avoid unnecessary page writeback before exiting asynchronous
      writeback.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f8de4331
    • C
      f2fs: clean up with is_valid_blkaddr() · 7b525dd0
      Chao Yu 提交于
      - rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
      - introduce is_valid_blkaddr() for cleanup.
      
      No logic change in this patch.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7b525dd0
    • C
      f2fs: fix to initialize min_mtime with ULLONG_MAX · 5ad25442
      Chao Yu 提交于
      Since sit_i.min_mtime's type is unsigned long long, so we should
      initialize it with max value of the type ULLONG_MAX instead of
      LLONG_MAX.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5ad25442
    • C
      f2fs: fix to let checkpoint guarantee atomic page persistence · e7a4feb0
      Chao Yu 提交于
      1. thread A: commit_inmem_pages submit data into block layer, but
      haven't waited it writeback.
      2. thread A: commit_inmem_pages update related node.
      3. thread B: do checkpoint, flush all nodes to disk.
      4. SPOR
      
      Then, atomic file becomes corrupted since nodes is flushed before data.
      
      This patch fixes to treat atomic page as checkpoint guaranteed one,
      then in checkpoint, we can make sure all atomic page can be writebacked
      with metadata of atomic file.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e7a4feb0
    • C
      f2fs: fix to initialize i_current_depth according to inode type · 1c41e680
      Chao Yu 提交于
      i_current_depth is used only for directory inode, but its space is
      shared with i_gc_failures field used for regular inode, in order to
      avoid affecting i_gc_failures' value, this patch fixes to initialize
      the union's fields according to inode type.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1c41e680
    • C
      Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc" · 299254d8
      Chao Yu 提交于
      For extreme case:
      10 section, op = 10%, no_fggc_threshold = 90%
      All section usage: 85% 85% 85% 85% 90% 90% 95% 95% 95% 95%
      
      During foreground GC, if we skip select dirty section whose usage
      is larger than no_fggc_threshold, we can only recycle 80% invalid
      space from four 85% usage sections and two 90% usage sections,
      result in encountering out-of-space issue.
      
      This reverts commit e93b9865 to
      fix this issue, besides, we keep the logic that we scan all dirty
      section when searching a victim, so that GC can select victim with
      least valid blocks.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      299254d8
    • J
      f2fs: don't drop any page on f2fs_cp_error() case · 868de613
      Jaegeuk Kim 提交于
      We still provide readdir() after shtudown, so we should keep pages to avoid
      additional IOs.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      868de613
    • C
      f2fs: fix spelling mistake: "extenstion" -> "extension" · 4580038e
      Colin Ian King 提交于
      Trivial fix to spelling mistake in extension list text
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4580038e
    • J
      f2fs: enhance sanity_check_raw_super() to avoid potential overflows · 0cfe75c5
      Jaegeuk Kim 提交于
      In order to avoid the below overflow issue, we should have checked the
      boundaries in superblock before reaching out to allocation. As Linus suggested,
      the right place should be sanity_check_raw_super().
      
      Dr Silvio Cesare of InfoSect reported:
      
      There are integer overflows with using the cp_payload superblock field in the
      f2fs filesystem potentially leading to memory corruption.
      
      include/linux/f2fs_fs.h
      
      struct f2fs_super_block {
      ...
              __le32 cp_payload;
      
      fs/f2fs/f2fs.h
      
      typedef u32 block_t;    /*
                               * should not change u32, since it is the on-disk block
                               * address format, __le32.
                               */
      ...
      
      static inline block_t __cp_payload(struct f2fs_sb_info *sbi)
      {
              return le32_to_cpu(F2FS_RAW_SUPER(sbi)->cp_payload);
      }
      
      fs/f2fs/checkpoint.c
      
              block_t start_blk, orphan_blocks, i, j;
      ...
              start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
              orphan_blocks = __start_sum_addr(sbi) - 1 - __cp_payload(sbi);
      
      +++ integer overflows
      
      ...
              unsigned int cp_blks = 1 + __cp_payload(sbi);
      ...
              sbi->ckpt = kzalloc(cp_blks * blk_size, GFP_KERNEL);
      
      +++ integer overflow leading to incorrect heap allocation.
      
              int cp_payload_blks = __cp_payload(sbi);
      ...
              ckpt->cp_pack_start_sum = cpu_to_le32(1 + cp_payload_blks +
                              orphan_blocks);
      
      +++ sign bug and integer overflow
      
      ...
              for (i = 1; i < 1 + cp_payload_blks; i++)
      
      +++ integer overflow
      
      ...
      
            sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
                              NR_CURSEG_TYPE - __cp_payload(sbi)) *
                                      F2FS_ORPHANS_PER_BLOCK;
      
      +++ integer overflow
      Reported-by: NGreg KH <greg@kroah.com>
      Reported-by: NSilvio Cesare <silvio.cesare@gmail.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0cfe75c5
    • C
      f2fs: treat volatile file's data as hot one · b4c3ca8b
      Chao Yu 提交于
      Volatile file's data will be updated oftenly, so it'd better to place
      its data into hot data segment.
      
      In addition, for atomic file, we change to check FI_ATOMIC_FILE instead
      of FI_HOT_DATA to make code readability better.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b4c3ca8b
    • C
      f2fs: introduce release_discard_addr() for cleanup · af8ff65b
      Chao Yu 提交于
      Introduce release_discard_addr() to include common codes for cleanup.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Fengguang Wu: declare static function, reported by kbuild test robot]
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      af8ff65b
    • C
      f2fs: fix potential overflow · a9af3fdc
      Chao Yu 提交于
      In build_sit_entries(), if valid_blocks in SIT block is smaller than
      valid_blocks in journal, for below calculation:
      
      sbi->discard_blks += old_valid_blocks - se->valid_blocks;
      
      There will be two times potential overflow:
      - old_valid_blocks - se->valid_blocks will overflow, and be a very
      large number.
      - sbi->discard_blks += result will overflow again, comes out a correct
      result accidently.
      
      Anyway, it should be fixed.
      
      Fixes: d600af23 ("f2fs: avoid unneeded loop in build_sit_entries")
      Fixes: 1f43e2ad ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      a9af3fdc
    • C
      f2fs: rename dio_rwsem to i_gc_rwsem · b2532c69
      Chao Yu 提交于
      RW semphore dio_rwsem in struct f2fs_inode_info is introduced to avoid
      race between dio and data gc, but now, it is more wildly used to avoid
      foreground operation vs data gc. So rename it to i_gc_rwsem to improve
      its readability.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b2532c69
    • Y
      f2fs: move mnt_want_write_file after range check · b82f6e34
      Yunlei He 提交于
      This patch move mnt_want_write_file after range check,
      it's needless to check arguments with it.
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b82f6e34
    • Y
      f2fs: fix missing clear FI_NO_PREALLOC in some error case · cba41be0
      Yunlei He 提交于
      This patch fix missing clear FI_NO_PREALLOC in some error case
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      cba41be0
    • J
      f2fs: enforce fsync_mode=strict for renamed directory · ade990f9
      Jaegeuk Kim 提交于
      This is to give a option for user to be able to recover B/foo in the below
      case.
      
      mkdir A
      sync()
      rename(A, B)
      creat (B/foo)
      fsync (B/foo)
      ---crash---
      Sugessted-by: NVelayudhan Pillai <vijay@cs.utexas.edu>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ade990f9
    • J
      f2fs: sanity check for total valid node blocks · 8a29c126
      Jaegeuk Kim 提交于
      This patch enhances sanity check for SIT entries.
      
      syzbot hit the following crash on upstream commit
      83beed7b (Fri Apr 20 17:56:32 2018 +0000)
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
      syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=bf9253040425feb155ad
      
      syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5692130282438656
      Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5095924598571008
      Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
      compiler: gcc (GCC) 8.0.1 20180413 (experimental)
      
      IMPORTANT: if you fix the bug, please add the following tag to the commit:
      Reported-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com
      It will help syzbot understand when the bug is fixed. See footer for details.
      If you forward the report, please keep this part and the footer.
      
      F2FS-fs (loop0): invalid crc value
      F2FS-fs (loop0): Try to recover 1th superblock, ret: 0
      F2FS-fs (loop0): Mounted with checkpoint version = d
      F2FS-fs (loop0): Bitmap was wrongly cleared, blk:9740
      ------------[ cut here ]------------
      kernel BUG at fs/f2fs/segment.c:1884!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 4508 Comm: syz-executor0 Not tainted 4.17.0-rc1+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882
      RSP: 0018:ffff8801af526708 EFLAGS: 00010282
      RAX: ffffed0035ea4cc0 RBX: ffff8801ad454f90 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff82eeb87e RDI: ffffed0035ea4cb6
      RBP: ffff8801af526760 R08: ffff8801ad4a2480 R09: ffffed003b5e4f90
      R10: ffffed003b5e4f90 R11: ffff8801daf27c87 R12: ffff8801adb8d380
      R13: 0000000000000001 R14: 0000000000000008 R15: 00000000ffffffff
      FS:  00000000014af940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f06bc223000 CR3: 00000001adb02000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       allocate_data_block+0x66f/0x2050 fs/f2fs/segment.c:2663
       do_write_page+0x105/0x1b0 fs/f2fs/segment.c:2727
       write_node_page+0x129/0x350 fs/f2fs/segment.c:2770
       __write_node_page+0x7da/0x1370 fs/f2fs/node.c:1398
       sync_node_pages+0x18cf/0x1eb0 fs/f2fs/node.c:1652
       block_operations+0x429/0xa60 fs/f2fs/checkpoint.c:1088
       write_checkpoint+0x3ba/0x5380 fs/f2fs/checkpoint.c:1405
       f2fs_sync_fs+0x2fb/0x6a0 fs/f2fs/super.c:1077
       __sync_filesystem fs/sync.c:39 [inline]
       sync_filesystem+0x265/0x310 fs/sync.c:67
       generic_shutdown_super+0xd7/0x520 fs/super.c:429
       kill_block_super+0xa4/0x100 fs/super.c:1191
       kill_f2fs_super+0x9f/0xd0 fs/f2fs/super.c:3030
       deactivate_locked_super+0x97/0x100 fs/super.c:316
       deactivate_super+0x188/0x1b0 fs/super.c:347
       cleanup_mnt+0xbf/0x160 fs/namespace.c:1174
       __cleanup_mnt+0x16/0x20 fs/namespace.c:1181
       task_work_run+0x1e4/0x290 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:191 [inline]
       exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
       prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
       do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457d97
      RSP: 002b:00007ffd46f9c8e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000457d97
      RDX: 00000000014b09a3 RSI: 0000000000000002 RDI: 00007ffd46f9da50
      RBP: 00007ffd46f9da50 R08: 0000000000000000 R09: 0000000000000009
      R10: 0000000000000005 R11: 0000000000000246 R12: 00000000014b0940
      R13: 0000000000000000 R14: 0000000000000002 R15: 000000000000658e
      RIP: update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: ffff8801af526708
      ---[ end trace f498328bb02610a2 ]---
      
      Reported-and-tested-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+7d6d31d3bc702f566ce3@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+0a725420475916460f12@syzkaller.appspotmail.com
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      8a29c126
    • J
      f2fs: sanity check on sit entry · b2ca374f
      Jaegeuk Kim 提交于
      syzbot hit the following crash on upstream commit
      87ef1202 (Wed Apr 18 19:48:17 2018 +0000)
      Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client
      syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=83699adeb2d13579c31e
      
      C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5805208181407744
      syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=6005073343676416
      Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6555047731134464
      Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
      compiler: gcc (GCC) 8.0.1 20180413 (experimental)
      
      IMPORTANT: if you fix the bug, please add the following tag to the commit:
      Reported-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com
      It will help syzbot understand when the bug is fixed. See footer for details.
      If you forward the report, please keep this part and the footer.
      
      F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
      F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
      F2FS-fs (loop0): invalid crc value
      BUG: unable to handle kernel paging request at ffffed006b2a50c0
      PGD 21ffee067 P4D 21ffee067 PUD 21fbeb067 PMD 0
      Oops: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 4514 Comm: syzkaller989480 Not tainted 4.17.0-rc1+ #8
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:build_sit_entries fs/f2fs/segment.c:3653 [inline]
      RIP: 0010:build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852
      RSP: 0018:ffff8801b102e5b0 EFLAGS: 00010a06
      RAX: 1ffff1006b2a50c0 RBX: 0000000000000004 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801ac74243e
      RBP: ffff8801b102f410 R08: ffff8801acbd46c0 R09: fffffbfff14d9af8
      R10: fffffbfff14d9af8 R11: ffff8801acbd46c0 R12: ffff8801ac742a80
      R13: ffff8801d9519100 R14: dffffc0000000000 R15: ffff880359528600
      FS:  0000000001e04880(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffed006b2a50c0 CR3: 00000001ac6ac000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       f2fs_fill_super+0x4095/0x7bf0 fs/f2fs/super.c:2803
       mount_bdev+0x30c/0x3e0 fs/super.c:1165
       f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
       mount_fs+0xae/0x328 fs/super.c:1268
       vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
       vfs_kern_mount fs/namespace.c:1027 [inline]
       do_new_mount fs/namespace.c:2517 [inline]
       do_mount+0x564/0x3070 fs/namespace.c:2847
       ksys_mount+0x12d/0x140 fs/namespace.c:3063
       __do_sys_mount fs/namespace.c:3077 [inline]
       __se_sys_mount fs/namespace.c:3074 [inline]
       __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x443d6a
      RSP: 002b:00007ffd312813c8 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443d6a
      RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffd312813d0
      RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a
      R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004
      R13: 0000000000402c60 R14: 0000000000000000 R15: 0000000000000000
      RIP: build_sit_entries fs/f2fs/segment.c:3653 [inline] RSP: ffff8801b102e5b0
      RIP: build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: ffff8801b102e5b0
      CR2: ffffed006b2a50c0
      ---[ end trace a2034989e196ff17 ]---
      
      Reported-and-tested-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b2ca374f
    • J
      f2fs: avoid bug_on on corrupted inode · 5d64600d
      Jaegeuk Kim 提交于
      syzbot has tested the proposed patch but the reproducer still triggered crash:
      kernel BUG at fs/f2fs/inode.c:LINE!
      
      F2FS-fs (loop1): invalid crc value
      F2FS-fs (loop5): Magic Mismatch, valid(0xf2f52010) - read(0x0)
      F2FS-fs (loop5): Can't find valid F2FS filesystem in 1th superblock
      F2FS-fs (loop5): invalid crc value
      ------------[ cut here ]------------
      kernel BUG at fs/f2fs/inode.c:238!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 4886 Comm: syz-executor1 Not tainted 4.17.0-rc1+ #1
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:do_read_inode fs/f2fs/inode.c:238 [inline]
      RIP: 0010:f2fs_iget+0x3307/0x3ca0 fs/f2fs/inode.c:313
      RSP: 0018:ffff8801c44a70e8 EFLAGS: 00010293
      RAX: ffff8801ce208040 RBX: ffff8801b3621080 RCX: ffffffff82eace18
      F2FS-fs (loop2): Magic Mismatch, valid(0xf2f52010) - read(0x0)
      RDX: 0000000000000000 RSI: ffffffff82eaf047 RDI: 0000000000000007
      RBP: ffff8801c44a7410 R08: ffff8801ce208040 R09: ffffed0039ee4176
      R10: ffffed0039ee4176 R11: ffff8801cf720bb7 R12: ffff8801c0efa000
      R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
      FS:  00007f753aa9d700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      ------------[ cut here ]------------
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel BUG at fs/f2fs/inode.c:238!
      CR2: 0000000001b03018 CR3: 00000001c8b74000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       f2fs_fill_super+0x4377/0x7bf0 fs/f2fs/super.c:2842
       mount_bdev+0x30c/0x3e0 fs/super.c:1165
       f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
       mount_fs+0xae/0x328 fs/super.c:1268
       vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
       vfs_kern_mount fs/namespace.c:1027 [inline]
       do_new_mount fs/namespace.c:2517 [inline]
       do_mount+0x564/0x3070 fs/namespace.c:2847
       ksys_mount+0x12d/0x140 fs/namespace.c:3063
       __do_sys_mount fs/namespace.c:3077 [inline]
       __se_sys_mount fs/namespace.c:3074 [inline]
       __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457daa
      RSP: 002b:00007f753aa9cba8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 0000000000457daa
      RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007f753aa9cbf0
      RBP: 0000000000000064 R08: 0000000020016a00 R09: 0000000020000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
      R13: 0000000000000064 R14: 00000000006fcb80 R15: 0000000000000000
      RIP: do_read_inode fs/f2fs/inode.c:238 [inline] RSP: ffff8801c44a70e8
      RIP: f2fs_iget+0x3307/0x3ca0 fs/f2fs/inode.c:313 RSP: ffff8801c44a70e8
      invalid opcode: 0000 [#2] SMP KASAN
      ---[ end trace 1cbcbec2156680bc ]---
      
      Reported-and-tested-by: syzbot+41a1b341571f0952badb@syzkaller.appspotmail.com
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5d64600d