1. 03 Jul, 2019 3 commits
    • f2fs: introduce f2fs_<level> macros to wrap f2fs_printk() · dcbb4c10
      Committed by Joe Perches
      - Add and use f2fs_<level> macros
      - Convert f2fs_msg to f2fs_printk
      - Remove level from f2fs_printk and embed the level in the format
      - Coalesce formats and align multi-line arguments
      - Remove unnecessary duplicate extern f2fs_msg in f2fs.h
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      dcbb4c10
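The "embed the level in the format" idea from the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_printk`, the `demo_<level>` macros, and the `"<N>"` prefix encoding are hypothetical stand-ins, not the kernel's `f2fs_printk()` or its `KERN_<LEVEL>` definitions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical level prefixes; the "<N>" encoding is an assumption of this sketch. */
#define DEMO_ERR  "<3>"
#define DEMO_WARN "<4>"
#define DEMO_INFO "<6>"

/*
 * Single backend: the level is not a parameter any more, it is read from the
 * first three characters of the format string.
 */
int demo_printk(char *buf, size_t len, const char *dev, const char *fmt, ...)
{
	char msg[256];
	va_list args;
	int level;

	assert(fmt[0] == '<' && fmt[1] != '\0' && fmt[2] == '>');
	level = fmt[1] - '0';

	va_start(args, fmt);
	vsnprintf(msg, sizeof(msg), fmt + 3, args);	/* skip the "<N>" prefix */
	va_end(args);

	return snprintf(buf, len, "[%d] F2FS-fs (%s): %s", level, dev, msg);
}

/* Per-level wrappers: string-literal concatenation embeds the level in fmt. */
#define demo_err(buf, len, dev, fmt, ...) \
	demo_printk(buf, len, dev, DEMO_ERR fmt, ##__VA_ARGS__)
#define demo_warn(buf, len, dev, fmt, ...) \
	demo_printk(buf, len, dev, DEMO_WARN fmt, ##__VA_ARGS__)
#define demo_info(buf, len, dev, fmt, ...) \
	demo_printk(buf, len, dev, DEMO_INFO fmt, ##__VA_ARGS__)
```

A call such as `demo_err(buf, sizeof(buf), "sda1", "bad segno %u", 7u)` formats the message with level 3 without the caller passing the level separately. The `##__VA_ARGS__` form is a GNU extension, which GCC and Clang accept.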
    • f2fs: avoid get_valid_blocks() for cleanup · 8740edc3
      Committed by Chao Yu
      No logic change.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      8740edc3
    • f2fs: ioctl for removing a range from F2FS · 04f0b2ea
      Committed by Qiuyang Sun
      This ioctl shrinks a given length (aligned to sections) from end of the
      main area. Any cursegs and valid blocks will be moved out before
      invalidating the range.
      
      This feature can be used for adjusting partition sizes online.
      
      History of the patch:
      
      Sahitya Tummala:
       - Add this ioctl for f2fs_compat_ioctl() as well.
       - Fix debugfs status to reflect the online resize changes.
       - Fix potential race between online resize path and allocate new data
         block path or gc path.
      
      Others:
       - Rename some identifiers.
       - Add some error handling branches.
       - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
       - Implement this interface as ext4's, and change the parameter from shrunk
         bytes to new block count of F2FS.
       - During resizing, force to empty sit_journal and forbid adding new
         entries to it, in order to avoid invalid segno in journal after resize.
       - Reduce sbi->user_block_count before resize starts.
       - Commit the updated superblock first, and then update in-memory metadata
         only when the former succeeds.
       - Target block count must align to sections.
       - Write checkpoint before and after committing the new superblock, w/o
         CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
         resize fails after the new superblock is committed.
       - In free_segment_range(), reduce granularity of gc_mutex.
       - Add protection on curseg migration.
       - Add freeze_bdev() and thaw_bdev() for resize fs.
       - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
       - Recover super_block and FS metadata when resize fails.
       - No need to clear CP_FSCK_FLAG in update_ckpt_flags().
       - Clean up the sb and fs metadata update functions for resize_fs.
      
      Geert Uytterhoeven:
       - Use div_u64*() for 64-bit divisions
      
      Arnd Bergmann:
       - Not all architectures support get_user() with a 64-bit argument:
          ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
          Use copy_from_user() here, this will always work.
      Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      04f0b2ea
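The rule "target block count must align to sections" from the resize commit above can be illustrated with a small userspace check. This is a sketch under assumptions: `check_resize_target` and `shrunk_sections` are hypothetical names, and plain `%` and `/` stand in for the `div_u64*()` helpers the commit mentions for 64-bit division in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 0 if new_blocks is an acceptable shrink target, -1 otherwise. */
int check_resize_target(uint64_t old_blocks, uint64_t new_blocks,
			uint32_t blks_per_sec)
{
	assert(blks_per_sec != 0);

	if (new_blocks > old_blocks)
		return -1;	/* this sketch only models shrinking */
	if (new_blocks % blks_per_sec != 0)
		return -1;	/* target must align to sections */
	return 0;
}

/* Number of whole sections removed from the end of the main area. */
uint64_t shrunk_sections(uint64_t old_blocks, uint64_t new_blocks,
			 uint32_t blks_per_sec)
{
	assert(blks_per_sec != 0 && new_blocks <= old_blocks);
	return (old_blocks - new_blocks) / blks_per_sec;
}
```

With 512 blocks per section, shrinking from 4096 to 2048 blocks passes the check and removes 4 sections, while a target of 2000 blocks is rejected because it is not a multiple of 512.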
  2. 22 Jun, 2019 3 commits
  3. 04 Jun, 2019 4 commits
    • f2fs: Add option to limit required GC for checkpoint=disable · 4d3aed70
      Committed by Daniel Rosenberg
      This extends the checkpoint option to allow checkpoint=disable:%u[%].
      This lets you specify how much of the disk you are willing
      to lose access to while mounting with checkpoint=disable. If the amount
      lost would be higher, the mount will return -EAGAIN. This can be given
      as a percent of total space, or in blocks.
      
      Currently, we need to run garbage collection until the amount of holes
      is smaller than the OVP space. With the new option, f2fs can mark
      space as unusable up front instead of requiring garbage collection until
      the number of holes is small enough.
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      4d3aed70
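The `checkpoint=disable:%u[%]` syntax described above accepts either a block count or a percentage. A minimal userspace sketch of parsing that value follows; `struct unusable_cap`, `parse_disable_arg`, and `cap_in_blocks` are hypothetical names, and the percent-to-blocks rounding is an assumption rather than the kernel's exact arithmetic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct unusable_cap {
	int has_cap;		/* 0 for a bare "disable" */
	int is_percent;		/* 1 if the value was followed by '%' */
	unsigned long value;
};

/* Parse the text after "checkpoint=": "disable", "disable:N" or "disable:N%". */
int parse_disable_arg(const char *arg, struct unusable_cap *out)
{
	char *end;

	assert(arg != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	if (strncmp(arg, "disable", 7) != 0)
		return -1;
	arg += 7;
	if (*arg == '\0')
		return 0;			/* no cap given */
	if (*arg != ':' || arg[1] < '0' || arg[1] > '9')
		return -1;

	out->value = strtoul(arg + 1, &end, 10);
	out->has_cap = 1;
	if (*end == '%') {
		if (out->value > 100)
			return -1;		/* a percentage cannot exceed 100 */
		out->is_percent = 1;
		end++;
	}
	return *end == '\0' ? 0 : -1;
}

/* Convert the cap to a block count, given the total number of blocks. */
unsigned long cap_in_blocks(const struct unusable_cap *cap,
			    unsigned long total_blocks)
{
	assert(cap->has_cap);
	if (!cap->is_percent)
		return cap->value;
	return total_blocks / 100 * cap->value;	/* assumed rounding */
}
```

For example, `disable:10%` on a 1000-block volume yields a cap of 100 blocks, and `disable:500` yields 500 blocks directly.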
    • f2fs: Fix accounting for unusable blocks · a4c3ecaa
      Committed by Daniel Rosenberg
      Fixes possible underflows when dealing with unusable blocks.
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      a4c3ecaa
    • f2fs: Fix root reserved on remount · 9a9aecaa
      Committed by Daniel Rosenberg
      On a remount, you can currently set root reserved if it was not
      previously set. This can cause an underflow if reserved has been set to
      a very high value, since then root reserved + current reserved could be
      greater than user_block_count. inc_valid_block_count later subtracts out
      these values from user_block_count, causing an underflow.
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      9a9aecaa
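The underflow described above comes from unsigned subtraction wrapping around once the reservations exceed `user_block_count`. The sketch below reproduces the wrap and shows one way to clamp the value; `avail_unchecked` and `limit_root_reserved` are hypothetical helpers, and the clamp policy is an assumption, not necessarily the limit the kernel applies.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t block_t;

/* Naive accounting: wraps around when the reservations exceed user blocks. */
block_t avail_unchecked(block_t user_blocks, block_t cur_reserved,
			block_t root_reserved)
{
	return user_blocks - cur_reserved - root_reserved;
}

/* Clamp root_reserved so both reservations together fit in user_blocks. */
block_t limit_root_reserved(block_t user_blocks, block_t cur_reserved,
			    block_t root_reserved)
{
	block_t limit;

	assert(cur_reserved <= user_blocks);
	limit = user_blocks - cur_reserved;
	return root_reserved > limit ? limit : root_reserved;
}
```

With 1000 user blocks, 900 currently reserved and a requested root reservation of 200, the unchecked subtraction wraps to a huge value, whereas clamping the root reservation to 100 leaves exactly 0 blocks available.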
    • f2fs: Lower threshold for disable_cp_again · ae4ad7ea
      Committed by Daniel Rosenberg
      The existing threshold for allowable holes at checkpoint=disable time is
      too high. The OVP space contains reserved segments, which are always in
      the form of free segments. These must be subtracted from the OVP value.
      
      The current threshold is meant to be the maximum value of holes of a
      single type we can have and still guarantee that we can fill the disk
      without failing to find space for a block of a given type.
      
      If the disk is full, ignoring current reserved, which only helps us,
      the amount of unused blocks is equal to the OVP area. Of that, there
      are reserved segments, which must be free segments, and the rest of the
      ovp area, which can come from either free segments or holes. The maximum
      possible amount of holes is OVP-reserved.
      
      Now, consider the disk when mounting with checkpoint=disable.
      We must be able to fill all available free space with either data or
      node blocks. When we start with checkpoint=disable, holes are locked to
      their current type. Say we have H of one type of hole, and H+X of the
      other. We can fill H of that space with arbitrary typed blocks via SSR.
      For the remaining H+X blocks, we may not have any of a given block type
      left at all. For instance, if we were to fill the disk entirely with
      blocks of the type with fewer holes, the H+X blocks of the opposite type
      would not be used. If H+X > OVP-reserved, there would be more holes than
      could possibly exist, and we would have failed to find a suitable block
      earlier on, leading to a crash in update_sit_entry.
      
      If H+X <= OVP-reserved, then the holes end up effectively masked by the OVP
      region in this case.
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      ae4ad7ea
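The argument above reduces to one comparison: the larger of the two hole counts (H+X in the message) must not exceed OVP minus reserved. A minimal sketch, assuming all quantities are expressed in the same unit; `holes_within_threshold` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t block_t;

/*
 * Returns 1 if checkpoint=disable is safe with respect to holes, 0 otherwise.
 * Only the hole type with the larger count matters, because holes of the other
 * type can be absorbed via SSR as described in the commit message.
 */
int holes_within_threshold(block_t data_holes, block_t node_holes,
			   block_t ovp, block_t reserved)
{
	block_t max_holes = data_holes > node_holes ? data_holes : node_holes;

	assert(reserved <= ovp);	/* reserved segments are part of OVP */
	return max_holes <= ovp - reserved;
}
```

For instance, with OVP = 800 and reserved = 200 the threshold is 600, so hole counts of (300, 500) pass while (300, 700) do not.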
  4. 31 May, 2019 4 commits
  5. 23 May, 2019 6 commits
    • f2fs: fix to avoid deadloop if data_flush is on · 040d2bb3
      Committed by Chao Yu
      As Hagbard Celine reported:
      
      [  615.697824] INFO: task kworker/u16:5:344 blocked for more than 120 seconds.
      [  615.697825]       Not tainted 5.0.15-gentoo-f2fslog #4
      [  615.697826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      disables this message.
      [  615.697827] kworker/u16:5   D    0   344      2 0x80000000
      [  615.697831] Workqueue: writeback wb_workfn (flush-259:0)
      [  615.697832] Call Trace:
      [  615.697836]  ? __schedule+0x2c5/0x8b0
      [  615.697839]  schedule+0x32/0x80
      [  615.697841]  schedule_preempt_disabled+0x14/0x20
      [  615.697842]  __mutex_lock.isra.8+0x2ba/0x4d0
      [  615.697845]  ? log_store+0xf5/0x260
      [  615.697848]  f2fs_write_data_pages+0x133/0x320
      [  615.697851]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697854]  do_writepages+0x41/0xd0
      [  615.697857]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697859]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697861]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697863]  ? up_read+0x5/0x20
      [  615.697865]  ? f2fs_do_write_data_page+0x2cb/0x940
      [  615.697867]  f2fs_balance_fs+0xe5/0x2c0
      [  615.697869]  __write_data_page+0x1c8/0x6e0
      [  615.697873]  f2fs_write_cache_pages+0x1e0/0x450
      [  615.697878]  f2fs_write_data_pages+0x14b/0x320
      [  615.697880]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697883]  do_writepages+0x41/0xd0
      [  615.697885]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697887]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697889]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697891]  f2fs_write_node_pages+0x51/0x220
      [  615.697894]  do_writepages+0x41/0xd0
      [  615.697897]  __writeback_single_inode+0x3d/0x3d0
      [  615.697899]  writeback_sb_inodes+0x1e8/0x410
      [  615.697902]  __writeback_inodes_wb+0x5d/0xb0
      [  615.697904]  wb_writeback+0x28f/0x340
      [  615.697906]  ? cpumask_next+0x16/0x20
      [  615.697908]  wb_workfn+0x33e/0x420
      [  615.697911]  process_one_work+0x1a1/0x3d0
      [  615.697913]  worker_thread+0x30/0x380
      [  615.697915]  ? process_one_work+0x3d0/0x3d0
      [  615.697916]  kthread+0x116/0x130
      [  615.697918]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  615.697921]  ret_from_fork+0x3a/0x50
      
      There is still deadloop in below condition:
      
      Thread A
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - f2fs_write_cache_pages
           - mutex_lock(&sbi->writepages)	-- lock once
           - __write_data_page
            - f2fs_balance_fs_bg
             - f2fs_sync_dirty_inodes
              - f2fs_write_data_pages
               - mutex_lock(&sbi->writepages)	-- lock again
      
      Thread A			Thread B
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - .cp_task = current
      				- f2fs_sync_dirty_inodes
      				 - .cp_task = current
      				 - filemap_fdatawrite
      				 - .cp_task = NULL
          - filemap_fdatawrite
           - f2fs_write_cache_pages
            - enter f2fs_balance_fs_bg since .cp_task is NULL
          - .cp_task = NULL
      
      Change as below to avoid this:
      - add condition to avoid holding .writepages mutex lock in path
      of data flush
      - introduce mutex lock sbi.flush_lock to exclude concurrent data
      flush in background.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      040d2bb3
    • f2fs: always assume that the device is idle under gc_urgent · f7dfd9f3
      Committed by Park Ju Hyung
      This allows more aggressive discards and balancing job to be done
      under gc_urgent.
      Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      f7dfd9f3
    • f2fs: add bio cache for IPU · 8648de2c
      Committed by Chao Yu
      SQLite in WAL mode may trigger sequential IPU writes to the db-wal file. After
      commit d1b3e72d ("f2fs: submit bio of in-place-update pages"), we
      lost the chance to merge pages in the internally managed bio cache, which results in
      submitting more small-sized IO.

      So let's add a temporary bio in writepages() to cache mergeable write IO as
      much as possible.
      
      Test case:
      1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      
      Before:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096
      
      After:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      8648de2c
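The before/after traces above show sixteen 4096-byte writes at consecutive sectors collapsing into one 65536-byte write. The merging condition can be sketched in userspace as follows; `struct io` and `merge_contiguous` are hypothetical, the 512-byte sector size is an assumption that matches the 8-sector stride in the trace, and real bio merging checks more than contiguity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* assumed; matches the 8-sector stride per 4 KiB */

struct io {
	uint64_t sector;	/* start sector */
	uint32_t size;		/* length in bytes */
};

/*
 * Merge each IO into its predecessor when it starts exactly where the
 * predecessor ends. Works in place and returns the number of IOs left.
 */
size_t merge_contiguous(struct io *ios, size_t n)
{
	size_t out = 0, i;

	if (n == 0)
		return 0;
	for (i = 1; i < n; i++) {
		uint64_t end = ios[out].sector + ios[out].size / SECTOR_SIZE;

		assert(ios[i].size % SECTOR_SIZE == 0);
		if (ios[i].sector == end)
			ios[out].size += ios[i].size;	/* extend the cached IO */
		else
			ios[++out] = ios[i];		/* start a new IO */
	}
	return out + 1;
}
```

Fed the "Before" sequence (sixteen DATA writes starting at sector 65544, then the NODE write at sector 57352), this produces two IOs: one of 65536 bytes at sector 65544 and the unchanged 4096-byte write.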
    • f2fs: allow ssr block allocation during checkpoint=disable period · 49dd883c
      Committed by Jaegeuk Kim
      This patch allows SSR to be used while checkpoint is disabled.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      49dd883c
    • f2fs: fix to check layout on last valid checkpoint park · 5dae2d39
      Committed by Chao Yu
      As Ju Hyung reported:
      
      "
      I was semi-forced today to use the new kernel and test f2fs.
      
      My Ubuntu initramfs got a bit wonky and I had to boot into live CD and
      fix some stuffs. The live CD was using 4.15 kernel, and just mounting
      the f2fs partition there corrupted f2fs and my 4.19(with 5.1-rc1-4.19
      f2fs-stable merged) refused to mount with "SIT is corrupted node"
      message.
      
      I used the latest f2fs-tools sent by Chao including "fsck.f2fs: fix to
      repair cp_loads blocks at correct position"
      
      It spit out 140M worth of output, but at least I didn't have to run it
      twice. Everything returned "Ok" in the 2nd run.
      The new log is at
      http://arter97.com/f2fs/final
      
      After fixing the image, I used my 4.19 kernel with 5.2-rc1-4.19
      f2fs-stable merged and it mounted.
      
      But, I got this:
      [    1.047791] F2FS-fs (nvme0n1p3): layout of large_nat_bitmap is
      deprecated, run fsck to repair, chksum_offset: 4092
      [    1.081307] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
      [    1.161520] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
      [    1.162418] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7e00
      
      But after doing a reboot, the message is gone:
      [    1.098423] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
      [    1.177771] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
      [    1.178365] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7eda
      
      I'm not exactly sure why the kernel detected that I'm still using the
      old layout on the first boot. Maybe fsck didn't fix it properly, or
      the check from the kernel is improper.
      "
      
      Although we have rebuilt the old deprecated checkpoint with the new layout
      during repair, we only repair the last checkpoint park; the other, older one
      remains.

      Once the image is mounted, we 1) sanity check the layout and 2) decide
      which checkpoint park to use according to cp_ver. So we print the
      reported message unnecessarily at step 1). To avoid it, we simply move
      the layout check into f2fs_sanity_check_ckpt() after step 2).
      Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      5dae2d39
    • f2fs: link f2fs quota ops for sysfile · bc88ac96
      Committed by Jaegeuk Kim
      This patch reverts:
      commit fb40d618 ("f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG").

      We were missing error handlers used in f2fs quota ops.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      bc88ac96
  6. 10 May, 2019 1 commit
    • f2fs: fix to avoid accessing xattr across the boundary · 2777e654
      Committed by Randall Huang
      When we traverse xattr entries via __find_xattr(),
      if the raw filesystem content is faked or any hardware failure occurs,
      out-of-bound error can be detected by KASAN.
      Fix the issue by introducing boundary check.
      
      [   38.402878] c7   1827 BUG: KASAN: slab-out-of-bounds in f2fs_getxattr+0x518/0x68c
      [   38.402891] c7   1827 Read of size 4 at addr ffffffc0b6fb35dc by task
      [   38.402935] c7   1827 Call trace:
      [   38.402952] c7   1827 [<ffffff900809003c>] dump_backtrace+0x0/0x6bc
      [   38.402966] c7   1827 [<ffffff9008090030>] show_stack+0x20/0x2c
      [   38.402981] c7   1827 [<ffffff900871ab10>] dump_stack+0xfc/0x140
      [   38.402995] c7   1827 [<ffffff9008325c40>] print_address_description+0x80/0x2d8
      [   38.403009] c7   1827 [<ffffff900832629c>] kasan_report_error+0x198/0x1fc
      [   38.403022] c7   1827 [<ffffff9008326104>] kasan_report_error+0x0/0x1fc
      [   38.403037] c7   1827 [<ffffff9008325000>] __asan_load4+0x1b0/0x1b8
      [   38.403051] c7   1827 [<ffffff90085fcc44>] f2fs_getxattr+0x518/0x68c
      [   38.403066] c7   1827 [<ffffff90085fc508>] f2fs_xattr_generic_get+0xb0/0xd0
      [   38.403080] c7   1827 [<ffffff9008395708>] __vfs_getxattr+0x1f4/0x1fc
      [   38.403096] c7   1827 [<ffffff9008621bd0>] inode_doinit_with_dentry+0x360/0x938
      [   38.403109] c7   1827 [<ffffff900862d6cc>] selinux_d_instantiate+0x2c/0x38
      [   38.403123] c7   1827 [<ffffff900861b018>] security_d_instantiate+0x68/0x98
      [   38.403136] c7   1827 [<ffffff9008377db8>] d_splice_alias+0x58/0x348
      [   38.403149] c7   1827 [<ffffff900858d16c>] f2fs_lookup+0x608/0x774
      [   38.403163] c7   1827 [<ffffff900835eacc>] lookup_slow+0x1e0/0x2cc
      [   38.403177] c7   1827 [<ffffff9008367fe0>] walk_component+0x160/0x520
      [   38.403190] c7   1827 [<ffffff9008369ef4>] path_lookupat+0x110/0x2b4
      [   38.403203] c7   1827 [<ffffff900835dd38>] filename_lookup+0x1d8/0x3a8
      [   38.403216] c7   1827 [<ffffff900835eeb0>] user_path_at_empty+0x54/0x68
      [   38.403229] c7   1827 [<ffffff9008395f44>] SyS_getxattr+0xb4/0x18c
      [   38.403241] c7   1827 [<ffffff9008084200>] el0_svc_naked+0x34/0x38
      Signed-off-by: Randall Huang <huangrandall@google.com>
      [Jaegeuk Kim: Fix wrong ending boundary]
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      2777e654
  7. 09 May, 2019 19 commits
    • f2fs: fix to avoid potential race on sbi->unusable_block_count access/update · c9c8ed50
      Committed by Chao Yu
      Use sbi.stat_lock to protect sbi->unusable_block_count access/update, in
      order to avoid potential race on it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      c9c8ed50
    • f2fs: add tracepoint for f2fs_filemap_fault() · d7648343
      Committed by Chao Yu
      This patch adds a tracepoint for f2fs_filemap_fault().
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      d7648343
    • f2fs: introduce DATA_GENERIC_ENHANCE · 93770ab7
      Committed by Chao Yu
      Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) checked
      whether @blkaddr is located in the main area or not.

      That check is weak, since a block address within the main area can
      point to an address which is not valid in the segment info table. We
      cannot detect that condition, and we may suffer worse corruption as the system
      continues running.

      So this patch introduces DATA_GENERIC_ENHANCE to enhance the sanity check,
      which triggers a SIT bitmap check rather than only a range check.

      This patch also makes the changes below:
      - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
      - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
      - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
      - spread blkaddr check in:
       * f2fs_get_node_info()
       * __read_out_blkaddrs()
       * f2fs_submit_page_read()
       * ra_data_block()
       * do_recover_data()
      
      This patch can fix the bugs reported in bugzilla below:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203215
      https://bugzilla.kernel.org/show_bug.cgi?id=203223
      https://bugzilla.kernel.org/show_bug.cgi?id=203231
      https://bugzilla.kernel.org/show_bug.cgi?id=203235
      https://bugzilla.kernel.org/show_bug.cgi?id=203241
      
      = Update by Jaegeuk Kim =
      
      DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
      But xfstest/generic/446 complains about some generated kernel messages saying an invalid
      bitmap was detected when reading a block. The reason is that when we get the
      block addresses from extent_cache, there is no lock to synchronize it with
      truncating the blocks in parallel.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      93770ab7
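The difference between the range-only check and the enhanced check described above can be modeled in a few lines. This is a sketch under assumptions: `struct fs_model` and its validity bitmap are hypothetical stand-ins for the main-area bounds and the SIT bitmap, and the bit order within a byte is chosen arbitrarily here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct fs_model {
	uint32_t main_start;		/* first block of the main area */
	uint32_t main_end;		/* one past the last block */
	const uint8_t *valid_map;	/* one bit per main-area block, 1 = valid */
};

bool in_main_area(const struct fs_model *fs, uint32_t blkaddr)
{
	return blkaddr >= fs->main_start && blkaddr < fs->main_end;
}

bool valid_in_bitmap(const struct fs_model *fs, uint32_t blkaddr)
{
	uint32_t off;

	assert(in_main_area(fs, blkaddr));
	off = blkaddr - fs->main_start;
	return (fs->valid_map[off / 8] >> (off % 8)) & 1u;
}

/* Weak check: the address only has to fall inside the main area. */
bool check_generic(const struct fs_model *fs, uint32_t blkaddr)
{
	return in_main_area(fs, blkaddr);
}

/* Enhanced check: in range and marked valid in the bitmap. */
bool check_enhance(const struct fs_model *fs, uint32_t blkaddr)
{
	return in_main_area(fs, blkaddr) && valid_in_bitmap(fs, blkaddr);
}
```

An address that lies in the main area but whose bit is clear passes `check_generic()` and fails `check_enhance()`, which is the class of corruption the commit aims to catch.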
    • f2fs: fix to handle error in f2fs_disable_checkpoint() · 896285ad
      Committed by Chao Yu
      f2fs_disable_checkpoint() needs to detect and propagate the error
      number returned from f2fs_write_checkpoint().
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      896285ad
    • f2fs: remove redundant check in f2fs_file_write_iter() · d5d5f0c0
      Committed by Chengguang Xu
      We have already checked flag IOCB_DIRECT in the sanity
      check of flag IOCB_NOWAIT, so we don't have to check it
      again here.
      Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      d5d5f0c0
    • f2fs: fix to be aware of readonly device in write_checkpoint() · f5a131bb
      Committed by Chao Yu
      As Park Ju Hyung reported:
      
      Probably unrelated but a similar issue:
      Warning appears upon unmounting a corrupted R/O f2fs loop image.
      
      Should be a trivial issue to fix as well :)
      
      [ 2373.758424] ------------[ cut here ]------------
      [ 2373.758428] generic_make_request: Trying to write to read-only
      block-device loop1 (partno 0)
      [ 2373.758455] WARNING: CPU: 1 PID: 13950 at block/blk-core.c:2174
      generic_make_request_checks+0x590/0x630
      [ 2373.758556] CPU: 1 PID: 13950 Comm: umount Tainted: G           O
         4.19.35-zen+ #1
      [ 2373.758558] Hardware name: System manufacturer System Product
      Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
      [ 2373.758564] RIP: 0010:generic_make_request_checks+0x590/0x630
      [ 2373.758567] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd
      36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5
      9b a7 ff <0f> 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b
      16 89
      [ 2373.758570] RSP: 0018:ffff8882bdb43950 EFLAGS: 00010282
      [ 2373.758573] RAX: 0000000000000050 RBX: ffff8887244c6700 RCX: 0000000000000006
      [ 2373.758575] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340
      [ 2373.758577] RBP: ffff888849c426c0 R08: 0000000000000004 R09: 00000000000003ba
      [ 2373.758579] R10: 0000000000000001 R11: 0000000000000029 R12: 0000000000001000
      [ 2373.758581] R13: 0000000000000000 R14: ffff888844a2e800 R15: ffff8882bdb43ac0
      [ 2373.758584] FS:  00007fc0d114f8c0(0000) GS:ffff88884ec40000(0000)
      knlGS:0000000000000000
      [ 2373.758586] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2373.758588] CR2: 00007fc0d1ad12c0 CR3: 00000002bdb82003 CR4: 00000000003606e0
      [ 2373.758590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 2373.758592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 2373.758593] Call Trace:
      [ 2373.758602]  ? generic_make_request+0x46/0x3d0
      [ 2373.758608]  ? wait_woken+0x80/0x80
      [ 2373.758612]  ? mempool_alloc+0xb7/0x1a0
      [ 2373.758618]  ? submit_bio+0x30/0x110
      [ 2373.758622]  ? bvec_alloc+0x7c/0xd0
      [ 2373.758628]  ? __submit_merged_bio+0x68/0x390
      [ 2373.758633]  ? f2fs_submit_page_write+0x1bb/0x7f0
      [ 2373.758638]  ? f2fs_do_write_meta_page+0x7f/0x160
      [ 2373.758642]  ? __f2fs_write_meta_page+0x70/0x140
      [ 2373.758647]  ? f2fs_sync_meta_pages+0x140/0x250
      [ 2373.758653]  ? f2fs_write_checkpoint+0x5c5/0x17b0
      [ 2373.758657]  ? f2fs_sync_fs+0x9c/0x110
      [ 2373.758664]  ? sync_filesystem+0x66/0x80
      [ 2373.758667]  ? generic_shutdown_super+0x1d/0x100
      [ 2373.758670]  ? kill_block_super+0x1c/0x40
      [ 2373.758674]  ? kill_f2fs_super+0x64/0xb0
      [ 2373.758678]  ? deactivate_locked_super+0x2d/0xb0
      [ 2373.758682]  ? cleanup_mnt+0x65/0xa0
      [ 2373.758688]  ? task_work_run+0x7f/0xa0
      [ 2373.758693]  ? exit_to_usermode_loop+0x9c/0xa0
      [ 2373.758698]  ? do_syscall_64+0xc7/0xf0
      [ 2373.758703]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 2373.758706] ---[ end trace 5d3639907c56271b ]---
      [ 2373.758780] print_req_error: I/O error, dev loop1, sector 143048
      [ 2373.758800] print_req_error: I/O error, dev loop1, sector 152200
      [ 2373.758808] print_req_error: I/O error, dev loop1, sector 8192
      [ 2373.758819] print_req_error: I/O error, dev loop1, sector 12272
      
      This patch detects a readonly device in write_checkpoint() to avoid
      triggering write IOs on it.
      Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      f5a131bb
    • f2fs: fix to skip recovery on readonly device · b61af314
      Committed by Chao Yu
      As Park Ju Hyung reported in mailing list:
      
      https://sourceforge.net/p/linux-f2fs/mailman/message/36639787/
      
      generic_make_request: Trying to write to read-only block-device loop0 (partno 0)
      WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630
      
       generic_make_request+0x46/0x3d0
       submit_bio+0x30/0x110
       __submit_merged_bio+0x68/0x390
       f2fs_submit_page_write+0x1bb/0x7f0
       f2fs_do_write_meta_page+0x7f/0x160
       __f2fs_write_meta_page+0x70/0x140
       f2fs_sync_meta_pages+0x140/0x250
       f2fs_write_checkpoint+0x5c5/0x17b0
       f2fs_sync_fs+0x9c/0x110
       sync_filesystem+0x66/0x80
       f2fs_recover_fsync_data+0x790/0xa30
       f2fs_fill_super+0xe4e/0x1980
       mount_bdev+0x518/0x610
       mount_fs+0x34/0x13f
       vfs_kern_mount.part.11+0x4f/0x120
       do_mount+0x2d1/0xe40
       __x64_sys_mount+0xbf/0xe0
       do_syscall_64+0x4a/0xf0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      print_req_error: I/O error, dev loop0, sector 4096
      
      If the block device is readonly, we should never trigger write IO from the
      filesystem layer. Previously, orphan and journal recovery didn't
      consider this condition, which triggered the above warning. Fix it.
      Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      b61af314
    • f2fs: fix to consider multiple device for readonly check · f824deb5
      Committed by Chao Yu
      This patch introduces f2fs_hw_is_readonly() to check whether the lower
      device is readonly or not; it handles the multiple-device scenario.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      f824deb5
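For a filesystem spanning several devices, a readonly check has to look at every underlying device rather than only the first one. The following userspace sketch shows that shape; `struct dev_model` and `hw_is_readonly` are hypothetical, and treating "any device readonly" as "hardware readonly" is an assumption about the intended semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dev_model {
	bool read_only;		/* stand-in for a per-device readonly query */
};

/* Returns true if any of the ndevs underlying devices is readonly. */
bool hw_is_readonly(const struct dev_model *devs, size_t ndevs)
{
	size_t i;

	assert(devs != NULL && ndevs > 0);
	for (i = 0; i < ndevs; i++)
		if (devs[i].read_only)
			return true;
	return false;
}
```

With this shape, a two-device volume whose second device is readonly is reported as readonly even though the first device is writable.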
    • f2fs: relocate chksum_offset for large_nat_bitmap feature · b471eb99
      Committed by Chao Yu
      For large_nat_bitmap feature, there is a design flaw:
      
      Previous:
      
      struct f2fs_checkpoint layout:
      +--------------------------+  0x0000
      | checkpoint_ver           |
      | ......                   |
      | checksum_offset          |------+
      | ......                   |      |
      | sit_nat_version_bitmap[] |<-----|-------+
      | ......                   |      |       |
      | checksum_value           |<-----+       |
      +--------------------------+  0x1000      |
      |                          |      nat_bitmap + sit_bitmap
      | payload blocks           |              |
      |                          |              |
      +--------------------------|<-------------+
      
      Obviously, if nat_bitmap size + sit_bitmap size is larger than
      MAX_BITMAP_SIZE_IN_CKPT, nat_bitmap or sit_bitmap may overlap the
      checkpoint checksum's position. Once checkpoint() is triggered
      from the kernel, the nat or sit bitmap will be damaged by the checksum field.

      In order to fix this, let's relocate checksum_value's position
      to the head of sit_nat_version_bitmap as below; then nat/sit
      bitmap and chksum value updates will become safe.
      
      After:
      
      struct f2fs_checkpoint layout:
      +--------------------------+  0x0000
      | checkpoint_ver           |
      | ......                   |
      | checksum_offset          |------+
      | ......                   |      |
      | sit_nat_version_bitmap[] |<-----+
      | ......                   |<-------------+
      |                          |              |
      +--------------------------+  0x1000      |
      |                          |      nat_bitmap + sit_bitmap
      | payload blocks           |              |
      |                          |              |
      +--------------------------|<-------------+
      
      Related report and discussion:
      
      https://sourceforge.net/p/linux-f2fs/mailman/message/36642346/
      Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      b471eb99
    • f2fs: allow unfixed f2fs_checkpoint.checksum_offset · d7eb8f1c
      Committed by Chao Yu
      Previously, f2fs_checkpoint.checksum_offset points fixed position of
      f2fs_checkpoint structure:
      
      "#define CP_CHKSUM_OFFSET	4092"
      
      It is unnecessary, and it breaks the contiguity of the nat and sit
      bitmaps stored across the checkpoint park block and payload blocks.

      This patch allows f2fs to handle an unfixed .checksum_offset.

      In addition, when the checksum value is stored in the middle of the
      checkpoint park, calculate the checksum value with the superposition method,
      as we did for inode_checksum.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      d7eb8f1c
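The "superposition" idea in the commit above can be sketched as follows: checksum the bytes before the checksum field, then continue the running value over the bytes after it, so the field itself never contributes. This is a user-space illustration only. `crc32_update` is a plain bitwise CRC-32 used as a stand-in (the seed and CRC variant f2fs uses are not stated in this log), and `block_chksum` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 update (reflected polynomial 0xEDB88320); a stand-in only. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/*
 * Checksum a block whose 4-byte checksum field lives at chksum_off:
 * hash the bytes before the field, then continue over the bytes after it.
 * When the field is the last 4 bytes, the second pass covers zero bytes.
 */
static uint32_t block_chksum(const uint8_t *blk, size_t blk_size,
			     size_t chksum_off)
{
	uint32_t crc;

	assert(chksum_off + sizeof(uint32_t) <= blk_size);

	crc = crc32_update(0xFFFFFFFFu, blk, chksum_off);
	crc = crc32_update(crc, blk + chksum_off + sizeof(uint32_t),
			   blk_size - chksum_off - sizeof(uint32_t));
	return ~crc;
}
```

Because the checksum field is skipped, the value can be stored into the block after it is computed without invalidating it, wherever `chksum_off` points.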
    • Y
      f2fs: Replace spaces with tab · 3a912b77
      Youngjun Yoo committed
      Modify coding style
      ERROR: code indent should use tabs where possible
      Signed-off-by: Youngjun Yoo <youngjun.willow@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      3a912b77
    • Y
      f2fs: insert space before the open parenthesis '(' · c456362b
      Youngjun Yoo committed
      Modify coding style
      ERROR: space required before the open parenthesis '('
      Signed-off-by: Youngjun Yoo <youngjun.willow@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      c456362b
    • C
      f2fs: allow address pointer number of dnode aligning to specified size · d02a6e61
      Chao Yu committed
      This patch makes the dnode layout more flexible: it allows the number of
      address pointers in a dnode to be aligned to a specified size (for now,
      the size is one byte by default). Later, the number can be aligned to the
      compress cluster size (1 << n bytes, n=[2,..)), which prevents a cluster
      from spanning two dnodes and keeps the design of the compress metadata
      layout simple.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      d02a6e61
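A small sketch of the alignment described in the commit above: rounding the number of address pointers down to a multiple of the cluster size guarantees that a whole number of clusters fits in each dnode. The two default counts and the helper names are assumptions for illustration and are not quoted from the patch.

```c
#include <assert.h>

/* Assumed defaults for a 4 KiB block; treat these numbers as illustrative. */
#define DEF_ADDRS_PER_INODE 923u
#define DEF_ADDRS_PER_BLOCK 1018u

/* Round addrs down to a multiple of size (size must be non-zero). */
static unsigned int align_down(unsigned int addrs, unsigned int size)
{
	assert(size != 0);
	return (addrs / size) * size;
}

/* Usable address slots in a direct node for a given alignment. */
static unsigned int addrs_per_block(unsigned int cluster_size)
{
	return align_down(DEF_ADDRS_PER_BLOCK, cluster_size);
}

/* Usable address slots in an inode after reserving inline-xattr slots. */
static unsigned int addrs_per_inode(unsigned int inline_xattr_addrs,
				    unsigned int cluster_size)
{
	assert(inline_xattr_addrs <= DEF_ADDRS_PER_INODE);
	return align_down(DEF_ADDRS_PER_INODE - inline_xattr_addrs,
			  cluster_size);
}
```

With an alignment of 1 nothing changes, which matches the "default" case in the message; with a cluster size of 4, a few trailing slots are left unused so that no cluster straddles two dnodes.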
    • C
      f2fs: introduce f2fs_read_single_page() for cleanup · 2df0ab04
      Chao Yu committed
      This patch introduces f2fs_read_single_page() to wrap the core operations
      of reading one page in f2fs_mpage_readpages().
      
      In addition, if f2fs_mpage_readpages() fails, propagate the error
      number to f2fs_read_data_page(); for the f2fs_read_data_pages() path,
      always return success.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      2df0ab04
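The error-handling rule in the commit above (report failures on the single-page path, swallow them on the multi-page path) can be sketched in user space as below. `struct demo_page`, `read_single_page` and `read_pages` are hypothetical stand-ins, not the kernel functions named in the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; io_error simulates a failed read. */
struct demo_page {
	int io_error; /* 0 on success, negative errno on failure */
};

/* Hypothetical per-page helper: the core work of reading one page. */
static int read_single_page(const struct demo_page *page)
{
	return page->io_error;
}

/*
 * Read nr pages. In readahead mode (the multi-page path) failures are not
 * reported; in single-page mode the last error is returned to the caller.
 */
static int read_pages(const struct demo_page *pages, size_t nr, bool readahead)
{
	int ret = 0;

	assert(pages != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		int err = read_single_page(&pages[i]);

		if (err)
			ret = err; /* remember the failure, keep going */
	}
	return readahead ? 0 : ret;
}
```

Splitting the per-page work into its own helper is what makes the single return statement at the end sufficient to express both policies.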
    • P
      f2fs: mark is_extension_exist() inline · 5c533b19
      Park Ju Hyung committed
      The caller set_file_temperature() is marked as inline as well.
      It doesn't make much sense to leave is_extension_exist() un-inlined.
      Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      5c533b19
    • C
      f2fs: fix to set FI_UPDATE_WRITE correctly · cd23ffa9
      Chao Yu committed
      This patch sets FI_UPDATE_WRITE only if in-place IO was issued.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      cd23ffa9
    • C
      f2fs: fix to avoid panic in f2fs_inplace_write_data() · 05573d6c
      Chao Yu committed
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203239
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_15.c
      ./run.sh f2fs
      sync
      
      - Kernel messages
       ------------[ cut here ]------------
       kernel BUG at fs/f2fs/segment.c:3162!
       RIP: 0010:f2fs_inplace_write_data+0x12d/0x160
       Call Trace:
        f2fs_do_write_data_page+0x3c1/0x820
        __write_data_page+0x156/0x720
        f2fs_write_cache_pages+0x20d/0x460
        f2fs_write_data_pages+0x1b4/0x300
        do_writepages+0x15/0x60
        __filemap_fdatawrite_range+0x7c/0xb0
        file_write_and_wait_range+0x2c/0x80
        f2fs_do_sync_file+0x102/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is that f2fs_inplace_write_data() triggers a kernel panic
      when the data block is located in a node type segment.
      
      To avoid the panic, just return an error code and set SBI_NEED_FSCK to
      give fsck a hint for later repair.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      05573d6c
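The pattern in the commit above, replacing a fatal assertion with "flag for fsck and return an error", might look roughly like this in a user-space model. The types, the `need_fsck` field and the choice of `-EIO` are assumptions made for the sketch; the errno the real patch returns is not stated in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_seg_type { DEMO_SEG_DATA, DEMO_SEG_NODE }; /* hypothetical */

struct demo_sb {
	bool need_fsck; /* stands in for the SBI_NEED_FSCK flag */
};

/*
 * In-place data write: if the target segment is not a data segment, the
 * on-disk metadata is inconsistent. Flag the filesystem for fsck and fail
 * the write with an error instead of crashing.
 */
static int inplace_write_data(struct demo_sb *sb, enum demo_seg_type seg_type)
{
	assert(sb != NULL);

	if (seg_type != DEMO_SEG_DATA) {
		sb->need_fsck = true;
		return -EIO; /* illustrative errno; the real code may differ */
	}
	/* ... the actual write would be submitted here ... */
	return 0;
}
```

The caller then sees an ordinary write failure, and the persistent flag lets a later fsck run know that a repair pass is needed.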
    • C
      f2fs: fix to do sanity check on valid block count of segment · e95bcdb2
      Chao Yu committed
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203233
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_13.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Kernel messages
       F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
       kernel BUG at fs/f2fs/segment.c:2102!
       RIP: 0010:update_sit_entry+0x394/0x410
       Call Trace:
        f2fs_allocate_data_block+0x16f/0x660
        do_write_page+0x62/0x170
        f2fs_do_write_node_page+0x33/0xa0
        __write_node_page+0x270/0x4e0
        f2fs_sync_node_pages+0x5df/0x670
        f2fs_write_checkpoint+0x372/0x1400
        f2fs_sync_fs+0xa3/0x130
        f2fs_do_sync_file+0x1a6/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      sit.vblocks and the number of valid blocks summed from sit.valid_map
      may be inconsistent. A segment with zero vblocks is treated as a free
      segment; when allocating in such a segment, we may pick a block whose
      bit is already set in the bitmap, which crashes the kernel due to a
      bitmap verification failure.
      
      To avoid further serious metadata inconsistency and corruption, it is
      necessary and worthwhile to detect SIT inconsistency. So enable
      check_block_count() to verify vblocks and valid_map all the time,
      rather than only when CONFIG_F2FS_CHECK_FS is enabled.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      e95bcdb2
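The consistency check described in the commit above boils down to comparing a stored counter with the number of set bits in the validity bitmap. Below is a minimal user-space sketch of that comparison; the function names are hypothetical and this is not the body of check_block_count().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in a validity bitmap of len bytes. */
static unsigned int count_valid_blocks(const uint8_t *valid_map, size_t len)
{
	unsigned int count = 0;

	for (size_t i = 0; i < len; i++)
		for (uint8_t b = valid_map[i]; b; b &= (uint8_t)(b - 1))
			count++; /* each iteration clears the lowest set bit */
	return count;
}

/* A SIT entry is consistent when vblocks matches the bitmap population. */
static bool sit_entry_consistent(unsigned int vblocks,
				 const uint8_t *valid_map, size_t len)
{
	assert(valid_map != NULL);
	return vblocks == count_valid_blocks(valid_map, len);
}
```

A segment recorded with zero vblocks but a non-empty bitmap fails this check, which is exactly the situation that would otherwise let the allocator hand out a block that is already marked valid.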
    • C
      f2fs: fix to do sanity check on valid node/block count · 7b63f72f
      Chao Yu committed
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203229
      
      - Overview
      When mounting the attached crafted image, following errors are reported.
      Additionally, it hangs on sync after trying to mount it.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      sync
      
      - Kernel message
       kernel BUG at fs/f2fs/recovery.c:591!
       RIP: 0010:recover_data+0x12d8/0x1780
       Call Trace:
        f2fs_recover_fsync_data+0x613/0x710
        f2fs_fill_super+0x1043/0x1aa0
        mount_bdev+0x16d/0x1a0
        mount_fs+0x4a/0x170
        vfs_kern_mount+0x5d/0x100
        do_mount+0x200/0xcf0
        ksys_mount+0x79/0xc0
        __x64_sys_mount+0x1c/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      With a corrupted image which has an out-of-range valid node/block count,
      recovery triggers a kernel panic once it fails due to no free space.
      
      Add a sanity check on the valid node/block count in
      f2fs_sanity_check_ckpt() to detect this condition, so that the potential
      panic can be avoided.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      7b63f72f
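A range check of the kind the commit above adds could be sketched like this. `struct demo_ckpt`, its field selection and the `avail_node_count` parameter are assumptions for illustration; how the real code derives the upper bounds is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the counters a checkpoint carries. */
struct demo_ckpt {
	uint64_t user_block_count;  /* blocks available to users */
	uint64_t valid_block_count; /* blocks currently in use */
	uint32_t valid_node_count;  /* nodes currently in use */
};

/*
 * Return true if the counters are within range. avail_node_count is the
 * caller-supplied upper bound on the number of usable nodes.
 */
static bool ckpt_counts_sane(const struct demo_ckpt *ckpt,
			     uint32_t avail_node_count)
{
	assert(ckpt != NULL);

	if (ckpt->valid_block_count > ckpt->user_block_count)
		return false; /* more blocks in use than exist */
	if (ckpt->valid_node_count > avail_node_count)
		return false; /* more nodes in use than exist */
	return true;
}
```

Rejecting such a checkpoint at mount time means the corrupted counters never reach the recovery path, where running out of space would otherwise hit a fatal assertion.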