1. 11 7月, 2019 1 次提交
  2. 03 7月, 2019 4 次提交
    • C
      f2fs: use generic EFSBADCRC/EFSCORRUPTED · 10f966bb
      Chao Yu 提交于
      f2fs uses EFAULT as error number to indicate filesystem is corrupted
      all the time, but generic filesystems use EUCLEAN for such condition,
      we need to change to follow others.
      
      This patch adds two new macros as below to wrap more generic error
      code macros, and spread them in code.
      
      EFSBADCRC	EBADMSG		/* Bad CRC detected */
      EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      10f966bb
    • C
      f2fs: print kernel message if filesystem is inconsistent · 2d821c12
      Chao Yu 提交于
      As Pavel reported, once we detect filesystem inconsistency in
      f2fs_inplace_write_data(), it will be better to print kernel message as
      we did in other places.
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2d821c12
    • J
      f2fs: introduce f2fs_<level> macros to wrap f2fs_printk() · dcbb4c10
      Joe Perches 提交于
      - Add and use f2fs_<level> macros
      - Convert f2fs_msg to f2fs_printk
      - Remove level from f2fs_printk and embed the level in the format
      - Coalesce formats and align multi-line arguments
      - Remove unnecessary duplicate extern f2fs_msg f2fs.h
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      dcbb4c10
    • Q
      f2fs: ioctl for removing a range from F2FS · 04f0b2ea
      Qiuyang Sun 提交于
      This ioctl shrinks a given length (aligned to sections) from end of the
      main area. Any cursegs and valid blocks will be moved out before
      invalidating the range.
      
      This feature can be used for adjusting partition sizes online.
      
      History of the patch:
      
      Sahitya Tummala:
       - Add this ioctl for f2fs_compat_ioctl() as well.
       - Fix debugfs status to reflect the online resize changes.
       - Fix potential race between online resize path and allocate new data
         block path or gc path.
      
      Others:
       - Rename some identifiers.
       - Add some error handling branches.
       - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
       - Implement this interface as ext4's, and change the parameter from shrunk
      bytes to new block count of F2FS.
       - During resizing, force to empty sit_journal and forbid adding new
         entries to it, in order to avoid invalid segno in journal after resize.
       - Reduce sbi->user_block_count before resize starts.
       - Commit the updated superblock first, and then update in-memory metadata
         only when the former succeeds.
       - Target block count must align to sections.
       - Write checkpoint before and after committing the new superblock, w/o
      CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
      resize fails after the new superblock is committed.
       - In free_segment_range(), reduce granularity of gc_mutex.
       - Add protection on curseg migration.
       - Add freeze_bdev() and thaw_bdev() for resize fs.
       - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
       - Recover super_block and FS metadata when resize fails.
       - No need to clear CP_FSCK_FLAG in update_ckpt_flags().
       - Clean up the sb and fs metadata update functions for resize_fs.
      
      Geert Uytterhoeven:
       - Use div_u64*() for 64-bit divisions
      
      Arnd Bergmann:
       - Not all architectures support get_user() with a 64-bit argument:
          ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
          Use copy_from_user() here, this will always work.
      Signed-off-by: NQiuyang Sun <sunqiuyang@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      04f0b2ea
  3. 04 6月, 2019 2 次提交
    • D
      f2fs: Add option to limit required GC for checkpoint=disable · 4d3aed70
      Daniel Rosenberg 提交于
      This extends the checkpoint option to allow checkpoint=disable:%u[%]
      This allows you to specify what how much of the disk you are willing
      to lose access to while mounting with checkpoint=disable. If the amount
      lost would be higher, the mount will return -EAGAIN. This can be given
      as a percent of total space, or in blocks.
      
      Currently, we need to run garbage collection until the amount of holes
      is smaller than the OVP space. With the new option, f2fs can mark
      space as unusable up front instead of requiring garbage collection until
      the number of holes is small enough.
      Signed-off-by: NDaniel Rosenberg <drosen@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4d3aed70
    • D
      f2fs: Lower threshold for disable_cp_again · ae4ad7ea
      Daniel Rosenberg 提交于
      The existing threshold for allowable holes at checkpoint=disable time is
      too high. The OVP space contains reserved segments, which are always in
      the form of free segments. These must be subtracted from the OVP value.
      
      The current threshold is meant to be the maximum value of holes of a
      single type we can have and still guarantee that we can fill the disk
      without failing to find space for a block of a given type.
      
      If the disk is full, ignoring current reserved, which only helps us,
      the amount of unused blocks is equal to the OVP area. Of that, there
      are reserved segments, which must be free segments, and the rest of the
      ovp area, which can come from either free segments or holes. The maximum
      possible amount of holes is OVP-reserved.
      
      Now, consider the disk when mounting with checkpoint=disable.
      We must be able to fill all available free space with either data or
      node blocks. When we start with checkpoint=disable, holes are locked to
      their current type. Say we have H of one type of hole, and H+X of the
      other. We can fill H of that space with arbitrary typed blocks via SSR.
      For the remaining H+X blocks, we may not have any of a given block type
      left at all. For instance, if we were to fill the disk entirely with
      blocks of the type with fewer holes, the H+X blocks of the opposite type
      would not be used. If H+X > OVP-reserved, there would be more holes than
      could possibly exist, and we would have failed to find a suitable block
      earlier on, leading to a crash in update_sit_entry.
      
      If H+X <= OVP-reserved, then the holes end up effectively masked by the OVP
      region in this case.
      Signed-off-by: NDaniel Rosenberg <drosen@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ae4ad7ea
  4. 31 5月, 2019 2 次提交
  5. 23 5月, 2019 2 次提交
    • C
      f2fs: fix to avoid deadloop if data_flush is on · 040d2bb3
      Chao Yu 提交于
      As Hagbard Celine reported:
      
      [  615.697824] INFO: task kworker/u16:5:344 blocked for more than 120 seconds.
      [  615.697825]       Not tainted 5.0.15-gentoo-f2fslog #4
      [  615.697826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      disables this message.
      [  615.697827] kworker/u16:5   D    0   344      2 0x80000000
      [  615.697831] Workqueue: writeback wb_workfn (flush-259:0)
      [  615.697832] Call Trace:
      [  615.697836]  ? __schedule+0x2c5/0x8b0
      [  615.697839]  schedule+0x32/0x80
      [  615.697841]  schedule_preempt_disabled+0x14/0x20
      [  615.697842]  __mutex_lock.isra.8+0x2ba/0x4d0
      [  615.697845]  ? log_store+0xf5/0x260
      [  615.697848]  f2fs_write_data_pages+0x133/0x320
      [  615.697851]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697854]  do_writepages+0x41/0xd0
      [  615.697857]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697859]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697861]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697863]  ? up_read+0x5/0x20
      [  615.697865]  ? f2fs_do_write_data_page+0x2cb/0x940
      [  615.697867]  f2fs_balance_fs+0xe5/0x2c0
      [  615.697869]  __write_data_page+0x1c8/0x6e0
      [  615.697873]  f2fs_write_cache_pages+0x1e0/0x450
      [  615.697878]  f2fs_write_data_pages+0x14b/0x320
      [  615.697880]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697883]  do_writepages+0x41/0xd0
      [  615.697885]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697887]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697889]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697891]  f2fs_write_node_pages+0x51/0x220
      [  615.697894]  do_writepages+0x41/0xd0
      [  615.697897]  __writeback_single_inode+0x3d/0x3d0
      [  615.697899]  writeback_sb_inodes+0x1e8/0x410
      [  615.697902]  __writeback_inodes_wb+0x5d/0xb0
      [  615.697904]  wb_writeback+0x28f/0x340
      [  615.697906]  ? cpumask_next+0x16/0x20
      [  615.697908]  wb_workfn+0x33e/0x420
      [  615.697911]  process_one_work+0x1a1/0x3d0
      [  615.697913]  worker_thread+0x30/0x380
      [  615.697915]  ? process_one_work+0x3d0/0x3d0
      [  615.697916]  kthread+0x116/0x130
      [  615.697918]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  615.697921]  ret_from_fork+0x3a/0x50
      
      There is still deadloop in below condition:
      
      d A
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - f2fs_write_cache_pages
           - mutex_lock(&sbi->writepages)	-- lock once
           - __write_data_page
            - f2fs_balance_fs_bg
             - f2fs_sync_dirty_inodes
              - f2fs_write_data_pages
               - mutex_lock(&sbi->writepages)	-- lock again
      
      Thread A			Thread B
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - .cp_task = current
      				- f2fs_sync_dirty_inodes
      				 - .cp_task = current
      				 - filemap_fdatawrite
      				 - .cp_task = NULL
          - filemap_fdatawrite
           - f2fs_write_cache_pages
            - enter f2fs_balance_fs_bg since .cp_task is NULL
          - .cp_task = NULL
      
      Change as below to avoid this:
      - add condition to avoid holding .writepages mutex lock in path
      of data flush
      - introduce mutex lock sbi.flush_lock to exclude concurrent data
      flush in background.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      040d2bb3
    • C
      f2fs: add bio cache for IPU · 8648de2c
      Chao Yu 提交于
      SQLite in Wal mode may trigger sequential IPU write in db-wal file, after
      commit d1b3e72d ("f2fs: submit bio of in-place-update pages"), we
      lost the chance of merging page in inner managed bio cache, result in
      submitting more small-sized IO.
      
      So let's add temporary bio in writepages() to cache mergeable write IO as
      much as possible.
      
      Test case:
      1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      
      Before:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096
      
      After:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      8648de2c
  6. 09 5月, 2019 3 次提交
    • C
      f2fs: fix to avoid potential race on sbi->unusable_block_count access/update · c9c8ed50
      Chao Yu 提交于
      Use sbi.stat_lock to protect sbi->unusable_block_count accesss/udpate, in
      order to avoid potential race on it.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c9c8ed50
    • C
      f2fs: introduce DATA_GENERIC_ENHANCE · 93770ab7
      Chao Yu 提交于
      Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
      whether @blkaddr locates in main area or not.
      
      That check is weak, since the block address in range of main area can
      point to the address which is not valid in segment info table, and we
      can not detect such condition, we may suffer worse corruption as system
      continues running.
      
      So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
      which trigger SIT bitmap check rather than only range check.
      
      This patch did below changes as wel:
      - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
      - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
      - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
      - spread blkaddr check in:
       * f2fs_get_node_info()
       * __read_out_blkaddrs()
       * f2fs_submit_page_read()
       * ra_data_block()
       * do_recover_data()
      
      This patch can fix bug reported from bugzilla below:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203215
      https://bugzilla.kernel.org/show_bug.cgi?id=203223
      https://bugzilla.kernel.org/show_bug.cgi?id=203231
      https://bugzilla.kernel.org/show_bug.cgi?id=203235
      https://bugzilla.kernel.org/show_bug.cgi?id=203241
      
      = Update by Jaegeuk Kim =
      
      DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
      But, xfstest/generic/446 compalins some generated kernel messages saying invalid
      bitmap was detected when reading a block. The reaons is, when we get the
      block addresses from extent_cache, there is no lock to synchronize it from
      truncating the blocks in parallel.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      93770ab7
    • C
      f2fs: fix to avoid panic in f2fs_inplace_write_data() · 05573d6c
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203239
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_15.c
      ./run.sh f2fs
      sync
      
      - Kernel messages
       ------------[ cut here ]------------
       kernel BUG at fs/f2fs/segment.c:3162!
       RIP: 0010:f2fs_inplace_write_data+0x12d/0x160
       Call Trace:
        f2fs_do_write_data_page+0x3c1/0x820
        __write_data_page+0x156/0x720
        f2fs_write_cache_pages+0x20d/0x460
        f2fs_write_data_pages+0x1b4/0x300
        do_writepages+0x15/0x60
        __filemap_fdatawrite_range+0x7c/0xb0
        file_write_and_wait_range+0x2c/0x80
        f2fs_do_sync_file+0x102/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_inplace_write_data() will trigger kernel panic due
      to data block locates in node type segment.
      
      To avoid panic, let's just return error code and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      05573d6c
  7. 06 4月, 2019 3 次提交
    • D
      f2fs: improve discard handling with multi-device volumes · 7f3d7719
      Damien Le Moal 提交于
      f2fs_hw_support_discard() only tests if the super block device supports
      discard. However, for a multi-device volume, not all disks used may
      support discard. Improve the check performed to test all devices of
      the volume and report discard as supported if at least one device of
      the volume supports discard. To implement this, introduce the helper
      function f2fs_bdev_support_discard(), which returns true for zoned block
      devices (where discard is processed as a zone reset) and for regular
      disks supporting the discard command.
      
      f2fs_bdev_support_discard() is also used in __queue_discard_cmd() to
      handle discard command issuing for a particular device of the volume.
      That is, prevent issuing a discard command for block devices that do
      not support it.
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7f3d7719
    • D
      f2fs: Reduce zoned block device memory usage · 95175daf
      Damien Le Moal 提交于
      For zoned block devices, an array of zone types for each device is
      allocated and initialized in order to determine if a section is stored
      on a sequential zone (zone reset needed) or a conventional zone (no
      zone reset needed and regular discard applies). Considering this usage,
      the zone types stored in memory can be replaced with a bitmap to
      indicate an equivalent information, that is, if a zone is sequential or
      not. This reduces the memory usage for each zoned device by roughly 8:
      on a 14TB disk with zones of 256 MB, the zone type array consumes
      13x4KB pages while the bitmap uses only 2x4KB pages.
      
      This patch changes the f2fs_dev_info structure blkz_type field to the
      bitmap blkz_seq. Access to this bitmap is done using the helper
      function f2fs_blkz_is_seq(), which is a rewrite of the function
      get_blkz_type().
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      95175daf
    • D
      f2fs: Fix use of number of devices · 0916878d
      Damien Le Moal 提交于
      For a single device mount using a zoned block device, the zone
      information for the device is stored in the sbi->devs single entry
      array and sbi->s_ndevs is set to 1. This differs from a single device
      mount using a regular block device which does not allocate sbi->devs
      and sets sbi->s_ndevs to 0.
      
      However, sbi->s_devs == 0 condition is used throughout the code to
      differentiate a single device mount from a multi-device mount where
      sbi->s_ndevs is always larger than 1. This results in problems with
      single zoned block device volumes as these are treated as multi-device
      mounts but do not have the start_blk and end_blk information set. One
      of the problem observed is skipping of zone discard issuing resulting in
      write commands being issued to full zones or unaligned to a zone write
      pointer.
      
      Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
      regular block device mount) and sbi->s_ndevs == 1 (single zoned block
      device mount) in the same manner. This is done by introducing the
      helper function f2fs_is_multi_device() and using this helper in place
      of direct tests of sbi->s_ndevs value, improving code readability.
      
      Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0916878d
  8. 13 3月, 2019 4 次提交
    • C
      f2fs: fix to add refcount once page is tagged PG_private · 240a5915
      Chao Yu 提交于
      As Gao Xiang reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202749
      
      f2fs may skip pageout() due to incorrect page reference count.
      
      The problem here is that MM defined the rule [1] very clearly that
      once page was set with PG_private flag, we should increment the
      refcount in that page, also main flows like pageout(), migrate_page()
      will assume there is one additional page reference count if
      page_has_private() returns true.
      
      But currently, f2fs won't add/del refcount when changing PG_private
      flag. Anyway, f2fs should follow MM's rule to make MM's related flows
      running as expected.
      
      [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/Reported-by: NGao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      240a5915
    • C
      f2fs: fix to avoid deadlock of atomic file operations · 48432984
      Chao Yu 提交于
      Thread A				Thread B
      - __fput
       - f2fs_release_file
        - drop_inmem_pages
         - mutex_lock(&fi->inmem_lock)
         - __revoke_inmem_pages
          - lock_page(page)
      					- open
      					- f2fs_setattr
      					- truncate_setsize
      					 - truncate_inode_pages_range
      					  - lock_page(page)
      					  - truncate_cleanup_page
      					   - f2fs_invalidate_page
      					    - drop_inmem_page
      					    - mutex_lock(&fi->inmem_lock);
      
      We may encounter above ABBA deadlock as reported by Kyungtae Kim:
      
      I'm reporting a bug in linux-4.17.19: "INFO: task hung in
      drop_inmem_page" (no reproducer)
      
      I think this might be somehow related to the following:
      https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ
      
      =========================================
      INFO: task syz-executor7:10822 blocked for more than 120 seconds.
            Not tainted 4.17.19 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor7   D27024 10822   6346 0x00000004
      Call Trace:
       context_switch kernel/sched/core.c:2867 [inline]
       __schedule+0x721/0x1e60 kernel/sched/core.c:3515
       schedule+0x88/0x1c0 kernel/sched/core.c:3559
       schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
       __mutex_lock_common kernel/locking/mutex.c:833 [inline]
       __mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
       mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
       drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
       f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
       do_invalidatepage mm/truncate.c:165 [inline]
       truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
       truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
       truncate_inode_pages mm/truncate.c:478 [inline]
       truncate_pagecache+0x6d/0x90 mm/truncate.c:801
       truncate_setsize+0x81/0xa0 mm/truncate.c:826
       f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
       notify_change+0xa62/0xe80 fs/attr.c:313
       do_truncate+0x12e/0x1e0 fs/open.c:63
       do_last fs/namei.c:2955 [inline]
       path_openat+0x2042/0x29f0 fs/namei.c:3505
       do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
       do_sys_open+0x35e/0x4e0 fs/open.c:1101
       __do_sys_open fs/open.c:1119 [inline]
       __se_sys_open fs/open.c:1114 [inline]
       __x64_sys_open+0x89/0xc0 fs/open.c:1114
       do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      RSP: 002b:00007f734e459c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
      RAX: ffffffffffffffda RBX: 00007f734e45a6cc RCX: 00000000004497b9
      RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
      RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e45a700
      INFO: task syz-executor7:10858 blocked for more than 120 seconds.
            Not tainted 4.17.19 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor7   D28880 10858   6346 0x00000004
      Call Trace:
       context_switch kernel/sched/core.c:2867 [inline]
       __schedule+0x721/0x1e60 kernel/sched/core.c:3515
       schedule+0x88/0x1c0 kernel/sched/core.c:3559
       __rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
       rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
       call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
       __down_write arch/x86/include/asm/rwsem.h:142 [inline]
       down_write+0x58/0xa0 kernel/locking/rwsem.c:72
       inode_lock include/linux/fs.h:713 [inline]
       do_truncate+0x120/0x1e0 fs/open.c:61
       do_last fs/namei.c:2955 [inline]
       path_openat+0x2042/0x29f0 fs/namei.c:3505
       do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
       do_sys_open+0x35e/0x4e0 fs/open.c:1101
       __do_sys_open fs/open.c:1119 [inline]
       __se_sys_open fs/open.c:1114 [inline]
       __x64_sys_open+0x89/0xc0 fs/open.c:1114
       do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      RSP: 002b:00007f734e3b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
      RAX: ffffffffffffffda RBX: 00007f734e3b56cc RCX: 00000000004497b9
      RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
      RBP: 000000000071c238 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e3b5700
      INFO: task syz-executor5:10829 blocked for more than 120 seconds.
            Not tainted 4.17.19 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor5   D28760 10829   6308 0x80000002
      Call Trace:
       context_switch kernel/sched/core.c:2867 [inline]
       __schedule+0x721/0x1e60 kernel/sched/core.c:3515
       schedule+0x88/0x1c0 kernel/sched/core.c:3559
       io_schedule+0x21/0x80 kernel/sched/core.c:5179
       wait_on_page_bit_common mm/filemap.c:1100 [inline]
       __lock_page+0x2b5/0x390 mm/filemap.c:1273
       lock_page include/linux/pagemap.h:483 [inline]
       __revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
       drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
       f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
       __fput+0x2c7/0x780 fs/file_table.c:209
       ____fput+0x1a/0x20 fs/file_table.c:243
       task_work_run+0x151/0x1d0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x8ba/0x30a0 kernel/exit.c:865
       do_group_exit+0x13b/0x3a0 kernel/exit.c:968
       get_signal+0x6bb/0x1650 kernel/signal.c:2482
       do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
       exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
       prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
       do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      RSP: 002b:00007f1c68e74ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
      RAX: fffffffffffffe00 RBX: 000000000071bf80 RCX: 00000000004497b9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000071bf80
      RBP: 000000000071bf80 R08: 0000000000000000 R09: 000000000071bf58
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 00007f1c68e759c0 R15: 00007f1c68e75700
      
      This patch tries to use trylock_page to mitigate such deadlock condition
      for fix.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      48432984
    • C
      f2fs: fix to update iostat correctly in IPU path · e46f6bd8
      Chao Yu 提交于
      In error path of IPU, we didn't account iostat correctly, fix it.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e46f6bd8
    • C
      f2fs: make fault injection covering __submit_flush_wait() · dc37910d
      Chao Yu 提交于
      This patch changes to allow failure of f2fs_bio_alloc() in
      __submit_flush_wait(), which can simulate flush error in checkpoint()
      for covering more error paths.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      dc37910d
  9. 16 2月, 2019 2 次提交
  10. 05 2月, 2019 1 次提交
  11. 27 12月, 2018 5 次提交
  12. 14 12月, 2018 1 次提交
  13. 27 11月, 2018 3 次提交
  14. 23 10月, 2018 2 次提交
  15. 17 10月, 2018 2 次提交
    • C
      f2fs: use rb_*_cached friends · 4dada3fd
      Chao Yu 提交于
      As rbtree supports caching leftmost node natively, update f2fs codes
      to use rb_*_cached helpers to speed up leftmost node visiting.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4dada3fd
    • D
      f2fs: checkpoint disabling · 4354994f
      Daniel Rosenberg 提交于
      Note that, it requires "f2fs: return correct errno in f2fs_gc".
      
      This adds a lightweight non-persistent snapshotting scheme to f2fs.
      
      To use, mount with the option checkpoint=disable, and to return to
      normal operation, remount with checkpoint=enable. If the filesystem
      is shut down before remounting with checkpoint=enable, it will revert
      back to its apparent state when it was first mounted with
      checkpoint=disable. This is useful for situations where you wish to be
      able to roll back the state of the disk in case of some critical
      failure.
      Signed-off-by: NDaniel Rosenberg <drosen@google.com>
      [Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4354994f
  16. 01 10月, 2018 2 次提交
  17. 29 9月, 2018 1 次提交