1. 01 10月, 2019 1 次提交
  2. 15 6月, 2019 2 次提交
    • C
      f2fs: fix to use inline space only if inline_xattr is enable · 101e48fe
      Chao Yu 提交于
      [ Upstream commit 622927f3b8809206f6da54a6a7ed4df1a7770fce ]
      
      With below mkfs and mount option:
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
      MOUNT_OPTIONS -- -o noinline_xattr
      
      We may miss xattr data with below testcase:
      - mkdir dir
      - setfattr -n "user.name" -v 0 dir
      - for ((i = 0; i < 190; i++)) do touch dir/$i; done
      - umount
      - mount
      - getfattr -n "user.name" dir
      
      user.name: No such attribute
      
      The root cause is that we persist xattr data into reserved inline xattr
      space, even if inline_xattr is not enable in inline directory inode, after
      inline dentry conversion, reserved space no longer exists, so that xattr
      data missed.
      
      Let's use inline xattr space only if inline_xattr flag is set on inode
      to fix this iusse.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      101e48fe
    • C
      f2fs: fix to avoid panic in dec_valid_block_count() · 45624f0e
      Chao Yu 提交于
      [ Upstream commit 5e159cd349bf3a31fb7e35c23a93308eb30f4f71 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203209
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_01.c
      ./run.sh f2fs
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:1788!
       RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
       Call Trace:
        f2fs_truncate_blocks+0x36d/0x3c0
        f2fs_truncate+0x88/0x110
        f2fs_setattr+0x3e1/0x460
        notify_change+0x2da/0x400
        do_truncate+0x6d/0xb0
        do_sys_ftruncate+0xf1/0x160
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_block_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      45624f0e
  3. 31 5月, 2019 1 次提交
    • D
      f2fs: Fix use of number of devices · 70d33cce
      Damien Le Moal 提交于
      commit 0916878da355650d7e77104a7ac0fa1784eca852 upstream.
      
      For a single device mount using a zoned block device, the zone
      information for the device is stored in the sbi->devs single entry
      array and sbi->s_ndevs is set to 1. This differs from a single device
      mount using a regular block device which does not allocate sbi->devs
      and sets sbi->s_ndevs to 0.
      
      However, sbi->s_devs == 0 condition is used throughout the code to
      differentiate a single device mount from a multi-device mount where
      sbi->s_ndevs is always larger than 1. This results in problems with
      single zoned block device volumes as these are treated as multi-device
      mounts but do not have the start_blk and end_blk information set. One
      of the problem observed is skipping of zone discard issuing resulting in
      write commands being issued to full zones or unaligned to a zone write
      pointer.
      
      Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
      regular block device mount) and sbi->s_ndevs == 1 (single zoned block
      device mount) in the same manner. This is done by introducing the
      helper function f2fs_is_multi_device() and using this helper in place
      of direct tests of sbi->s_ndevs value, improving code readability.
      
      Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      70d33cce
  4. 20 4月, 2019 1 次提交
    • C
      f2fs: fix to avoid NULL pointer dereference on se->discard_map · f9368366
      Chao Yu 提交于
      [ Upstream commit 7d20c8abb2edcf962ca857d51f4d0f9cd4b19053 ]
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200951
      
      These is a NULL pointer dereference issue reported in bugzilla:
      
      Hi,
      in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
      
      The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
      There are four partitions:
       sda1: FAT  /boot
       sda2: F2FS /
       sda3: F2FS /home
       sda4: F2FS
      
      The bridge is ASMT1153e which uses the "uas" driver.
      There is no TRIM pass-through, so, when mounting it reports:
       mounting with "discard" option, but the device does not support discard
      
      The USB host is USB3.0 and UASP capable. It is the one on RK3399.
      
      Given this everything works fine, except there is no TRIM support.
      
      In order to enable TRIM a new UDEV rule is added [1]:
       /etc/udev/rules.d/10-sata-bridge-trim.rules:
       ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
      After reboot any F2FS write hangs forever and dmesg reports:
       Unable to handle kernel NULL pointer dereference
      
      Also tested on a x86_64 system: works fine even with TRIM enabled.
       same disc
       same bridge
       different usb host controller
       different cpu architecture
       not root filesystem
      
      Regards,
        Vicenç.
      
      [1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
      
       Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
       Mem abort info:
         ESR = 0x96000004
         Exception class = DABT (current EL), IL = 32 bits
         SET = 0, FnV = 0
         EA = 0, S1PTW = 0
       Data abort info:
         ISV = 0, ISS = 0x00000004
         CM = 0, WnR = 0
       user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
       [000000000000003e] pgd=0000000000000000
       Internal error: Oops: 96000004 [#1] SMP
       Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
       CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
       Hardware name: Sapphire-RK3399 Board (DT)
       pstate: 00000005 (nzcv daif -PAN -UAO)
       pc : update_sit_entry+0x304/0x4b0
       lr : update_sit_entry+0x108/0x4b0
       sp : ffff00000ca13bd0
       x29: ffff00000ca13bd0 x28: 000000000000003e
       x27: 0000000000000020 x26: 0000000000080000
       x25: 0000000000000048 x24: ffff8000ebb85cf8
       x23: 0000000000000253 x22: 00000000ffffffff
       x21: 00000000000535f2 x20: 00000000ffffffdf
       x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
       x17: 0000000007ce6926 x16: 000000001c83ffa8
       x15: 0000000000000000 x14: ffff8000f602df90
       x13: 0000000000000006 x12: 0000000000000040
       x11: 0000000000000228 x10: 0000000000000000
       x9 : 0000000000000000 x8 : 0000000000000000
       x7 : 00000000000535f2 x6 : ffff8000ebff3440
       x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
       x3 : 00000000ffffffff x2 : 0000000000000020
       x1 : 0000000000000000 x0 : ffff8000eb9e5800
       Process nvim (pid: 957, stack limit = 0x0000000063a78320)
       Call trace:
        update_sit_entry+0x304/0x4b0
        f2fs_invalidate_blocks+0x98/0x140
        truncate_node+0x90/0x400
        f2fs_remove_inode_page+0xe8/0x340
        f2fs_evict_inode+0x2b0/0x408
        evict+0xe0/0x1e0
        iput+0x160/0x260
        do_unlinkat+0x214/0x298
        __arm64_sys_unlinkat+0x3c/0x68
        el0_svc_handler+0x94/0x118
        el0_svc+0x8/0xc
       Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
       ---[ end trace a0f21a307118c477 ]---
      
      The reason is it is possible to enable discard flag on block queue via
      UDEV, but during mount, f2fs will initialize se->discard_map only if
      this flag is set, once the flag is set after mount, f2fs may dereference
      NULL pointer on se->discard_map.
      
      So this patch does below changes to fix this issue:
      - initialize and update se->discard_map all the time.
      - don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
      during mount.
      - don't issue small discard on zoned block device.
      - introduce some functions to enhance the readability.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Tested-by: NVicente Bergas <vicencb@gmail.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f9368366
  5. 06 4月, 2019 1 次提交
    • C
      f2fs: fix to check inline_xattr_size boundary correctly · 4ab78f4d
      Chao Yu 提交于
      [ Upstream commit 500e0b28ecd3c5aade98f3c3a339d18dcb166bb6 ]
      
      We use below condition to check inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There is there problems in that check:
      - we should allow inline_xattr_size equaling to min size of inline
      {data,dentry} area.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size unit, previous one is 4 bytes, latter one is 1 bytes.
      - DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
      however, we need to consider min size of inline dentry area as well,
      minimal inline dentry should at least contain two entries: '.' and
      '..', so that min inline_dentry size is 40 bytes.
      
      .bitmap		1 * 1 = 1
      .reserved	1 * 1 = 1
      .dentry		11 * 2 = 22
      .filename	8 * 2 = 16
      total		40
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4ab78f4d
  6. 13 2月, 2019 1 次提交
    • S
      f2fs: fix sbi->extent_list corruption issue · df13b036
      Sahitya Tummala 提交于
      [ Upstream commit e4589fa545e0020dbbc3c9bde35f35f949901392 ]
      
      When there is a failure in f2fs_fill_super() after/during
      the recovery of fsync'd nodes, it frees the current sbi and
      retries again. This time the mount is successful, but the files
      that got recovered before retry, still holds the extent tree,
      whose extent nodes list is corrupted since sbi and sbi->extent_list
      is freed up. The list_del corruption issue is observed when the
      file system is getting unmounted and when those recoverd files extent
      node is being freed up in the below context.
      
      list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
      <...>
      kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
      lr : __list_del_entry_valid+0x94/0xb4
      pc : __list_del_entry_valid+0x94/0xb4
      <...>
      Call trace:
      __list_del_entry_valid+0x94/0xb4
      __release_extent_node+0xb0/0x114
      __free_extent_tree+0x58/0x7c
      f2fs_shrink_extent_tree+0xdc/0x3b0
      f2fs_leave_shrinker+0x28/0x7c
      f2fs_put_super+0xfc/0x1e0
      generic_shutdown_super+0x70/0xf4
      kill_block_super+0x2c/0x5c
      kill_f2fs_super+0x44/0x50
      deactivate_locked_super+0x60/0x8c
      deactivate_super+0x68/0x74
      cleanup_mnt+0x40/0x78
      __cleanup_mnt+0x1c/0x28
      task_work_run+0x48/0xd0
      do_notify_resume+0x678/0xe98
      work_pending+0x8/0x14
      
      Fix this by not creating extents for those recovered files if shrinker is
      not registered yet. Once mount is successful and shrinker is registered,
      those files can have extents again.
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      df13b036
  7. 14 11月, 2018 2 次提交
    • C
      f2fs: fix to flush all dirty inodes recovered in readonly fs · cd295fdd
      Chao Yu 提交于
      [ Upstream commit 1378752b9921e60749eaf18ec6c47b33f9001abb ]
      
      generic/417 reported as blow:
      
      ------------[ cut here ]------------
      kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 1 PID: 21697 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
      Call Trace:
       ? _raw_spin_unlock+0x2c/0x50
       evict+0xa8/0x170
       dispose_list+0x34/0x40
       evict_inodes+0x118/0x120
       generic_shutdown_super+0x41/0x100
       ? rcu_read_lock_sched_held+0x97/0xa0
       kill_block_super+0x22/0x50
       kill_f2fs_super+0x6f/0x80 [f2fs]
       deactivate_locked_super+0x3d/0x70
       deactivate_super+0x40/0x60
       cleanup_mnt+0x39/0x70
       __cleanup_mnt+0x10/0x20
       task_work_run+0x81/0xa0
       exit_to_usermode_loop+0x59/0xa7
       do_fast_syscall_32+0x1f5/0x22c
       entry_SYSENTER_32+0x53/0x86
      EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
      
      It can simply reproduced with scripts:
      
      Enable quota feature during mkfs.
      
      Testcase1:
      1. mkfs.f2fs /dev/zram0
      2. mount -t f2fs /dev/zram0 /mnt/f2fs
      3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
      4. godown /mnt/f2fs
      5. umount /mnt/f2fs
      6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
      7. umount /mnt/f2fs
      
      Testcase2:
      1. mkfs.f2fs /dev/zram0
      2. mount -t f2fs /dev/zram0 /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. create process[pid = x] do:
      	a) open /mnt/f2fs/file;
      	b) unlink /mnt/f2fs/file
      5. godown -f /mnt/f2fs
      6. kill process[pid = x]
      7. umount /mnt/f2fs
      8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
      9. umount /mnt/f2fs
      
      The reason is: during recovery, i_{c,m}time of inode will be updated, then
      the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
      global list, so later write_checkpoint will not flush such dirty inode into
      node page.
      
      Once umount is called, sync_filesystem() in generic_shutdown_super() will
      skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
      there.
      
      To solve this issue, during umount, add remove SB_RDONLY flag in
      sb->s_flags, to make sure sync_filesystem() will not be skipped.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd295fdd
    • Z
      f2fs: avoid sleeping under spin_lock · 207093ca
      Zhikang Zhang 提交于
      [ Upstream commit b430f7263673eab1dc40e662ae3441a9619d16b8 ]
      
      In the call trace below, we might sleep in function dput().
      
      So in order to avoid sleeping under spin_lock, we remove f2fs_mark_inode_dirty_sync
      from __try_update_largest_extent && __drop_largest_extent.
      
      BUG: sleeping function called from invalid context at fs/dcache.c:796
      Call trace:
      	dump_backtrace+0x0/0x3f4
      	show_stack+0x24/0x30
      	dump_stack+0xe0/0x138
      	___might_sleep+0x2a8/0x2c8
      	__might_sleep+0x78/0x10c
      	dput+0x7c/0x750
      	block_dump___mark_inode_dirty+0x120/0x17c
      	__mark_inode_dirty+0x344/0x11f0
      	f2fs_mark_inode_dirty_sync+0x40/0x50
      	__insert_extent_tree+0x2e0/0x2f4
      	f2fs_update_extent_tree_range+0xcf4/0xde8
      	f2fs_update_extent_cache+0x114/0x12c
      	f2fs_update_data_blkaddr+0x40/0x50
      	write_data_page+0x150/0x314
      	do_write_data_page+0x648/0x2318
      	__write_data_page+0xdb4/0x1640
      	f2fs_write_cache_pages+0x768/0xafc
      	__f2fs_write_data_pages+0x590/0x1218
      	f2fs_write_data_pages+0x64/0x74
      	do_writepages+0x74/0xe4
      	__writeback_single_inode+0xdc/0x15f0
      	writeback_sb_inodes+0x574/0xc98
      	__writeback_inodes_wb+0x190/0x204
      	wb_writeback+0x730/0xf14
      	wb_check_old_data_flush+0x1bc/0x1c8
      	wb_workfn+0x554/0xf74
      	process_one_work+0x440/0x118c
      	worker_thread+0xac/0x974
      	kthread+0x1a0/0x1c8
      	ret_from_fork+0x10/0x1c
      Signed-off-by: NZhikang Zhang <zhangzhikang1@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      207093ca
  8. 21 8月, 2018 2 次提交
  9. 15 8月, 2018 1 次提交
    • A
      f2fs: rework fault injection handling to avoid a warning · 7fa750a1
      Arnd Bergmann 提交于
      When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
      unused label:
      
      fs/f2fs/segment.c: In function '__submit_discard_cmd':
      fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]
      
      This could be fixed by adding another #ifdef around it, but the more
      reliable way of doing this seems to be to remove the other #ifdefs
      where that is easily possible.
      
      By defining time_to_inject() as a trivial stub, most of the checks for
      CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
      formatting of the code.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7fa750a1
  10. 14 8月, 2018 4 次提交
  11. 11 8月, 2018 2 次提交
    • C
      f2fs: fix to avoid broken of dnode block list · 50fa53ec
      Chao Yu 提交于
      f2fs recovery flow is relying on dnode block link list, it means fsynced
      file recovery depends on previous dnode's persistence in the list, so
      during fsync() we should wait on all regular inode's dnode writebacked
      before issuing flush.
      
      By this way, we can avoid dnode block list being broken by out-of-order
      IO submission due to IO scheduler or driver.
      
      Sheng Yong helps to do the test with this patch:
      
      Target:/data (f2fs, -)
      64MB / 32768KB / 4KB / 8
      
      1 / PERSIST / Index
      
      Base:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	867.82		204.15		41440.03	41370.54	680.8		1025.94		1031.08
      2	871.87		205.87		41370.3		40275.2		791.14		1065.84		1101.7
      3	866.52		205.69		41795.67	40596.16	694.69		1037.16		1031.48
      Avg	868.7366667	205.2366667	41535.33333	40747.3		722.21		1042.98		1054.753333
      
      After:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	798.81		202.5		41143		40613.87	602.71		838.08		913.83
      2	805.79		206.47		40297.2		41291.46	604.44		840.75		924.27
      3	814.83		206.17		41209.57	40453.62	602.85		834.66		927.91
      Avg	806.4766667	205.0466667	40883.25667	40786.31667	603.3333333	837.83		922.0033333
      
      Patched/Original:
      	0.928332713	0.999074239	0.984300676	1.000957528	0.835398753	0.803303994	0.874141189
      
      It looks like atomic write will suffer performance regression.
      
      I suspect that the criminal is that we forcing to wait all dnode being in
      storage cache before we issue PREFLUSH+FUA.
      
      BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
      cause the problem: we will lose data of last transaction after SPO, even if
      atomic write return no error:
      
      - atomic_open();
      - write() P1, P2, P3;
      - atomic_commit();
       - writeback data: P1, P2, P3;
       - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with fsync_mark is
      writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
      last transaction.
       - preflush + fua;
      - power-cut
      
      If we don't wait dnode writeback for atomic_write:
      
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	779.91		206.03		41621.5		40333.16	716.9		1038.21		1034.85
      2	848.51		204.35		40082.44	39486.17	791.83		1119.96		1083.77
      3	772.12		206.27		41335.25	41599.65	723.29		1055.07		971.92
      Avg	800.18		205.55		41013.06333	40472.99333	744.0066667	1071.08		1030.18
      
      Patched/Original:
      	0.92108464	1.001526693	0.987425886	0.993268102	1.030180511	1.026942031	0.976702294
      
      SQLite's performance recovers.
      
      Jaegeuk:
      "Practically, I don't see db corruption becase of this. We can excuse to lose
      the last transaction."
      
      Finally, we decide to keep original implementation of atomic write interface
      sematics that we don't wait all dnode writeback before preflush+fua submission.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      50fa53ec
    • G
      f2fs: use true and false for boolean values · 6e45f2a5
      Gustavo A. R. Silva 提交于
      Return statements in functions returning bool should use true or false
      instead of an integer value.
      
      This issue was detected with the help of Coccinelle.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6e45f2a5
  12. 02 8月, 2018 7 次提交
    • C
      f2fs: fix to active page in lru list for read path · 82cf4f13
      Chao Yu 提交于
      If config CONFIG_F2FS_FAULT_INJECTION is on, for both read or write path
      we will call find_lock_page() to get the page, but for read path, it
      missed to passing FGP_ACCESSED to allocator to active the page in LRU
      list, result in being reclaimed in advance incorrectly, fix it.
      Reported-by: NXianrong Zhou <zhouxianrong@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      82cf4f13
    • C
      f2fs: restrict setting up inode.i_advise · 797c1cb5
      Chao Yu 提交于
      In order to give advise to f2fs to recognize hot/cold file, it is possible
      that we can set specific bit in inode.i_advise through setxattr(), but
      there are several bits which are used internally, such as encrypt_bit,
      keep_size_bit, they should never be changed through setxattr().
      
      So that this patch 1) adds FADVISE_MODIFIABLE_BITS to filter modifiable
      bits user given, 2) supports to clear {hot,cold}_file bits.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      797c1cb5
    • C
      f2fs: kill EXT_TREE_VEC_SIZE · 6122003a
      Chao Yu 提交于
      Since commit 201ef5e0 ("f2fs: improve shrink performance of extent nodes"),
      there is no user of EXT_TREE_VEC_SIZE, just kill it for cleanup.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6122003a
    • C
      f2fs: fix to propagate error from __get_meta_page() · 7735730d
      Chao Yu 提交于
      If caller of __get_meta_page() can handle error, let's propagate error
      from __get_meta_page().
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7735730d
    • D
      f2fs: Keep alloc_valid_block_count in sync · 36b877af
      Daniel Rosenberg 提交于
      If we attempt to request more blocks than we have room for, we try to
      instead request as much as we can, however, alloc_valid_block_count
      is not decremented to match the new value, allowing it to drift higher
      until the next checkpoint. This always decrements it when the requested
      amount cannot be fulfilled.
      Signed-off-by: NDaniel Rosenberg <drosen@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      36b877af
    • C
      f2fs: issue small discard by LBA order · 20ee4382
      Chao Yu 提交于
      For small granularity discard which size is smaller than 64KB, if we
      issue those kind of discards orderly by size, their IOs will be spread
      into entire logical address, so that in FTL, L2P table will be updated
      randomly, result bad wear rate in the table.
      
      In this patch, we choose to issue small discard by LBA order, by this
      way, we can expect that L2P table updates from adjacent discard IOs can
      be merged in the cache, so it can reduce lifetime wearing of flash.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      20ee4382
    • C
      f2fs: fix to do sanity check with block address in main area · c9b60788
      Chao Yu 提交于
      This patch add to do sanity check with below field:
      - cp_pack_total_block_count
      - blkaddr of data/node
      - extent info
      
      - Overview
      BUG() in verify_block_addr() when writing to a corrupted f2fs image
      
      - Reproduce (4.18 upstream kernel)
      
      - POC (poc.c)
      
      static void activity(char *mpoint) {
      
        char *foo_bar_baz;
        int err;
      
        static int buf[8192];
        memset(buf, 0, sizeof(buf));
      
        err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
      
        int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
        if (fd >= 0) {
          write(fd, (char *)buf, sizeof(buf));
          fdatasync(fd);
          close(fd);
        }
      }
      
      int main(int argc, char *argv[]) {
        activity(argv[1]);
        return 0;
      }
      
      - Kernel message
      [  689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3
      [  699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240
      [  699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4
      [  699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240
      [  699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff <0f> 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d
      [  699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202
      [  699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113
      [  699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540
      [  699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8
      [  699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540
      [  699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450
      [  699.729154] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.729156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.729171] Call Trace:
      [  699.729192]  f2fs_do_write_data_page+0x2e2/0xe00
      [  699.729203]  ? f2fs_should_update_outplace+0xd0/0xd0
      [  699.729238]  ? memcg_drain_all_list_lrus+0x280/0x280
      [  699.729269]  ? __radix_tree_replace+0xa3/0x120
      [  699.729276]  __write_data_page+0x5c7/0xe30
      [  699.729291]  ? kasan_check_read+0x11/0x20
      [  699.729310]  ? page_mapped+0x8a/0x110
      [  699.729321]  ? page_mkclean+0xe9/0x160
      [  699.729327]  ? f2fs_do_write_data_page+0xe00/0xe00
      [  699.729331]  ? invalid_page_referenced_vma+0x130/0x130
      [  699.729345]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.729351]  f2fs_write_cache_pages+0x4ca/0x860
      [  699.729358]  ? __write_data_page+0xe30/0xe30
      [  699.729374]  ? percpu_counter_add_batch+0x22/0xa0
      [  699.729380]  ? kasan_check_write+0x14/0x20
      [  699.729391]  ? _raw_spin_lock+0x17/0x40
      [  699.729403]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
      [  699.729413]  ? iov_iter_advance+0x113/0x640
      [  699.729418]  ? f2fs_write_end+0x133/0x2e0
      [  699.729423]  ? balance_dirty_pages_ratelimited+0x239/0x640
      [  699.729428]  f2fs_write_data_pages+0x329/0x520
      [  699.729433]  ? generic_perform_write+0x250/0x320
      [  699.729438]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.729454]  ? current_time+0x110/0x110
      [  699.729459]  ? f2fs_preallocate_blocks+0x1ef/0x370
      [  699.729464]  do_writepages+0x37/0xb0
      [  699.729468]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.729472]  ? do_writepages+0x37/0xb0
      [  699.729478]  __filemap_fdatawrite_range+0x19a/0x1f0
      [  699.729483]  ? delete_from_page_cache_batch+0x4e0/0x4e0
      [  699.729496]  ? __vfs_write+0x2b2/0x410
      [  699.729501]  file_write_and_wait_range+0x66/0xb0
      [  699.729506]  f2fs_do_sync_file+0x1f9/0xd90
      [  699.729511]  ? truncate_partial_data_page+0x290/0x290
      [  699.729521]  ? __sb_end_write+0x30/0x50
      [  699.729526]  ? vfs_write+0x20f/0x260
      [  699.729530]  f2fs_sync_file+0x9a/0xb0
      [  699.729534]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.729548]  vfs_fsync_range+0x68/0x100
      [  699.729554]  ? __fget_light+0xc9/0xe0
      [  699.729558]  do_fsync+0x3d/0x70
      [  699.729562]  __x64_sys_fdatasync+0x24/0x30
      [  699.729585]  do_syscall_64+0x78/0x170
      [  699.729595]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  699.729613] RIP: 0033:0x7f9bf930d800
      [  699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
      [  699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      [  699.729687] ---[ end trace 4ce02f25ff7d3df5 ]---
      [  699.729782] ------------[ cut here ]------------
      [  699.729785] kernel BUG at fs/f2fs/segment.h:654!
      [  699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G        W         4.18.0-rc1+ #4
      [  699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
      [  699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
      [  699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
      [  699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
      [  699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
      [  699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
      [  699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
      [  699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
      [  699.748683] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.750293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.752874] Call Trace:
      [  699.753386]  ? f2fs_inplace_write_data+0x93/0x240
      [  699.754341]  f2fs_inplace_write_data+0xd2/0x240
      [  699.755271]  f2fs_do_write_data_page+0x2e2/0xe00
      [  699.756214]  ? f2fs_should_update_outplace+0xd0/0xd0
      [  699.757215]  ? memcg_drain_all_list_lrus+0x280/0x280
      [  699.758209]  ? __radix_tree_replace+0xa3/0x120
      [  699.759164]  __write_data_page+0x5c7/0xe30
      [  699.760002]  ? kasan_check_read+0x11/0x20
      [  699.760823]  ? page_mapped+0x8a/0x110
      [  699.761573]  ? page_mkclean+0xe9/0x160
      [  699.762345]  ? f2fs_do_write_data_page+0xe00/0xe00
      [  699.763332]  ? invalid_page_referenced_vma+0x130/0x130
      [  699.764374]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.765347]  f2fs_write_cache_pages+0x4ca/0x860
      [  699.766276]  ? __write_data_page+0xe30/0xe30
      [  699.767161]  ? percpu_counter_add_batch+0x22/0xa0
      [  699.768112]  ? kasan_check_write+0x14/0x20
      [  699.768951]  ? _raw_spin_lock+0x17/0x40
      [  699.769739]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
      [  699.770885]  ? iov_iter_advance+0x113/0x640
      [  699.771743]  ? f2fs_write_end+0x133/0x2e0
      [  699.772569]  ? balance_dirty_pages_ratelimited+0x239/0x640
      [  699.773680]  f2fs_write_data_pages+0x329/0x520
      [  699.774603]  ? generic_perform_write+0x250/0x320
      [  699.775544]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.776510]  ? current_time+0x110/0x110
      [  699.777299]  ? f2fs_preallocate_blocks+0x1ef/0x370
      [  699.778279]  do_writepages+0x37/0xb0
      [  699.779026]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.779978]  ? do_writepages+0x37/0xb0
      [  699.780755]  __filemap_fdatawrite_range+0x19a/0x1f0
      [  699.781746]  ? delete_from_page_cache_batch+0x4e0/0x4e0
      [  699.782820]  ? __vfs_write+0x2b2/0x410
      [  699.783597]  file_write_and_wait_range+0x66/0xb0
      [  699.784540]  f2fs_do_sync_file+0x1f9/0xd90
      [  699.785381]  ? truncate_partial_data_page+0x290/0x290
      [  699.786415]  ? __sb_end_write+0x30/0x50
      [  699.787204]  ? vfs_write+0x20f/0x260
      [  699.787941]  f2fs_sync_file+0x9a/0xb0
      [  699.788694]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.789572]  vfs_fsync_range+0x68/0x100
      [  699.790360]  ? __fget_light+0xc9/0xe0
      [  699.791128]  do_fsync+0x3d/0x70
      [  699.791779]  __x64_sys_fdatasync+0x24/0x30
      [  699.792614]  do_syscall_64+0x78/0x170
      [  699.793371]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  699.794406] RIP: 0033:0x7f9bf930d800
      [  699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
      [  699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      [  699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  699.817079] ---[ end trace 4ce02f25ff7d3df6 ]---
      [  699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
      [  699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
      [  699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
      [  699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
      [  699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
      [  699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
      [  699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
      [  699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
      [  699.831192] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.832793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.835556] ==================================================================
      [  699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
      [  699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309
      
      [  699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G      D W         4.18.0-rc1+ #4
      [  699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.843475] Call Trace:
      [  699.843982]  dump_stack+0x7b/0xb5
      [  699.844661]  print_address_description+0x70/0x290
      [  699.845607]  kasan_report+0x291/0x390
      [  699.846351]  ? update_stack_state+0x38c/0x3e0
      [  699.853831]  __asan_load8+0x54/0x90
      [  699.854569]  update_stack_state+0x38c/0x3e0
      [  699.855428]  ? __read_once_size_nocheck.constprop.7+0x20/0x20
      [  699.856601]  ? __save_stack_trace+0x5e/0x100
      [  699.857476]  unwind_next_frame.part.5+0x18e/0x490
      [  699.858448]  ? unwind_dump+0x290/0x290
      [  699.859217]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.860185]  __unwind_start+0x106/0x190
      [  699.860974]  __save_stack_trace+0x5e/0x100
      [  699.861808]  ? __save_stack_trace+0x5e/0x100
      [  699.862691]  ? unlink_anon_vmas+0xba/0x2c0
      [  699.863525]  save_stack_trace+0x1f/0x30
      [  699.864312]  save_stack+0x46/0xd0
      [  699.864993]  ? __alloc_pages_slowpath+0x1420/0x1420
      [  699.865990]  ? flush_tlb_mm_range+0x15e/0x220
      [  699.866889]  ? kasan_check_write+0x14/0x20
      [  699.867724]  ? __dec_node_state+0x92/0xb0
      [  699.868543]  ? lock_page_memcg+0x85/0xf0
      [  699.869350]  ? unlock_page_memcg+0x16/0x80
      [  699.870185]  ? page_remove_rmap+0x198/0x520
      [  699.871048]  ? mark_page_accessed+0x133/0x200
      [  699.871930]  ? _cond_resched+0x1a/0x50
      [  699.872700]  ? unmap_page_range+0xcd4/0xe50
      [  699.873551]  ? rb_next+0x58/0x80
      [  699.874217]  ? rb_next+0x58/0x80
      [  699.874895]  __kasan_slab_free+0x13c/0x1a0
      [  699.875734]  ? unlink_anon_vmas+0xba/0x2c0
      [  699.876563]  kasan_slab_free+0xe/0x10
      [  699.877315]  kmem_cache_free+0x89/0x1e0
      [  699.878095]  unlink_anon_vmas+0xba/0x2c0
      [  699.878913]  free_pgtables+0x101/0x1b0
      [  699.879677]  exit_mmap+0x146/0x2a0
      [  699.880378]  ? __ia32_sys_munmap+0x50/0x50
      [  699.881214]  ? kasan_check_read+0x11/0x20
      [  699.882052]  ? mm_update_next_owner+0x322/0x380
      [  699.882985]  mmput+0x8b/0x1d0
      [  699.883602]  do_exit+0x43a/0x1390
      [  699.884288]  ? mm_update_next_owner+0x380/0x380
      [  699.885212]  ? f2fs_sync_file+0x9a/0xb0
      [  699.885995]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.886877]  ? vfs_fsync_range+0x68/0x100
      [  699.887694]  ? __fget_light+0xc9/0xe0
      [  699.888442]  ? do_fsync+0x3d/0x70
      [  699.889118]  ? __x64_sys_fdatasync+0x24/0x30
      [  699.889996]  rewind_stack_do_exit+0x17/0x20
      [  699.890860] RIP: 0033:0x7f9bf930d800
      [  699.891585] Code: Bad RIP value.
      [  699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      
      [  699.901241] The buggy address belongs to the page:
      [  699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [  699.903811] flags: 0x2ffff0000000000()
      [  699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000
      [  699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
      [  699.907673] page dumped because: kasan: bad access detected
      
      [  699.909108] Memory state around the buggy address:
      [  699.910077]  ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
      [  699.911528]  ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2
      [  699.914392]                                                              ^
      [  699.915758]  ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00
      [  699.917193]  ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00
      [  699.918634] ==================================================================
      
      - Location
      https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644
      
      Reported-by Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c9b60788
  13. 29 7月, 2018 1 次提交
  14. 27 7月, 2018 5 次提交
    • C
      f2fs: disable f2fs_check_rb_tree_consistence · 67fce70b
      Chao Yu 提交于
      If there is millions of discard entries cached in rb tree, each
      sanity check of it can cause very long latency as held cmd_lock
      blocking other lock grabbers.
      
      In other aspect, we have enabled the check very long time, as
      we see, there is no such inconsistent condition caused by bugs.
      
      But still we do not choose to kill it directly, instead, adding
      an flag to disable the check now, if there is related code change,
      we can reuse it to detect bugs.
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      67fce70b
    • C
      f2fs: introduce and spread verify_blkaddr · e1da7872
      Chao Yu 提交于
      This patch introduces verify_blkaddr to check meta/data block address
      with valid range to detect bug earlier.
      
      In addition, once we encounter an invalid blkaddr, notice user to run
      fsck to fix, and let the kernel panic.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e1da7872
    • A
      f2fs: use timespec64 for inode timestamps · 24b81dfc
      Arnd Bergmann 提交于
      The on-disk representation and the vfs both use 64-bit tv_sec values,
      so let's change the last missing piece in the middle.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      24b81dfc
    • C
      f2fs: fix to propagate return value of scan_nat_page() · e2374015
      Chao Yu 提交于
      As Anatoly Trosinenko reported in bugzilla:
      
      How to reproduce:
      1. Compile the 73fcb1a3 version of the kernel using the config attached
      2. Unpack and mount the attached filesystem image as F2FS
      3. The kernel will BUG() on mount (BUGs are explicitly enabled in config)
      
      [    2.233612] F2FS-fs (sda): Found nat_bits in checkpoint
      [    2.248422] ------------[ cut here ]------------
      [    2.248857] kernel BUG at fs/f2fs/node.c:1967!
      [    2.249760] invalid opcode: 0000 [#1] SMP NOPTI
      [    2.250219] Modules linked in:
      [    2.251848] CPU: 0 PID: 944 Comm: mount Not tainted 4.17.0-rc5+ #1
      [    2.252331] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [    2.253305] RIP: 0010:build_free_nids+0x337/0x3f0
      [    2.253672] RSP: 0018:ffffae7fc0857c50 EFLAGS: 00000246
      [    2.254080] RAX: 00000000ffffffff RBX: 0000000000000123 RCX: 0000000000000001
      [    2.254638] RDX: ffff9aa7063d5c00 RSI: 0000000000000122 RDI: ffff9aa705852e00
      [    2.255190] RBP: ffff9aa705852e00 R08: 0000000000000001 R09: ffff9aa7059090c0
      [    2.255719] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9aa705852e00
      [    2.256242] R13: ffff9aa7063ad000 R14: ffff9aa705919000 R15: 0000000000000123
      [    2.256809] FS:  00000000023078c0(0000) GS:ffff9aa707800000(0000) knlGS:0000000000000000
      [    2.258654] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    2.259153] CR2: 00000000005511ae CR3: 0000000005872000 CR4: 00000000000006f0
      [    2.259801] Call Trace:
      [    2.260583]  build_node_manager+0x5cd/0x600
      [    2.260963]  f2fs_fill_super+0x66a/0x17c0
      [    2.261300]  ? f2fs_commit_super+0xe0/0xe0
      [    2.261622]  mount_bdev+0x16e/0x1a0
      [    2.261899]  mount_fs+0x30/0x150
      [    2.262398]  vfs_kern_mount.part.28+0x4f/0xf0
      [    2.262743]  do_mount+0x5d0/0xc60
      [    2.263010]  ? _copy_from_user+0x37/0x60
      [    2.263313]  ? memdup_user+0x39/0x60
      [    2.263692]  ksys_mount+0x7b/0xd0
      [    2.263960]  __x64_sys_mount+0x1c/0x20
      [    2.264268]  do_syscall_64+0x43/0xf0
      [    2.264560]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [    2.265095] RIP: 0033:0x48d31a
      [    2.265502] RSP: 002b:00007ffc6fe60a08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      [    2.266089] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 000000000048d31a
      [    2.266607] RDX: 00007ffc6fe62fa5 RSI: 00007ffc6fe62f9d RDI: 00007ffc6fe62f94
      [    2.267130] RBP: 00000000023078a0 R08: 0000000000000000 R09: 0000000000000000
      [    2.267670] R10: 0000000000008000 R11: 0000000000000246 R12: 0000000000000000
      [    2.268192] R13: 0000000000000000 R14: 00007ffc6fe60c78 R15: 0000000000000000
      [    2.268767] Code: e8 5f c3 ff ff 83 c3 01 41 83 c7 01 81 fb c7 01 00 00 74 48 44 39 7d 04 76 42 48 63 c3 48 8d 04 c0 41 8b 44 06 05 83 f8 ff 75 c1 <0f> 0b 49 8b 45 50 48 8d b8 b0 00 00 00 e8 37 59 69 00 b9 01 00
      [    2.270434] RIP: build_free_nids+0x337/0x3f0 RSP: ffffae7fc0857c50
      [    2.271426] ---[ end trace ab20c06cd3c8fde4 ]---
      
      During loading NAT entries, we will do sanity check, once the entry info
      is corrupted, it will cause BUG_ON directly to protect user data from
      being overwrited.
      
      In this case, it will be better to just return failure on mount() instead
      of panic, so that user can get hint from kmsg and try fsck for recovery
      immediately rather than after an abnormal reboot.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=199769Reported-by: NAnatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e2374015
    • J
      f2fs: indicate shutdown f2fs to allow unmount successfully · 83a3bfdb
      Jaegeuk Kim 提交于
      Once we shutdown f2fs, we have to flush stale pages in order to unmount
      the system. In order to make stable, we need to stop fault injection as well.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      83a3bfdb
  15. 18 7月, 2018 1 次提交
  16. 06 6月, 2018 1 次提交
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani 提交于
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  17. 05 6月, 2018 1 次提交
    • C
      f2fs: let sync node IO interrupt async one · c29fd0c0
      Chao Yu 提交于
      Although mixed sync/async IOs can have continuous LBA, as they have
      different IO priority, block IO scheduler will add them into different
      queues and commit them separately, result in splited IOs which causes
      wrose performance.
      
      This patch gives high priority to synchronous IO of nodes, means that
      once synchronous flow starts, it can interrupt asynchronous writeback
      flow of system flusher, so more big IOs can be expected.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c29fd0c0
  18. 01 6月, 2018 6 次提交
    • C
      f2fs: clean up symbol namespace · 4d57b86d
      Chao Yu 提交于
      As Ted reported:
      
      "Hi, I was looking at f2fs's sources recently, and I noticed that there
      is a very large number of non-static symbols which don't have a f2fs
      prefix.  There's well over a hundred (see attached below).
      
      As one example, in fs/f2fs/dir.c there is:
      
      unsigned char get_de_type(struct f2fs_dir_entry *de)
      
      This function is clearly only useful for f2fs, but it has a generic
      name.  This means that if any other file system tries to have the same
      symbol name, there will be a symbol conflict and the kernel would not
      successfully build.  It also means that when someone is looking f2fs
      sources, it's not at all obvious whether a function such as
      read_data_page(), invalidate_blocks(), is a generic kernel function
      found in the fs, mm, or block layers, or a f2fs specific function.
      
      You might want to fix this at some point.  Hopefully Kent's bcachefs
      isn't similarly using genericly named functions, since that might
      cause conflicts with f2fs's functions --- but just as this would be a
      problem that we would rightly insist that Kent fix, this is something
      that we should have rightly insisted that f2fs should have fixed
      before it was integrated into the mainline kernel.
      
      acquire_orphan_inode
      add_ino_entry
      add_orphan_inode
      allocate_data_block
      allocate_new_segments
      alloc_nid
      alloc_nid_done
      alloc_nid_failed
      available_free_memory
      ...."
      
      This patch adds "f2fs_" prefix for all non-static symbols in order to:
      a) avoid conflict with other kernel generic symbols;
      b) to indicate the function is f2fs specific one instead of generic
      one;
      Reported-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4d57b86d
    • C
      f2fs: make set_de_type() static · 2e79d951
      Chao Yu 提交于
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2e79d951
    • C
      f2fs: make __f2fs_write_data_pages() static · fc99fe27
      Chao Yu 提交于
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fc99fe27
    • C
      f2fs: fix to let caller retry allocating block address · fe16efe6
      Chao Yu 提交于
      Configure io_bits with 2 and enable LFS mode, generic/013 reports below dmesg:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000104
      *pdpt = 0000000029b7b001 *pde = 0000000000000000
      Oops: 0002 [#1] PREEMPT SMP
      Modules linked in: crc32_generic zram f2fs(O) rfcomm bnep bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev snd_seq_device aesni_intel snd_timer aes_i586 snd crypto_simd cryptd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000
      CPU: 0 PID: 11161 Comm: fsstress Tainted: G           O      4.17.0-rc2 #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs]
      EFLAGS: 00010206 CPU: 0
      EAX: e863dcd8 EBX: 00000000 ECX: 00000100 EDX: 00000200
      ESI: e863dcf4 EDI: f6f82768 EBP: e863dbb0 ESP: e863db74
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      CR0: 80050033 CR2: 00000104 CR3: 29a62020 CR4: 000406f0
      Call Trace:
       do_write_page+0x6f/0xc0 [f2fs]
       write_data_page+0x4a/0xd0 [f2fs]
       do_write_data_page+0x327/0x630 [f2fs]
       __write_data_page+0x34b/0x820 [f2fs]
       __f2fs_write_data_pages+0x42d/0x8c0 [f2fs]
       f2fs_write_data_pages+0x27/0x30 [f2fs]
       do_writepages+0x1a/0x70
       __filemap_fdatawrite_range+0x94/0xd0
       filemap_write_and_wait_range+0x3d/0xa0
       __generic_file_write_iter+0x11a/0x1f0
       f2fs_file_write_iter+0xdd/0x3b0 [f2fs]
       __vfs_write+0xd2/0x150
       vfs_write+0x9b/0x190
       ksys_write+0x45/0x90
       sys_write+0x16/0x20
       do_fast_syscall_32+0xaa/0x22c
       entry_SYSENTER_32+0x4c/0x7b
      EIP: 0xb7fc8c51
      EFLAGS: 00000246 CPU: 0
      EAX: ffffffda EBX: 00000003 ECX: 09cde000 EDX: 00001000
      ESI: 00000003 EDI: 00001000 EBP: 00000000 ESP: bfbded38
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Code: e8 f9 77 34 c9 8b 45 e0 8b 80 b8 00 00 00 39 45 d8 0f 84 bb 02 00 00 8b 45 e0 8b 80 b8 00 00 00 8d 50 d8 8b 08 89 55 f0 8b 50 04 <89> 51 04 89 0a c7 00 00 01 00 00 c7 40 04 00 02 00 00 8b 45 dc
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] SS:ESP: 0068:e863db74
      CR2: 0000000000000104
      ---[ end trace 4cac79c0d1305ee6 ]---
      
      allocate_data_block will submit all sequential pending IOs sorted by a
      FIFO list, If we failed to submit other user's IO due to unaligned write,
      we will retry to allocate new block address for current IO, then it will
      initialize fio.list again, if fio was in the list before, it can break
      FIFO list, result in above panic.
      
      Thread A			Thread B
      - do_write_page
       - allocate_data_block
        - list_add_tail
        : fioA cached in FIFO list.
      				- do_write_page
      				 - allocate_data_block
      				  - list_add_tail
      				  : fioB cached in FIFO list.
      				 - f2fs_submit_page_write
      				 : fail to submit IO
      				 - allocate_data_block
      				  - INIT_LIST_HEAD
       - f2fs_submit_page_write
        - list_del  <-- NULL pointer dereference
      
      This patch adds fio.retry parameter to indicate failure status for each
      IO, and avoid bailing out if there is still pending IO in FIFO list for
      fixing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fe16efe6
    • C
      f2fs: clean up with clear_radix_tree_dirty_tag · aec2f729
      Chao Yu 提交于
      Introduce clear_radix_tree_dirty_tag to include common codes for cleanup.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      aec2f729
    • Y
      f2fs: let discard thread wait a little longer if dev is busy · f9d1dced
      Yunlei He 提交于
      This patch modify discard thread wait policy as below:
      	issued       io_interrupted     wait time(ms)
      1.        8                 0               50
      2.      (0,8)               1               50
      3.        0                 1              500 (dev is busy)
      4.        0                 0            60000 (no candidates)
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f9d1dced