1. 13 9月, 2018 1 次提交
  2. 12 9月, 2018 2 次提交
    • Z
      f2fs: avoid sleeping under spin_lock · b430f726
      Zhikang Zhang 提交于
      In the call trace below, we might sleep in function dput().
      
      So in order to avoid sleeping under spin_lock, we remove f2fs_mark_inode_dirty_sync
      from __try_update_largest_extent && __drop_largest_extent.
      
      BUG: sleeping function called from invalid context at fs/dcache.c:796
      Call trace:
      	dump_backtrace+0x0/0x3f4
      	show_stack+0x24/0x30
      	dump_stack+0xe0/0x138
      	___might_sleep+0x2a8/0x2c8
      	__might_sleep+0x78/0x10c
      	dput+0x7c/0x750
      	block_dump___mark_inode_dirty+0x120/0x17c
      	__mark_inode_dirty+0x344/0x11f0
      	f2fs_mark_inode_dirty_sync+0x40/0x50
      	__insert_extent_tree+0x2e0/0x2f4
      	f2fs_update_extent_tree_range+0xcf4/0xde8
      	f2fs_update_extent_cache+0x114/0x12c
      	f2fs_update_data_blkaddr+0x40/0x50
      	write_data_page+0x150/0x314
      	do_write_data_page+0x648/0x2318
      	__write_data_page+0xdb4/0x1640
      	f2fs_write_cache_pages+0x768/0xafc
      	__f2fs_write_data_pages+0x590/0x1218
      	f2fs_write_data_pages+0x64/0x74
      	do_writepages+0x74/0xe4
      	__writeback_single_inode+0xdc/0x15f0
      	writeback_sb_inodes+0x574/0xc98
      	__writeback_inodes_wb+0x190/0x204
      	wb_writeback+0x730/0xf14
      	wb_check_old_data_flush+0x1bc/0x1c8
      	wb_workfn+0x554/0xf74
      	process_one_work+0x440/0x118c
      	worker_thread+0xac/0x974
      	kthread+0x1a0/0x1c8
      	ret_from_fork+0x10/0x1c
      Signed-off-by: NZhikang Zhang <zhangzhikang1@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b430f726
    • C
      f2fs: fix to flush all dirty inodes recovered in readonly fs · 1378752b
      Chao Yu 提交于
      generic/417 reported as blow:
      
      ------------[ cut here ]------------
      kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 1 PID: 21697 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
      Call Trace:
       ? _raw_spin_unlock+0x2c/0x50
       evict+0xa8/0x170
       dispose_list+0x34/0x40
       evict_inodes+0x118/0x120
       generic_shutdown_super+0x41/0x100
       ? rcu_read_lock_sched_held+0x97/0xa0
       kill_block_super+0x22/0x50
       kill_f2fs_super+0x6f/0x80 [f2fs]
       deactivate_locked_super+0x3d/0x70
       deactivate_super+0x40/0x60
       cleanup_mnt+0x39/0x70
       __cleanup_mnt+0x10/0x20
       task_work_run+0x81/0xa0
       exit_to_usermode_loop+0x59/0xa7
       do_fast_syscall_32+0x1f5/0x22c
       entry_SYSENTER_32+0x53/0x86
      EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
      
      It can simply reproduced with scripts:
      
      Enable quota feature during mkfs.
      
      Testcase1:
      1. mkfs.f2fs /dev/zram0
      2. mount -t f2fs /dev/zram0 /mnt/f2fs
      3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
      4. godown /mnt/f2fs
      5. umount /mnt/f2fs
      6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
      7. umount /mnt/f2fs
      
      Testcase2:
      1. mkfs.f2fs /dev/zram0
      2. mount -t f2fs /dev/zram0 /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. create process[pid = x] do:
      	a) open /mnt/f2fs/file;
      	b) unlink /mnt/f2fs/file
      5. godown -f /mnt/f2fs
      6. kill process[pid = x]
      7. umount /mnt/f2fs
      8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
      9. umount /mnt/f2fs
      
      The reason is: during recovery, i_{c,m}time of inode will be updated, then
      the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
      global list, so later write_checkpoint will not flush such dirty inode into
      node page.
      
      Once umount is called, sync_filesystem() in generic_shutdown_super() will
      skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
      there.
      
      To solve this issue, during umount, add remove SB_RDONLY flag in
      sb->s_flags, to make sure sync_filesystem() will not be skipped.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1378752b
  3. 06 9月, 2018 3 次提交
    • J
      f2fs: avoid wrong decrypted data from disk · 0ded69f6
      Jaegeuk Kim 提交于
      1. Create a file in an encrypted directory
      2. Do GC & drop caches
      3. Read stale data before its bio for metapage was not issued yet
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0ded69f6
    • C
      Revert "f2fs: use printk_ratelimited for f2fs_msg" · 22d7ea13
      Chao Yu 提交于
      Don't limit printing log, so that we will not miss any key messages.
      
      This reverts commit a36c106d.
      
      In addition, we use printk_ratelimited to avoid too many log prints.
      - error injection
      - discard submission failure
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      22d7ea13
    • C
      f2fs: fix to avoid NULL pointer dereference on se->discard_map · 7d20c8ab
      Chao Yu 提交于
      https://bugzilla.kernel.org/show_bug.cgi?id=200951
      
      These is a NULL pointer dereference issue reported in bugzilla:
      
      Hi,
      in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
      
      The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
      There are four partitions:
       sda1: FAT  /boot
       sda2: F2FS /
       sda3: F2FS /home
       sda4: F2FS
      
      The bridge is ASMT1153e which uses the "uas" driver.
      There is no TRIM pass-through, so, when mounting it reports:
       mounting with "discard" option, but the device does not support discard
      
      The USB host is USB3.0 and UASP capable. It is the one on RK3399.
      
      Given this everything works fine, except there is no TRIM support.
      
      In order to enable TRIM a new UDEV rule is added [1]:
       /etc/udev/rules.d/10-sata-bridge-trim.rules:
       ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
      After reboot any F2FS write hangs forever and dmesg reports:
       Unable to handle kernel NULL pointer dereference
      
      Also tested on a x86_64 system: works fine even with TRIM enabled.
       same disc
       same bridge
       different usb host controller
       different cpu architecture
       not root filesystem
      
      Regards,
        Vicenç.
      
      [1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
      
       Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
       Mem abort info:
         ESR = 0x96000004
         Exception class = DABT (current EL), IL = 32 bits
         SET = 0, FnV = 0
         EA = 0, S1PTW = 0
       Data abort info:
         ISV = 0, ISS = 0x00000004
         CM = 0, WnR = 0
       user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
       [000000000000003e] pgd=0000000000000000
       Internal error: Oops: 96000004 [#1] SMP
       Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
       CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
       Hardware name: Sapphire-RK3399 Board (DT)
       pstate: 00000005 (nzcv daif -PAN -UAO)
       pc : update_sit_entry+0x304/0x4b0
       lr : update_sit_entry+0x108/0x4b0
       sp : ffff00000ca13bd0
       x29: ffff00000ca13bd0 x28: 000000000000003e
       x27: 0000000000000020 x26: 0000000000080000
       x25: 0000000000000048 x24: ffff8000ebb85cf8
       x23: 0000000000000253 x22: 00000000ffffffff
       x21: 00000000000535f2 x20: 00000000ffffffdf
       x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
       x17: 0000000007ce6926 x16: 000000001c83ffa8
       x15: 0000000000000000 x14: ffff8000f602df90
       x13: 0000000000000006 x12: 0000000000000040
       x11: 0000000000000228 x10: 0000000000000000
       x9 : 0000000000000000 x8 : 0000000000000000
       x7 : 00000000000535f2 x6 : ffff8000ebff3440
       x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
       x3 : 00000000ffffffff x2 : 0000000000000020
       x1 : 0000000000000000 x0 : ffff8000eb9e5800
       Process nvim (pid: 957, stack limit = 0x0000000063a78320)
       Call trace:
        update_sit_entry+0x304/0x4b0
        f2fs_invalidate_blocks+0x98/0x140
        truncate_node+0x90/0x400
        f2fs_remove_inode_page+0xe8/0x340
        f2fs_evict_inode+0x2b0/0x408
        evict+0xe0/0x1e0
        iput+0x160/0x260
        do_unlinkat+0x214/0x298
        __arm64_sys_unlinkat+0x3c/0x68
        el0_svc_handler+0x94/0x118
        el0_svc+0x8/0xc
       Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
       ---[ end trace a0f21a307118c477 ]---
      
      The reason is it is possible to enable discard flag on block queue via
      UDEV, but during mount, f2fs will initialize se->discard_map only if
      this flag is set, once the flag is set after mount, f2fs may dereference
      NULL pointer on se->discard_map.
      
      So this patch does below changes to fix this issue:
      - initialize and update se->discard_map all the time.
      - don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
      during mount.
      - don't issue small discard on zoned block device.
      - introduce some functions to enhance the readability.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Tested-by: NVicente Bergas <vicencb@gmail.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7d20c8ab
  4. 21 8月, 2018 2 次提交
  5. 15 8月, 2018 1 次提交
    • A
      f2fs: rework fault injection handling to avoid a warning · 7fa750a1
      Arnd Bergmann 提交于
      When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
      unused label:
      
      fs/f2fs/segment.c: In function '__submit_discard_cmd':
      fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]
      
      This could be fixed by adding another #ifdef around it, but the more
      reliable way of doing this seems to be to remove the other #ifdefs
      where that is easily possible.
      
      By defining time_to_inject() as a trivial stub, most of the checks for
      CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
      formatting of the code.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7fa750a1
  6. 14 8月, 2018 4 次提交
  7. 11 8月, 2018 2 次提交
    • C
      f2fs: fix to avoid broken of dnode block list · 50fa53ec
      Chao Yu 提交于
      f2fs recovery flow is relying on dnode block link list, it means fsynced
      file recovery depends on previous dnode's persistence in the list, so
      during fsync() we should wait on all regular inode's dnode writebacked
      before issuing flush.
      
      By this way, we can avoid dnode block list being broken by out-of-order
      IO submission due to IO scheduler or driver.
      
      Sheng Yong helps to do the test with this patch:
      
      Target:/data (f2fs, -)
      64MB / 32768KB / 4KB / 8
      
      1 / PERSIST / Index
      
      Base:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	867.82		204.15		41440.03	41370.54	680.8		1025.94		1031.08
      2	871.87		205.87		41370.3		40275.2		791.14		1065.84		1101.7
      3	866.52		205.69		41795.67	40596.16	694.69		1037.16		1031.48
      Avg	868.7366667	205.2366667	41535.33333	40747.3		722.21		1042.98		1054.753333
      
      After:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	798.81		202.5		41143		40613.87	602.71		838.08		913.83
      2	805.79		206.47		40297.2		41291.46	604.44		840.75		924.27
      3	814.83		206.17		41209.57	40453.62	602.85		834.66		927.91
      Avg	806.4766667	205.0466667	40883.25667	40786.31667	603.3333333	837.83		922.0033333
      
      Patched/Original:
      	0.928332713	0.999074239	0.984300676	1.000957528	0.835398753	0.803303994	0.874141189
      
      It looks like atomic write will suffer performance regression.
      
      I suspect that the criminal is that we forcing to wait all dnode being in
      storage cache before we issue PREFLUSH+FUA.
      
      BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
      cause the problem: we will lose data of last transaction after SPO, even if
      atomic write return no error:
      
      - atomic_open();
      - write() P1, P2, P3;
      - atomic_commit();
       - writeback data: P1, P2, P3;
       - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with fsync_mark is
      writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
      last transaction.
       - preflush + fua;
      - power-cut
      
      If we don't wait dnode writeback for atomic_write:
      
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	779.91		206.03		41621.5		40333.16	716.9		1038.21		1034.85
      2	848.51		204.35		40082.44	39486.17	791.83		1119.96		1083.77
      3	772.12		206.27		41335.25	41599.65	723.29		1055.07		971.92
      Avg	800.18		205.55		41013.06333	40472.99333	744.0066667	1071.08		1030.18
      
      Patched/Original:
      	0.92108464	1.001526693	0.987425886	0.993268102	1.030180511	1.026942031	0.976702294
      
      SQLite's performance recovers.
      
      Jaegeuk:
      "Practically, I don't see db corruption becase of this. We can excuse to lose
      the last transaction."
      
      Finally, we decide to keep original implementation of atomic write interface
      sematics that we don't wait all dnode writeback before preflush+fua submission.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      50fa53ec
    • G
      f2fs: use true and false for boolean values · 6e45f2a5
      Gustavo A. R. Silva 提交于
      Return statements in functions returning bool should use true or false
      instead of an integer value.
      
      This issue was detected with the help of Coccinelle.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6e45f2a5
  8. 02 8月, 2018 7 次提交
    • C
      f2fs: fix to active page in lru list for read path · 82cf4f13
      Chao Yu 提交于
      If config CONFIG_F2FS_FAULT_INJECTION is on, for both read or write path
      we will call find_lock_page() to get the page, but for read path, it
      missed to passing FGP_ACCESSED to allocator to active the page in LRU
      list, result in being reclaimed in advance incorrectly, fix it.
      Reported-by: NXianrong Zhou <zhouxianrong@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      82cf4f13
    • C
      f2fs: restrict setting up inode.i_advise · 797c1cb5
      Chao Yu 提交于
      In order to give advise to f2fs to recognize hot/cold file, it is possible
      that we can set specific bit in inode.i_advise through setxattr(), but
      there are several bits which are used internally, such as encrypt_bit,
      keep_size_bit, they should never be changed through setxattr().
      
      So that this patch 1) adds FADVISE_MODIFIABLE_BITS to filter modifiable
      bits user given, 2) supports to clear {hot,cold}_file bits.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      797c1cb5
    • C
      f2fs: kill EXT_TREE_VEC_SIZE · 6122003a
      Chao Yu 提交于
      Since commit 201ef5e0 ("f2fs: improve shrink performance of extent nodes"),
      there is no user of EXT_TREE_VEC_SIZE, just kill it for cleanup.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6122003a
    • C
      f2fs: fix to propagate error from __get_meta_page() · 7735730d
      Chao Yu 提交于
      If caller of __get_meta_page() can handle error, let's propagate error
      from __get_meta_page().
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7735730d
    • D
      f2fs: Keep alloc_valid_block_count in sync · 36b877af
      Daniel Rosenberg 提交于
      If we attempt to request more blocks than we have room for, we try to
      instead request as much as we can, however, alloc_valid_block_count
      is not decremented to match the new value, allowing it to drift higher
      until the next checkpoint. This always decrements it when the requested
      amount cannot be fulfilled.
      Signed-off-by: NDaniel Rosenberg <drosen@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      36b877af
    • C
      f2fs: issue small discard by LBA order · 20ee4382
      Chao Yu 提交于
      For small granularity discard which size is smaller than 64KB, if we
      issue those kind of discards orderly by size, their IOs will be spread
      into entire logical address, so that in FTL, L2P table will be updated
      randomly, result bad wear rate in the table.
      
      In this patch, we choose to issue small discard by LBA order, by this
      way, we can expect that L2P table updates from adjacent discard IOs can
      be merged in the cache, so it can reduce lifetime wearing of flash.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      20ee4382
    • C
      f2fs: fix to do sanity check with block address in main area · c9b60788
      Chao Yu 提交于
      This patch add to do sanity check with below field:
      - cp_pack_total_block_count
      - blkaddr of data/node
      - extent info
      
      - Overview
      BUG() in verify_block_addr() when writing to a corrupted f2fs image
      
      - Reproduce (4.18 upstream kernel)
      
      - POC (poc.c)
      
      static void activity(char *mpoint) {
      
        char *foo_bar_baz;
        int err;
      
        static int buf[8192];
        memset(buf, 0, sizeof(buf));
      
        err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
      
        int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
        if (fd >= 0) {
          write(fd, (char *)buf, sizeof(buf));
          fdatasync(fd);
          close(fd);
        }
      }
      
      int main(int argc, char *argv[]) {
        activity(argv[1]);
        return 0;
      }
      
      - Kernel message
      [  689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3
      [  699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240
      [  699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4
      [  699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240
      [  699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff <0f> 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d
      [  699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202
      [  699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113
      [  699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540
      [  699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8
      [  699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540
      [  699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450
      [  699.729154] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.729156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.729171] Call Trace:
      [  699.729192]  f2fs_do_write_data_page+0x2e2/0xe00
      [  699.729203]  ? f2fs_should_update_outplace+0xd0/0xd0
      [  699.729238]  ? memcg_drain_all_list_lrus+0x280/0x280
      [  699.729269]  ? __radix_tree_replace+0xa3/0x120
      [  699.729276]  __write_data_page+0x5c7/0xe30
      [  699.729291]  ? kasan_check_read+0x11/0x20
      [  699.729310]  ? page_mapped+0x8a/0x110
      [  699.729321]  ? page_mkclean+0xe9/0x160
      [  699.729327]  ? f2fs_do_write_data_page+0xe00/0xe00
      [  699.729331]  ? invalid_page_referenced_vma+0x130/0x130
      [  699.729345]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.729351]  f2fs_write_cache_pages+0x4ca/0x860
      [  699.729358]  ? __write_data_page+0xe30/0xe30
      [  699.729374]  ? percpu_counter_add_batch+0x22/0xa0
      [  699.729380]  ? kasan_check_write+0x14/0x20
      [  699.729391]  ? _raw_spin_lock+0x17/0x40
      [  699.729403]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
      [  699.729413]  ? iov_iter_advance+0x113/0x640
      [  699.729418]  ? f2fs_write_end+0x133/0x2e0
      [  699.729423]  ? balance_dirty_pages_ratelimited+0x239/0x640
      [  699.729428]  f2fs_write_data_pages+0x329/0x520
      [  699.729433]  ? generic_perform_write+0x250/0x320
      [  699.729438]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.729454]  ? current_time+0x110/0x110
      [  699.729459]  ? f2fs_preallocate_blocks+0x1ef/0x370
      [  699.729464]  do_writepages+0x37/0xb0
      [  699.729468]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.729472]  ? do_writepages+0x37/0xb0
      [  699.729478]  __filemap_fdatawrite_range+0x19a/0x1f0
      [  699.729483]  ? delete_from_page_cache_batch+0x4e0/0x4e0
      [  699.729496]  ? __vfs_write+0x2b2/0x410
      [  699.729501]  file_write_and_wait_range+0x66/0xb0
      [  699.729506]  f2fs_do_sync_file+0x1f9/0xd90
      [  699.729511]  ? truncate_partial_data_page+0x290/0x290
      [  699.729521]  ? __sb_end_write+0x30/0x50
      [  699.729526]  ? vfs_write+0x20f/0x260
      [  699.729530]  f2fs_sync_file+0x9a/0xb0
      [  699.729534]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.729548]  vfs_fsync_range+0x68/0x100
      [  699.729554]  ? __fget_light+0xc9/0xe0
      [  699.729558]  do_fsync+0x3d/0x70
      [  699.729562]  __x64_sys_fdatasync+0x24/0x30
      [  699.729585]  do_syscall_64+0x78/0x170
      [  699.729595]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  699.729613] RIP: 0033:0x7f9bf930d800
      [  699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
      [  699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      [  699.729687] ---[ end trace 4ce02f25ff7d3df5 ]---
      [  699.729782] ------------[ cut here ]------------
      [  699.729785] kernel BUG at fs/f2fs/segment.h:654!
      [  699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G        W         4.18.0-rc1+ #4
      [  699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
      [  699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
      [  699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
      [  699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
      [  699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
      [  699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
      [  699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
      [  699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
      [  699.748683] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.750293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.752874] Call Trace:
      [  699.753386]  ? f2fs_inplace_write_data+0x93/0x240
      [  699.754341]  f2fs_inplace_write_data+0xd2/0x240
      [  699.755271]  f2fs_do_write_data_page+0x2e2/0xe00
      [  699.756214]  ? f2fs_should_update_outplace+0xd0/0xd0
      [  699.757215]  ? memcg_drain_all_list_lrus+0x280/0x280
      [  699.758209]  ? __radix_tree_replace+0xa3/0x120
      [  699.759164]  __write_data_page+0x5c7/0xe30
      [  699.760002]  ? kasan_check_read+0x11/0x20
      [  699.760823]  ? page_mapped+0x8a/0x110
      [  699.761573]  ? page_mkclean+0xe9/0x160
      [  699.762345]  ? f2fs_do_write_data_page+0xe00/0xe00
      [  699.763332]  ? invalid_page_referenced_vma+0x130/0x130
      [  699.764374]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.765347]  f2fs_write_cache_pages+0x4ca/0x860
      [  699.766276]  ? __write_data_page+0xe30/0xe30
      [  699.767161]  ? percpu_counter_add_batch+0x22/0xa0
      [  699.768112]  ? kasan_check_write+0x14/0x20
      [  699.768951]  ? _raw_spin_lock+0x17/0x40
      [  699.769739]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
      [  699.770885]  ? iov_iter_advance+0x113/0x640
      [  699.771743]  ? f2fs_write_end+0x133/0x2e0
      [  699.772569]  ? balance_dirty_pages_ratelimited+0x239/0x640
      [  699.773680]  f2fs_write_data_pages+0x329/0x520
      [  699.774603]  ? generic_perform_write+0x250/0x320
      [  699.775544]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.776510]  ? current_time+0x110/0x110
      [  699.777299]  ? f2fs_preallocate_blocks+0x1ef/0x370
      [  699.778279]  do_writepages+0x37/0xb0
      [  699.779026]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.779978]  ? do_writepages+0x37/0xb0
      [  699.780755]  __filemap_fdatawrite_range+0x19a/0x1f0
      [  699.781746]  ? delete_from_page_cache_batch+0x4e0/0x4e0
      [  699.782820]  ? __vfs_write+0x2b2/0x410
      [  699.783597]  file_write_and_wait_range+0x66/0xb0
      [  699.784540]  f2fs_do_sync_file+0x1f9/0xd90
      [  699.785381]  ? truncate_partial_data_page+0x290/0x290
      [  699.786415]  ? __sb_end_write+0x30/0x50
      [  699.787204]  ? vfs_write+0x20f/0x260
      [  699.787941]  f2fs_sync_file+0x9a/0xb0
      [  699.788694]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.789572]  vfs_fsync_range+0x68/0x100
      [  699.790360]  ? __fget_light+0xc9/0xe0
      [  699.791128]  do_fsync+0x3d/0x70
      [  699.791779]  __x64_sys_fdatasync+0x24/0x30
      [  699.792614]  do_syscall_64+0x78/0x170
      [  699.793371]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  699.794406] RIP: 0033:0x7f9bf930d800
      [  699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
      [  699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      [  699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  699.817079] ---[ end trace 4ce02f25ff7d3df6 ]---
      [  699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
      [  699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
      [  699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
      [  699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
      [  699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
      [  699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
      [  699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
      [  699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
      [  699.831192] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.832793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.835556] ==================================================================
      [  699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
      [  699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309
      
      [  699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G      D W         4.18.0-rc1+ #4
      [  699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.843475] Call Trace:
      [  699.843982]  dump_stack+0x7b/0xb5
      [  699.844661]  print_address_description+0x70/0x290
      [  699.845607]  kasan_report+0x291/0x390
      [  699.846351]  ? update_stack_state+0x38c/0x3e0
      [  699.853831]  __asan_load8+0x54/0x90
      [  699.854569]  update_stack_state+0x38c/0x3e0
      [  699.855428]  ? __read_once_size_nocheck.constprop.7+0x20/0x20
      [  699.856601]  ? __save_stack_trace+0x5e/0x100
      [  699.857476]  unwind_next_frame.part.5+0x18e/0x490
      [  699.858448]  ? unwind_dump+0x290/0x290
      [  699.859217]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.860185]  __unwind_start+0x106/0x190
      [  699.860974]  __save_stack_trace+0x5e/0x100
      [  699.861808]  ? __save_stack_trace+0x5e/0x100
      [  699.862691]  ? unlink_anon_vmas+0xba/0x2c0
      [  699.863525]  save_stack_trace+0x1f/0x30
      [  699.864312]  save_stack+0x46/0xd0
      [  699.864993]  ? __alloc_pages_slowpath+0x1420/0x1420
      [  699.865990]  ? flush_tlb_mm_range+0x15e/0x220
      [  699.866889]  ? kasan_check_write+0x14/0x20
      [  699.867724]  ? __dec_node_state+0x92/0xb0
      [  699.868543]  ? lock_page_memcg+0x85/0xf0
      [  699.869350]  ? unlock_page_memcg+0x16/0x80
      [  699.870185]  ? page_remove_rmap+0x198/0x520
      [  699.871048]  ? mark_page_accessed+0x133/0x200
      [  699.871930]  ? _cond_resched+0x1a/0x50
      [  699.872700]  ? unmap_page_range+0xcd4/0xe50
      [  699.873551]  ? rb_next+0x58/0x80
      [  699.874217]  ? rb_next+0x58/0x80
      [  699.874895]  __kasan_slab_free+0x13c/0x1a0
      [  699.875734]  ? unlink_anon_vmas+0xba/0x2c0
      [  699.876563]  kasan_slab_free+0xe/0x10
      [  699.877315]  kmem_cache_free+0x89/0x1e0
      [  699.878095]  unlink_anon_vmas+0xba/0x2c0
      [  699.878913]  free_pgtables+0x101/0x1b0
      [  699.879677]  exit_mmap+0x146/0x2a0
      [  699.880378]  ? __ia32_sys_munmap+0x50/0x50
      [  699.881214]  ? kasan_check_read+0x11/0x20
      [  699.882052]  ? mm_update_next_owner+0x322/0x380
      [  699.882985]  mmput+0x8b/0x1d0
      [  699.883602]  do_exit+0x43a/0x1390
      [  699.884288]  ? mm_update_next_owner+0x380/0x380
      [  699.885212]  ? f2fs_sync_file+0x9a/0xb0
      [  699.885995]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.886877]  ? vfs_fsync_range+0x68/0x100
      [  699.887694]  ? __fget_light+0xc9/0xe0
      [  699.888442]  ? do_fsync+0x3d/0x70
      [  699.889118]  ? __x64_sys_fdatasync+0x24/0x30
      [  699.889996]  rewind_stack_do_exit+0x17/0x20
      [  699.890860] RIP: 0033:0x7f9bf930d800
      [  699.891585] Code: Bad RIP value.
      [  699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      
      [  699.901241] The buggy address belongs to the page:
      [  699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [  699.903811] flags: 0x2ffff0000000000()
      [  699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000
      [  699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
      [  699.907673] page dumped because: kasan: bad access detected
      
      [  699.909108] Memory state around the buggy address:
      [  699.910077]  ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
      [  699.911528]  ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2
      [  699.914392]                                                              ^
      [  699.915758]  ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00
      [  699.917193]  ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00
      [  699.918634] ==================================================================
      
      - Location
      https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644
      
      Reported-by Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c9b60788
  9. 29 7月, 2018 1 次提交
  10. 27 7月, 2018 5 次提交
    • C
      f2fs: disable f2fs_check_rb_tree_consistence · 67fce70b
      Chao Yu 提交于
      If there is millions of discard entries cached in rb tree, each
      sanity check of it can cause very long latency as held cmd_lock
      blocking other lock grabbers.
      
      In other aspect, we have enabled the check very long time, as
      we see, there is no such inconsistent condition caused by bugs.
      
      But still we do not choose to kill it directly, instead, adding
      an flag to disable the check now, if there is related code change,
      we can reuse it to detect bugs.
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      67fce70b
    • C
      f2fs: introduce and spread verify_blkaddr · e1da7872
      Chao Yu 提交于
      This patch introduces verify_blkaddr to check meta/data block address
      with valid range to detect bug earlier.
      
      In addition, once we encounter an invalid blkaddr, notice user to run
      fsck to fix, and let the kernel panic.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e1da7872
    • A
      f2fs: use timespec64 for inode timestamps · 24b81dfc
      Arnd Bergmann 提交于
      The on-disk representation and the vfs both use 64-bit tv_sec values,
      so let's change the last missing piece in the middle.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      24b81dfc
    • C
      f2fs: fix to propagate return value of scan_nat_page() · e2374015
      Chao Yu 提交于
      As Anatoly Trosinenko reported in bugzilla:
      
      How to reproduce:
      1. Compile the 73fcb1a3 version of the kernel using the config attached
      2. Unpack and mount the attached filesystem image as F2FS
      3. The kernel will BUG() on mount (BUGs are explicitly enabled in config)
      
      [    2.233612] F2FS-fs (sda): Found nat_bits in checkpoint
      [    2.248422] ------------[ cut here ]------------
      [    2.248857] kernel BUG at fs/f2fs/node.c:1967!
      [    2.249760] invalid opcode: 0000 [#1] SMP NOPTI
      [    2.250219] Modules linked in:
      [    2.251848] CPU: 0 PID: 944 Comm: mount Not tainted 4.17.0-rc5+ #1
      [    2.252331] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [    2.253305] RIP: 0010:build_free_nids+0x337/0x3f0
      [    2.253672] RSP: 0018:ffffae7fc0857c50 EFLAGS: 00000246
      [    2.254080] RAX: 00000000ffffffff RBX: 0000000000000123 RCX: 0000000000000001
      [    2.254638] RDX: ffff9aa7063d5c00 RSI: 0000000000000122 RDI: ffff9aa705852e00
      [    2.255190] RBP: ffff9aa705852e00 R08: 0000000000000001 R09: ffff9aa7059090c0
      [    2.255719] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9aa705852e00
      [    2.256242] R13: ffff9aa7063ad000 R14: ffff9aa705919000 R15: 0000000000000123
      [    2.256809] FS:  00000000023078c0(0000) GS:ffff9aa707800000(0000) knlGS:0000000000000000
      [    2.258654] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    2.259153] CR2: 00000000005511ae CR3: 0000000005872000 CR4: 00000000000006f0
      [    2.259801] Call Trace:
      [    2.260583]  build_node_manager+0x5cd/0x600
      [    2.260963]  f2fs_fill_super+0x66a/0x17c0
      [    2.261300]  ? f2fs_commit_super+0xe0/0xe0
      [    2.261622]  mount_bdev+0x16e/0x1a0
      [    2.261899]  mount_fs+0x30/0x150
      [    2.262398]  vfs_kern_mount.part.28+0x4f/0xf0
      [    2.262743]  do_mount+0x5d0/0xc60
      [    2.263010]  ? _copy_from_user+0x37/0x60
      [    2.263313]  ? memdup_user+0x39/0x60
      [    2.263692]  ksys_mount+0x7b/0xd0
      [    2.263960]  __x64_sys_mount+0x1c/0x20
      [    2.264268]  do_syscall_64+0x43/0xf0
      [    2.264560]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [    2.265095] RIP: 0033:0x48d31a
      [    2.265502] RSP: 002b:00007ffc6fe60a08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      [    2.266089] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 000000000048d31a
      [    2.266607] RDX: 00007ffc6fe62fa5 RSI: 00007ffc6fe62f9d RDI: 00007ffc6fe62f94
      [    2.267130] RBP: 00000000023078a0 R08: 0000000000000000 R09: 0000000000000000
      [    2.267670] R10: 0000000000008000 R11: 0000000000000246 R12: 0000000000000000
      [    2.268192] R13: 0000000000000000 R14: 00007ffc6fe60c78 R15: 0000000000000000
      [    2.268767] Code: e8 5f c3 ff ff 83 c3 01 41 83 c7 01 81 fb c7 01 00 00 74 48 44 39 7d 04 76 42 48 63 c3 48 8d 04 c0 41 8b 44 06 05 83 f8 ff 75 c1 <0f> 0b 49 8b 45 50 48 8d b8 b0 00 00 00 e8 37 59 69 00 b9 01 00
      [    2.270434] RIP: build_free_nids+0x337/0x3f0 RSP: ffffae7fc0857c50
      [    2.271426] ---[ end trace ab20c06cd3c8fde4 ]---
      
      During loading NAT entries, we will do sanity check, once the entry info
      is corrupted, it will cause BUG_ON directly to protect user data from
      being overwrited.
      
      In this case, it will be better to just return failure on mount() instead
      of panic, so that user can get hint from kmsg and try fsck for recovery
      immediately rather than after an abnormal reboot.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=199769Reported-by: NAnatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e2374015
    • J
      f2fs: indicate shutdown f2fs to allow unmount successfully · 83a3bfdb
      Jaegeuk Kim 提交于
      Once we shutdown f2fs, we have to flush stale pages in order to unmount
      the system. In order to make stable, we need to stop fault injection as well.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      83a3bfdb
  11. 18 7月, 2018 1 次提交
  12. 06 6月, 2018 1 次提交
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani 提交于
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  13. 05 6月, 2018 1 次提交
    • C
      f2fs: let sync node IO interrupt async one · c29fd0c0
      Chao Yu 提交于
      Although mixed sync/async IOs can have continuous LBA, as they have
      different IO priority, block IO scheduler will add them into different
      queues and commit them separately, result in splited IOs which causes
      wrose performance.
      
      This patch gives high priority to synchronous IO of nodes, means that
      once synchronous flow starts, it can interrupt asynchronous writeback
      flow of system flusher, so more big IOs can be expected.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c29fd0c0
  14. 01 6月, 2018 9 次提交
    • C
      f2fs: clean up symbol namespace · 4d57b86d
      Chao Yu 提交于
      As Ted reported:
      
      "Hi, I was looking at f2fs's sources recently, and I noticed that there
      is a very large number of non-static symbols which don't have a f2fs
      prefix.  There's well over a hundred (see attached below).
      
      As one example, in fs/f2fs/dir.c there is:
      
      unsigned char get_de_type(struct f2fs_dir_entry *de)
      
      This function is clearly only useful for f2fs, but it has a generic
      name.  This means that if any other file system tries to have the same
      symbol name, there will be a symbol conflict and the kernel would not
      successfully build.  It also means that when someone is looking f2fs
      sources, it's not at all obvious whether a function such as
      read_data_page(), invalidate_blocks(), is a generic kernel function
      found in the fs, mm, or block layers, or a f2fs specific function.
      
      You might want to fix this at some point.  Hopefully Kent's bcachefs
      isn't similarly using genericly named functions, since that might
      cause conflicts with f2fs's functions --- but just as this would be a
      problem that we would rightly insist that Kent fix, this is something
      that we should have rightly insisted that f2fs should have fixed
      before it was integrated into the mainline kernel.
      
      acquire_orphan_inode
      add_ino_entry
      add_orphan_inode
      allocate_data_block
      allocate_new_segments
      alloc_nid
      alloc_nid_done
      alloc_nid_failed
      available_free_memory
      ...."
      
      This patch adds "f2fs_" prefix for all non-static symbols in order to:
      a) avoid conflict with other kernel generic symbols;
      b) to indicate the function is f2fs specific one instead of generic
      one;
      Reported-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4d57b86d
    • C
      f2fs: make set_de_type() static · 2e79d951
      Chao Yu 提交于
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2e79d951
    • C
      f2fs: make __f2fs_write_data_pages() static · fc99fe27
      Chao Yu 提交于
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fc99fe27
    • C
      f2fs: fix to let caller retry allocating block address · fe16efe6
      Chao Yu 提交于
      Configure io_bits with 2 and enable LFS mode, generic/013 reports below dmesg:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000104
      *pdpt = 0000000029b7b001 *pde = 0000000000000000
      Oops: 0002 [#1] PREEMPT SMP
      Modules linked in: crc32_generic zram f2fs(O) rfcomm bnep bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev snd_seq_device aesni_intel snd_timer aes_i586 snd crypto_simd cryptd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000
      CPU: 0 PID: 11161 Comm: fsstress Tainted: G           O      4.17.0-rc2 #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs]
      EFLAGS: 00010206 CPU: 0
      EAX: e863dcd8 EBX: 00000000 ECX: 00000100 EDX: 00000200
      ESI: e863dcf4 EDI: f6f82768 EBP: e863dbb0 ESP: e863db74
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      CR0: 80050033 CR2: 00000104 CR3: 29a62020 CR4: 000406f0
      Call Trace:
       do_write_page+0x6f/0xc0 [f2fs]
       write_data_page+0x4a/0xd0 [f2fs]
       do_write_data_page+0x327/0x630 [f2fs]
       __write_data_page+0x34b/0x820 [f2fs]
       __f2fs_write_data_pages+0x42d/0x8c0 [f2fs]
       f2fs_write_data_pages+0x27/0x30 [f2fs]
       do_writepages+0x1a/0x70
       __filemap_fdatawrite_range+0x94/0xd0
       filemap_write_and_wait_range+0x3d/0xa0
       __generic_file_write_iter+0x11a/0x1f0
       f2fs_file_write_iter+0xdd/0x3b0 [f2fs]
       __vfs_write+0xd2/0x150
       vfs_write+0x9b/0x190
       ksys_write+0x45/0x90
       sys_write+0x16/0x20
       do_fast_syscall_32+0xaa/0x22c
       entry_SYSENTER_32+0x4c/0x7b
      EIP: 0xb7fc8c51
      EFLAGS: 00000246 CPU: 0
      EAX: ffffffda EBX: 00000003 ECX: 09cde000 EDX: 00001000
      ESI: 00000003 EDI: 00001000 EBP: 00000000 ESP: bfbded38
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Code: e8 f9 77 34 c9 8b 45 e0 8b 80 b8 00 00 00 39 45 d8 0f 84 bb 02 00 00 8b 45 e0 8b 80 b8 00 00 00 8d 50 d8 8b 08 89 55 f0 8b 50 04 <89> 51 04 89 0a c7 00 00 01 00 00 c7 40 04 00 02 00 00 8b 45 dc
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] SS:ESP: 0068:e863db74
      CR2: 0000000000000104
      ---[ end trace 4cac79c0d1305ee6 ]---
      
      allocate_data_block will submit all sequential pending IOs sorted by a
      FIFO list, If we failed to submit other user's IO due to unaligned write,
      we will retry to allocate new block address for current IO, then it will
      initialize fio.list again, if fio was in the list before, it can break
      FIFO list, result in above panic.
      
      Thread A			Thread B
      - do_write_page
       - allocate_data_block
        - list_add_tail
        : fioA cached in FIFO list.
      				- do_write_page
      				 - allocate_data_block
      				  - list_add_tail
      				  : fioB cached in FIFO list.
      				 - f2fs_submit_page_write
      				 : fail to submit IO
      				 - allocate_data_block
      				  - INIT_LIST_HEAD
       - f2fs_submit_page_write
        - list_del  <-- NULL pointer dereference
      
      This patch adds fio.retry parameter to indicate failure status for each
      IO, and avoid bailing out if there is still pending IO in FIFO list for
      fixing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fe16efe6
    • C
      f2fs: clean up with clear_radix_tree_dirty_tag · aec2f729
      Chao Yu 提交于
      Introduce clear_radix_tree_dirty_tag to include common codes for cleanup.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      aec2f729
    • Y
      f2fs: let discard thread wait a little longer if dev is busy · f9d1dced
      Yunlei He 提交于
      This patch modify discard thread wait policy as below:
      	issued       io_interrupted     wait time(ms)
      1.        8                 0               50
      2.      (0,8)               1               50
      3.        0                 1              500 (dev is busy)
      4.        0                 0            60000 (no candidates)
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f9d1dced
    • C
      f2fs: avoid stucking GC due to atomic write · 2ef79ecb
      Chao Yu 提交于
      f2fs doesn't allow abuse on atomic write class interface, so except
      limiting in-mem pages' total memory usage capacity, we need to limit
      atomic-write usage as well when filesystem is seriously fragmented,
      otherwise we may run into infinite loop during foreground GC because
      target blocks in victim segment are belong to atomic opened file for
      long time.
      
      Now, we will detect failure due to atomic write in foreground GC, if
      the count exceeds threshold, we will drop all atomic written data in
      cache, by this, I expect it can keep our system running safely to
      prevent Dos attack.
      
      In addition, his patch adds to show GC skip information in debugfs,
      now it just shows count of skipped caused by atomic write.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2ef79ecb
    • J
      f2fs: introduce sbi->gc_mode to determine the policy · 5b0e9539
      Jaegeuk Kim 提交于
      This is to avoid sbi->gc_thread pointer access.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5b0e9539
    • C
      f2fs: keep migration IO order in LFS mode · 107a805d
      Chao Yu 提交于
      For non-migration IO, we will keep order of data/node blocks' submitting
      as allocation sequence by sorting IOs in per log io_list list, but for
      migration IO, it could be out-of-order.
      
      In LFS mode, we should keep all IOs including migration IO be ordered,
      so that this patch fixes to add an additional lock to keep submitting
      order.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      107a805d