1. 02 8月, 2018 2 次提交
    • C
      f2fs: issue small discard by LBA order · 20ee4382
      Chao Yu 提交于
      For small granularity discard which size is smaller than 64KB, if we
      issue those kind of discards orderly by size, their IOs will be spread
      into entire logical address, so that in FTL, L2P table will be updated
      randomly, result bad wear rate in the table.
      
      In this patch, we choose to issue small discard by LBA order, by this
      way, we can expect that L2P table updates from adjacent discard IOs can
      be merged in the cache, so it can reduce lifetime wearing of flash.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      20ee4382
    • C
      f2fs: fix to do sanity check with block address in main area · c9b60788
      Chao Yu 提交于
      This patch add to do sanity check with below field:
      - cp_pack_total_block_count
      - blkaddr of data/node
      - extent info
      
      - Overview
      BUG() in verify_block_addr() when writing to a corrupted f2fs image
      
      - Reproduce (4.18 upstream kernel)
      
      - POC (poc.c)
      
      static void activity(char *mpoint) {
      
        char *foo_bar_baz;
        int err;
      
        static int buf[8192];
        memset(buf, 0, sizeof(buf));
      
        err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
      
        int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
        if (fd >= 0) {
          write(fd, (char *)buf, sizeof(buf));
          fdatasync(fd);
          close(fd);
        }
      }
      
      int main(int argc, char *argv[]) {
        activity(argv[1]);
        return 0;
      }
      
      - Kernel message
      [  689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3
      [  699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240
      [  699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4
      [  699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240
      [  699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff <0f> 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d
      [  699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202
      [  699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113
      [  699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540
      [  699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8
      [  699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540
      [  699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450
      [  699.729154] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.729156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.729171] Call Trace:
      [  699.729192]  f2fs_do_write_data_page+0x2e2/0xe00
      [  699.729203]  ? f2fs_should_update_outplace+0xd0/0xd0
      [  699.729238]  ? memcg_drain_all_list_lrus+0x280/0x280
      [  699.729269]  ? __radix_tree_replace+0xa3/0x120
      [  699.729276]  __write_data_page+0x5c7/0xe30
      [  699.729291]  ? kasan_check_read+0x11/0x20
      [  699.729310]  ? page_mapped+0x8a/0x110
      [  699.729321]  ? page_mkclean+0xe9/0x160
      [  699.729327]  ? f2fs_do_write_data_page+0xe00/0xe00
      [  699.729331]  ? invalid_page_referenced_vma+0x130/0x130
      [  699.729345]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.729351]  f2fs_write_cache_pages+0x4ca/0x860
      [  699.729358]  ? __write_data_page+0xe30/0xe30
      [  699.729374]  ? percpu_counter_add_batch+0x22/0xa0
      [  699.729380]  ? kasan_check_write+0x14/0x20
      [  699.729391]  ? _raw_spin_lock+0x17/0x40
      [  699.729403]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
      [  699.729413]  ? iov_iter_advance+0x113/0x640
      [  699.729418]  ? f2fs_write_end+0x133/0x2e0
      [  699.729423]  ? balance_dirty_pages_ratelimited+0x239/0x640
      [  699.729428]  f2fs_write_data_pages+0x329/0x520
      [  699.729433]  ? generic_perform_write+0x250/0x320
      [  699.729438]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.729454]  ? current_time+0x110/0x110
      [  699.729459]  ? f2fs_preallocate_blocks+0x1ef/0x370
      [  699.729464]  do_writepages+0x37/0xb0
      [  699.729468]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.729472]  ? do_writepages+0x37/0xb0
      [  699.729478]  __filemap_fdatawrite_range+0x19a/0x1f0
      [  699.729483]  ? delete_from_page_cache_batch+0x4e0/0x4e0
      [  699.729496]  ? __vfs_write+0x2b2/0x410
      [  699.729501]  file_write_and_wait_range+0x66/0xb0
      [  699.729506]  f2fs_do_sync_file+0x1f9/0xd90
      [  699.729511]  ? truncate_partial_data_page+0x290/0x290
      [  699.729521]  ? __sb_end_write+0x30/0x50
      [  699.729526]  ? vfs_write+0x20f/0x260
      [  699.729530]  f2fs_sync_file+0x9a/0xb0
      [  699.729534]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.729548]  vfs_fsync_range+0x68/0x100
      [  699.729554]  ? __fget_light+0xc9/0xe0
      [  699.729558]  do_fsync+0x3d/0x70
      [  699.729562]  __x64_sys_fdatasync+0x24/0x30
      [  699.729585]  do_syscall_64+0x78/0x170
      [  699.729595]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  699.729613] RIP: 0033:0x7f9bf930d800
      [  699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
      [  699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      [  699.729687] ---[ end trace 4ce02f25ff7d3df5 ]---
      [  699.729782] ------------[ cut here ]------------
      [  699.729785] kernel BUG at fs/f2fs/segment.h:654!
      [  699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G        W         4.18.0-rc1+ #4
      [  699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
      [  699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
      [  699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
      [  699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
      [  699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
      [  699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
      [  699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
      [  699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
      [  699.748683] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.750293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.752874] Call Trace:
      [  699.753386]  ? f2fs_inplace_write_data+0x93/0x240
      [  699.754341]  f2fs_inplace_write_data+0xd2/0x240
      [  699.755271]  f2fs_do_write_data_page+0x2e2/0xe00
      [  699.756214]  ? f2fs_should_update_outplace+0xd0/0xd0
      [  699.757215]  ? memcg_drain_all_list_lrus+0x280/0x280
      [  699.758209]  ? __radix_tree_replace+0xa3/0x120
      [  699.759164]  __write_data_page+0x5c7/0xe30
      [  699.760002]  ? kasan_check_read+0x11/0x20
      [  699.760823]  ? page_mapped+0x8a/0x110
      [  699.761573]  ? page_mkclean+0xe9/0x160
      [  699.762345]  ? f2fs_do_write_data_page+0xe00/0xe00
      [  699.763332]  ? invalid_page_referenced_vma+0x130/0x130
      [  699.764374]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.765347]  f2fs_write_cache_pages+0x4ca/0x860
      [  699.766276]  ? __write_data_page+0xe30/0xe30
      [  699.767161]  ? percpu_counter_add_batch+0x22/0xa0
      [  699.768112]  ? kasan_check_write+0x14/0x20
      [  699.768951]  ? _raw_spin_lock+0x17/0x40
      [  699.769739]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
      [  699.770885]  ? iov_iter_advance+0x113/0x640
      [  699.771743]  ? f2fs_write_end+0x133/0x2e0
      [  699.772569]  ? balance_dirty_pages_ratelimited+0x239/0x640
      [  699.773680]  f2fs_write_data_pages+0x329/0x520
      [  699.774603]  ? generic_perform_write+0x250/0x320
      [  699.775544]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.776510]  ? current_time+0x110/0x110
      [  699.777299]  ? f2fs_preallocate_blocks+0x1ef/0x370
      [  699.778279]  do_writepages+0x37/0xb0
      [  699.779026]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.779978]  ? do_writepages+0x37/0xb0
      [  699.780755]  __filemap_fdatawrite_range+0x19a/0x1f0
      [  699.781746]  ? delete_from_page_cache_batch+0x4e0/0x4e0
      [  699.782820]  ? __vfs_write+0x2b2/0x410
      [  699.783597]  file_write_and_wait_range+0x66/0xb0
      [  699.784540]  f2fs_do_sync_file+0x1f9/0xd90
      [  699.785381]  ? truncate_partial_data_page+0x290/0x290
      [  699.786415]  ? __sb_end_write+0x30/0x50
      [  699.787204]  ? vfs_write+0x20f/0x260
      [  699.787941]  f2fs_sync_file+0x9a/0xb0
      [  699.788694]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.789572]  vfs_fsync_range+0x68/0x100
      [  699.790360]  ? __fget_light+0xc9/0xe0
      [  699.791128]  do_fsync+0x3d/0x70
      [  699.791779]  __x64_sys_fdatasync+0x24/0x30
      [  699.792614]  do_syscall_64+0x78/0x170
      [  699.793371]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  699.794406] RIP: 0033:0x7f9bf930d800
      [  699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
      [  699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      [  699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  699.817079] ---[ end trace 4ce02f25ff7d3df6 ]---
      [  699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
      [  699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
      [  699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
      [  699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
      [  699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
      [  699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
      [  699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
      [  699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
      [  699.831192] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.832793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.835556] ==================================================================
      [  699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
      [  699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309
      
      [  699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G      D W         4.18.0-rc1+ #4
      [  699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.843475] Call Trace:
      [  699.843982]  dump_stack+0x7b/0xb5
      [  699.844661]  print_address_description+0x70/0x290
      [  699.845607]  kasan_report+0x291/0x390
      [  699.846351]  ? update_stack_state+0x38c/0x3e0
      [  699.853831]  __asan_load8+0x54/0x90
      [  699.854569]  update_stack_state+0x38c/0x3e0
      [  699.855428]  ? __read_once_size_nocheck.constprop.7+0x20/0x20
      [  699.856601]  ? __save_stack_trace+0x5e/0x100
      [  699.857476]  unwind_next_frame.part.5+0x18e/0x490
      [  699.858448]  ? unwind_dump+0x290/0x290
      [  699.859217]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.860185]  __unwind_start+0x106/0x190
      [  699.860974]  __save_stack_trace+0x5e/0x100
      [  699.861808]  ? __save_stack_trace+0x5e/0x100
      [  699.862691]  ? unlink_anon_vmas+0xba/0x2c0
      [  699.863525]  save_stack_trace+0x1f/0x30
      [  699.864312]  save_stack+0x46/0xd0
      [  699.864993]  ? __alloc_pages_slowpath+0x1420/0x1420
      [  699.865990]  ? flush_tlb_mm_range+0x15e/0x220
      [  699.866889]  ? kasan_check_write+0x14/0x20
      [  699.867724]  ? __dec_node_state+0x92/0xb0
      [  699.868543]  ? lock_page_memcg+0x85/0xf0
      [  699.869350]  ? unlock_page_memcg+0x16/0x80
      [  699.870185]  ? page_remove_rmap+0x198/0x520
      [  699.871048]  ? mark_page_accessed+0x133/0x200
      [  699.871930]  ? _cond_resched+0x1a/0x50
      [  699.872700]  ? unmap_page_range+0xcd4/0xe50
      [  699.873551]  ? rb_next+0x58/0x80
      [  699.874217]  ? rb_next+0x58/0x80
      [  699.874895]  __kasan_slab_free+0x13c/0x1a0
      [  699.875734]  ? unlink_anon_vmas+0xba/0x2c0
      [  699.876563]  kasan_slab_free+0xe/0x10
      [  699.877315]  kmem_cache_free+0x89/0x1e0
      [  699.878095]  unlink_anon_vmas+0xba/0x2c0
      [  699.878913]  free_pgtables+0x101/0x1b0
      [  699.879677]  exit_mmap+0x146/0x2a0
      [  699.880378]  ? __ia32_sys_munmap+0x50/0x50
      [  699.881214]  ? kasan_check_read+0x11/0x20
      [  699.882052]  ? mm_update_next_owner+0x322/0x380
      [  699.882985]  mmput+0x8b/0x1d0
      [  699.883602]  do_exit+0x43a/0x1390
      [  699.884288]  ? mm_update_next_owner+0x380/0x380
      [  699.885212]  ? f2fs_sync_file+0x9a/0xb0
      [  699.885995]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.886877]  ? vfs_fsync_range+0x68/0x100
      [  699.887694]  ? __fget_light+0xc9/0xe0
      [  699.888442]  ? do_fsync+0x3d/0x70
      [  699.889118]  ? __x64_sys_fdatasync+0x24/0x30
      [  699.889996]  rewind_stack_do_exit+0x17/0x20
      [  699.890860] RIP: 0033:0x7f9bf930d800
      [  699.891585] Code: Bad RIP value.
      [  699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      
      [  699.901241] The buggy address belongs to the page:
      [  699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [  699.903811] flags: 0x2ffff0000000000()
      [  699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000
      [  699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
      [  699.907673] page dumped because: kasan: bad access detected
      
      [  699.909108] Memory state around the buggy address:
      [  699.910077]  ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
      [  699.911528]  ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2
      [  699.914392]                                                              ^
      [  699.915758]  ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00
      [  699.917193]  ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00
      [  699.918634] ==================================================================
      
      - Location
      https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644
      
      Reported-by Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c9b60788
  2. 29 7月, 2018 1 次提交
  3. 27 7月, 2018 5 次提交
    • C
      f2fs: disable f2fs_check_rb_tree_consistence · 67fce70b
      Chao Yu 提交于
      If there is millions of discard entries cached in rb tree, each
      sanity check of it can cause very long latency as held cmd_lock
      blocking other lock grabbers.
      
      In other aspect, we have enabled the check very long time, as
      we see, there is no such inconsistent condition caused by bugs.
      
      But still we do not choose to kill it directly, instead, adding
      an flag to disable the check now, if there is related code change,
      we can reuse it to detect bugs.
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      67fce70b
    • C
      f2fs: introduce and spread verify_blkaddr · e1da7872
      Chao Yu 提交于
      This patch introduces verify_blkaddr to check meta/data block address
      with valid range to detect bug earlier.
      
      In addition, once we encounter an invalid blkaddr, notice user to run
      fsck to fix, and let the kernel panic.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e1da7872
    • A
      f2fs: use timespec64 for inode timestamps · 24b81dfc
      Arnd Bergmann 提交于
      The on-disk representation and the vfs both use 64-bit tv_sec values,
      so let's change the last missing piece in the middle.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      24b81dfc
    • C
      f2fs: fix to propagate return value of scan_nat_page() · e2374015
      Chao Yu 提交于
      As Anatoly Trosinenko reported in bugzilla:
      
      How to reproduce:
      1. Compile the 73fcb1a3 version of the kernel using the config attached
      2. Unpack and mount the attached filesystem image as F2FS
      3. The kernel will BUG() on mount (BUGs are explicitly enabled in config)
      
      [    2.233612] F2FS-fs (sda): Found nat_bits in checkpoint
      [    2.248422] ------------[ cut here ]------------
      [    2.248857] kernel BUG at fs/f2fs/node.c:1967!
      [    2.249760] invalid opcode: 0000 [#1] SMP NOPTI
      [    2.250219] Modules linked in:
      [    2.251848] CPU: 0 PID: 944 Comm: mount Not tainted 4.17.0-rc5+ #1
      [    2.252331] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [    2.253305] RIP: 0010:build_free_nids+0x337/0x3f0
      [    2.253672] RSP: 0018:ffffae7fc0857c50 EFLAGS: 00000246
      [    2.254080] RAX: 00000000ffffffff RBX: 0000000000000123 RCX: 0000000000000001
      [    2.254638] RDX: ffff9aa7063d5c00 RSI: 0000000000000122 RDI: ffff9aa705852e00
      [    2.255190] RBP: ffff9aa705852e00 R08: 0000000000000001 R09: ffff9aa7059090c0
      [    2.255719] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9aa705852e00
      [    2.256242] R13: ffff9aa7063ad000 R14: ffff9aa705919000 R15: 0000000000000123
      [    2.256809] FS:  00000000023078c0(0000) GS:ffff9aa707800000(0000) knlGS:0000000000000000
      [    2.258654] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    2.259153] CR2: 00000000005511ae CR3: 0000000005872000 CR4: 00000000000006f0
      [    2.259801] Call Trace:
      [    2.260583]  build_node_manager+0x5cd/0x600
      [    2.260963]  f2fs_fill_super+0x66a/0x17c0
      [    2.261300]  ? f2fs_commit_super+0xe0/0xe0
      [    2.261622]  mount_bdev+0x16e/0x1a0
      [    2.261899]  mount_fs+0x30/0x150
      [    2.262398]  vfs_kern_mount.part.28+0x4f/0xf0
      [    2.262743]  do_mount+0x5d0/0xc60
      [    2.263010]  ? _copy_from_user+0x37/0x60
      [    2.263313]  ? memdup_user+0x39/0x60
      [    2.263692]  ksys_mount+0x7b/0xd0
      [    2.263960]  __x64_sys_mount+0x1c/0x20
      [    2.264268]  do_syscall_64+0x43/0xf0
      [    2.264560]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [    2.265095] RIP: 0033:0x48d31a
      [    2.265502] RSP: 002b:00007ffc6fe60a08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      [    2.266089] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 000000000048d31a
      [    2.266607] RDX: 00007ffc6fe62fa5 RSI: 00007ffc6fe62f9d RDI: 00007ffc6fe62f94
      [    2.267130] RBP: 00000000023078a0 R08: 0000000000000000 R09: 0000000000000000
      [    2.267670] R10: 0000000000008000 R11: 0000000000000246 R12: 0000000000000000
      [    2.268192] R13: 0000000000000000 R14: 00007ffc6fe60c78 R15: 0000000000000000
      [    2.268767] Code: e8 5f c3 ff ff 83 c3 01 41 83 c7 01 81 fb c7 01 00 00 74 48 44 39 7d 04 76 42 48 63 c3 48 8d 04 c0 41 8b 44 06 05 83 f8 ff 75 c1 <0f> 0b 49 8b 45 50 48 8d b8 b0 00 00 00 e8 37 59 69 00 b9 01 00
      [    2.270434] RIP: build_free_nids+0x337/0x3f0 RSP: ffffae7fc0857c50
      [    2.271426] ---[ end trace ab20c06cd3c8fde4 ]---
      
      During loading NAT entries, we will do sanity check, once the entry info
      is corrupted, it will cause BUG_ON directly to protect user data from
      being overwrited.
      
      In this case, it will be better to just return failure on mount() instead
      of panic, so that user can get hint from kmsg and try fsck for recovery
      immediately rather than after an abnormal reboot.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=199769Reported-by: NAnatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e2374015
    • J
      f2fs: indicate shutdown f2fs to allow unmount successfully · 83a3bfdb
      Jaegeuk Kim 提交于
      Once we shutdown f2fs, we have to flush stale pages in order to unmount
      the system. In order to make stable, we need to stop fault injection as well.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      83a3bfdb
  4. 06 6月, 2018 1 次提交
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani 提交于
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  5. 05 6月, 2018 1 次提交
    • C
      f2fs: let sync node IO interrupt async one · c29fd0c0
      Chao Yu 提交于
      Although mixed sync/async IOs can have continuous LBA, as they have
      different IO priority, block IO scheduler will add them into different
      queues and commit them separately, result in splited IOs which causes
      wrose performance.
      
      This patch gives high priority to synchronous IO of nodes, means that
      once synchronous flow starts, it can interrupt asynchronous writeback
      flow of system flusher, so more big IOs can be expected.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c29fd0c0
  6. 01 6月, 2018 14 次提交
    • C
      f2fs: clean up symbol namespace · 4d57b86d
      Chao Yu 提交于
      As Ted reported:
      
      "Hi, I was looking at f2fs's sources recently, and I noticed that there
      is a very large number of non-static symbols which don't have a f2fs
      prefix.  There's well over a hundred (see attached below).
      
      As one example, in fs/f2fs/dir.c there is:
      
      unsigned char get_de_type(struct f2fs_dir_entry *de)
      
      This function is clearly only useful for f2fs, but it has a generic
      name.  This means that if any other file system tries to have the same
      symbol name, there will be a symbol conflict and the kernel would not
      successfully build.  It also means that when someone is looking f2fs
      sources, it's not at all obvious whether a function such as
      read_data_page(), invalidate_blocks(), is a generic kernel function
      found in the fs, mm, or block layers, or a f2fs specific function.
      
      You might want to fix this at some point.  Hopefully Kent's bcachefs
      isn't similarly using genericly named functions, since that might
      cause conflicts with f2fs's functions --- but just as this would be a
      problem that we would rightly insist that Kent fix, this is something
      that we should have rightly insisted that f2fs should have fixed
      before it was integrated into the mainline kernel.
      
      acquire_orphan_inode
      add_ino_entry
      add_orphan_inode
      allocate_data_block
      allocate_new_segments
      alloc_nid
      alloc_nid_done
      alloc_nid_failed
      available_free_memory
      ...."
      
      This patch adds "f2fs_" prefix for all non-static symbols in order to:
      a) avoid conflict with other kernel generic symbols;
      b) to indicate the function is f2fs specific one instead of generic
      one;
      Reported-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4d57b86d
    • C
      f2fs: make set_de_type() static · 2e79d951
      Chao Yu 提交于
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2e79d951
    • C
      f2fs: make __f2fs_write_data_pages() static · fc99fe27
      Chao Yu 提交于
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fc99fe27
    • C
      f2fs: fix to let caller retry allocating block address · fe16efe6
      Chao Yu 提交于
      Configure io_bits with 2 and enable LFS mode, generic/013 reports below dmesg:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000104
      *pdpt = 0000000029b7b001 *pde = 0000000000000000
      Oops: 0002 [#1] PREEMPT SMP
      Modules linked in: crc32_generic zram f2fs(O) rfcomm bnep bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev snd_seq_device aesni_intel snd_timer aes_i586 snd crypto_simd cryptd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000
      CPU: 0 PID: 11161 Comm: fsstress Tainted: G           O      4.17.0-rc2 #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs]
      EFLAGS: 00010206 CPU: 0
      EAX: e863dcd8 EBX: 00000000 ECX: 00000100 EDX: 00000200
      ESI: e863dcf4 EDI: f6f82768 EBP: e863dbb0 ESP: e863db74
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      CR0: 80050033 CR2: 00000104 CR3: 29a62020 CR4: 000406f0
      Call Trace:
       do_write_page+0x6f/0xc0 [f2fs]
       write_data_page+0x4a/0xd0 [f2fs]
       do_write_data_page+0x327/0x630 [f2fs]
       __write_data_page+0x34b/0x820 [f2fs]
       __f2fs_write_data_pages+0x42d/0x8c0 [f2fs]
       f2fs_write_data_pages+0x27/0x30 [f2fs]
       do_writepages+0x1a/0x70
       __filemap_fdatawrite_range+0x94/0xd0
       filemap_write_and_wait_range+0x3d/0xa0
       __generic_file_write_iter+0x11a/0x1f0
       f2fs_file_write_iter+0xdd/0x3b0 [f2fs]
       __vfs_write+0xd2/0x150
       vfs_write+0x9b/0x190
       ksys_write+0x45/0x90
       sys_write+0x16/0x20
       do_fast_syscall_32+0xaa/0x22c
       entry_SYSENTER_32+0x4c/0x7b
      EIP: 0xb7fc8c51
      EFLAGS: 00000246 CPU: 0
      EAX: ffffffda EBX: 00000003 ECX: 09cde000 EDX: 00001000
      ESI: 00000003 EDI: 00001000 EBP: 00000000 ESP: bfbded38
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Code: e8 f9 77 34 c9 8b 45 e0 8b 80 b8 00 00 00 39 45 d8 0f 84 bb 02 00 00 8b 45 e0 8b 80 b8 00 00 00 8d 50 d8 8b 08 89 55 f0 8b 50 04 <89> 51 04 89 0a c7 00 00 01 00 00 c7 40 04 00 02 00 00 8b 45 dc
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] SS:ESP: 0068:e863db74
      CR2: 0000000000000104
      ---[ end trace 4cac79c0d1305ee6 ]---
      
      allocate_data_block will submit all sequential pending IOs sorted by a
      FIFO list, If we failed to submit other user's IO due to unaligned write,
      we will retry to allocate new block address for current IO, then it will
      initialize fio.list again, if fio was in the list before, it can break
      FIFO list, result in above panic.
      
      Thread A			Thread B
      - do_write_page
       - allocate_data_block
        - list_add_tail
        : fioA cached in FIFO list.
      				- do_write_page
      				 - allocate_data_block
      				  - list_add_tail
      				  : fioB cached in FIFO list.
      				 - f2fs_submit_page_write
      				 : fail to submit IO
      				 - allocate_data_block
      				  - INIT_LIST_HEAD
       - f2fs_submit_page_write
        - list_del  <-- NULL pointer dereference
      
      This patch adds fio.retry parameter to indicate failure status for each
      IO, and avoid bailing out if there is still pending IO in FIFO list for
      fixing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fe16efe6
    • C
      f2fs: clean up with clear_radix_tree_dirty_tag · aec2f729
      Chao Yu 提交于
      Introduce clear_radix_tree_dirty_tag to include common codes for cleanup.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      aec2f729
    • Y
      f2fs: let discard thread wait a little longer if dev is busy · f9d1dced
      Yunlei He 提交于
      This patch modify discard thread wait policy as below:
      	issued       io_interrupted     wait time(ms)
      1.        8                 0               50
      2.      (0,8)               1               50
      3.        0                 1              500 (dev is busy)
      4.        0                 0            60000 (no candidates)
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f9d1dced
    • C
      f2fs: avoid stucking GC due to atomic write · 2ef79ecb
      Chao Yu 提交于
      f2fs doesn't allow abuse on atomic write class interface, so except
      limiting in-mem pages' total memory usage capacity, we need to limit
      atomic-write usage as well when filesystem is seriously fragmented,
      otherwise we may run into infinite loop during foreground GC because
      target blocks in victim segment are belong to atomic opened file for
      long time.
      
      Now, we will detect failure due to atomic write in foreground GC, if
      the count exceeds threshold, we will drop all atomic written data in
      cache, by this, I expect it can keep our system running safely to
      prevent Dos attack.
      
      In addition, his patch adds to show GC skip information in debugfs,
      now it just shows count of skipped caused by atomic write.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2ef79ecb
    • J
      f2fs: introduce sbi->gc_mode to determine the policy · 5b0e9539
      Jaegeuk Kim 提交于
      This is to avoid sbi->gc_thread pointer access.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5b0e9539
    • C
      f2fs: keep migration IO order in LFS mode · 107a805d
      Chao Yu 提交于
      For non-migration IO, we will keep order of data/node blocks' submitting
      as allocation sequence by sorting IOs in per log io_list list, but for
      migration IO, it could be out-of-order.
      
      In LFS mode, we should keep all IOs including migration IO be ordered,
      so that this patch fixes to add an additional lock to keep submitting
      order.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      107a805d
    • C
      f2fs: clean up with is_valid_blkaddr() · 7b525dd0
      Chao Yu 提交于
      - rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
      - introduce is_valid_blkaddr() for cleanup.
      
      No logic change in this patch.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7b525dd0
    • C
      Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc" · 299254d8
      Chao Yu 提交于
      For extreme case:
      10 section, op = 10%, no_fggc_threshold = 90%
      All section usage: 85% 85% 85% 85% 90% 90% 95% 95% 95% 95%
      
      During foreground GC, if we skip select dirty section whose usage
      is larger than no_fggc_threshold, we can only recycle 80% invalid
      space from four 85% usage sections and two 90% usage sections,
      result in encountering out-of-space issue.
      
      This reverts commit e93b9865 to
      fix this issue, besides, we keep the logic that we scan all dirty
      section when searching a victim, so that GC can select victim with
      least valid blocks.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      299254d8
    • C
      f2fs: rename dio_rwsem to i_gc_rwsem · b2532c69
      Chao Yu 提交于
      RW semphore dio_rwsem in struct f2fs_inode_info is introduced to avoid
      race between dio and data gc, but now, it is more wildly used to avoid
      foreground operation vs data gc. So rename it to i_gc_rwsem to improve
      its readability.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b2532c69
    • J
      f2fs: give message and set need_fsck given broken node id · a4f843bd
      Jaegeuk Kim 提交于
      syzbot hit the following crash on upstream commit
      83beed7b (Fri Apr 20 17:56:32 2018 +0000)
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
      syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=d154ec99402c6f628887
      
      C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5414336294027264
      syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5471683234234368
      Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5436660795834368
      Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
      compiler: gcc (GCC) 8.0.1 20180413 (experimental)
      
      IMPORTANT: if you fix the bug, please add the following tag to the commit:
      Reported-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com
      It will help syzbot understand when the bug is fixed. See footer for details.
      If you forward the report, please keep this part and the footer.
      
      F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
      F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
      F2FS-fs (loop0): invalid crc value
      ------------[ cut here ]------------
      kernel BUG at fs/f2fs/node.c:1185!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 4549 Comm: syzkaller704305 Not tainted 4.17.0-rc1+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185
      RSP: 0018:ffff8801d960e820 EFLAGS: 00010293
      RAX: ffff8801d88205c0 RBX: 0000000000000003 RCX: ffffffff82f6cc06
      RDX: 0000000000000000 RSI: ffffffff82f6d5e8 RDI: 0000000000000004
      RBP: ffff8801d960ec30 R08: ffff8801d88205c0 R09: ffffed003b5e46c2
      R10: 0000000000000003 R11: 0000000000000003 R12: ffff8801a86e00c0
      R13: 0000000000000001 R14: ffff8801a86e0530 R15: ffff8801d9745240
      FS:  000000000072c880(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3d403209b8 CR3: 00000001d8f3f000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       get_node_page fs/f2fs/node.c:1237 [inline]
       truncate_xattr_node+0x152/0x2e0 fs/f2fs/node.c:1014
       remove_inode_page+0x200/0xaf0 fs/f2fs/node.c:1039
       f2fs_evict_inode+0xe86/0x1710 fs/f2fs/inode.c:547
       evict+0x4a6/0x960 fs/inode.c:557
       iput_final fs/inode.c:1519 [inline]
       iput+0x62d/0xa80 fs/inode.c:1545
       f2fs_fill_super+0x5f4e/0x7bf0 fs/f2fs/super.c:2849
       mount_bdev+0x30c/0x3e0 fs/super.c:1164
       f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
       mount_fs+0xae/0x328 fs/super.c:1267
       vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
       vfs_kern_mount fs/namespace.c:1027 [inline]
       do_new_mount fs/namespace.c:2518 [inline]
       do_mount+0x564/0x3070 fs/namespace.c:2848
       ksys_mount+0x12d/0x140 fs/namespace.c:3064
       __do_sys_mount fs/namespace.c:3078 [inline]
       __se_sys_mount fs/namespace.c:3075 [inline]
       __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x443dea
      RSP: 002b:00007ffcc7882368 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443dea
      RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffcc7882370
      RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a
      R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004
      R13: 0000000000402ce0 R14: 0000000000000000 R15: 0000000000000000
      RIP: __get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185 RSP: ffff8801d960e820
      ---[ end trace 4edbeb71f002bb76 ]---
      
      Reported-and-tested-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      a4f843bd
    • C
      f2fs: introduce private inode status mapping · 59c84408
      Chao Yu 提交于
      Previously, we use generic FS_*_FL defined by vfs to indicate inode status
      for each bit of i_flags, so f2fs's flag status definition is tied to vfs'
      one, it will be hard for f2fs to reuse bits f2fs never used to indicate
      new status..
      
      In order to solve this issue, we introduce private inode status mapping,
      Note, for these bits have already been persisted into disk, we should
      never change their definition, for other ones, we can remap them for
      later new coming status.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      59c84408
  7. 30 5月, 2018 4 次提交
  8. 03 5月, 2018 3 次提交
    • J
      f2fs: check cap_resource only for data blocks · a90a0884
      Jaegeuk Kim 提交于
      This patch changes the rule to check cap_resource for data blocks, not inode
      or node blocks in order to avoid selinux denial.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      a90a0884
    • J
      Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer" · b87078ad
      Jaegeuk Kim 提交于
      This patch reverts copied f2fs_set_page_dirty_nobuffer to use generic function
      for stability.
      
      This reverts commit fe76b796.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b87078ad
    • E
      f2fs: refactor read path to allow multiple postprocessing steps · 6dbb1796
      Eric Biggers 提交于
      Currently f2fs's ->readpage() and ->readpages() assume that either the
      data undergoes no postprocessing, or decryption only.  But with
      fs-verity, there will be an additional authenticity verification step,
      and it may be needed either by itself, or combined with decryption.
      
      To support this, store a 'struct bio_post_read_ctx' in ->bi_private
      which contains a work struct, a bitmask of postprocessing steps that are
      enabled, and an indicator of the current step.  The bio completion
      routine, if there was no I/O error, enqueues the first postprocessing
      step.  When that completes, it continues to the next step.  Pages that
      fail any postprocessing step have PageError set.  Once all steps have
      completed, pages without PageError set are set Uptodate, and all pages
      are unlocked.
      
      Also replace f2fs_encrypted_file() with a new function
      f2fs_post_read_required() in places like direct I/O and garbage
      collection that really should be testing whether the file needs special
      I/O processing, not whether it is encrypted specifically.
      
      This may also be useful for other future f2fs features such as
      compression.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6dbb1796
  9. 04 4月, 2018 1 次提交
  10. 03 4月, 2018 2 次提交
  11. 29 3月, 2018 1 次提交
    • E
      f2fs: reserve bits for fs-verity · 53fedcc0
      Eric Biggers 提交于
      Reserve an F2FS feature flag and inode flag for fs-verity.  This is an
      in-development feature that is planned be discussed at LSF/MM 2018 [1].
      It will provide file-based integrity and authenticity for read-only
      files.  Most code will be in a filesystem-independent module, with
      smaller changes needed to individual filesystems that opt-in to
      supporting the feature.  An early prototype supporting F2FS is available
      [2].  Reserving the F2FS on-disk bits for fs-verity will prevent users
      of the prototype from conflicting with other new F2FS features.
      
      Note that we're reserving the inode flag in f2fs_inode.i_advise, which
      isn't really appropriate since it's not a hint or advice.  But
      ->i_advise is already being used to hold the 'encrypt' flag; and F2FS's
      ->i_flags uses the generic FS_* values, so it seems ->i_flags can't be
      used for an F2FS-specific flag without additional work to remove the
      assumption that ->i_flags uses the generic flags namespace.
      
      [1] https://marc.info/?l=linux-fsdevel&m=151690752225644
      [2] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/log/?h=fs-verity-devSigned-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      53fedcc0
  12. 19 3月, 2018 1 次提交
  13. 17 3月, 2018 4 次提交