1. 01 Jun, 2018 23 commits
    • f2fs: fix to let caller retry allocating block address · fe16efe6
      Chao Yu committed
      With io_bits set to 2 and LFS mode enabled, generic/013 reports the dmesg below:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000104
      *pdpt = 0000000029b7b001 *pde = 0000000000000000
      Oops: 0002 [#1] PREEMPT SMP
      Modules linked in: crc32_generic zram f2fs(O) rfcomm bnep bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev snd_seq_device aesni_intel snd_timer aes_i586 snd crypto_simd cryptd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000
      CPU: 0 PID: 11161 Comm: fsstress Tainted: G           O      4.17.0-rc2 #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs]
      EFLAGS: 00010206 CPU: 0
      EAX: e863dcd8 EBX: 00000000 ECX: 00000100 EDX: 00000200
      ESI: e863dcf4 EDI: f6f82768 EBP: e863dbb0 ESP: e863db74
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      CR0: 80050033 CR2: 00000104 CR3: 29a62020 CR4: 000406f0
      Call Trace:
       do_write_page+0x6f/0xc0 [f2fs]
       write_data_page+0x4a/0xd0 [f2fs]
       do_write_data_page+0x327/0x630 [f2fs]
       __write_data_page+0x34b/0x820 [f2fs]
       __f2fs_write_data_pages+0x42d/0x8c0 [f2fs]
       f2fs_write_data_pages+0x27/0x30 [f2fs]
       do_writepages+0x1a/0x70
       __filemap_fdatawrite_range+0x94/0xd0
       filemap_write_and_wait_range+0x3d/0xa0
       __generic_file_write_iter+0x11a/0x1f0
       f2fs_file_write_iter+0xdd/0x3b0 [f2fs]
       __vfs_write+0xd2/0x150
       vfs_write+0x9b/0x190
       ksys_write+0x45/0x90
       sys_write+0x16/0x20
       do_fast_syscall_32+0xaa/0x22c
       entry_SYSENTER_32+0x4c/0x7b
      EIP: 0xb7fc8c51
      EFLAGS: 00000246 CPU: 0
      EAX: ffffffda EBX: 00000003 ECX: 09cde000 EDX: 00001000
      ESI: 00000003 EDI: 00001000 EBP: 00000000 ESP: bfbded38
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Code: e8 f9 77 34 c9 8b 45 e0 8b 80 b8 00 00 00 39 45 d8 0f 84 bb 02 00 00 8b 45 e0 8b 80 b8 00 00 00 8d 50 d8 8b 08 89 55 f0 8b 50 04 <89> 51 04 89 0a c7 00 00 01 00 00 c7 40 04 00 02 00 00 8b 45 dc
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] SS:ESP: 0068:e863db74
      CR2: 0000000000000104
      ---[ end trace 4cac79c0d1305ee6 ]---
      
      allocate_data_block submits all sequential pending IOs sorted in a
      FIFO list. If we fail to submit another user's IO due to an unaligned
      write, we retry allocating a new block address for the current IO, which
      initializes fio.list again. If the fio was already in the list, this
      breaks the FIFO list and results in the panic above.
      
      Thread A			Thread B
      - do_write_page
       - allocate_data_block
        - list_add_tail
        : fioA cached in FIFO list.
      				- do_write_page
      				 - allocate_data_block
      				  - list_add_tail
      				  : fioB cached in FIFO list.
      				 - f2fs_submit_page_write
      				 : fail to submit IO
      				 - allocate_data_block
      				  - INIT_LIST_HEAD
       - f2fs_submit_page_write
        - list_del  <-- NULL pointer dereference
      
      This patch adds a fio.retry parameter to indicate the failure status of
      each IO, and avoids bailing out while there is still pending IO in the
      FIFO list.
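The list breakage described above can be illustrated in user space. The following is a minimal sketch, not the kernel code: `list_node`, `io_req`, `queue_io` and `list_count_checked` are hypothetical names, and the guard shown (flag a retry instead of re-initializing a linked node) only illustrates the idea of tracking retry status per IO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on a kernel-style list head. */
struct list_node {
    struct list_node *prev;
    struct list_node *next;
};

static void list_init(struct list_node *n)
{
    n->prev = n;
    n->next = n;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical stand-in for an IO request that can be queued and retried. */
struct io_req {
    struct list_node list;
    bool in_list; /* true while the request is linked into the FIFO */
    bool retry;   /* true when a submit attempt failed and must be redone */
};

/*
 * Queue a request. On a retry the request is still linked, so it is only
 * flagged; calling list_init() on it here would orphan its neighbours.
 */
static void queue_io(struct io_req *req, struct list_node *fifo)
{
    assert(req != NULL && fifo != NULL);
    if (req->in_list) {
        req->retry = true;
        return;
    }
    list_init(&req->list);
    list_add_tail(&req->list, fifo);
    req->in_list = true;
    req->retry = false;
}

/* Count entries, returning -1 if any link is not mirrored by its neighbour. */
static int list_count_checked(const struct list_node *head)
{
    const struct list_node *pos;
    int n = 0;

    for (pos = head->next; pos != head; pos = pos->next) {
        if (pos->next == pos || pos->prev->next != pos || pos->next->prev != pos)
            return -1; /* corrupted list */
        n++;
    }
    return n;
}
```

Queuing the same request twice leaves the FIFO intact here, whereas an unconditional `list_init()` on the second call would make the node point at itself while its neighbours still point at it.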
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to don't trigger writeback during recovery · 64c74a7a
      Chao Yu committed
      - f2fs_fill_super
       - recover_fsync_data
        - recover_data
         - del_fsync_inode
          - iput
           - iput_final
            - write_inode_now
             - f2fs_write_inode
              - f2fs_balance_fs
               - f2fs_balance_fs_bg
                - sync_dirty_inodes
      
      With the data_flush mount option, to avoid entering the writeback flow
      above during recovery, let's detect the recovery status and skip in
      f2fs_balance_fs_bg.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clear discard_wake earlier · 35a9a766
      Sheng Yong committed
      If SBI_NEED_FSCK is set, discard_wake will never be cleared. As a
      result, the condition of wait_event_interruptible_timeout() is always
      true, which makes the discard thread run too frequently.
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: let discard thread wait a little longer if dev is busy · f9d1dced
      Yunlei He committed
      This patch modifies the discard thread wait policy as below:
            issued    io_interrupted    wait time (ms)
      1.      8             0                 50
      2.    (0,8)           1                 50
      3.      0             1                500 (dev is busy)
      4.      0             0              60000 (no candidates)
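The table above can be read as a small decision function. This is a sketch only: the function and macro names are hypothetical, and the handling of combinations the table does not list (for example, some discards issued without interruption) is an assumption, treated here like rows 1 and 2.

```c
#include <assert.h>
#include <stdbool.h>

#define ISSUE_BATCH      8      /* "issued" value of row 1 in the table */
#define WAIT_DEFAULT_MS  50
#define WAIT_BUSY_MS     500
#define WAIT_IDLE_MS     60000

/* Pick the next wait time from how many discards were issued in this round. */
static unsigned int discard_wait_ms(unsigned int issued, bool io_interrupted)
{
    assert(issued <= ISSUE_BATCH);
    if (issued > 0)
        return WAIT_DEFAULT_MS; /* rows 1 and 2: some discards went out */
    if (io_interrupted)
        return WAIT_BUSY_MS;    /* row 3: nothing issued, device is busy */
    return WAIT_IDLE_MS;        /* row 4: nothing issued, no candidates */
}
```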
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid stucking GC due to atomic write · 2ef79ecb
      Chao Yu committed
      f2fs doesn't allow abuse of the atomic write class interface, so besides
      limiting the total memory usage of in-mem pages, we need to limit
      atomic-write usage as well when the filesystem is seriously fragmented.
      Otherwise we may run into an infinite loop during foreground GC, because
      target blocks in the victim segment belong to a file that has been
      atomic-opened for a long time.
      
      Now we detect failures due to atomic write in foreground GC; if the
      count exceeds a threshold, we drop all atomic written data in cache.
      I expect this to keep our system running safely and to prevent DoS
      attacks.
      
      In addition, this patch shows GC skip information in debugfs; for now
      it just shows the count of skips caused by atomic write.
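The "count exceeds threshold, then drop" behaviour can be sketched as a small counter. This is an illustration under assumptions, not the patch itself: the threshold value, the struct, and the function name are hypothetical, and resetting the counter after a drop is a choice made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SKIP_ATOMIC_COUNT 16 /* hypothetical threshold */

struct gc_stats {
    unsigned int skipped_atomic; /* foreground GC passes blocked by atomic write */
    unsigned int drops;          /* how often cached atomic data was dropped */
};

/*
 * Record one foreground GC pass. Returns true when the number of passes
 * blocked by atomic writes exceeds the threshold, which is the point at
 * which the caller would drop all cached atomic-written data.
 */
static bool gc_note_pass(struct gc_stats *st, bool blocked_by_atomic_write)
{
    assert(st != NULL);
    if (!blocked_by_atomic_write)
        return false;
    st->skipped_atomic++;
    if (st->skipped_atomic <= MAX_SKIP_ATOMIC_COUNT)
        return false;
    st->skipped_atomic = 0;
    st->drops++;
    return true;
}
```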
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce sbi->gc_mode to determine the policy · 5b0e9539
      Jaegeuk Kim committed
      This is to avoid sbi->gc_thread pointer access.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: keep migration IO order in LFS mode · 107a805d
      Chao Yu committed
      For non-migration IO, we keep the submission order of data/node blocks
      the same as the allocation sequence by sorting IOs in the per-log
      io_list, but migration IO could be out of order.
      
      In LFS mode, all IOs including migration IO should be kept in order,
      so this patch adds an additional lock to keep the submission order.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to wait page writeback during revoking atomic write · e5e5732d
      Chao Yu committed
      After revoking an atomic write, the related LBA can be reused by others,
      so we need to wait for page writeback before reusing the LBA, in order to
      avoid interference between old atomic written in-flight IO and new IO.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up with is_valid_blkaddr() · 7b525dd0
      Chao Yu committed
      - rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
      - introduce is_valid_blkaddr() for cleanup.
      
      No logic change in this patch.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to initialize min_mtime with ULLONG_MAX · 5ad25442
      Chao Yu committed
      Since sit_i.min_mtime's type is unsigned long long, we should
      initialize it with the max value of that type, ULLONG_MAX, instead of
      LLONG_MAX.
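Why the seed value matters can be shown with a plain minimum search. This is a sketch with a hypothetical helper, `min_mtime()`, not the f2fs code: when every sample is larger than LLONG_MAX, a search seeded with LLONG_MAX never updates and reports the seed instead of a real sample.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return the smallest value in mtimes[], starting the search from seed. */
static unsigned long long min_mtime(const unsigned long long *mtimes, size_t n,
                                    unsigned long long seed)
{
    unsigned long long min = seed;
    size_t i;

    assert(mtimes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (mtimes[i] < min)
            min = mtimes[i];
    }
    return min;
}
```

With samples `ULLONG_MAX - 1` and `ULLONG_MAX - 5`, seeding with ULLONG_MAX yields `ULLONG_MAX - 5`, while seeding with LLONG_MAX yields LLONG_MAX, a value that is not in the input at all.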
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: treat volatile file's data as hot one · b4c3ca8b
      Chao Yu committed
      A volatile file's data will be updated often, so it is better to place
      its data into the hot data segment.
      
      In addition, for atomic files, we now check FI_ATOMIC_FILE instead
      of FI_HOT_DATA to improve code readability.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce release_discard_addr() for cleanup · af8ff65b
      Chao Yu committed
      Introduce release_discard_addr() to hold common code for cleanup.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Fengguang Wu: declare static function, reported by kbuild test robot]
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix potential overflow · a9af3fdc
      Chao Yu committed
      In build_sit_entries(), if valid_blocks in the SIT block is smaller than
      valid_blocks in the journal, the calculation below:
      
      sbi->discard_blks += old_valid_blocks - se->valid_blocks;
      
      can overflow twice:
      - old_valid_blocks - se->valid_blocks overflows and becomes a very
      large number.
      - sbi->discard_blks += result overflows again, and accidentally
      produces the correct result.
      
      Anyway, it should be fixed.
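The double wraparound can be reproduced with plain unsigned arithmetic. This is a sketch: both helper names are hypothetical, all operands are assumed to be `unsigned int`, and the explicit variant is only one possible way to avoid relying on wraparound, not necessarily what the patch does.

```c
#include <assert.h>
#include <limits.h>

/* The pattern from the message: one unsigned expression that may wrap twice. */
static unsigned int adjust_wrapping(unsigned int discard_blks,
                                    unsigned int old_valid,
                                    unsigned int new_valid)
{
    return discard_blks + (old_valid - new_valid);
}

/* An explicit alternative that never depends on wraparound. */
static unsigned int adjust_explicit(unsigned int discard_blks,
                                    unsigned int old_valid,
                                    unsigned int new_valid)
{
    if (old_valid >= new_valid)
        return discard_blks + (old_valid - new_valid);
    assert(discard_blks >= new_valid - old_valid);
    return discard_blks - (new_valid - old_valid);
}
```

For `discard_blks = 10`, `old_valid = 3`, `new_valid = 5`, the difference wraps to `UINT_MAX - 1` and the addition wraps back to 8, the same value the explicit version computes directly.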
      
      Fixes: d600af23 ("f2fs: avoid unneeded loop in build_sit_entries")
      Fixes: 1f43e2ad ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: sanity check for total valid node blocks · 8a29c126
      Jaegeuk Kim committed
      This patch enhances sanity check for SIT entries.
      
      syzbot hit the following crash on upstream commit
      83beed7b (Fri Apr 20 17:56:32 2018 +0000)
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
      syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=bf9253040425feb155ad
      
      syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5692130282438656
      Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5095924598571008
      Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
      compiler: gcc (GCC) 8.0.1 20180413 (experimental)
      
      IMPORTANT: if you fix the bug, please add the following tag to the commit:
      Reported-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com
      It will help syzbot understand when the bug is fixed. See footer for details.
      If you forward the report, please keep this part and the footer.
      
      F2FS-fs (loop0): invalid crc value
      F2FS-fs (loop0): Try to recover 1th superblock, ret: 0
      F2FS-fs (loop0): Mounted with checkpoint version = d
      F2FS-fs (loop0): Bitmap was wrongly cleared, blk:9740
      ------------[ cut here ]------------
      kernel BUG at fs/f2fs/segment.c:1884!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 4508 Comm: syz-executor0 Not tainted 4.17.0-rc1+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882
      RSP: 0018:ffff8801af526708 EFLAGS: 00010282
      RAX: ffffed0035ea4cc0 RBX: ffff8801ad454f90 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff82eeb87e RDI: ffffed0035ea4cb6
      RBP: ffff8801af526760 R08: ffff8801ad4a2480 R09: ffffed003b5e4f90
      R10: ffffed003b5e4f90 R11: ffff8801daf27c87 R12: ffff8801adb8d380
      R13: 0000000000000001 R14: 0000000000000008 R15: 00000000ffffffff
      FS:  00000000014af940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f06bc223000 CR3: 00000001adb02000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       allocate_data_block+0x66f/0x2050 fs/f2fs/segment.c:2663
       do_write_page+0x105/0x1b0 fs/f2fs/segment.c:2727
       write_node_page+0x129/0x350 fs/f2fs/segment.c:2770
       __write_node_page+0x7da/0x1370 fs/f2fs/node.c:1398
       sync_node_pages+0x18cf/0x1eb0 fs/f2fs/node.c:1652
       block_operations+0x429/0xa60 fs/f2fs/checkpoint.c:1088
       write_checkpoint+0x3ba/0x5380 fs/f2fs/checkpoint.c:1405
       f2fs_sync_fs+0x2fb/0x6a0 fs/f2fs/super.c:1077
       __sync_filesystem fs/sync.c:39 [inline]
       sync_filesystem+0x265/0x310 fs/sync.c:67
       generic_shutdown_super+0xd7/0x520 fs/super.c:429
       kill_block_super+0xa4/0x100 fs/super.c:1191
       kill_f2fs_super+0x9f/0xd0 fs/f2fs/super.c:3030
       deactivate_locked_super+0x97/0x100 fs/super.c:316
       deactivate_super+0x188/0x1b0 fs/super.c:347
       cleanup_mnt+0xbf/0x160 fs/namespace.c:1174
       __cleanup_mnt+0x16/0x20 fs/namespace.c:1181
       task_work_run+0x1e4/0x290 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:191 [inline]
       exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
       prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
       do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457d97
      RSP: 002b:00007ffd46f9c8e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000457d97
      RDX: 00000000014b09a3 RSI: 0000000000000002 RDI: 00007ffd46f9da50
      RBP: 00007ffd46f9da50 R08: 0000000000000000 R09: 0000000000000009
      R10: 0000000000000005 R11: 0000000000000246 R12: 00000000014b0940
      R13: 0000000000000000 R14: 0000000000000002 R15: 000000000000658e
      RIP: update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: ffff8801af526708
      ---[ end trace f498328bb02610a2 ]---
      
      Reported-and-tested-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+7d6d31d3bc702f566ce3@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+0a725420475916460f12@syzkaller.appspotmail.com
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: sanity check on sit entry · b2ca374f
      Jaegeuk Kim committed
      syzbot hit the following crash on upstream commit
      87ef1202 (Wed Apr 18 19:48:17 2018 +0000)
      Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client
      syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=83699adeb2d13579c31e
      
      C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5805208181407744
      syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=6005073343676416
      Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6555047731134464
      Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
      compiler: gcc (GCC) 8.0.1 20180413 (experimental)
      
      IMPORTANT: if you fix the bug, please add the following tag to the commit:
      Reported-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com
      It will help syzbot understand when the bug is fixed. See footer for details.
      If you forward the report, please keep this part and the footer.
      
      F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
      F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
      F2FS-fs (loop0): invalid crc value
      BUG: unable to handle kernel paging request at ffffed006b2a50c0
      PGD 21ffee067 P4D 21ffee067 PUD 21fbeb067 PMD 0
      Oops: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 4514 Comm: syzkaller989480 Not tainted 4.17.0-rc1+ #8
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:build_sit_entries fs/f2fs/segment.c:3653 [inline]
      RIP: 0010:build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852
      RSP: 0018:ffff8801b102e5b0 EFLAGS: 00010a06
      RAX: 1ffff1006b2a50c0 RBX: 0000000000000004 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801ac74243e
      RBP: ffff8801b102f410 R08: ffff8801acbd46c0 R09: fffffbfff14d9af8
      R10: fffffbfff14d9af8 R11: ffff8801acbd46c0 R12: ffff8801ac742a80
      R13: ffff8801d9519100 R14: dffffc0000000000 R15: ffff880359528600
      FS:  0000000001e04880(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffed006b2a50c0 CR3: 00000001ac6ac000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       f2fs_fill_super+0x4095/0x7bf0 fs/f2fs/super.c:2803
       mount_bdev+0x30c/0x3e0 fs/super.c:1165
       f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
       mount_fs+0xae/0x328 fs/super.c:1268
       vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
       vfs_kern_mount fs/namespace.c:1027 [inline]
       do_new_mount fs/namespace.c:2517 [inline]
       do_mount+0x564/0x3070 fs/namespace.c:2847
       ksys_mount+0x12d/0x140 fs/namespace.c:3063
       __do_sys_mount fs/namespace.c:3077 [inline]
       __se_sys_mount fs/namespace.c:3074 [inline]
       __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x443d6a
      RSP: 002b:00007ffd312813c8 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443d6a
      RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffd312813d0
      RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a
      R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004
      R13: 0000000000402c60 R14: 0000000000000000 R15: 0000000000000000
      RIP: build_sit_entries fs/f2fs/segment.c:3653 [inline] RSP: ffff8801b102e5b0
      RIP: build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: ffff8801b102e5b0
      CR2: ffffed006b2a50c0
      ---[ end trace a2034989e196ff17 ]---
      
      Reported-and-tested-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up commit_inmem_pages() · cf52b27a
      Chao Yu committed
      This patch moves error handling from commit_inmem_pages() into
      __commit_inmem_page() for cleanup, no logic change.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: stop issue discard if something wrong with f2fs · d6184774
      Yunlei He committed
      v4->v5: move the data corruption check to __submit_discard_cmd, in order
      to control submitted discard IO more accurately; besides, increase the
      async thread wait time if data corruption is detected.
      
      This patch stops the async thread and the umount process from issuing
      discards if something is wrong with f2fs, which is similar to fstrim.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: check if inmem_pages list is empty correctly · d0891e84
      Sheng Yong committed
      `cur' will never be NULL; we should check the inmem_pages list instead.
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries · 56b07e7e
      Zhikang Zhang committed
      We should check valid_map_mir and block count to ensure
      the flushed raw_sit is correct.
      Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: correct return value of f2fs_trim_fs · 3d165dc3
      Chao Yu committed
      Correct return value in two cases:
      - return EINVAL if end boundary is out-of-range.
      - return EIO if fs needs off-line check.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't use GFP_ZERO for page caches · 81114baa
      Chao Yu committed
      Related to https://lkml.org/lkml/2018/4/8/661
      
      Sometimes we need to write metadata to a newly allocated block address.
      We then allocate a zeroed page in an inner inode's address space, fill
      in partial data, and leave the rest zeroed, which means some fields are
      in their initial state.
      
      Two inner inodes (meta inode and node inode) set __GFP_ZERO. I have
      just checked them: for both of them we can avoid using __GFP_ZERO and
      do the initialization ourselves, avoiding unneeded/redundant zeroing
      from mm.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: issue all big range discards in umount process · 241b493d
      Yunlei He committed
      This patch changes max_requests to UINT_MAX, to issue
      all big range discards during umount.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: run fstrim asynchronously if runtime discard is on · e555da9f
      Jaegeuk Kim committed
      We don't need to wait for the whole bunch of discard candidates in fstrim,
      since runtime discard will issue them in idle time.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 30 May, 2018 4 commits
  3. 03 May, 2018 1 commit
  4. 28 Mar, 2018 1 commit
  5. 17 Mar, 2018 2 commits
  6. 13 Mar, 2018 7 commits
    • f2fs: support hot file extension · b6a06cbb
      Chao Yu committed
      This patch adds recognition of hot file extensions in f2fs, so that we
      can allocate a proper hot segment location for their data, which can
      lead to better hot/cold separation in the filesystem.
      
      In addition, we change the query/add/del operations for the
      extension_list sysfs entry a bit, as below:
      
      - Query: cat /sys/fs/f2fs/<disk>/extension_list
      - Add: echo 'extension' > /sys/fs/f2fs/<disk>/extension_list
      - Del: echo '!extension' > /sys/fs/f2fs/<disk>/extension_list
      - Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
      - Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
      - [h] means add/del hot file extension
      - [c] means add/del cold file extension
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: issue discard aggressively in the gc_urgent mode · dee02f0d
      Jaegeuk Kim committed
      This patch avoids skipping discard commands when the user sets gc_urgent mode.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add mount option for segment allocation policy · 07939627
      Jaegeuk Kim committed
      This patch adds a mount option, "alloc_mode=%s", with two values, "default"
      and "reuse".
      
      In the "alloc_mode=reuse" case, f2fs always starts allocating segments from
      the 0th segment to reassign segments. This is useful for small-sized eMMC
      parts.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up f2fs_sb_has_xxx functions · ccd31cb2
      Sheng Yong committed
      This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of
      different f2fs_sb_has_xxx functions.
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support passing down write hints to block layer with F2FS policy · f2e703f9
      Hyunchul Lee committed
      Add 'whint_mode=fs-based' mount option. In this mode, F2FS passes
      down write hints with its policy.
      
      * whint_mode=fs-based. F2FS passes down hints with its policy.
      
      User                  F2FS                     Block
      ----                  ----                     -----
                            META                     WRITE_LIFE_MEDIUM
                            HOT_NODE                 WRITE_LIFE_NOT_SET
                            WARM_NODE                "
                            COLD_NODE                WRITE_LIFE_NONE
      ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
      extension list        "                        "
      
      -- buffered io
      WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
      WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
      WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
      WRITE_LIFE_NONE       "                        "
      WRITE_LIFE_MEDIUM     "                        "
      WRITE_LIFE_LONG       "                        "
      
      -- direct io
      WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
      WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
      WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
      WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
      WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
      WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
      
      Many thanks to Chao Yu and Jaegeuk Kim for comments to
      implement this patch.
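The fs-based table above can be expressed as a lookup function. This is a sketch under assumptions: the enums and `fs_based_hint()` are illustrative local names (the kernel defines its own WRITE_LIFE_* values), and the block kind is taken as an input rather than derived from the user hint or the extension list.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the write lifetime hints named in the table. */
enum life_hint {
    LIFE_NOT_SET, LIFE_NONE, LIFE_SHORT, LIFE_MEDIUM, LIFE_LONG, LIFE_EXTREME
};

/* Illustrative stand-ins for the F2FS block kinds named in the table. */
enum blk_kind {
    KIND_META, KIND_HOT_NODE, KIND_WARM_NODE, KIND_COLD_NODE,
    KIND_HOT_DATA, KIND_WARM_DATA, KIND_COLD_DATA
};

/* Map a block kind to the hint passed to the block layer in fs-based mode. */
static enum life_hint fs_based_hint(enum blk_kind kind,
                                    enum life_hint user_hint, bool direct_io)
{
    switch (kind) {
    case KIND_META:
        return LIFE_MEDIUM;
    case KIND_HOT_NODE:
    case KIND_WARM_NODE:
        return LIFE_NOT_SET;
    case KIND_COLD_NODE:
        return LIFE_NONE;
    case KIND_COLD_DATA:
        return LIFE_EXTREME;
    case KIND_HOT_DATA:
        return LIFE_SHORT;
    case KIND_WARM_DATA:
        /* Direct IO keeps the user's hint; buffered IO is treated as long-lived. */
        return direct_io ? user_hint : LIFE_LONG;
    }
    assert(!"unknown block kind");
    return LIFE_NOT_SET;
}
```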
      Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support passing down write hints given by users to block layer · 0cdd3195
      Hyunchul Lee committed
      Add the 'whint_mode' mount option that controls which write
      hints are passed down to the block layer. The modes are "off" and
      "user-based". The default mode is "off".
      
      1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
      
      2) whint_mode=user-based. F2FS tries to pass down hints given
      by users.
      
      User                  F2FS                     Block
      ----                  ----                     -----
                            META                     WRITE_LIFE_NOT_SET
                            HOT_NODE                 "
                            WARM_NODE                "
                            COLD_NODE                "
      ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
      extension list        "                        "
      
      -- buffered io
      WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
      WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
      WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
      WRITE_LIFE_NONE       "                        "
      WRITE_LIFE_MEDIUM     "                        "
      WRITE_LIFE_LONG       "                        "
      
      -- direct io
      WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
      WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
      WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
      WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
      WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
      WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
      
      Many thanks to Chao Yu and Jaegeuk Kim for comments to
      implement this patch.
      Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: avoid build warning]
      [Chao Yu: fix to restore whint_mode in ->remount_fs]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix heap mode to reset it back · b94929d9
      Yunlong Song committed
      Commit 7a20b8a6 ("f2fs: allocate node and hot data in the beginning of
      partition") introduces another mount option, heap, to reset it back.
      But it does not do anything for heap mode, so fix it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  7. 26 Jan, 2018 2 commits
    • f2fs: rebuild sit page from sit info in mem · 068c3cd8
      Yunlei He committed
      This patch rebuilds the sit page from sit info in memory instead
      of issuing a read IO.
      
      I tested this method and the result is as below:
      
      Pre:
       mmc_perf_test-12061 [001] ...1   976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
       mmc_perf_test-12061 [001] ...1   976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
       mmc_perf_test-12061 [003] ...1   998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
       mmc_perf_test-12061 [003] ...1   999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
       mmc_perf_test-12061 [003] ...1  1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
       mmc_perf_test-12061 [003] ...1  1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
       mmc_perf_test-12061 [002] ...1  1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
       mmc_perf_test-12061 [003] ...1  1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
       mmc_perf_test-12061 [003] ...1  1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
       mmc_perf_test-12061 [003] ...1  1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
       mmc_perf_test-12061 [003] ...1  1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
       mmc_perf_test-12061 [003] ...1  1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
       mmc_perf_test-12061 [001] ...1  1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
       mmc_perf_test-12061 [001] ...1  1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
       mmc_perf_test-12061 [003] ...1  1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
       mmc_perf_test-12061 [002] ...1  1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
       mmc_perf_test-12061 [002] ...1  1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
       mmc_perf_test-12061 [002] ...1  1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      
      Some sit flushes take more than 50ms.
      
      Now:
      mmc_perf_test-2187  [007] ...1   196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [007] ...1   196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      mmc_perf_test-2187  [007] ...1   219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [007] ...1   219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      mmc_perf_test-2187  [002] ...1   243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [000] ...1   243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      mmc_perf_test-2187  [002] ...1   265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [002] ...1   265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      mmc_perf_test-2187  [000] ...1   290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [000] ...1   290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      mmc_perf_test-2187  [003] ...1   317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [003] ...1   317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      mmc_perf_test-2187  [005] ...1   343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [005] ...1   343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      mmc_perf_test-2187  [000] ...1   370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [000] ...1   370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      mmc_perf_test-2187  [001] ...1   397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [001] ...1   397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      mmc_perf_test-2187  [003] ...1   425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
      mmc_perf_test-2187  [003] ...1   425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
      
      Most sit flushes take no more than 1ms.
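The flush durations can be checked by subtracting each "start flush sit" timestamp from the matching "end flush sit" timestamp. This is a small sketch with hypothetical helper names, working in whole microseconds to avoid floating-point rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a trace timestamp given as seconds and microseconds to microseconds. */
static uint64_t ts_to_us(uint64_t sec, uint32_t usec)
{
    assert(usec < 1000000u);
    return sec * 1000000u + usec;
}

/* Duration between a "start flush sit" and an "end flush sit" event. */
static uint64_t flush_duration_us(uint64_t start_us, uint64_t end_us)
{
    assert(end_us >= start_us);
    return end_us - start_us;
}
```

Applied to the traces above, the first "Pre" pair (976.819992 to 976.856446) gives 36454 us, the third "Pre" pair (1022.060772 to 1022.111034) gives 50262 us, and the first "Now" pair (196.840684 to 196.841258) gives 574 us.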
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: stop issuing discard if fs is readonly · 3b60d802
      Chao Yu committed
      If the filesystem is read-only, stop issuing discards in the daemon.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>