1. 01 Oct, 2016 · 4 commits
    • f2fs: handle errors during recover_orphan_inodes · d41065e2
      Committed by Jaegeuk Kim
      This patch handles EIO during recover_orphan_inode(), which otherwise
      triggers the panic below.
      
      F2FS-fs : inject IO error in f2fs_read_end_io+0xe6/0x100 [f2fs]
      ------------[ cut here ]------------
      RIP: 0010:[<ffffffffc0b244e3>]  [<ffffffffc0b244e3>] f2fs_evict_inode+0x433/0x470 [f2fs]
      RSP: 0018:ffff92f8b7fb7c30  EFLAGS: 00010246
      RAX: ffff92fb88a13500 RBX: ffff92f890566ea0 RCX: 00000000fd3c255c
      RDX: 0000000000000001 RSI: ffff92fb88a13d90 RDI: ffff92fb8ee127e8
      RBP: ffff92f8b7fb7c58 R08: 0000000000000001 R09: ffff92fb88a13d58
      R10: 000000005a6a9373 R11: 0000000000000001 R12: 00000000fffffffb
      R13: ffff92fb8ee12000 R14: 00000000000034ca R15: ffff92fb8ee12620
      FS:  00007f1fefd8e880(0000) GS:ffff92fb95600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fc211d34cdb CR3: 000000012d43a000 CR4: 00000000001406e0
      Stack:
       ffff92f890566ea0 ffff92f890567078 ffffffffc0b5a0c0 ffff92f890566f28
       ffff92fb888b2000 ffff92f8b7fb7c80 ffffffffbc27ff55 ffff92f890566ea0
       ffff92fb8bf10000 ffffffffc0b5a0c0 ffff92f8b7fb7cb0 ffffffffbc28090d
      Call Trace:
       [<ffffffffbc27ff55>] evict+0xc5/0x1a0
       [<ffffffffbc28090d>] iput+0x1ad/0x2c0
       [<ffffffffc0b3304c>] recover_orphan_inodes+0x10c/0x2e0 [f2fs]
       [<ffffffffc0b2e0f4>] f2fs_fill_super+0x884/0x1150 [f2fs]
       [<ffffffffbc2644ac>] mount_bdev+0x18c/0x1c0
       [<ffffffffc0b2d870>] ? f2fs_commit_super+0x100/0x100 [f2fs]
       [<ffffffffc0b2a755>] f2fs_mount+0x15/0x20 [f2fs]
       [<ffffffffbc264e49>] mount_fs+0x39/0x170
       [<ffffffffbc28555b>] vfs_kern_mount+0x6b/0x160
       [<ffffffffbc2881df>] do_mount+0x1cf/0xd00
       [<ffffffffbc287f2c>] ? copy_mount_options+0xac/0x170
       [<ffffffffbc289003>] SyS_mount+0x83/0xd0
       [<ffffffffbc8ee880>] entry_SYSCALL_64_fastpath+0x23/0xc1
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add customized migrate_page callback · 5b7a487c
      Committed by Weichao Guo
      This patch improves the migration of dirty pages and allows migrating the
      atomic written pages that F2FS keeps in the page cache. Compared with the
      fallback path, which releases the page, it gives better performance for memory
      compaction, CMA and other users of page migration. Dirty pages no longer need
      to be written back before they are migrated. For an atomic written page that
      has not been committed yet, we can migrate the page and update the related
      'inmem_pages' list at the same time.
      Signed-off-by: Weichao Guo <guoweichao@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix some coding style]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce cp_lock to protect updating of ckpt_flags · aaec2b1d
      Committed by Chao Yu
      This patch introduces a spinlock to protect updates of the ckpt_flags
      field in struct f2fs_checkpoint, so that racing updaters cannot leave the
      field in an incorrect state.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
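
The locking idea in the cp_lock commit above can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: it swaps the kernel spinlock for a C11 atomic_flag spin loop, and struct demo_ckpt, the demo_* functions and the DEMO_FLAG_* bits are hypothetical names made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not the kernel's CP_* values. */
#define DEMO_FLAG_A 0x1u
#define DEMO_FLAG_B 0x2u

/* Hypothetical stand-in for the state that holds ckpt_flags. */
struct demo_ckpt {
	atomic_flag cp_lock;	/* spinlock guarding ckpt_flags */
	uint32_t ckpt_flags;
};

void demo_ckpt_init(struct demo_ckpt *c)
{
	atomic_flag_clear(&c->cp_lock);
	c->ckpt_flags = 0;
}

static void demo_lock(struct demo_ckpt *c)
{
	/* Spin until the previous holder clears the flag. */
	while (atomic_flag_test_and_set_explicit(&c->cp_lock,
						 memory_order_acquire))
		;
}

static void demo_unlock(struct demo_ckpt *c)
{
	atomic_flag_clear_explicit(&c->cp_lock, memory_order_release);
}

/* The read-modify-write of ckpt_flags happens entirely under cp_lock. */
void demo_set_ckpt_flags(struct demo_ckpt *c, uint32_t f)
{
	demo_lock(c);
	c->ckpt_flags |= f;
	demo_unlock(c);
}

void demo_clear_ckpt_flags(struct demo_ckpt *c, uint32_t f)
{
	demo_lock(c);
	c->ckpt_flags &= ~f;
	demo_unlock(c);
}

bool demo_is_set_ckpt_flags(struct demo_ckpt *c, uint32_t f)
{
	bool set;

	demo_lock(c);
	set = (c->ckpt_flags & f) != 0;
	demo_unlock(c);
	return set;
}
```

Without the lock, two threads doing `ckpt_flags |= f` and `ckpt_flags &= ~g` at the same time could each read the old value and one update would be lost; taking the lock around every access serializes them.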
    • f2fs: use crc and cp version to determine roll-forward recovery · a468f0ef
      Committed by Jaegeuk Kim
      Previously, we used only cp_version to detect recoverable dnodes.
      To avoid matching a garbage dnode that carries the same cp_version, we had
      to truncate the next dnode during checkpoint, which cost an additional
      discard or data write. If we can tell the two apart by using the crc in
      addition to cp_version, we can remove this overhead.

      There is a backward compatibility concern, because this changes the
      node_footer layout. So, this patch introduces a new checkpoint flag,
      CP_CRC_RECOVERY_FLAG, to detect the new layout. The new layout is
      activated only when this flag is set.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
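
A rough sketch of the check described in the crc commit above follows. It is an illustration under assumptions, not the kernel implementation: the commit message does not say how the crc and cp_version are combined, so packing the crc into the upper 32 bits of a 64-bit stamp is an assumption of this sketch, and struct demo_ckpt, demo_stamp() and demo_is_recoverable() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the checkpoint fields this sketch needs. */
struct demo_ckpt {
	uint64_t cp_version;	/* current checkpoint version */
	uint32_t crc;		/* checksum of the current checkpoint */
	bool crc_recovery;	/* stands in for CP_CRC_RECOVERY_FLAG */
};

/*
 * Stamp written into a node footer. Assumption of this sketch: the
 * version lives in the low 32 bits and, in the new layout only, the
 * crc is packed into the high 32 bits.
 */
uint64_t demo_stamp(const struct demo_ckpt *cp)
{
	uint64_t stamp = cp->cp_version & 0xffffffffu;

	if (cp->crc_recovery)
		stamp |= (uint64_t)cp->crc << 32;
	return stamp;
}

/* A dnode is recoverable only if its footer stamp matches the checkpoint. */
bool demo_is_recoverable(const struct demo_ckpt *cp, uint64_t footer_stamp)
{
	if (!cp->crc_recovery)	/* old layout: compare the version only */
		return (uint32_t)footer_stamp == (uint32_t)cp->cp_version;
	return footer_stamp == demo_stamp(cp);
}
```

With the flag set, a stale dnode that happens to carry the same version but a different crc no longer matches, which is why the extra truncation during checkpoint becomes unnecessary. With the flag clear, the check falls back to the version-only comparison of the old layout.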
  2. 16 Sep, 2016 · 1 commit
  3. 08 Sep, 2016 · 3 commits
    • f2fs: fix to set superblock dirty correctly · c2a080ae
      Committed by Chao Yu
      tests/generic/251 of the fstests suite complains with the message below:
      
      ------------[ cut here ]------------
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 2 PID: 7698 Comm: fstrim Tainted: G           O    4.7.0+ #21
      task: e9f4e000 task.stack: e7262000
      EIP: 0060:[<f89fcefe>] EFLAGS: 00010202 CPU: 2
      EIP is at write_checkpoint+0xfde/0x1020 [f2fs]
      EAX: f33eb300 EBX: eecac310 ECX: 00000001 EDX: ffff0001
      ESI: eecac000 EDI: eecac5f0 EBP: e7263dec ESP: e7263d18
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      CR0: 80050033 CR2: b76ab01c CR3: 2eb89de0 CR4: 000406f0
      Stack:
       00000001 a220fb7b e9f4e000 00000002 419ff2d3 b3a05151 00000002 e9f4e5d8
       e9f4e000 419ff2d3 b3a05151 eecac310 c10b8154 b3a05151 419ff2d3 c10b78bd
       e9f4e000 e9f4e000 e9f4e5d8 00000001 e9f4e000 ec409000 eecac2cc eecac288
      Call Trace:
       [<c10b8154>] ? __lock_acquire+0x3c4/0x760
       [<c10b78bd>] ? mark_held_locks+0x5d/0x80
       [<f8a10632>] f2fs_trim_fs+0x1c2/0x2e0 [f2fs]
       [<f89e9f56>] f2fs_ioctl+0x6b6/0x10b0 [f2fs]
       [<c13d51df>] ? __this_cpu_preempt_check+0xf/0x20
       [<c10b4281>] ? trace_hardirqs_off_caller+0x91/0x120
       [<f89e98a0>] ? __exchange_data_block+0xd30/0xd30 [f2fs]
       [<c120b2e1>] do_vfs_ioctl+0x81/0x7f0
       [<c11d57c5>] ? kmem_cache_free+0x245/0x2e0
       [<c1217840>] ? get_unused_fd_flags+0x40/0x40
       [<c1206eec>] ? putname+0x4c/0x50
       [<c11f631e>] ? do_sys_open+0x16e/0x1d0
       [<c1001990>] ? do_fast_syscall_32+0x30/0x1c0
       [<c13d51df>] ? __this_cpu_preempt_check+0xf/0x20
       [<c120baa8>] SyS_ioctl+0x58/0x80
       [<c1001a01>] do_fast_syscall_32+0xa1/0x1c0
       [<c178cc54>] sysenter_past_esp+0x45/0x74
      EIP: [<f89fcefe>] write_checkpoint+0xfde/0x1020 [f2fs] SS:ESP 0068:e7263d18
      ---[ end trace 4de95d7e6b3aa7c6 ]---
      
      The reason is that, with the call stacks below, we hit a BUG_ON while
      running fstrim.
      
      Thread A				Thread B
      - write_checkpoint
       - do_checkpoint
      					- f2fs_write_inode
      					 - update_inode_page
      					  - update_inode
      					   - set_page_dirty
      					    - f2fs_set_node_page_dirty
      					     - inc_page_count
      					      - percpu_counter_inc
      					      - set_sbi_flag(SBI_IS_DIRTY)
        - clear_sbi_flag(SBI_IS_DIRTY)
      
      Thread C				Thread D
      - f2fs_write_node_page
       - set_node_addr
        - __set_nat_cache_dirty
         - nm_i->dirty_nat_cnt++
      					- do_vfs_ioctl
      					 - f2fs_ioctl
      					  - f2fs_trim_fs
      					   - write_checkpoint
      					    - f2fs_bug_on(nm_i->dirty_nat_cnt)
      
      Fix it by setting superblock dirty correctly in do_checkpoint and
      f2fs_write_node_page.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix lost xattrs of directories · bbf156f7
      Committed by Jaegeuk Kim
      This patch improves the xattr consistency of directories across sudden power cuts.
      
      Possible scenario would be:
      1. dir->setxattr used by per-file encryption
      2. file->setxattr goes into inline_xattr
      3. file->fsync
      
      In that case, we should do a checkpoint for #1.
      Otherwise we'd lose the dir's key information for the file given #2.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support async discard · 275b66b0
      Committed by Chao Yu
      Like most filesystems, f2fs issues discard commands synchronously, so
      when a user triggers fstrim through ioctl, multiple discard commands are
      issued serially in sync mode, which performs poorly.

      This patch adds support for async discard, so that all discard commands
      can be issued first and their endio waited for in a batch, which improves
      performance.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  4. 30 Aug, 2016 · 1 commit
  5. 21 Jul, 2016 · 1 commit
  6. 16 Jul, 2016 · 1 commit
  7. 09 Jul, 2016 · 2 commits
  8. 07 Jul, 2016 · 2 commits
  9. 14 Jun, 2016 · 1 commit
  10. 09 Jun, 2016 · 1 commit
  11. 08 Jun, 2016 · 1 commit
  12. 03 Jun, 2016 · 4 commits
  13. 21 May, 2016 · 1 commit
  14. 19 May, 2016 · 2 commits
  15. 17 May, 2016 · 2 commits
  16. 08 May, 2016 · 4 commits
    • f2fs: fix inode cache leak · f61cce5b
      Committed by Chao Yu
      When testing f2fs with the inline_dentry option, generic/342 reports:
      VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds.  Have a nice day...
      
      After rmmod of the f2fs module, the kernel shows the following dmesg:
       =============================================================================
       BUG f2fs_inode_cache (Tainted: G           O   ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       Disabling lock debugging due to kernel taint
       INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
       CPU: 3 PID: 7455 Comm: rmmod Tainted: G    B      O    4.6.0-rc4+ #16
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
        00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
        c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
        616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
       Call Trace:
        [<c13a83a0>] dump_stack+0x5f/0x8f
        [<c11c7276>] slab_err+0x76/0x80
        [<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
        [<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
        [<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
        [<c1198a38>] kmem_cache_destroy+0x158/0x1f0
        [<c176b43d>] ? mutex_unlock+0xd/0x10
        [<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
        [<c10f596c>] SyS_delete_module+0x16c/0x1d0
        [<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
        [<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
        [<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
        [<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
        [<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
        [<c176d888>] sysenter_past_esp+0x45/0x74
       INFO: Object 0xd1e6d9e0 @offset=6624
       kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
       CPU: 3 PID: 7455 Comm: rmmod Tainted: G    B      O    4.6.0-rc4+ #16
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
        00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
        c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
        f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
       Call Trace:
        [<c13a83a0>] dump_stack+0x5f/0x8f
        [<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
        [<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
        [<c10f596c>] SyS_delete_module+0x16c/0x1d0
        [<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
        [<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
        [<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
        [<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
        [<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
        [<c176d888>] sysenter_past_esp+0x45/0x74
      
      The reason is that, in the recovery flow, we use a delayed iput mechanism
      for a directory that has a recovered dentry block. The reference to the
      inode is held until the last dirty dentry page has been written back.

      But when f2fs is mounted with the inline_dentry option, a dirent may be
      recovered only into the dir inode page rather than a dentry page, so there
      is no chance to release the inode reference in ->writepage when writing
      back the last dentry page.

      We could call paired iget/iput explicitly for the inline_dentry case, but
      for the non-inline_dentry case iput will call writeback_single_inode to
      write all data pages synchronously, and during recovery ->writepages of
      f2fs skips writing all pages, resulting in lost dirents.

      This patch fixes the issue by retiring the old mechanism and introducing a
      new dir_list that holds all directory inodes with recovered data until
      recovery finishes.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix leak of orphan inode objects · 74ef9241
      Committed by Jaegeuk Kim
      When unmounting the filesystem, we should release all the ino entries.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: inject ENOSPC failures · cb78942b
      Committed by Jaegeuk Kim
      This patch injects ENOSPC failures.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use f2fs_grab_cache_page instead of grab_cache_page · 300e129c
      Committed by Jaegeuk Kim
      This patch converts grab_cache_page to f2fs_grab_cache_page.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  17. 27 Apr, 2016 · 1 commit
  18. 15 Apr, 2016 · 1 commit
  19. 18 Mar, 2016 · 1 commit
  20. 27 Feb, 2016 · 1 commit
  21. 26 Feb, 2016 · 1 commit
    • f2fs: fix incorrect upper bound when iterating inode mapping tree · 80dd9c0e
      Committed by Chao Yu
      1. The inode mapping tree can index pages in the range [0, ULONG_MAX];
      however, in some places f2fs only searches or iterates pages in the range
      [0, LONG_MAX], so lookups can miss pages that are in the page cache.

      2. filemap_fdatawait_range accepts range parameters in units of bytes, so
      the maximum range it covers should be [0, LLONG_MAX]. If we use
      [0, LONG_MAX] as the range when waiting on writeback, a large number of
      pages will not be covered.

      This patch corrects the two issues above.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
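
The second point of the upper-bound commit above is easier to see with numbers. The sketch below is an illustration under assumptions that the commit message does not state: 4 KiB pages, and a 32-bit build where LONG_MAX equals INT32_MAX while LLONG_MAX equals INT64_MAX. DEMO_PAGE_SIZE and demo_pages_covered() are hypothetical names.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL	/* assumed 4 KiB pages */

/* Number of pages touched by the byte range [0, last_byte]. */
uint64_t demo_pages_covered(uint64_t last_byte)
{
	return last_byte / DEMO_PAGE_SIZE + 1;
}
```

Under these assumptions, demo_pages_covered(INT32_MAX) is 524288 pages (2 GiB of file data), while demo_pages_covered(INT64_MAX) is 2^51 pages, so waiting on writeback with a byte range that ends at a 32-bit LONG_MAX would skip every page beyond the first 2 GiB of the file.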
  22. 24 Feb, 2016 · 1 commit
  23. 23 Feb, 2016 · 3 commits
    • f2fs: trace old block address for CoWed page · 7a9d7548
      Committed by Chao Yu
      This patch enables tracing the old block address of a CoWed page for
      better debugging.
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: move sanity checking of cp into get_valid_checkpoint · 984ec63c
      Committed by Shawn Lin
      From its name, get_valid_checkpoint appears to return a valid cp, or
      NULL for the caller to check. If no valid one is found,
      f2fs_fill_super prints an error log. But if get_valid_checkpoint
      returns a cp that its return value reports as valid and that then
      fails sanity checking, f2fs_fill_super prints another, similar error
      log. That seems strange. Let's keep the sanity checking inside the
      procedure of getting a valid cp. Another improvement from this move is
      that, even when large volumes are supported, we check the cp in advance
      and skip the following procedure if the sanity check fails.
      Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: split journal cache from curseg cache · b7ad7512
      Committed by Chao Yu
      In the curseg cache, f2fs caches two different parts:
       - data of the current summary block, i.e. summary entries and footer info.
       - journal info, i.e. sparse nat/sit entries or io stat info.

      With this approach, 1) lock contention may be higher when we access or
      update both parts of the cache, since the same mutex lock, curseg_mutex,
      protects both; 2) the current summary block, with the last journal info,
      is written back to the device as a normal summary block when flushing.
      However, journal info is treated as valid only in the current summary,
      so most normal summary blocks contain junk journal data, which wastes
      the remaining space of the summary block.

      To fix these issues, we split the curseg cache into two parts:
      a) current summary block, protected by original mutex lock curseg_mutex
      b) journal cache, protected by newly introduced r/w semaphore journal_rwsem

      When loading the curseg cache during ->mount, we store summary info and
      journal info in different caches; when doing a checkpoint, we combine the
      data of the two caches into the current summary block for persisting.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>