1. 06 Nov, 2017 1 commit
  2. 22 Aug, 2017 3 commits
  3. 01 Aug, 2017 1 commit
    • f2fs: enhance on-disk inode structure scalability · 7a2af766
      Committed by Chao Yu
      This patch adds a new flag, F2FS_EXTRA_ATTR, stored in inode.i_inline,
      to indicate that the on-disk structure of the current inode is extended.
      
      To make room for the extension, the inode structure changes slightly:
      
      Original one:
      
      struct f2fs_inode {
      	...
      	struct f2fs_extent i_ext;
      	__le32 i_addr[DEF_ADDRS_PER_INODE];
      	__le32 i_nid[DEF_NIDS_PER_INODE];
      }
      
      Extended one:
      
      struct f2fs_inode {
      	...
      	struct f2fs_extent i_ext;
      	union {
      		struct {
      			__le16 i_extra_isize;
      			__le16 i_padding;
      			__le32 i_extra_end[0];
      		};
      		__le32 i_addr[DEF_ADDRS_PER_INODE];
      	};
      	__le32 i_nid[DEF_NIDS_PER_INODE];
      }
      
      Once F2FS_EXTRA_ATTR is set, we steal four bytes at the head of the
      i_addr field to store i_extra_isize and i_padding. With i_extra_isize,
      we can calculate the actual size of the reserved space in i_addr. The
      attribute fields available within the total extra attribute space of the
      current inode are laid out as follows:
      
        +--------------------+
        | .i_mode            |
        | ...                |
        | .i_ext             |
        +--------------------+
        | .i_extra_isize     |-----+
        | .i_padding         |     |
        | .i_prjid           |     |
        | .i_atime_extra     |     |
        | .i_ctime_extra     |     |
        | .i_mtime_extra     |<----+
        | .i_inode_cs        |<----- store blkaddr/inline from here
        | .i_xattr_cs        |
        | ...                |
        +--------------------+
        |                    |
        |    block address   |
        |                    |
        +--------------------+
        | .i_nid             |
        +--------------------+
        |   node_footer      |
        | (nid, ino, offset) |
        +--------------------+
      
      With this patch, the f2fs inode becomes more scalable: newly added
      attributes can be stored in the reserved extra space.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
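The effect of i_extra_isize on the usable part of i_addr, as described in the commit above, can be sketched in userspace C. This is a minimal model, not the kernel code: plain uint16_t/uint32_t stand in for __le16/__le32 (real code would convert from little-endian), the value 923 for DEF_ADDRS_PER_INODE is an illustrative assumption, and the helper names extra_slots and usable_addr_slots are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: illustrative value; the real constant lives in the f2fs headers. */
#define DEF_ADDRS_PER_INODE 923

/* Userspace stand-in for the i_addr region of the extended on-disk inode. */
struct inode_tail {
	union {
		struct {
			uint16_t i_extra_isize; /* bytes reserved at the head of i_addr */
			uint16_t i_padding;
		};
		uint32_t i_addr[DEF_ADDRS_PER_INODE];
	};
};

/* Number of 32-bit i_addr slots taken over by extra attributes. */
unsigned int extra_slots(const struct inode_tail *t, int has_extra_attr)
{
	if (!has_extra_attr)
		return 0; /* flag not set: i_addr is used entirely for addresses */
	return (unsigned int)(t->i_extra_isize / sizeof(uint32_t));
}

/* Number of i_addr slots still available for block addresses or inline data. */
unsigned int usable_addr_slots(const struct inode_tail *t, int has_extra_attr)
{
	return DEF_ADDRS_PER_INODE - extra_slots(t, has_extra_attr);
}
```

With i_extra_isize set to 24 bytes, six slots are reserved, so block addresses start at slot 6; without F2FS_EXTRA_ATTR every slot remains an address slot.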
  4. 19 Apr, 2017 1 commit
    • f2fs: avoid dirty node pages in check_only recovery · d40d30c5
      Committed by Jaegeuk Kim
      In the check_only mode, we should not make any dirty node pages. Otherwise,
      we can get this panic:
      
      F2FS-fs (nvme0n1p1): Need to recover fsync data
      ------------[ cut here ]------------
      kernel BUG at fs/f2fs/node.c:2204!
      CPU: 7 PID: 19923 Comm: mount Tainted: G           OE   4.9.8 #2
      RIP: 0010:[<ffffffffc0979c0b>]  [<ffffffffc0979c0b>] flush_nat_entries+0x43b/0x7d0 [f2fs]
      Call Trace:
       [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs]
       [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs]
       [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs]
       [<ffffffff860e450f>] ? up_write+0x1f/0x40
       [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs]
       [<ffffffffc0969f04>] write_checkpoint+0x2f4/0xf20 [f2fs]
       [<ffffffff860e938d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
       [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
       [<ffffffffc0960bd5>] f2fs_sync_fs+0x85/0x190 [f2fs]
       [<ffffffffc097b6de>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs]
       [<ffffffffc0977b64>] f2fs_write_node_pages+0x34/0x350 [f2fs]
       [<ffffffff860e5f42>] ? __lock_is_held+0x52/0x70
       [<ffffffff861d9b31>] do_writepages+0x21/0x30
       [<ffffffff86298ce1>] __writeback_single_inode+0x61/0x760
       [<ffffffff86909127>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffff8629a735>] writeback_single_inode+0xd5/0x190
       [<ffffffff8629a889>] write_inode_now+0x99/0xc0
       [<ffffffff86283876>] iput+0x1f6/0x2c0
       [<ffffffffc0964b52>] f2fs_fill_super+0xc32/0x10c0 [f2fs]
       [<ffffffff86266462>] mount_bdev+0x182/0x1b0
       [<ffffffffc0963f20>] ? f2fs_commit_super+0x100/0x100 [f2fs]
       [<ffffffffc0960da5>] f2fs_mount+0x15/0x20 [f2fs]
       [<ffffffff86266e08>] mount_fs+0x38/0x170
       [<ffffffff86288bab>] vfs_kern_mount+0x6b/0x160
       [<ffffffff8628bcfe>] do_mount+0x1be/0xd60
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  5. 24 Feb, 2017 1 commit
    • f2fs: change recovery policy of xattr node block · d260081c
      Committed by Chao Yu
      Currently, if we call fsync after updating the xattr data belonging to a
      file, f2fs needs to trigger a checkpoint to keep the xattr data consistent.
      But this policy causes low performance, as a checkpoint blocks most
      foreground operations and causes unneeded and unrelated IOs around it.
      
      This patch reuses the regular file recovery policy for the xattr node
      block: an xattr node block tagged with the fsync flag is now written to the
      warm area instead of the cold area, and during recovery we search the warm
      node chain for fsynced xattr blocks and recover them.
      
      As a result, performance improves noticeably for the following
      application IO pattern:
      - touch file
      - create/update/delete xattr entry in file
      - fsync file
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
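The application IO pattern named in the commit above (touch, update an xattr, fsync) could look like this from userspace. This is only a sketch of the caller's side; the attribute name "user.demo", its value, and the function name touch_setxattr_fsync are hypothetical, and filesystems without user xattr support will return an error.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * touch file -> create/update an xattr entry -> fsync file.
 * Returns 0 on success or a negative errno value on failure.
 */
int touch_setxattr_fsync(const char *path)
{
	static const char value[] = "demo"; /* hypothetical attribute value */
	int ret = 0;
	int fd = open(path, O_CREAT | O_WRONLY, 0600); /* "touch file" */

	if (fd < 0)
		return -errno;

	/* "user.demo" is a hypothetical attribute name used for illustration. */
	if (fsetxattr(fd, "user.demo", value, sizeof(value) - 1, 0) != 0 ||
	    fsync(fd) != 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

On f2fs with this patch, the fsync in such a sequence no longer needs to trigger a checkpoint, according to the commit message.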
  6. 23 Feb, 2017 1 commit
  7. 29 Jan, 2017 1 commit
  8. 30 Nov, 2016 1 commit
    • f2fs: do not activate auto_recovery for fallocated i_size · 26787236
      Committed by Jaegeuk Kim
      If a file needs to keep its i_size by fallocate, we need to turn off auto
      recovery during roll-forward recovery.
      
      This resolves the scenario below.
      
      1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync"
      2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync"
      3. md5sum /mnt/f2fs/file;
      4. godown /mnt/f2fs/
      5. umount /mnt/f2fs/
      6. mount -t f2fs /dev/sdx /mnt/f2fs
      7. md5sum /mnt/f2fs/file
      Reported-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
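Steps 1 and 2 of the scenario above (a 4 KiB write followed by a size-preserving preallocation) can be approximated in C. This is a sketch under assumptions: it relies on the Linux fallocate(2) call with FALLOC_FL_KEEP_SIZE, the function name keep_size_falloc_demo is hypothetical, and it does not reproduce the godown/remount part of the test.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for fallocate() and FALLOC_FL_KEEP_SIZE */
#endif
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Write 4 KiB and fsync, preallocate the next 4 KiB with FALLOC_FL_KEEP_SIZE
 * and fsync again, then report i_size as seen by fstat().
 * Returns the file size in bytes, or a negative errno value on failure.
 * The file is removed before returning.
 */
long keep_size_falloc_demo(const char *path)
{
	char buf[4096];
	struct stat st;
	long ret;
	int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

	if (fd < 0)
		return -errno;

	memset(buf, 0xab, sizeof(buf));
	errno = 0;
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf) ||
	    fsync(fd) != 0 ||
	    fallocate(fd, FALLOC_FL_KEEP_SIZE, 4096, 4096) != 0 ||
	    fsync(fd) != 0 ||
	    fstat(fd, &st) != 0)
		ret = errno ? -errno : -EIO; /* short write leaves errno at 0 */
	else
		ret = (long)st.st_size;

	close(fd);
	unlink(path);
	return ret;
}
```

Where the filesystem supports the call, the reported size stays at 4096 even though blocks beyond it are allocated; that is the i_size that roll-forward recovery has to preserve.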
  9. 26 Nov, 2016 1 commit
  10. 24 Nov, 2016 2 commits
  11. 01 Oct, 2016 3 commits
  12. 14 Sep, 2016 1 commit
  13. 13 Sep, 2016 1 commit
  14. 08 Sep, 2016 2 commits
  15. 21 Jul, 2016 2 commits
  16. 14 Jun, 2016 1 commit
  17. 03 Jun, 2016 3 commits
  18. 21 May, 2016 1 commit
  19. 19 May, 2016 1 commit
  20. 12 May, 2016 1 commit
  21. 08 May, 2016 2 commits
    • f2fs: fix inode cache leak · f61cce5b
      Committed by Chao Yu
      When testing f2fs with inline_dentry option, generic/342 reports:
      VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds.  Have a nice day...
      
      After rmmod of the f2fs module, the kernel shows the following dmesg output:
       =============================================================================
       BUG f2fs_inode_cache (Tainted: G           O   ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       Disabling lock debugging due to kernel taint
       INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
       CPU: 3 PID: 7455 Comm: rmmod Tainted: G    B      O    4.6.0-rc4+ #16
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
        00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
        c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
        616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
       Call Trace:
        [<c13a83a0>] dump_stack+0x5f/0x8f
        [<c11c7276>] slab_err+0x76/0x80
        [<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
        [<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
        [<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
        [<c1198a38>] kmem_cache_destroy+0x158/0x1f0
        [<c176b43d>] ? mutex_unlock+0xd/0x10
        [<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
        [<c10f596c>] SyS_delete_module+0x16c/0x1d0
        [<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
        [<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
        [<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
        [<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
        [<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
        [<c176d888>] sysenter_past_esp+0x45/0x74
       INFO: Object 0xd1e6d9e0 @offset=6624
       kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
       CPU: 3 PID: 7455 Comm: rmmod Tainted: G    B      O    4.6.0-rc4+ #16
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
        00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
        c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
        f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
       Call Trace:
        [<c13a83a0>] dump_stack+0x5f/0x8f
        [<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
        [<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
        [<c10f596c>] SyS_delete_module+0x16c/0x1d0
        [<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
        [<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
        [<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
        [<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
        [<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
        [<c176d888>] sysenter_past_esp+0x45/0x74
      
      The reason is that the recovery flow uses a delayed iput mechanism for a
      directory that has a recovered dentry block: the inode reference is held
      until the last dirty dentry page has been written back.
      
      But when f2fs is mounted with the inline_dentry option, a dirent may be
      recovered only into the dir inode page rather than into a dentry page, so
      there is no chance to release the inode reference in ->writepage when
      writing back the last dentry page.
      
      We could call paired iget/iput explicitly for the inline_dentry case, but
      in the non-inline_dentry case iput calls writeback_single_inode to write
      all data pages synchronously, while ->writepages of f2fs skips writing all
      pages during recovery, which results in losing the dirent.
      
      This patch fixes the issue by retiring the old mechanism and introducing a
      new dir_list that holds all directory inodes with recovered data until
      recovery finishes.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove unneeded readahead in find_fsync_dnodes · ae8d1db3
      Committed by Chao Yu
      In find_fsync_dnodes, get_tmp_page reads the dnode page synchronously.
      The preceding ra_meta_page call did the same work, which is redundant,
      so remove it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  22. 04 May, 2016 1 commit
  23. 27 Apr, 2016 1 commit
  24. 15 Apr, 2016 1 commit
  25. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely that it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files.  I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
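Why the shift-by-difference conversions in the commit above could simply be dropped can be shown with a small userspace sketch. The macro values are assumptions for illustration (4 KiB pages), and the function names are hypothetical; only the relationship PAGE_CACHE_SHIFT == PAGE_SHIFT comes from the commit message.

```c
/* Assumption: 4 KiB pages, chosen only for illustration. */
#ifndef PAGE_SHIFT
#define PAGE_SHIFT 12
#endif
/* Before the cleanup, the page cache macro was always the same value. */
#define PAGE_CACHE_SHIFT PAGE_SHIFT

/* Old style: convert a page cache index into PAGE_SIZE units. */
unsigned long index_old(unsigned long index)
{
	return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); /* shift by 0 */
}

/* New style: the conversion is the identity, so it disappears. */
unsigned long index_new(unsigned long index)
{
	return index;
}

/* Old style: byte offset to page index via the page cache macro. */
unsigned long pos_to_index_old(unsigned long long pos)
{
	return (unsigned long)(pos >> PAGE_CACHE_SHIFT);
}

/* New style: the same computation with the plain page macro. */
unsigned long pos_to_index_new(unsigned long long pos)
{
	return (unsigned long)(pos >> PAGE_SHIFT);
}
```

Because both shifts are the same constant, each old/new pair returns identical results for every input, which is what makes the mechanical coccinelle rewrite safe.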
  26. 23 Feb, 2016 3 commits
    • f2fs: support revoking atomic written pages · 28bc106b
      Committed by Chao Yu
      f2fs supports atomic writes with the following semantics:
      1. open db file
      2. ioctl start atomic write
      3. (write db file) * n
      4. ioctl commit atomic write
      5. close db file
      
      This flow keeps the file from being corrupted by a sudden power cut:
      the transaction's data is held in referenced pages linked on the inode's
      inmem_pages list without being marked dirty, so it is not persisted
      until we commit it in step 4.
      
      But we still have to hold the journal db file in memory by using volatile
      writes, because our 'atomic write support' semantics are incomplete: in
      step 4 we could fail to submit all of the transaction's dirty data, and
      once part of that data has been committed to storage, a checkpoint followed
      by a sudden power cut leaves the db file corrupted forever.
      
      So this patch improves the atomic write flow by adding a revoking flow:
      once an internal error occurs during commit, we get another chance to
      revoke the partially submitted data of the current transaction, which makes
      the commit operation behave more atomically.
      
      If we are unlucky and the revoking operation itself fails, EAGAIN is
      reported to the user, suggesting either recovery from the held journal
      file or a retry of the current transaction.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
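The commit-then-revoke idea from the commit above can be modelled in plain C. This is a toy userspace model, not the f2fs implementation: the "disk" is an int array, the struct and function names are hypothetical, failures are injected through a parameter, and returning -EIO for a revoked commit is an assumption. The case where revoking itself fails (reported as EAGAIN in the commit message) is left out, because the model's revoke step cannot fail.

```c
#include <errno.h>
#include <stddef.h>

#define NPAGES 8

/* One page staged by a transaction (hypothetical model structure). */
struct inmem_page {
	size_t index; /* slot of the "disk" this page targets */
	int new_data; /* data staged by the transaction */
	int old_data; /* saved at commit time, used when revoking */
};

/*
 * Commit n staged pages to disk[]. An error is injected just before page
 * number fail_at is written (pass n or more for "no failure").
 * Returns 0 on success, or -EIO after the partial commit was revoked.
 */
int commit_atomic(int disk[NPAGES], struct inmem_page *pages, size_t n,
		  size_t fail_at)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == fail_at)
			goto revoke;
		pages[i].old_data = disk[pages[i].index]; /* remember old contents */
		disk[pages[i].index] = pages[i].new_data; /* "submit" the page */
	}
	return 0;

revoke:
	/* Undo already submitted pages in reverse order. */
	while (i-- > 0)
		disk[pages[i].index] = pages[i].old_data;
	return -EIO;
}
```

After a failed commit the array holds exactly its pre-transaction contents, so the caller can retry the transaction without a partially applied state in between.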
    • f2fs: remove unneeded pointer conversion · 81ca7350
      Committed by Chao Yu
      There are redundant pointer conversions in the following call stack:
       - at position a, inode is converted to f2fs_file_info.
       - at position b, f2fs_file_info is converted back to inode.
      
       - truncate_blocks(inode,..)
        - fi = F2FS_I(inode)		---a
        - ADDRS_PER_PAGE(node_page, fi)
         - addrs_per_inode(fi)
          - inode = &fi->vfs_inode	---b
          - f2fs_has_inline_xattr(inode)
           - fi = F2FS_I(inode)
           - is_inode_flag_set(fi,..)
      
      To avoid the unneeded conversion, change ADDRS_PER_PAGE and
      addrs_per_inode to accept an inode pointer as their parameter.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
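The round trip at positions a and b in the commit above can be illustrated with a userspace sketch. The structures are simplified stand-ins (the commit message calls the containing structure f2fs_file_info; the sketch names it f2fs_inode_info), the flag bit and the slot counts 923 and 50 are illustrative assumptions, and the two addrs_per_inode variants are hypothetical models of the before and after signatures.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct inode {
	unsigned long i_ino;
};

struct f2fs_inode_info {
	unsigned long flags;
	struct inode vfs_inode; /* embedded VFS inode */
};

#define DEMO_INLINE_XATTR 0x1UL /* hypothetical flag bit */
#define DEMO_ADDRS 923U         /* illustrative slot count */
#define DEMO_XATTR_ADDRS 50U    /* illustrative slots used by inline xattr */

/* Position a: inode -> containing f2fs info (container_of-style arithmetic). */
struct f2fs_inode_info *F2FS_I(struct inode *inode)
{
	return (struct f2fs_inode_info *)
		((char *)inode - offsetof(struct f2fs_inode_info, vfs_inode));
}

int f2fs_has_inline_xattr(struct inode *inode)
{
	return (F2FS_I(inode)->flags & DEMO_INLINE_XATTR) != 0;
}

/* Before: takes fi, converts back to inode (position b), then to fi again. */
unsigned int addrs_per_inode_old(struct f2fs_inode_info *fi)
{
	struct inode *inode = &fi->vfs_inode;

	return f2fs_has_inline_xattr(inode) ? DEMO_ADDRS - DEMO_XATTR_ADDRS
					    : DEMO_ADDRS;
}

/* After: takes the inode directly, so callers need no conversion first. */
unsigned int addrs_per_inode_new(struct inode *inode)
{
	return f2fs_has_inline_xattr(inode) ? DEMO_ADDRS - DEMO_XATTR_ADDRS
					    : DEMO_ADDRS;
}
```

Both variants compute the same value; the new signature just removes one inode-to-info-to-inode hop from the call chain.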
    • f2fs: use wait_for_stable_page to avoid contention · fec1d657
      Committed by Jaegeuk Kim
      In write_begin, if storage supports stable_page, we don't need to wait for
      writeback to update its contents.
      This patch uses wait_for_stable_page instead of
      wait_on_page_writeback.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  27. 31 Dec, 2015 1 commit
  28. 05 Dec, 2015 1 commit