  1. 24 Feb, 2017 · 1 commit
    • f2fs: change recovery policy of xattr node block · d260081c
      Chao Yu committed
      Currently, if we call fsync after updating xattr data belonging to a
      file, f2fs needs to trigger a checkpoint to keep the xattr data
      consistent. However, this policy hurts performance, because a checkpoint
      blocks most foreground operations and causes unneeded, unrelated IOs
      around the checkpoint.

      This patch reuses the regular file recovery policy for the xattr node
      block: an xattr node block tagged with the fsync flag is now written to
      the warm area instead of the cold area, and during recovery we search the
      warm node chain for fsynced xattr blocks and recover them.

      As a result, performance improves noticeably for the following
      application IO pattern:
      - touch file
      - create/update/delete xattr entry in file
      - fsync file
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 23 Feb, 2017 · 2 commits
  3. 24 Nov, 2016 · 2 commits
    • f2fs: split free nid list · b8559dc2
      Chao Yu committed
      During free nid allocation, in order to do preallocation, we tag a free
      nid entry as allocated but still leave it in the free nid list. Other
      allocators that want to grab free nids must then traverse the free nid
      list to look one up, which becomes an overhead when multiple threads
      allocate free nids intensively.

      This patch splits the free nid list into two lists, {free,alloc}_nid_list,
      to keep free nids and preallocated free nids separate, which removes the
      traversal latency. It also splits nid_cnt so that each list has its own
      statistic.

      Additionally, introduce __insert_nid_to_list and __remove_nid_from_list for
      cleanup.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: modify f2fs_bug_on to avoid needless branches]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
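The split described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: the structure layout, the sentinel-based lists, and the helper names (`nm_init`, `insert_nid_to_list`, `remove_nid_from_list`, `alloc_nid_sketch`) are assumptions made for illustration. It only shows why preallocation becomes an O(1) move between two lists, with one counter per list, instead of a scan over entries that are already tagged as allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list identifiers; one list per nid state. */
enum nid_list { FREE_NID_LIST, ALLOC_NID_LIST, MAX_NID_LIST };

struct free_nid {
    unsigned int nid;
    struct free_nid *prev, *next;
};

struct nid_manager {
    struct free_nid head[MAX_NID_LIST];   /* sentinel node per list */
    unsigned int nid_cnt[MAX_NID_LIST];   /* separate statistic per list */
};

static void nm_init(struct nid_manager *nm)
{
    for (int i = 0; i < MAX_NID_LIST; i++) {
        nm->head[i].prev = nm->head[i].next = &nm->head[i];
        nm->nid_cnt[i] = 0;
    }
}

/* Append an entry to the tail of the chosen list and bump its counter. */
static void insert_nid_to_list(struct nid_manager *nm, struct free_nid *e,
                               enum nid_list l)
{
    struct free_nid *h = &nm->head[l];

    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
    nm->nid_cnt[l]++;
}

/* Unlink an entry from the chosen list and drop its counter. */
static void remove_nid_from_list(struct nid_manager *nm, struct free_nid *e,
                                 enum nid_list l)
{
    assert(nm->nid_cnt[l] > 0);
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = NULL;
    nm->nid_cnt[l]--;
}

/*
 * Preallocate a nid: take the first entry of the free list and move it to
 * the alloc list. No traversal past already-allocated entries is needed.
 * Returns 1 and stores the nid on success, 0 if no free nid is cached.
 */
static int alloc_nid_sketch(struct nid_manager *nm, unsigned int *nid)
{
    struct free_nid *e = nm->head[FREE_NID_LIST].next;

    if (e == &nm->head[FREE_NID_LIST])
        return 0;
    remove_nid_from_list(nm, e, FREE_NID_LIST);
    insert_nid_to_list(nm, e, ALLOC_NID_LIST);
    *nid = e->nid;
    return 1;
}
```

A caller would initialize the manager with `nm_init`, add cached nids with `insert_nid_to_list(..., FREE_NID_LIST)`, and call `alloc_nid_sketch` from each allocator. Locking is left out here; the real code has to serialize these list updates.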
    • f2fs: fix sparse warnings · 0c0b471e
      Eric Biggers committed
      f2fs contained a number of endianness conversion bugs.
      
      Also, one function should have been 'static'.
      
      Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/f2fs/'
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  4. 01 Oct, 2016 · 2 commits
    • f2fs: introduce cp_lock to protect updating of ckpt_flags · aaec2b1d
      Chao Yu committed
      This patch introduces a spinlock to protect updates to the ckpt_flags
      field in struct f2fs_checkpoint, avoiding incorrect updates under race
      conditions.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use crc and cp version to determine roll-forward recovery · a468f0ef
      Jaegeuk Kim committed
      Previously, we used only cp_version to detect recoverable dnodes.
      To avoid encountering the same garbage cp_version, we needed to truncate
      the next dnode during checkpoint, resulting in an additional discard or
      data write. If we can distinguish this case by using crc in addition to
      cp_version, we can remove this overhead.

      There is a backward compatibility concern, since this changes the
      node_footer layout. So, this patch introduces a new checkpoint flag,
      CP_CRC_RECOVERY_FLAG, to detect the new layout. The new layout is
      activated only when this flag is set.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  5. 07 Jul, 2016 · 1 commit
  6. 08 Jun, 2016 · 2 commits
  7. 23 Feb, 2016 · 4 commits
    • f2fs: use wait_for_stable_page to avoid contention · fec1d657
      Jaegeuk Kim committed
      In write_begin, if the storage supports stable_page, we don't need to
      wait for writeback to update the page contents.
      This patch uses wait_for_stable_page instead of
      wait_on_page_writeback.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid multiple node page writes due to inline_data · 2049d4fc
      Jaegeuk Kim committed
      The scenario is:
      1. create fully node blocks
      2. flush node blocks
      3. write inline_data for all the node blocks again
      4. flush node blocks redundantly
      
      So, this patch tries to flush inline_data when flushing node blocks.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: export dirty_nats_ratio in sysfs · 2304cb0c
      Chao Yu committed
      This patch exports a new sysfs entry 'dirty_nats_ratio' to control the
      threshold of dirty nat entries. If the current ratio exceeds the
      configured threshold, a checkpoint is triggered in f2fs_balance_fs_bg to
      flush dirty nats.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: flush dirty nat entries when exceeding threshold · 7d768d2c
      Chao Yu committed
      When testing f2fs with xfstests, generic/251 gets stuck for a long time.
      The case uses the sequence below to obtain freshly released space on the
      device, in preparation for the subsequent fstrim test.
      
      1. rm -rf /mnt/dir
      2. mkdir /mnt/dir/
      3. cp -axT `pwd`/ /mnt/dir/
      4. goto 1
      
      During the preparation step, all nat entries are cached in the nat cache.
      Most of them are dirty entries with an invalid blkaddr, which means the
      nodes related to these entries have been truncated and could be reused
      once the dirty entries have been checkpointed.

      However, no checkpoint was triggered, so nid allocators (e.g. mkdir,
      creat) end up on a long journey iterating over all NAT pages, looking
      for free nids in alloc_nid->build_free_nids.

      Here, in f2fs_balance_fs_bg, we give another chance to do a checkpoint
      that flushes nat entries, so they can be reused in the free nid cache,
      when the dirty entry count exceeds 10% of the max count.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
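The threshold test this commit describes amounts to a single comparison. The sketch below is an assumption-laden illustration, not the kernel function: the name `excess_dirty_nats_sketch`, the parameter names, and the strict `>` at the boundary (taken from the word "exceeds" in the message) are all hypothetical. The 10% default comes from the commit message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default threshold from the commit message: 10% of the max count. */
#define DIRTY_NAT_RATIO_PERCENT 10u

/*
 * Return true when the number of dirty nat entries exceeds the given
 * percentage of the maximum nid count, i.e. when a background checkpoint
 * would be worth triggering to flush them.
 */
static bool excess_dirty_nats_sketch(uint64_t dirty_nat_cnt, uint64_t max_nid,
                                     unsigned int ratio_percent)
{
    assert(ratio_percent <= 100);
    return dirty_nat_cnt > max_nid * ratio_percent / 100;
}
```

With `max_nid = 100` and the default ratio, 10 dirty entries do not trigger a checkpoint under this sketch, while 11 do.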
  8. 09 Jan, 2016 · 1 commit
  9. 05 Dec, 2015 · 1 commit
  10. 13 Oct, 2015 · 2 commits
  11. 10 Oct, 2015 · 1 commit
    • f2fs: do not skip dentry block writes · 90b803e6
      Jaegeuk Kim committed
      Previously, we skipped dentry block writes when wbc is SYNC_NONE, there
      is no memory pressure, and the number of dirty pages is pretty small.

      But we didn't skip normal data writes, so the skipping had little impact
      on overall performance.
      Moreover, by skipping some data writes, kworker falls into an infinite
      loop trying to write blocks when many dir inodes have only one dentry
      block.

      So, this patch removes the skipping of data writes.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  12. 29 May, 2015 · 1 commit
  13. 04 Mar, 2015 · 1 commit
  14. 10 Jan, 2015 · 4 commits
    • f2fs: free radix_tree_nodes used by nat_set entries · 7aed0d45
      Jaegeuk Kim committed
      In the normal case, the radix_tree_nodes are freed successfully.
      But, when cp_error was detected, we should destroy them forcefully.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix missing cold bit during recovery · 09eb483e
      Jaegeuk Kim committed
      In do_recover_data, we find and update previous node pages after updating
      its new block addresses.
      After that, we call fill_node_footer without the reset field, which
      erases its cold bit, so this new cold node block is written to the wrong
      log area. This patch fixes this so that the old flag is not lost.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: merge two uchar variable in struct node_info to reduce memory cost · 5c27f4ee
      Chao Yu committed
      This patch moves one member of struct nat_entry, _flag_, to struct
      node_info, so that _version_ in struct node_info and _flag_, both of
      unsigned char type, are packed into one 32-bit space in register/memory.
      The size of nat_entry is thereby reduced from 28 bytes to 24 bytes (on
      64-bit machines, from 40 bytes to 32 bytes), which reduces the slab
      memory used by f2fs.
      
      changes from v2:
       o update description of memory usage gain for 64-bit machine suggested by
         Changman Lee.
      changes from v1:
       o introduce inline copy_node_info() to copy valid data from node info suggested
         by Jaegeuk Kim, it can avoid bug.
      Reviewed-by: Changman Lee <cm224.lee@samsung.com>
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
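The size arithmetic in this commit can be reproduced with two stand-in layouts. The structures below are assumptions reconstructed from the message (a two-pointer list head, three 32-bit fields, and the two unsigned char members); they are not copied from the kernel headers, and the exact byte counts depend on the ABI. On common ILP32 and LP64 ABIs they give the 28 to 24 and 40 to 32 byte figures quoted above, because `flag` moves into padding that `node_info` already carried after `version`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's two-pointer list head. */
struct list_head_sketch { void *next, *prev; };

/* Layout before the patch: flag sits in nat_entry, ahead of node_info. */
struct node_info_before {
    uint32_t nid, ino, blk_addr;
    unsigned char version;          /* followed by 3 bytes of padding */
};
struct nat_entry_before {
    struct list_head_sketch list;
    unsigned char flag;             /* forces padding before ni */
    struct node_info_before ni;
};

/* Layout after the patch: flag shares the padded word with version. */
struct node_info_after {
    uint32_t nid, ino, blk_addr;
    unsigned char version;
    unsigned char flag;             /* followed by 2 bytes of padding */
};
struct nat_entry_after {
    struct list_head_sketch list;
    struct node_info_after ni;
};

/* Bytes saved per nat entry by the merged layout on the current ABI. */
static size_t nat_entry_saving(void)
{
    /* Moving flag must not grow node_info: it reuses existing padding. */
    assert(sizeof(struct node_info_after) == sizeof(struct node_info_before));
    return sizeof(struct nat_entry_before) - sizeof(struct nat_entry_after);
}
```

Since each cached nat entry is a separate slab object, the per-entry saving is multiplied by the number of cached entries.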
    • f2fs: change atomic and volatile write policies · 1e84371f
      Jaegeuk Kim committed
      This patch adds two new ioctls to release in-memory pages grabbed by
      atomic writes.
       o f2fs_ioc_abort_volatile_write
        - If the transaction failed, all the grabbed pages and data should be
          written.
       o f2fs_ioc_release_volatile_write
        - This is to enhance the performance of PERSIST mode in sqlite.

      To avoid huge memory consumption that can cause OOM, this patch changes
      volatile writes to use normal dirty pages, and instead blocks flushing
      them to disk as long as the system is not under memory pressure.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  15. 07 Nov, 2014 · 1 commit
  16. 04 Nov, 2014 · 1 commit
  17. 06 Oct, 2014 · 1 commit
  18. 01 Oct, 2014 · 1 commit
    • f2fs: refactor flush_nat_entries to remove costly reorganizing ops · 309cc2b6
      Jaegeuk Kim committed
      Previously, f2fs tried to reorganize the dirty nat entries into multiple
      sets according to their nid ranges. This can improve the flushing of nat
      pages; however, if there are a lot of cached nat entries, it becomes a
      bottleneck.

      This patch introduces a new set management flow by removing the dirty nat
      list and adding a series of set operations when a nat entry becomes dirty.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  19. 24 Sep, 2014 · 2 commits
    • f2fs: fix conditions to remain recovery information in f2fs_sync_file · 88bd02c9
      Jaegeuk Kim committed
      This patch revisits all of the recovery information used during
      f2fs_sync_file.

      Three pieces of information are used to make a decision.
      
      a) IS_CHECKPOINTED,	/* is it checkpointed before? */
      b) HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
      c) HAS_LAST_FSYNC,	/* has the latest node fsync mark? */
      
      And, the scenarios for our rule are based on:
      
      [Term] F: fsync_mark, D: dentry_mark
      
      1. inode(x) | CP | inode(x) | dnode(F)
      2. inode(x) | CP | inode(F) | dnode(F)
      3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
      4. inode(x) | CP | dnode(F) | inode(F)
      5. CP | inode(x) | dnode(F) | inode(DF)
      6. CP | inode(DF) | dnode(F)
      7. CP | dnode(F) | inode(DF)
      8. CP | dnode(F) | inode(x) | inode(DF)
      
      For example, in #3, the three conditions change as follows.
      
         inode(x) | CP | dnode(F) | inode(x) | inode(F)
      a)    x       o      o          o          o
      b)    x       x      x          x          o
      c)    x       o      o          x          o
      
      If f2fs_sync_file stops   ------^,
       it should write inode(F)    --------------^
      
      So, the need_inode_block_update should return true, since
       c) get_nat_flag(e, HAS_LAST_FSYNC), is false.
      
      For example, #8,
            CP | alloc | dnode(F) | inode(x) | inode(DF)
      a)    o      x        x          x          x
      b)    x               x          x          o
      c)    o               o          x          o
      
      If f2fs_sync_file stops   -------^,
       it should write inode(DF)    --------------^
      
      Note that the roll-forward policy should follow this rule, which means
      that if there are any missing blocks, we don't need to recover that inode.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce a flag to represent each nat entry information · 7ef35e3b
      Jaegeuk Kim committed
      This patch introduces a flag in the nat entry structure to merge various
      information such as checkpointed and fsync_done marks.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
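Merging several marks into one flag field, as this commit describes, is a plain bitmask pattern. The sketch below is a user-space illustration under stated assumptions: the three flag names and `get_nat_flag` appear in the neighbouring f2fs_sync_file commit message, while the `nat_entry_sketch` type, the `set_nat_flag` helper, and the bit positions are hypothetical and chosen only to make the example complete.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names as listed in the f2fs_sync_file commit message. */
enum {
    IS_CHECKPOINTED,    /* is it checkpointed before? */
    HAS_FSYNCED_INODE,  /* is the inode fsynced before? */
    HAS_LAST_FSYNC      /* has the latest node fsync mark? */
};

/* Hypothetical stand-in for a nat entry; only the flag byte matters here. */
struct nat_entry_sketch {
    unsigned char flag;
};

/* Set or clear one mark without disturbing the others. */
static void set_nat_flag(struct nat_entry_sketch *e, unsigned int type,
                         bool set)
{
    assert(type < 8);
    unsigned char mask = (unsigned char)(1u << type);

    if (set)
        e->flag |= mask;
    else
        e->flag &= (unsigned char)~mask;
}

/* Test one mark. */
static bool get_nat_flag(const struct nat_entry_sketch *e, unsigned int type)
{
    assert(type < 8);
    return (e->flag & (1u << type)) != 0;
}
```

For instance, the point in scenario #3 where f2fs_sync_file stops corresponds to IS_CHECKPOINTED set and the other two marks clear, so `get_nat_flag(e, HAS_LAST_FSYNC)` is false there.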
  20. 16 Sep, 2014 · 1 commit
  21. 04 Sep, 2014 · 1 commit
  22. 10 Jul, 2014 · 1 commit
    • f2fs: refactor flush_nat_entries codes for reducing NAT writes · aec71382
      Chao Yu committed
      Although building the NAT journal in cursum reduces the read/write work
      for NAT blocks, the previous design gives lower performance when
      checkpoints are written frequently, in these cases:
      1. If the journal in cursum is already full, it is a bit of a waste that
         we flush all nat entries to pages for persistence without caching any
         entries.
      2. If the journal in cursum is not full, we fill nat entries into the
         journal until it is full, then flush the remaining dirty entries to
         disk without merging the journaled entries, so these journaled entries
         may be flushed to disk at the next checkpoint, having lost the chance
         to be flushed this time.

      In this patch, we merge dirty entries located in the same NAT block into
      a nat entry set, and link all sets into a list sorted in ascending order
      by each set's entry count. Later, we flush entries in sparse sets into
      the journal, as many as we can, and then flush the merged entries to
      disk. In this way we not only gain performance but also extend the
      lifetime of the flash device.

      In my testing environment, this patch noticeably reduces NAT block
      writes. In the hard disk test case, the time cost of fsstress is steadily
      reduced by about 5%.
      
      1. virtual machine + hard disk:
      fsstress -p 20 -n 200 -l 5
      		node num	cp count	nodes/cp
      based		4599.6		1803.0		2.551
      patched		2714.6		1829.6		1.483
      
      2. virtual machine + 32g micro SD card:
      fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0
      -f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5
      -f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S
      
      		node num	cp count	nodes/cp
      based		84.5		43.7		1.933
      patched		49.2		40.0		1.23
      
      The latency of the merging operation is acceptable even in an extreme
      case such as merging a great number of dirty nats:
      latency(ns)	dirty nat count
      3089219		24922
      5129423		27422
      4000250		24523
      
      change log from v1:
       o fix wrong logic in add_nat_entry when grab a new nat entry set.
       o switch to create slab cache in create_node_manager_caches.
       o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency.
      
      change log from v2:
       o make comment position more appropriate suggested by Jaegeuk Kim.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
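The flushing order described in this commit (sets sorted ascending by entry count, sparse sets go to the journal while it has room, the rest go to disk) can be sketched as follows. This is an illustration only: `nat_entry_set_sketch`, `plan_nat_flush`, and the notion of a plain "free journal slots" counter are assumptions, and the real code works on kernel lists and the cursum journal rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical summary of one set: dirty entries sharing a NAT block. */
struct nat_entry_set_sketch {
    unsigned int set_no;     /* which NAT block the set belongs to */
    unsigned int entry_cnt;  /* number of dirty entries in the set */
};

/* Order sets by ascending entry count, so sparse sets come first. */
static int cmp_set_by_count(const void *a, const void *b)
{
    const struct nat_entry_set_sketch *x = a, *y = b;

    return (x->entry_cnt > y->entry_cnt) - (x->entry_cnt < y->entry_cnt);
}

/*
 * Sort the sets, then decide for each one whether it fits into the
 * remaining journal space. to_journal[i] is set for sets[i] after sorting;
 * sets that do not fit are meant to be flushed to their NAT block on disk.
 * Returns the number of sets placed in the journal.
 */
static size_t plan_nat_flush(struct nat_entry_set_sketch *sets, size_t n,
                             unsigned int journal_free, bool *to_journal)
{
    size_t journaled = 0;

    assert(n == 0 || (sets != NULL && to_journal != NULL));
    if (n > 1)
        qsort(sets, n, sizeof(*sets), cmp_set_by_count);

    for (size_t i = 0; i < n; i++) {
        if (sets[i].entry_cnt <= journal_free) {
            to_journal[i] = true;
            journal_free -= sets[i].entry_cnt;
            journaled++;
        } else {
            to_journal[i] = false;  /* dense set: one NAT block write */
        }
    }
    return journaled;
}
```

Placing the sparse sets in the journal first means each NAT block that does get written carries as many dirty entries as possible, which is where the reduction in NAT writes reported above would come from.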
  23. 07 May, 2014 · 4 commits
  24. 20 Mar, 2014 · 2 commits