1. 07 May, 2014 1 commit
    • J
      f2fs: avoid to conduct roll-forward due to the remained garbage blocks · 1e87a78d
      Jaegeuk Kim committed
      During roll-forward recovery, f2fs always scans the next chain of direct node
      blocks. However, garbage blocks can remain in that chain when discard is not
      supported or when SSR is triggered.
      This occasionally leads to recovering wrong inodes that were previously in use,
      or to BUG_ONs caused by reallocating node ids, as follows.
      
      When this f2fs image is mounted:
      http://linuxtesting.org/downloads/f2fs_fault_image.zip
      a BUG_ON is triggered in the f2fs driver (the messages below were generated on
      kernel 3.13.2; the output is similar for other kernels):
      
      kernel BUG at fs/f2fs/node.c:215!
       Call Trace:
       [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
       [<ffffffff811446e7>] ? __lock_page+0x67/0x70
       [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
       [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
       [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
       [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
       [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
       [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
       [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
       [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
       [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
       [<ffffffff811b9497>] mount_fs+0x47/0x1c0
       [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
       [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
       [<ffffffff811d6763>] do_mount+0x493/0x910
       [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
       [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
       [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
      
      Found by Linux File System Verification project (linuxtesting.org).
      Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  2. 07 Apr, 2014 1 commit
    • J
      f2fs: introduce f2fs_issue_flush to avoid redundant flush issue · 6b4afdd7
      Jaegeuk Kim committed
      Some storage devices show relatively high latencies when completing cache_flush
      commands, even though their normal IO speed is fairly high. In such a case,
      cache_flush commands should be merged as much as possible to avoid issuing them
      redundantly.
      So, this patch introduces a mount option, "-o flush_merge", to mitigate this
      overhead.
      
      If this option is enabled by the user, F2FS merges the cache_flush commands and
      then issues just one cache_flush on behalf of them. Once the single command is
      finished, F2FS sends a completion signal to all the pending threads.
      
      Note that this option is intended for workloads with very intensive concurrent
      fsync calls on storage that handles cache_flush commands slowly.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
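The merging behaviour described for flush_merge can be sketched in a few lines of userspace C. This is only a single-threaded illustration, not the f2fs code: `struct flush_queue`, `queue_flush()`, `issue_merged_flush()` and `MAX_PENDING` are hypothetical names, and real waiters would be threads blocked on a completion rather than entries in an array.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not the f2fs implementation. */
#define MAX_PENDING 16

struct flush_queue {
	int pending[MAX_PENDING];    /* ids of callers waiting for a flush */
	size_t nr_pending;
	unsigned int flushes_issued; /* cache_flush commands sent to the device */
	unsigned int completed;      /* waiters that have been signalled */
};

/* A caller (for example an fsync path) registers and waits for a flush. */
static int queue_flush(struct flush_queue *q, int caller_id)
{
	if (q->nr_pending == MAX_PENDING)
		return -1;
	q->pending[q->nr_pending++] = caller_id;
	return 0;
}

/* Issue one flush on behalf of every pending caller, then complete them all. */
static void issue_merged_flush(struct flush_queue *q)
{
	if (q->nr_pending == 0)
		return;
	q->flushes_issued++;           /* a single device command for the batch */
	q->completed += q->nr_pending; /* signal every waiter */
	q->nr_pending = 0;
}
```

With five callers queued before one call to `issue_merged_flush()`, the sketch counts one flush and five completions, which is the saving the option aims for on devices with slow cache_flush handling.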
  3. 01 Apr, 2014 1 commit
  4. 20 Mar, 2014 3 commits
    • J
      f2fs: skip unnecessary node writes during fsync · 479f40c4
      Jaegeuk Kim committed
      If multiple redundant fsync calls are triggered, we don't need to keep writing
      the inode's node pages with the fsync mark.
      
      So, this patch adds FI_NEED_FSYNC to track whether the latest node block was
      written with the fsync mark or not.
      If the mark was set, a new fsync doesn't need to write a node block.
      Otherwise, we should write a new node block with the mark for roll-forward
      recovery.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
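The idea of skipping node writes for back-to-back fsync calls can be illustrated with a small state sketch. The names below (`struct inode_state`, `modify_inode()`, `do_fsync()`) are hypothetical, and the rule that any modification clears the mark is an assumption made for the illustration; the commit text only says that a flag tracks whether the latest node block carries the fsync mark.

```c
#include <stdbool.h>

struct inode_state {
	bool fsync_mark_written;  /* latest node block already carries the mark */
	unsigned int node_writes; /* node blocks written so far */
};

/* Assumption: any change to the inode invalidates the earlier mark. */
static void modify_inode(struct inode_state *st)
{
	st->fsync_mark_written = false;
}

/* Returns true when a node block had to be written for this fsync. */
static bool do_fsync(struct inode_state *st)
{
	if (st->fsync_mark_written)
		return false; /* redundant fsync: skip the node write */
	st->node_writes++;
	st->fsync_mark_written = true;
	return true;
}
```

In this sketch, three consecutive `do_fsync()` calls write one node block, and a further write happens only after `modify_inode()` is called.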
    • J
      f2fs: introduce fi->i_sem to protect fi's info · d928bfbf
      Jaegeuk Kim committed
      This patch introduces fi->i_sem to protect fi's info, which includes xattr_ver,
      pino, and i_nlink.
      This makes it possible to drop i_mutex during f2fs_sync_file, which improves
      performance when fsync calls are triggered from many concurrent threads.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • J
      f2fs: throttle the memory footprint with a sysfs entry · cdfc41c1
      Jaegeuk Kim committed
      This patch introduces ram_thresh, a sysfs entry, which controls the memory
      footprint used by the free nid list and the nat cache.
      
      Previously, the free nid list was controlled by MAX_FREE_NIDS, while the nat
      cache was managed by NM_WOUT_THRESHOLD.
      However, these fixed limits cannot adapt dynamically to the memory available on
      the system.
      
      So, this patch adds ram_thresh, which lets users specify the threshold in units
      of 1 / 1024 of the total RAM.
      For example, if the total RAM size is 4GB and the value is left at its default
      of 10, f2fs tries to keep the free nids and nat caches from consuming more than
      10 * (4GB / 1024) = 40MB.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
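The ram_thresh formula is easy to check by working it through: 4GB / 1024 is 4MB, so a value of 10 gives a budget of 40MB. A minimal sketch of that arithmetic follows; `ram_budget_bytes()` is a hypothetical helper written for this example and is not a kernel function.

```c
#include <stdint.h>

/* Memory budget in bytes: ram_thresh * (total RAM / 1024). */
static uint64_t ram_budget_bytes(uint64_t total_ram_bytes,
				 unsigned int ram_thresh)
{
	return (uint64_t)ram_thresh * (total_ram_bytes / 1024);
}
```

Calling `ram_budget_bytes(4ULL << 30, 10)` returns `40ULL << 20`, that is 40MB for a 4GB machine.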
  5. 18 Mar, 2014 2 commits
  6. 12 Mar, 2014 1 commit
  7. 10 Mar, 2014 1 commit
  8. 27 Feb, 2014 4 commits
    • C
      f2fs: use existing macro to clean up some codes · 695fd1ed
      Chao Yu committed
      This patch uses the existing macros F2FS_INODE and NEXT_FREE_BLKADDR to clean up
      some code.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • C
      f2fs: readahead contiguous SSA blocks for f2fs_gc · 81c1a0f1
      Chao Yu committed
      If a section contains multiple segments, f2fs_gc reads their SSA blocks, which
      have contiguous addresses, one by one. This can hurt performance, so let's read
      ahead the SSA blocks by merging the read requests.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • J
      f2fs: add an sysfs entry to control the directory level · ab9fa662
      Jaegeuk Kim committed
      This patch adds a sysfs entry to control dir_level, which is used by large
      directories.
      
      The description of this entry is:
      
       dir_level                    This parameter controls the directory level to
      			      support large directory. If a directory has a
      			      number of files, it can reduce the file lookup
      			      latency by increasing this dir_level value.
      			      Otherwise, it needs to decrease this value to
      			      reduce the space overhead. The default value is 0.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • J
      f2fs: introduce large directory support · 38431545
      Jaegeuk Kim committed
      This patch introduces an i_dir_level field to support large directories.
      
      Previously, f2fs maintained multi-level hash tables to find a dentry quickly
      among the many child dentries in a directory, and the hash tables form the
      tree structure shown below.
      
      In Documentation/filesystems/f2fs.txt,
      
      ----------------------
      A : bucket
      B : block
      N : MAX_DIR_HASH_DEPTH
      ----------------------
      
      level #0   | A(2B)
                 |
      level #1   | A(2B) - A(2B)
                 |
      level #2   | A(2B) - A(2B) - A(2B) - A(2B)
           .     |   .       .       .       .
      level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
           .     |   .       .       .       .
      level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
      
      But if we can guess that a directory will hold a large number of child files,
      we don't need to traverse the tree from level #0 to #N every time.
      Since the lower-level tables contain a relatively small number of dentries,
      the miss ratio for the target dentry is likely to be high.
      
      In order to avoid that, we can configure the hash tables sparsely from level #0
      like this.
      
      level #0   | A(2B) - A(2B) - A(2B) - A(2B)
      
      level #1   | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
           .     |   .       .       .       .
      level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
           .     |   .       .       .       .
      level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
      
      With this structure, we can skip the ineffective tree searches in the lower-level
      hash tables.
      
      This patch only adds the facility for this by introducing i_dir_level in
      f2fs_inode.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
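The two diagrams can be read as a rule for how many buckets each level holds: the count doubles per level, and a non-zero dir_level shifts the starting point so that level #0 already has several buckets. The sketch below encodes that reading. The value 63 for MAX_DIR_HASH_DEPTH and the cap at half the depth are assumptions for illustration; the text above only names the constant N and draws the upper levels with a fixed width.

```c
/* Assumed value; the diagrams above only call this constant N. */
#define MAX_DIR_HASH_DEPTH 63
/* Assumed cap on buckets per level once the doubling stops. */
#define MAX_DIR_BUCKETS (1U << ((MAX_DIR_HASH_DEPTH / 2) - 1))

/* Number of buckets at a given hash level for a directory's dir_level. */
static unsigned int dir_buckets(unsigned int level, unsigned int dir_level)
{
	if (level + dir_level < MAX_DIR_HASH_DEPTH / 2)
		return 1U << (level + dir_level);
	return MAX_DIR_BUCKETS;
}
```

With dir_level 0 the first three levels hold 1, 2 and 4 buckets, matching the first diagram. With dir_level 2, level #0 already holds 4 buckets, matching the second one, so a lookup in a large directory starts in a wider table.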
  9. 24 Feb, 2014 3 commits
  10. 17 Feb, 2014 7 commits
  11. 26 Jan, 2014 1 commit
  12. 22 Jan, 2014 2 commits
  13. 20 Jan, 2014 1 commit
  14. 14 Jan, 2014 2 commits
  15. 08 Jan, 2014 2 commits
    • J
      f2fs: add a sysfs entry to control max_victim_search · b1c57c1c
      Jaegeuk Kim committed
      Previously, during SSR and GC, the maximum number of retries to find a victim
      segment was hard-coded as MAX_VICTIM_SEARCH, 4096 by default.
      
      This number affects IO locality when SSR mode is active, which results in
      performance fluctuation on some low-end devices.
      
      If max_victim_search = 4, victims are searched as shown below.
      ("D" represents a dirty segment, and "*" indicates a selected victim segment.)
      
       D1 D2 D3 D4 D5 D6 D7 D8 D9
      [   *       ]
            [   *    ]
                  [         * ]
                              [ ....]
      
      This patch adds a sysfs entry to control the number dynamically through:
        /sys/fs/f2fs/$dev/max_victim_search
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
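The windowed search in the diagram can be approximated by a short userspace sketch. It assumes that each dirty segment has a cost (here the number of valid blocks), that the cheapest segment inside the window wins, and that the next window starts right after the chosen victim, which is how the diagram appears to be drawn. `pick_victim()` and its parameters are hypothetical names, and the sketch does not remove chosen victims from the list.

```c
#include <stddef.h>

/*
 * Examine at most max_search dirty segments starting at *start and return
 * the index of the one with the fewest valid blocks. *start is moved to the
 * slot after the chosen victim. Returns -1 when there is nothing to search.
 */
static int pick_victim(const unsigned int *valid_blocks, size_t nr_dirty,
		       size_t *start, size_t max_search)
{
	size_t end, i, best;

	if (nr_dirty == 0 || max_search == 0)
		return -1;
	if (*start >= nr_dirty)
		*start = 0; /* wrap around to the first dirty segment */
	end = *start + max_search;
	if (end > nr_dirty)
		end = nr_dirty;
	best = *start;
	for (i = *start + 1; i < end; i++)
		if (valid_blocks[i] < valid_blocks[best])
			best = i;
	*start = best + 1;
	return (int)best;
}
```

A smaller max_search keeps each scan short and close to the previous victim, at the price of possibly missing a cheaper segment further away.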
    • J
      f2fs: improve write performance under frequent fsync calls · fb5566da
      Jaegeuk Kim committed
      When a bunch of data writes is mixed with very frequent fsync calls, we can
      expect the following performance regression.
      
      N: Node IO, D: Data IO, IO scheduler: cfq
      
      Issue    pending IOs
      	 D1 D2 D3 D4
       D1         D2 D3 D4 N1
       D2            D3 D4 N1 N2
       N1            D3 D4 N2 D1
       --> N1 can be selected by cfq because N and D have the same priority.
           Then D3 and D4 would be delayed, resulting in performance degradation.
      
      So, when processing the fsync call, it is better to give higher priority to data
      IOs than to node IOs by assigning WRITE and WRITE_SYNC respectively.
      This patch improves random write performance with frequent fsync calls by up
      to 10%.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  16. 06 Jan, 2014 3 commits
    • J
      f2fs: add inline_data recovery routine · 1e1bb4ba
      Jaegeuk Kim committed
      This patch adds an inline_data recovery routine with the following policy.
      
      [prev.] [next] of inline_data flag
         o       o  -> recover inline_data
         o       x  -> remove inline_data, and then recover data blocks
         x       o  -> remove inline_data, and then recover inline_data
         x       x  -> recover data blocks
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
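The four-row policy table maps directly onto a small decision function. The sketch below assumes that "o" means the inline_data flag is set and "x" means it is not, with "prev." being the existing inode and "next" the one found during recovery. The enum and function names are hypothetical and chosen only to mirror the table rows.

```c
#include <stdbool.h>

enum recover_action {
	RECOVER_INLINE_DATA,            /* o o */
	REMOVE_INLINE_THEN_DATA_BLOCKS, /* o x */
	REMOVE_THEN_RECOVER_INLINE,     /* x o */
	RECOVER_DATA_BLOCKS,            /* x x */
};

/* Pick the table row from the two inline_data flags. */
static enum recover_action inline_recovery_action(bool prev_inline,
						  bool next_inline)
{
	if (prev_inline)
		return next_inline ? RECOVER_INLINE_DATA
				   : REMOVE_INLINE_THEN_DATA_BLOCKS;
	return next_inline ? REMOVE_THEN_RECOVER_INLINE
			   : RECOVER_DATA_BLOCKS;
}
```

Keeping the policy in one function like this makes each of the four cases easy to test in isolation.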
    • J
      f2fs: add the number of inline_data files to status info · 0dbdc2ae
      Jaegeuk Kim committed
      This patch adds the number of inline_data files into the status information.
      Note that the number is reset whenever the filesystem is newly mounted.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • J
      f2fs: refactor f2fs_convert_inline_data · 9e09fc85
      Jaegeuk Kim committed
      Change log from v1:
       o handle the NULL pointer from grab_cache_page_write_begin(), as pointed out by
         Chao Yu.
      
      This patch refactors f2fs_convert_inline_data to check a couple of conditions
      internally for deciding whether it needs to convert inline_data or not.
      
      So, the new f2fs_convert_inline_data initially checks:
      1) f2fs_has_inline_data(), and
      2) the data size to be changed.
      
      If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA,
      then we don't need to convert the inline_data with data allocation.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
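The two checks listed for the refactored f2fs_convert_inline_data amount to a small predicate. The sketch below is a userspace illustration with hypothetical names. The inline capacity is passed as a parameter rather than hard-coded, and a size exactly equal to the capacity is treated as needing conversion, because the text only covers the "less than" case.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether inline data must be converted to regular data blocks.
 * max_inline_data stands in for MAX_INLINE_DATA; its value is supplied by
 * the caller in this sketch.
 */
static bool needs_inline_conversion(bool has_inline_data, size_t to_size,
				    size_t max_inline_data)
{
	if (!has_inline_data)
		return false; /* 1) nothing inline, nothing to convert */
	if (to_size < max_inline_data)
		return false; /* 2) the new size still fits inline */
	return true;
}
```

Folding both checks into the helper means callers can invoke it unconditionally and let it decide whether any allocation is needed.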
  17. 26 Dec, 2013 3 commits
  18. 23 Dec, 2013 2 commits