1. 01 Oct 2014, 1 commit
  2. 24 Sep 2014, 1 commit
    • f2fs: fix to search whole dirty segmap when get_victim · 210f41bc
      Committed by Chao Yu
      In ->get_victim, we read max_search from dirty_i->nr_dirty without holding
      seglist_lock, so nr_dirty can increase or decrease before we acquire
      seglist_lock.
      The main loop then tries to traverse every dirty section once to find a
      victim section, but using max_search as the total loop count is inaccurate:
      if nr_dirty changed in the meantime, we may skip some sections or check
      others more than once.
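A minimal sketch of the idea is shown here, assuming a plain bool array in place of the dirty segmap and a per-segment cost array. The helper names (next_dirty, pick_victim) are hypothetical. The scan runs from a start position to the end, then wraps to cover the segments before the start. The loop bound therefore comes from the bitmap itself and not from a possibly stale nr_dirty snapshot, so each dirty segment is visited exactly once.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: dirty[i] marks segment i as dirty, cost[i] is its GC cost. */
static size_t next_dirty(const bool *dirty, size_t from, size_t end)
{
	while (from < end && !dirty[from])
		from++;
	return from; /* equals end when no dirty segment is left in [from, end) */
}

/*
 * Visit every dirty segment exactly once: scan [start, nsegs), then wrap
 * and scan [0, start). Returns the cheapest dirty segment, or nsegs if
 * no segment is dirty.
 */
static size_t pick_victim(const bool *dirty, const unsigned *cost,
			  size_t nsegs, size_t start)
{
	size_t best = nsegs, end = nsegs, pos;
	unsigned best_cost = UINT_MAX;
	bool wrapped = false;

	if (start >= nsegs)
		start = 0;
	pos = start;
	for (;;) {
		size_t seg = next_dirty(dirty, pos, end);

		if (seg >= end) {
			if (wrapped || start == 0)
				break;
			wrapped = true; /* second pass covers [0, start) */
			end = start;
			pos = 0;
			continue;
		}
		if (cost[seg] < best_cost) {
			best_cost = cost[seg];
			best = seg;
		}
		pos = seg + 1;
	}
	return best;
}
```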
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 16 Sep 2014, 1 commit
  4. 10 Sep 2014, 1 commit
    • f2fs: avoid node page to be written twice in gc_node_segment · 9a01b56b
      Committed by Huang Ying
      In gc_node_segment, if node page GC runs concurrently with node page
      writeback, and check_valid_map and get_node_page run after the page is
      locked but before cur_valid_map is updated (as shown below), the page
      may be written twice unnecessarily.
      
      			sync_node_pages
      			  try_lock_page
      			  ...
      check_valid_map		  f2fs_write_node_page
      			    ...
      			    write_node_page
      			      do_write_page
      			        allocate_data_block
      				  ...
      				  refresh_sit_entry /* update cur_valid_map */
      				  ...
      			    ...
      			    unlock_page
      get_node_page
      ...
      set_page_dirty
      ...
      f2fs_put_page
        unlock_page
      
      This can be solved by calling check_valid_map again after get_node_page.
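The check, get, re-check pattern can be modeled as follows. In this sketch a bool stands in for the block's bit in cur_valid_map, and a callback simulates the concurrent writeback that clears it inside the race window. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical model of one node block seen by gc_node_segment().
 * 'valid' stands in for the block's bit in cur_valid_map; a concurrent
 * writeback that relocates the page clears it via refresh_sit_entry().
 */
struct node_blk {
	bool valid;
	bool dirtied_by_gc;
	void (*race)(struct node_blk *); /* runs between the two checks */
};

/* Simulates sync_node_pages() writing the page to a new block address. */
static void concurrent_writeback(struct node_blk *b)
{
	b->valid = false;
}

/* Returns true if GC dirtied the page, i.e. it will be written again. */
static bool gc_one_node(struct node_blk *b)
{
	if (!b->valid)			/* check_valid_map() */
		return false;
	if (b->race)
		b->race(b);		/* window before get_node_page() returns */
	/* get_node_page(): GC now holds the page lock */
	if (!b->valid)			/* check_valid_map() again: the fix */
		return false;
	b->dirtied_by_gc = true;	/* set_page_dirty() */
	return true;
}
```

Without the second check, the racy case would still reach set_page_dirty() and cause the redundant write shown in the trace.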
      Signed-off-by: Huang, Ying <ying.huang@intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  5. 02 Sep 2014, 1 commit
    • f2fs: reposition unlock_new_inode to prevent accessing invalid inode · b73e5282
      Committed by Chao Yu
      Because of a race on the inode cache, the following scenario can occur:
      [Thread a]				[Thread b]
      					->f2fs_mkdir
      					  ->f2fs_add_link
      					    ->__f2fs_add_link
      					      ->init_inode_metadata failed here
      ->gc_thread_func
        ->f2fs_gc
          ->do_garbage_collect
            ->gc_data_segment
              ->f2fs_iget
                ->iget_locked
                  ->wait_on_inode
      					  ->unlock_new_inode
              ->move_data_page
      					  ->make_bad_inode
      					  ->iput
      
      When create/symlink/mkdir/mknod/tmpfile fails, the newly allocated inode
      should be marked bad so that other threads cannot access it. In the
      scenario above, however, f2fs can access the invalid inode before it has
      been marked bad.
      This patch fixes the potential problem, which was found by code review.
      
      Change log from v1:
       o Add a condition check in gc_data_segment(), suggested by Changman Lee.
       o Use iget_failed() to simplify the code.
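A simplified model of the ordering is given here, assuming two booleans in place of the inode's I_NEW state and bad-inode marking. The kernel's iget_failed() marks the inode bad, then unlocks it, then drops the reference. A thread woken from wait_on_inode() therefore already sees a bad inode. The GC-side check models the extra condition in gc_data_segment(). All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified inode state. */
struct inode_model {
	bool is_new; /* I_NEW: iget_locked() callers wait while this is set */
	bool is_bad; /* make_bad_inode() has been called */
};

/*
 * Failure path in the order iget_failed() uses: mark the inode bad first,
 * then clear I_NEW, so any thread woken in wait_on_inode() already sees a
 * bad inode. (iput() would follow.)
 */
static void fail_new_inode(struct inode_model *inode)
{
	inode->is_bad = true;	/* make_bad_inode() */
	inode->is_new = false;	/* unlock_new_inode() wakes waiters */
}

/* GC side: after f2fs_iget() returns, skip inodes that turned out bad. */
static bool gc_may_move_data(const struct inode_model *inode)
{
	return !inode->is_new && !inode->is_bad;
}
```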
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  6. 22 Aug 2014, 1 commit
  7. 20 Aug 2014, 1 commit
  8. 05 Aug 2014, 1 commit
  9. 10 Mar 2014, 1 commit
  10. 27 Feb 2014, 1 commit
  11. 17 Feb 2014, 2 commits
  12. 14 Jan 2014, 1 commit
  13. 08 Jan 2014, 1 commit
    • f2fs: add a sysfs entry to control max_victim_search · b1c57c1c
      Committed by Jaegeuk Kim
      Previously, during SSR and GC, the maximum number of attempts to find a victim
      segment was hard-coded as MAX_VICTIM_SEARCH (4096 by default).
      
      This number affects IO locality when SSR mode is active, which causes
      performance fluctuation on some low-end devices.
      
      If max_victim_search = 4, the victim is searched as shown below.
      ("D" represents a dirty segment, and "*" indicates a selected victim segment.)
      
       D1 D2 D3 D4 D5 D6 D7 D8 D9
      [   *       ]
            [   *    ]
                  [         * ]
      	                [ ....]
      
      This patch adds a sysfs entry to control the number dynamically through:
        /sys/fs/f2fs/$dev/max_victim_search
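The windowed search can be sketched as follows, assuming the dirty segments are kept in a simple array of costs and the search position is saved between calls. Each call examines at most max_search entries, resuming where the previous call stopped, and picks the cheapest. The struct and function names are hypothetical. The sketch uses non-overlapping windows, which simplifies the diagram above.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical search state kept between calls. */
struct victim_search {
	size_t last_pos;	/* where the previous search stopped */
	size_t max_search;	/* models /sys/fs/f2fs/$dev/max_victim_search */
};

/*
 * Examine at most max_search dirty segments, resuming where the previous
 * call stopped and wrapping around, and return the index of the cheapest
 * one (or ndirty if there are none).
 */
static size_t search_window(struct victim_search *s, const unsigned *cost,
			    size_t ndirty)
{
	size_t best = ndirty, n, i;
	unsigned best_cost = UINT_MAX;

	if (ndirty == 0)
		return ndirty;
	n = s->max_search < ndirty ? s->max_search : ndirty;
	for (i = 0; i < n; i++) {
		size_t pos = (s->last_pos + i) % ndirty;

		if (cost[pos] < best_cost) {
			best_cost = cost[pos];
			best = pos;
		}
	}
	s->last_pos = (s->last_pos + n) % ndirty; /* next call resumes here */
	return best;
}
```

A small max_search keeps consecutive victims close together, which is the IO-locality trade-off the sysfs knob exposes.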
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  14. 23 Dec 2013, 6 commits
    • f2fs: remove the rw_flag domain from f2fs_io_info · 7e8f2308
      Committed by Gu Zheng
      When f2fs_io_info is used at the lower level, rw and rw_flag still have
      to be merged, so let rw hold all the IO flags directly and remove the
      rw_flag field.
      
      P.S. This is based on the previous patch:
      f2fs: move all the bio initialization into __bio_alloc
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: refactor bio->rw handling · 458e6197
      Committed by Jaegeuk Kim
      This patch introduces f2fs_io_info to mitigate the complex parameter list.
      
      struct f2fs_io_info {
      	enum page_type type;		/* contains DATA/NODE/META/META_FLUSH */
      	int rw;				/* contains R/RS/W/WS */
      	int rw_flag;			/* contains REQ_META/REQ_PRIO */
      };
      
      1. f2fs_write_data_pages
       - DATA
       - WRITE_SYNC is set when wbc->sync_mode == WB_SYNC_ALL.
      
      2. sync_node_pages
       - NODE
       - WRITE_SYNC all the time
      
      3. sync_meta_pages
       - META
       - WRITE_SYNC all the time
       - REQ_META | REQ_PRIO all the time
      
       ** f2fs_submit_merged_bio() handles META_FLUSH.
      
      4. ra_nat_pages, ra_sit_pages, ra_sum_pages
       - META
       - READ_SYNC
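The list above can be read as a small table from caller to f2fs_io_info. The following sketch encodes it with hypothetical flag bits. The real READ_SYNC/WRITE_SYNC and REQ_META/REQ_PRIO values are kernel-defined and differ across versions. The helper names are also hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for READ/WRITE, *_SYNC and REQ_META/REQ_PRIO. */
enum { F_READ = 0, F_WRITE = 1 << 0, F_SYNC = 1 << 1 };
enum { F_META = 1 << 0, F_PRIO = 1 << 1 };

enum page_type { DATA, NODE, META, META_FLUSH };

struct f2fs_io_info {
	enum page_type type;	/* DATA/NODE/META/META_FLUSH */
	int rw;			/* R/RS/W/WS */
	int rw_flag;		/* REQ_META/REQ_PRIO */
};

/* 1. f2fs_write_data_pages: sync only for WB_SYNC_ALL writeback. */
static struct f2fs_io_info data_write_io(bool wb_sync_all)
{
	struct f2fs_io_info fio = { DATA, wb_sync_all ? F_WRITE | F_SYNC : F_WRITE, 0 };
	return fio;
}

/* 2. sync_node_pages: always WRITE_SYNC. */
static struct f2fs_io_info node_write_io(void)
{
	struct f2fs_io_info fio = { NODE, F_WRITE | F_SYNC, 0 };
	return fio;
}

/* 3. sync_meta_pages: WRITE_SYNC plus REQ_META | REQ_PRIO. */
static struct f2fs_io_info meta_write_io(void)
{
	struct f2fs_io_info fio = { META, F_WRITE | F_SYNC, F_META | F_PRIO };
	return fio;
}

/* 4. ra_nat_pages, ra_sit_pages, ra_sum_pages: READ_SYNC. */
static struct f2fs_io_info meta_readahead_io(void)
{
	struct f2fs_io_info fio = { META, F_READ | F_SYNC, 0 };
	return fio;
}
```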
      
      Cc: Fan Li <fanofcode.li@samsung.com>
      Cc: Changman Lee <cm224.lee@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: merge pages with the same sync_mode flag · 63a0b7cb
      Committed by Fan Li
      Previously, f2fs submitted most write requests with WRITE_SYNC, but f2fs_write_data_pages
      submitted its last write requests with the sync_mode flag passed by the caller.
      
      This causes a performance problem: contiguous pages with different sync flags
      cannot be merged by the CFQ IO scheduler (thanks to Yu Chao for pointing this out),
      and synchronous requests often take more time.
      
      This patch makes the following modifications to DATA writeback:
      
      1. Every page is written back using the sync mode passed by the caller.
      2. Only pages with the same sync mode can be merged into one bio request.
      
      These changes are restricted to DATA pages. Other types of writeback are modified
      to remain synchronous.
      
      In my test with tiotest, f2fs sequential write performance improved by about 7%-10%,
      and this patch has no obvious impact on other performance tests.
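Rule 2 can be sketched as a merge predicate on the bio currently being built. This assumes a page may join the pending bio only when its block address is contiguous and its sync mode matches. Otherwise the old bio is submitted and a new one is started. The struct, its fields and add_data_page() are hypothetical, and the sync mode is modeled as a plain int.

```c
#include <stdbool.h>

/* Hypothetical model of the bio currently being built for DATA writeback. */
struct pending_bio {
	bool active;
	unsigned long last_blk;	/* last block address added */
	int rw;			/* sync mode of the pages in this bio */
	unsigned npages;
};

/*
 * Append a page to the pending bio only if it is contiguous and uses the
 * same sync mode; otherwise "submit" the old bio and start a new one.
 * Returns the number of bios submitted by this call (0 or 1).
 */
static int add_data_page(struct pending_bio *bio, unsigned long blk, int rw)
{
	int submitted = 0;

	if (bio->active && (blk != bio->last_blk + 1 || rw != bio->rw)) {
		bio->active = false;	/* submit_bio() would be called here */
		submitted = 1;
	}
	if (!bio->active) {
		bio->active = true;
		bio->rw = rw;
		bio->npages = 0;
	}
	bio->last_blk = blk;
	bio->npages++;
	return submitted;
}
```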
      Signed-off-by: Fan Li <fanofcode.li@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: add unlikely() macro for compiler more aggressively · 6bacf52f
      Committed by Jaegeuk Kim
      This patch adds the unlikely() macro to most of the code.
      The basic rule is to add it when:
      - checking unusual errors,
      - checking page mappings,
      - checking other unlikely conditions.
      
      Change log from v1:
       - Don't add unlikely for the NULL test and error test: advised by Andi Kleen.
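For reference, GCC and Clang expose branch hints through __builtin_expect. The kernel's likely()/unlikely() macros in <linux/compiler.h> wrap it, usually as shown here. check_blkaddr() is a hypothetical example of the "unusual error" case from the rule above.

```c
/* The usual GCC/Clang form of the kernel's branch-hint macros. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: a corrupted block address is a rare, cold path. */
static int check_blkaddr(unsigned long blkaddr, unsigned long max_blkaddr)
{
	if (unlikely(blkaddr >= max_blkaddr))
		return -1;	/* unusual error */
	return 0;
}
```

The hint changes only code layout and static branch prediction, not the result: unlikely(x) still evaluates to !!(x).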
      
      Cc: Chao Yu <chao2.yu@samsung.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: refactor bio-related operations · 93dfe2ac
      Committed by Jaegeuk Kim
      This patch integrates redundant bio operations on read and write IOs.
      
      1. Move bio-related code to the top of data.c.
      2. Replace f2fs_submit_bio with f2fs_submit_merged_bio, which handles read
         bios additionally.
      3. Introduce __submit_merged_bio to submit the merged bio.
      4. Change f2fs_readpage to f2fs_submit_page_bio.
      5. Introduce f2fs_submit_page_mbio to integrate previous submit_read_page and
         submit_write_page.
      Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: remove unnecessary condition checks · 031fa8cc
      Committed by Jaegeuk Kim
      This patch removes the unnecessary condition checks on:
      
      fs/f2fs/gc.c:667 do_garbage_collect() warn: 'sum_page' isn't an ERR_PTR
      fs/f2fs/f2fs.h:795 f2fs_put_page() warn: 'page' isn't an ERR_PTR
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  15. 25 Oct 2013, 3 commits
  16. 22 Oct 2013, 1 commit
  17. 24 Sep 2013, 1 commit
    • f2fs: optimize the victim searching loop slightly · a57e564d
      Committed by Jin Xu
      Since MAX_VICTIM_SEARCH has been enlarged from 20 to 4096, the victim
      search overhead will be much higher than before, especially for SSR,
      which searches for victims quite often.
      This patch reduces the overhead slightly by:
      - making get_gc_cost an inline routine to avoid function call
        overhead
      - reducing multiplication and division operations
      - removing unnecessary comparison operations
      Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  18. 05 Sep 2013, 1 commit
    • f2fs: optimize gc for better performance · a26b7c8a
      Committed by Jin Xu
      This patch improves GC efficiency by optimizing the victim selection
      policy. With this optimization, random re-write performance can
      improve by up to 20%.
      
      In f2fs, when the disk runs short of free space, GC selects dirty
      segments and moves their valid blocks elsewhere to free up space.
      The GC cost of a segment is determined by the number of valid blocks
      in it: the fewer the valid blocks, the higher the efficiency. The
      ideal victim segment is the one with the most garbage blocks.
      
      Currently, GC searches up to 20 dirty segments for a victim segment.
      When there are many more dirty segments, the selected victim is
      unlikely to be the best one. Why not search more dirty segments for
      a better victim? The cost of searching dirty segments is negligible
      compared with the cost of moving blocks.
      
      This patch enlarges MAX_VICTIM_SEARCH to 4096 so that the search
      looks more aggressively for a better victim. Since the limit also
      applies to victim selection for SSR, it will likely improve SSR
      efficiency as well.
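Here is a minimal sketch of greedy selection under a search limit. It assumes each dirty segment is represented only by its valid-block count, so cost equals the number of valid blocks and lower is better. greedy_victim() is a hypothetical name. The example shows how a limit of 2 can miss a much cheaper segment that a larger limit finds.

```c
#include <stddef.h>

/*
 * Greedy selection: cost = valid blocks, lower is better. Only the first
 * max_search dirty segments are examined (20 before this patch, 4096
 * after). Returns ndirty if no segment is examined.
 */
static size_t greedy_victim(const unsigned *valid, size_t ndirty,
			    size_t max_search)
{
	size_t n = max_search < ndirty ? max_search : ndirty;
	size_t best = ndirty, i;

	for (i = 0; i < n; i++)
		if (best == ndirty || valid[i] < valid[best])
			best = i;
	return best;
}
```

For valid-block counts {400, 380, 390, 500, 12, 450}, a limit of 2 picks the segment with 380 valid blocks, while a limit of 4096 finds the one with only 12.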
      
      The test case is simple. It creates 32KB files until the disk is full,
      then writes 100000 records of 4KB each to random offsets of random
      files in sync mode. The test was done on a 2GB partition of an SDHC
      card. Here are the results of f2fs without and with the patch.
      
      ---------------------------------------
      2GB partition, SDHC
      create 52023 files of size 32768 bytes
      random re-write 100000 records of 4KB
      ---------------------------------------
      |            | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
      | [no patch] | 341               | 4227             | 1174     | 174840            |
      | [patched]  | 324               | 2958             | 645      | 106682            |
      
      With the patch, f2fs finishes the test in 20+% less time than without
      it, and internally it performs far fewer GCs with higher efficiency
      than before.
      
      Since the performance improvement is related to GC, it may be less
      obvious in other tests that do not trigger GC as often as this one.
      (This is because f2fs selects dirty segments for SSR use most of the
      time when free space is short.) The well-known iozone tool was not
      used to benchmark the patch because it does not seem to have a test
      case that performs random re-writes on a full disk.
      
      This patch is the revised version based on the suggestion from
      Jaegeuk Kim.
      Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
      [Jaegeuk Kim: suggested simpler solution]
      Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  19. 26 Aug 2013, 1 commit
    • f2fs: reserve the xattr space dynamically · de93653f
      Committed by Jaegeuk Kim
      This patch allows the number of direct pointers inside the on-disk inode block
      to change dynamically according to the size of the inline xattr space.
      
      The number of direct pointers, ADDRS_PER_INODE, can change only if the file
      has the inline xattr flag.
      
      The number of direct pointers used by inline xattrs is defined as
      F2FS_INLINE_XATTR_ADDRS.
      This patch temporarily sets F2FS_INLINE_XATTR_ADDRS to 0.
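The rule amounts to subtracting the inline-xattr reservation from the default pointer count, but only for inodes that carry the flag. In this sketch, 923 is an assumed value for the default number of direct pointers, and addrs_per_inode() is a hypothetical helper. F2FS_INLINE_XATTR_ADDRS is 0, as in this patch.

```c
#include <stdbool.h>

/* Assumed value: direct pointers in an on-disk f2fs inode. */
#define DEF_ADDRS_PER_INODE 923
/* Pointers given up to inline xattrs; this patch temporarily sets it to 0. */
#define F2FS_INLINE_XATTR_ADDRS 0

/* Hypothetical helper: only inodes with the inline xattr flag lose pointers. */
static unsigned addrs_per_inode(bool has_inline_xattr)
{
	if (has_inline_xattr)
		return DEF_ADDRS_PER_INODE - F2FS_INLINE_XATTR_ADDRS;
	return DEF_ADDRS_PER_INODE;
}
```

With the constant at 0, both cases return the same value. Once it becomes nonzero, inodes with the inline xattr flag get correspondingly fewer direct pointers.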
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  20. 06 Aug 2013, 3 commits
    • f2fs: fix a deadlock in fsync · a569469e
      Committed by Jin Xu
      This patch fixes a deadlock bug that occurs quite often when write and
      fsync run concurrently on the same file.
      
      Following is the simplified call trace when tasks get hung.
      
      fsync thread:
      - f2fs_sync_file
       ...
       - f2fs_write_data_pages
       ...
        - update_extent_cache
        ...
         - update_inode
          - wait_on_page_writeback
      
      bdi writeback thread:
      - __writeback_single_inode
       - f2fs_write_data_pages
        - mutex_lock(sbi->writepages)
      
      The deadlock happens when the fsync thread waits on an inode page that has
      been added to f2fs's cached bio sbi->bio[NODE], and, unfortunately, no one
      else is able to submit the cached bio to the block layer for writeback.
      This is because the fsync thread already holds sbi->fs_lock and the
      sbi->writepages lock, so the bdi thread blocks when it attempts to write
      data pages for the same inode. Meanwhile, the f2fs_gc thread does not
      notice the situation and cannot help. Even the sync syscall gets
      blocked.
      
      To fix it, we submit the cached bio before waiting on an inode page that
      is under writeback.
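The fix can be modeled in a single thread, assuming that submitting the cached bio completes writeback of its pages immediately. The bracketed note below this paragraph names the real helper f2fs_wait_on_page_writeback. The structs, MAX_CACHED and wait_on_page_writeback_sketch() here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CACHED 16

/* Hypothetical model of a page and of f2fs's cached, not-yet-submitted bio. */
struct page_model {
	bool writeback;	/* PageWriteback() */
};

struct cached_bio {
	struct page_model *pages[MAX_CACHED];
	size_t n;
};

/* Submitting the bio lets its pages finish writeback (modelled as instant). */
static void submit_cached_bio(struct cached_bio *bio)
{
	for (size_t i = 0; i < bio->n; i++)
		bio->pages[i]->writeback = false;
	bio->n = 0;
}

/*
 * The fix: before waiting on a page under writeback, submit the cached bio
 * that may be holding it. Returns true if the wait could complete.
 */
static bool wait_on_page_writeback_sketch(struct page_model *page,
					  struct cached_bio *bio)
{
	if (page->writeback)
		submit_cached_bio(bio);
	/* wait_on_page_writeback() would sleep here until writeback ends */
	return !page->writeback;
}
```

Without the submit step, the page stays under writeback forever in this model, which mirrors the hang in the trace above.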
      Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
      [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: add sysfs entries to select the gc policy · d2dc095f
      Committed by Namjae Jeon
      Add a sysfs entry, gc_idle, to control the GC policy: gc_idle = 1
      selects the cost-benefit approach to garbage collection, while
      gc_idle = 2 selects the greedy approach. The two are mutually
      exclusive; only one approach is in effect at any time. If
      gc_idle = 0, this option is disabled.
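The mapping can be sketched as a small policy function. The behavior for gc_idle = 0 is an assumption, since the text only says the option is disabled. Here the sketch falls back to cost-benefit for background GC and greedy for foreground GC. The enum names echo f2fs but are redeclared here for illustration.

```c
enum { BG_GC, FG_GC };		/* background / foreground GC */
enum { GC_CB, GC_GREEDY };	/* cost-benefit / greedy policy */

/*
 * gc_idle mirrors the sysfs value: 1 = cost-benefit, 2 = greedy,
 * 0 = disabled. The fallback used when it is 0 is an assumption.
 */
static int select_gc_type(unsigned gc_idle, int gc_type)
{
	int gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY;

	if (gc_idle == 1)
		gc_mode = GC_CB;
	else if (gc_idle == 2)
		gc_mode = GC_GREEDY;
	return gc_mode;
}
```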
      
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
      Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      [Jaegeuk Kim: change the select_gc_type() flow slightly]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: add sysfs support for controlling the gc_thread · b59d0bae
      Committed by Namjae Jeon
      Add sysfs entries to control the timing parameters of the
      f2fs GC thread.
      
      The sysfs options introduced are:
      gc_min_sleep_time: minimum sleep time for GC, in ms
      gc_max_sleep_time: maximum sleep time for GC, in ms
      gc_no_gc_sleep_time: default sleep time for GC, in ms
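The following sketch shows how the three values might steer the GC thread's wait time: back off toward the maximum, speed up toward the minimum, and leave the no-GC value alone until work appears. The struct, helper names and exact back-off policy are assumptions for illustration. They are not taken from this commit.

```c
/* Hypothetical mirror of the three sysfs knobs (values in ms). */
struct gc_sleep {
	long min_ms;	/* gc_min_sleep_time */
	long max_ms;	/* gc_max_sleep_time */
	long no_gc_ms;	/* gc_no_gc_sleep_time */
};

/* Back off (e.g. the device is busy): grow by min_ms, capped at max_ms. */
static long increase_sleep(const struct gc_sleep *s, long wait)
{
	if (wait == s->no_gc_ms)
		return wait;
	wait += s->min_ms;
	return wait > s->max_ms ? s->max_ms : wait;
}

/* Work to do: shrink by min_ms, never below min_ms. */
static long decrease_sleep(const struct gc_sleep *s, long wait)
{
	if (wait == s->no_gc_ms)
		wait = s->max_ms;
	wait -= s->min_ms;
	return wait < s->min_ms ? s->min_ms : wait;
}
```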
      
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
      Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      [Jaegeuk Kim: fix an umount bug and some minor changes]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  21. 02 Jul 2013, 1 commit
  22. 06 Jun 2013, 1 commit
    • f2fs: reorganise the function get_victim_by_default · b2b3460a
      Committed by Namjae Jeon
      Fix get_victim_by_default(), which checks the condition
      p.min_segno != NULL_SEGNO as shown:
      
      if (p.min_segno != NULL_SEGNO)
                 goto got_it;
      
      and, if that condition is true, jumps to
      
      got_it:
              if (p.min_segno != NULL_SEGNO) {
      
      So the same condition is checked twice. Hence, move the goto
      statement after the if condition so that the duplicate check is
      avoided.
      
      This function also calls get_max_cost() to compute the maximum
      cost from f2fs_sb_info and the victim policy. get_max_cost()
      depends on three fields of victim_sel_policy (alloc_mode, gc_mode
      and ofs_unit). Once the victim policy is initialised, these values
      do not change during the execution of get_victim_by_default(), and
      neither do the f2fs_sb_info fields it uses.
      
      Hence, calling get_max_cost() inside the while loop is wasteful.
      Instead, we can call it once at the beginning and store the result
      in a local variable, which is then used to compare each cost with
      the maximum cost inside the loop.
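The second change is a loop-invariant hoist, sketched here. The maximum cost is computed once before the search, because its inputs cannot change during it. The loop then compares each candidate against that local value. get_max_cost() here is a hypothetical stand-in with a made-up formula, and returning nsegs plays the role of NULL_SEGNO.

```c
#include <stddef.h>

/* Hypothetical stand-in: like get_max_cost(), it depends only on fixed inputs. */
static unsigned get_max_cost(unsigned blocks_per_seg, unsigned ofs_unit)
{
	return blocks_per_seg * ofs_unit;
}

/*
 * Returns the cheapest segment, or nsegs (a NULL_SEGNO stand-in) if no
 * segment is cheaper than the maximum cost.
 */
static size_t pick_min_cost(const unsigned *cost, size_t nsegs,
			    unsigned blocks_per_seg, unsigned ofs_unit)
{
	const unsigned max_cost = get_max_cost(blocks_per_seg, ofs_unit); /* hoisted */
	unsigned min_cost = max_cost;
	size_t min_segno = nsegs;

	for (size_t i = 0; i < nsegs; i++) {
		if (cost[i] < min_cost) {
			min_cost = cost[i];
			min_segno = i;
		}
	}
	return min_segno; /* the caller checks for "not found" exactly once */
}
```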
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  23. 28 May 2013, 2 commits
  24. 30 Apr 2013, 1 commit
  25. 26 Apr 2013, 2 commits
    • f2fs: give a chance to merge IOs by IO scheduler · c718379b
      Committed by Jaegeuk Kim
      Previously, background GC submitted many 4KB read requests to load victim blocks
      and/or their (i)node blocks.
      
      ...
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
      f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
      f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
      f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
      ...
      
      However, since many of these IOs are sequential, we can give the IO scheduler
      a chance to merge them.
      To do that, let's use blk_plug.
      
      ...
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
      <idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
      <idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
      <idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
      <idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
      <idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
      <idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
      <idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
      ...
      
      Note that this issue should also be addressed in the checkpoint and in some
      readahead flows.
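The following userspace model, a hypothetical sketch, shows what plugging makes possible. Requests queued while plugged are held back and back-merged when adjacent, as in the second trace, where many 8-sector reads become larger requests. It assumes the queued requests are already sorted by sector. In the kernel, the GC loop would be bracketed by blk_start_plug(&plug) and blk_finish_plug(&plug) on a struct blk_plug, and the block layer performs the merging.

```c
#include <stddef.h>

/* Hypothetical request: start sector and length in sectors. */
struct req {
	unsigned long sector;
	unsigned long nr;
};

/*
 * Merge back-to-back requests in place, as a plugged queue allows before
 * dispatch. Input is assumed sorted by sector; returns the new count.
 */
static size_t merge_plugged(struct req *r, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 && r[out - 1].sector + r[out - 1].nr == r[i].sector)
			r[out - 1].nr += r[i].nr;	/* back merge */
		else
			r[out++] = r[i];
	}
	return out;
}
```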
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: avoid frequent background GC · 6cb968d9
      Committed by Jaegeuk Kim
      If background GC selects no victim segment, let's wait a little
      longer to collect dirty segments.
      By default, wait 5 minutes.
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  26. 23 Apr 2013, 1 commit
  27. 09 Apr 2013, 2 commits
    • f2fs: write checkpoint before starting FG_GC · d64f8047
      Committed by Jaegeuk Kim
      In order to be aware of prefree and free sections during FG_GC, let's start with
      write_checkpoint().
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: introduce a new global lock scheme · 39936837
      Committed by Jaegeuk Kim
      In the previous version, f2fs used global locks according to usage type,
      such as directory operations, block allocation, block write, and so on.
      
      See the following lock types in f2fs.h:
      enum lock_type {
      	RENAME,		/* for renaming operations */
      	DENTRY_OPS,	/* for directory operations */
      	DATA_WRITE,	/* for data write */
      	DATA_NEW,	/* for data allocation */
      	DATA_TRUNC,	/* for data truncate */
      	NODE_NEW,	/* for node allocation */
      	NODE_TRUNC,	/* for node truncate */
      	NODE_WRITE,	/* for node write */
      	NR_LOCK_TYPE,
      };
      
      In that case, we lose performance in a multi-threaded environment,
      since operations of each type must be performed one at a time.
      
      To address the problem, let's share the locks globally as a mutex
      array, regardless of type.
      Then users can grab any mutex and perform their jobs in parallel as much
      as possible.
      
      For this, I propose a new global lock scheme as follows.
      
      0. Data structure
       - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
       - f2fs_sb_info -> node_write
      
      1. mutex_lock_op(sbi)
       - try to get an available lock from the array.
       - return the index of the acquired lock.
      
      2. mutex_unlock_op(sbi, index of the lock)
       - unlock the lock at the given index.
      
      3. mutex_lock_all(sbi)
       - grab all the locks in the array before the checkpoint.
      
      4. mutex_unlock_all(sbi)
       - release all the locks in the array after the checkpoint.
      
      5. block_operations()
       - call mutex_lock_all()
       - sync_dirty_dir_inodes()
       - grab node_write
       - sync_node_pages()
      
      Note that
       the mutex_lock_op()/mutex_unlock_op() and
       mutex_lock_all()/mutex_unlock_all() pairs should always be used together.
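The scheme can be sketched in userspace with C11 atomics, with test-and-set spinlocks standing in for the kernel mutex array. lock_op() returns the index of any free lock. When every lock is busy, this sketch waits on one chosen round-robin, which is an assumption. lock_all() takes every lock, as block_operations() would before a checkpoint. All names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_GLOBAL_LOCKS 8

/* Hypothetical model: test-and-set spinlocks stand in for the mutex array. */
struct sb_locks {
	atomic_flag lock[NR_GLOBAL_LOCKS];
	atomic_uint next;	/* round-robin choice when every lock is busy */
};

static void sb_locks_init(struct sb_locks *s)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		atomic_flag_clear(&s->lock[i]);
	atomic_init(&s->next, 0);
}

static bool lock_try(atomic_flag *f) { return !atomic_flag_test_and_set(f); }

static void lock_spin(atomic_flag *f)
{
	while (atomic_flag_test_and_set(f))
		; /* busy-wait; a kernel mutex would sleep instead */
}

/* mutex_lock_op(): grab any free lock and return its index. */
static int lock_op(struct sb_locks *s)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		if (lock_try(&s->lock[i]))
			return i;
	int idx = (int)(atomic_fetch_add(&s->next, 1) % NR_GLOBAL_LOCKS);
	lock_spin(&s->lock[idx]);
	return idx;
}

/* mutex_unlock_op(): release the lock returned by lock_op(). */
static void unlock_op(struct sb_locks *s, int idx)
{
	atomic_flag_clear(&s->lock[idx]);
}

/* mutex_lock_all()/mutex_unlock_all(): exclude all operations for a checkpoint. */
static void lock_all(struct sb_locks *s)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		lock_spin(&s->lock[i]);
}

static void unlock_all(struct sb_locks *s)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		atomic_flag_clear(&s->lock[i]);
}
```

Because every lock_op() holder owns one slot, lock_all() can only finish once all ongoing operations have released their locks. This is why the two pairs must be used consistently.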
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>