1. 08 1月, 2014 1 次提交
    • J
      f2fs: improve write performance under frequent fsync calls · fb5566da
      Jaegeuk Kim 提交于
      When considering a bunch of data writes with very frequent fsync calls, we
      are able to think the following performance regression.
      
      N: Node IO, D: Data IO, IO scheduler: cfq
      
      Issue    pending IOs
      	 D1 D2 D3 D4
       D1         D2 D3 D4 N1
       D2            D3 D4 N1 N2
       N1            D3 D4 N2 D1
       --> N1 can be selected by cfq becase of the same priority of N and D.
           Then D3 and D4 would be delayed, resuling in performance degradation.
      
      So, when processing the fsync call, it'd better give higher priority to data IOs
      than node IOs by assigning WRITE and WRITE_SYNC respectively.
      This patch improves the random wirte performance with frequent fsync calls by up
      to 10%.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      fb5566da
  2. 06 1月, 2014 3 次提交
    • J
      f2fs: add inline_data recovery routine · 1e1bb4ba
      Jaegeuk Kim 提交于
      This patch adds a inline_data recovery routine with the following policy.
      
      [prev.] [next] of inline_data flag
         o       o  -> recover inline_data
         o       x  -> remove inline_data, and then recover data blocks
         x       o  -> remove inline_data, and then recover inline_data
         x       x  -> recover data blocks
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      1e1bb4ba
    • J
      f2fs: refactor f2fs_convert_inline_data · 9e09fc85
      Jaegeuk Kim 提交于
      Change log from v1:
       o handle NULL pointer of grab_cache_page_write_begin() pointed by Chao Yu.
      
      This patch refactors f2fs_convert_inline_data to check a couple of conditions
      internally for deciding whether it needs to convert inline_data or not.
      
      So, the new f2fs_convert_inline_data initially checks:
      1) f2fs_has_inline_data(), and
      2) the data size to be changed.
      
      If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA,
      then we don't need to convert the inline_data with data allocation.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      9e09fc85
    • J
      f2fs: convert inline_data for punch_hole · 8230a0a4
      Jaegeuk Kim 提交于
      In the punch_hole(), let's convert inline_data all the time for simplicity and
      to avoid potential deadlock conditions.
      It is pretty much not a big deal to do this.
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      8230a0a4
  3. 27 12月, 2013 1 次提交
  4. 26 12月, 2013 1 次提交
    • H
      f2fs: handle inline data operations · 9ffe0fb5
      Huajun Li 提交于
      Hook inline data read/write, truncate, fallocate, setattr, etc.
      
      Files need meet following 2 requirement to inline:
       1) file size is not greater than MAX_INLINE_DATA;
       2) file doesn't pre-allocate data blocks by fallocate().
      
      FI_INLINE_DATA will not be set while creating a new regular inode because
      most of the files are bigger than ~3.4K. Set FI_INLINE_DATA only when
      data is submitted to block layer, ranther than set it while creating a new
      inode, this also avoids converting data from inline to normal data block
      and vice versa.
      
      While writting inline data to inode block, the first data block should be
      released if the file has a block indexed by i_addr[0].
      
      On the other hand, when a file operation is appied to a file with inline
      data, we need to test if this file can remain inline by doing this
      operation, otherwise it should be convert into normal file by reserving
      a new data block, copying inline data to this new block and clear
      FI_INLINE_DATA flag. Because reserve a new data block here will make use
      of i_addr[0], if we save inline data in i_addr[0..872], then the first
      4 bytes would be overwriten. This problem can be avoided simply by
      not using i_addr[0] for inline data.
      Signed-off-by: NHuajun Li <huajun.li@intel.com>
      Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: NWeihong Xu <weihong.xu@intel.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      9ffe0fb5
  5. 23 12月, 2013 3 次提交
  6. 31 10月, 2013 1 次提交
  7. 29 10月, 2013 1 次提交
  8. 25 10月, 2013 1 次提交
  9. 07 10月, 2013 1 次提交
    • G
      f2fs: use rw_sem instead of fs_lock(locks mutex) · e479556b
      Gu Zheng 提交于
      The fs_locks is used to block other ops(ex, recovery) when doing checkpoint.
      And each other operate routine(besides checkpoint) needs to acquire a fs_lock,
      there is a terrible problem here, if these are too many concurrency threads acquiring
      fs_lock, so that they will block each other and may lead to some performance problem,
      but this is not the phenomenon we want to see.
      Though there are some optimization patches introduced to enhance the usage of fs_lock,
      but the thorough solution is using a *rw_sem* to replace the fs_lock.
      Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block
      other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other,
      this can avoid the problem described above completely.
      Because of the weakness of rw_sem, the above change may introduce a potential problem
      that the checkpoint thread might get starved if other threads are intensively locking
      the read semaphore for I/O.(Pointed out by Xu Jin)
      In order to avoid this, a wait_list is introduced, the appending read semaphore ops
      will be dropped into the wait_list if checkpoint thread is waiting for write semaphore,
      and will be waked up when checkpoint thread gives up write semaphore.
      Thanks to Kim's previous review and test, and will be very glad to see other guys'
      performance tests about this patch.
      
      V2:
        -fix the potential starvation problem.
        -use more suitable func name suggested by Xu Jin.
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      [Jaegeuk Kim: adjust minor coding standard]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      e479556b
  10. 26 8月, 2013 1 次提交
    • J
      f2fs: reserve the xattr space dynamically · de93653f
      Jaegeuk Kim 提交于
      This patch enables the number of direct pointers inside on-disk inode block to
      be changed dynamically according to the size of inline xattr space.
      
      The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
      has inline xattr flag.
      
      The number of direct pointers that will be used by inline xattrs is defined as
      F2FS_INLINE_XATTR_ADDRS.
      Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      de93653f
  11. 09 8月, 2013 2 次提交
  12. 30 7月, 2013 3 次提交
  13. 14 6月, 2013 4 次提交
  14. 11 6月, 2013 1 次提交
    • J
      f2fs: fix i_blocks translation on various types of files · 2d4d9fb5
      Jaegeuk Kim 提交于
      Basically an inode manages the number of allocated blocks with inode->i_blocks
      which is represented in a unit of sectors, not file system blocks.
      But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr
      translates it to the number of sectors when fstat is called.
      
      However, previously f2fs_file_inode_operations only has this, so this patch adds
      it to all the types of inode_operations.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      2d4d9fb5
  15. 28 5月, 2013 4 次提交
    • J
      f2fs: reuse the locked dnode page and its inode · b292dcab
      Jaegeuk Kim 提交于
      This patch fixes the following deadlock bug during the recovery.
      
      INFO: task mount:1322 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mount           D ffffffff81125870     0  1322   1266 0x00000000
       ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
       ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
       ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
      Call Trace:
      [<ffffffff81125870>] ? __lock_page+0x70/0x70
      [<ffffffff816a92d9>] schedule+0x29/0x70
      [<ffffffff816a93af>] io_schedule+0x8f/0xd0
      [<ffffffff8112587e>] sleep_on_page+0xe/0x20
      [<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
      [<ffffffff81125867>] __lock_page+0x67/0x70
      [<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
      [<ffffffff81126857>] find_lock_page+0x67/0x80
      [<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
      [<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
      [<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
      [<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
      [<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
      [<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
      [<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
      [<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
      [<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
      [<ffffffff81185a13>] mount_fs+0x43/0x1b0
      [<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
      [<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
      [<ffffffff811a2cb7>] do_mount+0x237/0xa10
      [<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
      [<ffffffff811a3520>] SyS_mount+0x90/0xe0
      [<ffffffff816b3502>] system_call_fastpath+0x16/0x1b
      
      The bug is triggered when check_index_in_prev_nodes tries to get the direct
      node page by calling get_node_page.
      At this point, if the direct node page is already locked by get_dnode_of_data,
      its caller, we got a deadlock condition.
      
      This patch adds additional condition check for the reuse of locked direct node
      pages prior to the get_node_page call.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      b292dcab
    • J
      f2fs: add f2fs_readonly() · 77888c1e
      Jaegeuk Kim 提交于
      Introduce a simple macro function for readability.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      77888c1e
    • N
      f2fs: reorganize f2fs_vm_page_mkwrite · 9851e6e1
      Namjae Jeon 提交于
      Few things can be changed in the default mkwrite function
      1) Make file_update_time at the start before acquiring any lock
      2) the condition page_offset(page) >= i_size_read(inode) should be
       changed to page_offset(page) > i_size_read
      3) Move wait_on_page_writeback.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAmit Sahrawat <a.sahrawat@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      9851e6e1
    • J
      f2fs: change get_new_data_page to pass a locked node page · 64aa7ed9
      Jaegeuk Kim 提交于
      This patch is for passing a locked node page to get_dnode_of_data.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      64aa7ed9
  16. 29 4月, 2013 1 次提交
  17. 26 4月, 2013 1 次提交
    • J
      f2fs: give a chance to merge IOs by IO scheduler · c718379b
      Jaegeuk Kim 提交于
      Previously, background GC submits many 4KB read requests to load victim blocks
      and/or its (i)node blocks.
      
      ...
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
      f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
      f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
      f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
      ...
      
      However, by the fact that many IOs are sequential, we can give a chance to merge
      the IOs by IO scheduler.
      In order to do that, let's use blk_plug.
      
      ...
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
      <idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
      <idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
      <idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
      <idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
      <idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
      <idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
      <idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
      ...
      
      Note that this issue should be addressed in checkpoint, and some readahead
      flows too.
      Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      c718379b
  18. 23 4月, 2013 3 次提交
  19. 10 4月, 2013 1 次提交
  20. 09 4月, 2013 2 次提交
    • J
      f2fs: introduce a new global lock scheme · 39936837
      Jaegeuk Kim 提交于
      In the previous version, f2fs uses global locks according to the usage types,
      such as directory operations, block allocation, block write, and so on.
      
      Reference the following lock types in f2fs.h.
      enum lock_type {
      	RENAME,		/* for renaming operations */
      	DENTRY_OPS,	/* for directory operations */
      	DATA_WRITE,	/* for data write */
      	DATA_NEW,	/* for data allocation */
      	DATA_TRUNC,	/* for data truncate */
      	NODE_NEW,	/* for node allocation */
      	NODE_TRUNC,	/* for node truncate */
      	NODE_WRITE,	/* for node write */
      	NR_LOCK_TYPE,
      };
      
      In that case, we lose the performance under the multi-threading environment,
      since every types of operations must be conducted one at a time.
      
      In order to address the problem, let's share the locks globally with a mutex
      array regardless of any types.
      So, let users grab a mutex and perform their jobs in parallel as much as
      possbile.
      
      For this, I propose a new global lock scheme as follows.
      
      0. Data structure
       - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
       - f2fs_sb_info -> node_write
      
      1. mutex_lock_op(sbi)
       - try to get an avaiable lock from the array.
       - returns the index of the gottern lock variable.
      
      2. mutex_unlock_op(sbi, index of the lock)
       - unlock the given index of the lock.
      
      3. mutex_lock_all(sbi)
       - grab all the locks in the array before the checkpoint.
      
      4. mutex_unlock_all(sbi)
       - release all the locks in the array after checkpoint.
      
      5. block_operations()
       - call mutex_lock_all()
       - sync_dirty_dir_inodes()
       - grab node_write
       - sync_node_pages()
      
      Note that,
       the pairs of mutex_lock_op()/mutex_unlock_op() and
       mutex_lock_all()/mutex_unlock_all() should be used together.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      39936837
    • J
      f2fs: move f2fs_balance_fs from truncate to punch_hole · 1127a3d4
      Jason Hrycay 提交于
      Move the f2fs_balance_fs out of the truncate_hole function and only
      perform that in punch_hole use case.  The commit:
      
        ed60b1644e7f7e5dd67d21caf7e4425dff05dad0
      
      intended to do this but moved it into truncate_hole to cover more
      cases.  However, a deadlock scenario is possible when deleting an inode
      entry under specific conditions:
      
       f2fs_delete_entry()
           mutex_lock_op(sbi, DENTRY_OPS);
           truncate_hole()
               f2fs_balance_fs()
                   mutex_lock(&sbi->gc_mutex);
                   f2fs_gc()
                       write_checkpoint()
                           block_operations()
                               mutex_lock_op(sbi, DENTRY_OPS);
      
      Lets move it into the punch_hole case to cover the original intent of
      avoiding it during fallocate's expand_inode_data case.
      
      Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e
      Signed-off-by: NJason Hrycay <jason.hrycay@motorola.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      1127a3d4
  21. 27 3月, 2013 2 次提交
    • J
      f2fs: fix to give correct parent inode number for roll forward · 953a3e27
      Jaegeuk Kim 提交于
      When we recover fsync'ed data after power-off-recovery, we should guarantee
      that any parent inode number should be correct for each direct inode blocks.
      
      So, let's make the following rules.
      
      - The fsync should do checkpoint to all the inodes that were experienced hard
      links.
      
      - So, the only normal files can be recovered by roll-forward.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      953a3e27
    • J
      f2fs: do not skip writing file meta during fsync · 0ff153a2
      Jaegeuk Kim 提交于
      This patch removes data_version check flow during the fsync call.
      The original purpose for the use of data_version was to avoid writng inode
      pages redundantly by the fsync calls repeatedly.
      However, when user can modify file meta and then call fsync, we should not
      skip fsync procedure.
      So, let's remove this condition check and hope that user triggers in right
      manner.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      0ff153a2
  22. 20 3月, 2013 1 次提交
  23. 18 3月, 2013 1 次提交
    • J
      f2fs: introduce readahead mode of node pages · 266e97a8
      Jaegeuk Kim 提交于
      Previously, f2fs reads several node pages ahead when get_dnode_of_data is called
      with RDONLY_NODE flag.
      And, this flag is set by the following functions.
      - get_data_block_ro
      - get_lock_data_page
      - do_write_data_page
      - truncate_blocks
      - truncate_hole
      
      However, this readahead mechanism is initially introduced for the use of
      get_data_block_ro to enhance the sequential read performance.
      
      So, let's clarify all the cases with the additional modes as follows.
      
      enum {
      	ALLOC_NODE,	/* allocate a new node page if needed */
      	LOOKUP_NODE,	/* look up a node without readahead */
      	LOOKUP_NODE_RA,	/*
      			 * look up a node with readahead called
      			 * by get_datablock_ro.
      			 */
      }
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com>
      266e97a8