1. 17 2月, 2014 2 次提交
    • J
      f2fs: clean up with a macro · 491c0854
      Jaegeuk Kim 提交于
      This patch adds GET_BLKOFF_FROM_SEG0 to clean up some codes.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      491c0854
    • J
      f2fs: fix to recover xattr node block · abb2366c
      Jaegeuk Kim 提交于
      If a new xattr node page was allocated and its inode is fsynced, we should
      recover the xattr node page during the roll-forward process after power-cut.
      But, previously, f2fs didn't handle that case, resulting in kernel panic as
      follows reported by Tom Li.
      
      BUG: unable to handle kernel paging request at ffffc9001c861a98
      IP: [<ffffffffa0295236>] check_index_in_prev_nodes+0x86/0x2d0 [f2fs]
      Call Trace:
       [<ffffffff815ece9b>] ? printk+0x48/0x4a
       [<ffffffffa029626a>] recover_fsync_data+0xdca/0xf50 [f2fs]
       [<ffffffffa02873ae>] f2fs_fill_super+0x92e/0x970 [f2fs]
       [<ffffffff8112c9f8>] mount_bdev+0x1b8/0x200
       [<ffffffffa0286a80>] ? f2fs_remount+0x130/0x130 [f2fs]
       [<ffffffffa0285e40>] f2fs_mount+0x10/0x20 [f2fs]
       [<ffffffff8112d4de>] mount_fs+0x3e/0x1b0
       [<ffffffff810ef4eb>] ? __alloc_percpu+0xb/0x10
       [<ffffffff8114761f>] vfs_kern_mount+0x6f/0x120
       [<ffffffff811497b9>] do_mount+0x259/0xa90
       [<ffffffff810ead1d>] ? memdup_user+0x3d/0x80
       [<ffffffff810eadb3>] ? strndup_user+0x53/0x70
       [<ffffffff8114a2c9>] SyS_mount+0x89/0xd0
       [<ffffffff815feae2>] system_call_fastpath+0x16/0x1b
      
      This patch adds a recovery function of xattr node pages.
      Reported-by: NTom Li <biergaizi@members.fsf.org>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      abb2366c
  2. 20 1月, 2014 1 次提交
  3. 06 1月, 2014 1 次提交
    • J
      f2fs: add inline_data recovery routine · 1e1bb4ba
      Jaegeuk Kim 提交于
      This patch adds a inline_data recovery routine with the following policy.
      
      [prev.] [next] of inline_data flag
         o       o  -> recover inline_data
         o       x  -> remove inline_data, and then recover data blocks
         x       o  -> remove inline_data, and then recover inline_data
         x       x  -> recover data blocks
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      1e1bb4ba
  4. 26 12月, 2013 2 次提交
  5. 23 12月, 2013 4 次提交
  6. 29 10月, 2013 1 次提交
  7. 25 10月, 2013 1 次提交
  8. 07 10月, 2013 1 次提交
    • G
      f2fs: use rw_sem instead of fs_lock(locks mutex) · e479556b
      Gu Zheng 提交于
      The fs_locks is used to block other ops(ex, recovery) when doing checkpoint.
      And each other operate routine(besides checkpoint) needs to acquire a fs_lock,
      there is a terrible problem here, if these are too many concurrency threads acquiring
      fs_lock, so that they will block each other and may lead to some performance problem,
      but this is not the phenomenon we want to see.
      Though there are some optimization patches introduced to enhance the usage of fs_lock,
      but the thorough solution is using a *rw_sem* to replace the fs_lock.
      Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block
      other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other,
      this can avoid the problem described above completely.
      Because of the weakness of rw_sem, the above change may introduce a potential problem
      that the checkpoint thread might get starved if other threads are intensively locking
      the read semaphore for I/O.(Pointed out by Xu Jin)
      In order to avoid this, a wait_list is introduced, the appending read semaphore ops
      will be dropped into the wait_list if checkpoint thread is waiting for write semaphore,
      and will be waked up when checkpoint thread gives up write semaphore.
      Thanks to Kim's previous review and test, and will be very glad to see other guys'
      performance tests about this patch.
      
      V2:
        -fix the potential starvation problem.
        -use more suitable func name suggested by Xu Jin.
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      [Jaegeuk Kim: adjust minor coding standard]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      e479556b
  9. 25 9月, 2013 2 次提交
  10. 26 8月, 2013 1 次提交
    • J
      f2fs: reserve the xattr space dynamically · de93653f
      Jaegeuk Kim 提交于
      This patch enables the number of direct pointers inside on-disk inode block to
      be changed dynamically according to the size of inline xattr space.
      
      The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
      has inline xattr flag.
      
      The number of direct pointers that will be used by inline xattrs is defined as
      F2FS_INLINE_XATTR_ADDRS.
      Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      de93653f
  11. 19 8月, 2013 1 次提交
  12. 09 8月, 2013 1 次提交
  13. 30 7月, 2013 1 次提交
  14. 02 7月, 2013 2 次提交
  15. 07 6月, 2013 1 次提交
    • J
      f2fs: fix iget/iput of dir during recovery · 5deb8267
      Jaegeuk Kim 提交于
      It is possible that iput is skipped after iget during the recovery.
      
      In recover_dentry(),
       dir = f2fs_iget();
       ...
       if (de && inode->i_ino == le32_to_cpu(de->ino))
      	goto out;
      
      In this case, this dir is not able to be added in dirty_dir_inode_list.
      The actual linking is done only when set_page_dirty() is called.
      
      So let's add this newly got inode into the list explicitly, and put it at the
      end of the recovery routine.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      5deb8267
  16. 28 5月, 2013 11 次提交
    • J
      f2fs: fix dentry recovery routine · 6b8213d9
      Jaegeuk Kim 提交于
      The error scenario is:
      1. create /a
      (1.a link /a /b)
      2. sync
      3. unlinke /a
      4. create /a
      5. fsync /a
      6. Sudden power-off
      
      When the f2fs recovers the fsynced dentry, /a, we discover an exsiting dentry at
      f2fs_find_entry() in recover_dentry().
      
      In such the case, we should unlink the existing dentry and its inode
      and then recover newly created dentry.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      6b8213d9
    • D
      f2fs: dereferencing an ERR_PTR · f28c06fa
      Dan Carpenter 提交于
      There is an error path where "dir" is an ERR_PTR.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      f28c06fa
    • J
      f2fs: fix to handle do_recover_data errors · 39cf72cf
      Jaegeuk Kim 提交于
      This patch adds error handling codes of check_index_in_prev_nodes and its
      caller, do_recover_data.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      39cf72cf
    • J
      f2fs: reuse the locked dnode page and its inode · b292dcab
      Jaegeuk Kim 提交于
      This patch fixes the following deadlock bug during the recovery.
      
      INFO: task mount:1322 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mount           D ffffffff81125870     0  1322   1266 0x00000000
       ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
       ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
       ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
      Call Trace:
      [<ffffffff81125870>] ? __lock_page+0x70/0x70
      [<ffffffff816a92d9>] schedule+0x29/0x70
      [<ffffffff816a93af>] io_schedule+0x8f/0xd0
      [<ffffffff8112587e>] sleep_on_page+0xe/0x20
      [<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
      [<ffffffff81125867>] __lock_page+0x67/0x70
      [<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
      [<ffffffff81126857>] find_lock_page+0x67/0x80
      [<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
      [<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
      [<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
      [<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
      [<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
      [<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
      [<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
      [<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
      [<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
      [<ffffffff81185a13>] mount_fs+0x43/0x1b0
      [<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
      [<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
      [<ffffffff811a2cb7>] do_mount+0x237/0xa10
      [<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
      [<ffffffff811a3520>] SyS_mount+0x90/0xe0
      [<ffffffff816b3502>] system_call_fastpath+0x16/0x1b
      
      The bug is triggered when check_index_in_prev_nodes tries to get the direct
      node page by calling get_node_page.
      At this point, if the direct node page is already locked by get_dnode_of_data,
      its caller, we got a deadlock condition.
      
      This patch adds additional condition check for the reuse of locked direct node
      pages prior to the get_node_page call.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      b292dcab
    • J
      f2fs: don't do checkpoint if error is occurred · 2c2c149f
      Jaegeuk Kim 提交于
      If we met an error during the dentry recovery, we should not conduct checkpoint.
      Otherwise, some errorneous dentry blocks overwrites the existing blocks that
      contain the remaining recovery information.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      2c2c149f
    • J
      f2fs: fix to unlock page before exit · 45856aff
      Jaegeuk Kim 提交于
      If we got an error after lock_page, we should unlock it before exit.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      45856aff
    • J
      f2fs: remove unnecessary kmap/kunmap operations · 9a55ed65
      Jaegeuk Kim 提交于
      The allocated page used by the recovery is not on HIGHMEM, so that we don't
      need to use kmap/kunmap.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      9a55ed65
    • J
      f2fs: add debug msgs in the recovery routine · f356fe0c
      Jaegeuk Kim 提交于
      This patch adds some trivial debugging messages in the recovery process.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      f356fe0c
    • J
      f2fs: fix BUG_ON during f2fs_evict_inode(dir) · 74d0b917
      Jaegeuk Kim 提交于
      During the dentry recovery routine, recover_inode() triggers __f2fs_add_link
      with its directory inode.
      
      In the following scenario, a bug is captured.
       1. dir = f2fs_iget(pino)
       2. __f2fs_add_link(dir, name)
       3. iput(dir)
        -> f2fs_evict_inode() faces with BUG_ON(atomic_read(fi->dirty_dents))
      
      Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable]
      [<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs]
      Call Trace:
       [<ffffffff8118ea00>] evict+0xb0/0x1b0
       [<ffffffff8118f1c5>] iput+0x105/0x190
       [<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs]
       [<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0
       [<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0
       [<ffffffff8111a0e7>] ? __lock_page+0x67/0x70
       [<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140
       [<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0
       [<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30
       [<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70
      
      This means that we should flush all the dentry pages between iget and iput().
      But, during the recovery routine, it is unallowed due to consistency, so we
      have to wait the whole recovery process.
      And then, write_checkpoint flushes all the dirty dentry blocks, and nicely we
      can put the stale dir inodes from the dirty_dir_inode_list.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      74d0b917
    • J
      f2fs: fix por_doing variable coverage · 8c26d7d5
      Jaegeuk Kim 提交于
      The reason of using sbi->por_doing is to alleviate data writes during the
      recovery.
      The find_fsync_dnodes() produces some dirty dentry pages, so we should
      cover it too with sbi->por_doing.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      8c26d7d5
    • J
      f2fs: remove redundant assignment · addbe45b
      Jaegeuk Kim 提交于
      We don't need to assign a value redundantly.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      addbe45b
  17. 08 5月, 2013 1 次提交
  18. 09 4月, 2013 1 次提交
    • J
      f2fs: introduce a new global lock scheme · 39936837
      Jaegeuk Kim 提交于
      In the previous version, f2fs uses global locks according to the usage types,
      such as directory operations, block allocation, block write, and so on.
      
      Reference the following lock types in f2fs.h.
      enum lock_type {
      	RENAME,		/* for renaming operations */
      	DENTRY_OPS,	/* for directory operations */
      	DATA_WRITE,	/* for data write */
      	DATA_NEW,	/* for data allocation */
      	DATA_TRUNC,	/* for data truncate */
      	NODE_NEW,	/* for node allocation */
      	NODE_TRUNC,	/* for node truncate */
      	NODE_WRITE,	/* for node write */
      	NR_LOCK_TYPE,
      };
      
      In that case, we lose the performance under the multi-threading environment,
      since every types of operations must be conducted one at a time.
      
      In order to address the problem, let's share the locks globally with a mutex
      array regardless of any types.
      So, let users grab a mutex and perform their jobs in parallel as much as
      possbile.
      
      For this, I propose a new global lock scheme as follows.
      
      0. Data structure
       - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
       - f2fs_sb_info -> node_write
      
      1. mutex_lock_op(sbi)
       - try to get an avaiable lock from the array.
       - returns the index of the gottern lock variable.
      
      2. mutex_unlock_op(sbi, index of the lock)
       - unlock the given index of the lock.
      
      3. mutex_lock_all(sbi)
       - grab all the locks in the array before the checkpoint.
      
      4. mutex_unlock_all(sbi)
       - release all the locks in the array after checkpoint.
      
      5. block_operations()
       - call mutex_lock_all()
       - sync_dirty_dir_inodes()
       - grab node_write
       - sync_node_pages()
      
      Note that,
       the pairs of mutex_lock_op()/mutex_unlock_op() and
       mutex_lock_all()/mutex_unlock_all() should be used together.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      39936837
  19. 27 3月, 2013 1 次提交
  20. 20 3月, 2013 1 次提交
  21. 18 3月, 2013 1 次提交
    • J
      f2fs: introduce readahead mode of node pages · 266e97a8
      Jaegeuk Kim 提交于
      Previously, f2fs reads several node pages ahead when get_dnode_of_data is called
      with RDONLY_NODE flag.
      And, this flag is set by the following functions.
      - get_data_block_ro
      - get_lock_data_page
      - do_write_data_page
      - truncate_blocks
      - truncate_hole
      
      However, this readahead mechanism is initially introduced for the use of
      get_data_block_ro to enhance the sequential read performance.
      
      So, let's clarify all the cases with the additional modes as follows.
      
      enum {
      	ALLOC_NODE,	/* allocate a new node page if needed */
      	LOOKUP_NODE,	/* look up a node without readahead */
      	LOOKUP_NODE_RA,	/*
      			 * look up a node with readahead called
      			 * by get_datablock_ro.
      			 */
      }
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com>
      266e97a8
  22. 12 2月, 2013 2 次提交
    • J
      f2fs: clarify and enhance the f2fs_gc flow · 43727527
      Jaegeuk Kim 提交于
      This patch makes clearer the ambiguous f2fs_gc flow as follows.
      
      1. Remove intermediate checkpoint condition during f2fs_gc
       (i.e., should_do_checkpoint() and GC_BLOCKED)
      
      2. Remove unnecessary return values of f2fs_gc because of #1.
       (i.e., GC_NODE, GC_OK, etc)
      
      3. Simplify write_checkpoint() because of #2.
      
      4. Clarify the main f2fs_gc flow.
       o monitor how many freed sections during one iteration of do_garbage_collect().
       o do GC more without checkpoints if we can't get enough free sections.
       o do checkpoint once we've got enough free sections through forground GCs.
      
      5. Adopt thread-logging (Slack-Space-Recycle) scheme more aggressively on data
        log types. See. get_ssr_segement()
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      43727527
    • J
      f2fs: avoid balanc_fs during evict_inode · d4686d56
      Jaegeuk Kim 提交于
      1. Background
      
      Previously, if f2fs tries to move data blocks of an *evicting* inode during the
      cleaning process, it stops the process incompletely and then restarts the whole
      process, since it needs a locked inode to grab victim data pages in its address
      space. In order to get a locked inode, iget_locked() by f2fs_iget() is normally
      used, but, it waits if the inode is on freeing.
      
      So, here is a deadlock scenario.
      1. f2fs_evict_inode()       <- inode "A"
        2. f2fs_balance_fs()
          3. f2fs_gc()
            4. gc_data_segment()
              5. f2fs_iget()      <- inode "A" too!
      
      If step #1 and #5 treat a same inode "A", step #5 would fall into deadlock since
      the inode "A" is on freeing. In order to resolve this, f2fs_iget_nowait() which
      skips __wait_on_freeing_inode() was introduced in step #5, and stops f2fs_gc()
      to complete f2fs_evict_inode().
      
      1. f2fs_evict_inode()           <- inode "A"
        2. f2fs_balance_fs()
          3. f2fs_gc()
            4. gc_data_segment()
              5. f2fs_iget_nowait()   <- inode "A", then stop f2fs_gc() w/ -ENOENT
      
      2. Problem and Solution
      
      In the above scenario, however, f2fs cannot finish f2fs_evict_inode() only if:
       o there are not enough free sections, and
       o f2fs_gc() tries to move data blocks of the *evicting* inode repeatedly.
      
      So, the final solution is to use f2fs_iget() and remove f2fs_balance_fs() in
      f2fs_evict_inode().
      The f2fs_evict_inode() actually truncates all the data and node blocks, which
      means that it doesn't produce any dirty node pages accordingly.
      So, we don't need to do f2fs_balance_fs() in practical.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      d4686d56