1. 05 Mar, 2014 1 commit
  2. 27 Feb, 2014 2 commits
    • f2fs: introduce large directory support · 38431545
      Committed by Jaegeuk Kim
      This patch introduces an i_dir_level field to support large directories.
      
      Previously, f2fs maintained multi-level hash tables to find a dentry quickly
      among the many child dentries in a directory. The hash tables form the tree
      structure shown below.
      
      In Documentation/filesystems/f2fs.txt,
      
      ----------------------
      A : bucket
      B : block
      N : MAX_DIR_HASH_DEPTH
      ----------------------
      
      level #0   | A(2B)
                 |
      level #1   | A(2B) - A(2B)
                 |
      level #2   | A(2B) - A(2B) - A(2B) - A(2B)
           .     |   .       .       .       .
      level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
           .     |   .       .       .       .
      level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
      
      However, if we can predict that a directory will hold a large number of child
      files, we don't need to traverse the tree from level #0 to #N every time.
      Since the lower-level tables contain relatively few dentries, a lookup for
      the target dentry is likely to miss there.

      To avoid that, we can configure the hash tables sparsely from level #0
      like this.
      
      level #0   | A(2B) - A(2B) - A(2B) - A(2B)
      
      level #1   | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
           .     |   .       .       .       .
      level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
           .     |   .       .       .       .
      level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
      
      With this structure, we can skip the ineffective tree searches in the
      lower-level hash tables.

      This patch only adds the facility for this by introducing i_dir_level in
      f2fs_inode.
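
The effect of i_dir_level on the diagrams above can be sketched in a few lines of C. This is a minimal illustration, not the patch itself: the helper name `toy_dir_buckets`, the value 63 for MAX_DIR_HASH_DEPTH, and the bucket cap are assumptions made for the example. It only models how the bucket count at each level doubles and how a non-zero dir_level shifts level #0 to a wider table.

```c
/* Assumed values for illustration; the real constants live in the f2fs headers. */
#define MAX_DIR_HASH_DEPTH 63
#define TOY_MAX_DIR_BUCKETS (1U << ((MAX_DIR_HASH_DEPTH / 2) - 1))

/*
 * Hypothetical helper: number of buckets at a given hash level.
 * With dir_level == 0 the levels hold 1, 2, 4, ... buckets, as in the
 * first diagram. A non-zero dir_level makes level #0 start wider, as in
 * the second diagram.
 */
static unsigned int toy_dir_buckets(unsigned int level, unsigned int dir_level)
{
	if (level + dir_level < MAX_DIR_HASH_DEPTH / 2)
		return 1U << (level + dir_level);
	return TOY_MAX_DIR_BUCKETS;
}
```

Under these assumptions, `toy_dir_buckets(0, 0)` gives a single bucket, while `toy_dir_buckets(0, 2)` gives four, which matches the widened level #0 drawn above.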
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: remove costly bit operations for f2fs_find_entry · 5d0c6671
      Committed by Jaegeuk Kim
      It turns out that a bit operation like find_next_bit is not always fast enough
      for f2fs_find_entry.
      Instead, it is simpler and faster to traverse the dentries one by one.
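
A rough sketch of the "traverse each dentry" idea follows. It is a userspace toy, not the f2fs code: the struct layout, the names `toy_block` and `toy_find_slot`, and the slot count of 214 are all assumptions for illustration. The point is only that each slot is tested directly in one linear pass instead of jumping between set bits with a bitmap search helper.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_SLOTS 214	/* assumed number of dentry slots per block */

/* Hypothetical, simplified dentry block: a validity bitmap plus hash codes. */
struct toy_block {
	uint8_t bitmap[(TOY_NR_SLOTS + 7) / 8];
	uint32_t hash[TOY_NR_SLOTS];
};

/* Test one bit of the validity bitmap. */
static int toy_slot_in_use(const struct toy_block *b, size_t i)
{
	return (b->bitmap[i / 8] >> (i % 8)) & 1;
}

/* Linear scan: return the first valid slot whose hash matches, or -1. */
static int toy_find_slot(const struct toy_block *b, uint32_t hash)
{
	for (size_t i = 0; i < TOY_NR_SLOTS; i++) {
		if (toy_slot_in_use(b, i) && b->hash[i] == hash)
			return (int)i;
	}
	return -1;
}
```

A real lookup would also compare the name after a hash match; that step is left out here to keep the sketch short.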
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  3. 17 Feb, 2014 3 commits
    • f2fs: clean up redundant function call · 1fe54f9d
      Committed by Jaegeuk Kim
      This patch integrates inode_[inc|dec]_dirty_dents with inc_page_count to remove
      redundant calls.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: fix to truncate dentry pages in the error case · bd859c65
      Committed by Jaegeuk Kim
      When a new directory is allocated and an error occurs, we should truncate
      the preallocated dentry pages too.
      
      This bug was reported by Andrey Tsyvarev after a while as follows.
      
      mkdir()->
       f2fs_add_link()->
        init_inode_metadata()->
          f2fs_init_acl()->
            f2fs_get_acl()->
              f2fs_getxattr()->
                read_all_xattrs() fails.
      
      Also there was a BUG_ON triggered after the fault in
      mkdir()->
       f2fs_add_link()->
         init_inode_metadata()->
          remove_inode_page() ->
            f2fs_bug_on(inode->i_blocks != 0 && inode->i_blocks != 1);
      
      However, the previous patch did not fully resolve that bug, so the following
      bug report was also submitted.
      
      kernel BUG at fs/f2fs/inode.c:274!
      Call Trace:
       [<ffffffff811fde03>] evict+0xa3/0x1a0
       [<ffffffff811fe615>] iput+0xf5/0x180
       [<ffffffffa01c7f63>] f2fs_mkdir+0xf3/0x150 [f2fs]
       [<ffffffff811f2a77>] vfs_mkdir+0xb7/0x160
       [<ffffffff811f36bf>] SyS_mkdir+0x5f/0xc0
       [<ffffffff81680769>] system_call_fastpath+0x16/0x1b
      
      Finally, this patch resolves all the issues as follows.

      If an error occurs after make_empty_dir(),
       1. truncate_inode_pages()
         The make_bad_inode() prior to iput() will change i_mode to S_IFREG, which
         means that f2fs will not decrement fi->dirty_dents during f2fs_evict_inode.
         But, by calling it here, we can do that.
      
       2. truncate_blocks()
         Preallocated dentry pages are truncated here to sync i_blocks.
      
       3. remove_dirty_dir_inode()
         Remove this directory inode from the list.
      Reported-and-Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: fix the potential mismatch between dir's i_size and i_blocks · 924a2ddb
      Committed by Jaegeuk Kim
      This is the erroneous scenario.
      
                                   i_size    on-disk i_size    i_blocks
      __f2fs_add_link()             4096           4096           2
       get_new_data_page            8192           4096           3
       -ENOSPC = init_inode_metadata
       checkpoint                     -            4096           3
       POR and reboot
      
      __f2fs_add_link()             4096           4096           3
       page = get_new_data_page (page->index = 1 by NEW_ADDR)
       add a dentry to the page successfully
      
      f2fs_rmdir()
       f2fs_empty_dir()             4096           4096           3
       f2fs_unlink() proceeds, since no valid dentry is found within i_size = 4096.
       But there is still one dentry in page->index = 1.
      
      So this patch moves the code to write dir->i_size into on-disk i_size in order
      to sync dir's i_size, on-disk i_size, and its i_blocks.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  4. 22 Jan, 2014 2 commits
  5. 20 Jan, 2014 1 commit
  6. 06 Jan, 2014 1 commit
    • f2fs: handle errors correctly during f2fs_reserve_block · a8865372
      Committed by Jaegeuk Kim
      get_dnode_of_data() nullifies the inode page and node page when an error occurs.

      There are two cases that pass the inode page into get_dnode_of_data().
      
      1. make_empty_dir()
          -> get_new_data_page()
            -> f2fs_reserve_block(ipage)
              -> get_dnode_of_data()

      2. f2fs_convert_inline_data()
          -> __f2fs_convert_inline_data()
            -> f2fs_reserve_block(ipage)
              -> get_dnode_of_data()
      
      This patch adds correct error handling code for the case where
      get_dnode_of_data() returns an error.

      First, f2fs_reserve_block() calls f2fs_put_dnode() whenever reserve_new_block()
      returns an error.
      So, the rule for f2fs_reserve_block() is to nullify the inode page whenever any
      internal error occurs.

      Finally, the two callers of f2fs_reserve_block() should call f2fs_put_dnode()
      appropriately if they hit an error after f2fs_reserve_block() has succeeded.
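
The ownership rule described above can be modeled with a small userspace toy. Everything here is hypothetical (the `toy_` types and functions, the reference counter, the failure flags); it does not reproduce the kernel code. It only shows the contract: on an internal failure the callee has already released the page, and on a later failure the caller must release it.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a page with a reference count and a dnode. */
struct toy_page { int refs; };
struct toy_dnode { struct toy_page *inode_page; };

/* Drop the reference held through the dnode and nullify the pointer. */
static void toy_put_dnode(struct toy_dnode *dn)
{
	if (dn->inode_page) {
		dn->inode_page->refs--;
		dn->inode_page = NULL;
	}
}

/* Callee rule: on any internal error, release the inode page before returning. */
static int toy_reserve_block(struct toy_dnode *dn, int fail_inside)
{
	if (fail_inside) {
		toy_put_dnode(dn);
		return -1;
	}
	return 0;
}

/* Caller rule: release only for errors that happen after a successful reserve. */
static int toy_caller(struct toy_page *ipage, int fail_inside, int fail_later)
{
	struct toy_dnode dn = { .inode_page = ipage };

	ipage->refs++;			/* take the reference handed to the callee */
	if (toy_reserve_block(&dn, fail_inside))
		return -1;		/* already released by the callee */
	if (fail_later) {
		toy_put_dnode(&dn);	/* caller cleans up its own failure */
		return -1;
	}
	toy_put_dnode(&dn);		/* normal completion */
	return 0;
}
```

In all three paths the reference count returns to its starting value, which is the property the commit message is trying to guarantee.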
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  7. 26 Dec, 2013 2 commits
  8. 23 Dec, 2013 3 commits
  9. 29 Oct, 2013 1 commit
  10. 28 Oct, 2013 1 commit
    • f2fs: fix a deadlock during init_acl procedure · 2ed2d5b3
      Committed by Jaegeuk Kim
      The deadlock is found through the following scenario.
      
      sys_mkdir()
       -> f2fs_add_link()
        -> __f2fs_add_link()
         -> init_inode_metadata()
           : lock_page(inode);
          -> f2fs_init_acl()
           -> f2fs_set_acl()
            -> f2fs_setxattr(..., NULL)
             : This NULL page incurs a deadlock at update_inode_page().
      
      So, as with f2fs_init_security(), this patch adds a parameter to pass the
      locked inode page to f2fs_setxattr().
      
      Found by Linux File System Verification project (linuxtesting.org).
      Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  11. 30 Jul, 2013 3 commits
  12. 08 Jul, 2013 1 commit
    • f2fs: fix readdir incorrectness · 99b072bb
      Committed by Jaegeuk Kim
      With Al Viro's previous readdir patch set, a bug occurs when running
      xfstest 006, as follows.
      
      [Error output]
      alpha size = 4, name length = 6, total files = 4096, nproc=1
      1023 files created
      rm: cannot remove `/mnt/f2fs/permname.15150/a': Directory not empty
      
      [Correct output]
      alpha size = 4, name length = 6, total files = 4096, nproc=1
      4097 files created
      
      This bug is caused by an incorrect update of the directory position in ctx.
      This patch fixes it.
      
      [AV: fixed a braino]
      
      CC: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  13. 29 Jun, 2013 1 commit
  14. 14 Jun, 2013 1 commit
  15. 12 Jun, 2013 1 commit
    • f2fs: sync dir->i_size with its block allocation · 699489bb
      Committed by Jaegeuk Kim
      If a new dentry block is allocated and the directory's i_size is updated, we
      should update its inode block as well in order to sync i_size with its block
      allocation. Otherwise, we can lose the additional dentry block due to the
      inconsistent i_size.

      Erroneous Scenario
      ------------------
      
      In the recovery routine,
       - recovery_dentry
       | - __f2fs_add_link
       | | - get_new_data_page
       | | | - i_size_write(new_i_size)
       | | | - mark_inode_dirty_sync(dir)
       | | - update_parent_metadata
       | | | - mark_inode_dirty(dir)
       |
       - write_checkpoint
         - sync_dirty_dir_inodes
           - filemap_flush(dentry_blocks)
             - f2fs_write_data_page
               - skip to write the last dentry block due to index < i_size
      
      In the above flow, new_i_size is not written to the inode block, so the
      last dentry block is lost.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  16. 11 Jun, 2013 1 commit
    • f2fs: support xattr security labels · 8ae8f162
      Committed by Jaegeuk Kim
      This patch adds support for security labels in f2fs, which will be used
      by Linux Security Modules (LSMs).
      
      Quote from http://en.wikipedia.org/wiki/Linux_Security_Modules:
      "Linux Security Modules (LSM) is a framework that allows the Linux kernel to
      support a variety of computer security models while avoiding favoritism toward
      any single security implementation. The framework is licensed under the terms of
      the GNU General Public License and is standard part of the Linux kernel since
      Linux 2.6. AppArmor, SELinux, Smack and TOMOYO Linux are the currently accepted
      modules in the official kernel.".
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  17. 28 May, 2013 4 commits
  18. 26 Apr, 2013 1 commit
    • f2fs: give a chance to merge IOs by IO scheduler · c718379b
      Committed by Jaegeuk Kim
      Previously, background GC submitted many 4KB read requests to load victim
      blocks and/or their (i)node blocks.
      
      ...
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
      f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
      f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
      f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
      f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
      ...
      
      However, since many of these IOs are sequential, we can give the IO scheduler
      a chance to merge them.
      To do that, let's use blk_plug.
      
      ...
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
      f2fs_gc : f2fs_iget: ino = 143
      f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
      <idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
      <idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
      <idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
      <idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
      <idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
      <idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
      <idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
      <idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
      ...
      
      Note that this issue should also be addressed in the checkpoint and some
      readahead flows.
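
In the kernel, plugging is done by bracketing the submission loop with blk_start_plug() and blk_finish_plug(). The merging effect it enables can be illustrated with a userspace toy, shown below. The `toy_req` type and `toy_merge` function are hypothetical and do not model the real block layer; they only show how back-to-back sector ranges, like the three 8-sector reads in the first trace, collapse into one larger request when they are held long enough to be compared.

```c
#include <stddef.h>

/* Hypothetical request: a starting sector and a length in sectors. */
struct toy_req {
	unsigned long sector;
	unsigned long nr_sectors;
};

/*
 * Merge, in place, each request that starts exactly where the previous
 * one ends. Returns the number of requests left after merging.
 */
static size_t toy_merge(struct toy_req *reqs, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	for (size_t i = 1; i < n; i++) {
		if (reqs[out].sector + reqs[out].nr_sectors == reqs[i].sector)
			reqs[out].nr_sectors += reqs[i].nr_sectors;
		else
			reqs[++out] = reqs[i];
	}
	return out + 1;
}
```

Feeding it the requests at sectors 499854968, 499854976 and 499854984 (8 sectors each) leaves a single 24-sector request, which is the kind of result visible in the second trace.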
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  19. 10 Apr, 2013 1 commit
  20. 09 Apr, 2013 1 commit
    • f2fs: introduce a new global lock scheme · 39936837
      Committed by Jaegeuk Kim
      In the previous version, f2fs used global locks according to usage type,
      such as directory operations, block allocation, block write, and so on.

      See the following lock types in f2fs.h.
      enum lock_type {
      	RENAME,		/* for renaming operations */
      	DENTRY_OPS,	/* for directory operations */
      	DATA_WRITE,	/* for data write */
      	DATA_NEW,	/* for data allocation */
      	DATA_TRUNC,	/* for data truncate */
      	NODE_NEW,	/* for node allocation */
      	NODE_TRUNC,	/* for node truncate */
      	NODE_WRITE,	/* for node write */
      	NR_LOCK_TYPE,
      };
      
      In that case, we lose performance in a multi-threaded environment,
      since every type of operation must be conducted one at a time.

      To address the problem, let's share the locks globally through a mutex
      array, regardless of operation type.
      Users can then grab a mutex and perform their jobs in parallel as much as
      possible.
      
      For this, I propose a new global lock scheme as follows.
      
      0. Data structure
       - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
       - f2fs_sb_info -> node_write
      
      1. mutex_lock_op(sbi)
       - try to get an available lock from the array.
       - returns the index of the acquired lock variable.
      
      2. mutex_unlock_op(sbi, index of the lock)
       - unlock the given index of the lock.
      
      3. mutex_lock_all(sbi)
       - grab all the locks in the array before the checkpoint.
      
      4. mutex_unlock_all(sbi)
       - release all the locks in the array after checkpoint.
      
      5. block_operations()
       - call mutex_lock_all()
       - sync_dirty_dir_inodes()
       - grab node_write
       - sync_node_pages()
      
      Note that,
       the pairs of mutex_lock_op()/mutex_unlock_op() and
       mutex_lock_all()/mutex_unlock_all() should be used together.
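
The scheme above can be approximated in userspace with POSIX mutexes. The following is a sketch under stated assumptions, not the kernel implementation: the `toy_` names, the array size of 8, and the round-robin fallback used when every lock is busy are all invented for the example. Regular operations take any one free lock, while a checkpoint-like path takes all of them to exclude everyone.

```c
#include <pthread.h>
#include <stdatomic.h>

#define TOY_NR_GLOBAL_LOCKS 8	/* assumed array size, for illustration only */

/* Hypothetical stand-in for the lock fields of f2fs_sb_info. */
struct toy_sbi {
	pthread_mutex_t locks[TOY_NR_GLOBAL_LOCKS];
	atomic_uint next_lock;	/* fallback slot when all are busy (assumption) */
};

static void toy_init(struct toy_sbi *sbi)
{
	for (int i = 0; i < TOY_NR_GLOBAL_LOCKS; i++)
		pthread_mutex_init(&sbi->locks[i], NULL);
	atomic_init(&sbi->next_lock, 0);
}

/* Grab any free lock and return its index; block on one slot if none is free. */
static int toy_lock_op(struct toy_sbi *sbi)
{
	for (int i = 0; i < TOY_NR_GLOBAL_LOCKS; i++)
		if (pthread_mutex_trylock(&sbi->locks[i]) == 0)
			return i;
	int idx = (int)(atomic_fetch_add(&sbi->next_lock, 1) % TOY_NR_GLOBAL_LOCKS);
	pthread_mutex_lock(&sbi->locks[idx]);
	return idx;
}

/* Release the lock whose index toy_lock_op() returned. */
static void toy_unlock_op(struct toy_sbi *sbi, int idx)
{
	pthread_mutex_unlock(&sbi->locks[idx]);
}

/* Checkpoint-like path: take every lock, always in index order. */
static void toy_lock_all(struct toy_sbi *sbi)
{
	for (int i = 0; i < TOY_NR_GLOBAL_LOCKS; i++)
		pthread_mutex_lock(&sbi->locks[i]);
}

static void toy_unlock_all(struct toy_sbi *sbi)
{
	for (int i = 0; i < TOY_NR_GLOBAL_LOCKS; i++)
		pthread_mutex_unlock(&sbi->locks[i]);
}
```

A caller would pair the functions as `int i = toy_lock_op(&sbi); /* work */ toy_unlock_op(&sbi, i);`. Taking the locks in a fixed index order in `toy_lock_all()` is a design choice in this sketch so that two "lock all" callers cannot deadlock against each other.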
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  21. 18 Mar, 2013 1 commit
  22. 23 Feb, 2013 1 commit
  23. 08 Feb, 2013 4 commits
  24. 14 Jan, 2013 1 commit
  25. 28 Dec, 2012 1 commit