  1. 11 Apr, 2015 7 commits
    • f2fs: avoid punch_hole overhead when releasing volatile data · 3c6c2beb
      Committed by Jaegeuk Kim
      This patch avoids some punch_hole overhead when releasing volatile data.
      If the volatile data has not been written yet, we can simply zero the first page.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: enhance multi-threads performance · 78373b73
      Committed by Jaegeuk Kim
      Previously, f2fs_write_data_pages took a mutex, sbi->writepages, to serialize
      data writes and maximize write bandwidth, at the cost of multi-threaded
      performance.
      In practice, however, multi-threaded environments matter much more to
      users, so this patch tries to remove the mutex.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: set buffer_new when new blocks are allocated · 3402e87c
      Committed by Jaegeuk Kim
      This patch calls set_buffer_new when new blocks are allocated.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to check current blkaddr in __allocate_data_blocks · d6d4f1cb
      Committed by Chao Yu
      In __allocate_data_blocks, we should check the current blkaddr, which is located at
      ofs_in_node of the dnode page, instead of checking the first blkaddr every time.
      Otherwise we can only allocate one blkaddr in each dnode page. Fix it.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
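The d6d4f1cb fix can be pictured with a small user-space sketch. This is not the kernel code: `struct dnode_sketch`, `reserve_range`, and the constant values are assumptions made for illustration. The point it shows is that the allocation check must read the slot at the current `ofs_in_node`, not slot 0, on every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values; treat them as assumptions, not the kernel definitions. */
#define NULL_ADDR 0u                    /* slot has no block yet */
#define NEW_ADDR  ((unsigned int)-1)    /* slot has a reserved block */

/* Hypothetical stand-in for a dnode page: a small array of block addresses. */
struct dnode_sketch {
    unsigned int blkaddr[8];
    size_t nr_slots;
};

/*
 * Reserve every unallocated slot in [start, end).
 * The check reads the slot at the current offset on each iteration.
 * Checking blkaddr[0] every time would stop after the first reservation.
 * Returns the number of slots that were newly reserved.
 */
size_t reserve_range(struct dnode_sketch *dn, size_t start, size_t end)
{
    size_t reserved = 0;

    assert(dn != NULL);
    if (end > dn->nr_slots)
        end = dn->nr_slots;

    for (size_t ofs_in_node = start; ofs_in_node < end; ofs_in_node++) {
        if (dn->blkaddr[ofs_in_node] == NULL_ADDR) {
            dn->blkaddr[ofs_in_node] = NEW_ADDR;
            reserved++;
        }
    }
    return reserved;
}
```

Slots that already hold an address are left untouched, so a partly allocated dnode page only gets its holes filled.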
    • f2fs: avoid to trigger writepage during POR · d5669f7b
      Committed by Jaegeuk Kim
      This patch has no effect on the previous behavior, since
      f2fs_write_data_page bypasses writing the page during POR.
      
      The difference is that this patch avoids holding the writepages mutex.
      This avoids the following false warning, which can happen only
      when mount and shutdown are triggered at the same time.
      
       ======================================================
       [ INFO: possible circular locking dependency detected ]
       4.0.0-rc1+ #3 Tainted: G           O
       -------------------------------------------------------
       kworker/u8:0/2270 is trying to acquire lock:
        (&sbi->gc_mutex){+.+.+.}, at: [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs]
      
       but task is already holding lock:
        (&sbi->writepages){+.+...}, at: [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs]
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #2 (&sbi->writepages){+.+...}:
              [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
              [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
              [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs]
              [<ffffffff811c38c1>] do_writepages+0x21/0x50
              [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0
              [<ffffffff8126e23a>] writeback_single_inode+0xea/0x1c0
              [<ffffffff8126e425>] write_inode_now+0x95/0xa0
              [<ffffffff81259dab>] iput+0x20b/0x3f0
              [<ffffffffa02c1c8b>] recover_data.constprop.14+0x26b/0xa80 [f2fs]
              [<ffffffffa02c2776>] recover_fsync_data+0x2b6/0x5e0 [f2fs]
              [<ffffffffa02a9744>] f2fs_fill_super+0xb24/0xb90 [f2fs]
              [<ffffffff8123d7f4>] mount_bdev+0x1a4/0x1e0
              [<ffffffffa02a3c85>] f2fs_mount+0x15/0x20 [f2fs]
              [<ffffffff8123e159>] mount_fs+0x39/0x180
              [<ffffffff8125e51b>] vfs_kern_mount+0x6b/0x160
              [<ffffffff81261554>] do_mount+0x204/0xbe0
              [<ffffffff8126223b>] SyS_mount+0x8b/0xe0
              [<ffffffff81863e6d>] system_call_fastpath+0x16/0x1b
      
       -> #1 (&sbi->cp_mutex){+.+...}:
              [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
              [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
              [<ffffffffa02acbf2>] write_checkpoint+0x42/0x1230 [f2fs]
              [<ffffffffa02a847d>] f2fs_sync_fs+0x9d/0x2a0 [f2fs]
              [<ffffffff81272f82>] sync_filesystem+0x82/0xb0
              [<ffffffff8123c214>] generic_shutdown_super+0x34/0x100
              [<ffffffff8123c5f7>] kill_block_super+0x27/0x70
              [<ffffffffa02a3c60>] kill_f2fs_super+0x20/0x30 [f2fs]
              [<ffffffff8123ca49>] deactivate_locked_super+0x49/0x80
              [<ffffffff8123d05e>] deactivate_super+0x4e/0x70
              [<ffffffff8125df63>] cleanup_mnt+0x43/0x90
              [<ffffffff8125e002>] __cleanup_mnt+0x12/0x20
              [<ffffffff810a82e4>] task_work_run+0xc4/0xf0
              [<ffffffff8101f0bd>] do_notify_resume+0x8d/0xa0
              [<ffffffff81864141>] int_signal+0x12/0x17
      
       -> #0 (&sbi->gc_mutex){+.+.+.}:
              [<ffffffff810e2866>] __lock_acquire+0x1ac6/0x1c90
              [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
              [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
              [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs]
              [<ffffffffa02b5938>] f2fs_write_data_page+0x348/0x5b0 [f2fs]
              [<ffffffffa02af9da>] __f2fs_writepage+0x1a/0x50 [f2fs]
              [<ffffffff811c1b54>] write_cache_pages+0x274/0x6f0
              [<ffffffffa02b2630>] f2fs_write_data_pages+0xe0/0x3a0 [f2fs]
              [<ffffffff811c38c1>] do_writepages+0x21/0x50
              [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0
              [<ffffffff8126d44a>] writeback_sb_inodes+0x32a/0x710
              [<ffffffff8126d8cf>] __writeback_inodes_wb+0x9f/0xd0
              [<ffffffff8126dcdb>] wb_writeback+0x3db/0x850
              [<ffffffff8126e848>] bdi_writeback_workfn+0x148/0x980
              [<ffffffff810a3782>] process_one_work+0x1e2/0x840
              [<ffffffff810a3f01>] worker_thread+0x121/0x460
              [<ffffffff810a9dc8>] kthread+0xf8/0x110
              [<ffffffff81863dbc>] ret_from_fork+0x7c/0xb0
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: check its block allocation to avoid producing wrong dirty pages · b7f204cc
      Committed by Jaegeuk Kim
      If a page is cached but its block was deallocated, gc and
      truncate_partial_data_page do not need to make the page dirty again.
      
      In that case, the block allocation must be checked every time instead
      of returning the up-to-date page.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clear page's up-to-date if block was deallocated · 2bca1e23
      Committed by Jaegeuk Kim
      If a page's on-disk block was deallocated, remove the up-to-date flag to avoid
      further access with wrong contents.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 04 Mar, 2015 10 commits
    • f2fs: use extent cache for dir · cb3bc9ee
      Committed by Chao Yu
      We update the extent cache for all user inodes in f2fs, including dir inodes, so this
      patch gives dir inodes another chance to get the physical address of a page from
      the extent cache.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: switch to check FI_NO_EXTENT in f2fs_{lookup,update}_extent_cache · 91c5d9bc
      Committed by Chao Yu
      This patch switches to checking FI_NO_EXTENT in f2fs_{lookup,update}_extent_cache
      instead of in f2fs_{lookup,update}_extent_tree or {lookup,update}_extent_info.
      
      No functional change in this patch.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support fast lookup in extent cache · 62c8af65
      Committed by Chao Yu
      This patch adds a fast lookup path for the rb-tree extent cache.
      
      It adds a pointer to the most recently accessed extent node, 'cached_en', to the
      extent tree. The lookup path first checks the extent node that cached_en points
      to; only if that node does not cover the requested offset does it search the
      rb-tree.
      
      This sometimes avoids an unnecessary slow lookup in the rb-tree.
      
      Note that a side effect of this patch is a higher memory cost, because each
      struct extent_tree additionally stores one pointer.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
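The fast path described in 62c8af65 can be sketched in user space as follows. This is an illustration under stated assumptions, not the f2fs implementation: a sorted array with binary search stands in for the rb-tree, and every name except `cached_en` (the `*_sketch` types, `lookup_extent`, `fast_hits`, the field names) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One cached mapping: file offsets [fofs, fofs + len) map to blocks from blk. */
struct extent_node_sketch {
    unsigned int fofs;
    unsigned int blk;
    unsigned int len;
};

/* Stand-in for the extent tree: a sorted array replaces the rb-tree here. */
struct extent_tree_sketch {
    const struct extent_node_sketch *nodes;     /* sorted by fofs, non-overlapping */
    size_t count;
    const struct extent_node_sketch *cached_en; /* most recently hit node, or NULL */
    unsigned long fast_hits;                    /* counts hits served by cached_en */
};

static int covers(const struct extent_node_sketch *en, unsigned int pgofs)
{
    return pgofs >= en->fofs && pgofs - en->fofs < en->len;
}

/* Returns 1 and stores the block address on a hit, 0 on a miss. */
int lookup_extent(struct extent_tree_sketch *et, unsigned int pgofs,
                  unsigned int *blk)
{
    const struct extent_node_sketch *en;
    size_t lo = 0, hi;

    assert(et != NULL && blk != NULL);
    en = et->cached_en;
    hi = et->count;

    /* Fast path: try the node that served the previous hit. */
    if (en != NULL && covers(en, pgofs)) {
        et->fast_hits++;
        *blk = en->blk + (pgofs - en->fofs);
        return 1;
    }

    /* Slow path: binary search, standing in for the rb-tree walk. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        en = &et->nodes[mid];
        if (covers(en, pgofs)) {
            et->cached_en = en;     /* remember the node for the next lookup */
            *blk = en->blk + (pgofs - en->fofs);
            return 1;
        }
        if (pgofs < en->fofs)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}
```

Sequential reads tend to land in the same extent repeatedly, which is the case where the cached pointer pays off; the cost is the one extra pointer per tree that the commit message mentions.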
    • f2fs: add trace for rb-tree extent cache ops · 1ec4610c
      Committed by Chao Yu
      This patch adds tracing for the lookup/update/shrink/destroy ops in the rb-tree extent cache.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: enable rb-tree extent cache · 1dcc336b
      Committed by Chao Yu
      This patch enables the rb-tree based extent cache in f2fs.
      
      When mounted with "-o extent_cache", f2fs tries to add as many recently accessed
      page-block mappings as possible to the rb-tree based extent cache, instead
      of the original single extent info cache.
      
      This gives f2fs a more effective cache between the dnode page cache and the
      disk. It provides a high hit ratio with less memory when dnode pages are
      reclaimed under low-memory conditions.
      
      Storage: SanDisk SD card, 64 GB
      1. append-write a file (offset: 0, size: 128M);
      2. overwrite the file (offset: 2M, size: 1M);
      3. overwrite the file (offset: 4M, size: 1M);
      ...
      4. overwrite the file (offset: 48M, size: 1M);
      ...
      5. overwrite the file (offset: 112M, size: 1M);
      6. sync
      7. echo 3 > /proc/sys/vm/drop_caches
      8. read the file (size: 128M, unit: 4k, count: 32768)
      (time dd if=/mnt/f2fs/128m bs=4k count=32768)
      
      Extent Hit Ratio:
      		before		patched
      Hit Ratio	121 / 1071	1071 / 1071
      
      Performance:
      		before		patched
      real    	0m37.051s	0m35.556s
      user    	0m0.040s	0m0.026s
      sys     	0m2.990s	0m2.251s
      
      Memory Cost:
      		before		patched
      Tree Count:	0		1 (size: 24 bytes)
      Node Count:	0		45 (size: 1440 bytes)
      
      v3:
       o retest and give more details of the test results.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add core functions for rb-tree extent cache · 429511cd
      Committed by Chao Yu
      This patch adds the core functions, including the slab cache init function and
      the init/lookup/update/shrink/destroy functions, for the rb-tree based extent cache.
      
      Thanks to Jaegeuk Kim and Changman Lee, who gave many suggestions on the detailed
      design and implementation of the extent cache.
      
      Todo:
       * register rb-based extent cache shrink with mm shrink interface.
      
      v2:
       o move set_extent_info and __is_{extent,back,front}_mergeable into f2fs.h.
       o introduce __{attach,detach}_extent_node for code readability.
       o add cond_resched() when kmem_cache_alloc/radix_tree_insert fails.
       o fix some coding style and typo issues.
      
      v3:
       o fix oops due to using an unassigned pointer.
       o use list_del to remove extent node in shrink list.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Changman Lee <cm224.lee@samsung.com>
      [Jaegeuk Kim: add static for some functions and declare in f2fs.h]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce universal lookup/update interface for extent cache · 7e4dde79
      Committed by Chao Yu
      In this patch, we do the following:
      1. rename {check,update}_extent_cache to {lookup,update}_extent_info;
      2. introduce a universal lookup/update interface for the extent cache,
      f2fs_{lookup,update}_extent_cache, which wraps the two functions above, and
      export it to callers.
      
      After this cleanup, we can add the new rb-tree based extent cache behind the
      exported interfaces.
      
      v2:
       o remove the "f2fs_" prefix from the inner functions {lookup,update}_extent_info,
         as suggested by Jaegeuk Kim.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce f2fs_map_bh to clean codes of check_extent_cache · a2e7d1bf
      Committed by Chao Yu
      This patch introduces f2fs_map_bh to clean up the code of check_extent_cache.
      
      v2:
       o clean up f2fs_map_bh, as pointed out by Jaegeuk Kim.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: simplfy a field name in struct f2fs_extent,extent_info · 4d0b0bd4
      Committed by Chao Yu
      Rename a field from 'blk_addr' to 'blk' in struct {f2fs_extent,extent_info},
      as the annotation of this field already describes its meaning well.
      
      This avoids long statements in the code of the following patches.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: move ext_lock out of struct extent_info · 0c872e2d
      Committed by Chao Yu
      Move ext_lock out of struct extent_info so that the following patches can pass
      variables of type struct extent_info as parameters that carry pure data.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 12 Feb, 2015 5 commits
    • f2fs: allocate data blocks in advance for f2fs_direct_IO · 59b802e5
      Committed by Jaegeuk Kim
      This patch adds preallocation of data blocks in preparation for f2fs_direct_IO.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: call set_buffer_new for get_block · da17eece
      Committed by Jaegeuk Kim
      This patch fixes the wrong handling of the buffer_new flag in get_block.
      If f2fs allocates new blocks and maps the buffer_head, it needs to set buffer_new
      for bh_result.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: merge {invalidate,release}page for meta/node/data pages · 487261f3
      Committed by Chao Yu
      This patch merges the ->{invalidate,release}page functions for meta/node/data pages.
      
      This removes duplicated code.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: keep PagePrivate during releasepage · f68daeeb
      Committed by Jaegeuk Kim
      If PagePrivate is removed by releasepage, f2fs loses count of dirty pages.
      
      e.g., try_to_release_page will not release a page when the page is dirty,
      but our releasepage removes PagePrivate.
      
          [<ffffffff81188d75>] try_to_release_page+0x35/0x50
          [<ffffffff811996f9>] invalidate_inode_pages2_range+0x2f9/0x3b0
          [<ffffffffa02a7f54>] ? truncate_blocks+0x384/0x4d0 [f2fs]
          [<ffffffffa02b7583>] ? f2fs_direct_IO+0x283/0x290 [f2fs]
          [<ffffffffa02b7fb0>] ? get_data_block_fiemap+0x20/0x20 [f2fs]
          [<ffffffff8118aa53>] generic_file_direct_write+0x163/0x170
          [<ffffffff8118ad06>] __generic_file_write_iter+0x2a6/0x350
          [<ffffffff8118adef>] generic_file_write_iter+0x3f/0xb0
          [<ffffffff81203081>] new_sync_write+0x81/0xb0
          [<ffffffff81203837>] vfs_write+0xb7/0x1f0
          [<ffffffff81204459>] SyS_write+0x49/0xb0
          [<ffffffff817c286d>] system_call_fastpath+0x16/0x1b
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: merge flags in struct f2fs_sb_info · caf0047e
      Committed by Chao Yu
      Currently, there are several Boolean-like variables, as below:
      
      struct f2fs_sb_info {
      ...
      	int s_dirty;
      	bool need_fsck;
      	bool s_closing;
      ...
      	bool por_doing;
      ...
      }
      
      This causes some issues:
      1. some space in f2fs_sb_info is wasted because the compiler adds alignment
         padding after the Boolean variables.
      2. if we keep adding new flags to f2fs_sb_info, the structure will become
         messy.
      
      So in this patch, we:
      1. switch s_dirty to a Boolean variable, since it has only two states, 0/1.
      2. merge the s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
      3. introduce an enum type that indicates the different states of sbi.
      4. use the newly introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag
         to operate on the sbi flags.
      
      After that, the above issues are fixed.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
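A minimal user-space sketch of the flag merging described in caf0047e is given below. The helper names `is_sbi_flag_set`, `set_sbi_flag`, `clear_sbi_flag` and the field name `s_flag` come from the commit message; the enum values, the struct name `sbi_sketch`, and the choice of `unsigned int` for the field are assumptions made for illustration, since the commit text does not list them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names: one bit position per former Boolean member. */
enum sbi_flag_sketch {
    SBI_SKETCH_IS_DIRTY,    /* replaces s_dirty */
    SBI_SKETCH_IS_CLOSE,    /* replaces s_closing */
    SBI_SKETCH_NEED_FSCK,   /* replaces need_fsck */
    SBI_SKETCH_POR_DOING,   /* replaces por_doing */
};

/* Stand-in for f2fs_sb_info: four separate members become one word. */
struct sbi_sketch {
    unsigned int s_flag;
};

bool is_sbi_flag_set(const struct sbi_sketch *sbi, unsigned int type)
{
    assert(sbi != NULL && type < 32);
    return (sbi->s_flag & (1u << type)) != 0;
}

void set_sbi_flag(struct sbi_sketch *sbi, unsigned int type)
{
    assert(sbi != NULL && type < 32);
    sbi->s_flag |= 1u << type;
}

void clear_sbi_flag(struct sbi_sketch *sbi, unsigned int type)
{
    assert(sbi != NULL && type < 32);
    sbi->s_flag &= ~(1u << type);
}
```

Adding a new state then means adding one enum value rather than one more struct member, which addresses both issues listed in the commit message.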
  4. 10 Jan, 2015 11 commits
  5. 02 Dec, 2014 1 commit
  6. 26 Nov, 2014 1 commit
    • f2fs: fix deadlock during inline_data conversion · 5f727395
      Committed by Jaegeuk Kim
      A deadlock can occur:
      Thread 1]                             Thread 2]
       - f2fs_write_data_pages              - f2fs_write_begin
         - lock_page(page #0)
                                              - grab_cache_page(page #X)
                                              - get_node_page(inode_page)
                                              - grab_cache_page(page #0)
                                                : to convert inline_data
         - f2fs_write_data_page
           - f2fs_write_inline_data
             - get_node_page(inode_page)
      
      In this case, the two threads lock inode_page and page #0 in opposite orders,
      which causes a deadlock. To avoid this, this patch adds a locking rule:
      page #0 must be locked first, followed by inode_page.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
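The locking rule from 5f727395 is an instance of fixed lock ordering. The sketch below shows the idea with two POSIX mutexes standing in for the page locks; it is not the kernel code, and `struct inode_locks_sketch`, its members, and `convert_inline_data` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct inode_locks_sketch {
    pthread_mutex_t page0_lock;       /* stands in for the lock on page #0 */
    pthread_mutex_t inode_page_lock;  /* stands in for the lock on inode_page */
    int converted;                    /* set once inline data has been converted */
};

/*
 * Every path that needs both locks takes page #0 first and inode_page second,
 * then releases them in reverse order. Because no path ever holds inode_page
 * while waiting for page #0, the circular wait in the diagram cannot form.
 * Returns 0 on success or the pthread error code.
 */
int convert_inline_data(struct inode_locks_sketch *il)
{
    int err;

    assert(il != NULL);

    err = pthread_mutex_lock(&il->page0_lock);
    if (err)
        return err;

    err = pthread_mutex_lock(&il->inode_page_lock);
    if (err) {
        pthread_mutex_unlock(&il->page0_lock);
        return err;
    }

    il->converted = 1;  /* work that needs both pages happens here */

    pthread_mutex_unlock(&il->inode_page_lock);
    pthread_mutex_unlock(&il->page0_lock);
    return 0;
}
```

The rule only helps if every caller follows it, which is why the commit states it as a policy rather than fixing a single call site.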
  7. 19 Nov, 2014 1 commit
  8. 05 Nov, 2014 2 commits
    • f2fs: avoid race condition in handling wait_io · 6a8f8ca5
      Committed by Jaegeuk Kim
      __submit_merged_bio    f2fs_write_end_io        f2fs_write_end_io
                             wait_io = X              wait_io = X
                             complete(X)              complete(X)
                             wait_io = NULL
      wait_for_completion()
      free(X)
                                                       spin_lock(X)
                                                       kernel panic
      
      In order to avoid this, this patch removes the wait_io facility.
      Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: revisit inline_data to avoid data races and potential bugs · b3d208f9
      Committed by Jaegeuk Kim
      This patch simplifies inline_data usage with the following rules.
      1. inline_data is set at file creation.
      2. If new data is to be written outside the range of inline_data,
       f2fs converts that inode permanently.
      3. A non-inline_data inode is never converted to inline_data.
      4. The inline_data flag must be changed under the inode page lock.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
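Rules 1 to 3 of b3d208f9 describe a one-way state change, which the following user-space sketch models. Everything here is hypothetical: `struct inode_sketch`, `inode_create`, `prepare_write`, and in particular `INLINE_CAPACITY_SKETCH`, which is an arbitrary number and not the real f2fs inline capacity. Rule 4 (changing the flag under the inode page lock) is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary illustrative limit; NOT the real f2fs inline data capacity. */
#define INLINE_CAPACITY_SKETCH 3400u

struct inode_sketch {
    bool inline_data;
};

/* Rule 1: inline_data is set when the file is created. */
void inode_create(struct inode_sketch *inode)
{
    assert(inode != NULL);
    inode->inline_data = true;
}

/*
 * Rules 2 and 3: a write that does not fit in the inline area converts the
 * inode permanently; nothing ever turns the flag back on.
 * Returns true if this call performed the conversion.
 */
bool prepare_write(struct inode_sketch *inode, size_t pos, size_t len)
{
    assert(inode != NULL);

    if (!inode->inline_data)
        return false;   /* rule 3: already converted, stays converted */

    /* Overflow-safe form of: pos + len <= INLINE_CAPACITY_SKETCH */
    if (len <= INLINE_CAPACITY_SKETCH && pos <= INLINE_CAPACITY_SKETCH - len)
        return false;   /* still fits inline, nothing to do */

    /* Rule 2. In the kernel this change is made under the inode page lock (rule 4). */
    inode->inline_data = false;
    return true;
}
```

Because the transition only goes one way, readers never have to handle an inode that flips back to inline_data after they checked the flag.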
  9. 04 Nov, 2014 2 commits
    • f2fs: fix possible data corruption in f2fs_write_begin() · 9234f319
      Committed by Jan Kara
      f2fs_write_begin() doesn't initialize the 'dn' variable if the inode has
      inline data. However it uses its contents to decide whether it should
      just zero out the page or load data to it. Thus if we are unlucky we can
      zero out page contents instead of loading inline data into a page.
      
      CC: stable@vger.kernel.org
      CC: Changman Lee <cm224.lee@samsung.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid to allocate when inline_data was written · 9ba69cf9
      Committed by Jaegeuk Kim
      The scenario is as follows.
      inline_data   i_size     page                 write_begin/vm_page_mkwrite
        X             30       dirty_page
        X             30                            write to #4096 position
        X             30       get_dnode_of_data    wait for get_dnode_of_data
        O             30       write inline_data
        O             30                            get_dnode_of_data
        O             30                            reserve data block
      ..
      
      In this case, block #0 is NEW_ADDR while inline_data is set as well.
      We must not allow this state to remain for further access.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>