1. 23 Feb, 2016 5 commits
    • C
      f2fs: introduce f2fs_submit_merged_bio_cond · 0c3a5797
      Chao Yu committed
      f2fs uses a single bio buffer per type of data (META/NODE/DATA) to cache
      writes to contiguous block addresses for as long as possible. After a
      write is submitted, it may still be cached in the bio buffer, so we have
      to flush the cached writes by calling f2fs_submit_merged_bio.
      
      Unfortunately, under high concurrency the bio buffer may already have
      been flushed by someone else before we submit it, for any of these reasons:
      a) there is no space left in the bio buffer.
      b) a request of a different type (SYNC, ASYNC) is added.
      c) a discontinuous block address is added.
      
      In that case, calling f2fs_submit_merged_bio is harmful, because it can
      break the subsequent merging of writes in the bio buffer and split one
      big bio into two smaller ones.
      
      This patch introduces f2fs_submit_merged_bio_cond, which submits the bio
      buffer conditionally. Before submitting, it checks whether:
       - a page in the DATA type bio buffer matches the specified page;
       - a page in the DATA type bio buffer belongs to the specified inode;
       - a page in the NODE type bio buffer belongs to the specified inode;
      If there is no eligible page in the bio buffer, the submitting step is
      skipped, which gives more chances to merge consecutive block IOs in the
      bio cache.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0c3a5797
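The conditional submit described in this commit could be sketched roughly as follows. This is a minimal user-space illustration, not the kernel code: `struct bio_buffer`, `struct page_ref`, `has_merged_page` and `submit_merged_bio_cond` are hypothetical stand-ins, and "submitting" is modelled by simply emptying the buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum page_type { DATA, NODE, META };

struct page_ref {
    unsigned long ino;   /* owning inode number */
    unsigned long index; /* page index within the inode */
};

#define BIO_BUF_MAX 8

struct bio_buffer {
    enum page_type type;
    size_t nr_pages;
    struct page_ref pages[BIO_BUF_MAX];
};

/*
 * Return true if the buffer caches something the caller cares about:
 * the exact page when `page` is non-NULL, otherwise any page of `ino`.
 */
static bool has_merged_page(const struct bio_buffer *buf,
                            unsigned long ino,
                            const struct page_ref *page)
{
    for (size_t i = 0; i < buf->nr_pages; i++) {
        const struct page_ref *p = &buf->pages[i];

        if (page) {
            if (p->ino == page->ino && p->index == page->index)
                return true;
        } else if (p->ino == ino) {
            return true;
        }
    }
    return false;
}

/* Submit only when an eligible page is cached; otherwise keep merging. */
static bool submit_merged_bio_cond(struct bio_buffer *buf,
                                   unsigned long ino,
                                   const struct page_ref *page)
{
    if (!has_merged_page(buf, ino, page))
        return false;   /* skip: leave the buffer intact for later merges */
    buf->nr_pages = 0;  /* stand-in for actually submitting the bio */
    return true;
}
```

With a buffer holding two pages of inode 10, a request for a page of inode 11 leaves the buffer untouched, while a request for inode 10 flushes it.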
    • J
      f2fs: use wait_for_stable_page to avoid contention · fec1d657
      Jaegeuk Kim committed
      In write_begin, if storage supports stable_page, we don't need to wait for
      writeback to update its contents.
      This patch uses wait_for_stable_page instead of
      wait_on_page_writeback.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      fec1d657
    • C
      f2fs: correct search area in get_new_segment · 0ab14356
      Chao Yu committed
      get_new_segment starts from the current segment position and tries to find a
      free segment among its right neighbors located in the same section.
      
      But previously the search area was set to [current segment, max segment],
      which means that in the worst cases we had to search more bits in the
      free_segmap bitmap than necessary. So here we correct the search area to
      [current segment, last segment in section] to avoid unnecessary searching.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0ab14356
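The narrowed search area can be illustrated with a small user-space sketch. It assumes a plain `bool` array in place of the free_segmap bitmap; `find_next_free` and `find_free_in_section` are hypothetical helpers, not the kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return the first free (false) slot in [start, end), or `end` if none. */
static size_t find_next_free(const bool *in_use, size_t end, size_t start)
{
    for (size_t i = start; i < end; i++)
        if (!in_use[i])
            return i;
    return end;
}

/*
 * Look for a free segment to the right of `cur`, but only inside the
 * section that contains `cur`. Returns the segment number, or
 * `total_segs` when that section has no free segment to the right.
 */
static size_t find_free_in_section(const bool *in_use, size_t total_segs,
                                   size_t segs_per_sec, size_t cur)
{
    /* One past the last segment of the current section. */
    size_t sec_end = (cur / segs_per_sec + 1) * segs_per_sec;

    if (sec_end > total_segs)
        sec_end = total_segs;

    size_t seg = find_next_free(in_use, sec_end, cur + 1);
    return seg < sec_end ? seg : total_segs;
}
```

Bounding the scan at `sec_end` rather than `total_segs` means a full section is rejected after at most `segs_per_sec` checks, instead of scanning the rest of the map.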
    • C
      f2fs: flush dirty nat entries when exceeding threshold · 7d768d2c
      Chao Yu committed
      When testing f2fs with xfstests, generic/251 gets stuck for a long time.
      The test case uses the sequence below to obtain freshly released space on
      the device, in preparation for the following fstrim test.
      
      1. rm -rf /mnt/dir
      2. mkdir /mnt/dir/
      3. cp -axT `pwd`/ /mnt/dir/
      4. goto 1
      
      During the preparation step, all nat entries are cached in the nat cache.
      Most of them are dirty entries with an invalid blkaddr, which means the
      nodes related to these entries have been truncated, and they can be
      reused after the dirty entries have been checkpointed.
      
      However, no checkpoint was triggered, so nid allocators (e.g. mkdir,
      creat) run into a long journey of iterating over all NAT pages, looking
      for free nids in alloc_nid->build_free_nids.
      
      Here, in f2fs_balance_fs_bg, we give another chance to do a checkpoint
      that flushes nat entries, so they can be reused in the free nid cache,
      when the dirty entry count exceeds 10% of the max count.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      7d768d2c
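The threshold test itself is simple arithmetic. A minimal sketch, assuming the 10% figure from the message above; the macro and function names are hypothetical and the real check in the kernel may be written differently:

```c
#include <stdbool.h>

/* Assumed ratio, taken from the "10% of max count" wording above. */
#define DIRTY_NAT_RATIO_PERCENT 10UL

/* True when the dirty NAT entry count exceeds the given share of the max. */
static bool exceeds_dirty_nat_threshold(unsigned long dirty_cnt,
                                        unsigned long max_cnt)
{
    return dirty_cnt > max_cnt * DIRTY_NAT_RATIO_PERCENT / 100UL;
}
```

For a maximum of 1000 entries the threshold is 100, so 100 dirty entries do not trigger a checkpoint and 101 do.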
    • C
      f2fs: relocate is_merged_page · 0fd785eb
      Chao Yu committed
      Operations in is_merged_page are related to the inner bio cache, so move it to
      data.c.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0fd785eb
  2. 12 Jan, 2016 3 commits
  3. 09 Jan, 2016 1 commit
  4. 31 Dec, 2015 1 commit
  5. 18 Dec, 2015 1 commit
    • C
      f2fs: support data flush in background · 36b35a0d
      Chao Yu committed
      Previously, when finishing a checkpoint, we persisted all fs meta
      info, including the meta inode, node inode, and dentry pages of directory
      inodes, so after a sudden power cut f2fs can recover from the last
      checkpoint with the full directory structure.
      
      But during checkpoint, we didn't flush dirty pages of regular and symlink
      inodes, so such dirty data still in memory is lost at the moment of
      power-off.
      
      In order to reduce the chance of losing data, this patch gives
      f2fs_balance_fs_bg the ability to flush data. It tries to flush user
      data before starting a checkpoint, so user data written after the last
      checkpoint, which may not have been fsynced, can be saved.
      
      When mounted with the data_flush option, after every period of cp_interval
      seconds (configurable in sysfs: /sys/fs/f2fs/device/cp_interval), user
      data can be flushed to the device once f2fs_balance_fs_bg is called
      in a kworker thread or the gc thread.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      36b35a0d
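The interval-gated flush decision might look like the following sketch. It only models the timing rule described above (flush at most once per cp_interval, and only when data_flush is enabled); `struct flush_state` and `should_flush_data` are hypothetical names, and timestamps are plain seconds supplied by the caller.

```c
#include <stdbool.h>

/* Hypothetical per-filesystem state for the background data flush. */
struct flush_state {
    bool data_flush;           /* mounted with the data_flush option */
    unsigned long cp_interval; /* interval in seconds */
    unsigned long last_flush;  /* time of the last flush, in seconds */
};

/*
 * Decide whether a background pass at time `now` should flush user data.
 * Records the flush time when it answers true.
 */
static bool should_flush_data(struct flush_state *st, unsigned long now)
{
    if (!st->data_flush)
        return false;
    if (now - st->last_flush < st->cp_interval)
        return false;   /* interval has not elapsed yet */
    st->last_flush = now;
    return true;
}
```

With a 60 second interval, a pass at t=30 does nothing, a pass at t=60 flushes, and the next flush cannot happen before t=120.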
  6. 10 Dec, 2015 1 commit
  7. 05 Dec, 2015 3 commits
  8. 23 Oct, 2015 1 commit
  9. 22 Oct, 2015 2 commits
  10. 14 Oct, 2015 1 commit
    • C
      f2fs crypto: fix racing of accessing encrypted page among different competitors · 08b39fbd
      Chao Yu committed
      
      We use different page caches (normally the inode's page cache for R/W
      and the meta inode's page cache for GC) to cache the same physical block
      that belongs to an encrypted inode. Writeback of these two page
      caches should be exclusive, but currently we don't handle the writeback
      state well, so there are potential races:
      
      a)
      kworker:				f2fs_gc:
       - f2fs_write_data_pages
        - f2fs_write_data_page
         - do_write_data_page
          - write_data_page
           - f2fs_submit_page_mbio
      (page#1 in inode's page cache was queued
      in f2fs bio cache, and be ready to write
      to new blkaddr)
      					 - gc_data_segment
      					  - move_encrypted_block
      					   - pagecache_get_page
      					(page#2 in meta inode's page cache
      					was cached with the invalid data
      					of physical block located in new
      					blkaddr)
      					   - f2fs_submit_page_mbio
      					(page#1 was submitted, later, page#2
      					with invalid data will be submitted)
      
      b)
      f2fs_gc:
       - gc_data_segment
        - move_encrypted_block
         - f2fs_submit_page_mbio
      (page#1 in meta inode's page cache was
      queued in f2fs bio cache, and be ready
      to write to new blkaddr)
      					user thread:
      					 - f2fs_write_begin
      					  - f2fs_submit_page_bio
      					(we submit the request to block layer
      					to update page#2 in inode's page cache
      					with physical block located in new
      					blkaddr, so here we may read garbage
      					data from new blkaddr since GC hasn't
      					written back page#1 yet)
      
      This patch fixes the above potential races for encrypted inodes.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      08b39fbd
  11. 13 Oct, 2015 3 commits
    • C
      f2fs: support lower priority asynchronous readahead in ra_meta_pages · 26879fb1
      Chao Yu committed
      Currently, we use ra_meta_pages to read as many contiguous physical blocks
      as possible to improve the performance of subsequent reads. However,
      ra_meta_pages uses a synchronous readahead approach by submitting bios
      with READ. Since READ has high priority, it is not suitable for preloading
      blocks when it is not certain when these readahead pages will be used.
      
      This patch supports asynchronous readahead in ra_meta_pages by tagging bios
      with the READA flag in order to allow preloading.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      26879fb1
    • C
      f2fs: don't tag REQ_META for temporary non-meta pages · 2b947003
      Chao Yu committed
      In the recovery or checkpoint flow, we temporarily grab pages in the meta
      inode's mapping to cache temporary data. The data in these pages is
      not f2fs metadata, but we still tag them with the REQ_META flag. However,
      lower devices such as eMMC may apply some optimization to data of this type.
      So, in order to avoid wrong optimization, we'd better remove this flag
      for temporary non-meta pages.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      2b947003
    • J
      f2fs: fix SSA updates resulting in corruption · 6e2c64ad
      Jaegeuk Kim committed
      f2fs_collapse_range and f2fs_insert_range change the block addresses
      directly. But that can cause uncovered SSA updates.
      In that case, we need to give up changing the block addresses and do buffered
      writes to keep the filesystem consistent.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      6e2c64ad
  12. 10 Oct, 2015 3 commits
  13. 25 Aug, 2015 1 commit
  14. 21 Aug, 2015 2 commits
    • J
      f2fs: handle failed bio allocation · 740432f8
      Jaegeuk Kim committed
      f2fs can allocate multiple bios at the same time. So, given the comment on
      bio_alloc_bioset quoted below, we can't guarantee that a bio is always allocated.
      
      "
       *   When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
       *   able to allocate a bio. This is due to the mempool guarantees. To make this
       *   work, callers must never allocate more than 1 bio at a time from this pool.
       *   Callers that need to allocate more than 1 bio must always submit the
       *   previously allocated bio for IO before attempting to allocate a new one.
       *   Failure to do so can cause deadlocks under memory pressure.
      "
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      740432f8
    • C
      f2fs: shrink free_nids entries · 31696580
      Chao Yu committed
      This patch introduces __count_free_nids/try_to_free_nids and registers
      them in the slab shrinker so that free nids can be shrunk under memory pressure.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      31696580
  15. 15 Aug, 2015 1 commit
  16. 12 Aug, 2015 1 commit
    • C
      f2fs: remove inmem radix tree · decd36b6
      Chao Yu committed
      Previously, we used a radix tree to index all registered page entries for
      atomic files, but now we only use the radix tree to see whether the current
      page is indexed or not, since the other user of the radix tree was removed in
      commit 042b7816 ("f2fs: remove unnecessary call to invalidate inmemory pages").
      
      So in this patch, we use a more efficient way:
      introduce a macro ATOMIC_WRITTEN_PAGE and set it as the page private
      value to indicate the page indexing status. This way, we save
      memory and lookup time.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      decd36b6
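The idea of replacing an index lookup with a marker stored on the page can be sketched as below. `struct fake_page` is a hypothetical stand-in for the kernel's page structure, the helper names are made up for illustration, and the sentinel value is an assumption rather than a quote from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a page with a private field. */
struct fake_page {
    uintptr_t private_data;
};

/* Assumed sentinel value marking a page registered for atomic write. */
#define ATOMIC_WRITTEN_PAGE ((uintptr_t)0x0000ffff)

/* Mark the page as registered; replaces inserting it into an index. */
static void mark_atomic_written(struct fake_page *p)
{
    p->private_data = ATOMIC_WRITTEN_PAGE;
}

/* O(1) check; replaces looking the page up in an index. */
static bool is_atomic_written(const struct fake_page *p)
{
    return p->private_data == ATOMIC_WRITTEN_PAGE;
}

/* Drop the marker when the page is committed or discarded. */
static void clear_atomic_written(struct fake_page *p)
{
    p->private_data = 0;
}
```

The status check becomes a single comparison on the page itself, with no tree nodes to allocate or walk.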
  17. 05 Aug, 2015 4 commits
  18. 25 Jul, 2015 1 commit
    • J
      f2fs: call set_page_dirty to attach i_wb for cgroup · 6282adbf
      Jaegeuk Kim committed
      The cgroup attaches inode->i_wb via mark_inode_dirty, and when set_page_writeback
      is called, __inc_wb_stat() updates i_wb's stat.
      
      So, we need to explicitly call set_page_dirty->__mark_inode_dirty prior to
      writing back any pages.
      
      This patch should resolve the following kernel panic reported by Andreas Reis.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=101801
      
      --- Comment #2 from Andreas Reis <andreas.reis@gmail.com> ---
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
      PGD 2951ff067 PUD 2df43f067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 7 PID: 10356 Comm: gcc Tainted: G        W       4.2.0-1-cu #1
      Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS
      T01 02/03/2015
      task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000
      RIP: 0010:[<ffffffff8149deea>]  [<ffffffff8149deea>]
      __percpu_counter_add+0x1a/0x90
      RSP: 0018:ffff880295143ac8  EFLAGS: 00010082
      RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001
      RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
      RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30
      R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
      R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0
      FS:  00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Stack:
       0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750
       ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296
       0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40
      Call Trace:
       [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
       [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
       [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
       [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
       [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
       [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
       [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
       [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
       [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
       [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
       [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
      Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49
      89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca
      65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a
      RIP  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
       RSP <ffff880295143ac8>
      CR2: 00000000000000a8
      ---[ end trace 5132449a58ed93a3 ]---
      note: gcc[10356] exited with preempt_count 2
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      6282adbf
  19. 03 Jun, 2015 2 commits
  20. 02 Jun, 2015 3 commits