1. 23 2月, 2016 11 次提交
    • C
      f2fs: split journal cache from curseg cache · b7ad7512
      Chao Yu 提交于
      In curseg cache, f2fs caches two different parts:
       - datas of current summay block, i.e. summary entries, footer info.
       - journal info, i.e. sparse nat/sit entries or io stat info.
      
      With this approach, 1) it may cause higher lock contention when we access
      or update both of the parts of cache since we use the same mutex lock
      curseg_mutex to protect the cache. 2) current summary block with last
      journal info will be writebacked into device as a normal summary block
      when flushing, however, we treat journal info as valid one only in current
      summary, so most normal summary blocks contain junk journal data, it wastes
      remaining space of summary block.
      
      So, in order to fix above issues, we split curseg cache into two parts:
      a) current summary block, protected by original mutex lock curseg_mutex
      b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
      
      When loading curseg cache during ->mount, we store summary info and
      journal info into different caches; When doing checkpoint, we combine
      datas of two cache into current summary block for persisting.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b7ad7512
    • C
      f2fs: enhance IO path with block plug · e9f5b8b8
      Chao Yu 提交于
      Try to use block plug in more place as below to let process cache bios
      as much as possbile, in order to reduce lock overhead of queue in IO
      scheduler.
      1) sync_meta_pages
      2) ra_meta_pages
      3) f2fs_balance_fs_bg
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e9f5b8b8
    • C
      f2fs: introduce f2fs_journal struct to wrap journal info · dfc08a12
      Chao Yu 提交于
      Introduce a new structure f2fs_journal to wrap journal info in struct
      f2fs_summary_block for readability.
      
      struct f2fs_journal {
      	union {
      		__le16 n_nats;
      		__le16 n_sits;
      	};
      	union {
      		struct nat_journal nat_j;
      		struct sit_journal sit_j;
      		struct f2fs_extra_info info;
      	};
      } __packed;
      
      struct f2fs_summary_block {
      	struct f2fs_summary entries[ENTRIES_IN_SUM];
      	struct f2fs_journal journal;
      	struct summary_footer footer;
      } __packed;
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      dfc08a12
    • C
      f2fs: support revoking atomic written pages · 28bc106b
      Chao Yu 提交于
      f2fs support atomic write with following semantics:
      1. open db file
      2. ioctl start atomic write
      3. (write db file) * n
      4. ioctl commit atomic write
      5. close db file
      
      With this flow we can avoid file becoming corrupted when abnormal power
      cut, because we hold data of transaction in referenced pages linked in
      inmem_pages list of inode, but without setting them dirty, so these data
      won't be persisted unless we commit them in step 4.
      
      But we should still hold journal db file in memory by using volatile
      write, because our semantics of 'atomic write support' is incomplete, in
      step 4, we could fail to submit all dirty data of transaction, once
      partial dirty data was committed in storage, then after a checkpoint &
      abnormal power-cut, db file will be corrupted forever.
      
      So this patch tries to improve atomic write flow by adding a revoking flow,
      once inner error occurs in committing, this gives another chance to try to
      revoke these partial submitted data of current transaction, it makes
      committing operation more like aotmical one.
      
      If we're not lucky, once revoking operation was failed, EAGAIN will be
      reported to user for suggesting doing the recovery with held journal file,
      or retrying current transaction again.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      28bc106b
    • C
      f2fs: split drop_inmem_pages from commit_inmem_pages · 29b96b54
      Chao Yu 提交于
      Split drop_inmem_pages from commit_inmem_pages for code readability,
      and prepare for the following modification.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      29b96b54
    • J
      f2fs: use correct errno · 60b286c4
      Jaegeuk Kim 提交于
      This patch is to fix misused error number.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      60b286c4
    • C
      f2fs: introduce f2fs_submit_merged_bio_cond · 0c3a5797
      Chao Yu 提交于
      f2fs use single bio buffer per type data (META/NODE/DATA) for caching
      writes locating in continuous block address as many as possible, after
      submitting, these writes may be still cached in bio buffer, so we have
      to flush cached writes in bio buffer by calling f2fs_submit_merged_bio.
      
      Unfortunately, in the scenario of high concurrency, bio buffer could be
      flushed by someone else before we submit it as below reasons:
      a) there is no space in bio buffer.
      b) add a request of different type (SYNC, ASYNC).
      c) add a discontinuous block address.
      
      For this condition, f2fs_submit_merged_bio will be devastating, because
      it could break the following merging of writes in bio buffer, split one
      big bio into two smaller one.
      
      This patch introduces f2fs_submit_merged_bio_cond which can do a
      conditional submitting with bio buffer, before submitting it will judge
      whether:
       - page in DATA type bio buffer is matching with specified page;
       - page in DATA type bio buffer is belong to specified inode;
       - page in NODE type bio buffer is belong to specified inode;
      If there is no eligible page in bio buffer, we will skip submitting step,
      result in gaining more chance to merge consecutive block IOs in bio cache.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0c3a5797
    • J
      f2fs: use wait_for_stable_page to avoid contention · fec1d657
      Jaegeuk Kim 提交于
      In write_begin, if storage supports stable_page, we don't need to wait for
      writeback to update its contents.
      This patch introduces to use wait_for_stable_page instead of
      wait_on_page_writeback.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fec1d657
    • C
      f2fs: correct search area in get_new_segment · 0ab14356
      Chao Yu 提交于
      get_new_segment starts from current segment position, tries to search a
      free segment among its right neighbors locate in same section.
      
      But previously our search area was set as [current segment, max segment],
      which means we have to search to more bits in free_segmap bitmap for some
      worse cases. So here we correct the search area to [current segment, last
      segment in section] to avoid unnecessary searching.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0ab14356
    • C
      f2fs: flush dirty nat entries when exceeding threshold · 7d768d2c
      Chao Yu 提交于
      When testing f2fs with xfstest, generic/251 is stuck for long time,
      the case uses below serials to obtain fresh released space in device,
      in order to prepare for following fstrim test.
      
      1. rm -rf /mnt/dir
      2. mkdir /mnt/dir/
      3. cp -axT `pwd`/ /mnt/dir/
      4. goto 1
      
      During preparing step, all nat entries will be cached in nat cache,
      most of them are dirty entries with invalid blkaddr, which means
      nodes related to these entries have been truncated, and they could
      be reused after the dirty entries been checkpointed.
      
      However, there was no checkpoint been triggered, so nid allocators
      (e.g. mkdir, creat) will run into long journey of iterating all NAT
      pages, looking for free nids in alloc_nid->build_free_nids.
      
      Here, in f2fs_balance_fs_bg we give another chance to do checkpoint
      to flush nat entries for reusing them in free nid cache when dirty
      entry count exceeds 10% of max count.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7d768d2c
    • C
      f2fs: relocate is_merged_page · 0fd785eb
      Chao Yu 提交于
      Operations in is_merged_page is related to inner bio cache, move it to
      data.c.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0fd785eb
  2. 12 1月, 2016 3 次提交
  3. 09 1月, 2016 1 次提交
  4. 31 12月, 2015 1 次提交
  5. 18 12月, 2015 1 次提交
    • C
      f2fs: support data flush in background · 36b35a0d
      Chao Yu 提交于
      Previously, when finishing a checkpoint, we have persisted all fs meta
      info including meta inode, node inode, dentry page of directory inode, so,
      after a sudden power cut, f2fs can recover from last checkpoint with full
      directory structure.
      
      But during checkpoint, we didn't flush dirty pages of regular and symlink
      inode, so such dirty datas still in memory will be lost in that moment of
      power off.
      
      In order to reduce the chance of lost data, this patch enables
      f2fs_balance_fs_bg with the ability of data flushing. It will try to flush
      user data before starting a checkpoint. So user's data written after last
      checkpoint which may not be fsynced could be saved.
      
      When we mount with data_flush option, after every period of cp_interval
      (could be configured in sysfs: /sys/fs/f2fs/device/cp_interval) seconds
      user data could be flushed into device once f2fs_balance_fs_bg was called
      in kworker thread or gc thread.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      36b35a0d
  6. 10 12月, 2015 1 次提交
  7. 05 12月, 2015 3 次提交
  8. 23 10月, 2015 1 次提交
  9. 22 10月, 2015 2 次提交
  10. 14 10月, 2015 1 次提交
    • C
      f2fs crypto: fix racing of accessing encrypted page among · 08b39fbd
      Chao Yu 提交于
       different competitors
      
      Since we use different page cache (normally inode's page cache for R/W
      and meta inode's page cache for GC) to cache the same physical block
      which is belong to an encrypted inode. Writeback of these two page
      cache should be exclusive, but now we didn't handle writeback state
      well, so there may be potential racing problem:
      
      a)
      kworker:				f2fs_gc:
       - f2fs_write_data_pages
        - f2fs_write_data_page
         - do_write_data_page
          - write_data_page
           - f2fs_submit_page_mbio
      (page#1 in inode's page cache was queued
      in f2fs bio cache, and be ready to write
      to new blkaddr)
      					 - gc_data_segment
      					  - move_encrypted_block
      					   - pagecache_get_page
      					(page#2 in meta inode's page cache
      					was cached with the invalid datas
      					of physical block located in new
      					blkaddr)
      					   - f2fs_submit_page_mbio
      					(page#1 was submitted, later, page#2
      					with invalid data will be submitted)
      
      b)
      f2fs_gc:
       - gc_data_segment
        - move_encrypted_block
         - f2fs_submit_page_mbio
      (page#1 in meta inode's page cache was
      queued in f2fs bio cache, and be ready
      to write to new blkaddr)
      					user thread:
      					 - f2fs_write_begin
      					  - f2fs_submit_page_bio
      					(we submit the request to block layer
      					to update page#2 in inode's page cache
      					with physical block located in new
      					blkaddr, so here we may read gabbage
      					data from new blkaddr since GC hasn't
      					writebacked the page#1 yet)
      
      This patch fixes above potential racing problem for encrypted inode.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      08b39fbd
  11. 13 10月, 2015 3 次提交
    • C
      f2fs: support lower priority asynchronous readahead in ra_meta_pages · 26879fb1
      Chao Yu 提交于
      Now, we use ra_meta_pages to reads continuous physical blocks as much as
      possible to improve performance of following reads. However, ra_meta_pages
      uses a synchronous readahead approach by submitting bio with READ, as READ
      is with high priority, it can not be used in the case of preloading blocks,
      and it's not sure when these RAed pages will be used.
      
      This patch supports asynchronous readahead in ra_meta_pages by tagging bio
      with READA flag in order to allow preloading.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      26879fb1
    • C
      f2fs: don't tag REQ_META for temporary non-meta pages · 2b947003
      Chao Yu 提交于
      In recovery or checkpoint flow, we grab pages temperarily in meta inode's
      mapping for caching temperary data, actually, datas in these pages were
      not meta data of f2fs, but still we tag them with REQ_META flag. However,
      lower device like eMMC may do some optimization for data of such type.
      So in order to avoid wrong optimization, we'd better remove such flag
      for temperary non-meta pages.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2b947003
    • J
      f2fs: fix SSA updates resulting in corruption · 6e2c64ad
      Jaegeuk Kim 提交于
      The f2fs_collapse_range and f2fs_insert_range changes the block addresses
      directly. But that can cause uncovered SSA updates.
      In that case, we need to give up to change the block addresses and do buffered
      writes to keep filesystem consistency.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6e2c64ad
  12. 10 10月, 2015 3 次提交
  13. 25 8月, 2015 1 次提交
  14. 21 8月, 2015 2 次提交
    • J
      f2fs: handle failed bio allocation · 740432f8
      Jaegeuk Kim 提交于
      As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the
      same time. So, we can't guarantee that bio is allocated all the time.
      
      "
       *   When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
       *   able to allocate a bio. This is due to the mempool guarantees. To make this
       *   work, callers must never allocate more than 1 bio at a time from this pool.
       *   Callers that need to allocate more than 1 bio must always submit the
       *   previously allocated bio for IO before attempting to allocate a new one.
       *   Failure to do so can cause deadlocks under memory pressure.
      "
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      740432f8
    • C
      f2fs: shrink free_nids entries · 31696580
      Chao Yu 提交于
      This patch introduces __count_free_nids/try_to_free_nids and registers
      them in slab shrinker for shrinking under memory pressure.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      31696580
  15. 15 8月, 2015 1 次提交
  16. 12 8月, 2015 1 次提交
    • C
      f2fs: remove inmem radix tree · decd36b6
      Chao Yu 提交于
      Previously, we use radix tree to index all registered page entries for
      atomic file, but now we only use radix tree to see whether current page
      is indexed or not, since the other user of radix tree is gone in commit
      042b7816 ("f2fs: remove unnecessary call to invalidate inmemory pages").
      
      So in this patch, we try to use one more efficient way:
      Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private
      value to indicate page indexing status. By using this way, we can save
      memory and lookup time.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      decd36b6
  17. 05 8月, 2015 4 次提交