  1. 19 Aug, 2013 1 commit
  2. 12 Aug, 2013 1 commit
  3. 06 Aug, 2013 1 commit
    • f2fs: fix a deadlock in fsync · a569469e
      Committed by Jin Xu
      This patch fixes a deadlock that occurs quite often when there are
      concurrent writes and fsyncs on the same file.
      
      Following is the simplified call trace when tasks get hung.
      
      fsync thread:
      - f2fs_sync_file
       ...
       - f2fs_write_data_pages
       ...
        - update_extent_cache
        ...
         - update_inode
          - wait_on_page_writeback
      
      bdi writeback thread:
      - __writeback_single_inode
       - f2fs_write_data_pages
        - mutex_lock(sbi->writepages)
      
      The deadlock happens when the fsync thread waits on an inode page that has
      been added to f2fs's cached bio, sbi->bio[NODE], while nobody else is able
      to submit that cached bio to the block layer for writeback. The fsync thread
      already holds sbi->fs_lock and the sbi->writepages lock, so the bdi thread
      blocks when it attempts to write data pages for the same inode. Meanwhile,
      the f2fs_gc thread does not notice the situation and cannot help. Even the
      sync syscall gets blocked.
      
      To fix it, submit the cached bio first, before waiting on an inode page
      that is under writeback.
      Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
      [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
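The idea behind the fix above can be pictured with a small single-threaded model. This is only a sketch under stated assumptions: `struct cached_bio`, `cache_page()`, `submit_cached_bio()` and `wait_on_page_writeback_fixed()` are hypothetical names invented for illustration, not the kernel's actual code. A page queued in the cached bio stays under writeback until somebody submits the bio, so the waiter submits it first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CACHED 8

/* Hypothetical stand-in for a page: only the writeback state is modeled. */
struct page {
	bool writeback;
};

/* Hypothetical stand-in for the cached bio (sbi->bio[NODE] in the message). */
struct cached_bio {
	struct page *pages[MAX_CACHED];
	size_t nr;
};

/* Queue a page: it is now under writeback, but no I/O has been issued yet. */
static void cache_page(struct cached_bio *bio, struct page *page)
{
	assert(bio->nr < MAX_CACHED);
	page->writeback = true;
	bio->pages[bio->nr++] = page;
}

/* Model submission plus completion: every queued page leaves writeback. */
static void submit_cached_bio(struct cached_bio *bio)
{
	for (size_t i = 0; i < bio->nr; i++)
		bio->pages[i]->writeback = false;
	bio->nr = 0;
}

/*
 * Fixed wait: flush the cached bio first, then check the page. Returns
 * false if the page is still under writeback, i.e. a real wait would hang.
 */
static bool wait_on_page_writeback_fixed(struct cached_bio *bio,
					 struct page *page)
{
	if (page->writeback)
		submit_cached_bio(bio);
	return !page->writeback;
}
```

Without the `submit_cached_bio()` call in the wait path, the model's page would remain under writeback forever, which corresponds to the hang in the call trace.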
  4. 30 Jul, 2013 1 commit
  5. 02 Jul, 2013 2 commits
    • f2fs: remove reusing any prefree segments · 763bfe1b
      Committed by Jaegeuk Kim
      This patch removes check_prefree_segments, which was initially designed to
      enhance performance by narrowing the range of LBA usage across the whole
      block device.
      
      When allocating a new segment, the previous f2fs tried to find suitable
      prefree segments and, if it found one, reused that segment for further
      data or node block allocation.
      
      However, I found that this was a totally wrong approach, since the prefree
      segments still hold data or node blocks that will be used by the
      roll-forward mechanism that runs after a sudden power-off.
      
      Let's assume the following scenario.
      
      /* write 8MB with fsync */
      for (i = 0; i < 2048; i++) {
      	offset = i * 4096;
      	pwrite(fd, buf, 4096, offset);	/* write 4KB at offset */
      	fsync(fd);
      }
      
      In this case, the naive segment allocation sequence will be:
       data segment: x, x+1, x+2, x+3
       node segment: y, y+1, y+2, y+3.
      
      But if we can reuse prefree segments, the sequence can be:
       data segment: x, x+1, y, y+1
       node segment: y, y+1, y+2, y+3.
      This is because y, y+1, and y+2 became prefree segments one by one, and
      they were reused by data allocation.
      
      After running this workload, we should consider how to recover the latest
      inode with its data.
      If we reuse prefree segments such as y or y+1, we lose the old node blocks,
      so f2fs cannot even start roll-forward recovery.
      
      Therefore, I suggest that we remove the reuse of prefree segments.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: optimize the init_dirty_segmap function · 8736fbf0
      Committed by Namjae Jeon
      Optimize the while loop condition.
      
      The loop condition is always true, and the loop is terminated by the
      following check in the code:
      
      if (segno >= TOTAL_SEGS(sbi))
          break;
      
      Hence we can replace the loop condition with while (1) instead of checking
      every time that segno is less than the total number of segments.
      
      Also, we do not need to call TOTAL_SEGS() every time. Since the value is
      constant, we can store it in a local variable.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
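The loop shape described in this commit can be sketched in plain C as follows. It is an illustration only: `find_next_dirty()` and `count_dirty_segments()` are hypothetical helpers standing in for the kernel's bitmap search and the real function body, and `total_segs` plays the role of the cached TOTAL_SEGS(sbi) value.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: return the first index >= start whose entry is true,
 * or total if there is none (similar to how a find-next-bit search behaves).
 */
static unsigned int find_next_dirty(const bool *dirty, unsigned int total,
				    unsigned int start)
{
	while (start < total && !dirty[start])
		start++;
	return start;
}

/* Count dirty segments using the while (1) loop with a single exit check. */
static unsigned int count_dirty_segments(const bool *dirty,
					 unsigned int total_segs)
{
	unsigned int segno = 0, count = 0;

	assert(dirty);
	while (1) {
		segno = find_next_dirty(dirty, total_segs, segno);
		if (segno >= total_segs)
			break;	/* the only exit from the loop */
		count++;
		segno++;
	}
	return count;
}
```

Because the search helper already returns the total when nothing is left, a second bound check in the loop condition would be redundant, which is the point the commit makes.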
  6. 14 Jun, 2013 3 commits
  7. 28 May, 2013 2 commits
  8. 30 Apr, 2013 1 commit
    • f2fs: modify the number of issued pages to merge IOs · ac5d156c
      Committed by Jaegeuk Kim
      When testing f2fs on an SSD, I found that f2fs_write_node_pages issued some
      128-page IOs, each followed by a 1-page IO.
      This means that some flows were mishandled, which degrades performance.
      
      The previous f2fs_write_node_pages determined the number of pages to be
      written, nr_to_write, as follows.
      
      1. bio_get_nr_vecs returns 129 pages.
      2. bio_alloc makes room for 128 pages.
      3. The initial 128 pages go into one bio.
      4. The existing bio is submitted, and a new bio is prepared for the last page.
      5. Finally, sync_node_pages submits the last 1-page bio.
      
      The problem comes from the use of bio_get_nr_vecs, so this patch replaces it
      with max_hw_blocks, which is computed using queue_max_sectors.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
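The arithmetic behind steps 1 to 5 can be checked with a short sketch. The helper names `bios_needed()` and `last_bio_pages()` are hypothetical; the 129 and 128 figures come from the commit message. The sketch only shows why a request one page larger than the bio capacity produces a trailing 1-page IO.

```c
#include <assert.h>

/*
 * Number of bios needed to write nr_pages when each bio holds at most
 * bio_capacity pages (ceiling division).
 */
static unsigned int bios_needed(unsigned int nr_pages,
				unsigned int bio_capacity)
{
	assert(bio_capacity > 0);
	return (nr_pages + bio_capacity - 1) / bio_capacity;
}

/* Pages in the final bio; equals bio_capacity when it divides evenly. */
static unsigned int last_bio_pages(unsigned int nr_pages,
				   unsigned int bio_capacity)
{
	unsigned int rem;

	assert(bio_capacity > 0 && nr_pages > 0);
	rem = nr_pages % bio_capacity;
	return rem ? rem : bio_capacity;
}
```

With nr_to_write at 129 and a capacity of 128, two bios are needed and the last one carries a single page. When the request size matches the capacity, one full bio suffices.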
  9. 26 Apr, 2013 1 commit
  10. 23 Apr, 2013 1 commit
  11. 03 Apr, 2013 5 commits
    • f2fs: fix the bitmap consistency of dirty segments · b2f2c390
      Committed by Jaegeuk Kim
      As shown below, there are 8 segment bitmaps for SSR victim candidates.
      
      enum dirty_type {
      	DIRTY_HOT_DATA,		/* dirty segments assigned as hot data logs */
      	DIRTY_WARM_DATA,	/* dirty segments assigned as warm data logs */
      	DIRTY_COLD_DATA,	/* dirty segments assigned as cold data logs */
      	DIRTY_HOT_NODE,		/* dirty segments assigned as hot node logs */
      	DIRTY_WARM_NODE,	/* dirty segments assigned as warm node logs */
      	DIRTY_COLD_NODE,	/* dirty segments assigned as cold node logs */
      	DIRTY,			/* to count # of dirty segments */
      	PRE,			/* to count # of entirely obsolete segments */
      	NR_DIRTY_TYPE
      };
      
      The upper 6 bitmaps indicate the segments dirtied by the respective active
      log areas, and the DIRTY bitmap integrates all 6 of them.
      
      For example,
       o DIRTY_HOT_DATA : 1010000
       o DIRTY_WARM_DATA: 0100000
       o DIRTY_COLD_DATA: 0001000
       o DIRTY_HOT_NODE : 0000010
       o DIRTY_WARM_NODE: 0000001
       o DIRTY_COLD_NODE: 0000000
      In this case,
       o DIRTY          : 1111011,
      
      which means that we must guarantee strict consistency between DIRTY and
      the other bitmaps.
      
      However, the SSR mode selects victims freely from any log type, which can
      set multiple bits across the various bitmap types.
      
      So, this patch eliminates this inconsistency.
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
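The invariant described in this commit can be expressed as a small check. This is a sketch, not the patch itself: `dirty_bitmaps_consistent()` is a hypothetical function, each bitmap is simplified to a single word, and the reading that a segment should appear in at most one of the six log-type bitmaps is an interpretation of the message.

```c
#include <stdbool.h>

enum dirty_type {
	DIRTY_HOT_DATA, DIRTY_WARM_DATA, DIRTY_COLD_DATA,
	DIRTY_HOT_NODE, DIRTY_WARM_NODE, DIRTY_COLD_NODE,
	DIRTY, PRE, NR_DIRTY_TYPE
};

/* One word per bitmap: each bit stands for one segment (a simplification). */
static bool dirty_bitmaps_consistent(const unsigned long map[NR_DIRTY_TYPE])
{
	unsigned long seen = 0;

	for (int t = DIRTY_HOT_DATA; t <= DIRTY_COLD_NODE; t++) {
		if (map[t] & seen)
			return false;	/* segment set in two log-type bitmaps */
		seen |= map[t];
	}
	return seen == map[DIRTY];	/* DIRTY must be the union of the six */
}
```

Feeding in the example from the message (1010000, 0100000, 0001000, 0000010, 0000001, 0000000) gives a union of 1111011, which matches the DIRTY bitmap shown there.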
    • f2fs: allocate remained free segments in the LFS mode · 60374688
      Committed by Jaegeuk Kim
      This patch adds a new condition that allocates free segments in the current
      active section even if SSR is needed.
      Otherwise, f2fs cannot allocate the remaining free segments in the section,
      since SSR finds dirty segments only.
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: change GC bitmaps to apply the section granularity · 5ec4e49f
      Committed by Jaegeuk Kim
      This patch removes the bitmap for victim segments selected by foreground GC,
      and modifies the other bitmap, for victim segments selected by background GC.
      
      1) foreground GC bitmap
       : We don't need to manage this, since we only need the single previous
         victim section number rather than the whole victim history.
         f2fs uses the victim section number so that the section currently being
         GC'ed is not allocated to the current active logs.
      
      2) background GC bitmap
       : This bitmap is used to avoid selecting the same victims repeatedly in
         background GC.
         In addition, these victims can still be selected by foreground GC, since
         there is no need to read the victim blocks during foreground GC.
      
         Since foreground GC reclaims segments in units of a section, it is
         better to manage this bitmap at section granularity.
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
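A minimal sketch of tracking background-GC victims per section rather than per segment might look like this. All names here (`sec_from_seg()`, `mark_victim()`, `is_victim()`, and the `victim_secmap` parameter) are hypothetical, and the number of segments per section is passed in as an assumed layout parameter.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Map a segment number to its section; segs_per_sec is a layout parameter. */
static unsigned int sec_from_seg(unsigned int segno, unsigned int segs_per_sec)
{
	assert(segs_per_sec > 0);
	return segno / segs_per_sec;
}

/* Record a background-GC victim by section rather than by segment. */
static void mark_victim(unsigned long *victim_secmap, unsigned int segno,
			unsigned int segs_per_sec)
{
	unsigned int secno = sec_from_seg(segno, segs_per_sec);

	victim_secmap[secno / BITS_PER_WORD] |= 1UL << (secno % BITS_PER_WORD);
}

/* True if the section holding segno was already chosen as a victim. */
static bool is_victim(const unsigned long *victim_secmap, unsigned int segno,
		      unsigned int segs_per_sec)
{
	unsigned int secno = sec_from_seg(segno, segs_per_sec);

	return victim_secmap[secno / BITS_PER_WORD] &
	       (1UL << (secno % BITS_PER_WORD));
}
```

With 4 segments per section, marking segment 9 covers segments 8 through 11, so one bit per section is enough to keep the whole section from being picked again.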
    • f2fs: allocate new segment aligned with sections · 33afa7fd
      Committed by Jaegeuk Kim
      When allocating a new segment under the LFS mode, we should keep the section
      boundary.
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: introduce TOTAL_SECS macro · 53cf9522
      Committed by Jaegeuk Kim
      Let's use a macro to get the total number of sections.
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
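As a rough illustration of such a macro and of section-aligned allocation, consider the sketch below. The `struct sb_info` type, its fields, and `section_start()` are hypothetical, and the macro body is an assumption: the real TOTAL_SECS may simply read a stored field rather than divide.

```c
/* Hypothetical, pared-down superblock info for illustration only. */
struct sb_info {
	unsigned int total_segs;	/* total number of segments */
	unsigned int segs_per_sec;	/* segments in one section */
};

/* Assumed definition: sections = total segments / segments per section. */
#define TOTAL_SECS(sbi) ((sbi)->total_segs / (sbi)->segs_per_sec)

/* First segment of the section that contains segno (section boundary). */
static unsigned int section_start(const struct sb_info *sbi,
				  unsigned int segno)
{
	return segno - segno % sbi->segs_per_sec;
}
```

For a layout of 1024 segments with 4 segments per section, this yields 256 sections, and segment 10 belongs to the section that starts at segment 8.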
  12. 31 Mar, 2013 1 commit
  13. 12 Feb, 2013 2 commits
    • f2fs: clarify and enhance the f2fs_gc flow · 43727527
      Committed by Jaegeuk Kim
      This patch clarifies the ambiguous f2fs_gc flow as follows.
      
      1. Remove intermediate checkpoint condition during f2fs_gc
       (i.e., should_do_checkpoint() and GC_BLOCKED)
      
      2. Remove unnecessary return values of f2fs_gc because of #1.
       (i.e., GC_NODE, GC_OK, etc)
      
      3. Simplify write_checkpoint() because of #2.
      
      4. Clarify the main f2fs_gc flow.
       o monitor how many sections are freed during one iteration of do_garbage_collect().
       o do more GC without checkpoints if we can't get enough free sections.
       o do a checkpoint once we've got enough free sections through foreground GCs.
      
      5. Adopt the thread-logging (Slack-Space-Recycle) scheme more aggressively on
        data log types. See get_ssr_segment().
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
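The control flow in item 4 can be modeled roughly as below. This is a sketch under stated assumptions: `struct gc_state` and `gc_until_enough()` are hypothetical, and a fixed number of sections reclaimed per pass stands in for one iteration of do_garbage_collect().

```c
#include <stdbool.h>

/* Hypothetical state: free sections now, and what each GC pass reclaims. */
struct gc_state {
	unsigned int free_sections;
	unsigned int reclaim_per_pass;	/* 0 models "no victim left" */
	unsigned int checkpoints;
};

/*
 * Foreground-GC loop sketch: keep collecting without intermediate
 * checkpoints, then checkpoint once after enough sections are free.
 * Returns false if GC cannot make progress.
 */
static bool gc_until_enough(struct gc_state *st, unsigned int needed)
{
	unsigned int passes = 0;

	while (st->free_sections < needed) {
		if (st->reclaim_per_pass == 0)
			return false;	/* nothing left to reclaim */
		st->free_sections += st->reclaim_per_pass;
		passes++;
	}
	if (passes > 0)
		st->checkpoints++;	/* one checkpoint at the end */
	return true;
}
```

Starting from 1 free section, reclaiming 2 per pass, and needing 6, the loop runs three passes and performs a single checkpoint at the end.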
    • f2fs: prevent checkpoint once any IO failure is detected · 577e3495
      Committed by Jaegeuk Kim
      This patch enhances the checkpoint routine to cope with IO errors.
      
      Basically, f2fs detects IO errors in end_io_write, and the errors can occur
      during any of the data, node, and meta page writes.
      
      In the previous code, when an IO error occurs during writes, f2fs sets a
      flag, CP_ERROR_FLAG, in the raw checkpoint buffer, which will be written to
      disk. Afterwards, write_checkpoint() checks the flag and remounts f2fs in
      read-only (ro) mode.
      
      However, even once f2fs is remounted in ro mode, dirty checkpoint pages can
      still be written to disk freely by the flusher or kswapd in the background.
      In such a case, after a cold reboot, f2fs would restore the checkpoint data
      containing CP_ERROR_FLAG, which disables write_checkpoint and remounts f2fs
      in ro mode again.
      
      Therefore, let's prevent any checkpoint page (meta) writes once an IO error
      occurs, and remount f2fs in ro mode right away at that moment.
      Reported-by: Oliver Winker <oliver@oli1170.net>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
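The behaviour after the change can be modeled in a few lines. The sketch below is illustrative only: `struct fs_state`, `note_write_error()` and `write_meta_page()` are hypothetical names, and the model ignores everything except the error flag and the read-only switch.

```c
#include <stdbool.h>

/* Hypothetical filesystem state reduced to what the message discusses. */
struct fs_state {
	bool io_error;		/* set by the write-completion path */
	bool read_only;
	unsigned int meta_writes;
};

/* Called when a data, node, or meta write reports an error. */
static void note_write_error(struct fs_state *fs)
{
	fs->io_error = true;
	fs->read_only = true;	/* go read-only right away */
}

/* Refuse checkpoint (meta) page writes once an error has been seen. */
static bool write_meta_page(struct fs_state *fs)
{
	if (fs->io_error)
		return false;
	fs->meta_writes++;
	return true;
}
```

Because the refusal happens in the meta write path itself, a background flusher that still holds dirty checkpoint pages cannot push a checkpoint carrying the error flag to disk.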
  14. 10 Jan, 2013 1 commit
    • f2fs: revisit the f2fs_gc flow · 408e9375
      Committed by Jaegeuk Kim
      I'd like to revisit the f2fs_gc flow and rewrite it as follows.
      
      1. In practice, the nGC parameter of f2fs_gc is meaningless. So, let's
        remove it.
      2. Background GC marks victim blocks as dirty one at a time.
      3. Foreground GC should keep cleaning until it acquires enough free
        sections. Afterwards, it needs to do a checkpoint.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  15. 28 Dec, 2012 4 commits
  16. 26 Dec, 2012 1 commit
    • f2fs: remove set_page_dirty for atomic f2fs_end_io_write · dfb7c0ce
      Committed by Jaegeuk Kim
      We should guarantee not to do *scheduling while atomic*.
      I found, in atomic f2fs_end_io_write(), there is a set_page_dirty() call
      to deal with IO errors.
      
      But, set_page_dirty() calls:
       -> f2fs_set_data_page_dirty()
         -> set_dirty_dir_page()
            -> cond_resched() which results in scheduling.
      
      To avoid this, I'd like to simply remove set_page_dirty(), since the page
      is already marked as ERROR and f2fs will operate in read-only mode as well.
      So, there is no recovery issue with this.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  17. 11 Dec, 2012 6 commits
    • f2fs: cleanup the f2fs_bio_alloc routine · 3cd8a239
      Committed by Jaegeuk Kim
      Clean up further for better code readability.
      
      - Change the parameter set of f2fs_bio_alloc()
        This function should allocate a bio only since it is not something like
        f2fs_bio_init(). Instead, the caller should initialize the allocated bio.
      
      - Introduce SECTOR_FROM_BLOCK
        This macro translates a block address to its sector address.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
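A block-to-sector translation of the kind SECTOR_FROM_BLOCK performs can be sketched as a shift. The form below is an assumption for illustration: it takes the block-size exponent directly as its first argument, which may differ from the argument list of the macro the patch introduces, and it assumes 512-byte sectors.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Assumed form: shift by the difference between block and sector sizes. */
#define SECTOR_FROM_BLOCK(log_blocksize, blk_addr) \
	((uint64_t)(blk_addr) << ((log_blocksize) - SECTOR_SHIFT))
```

With 4KB blocks (a block-size exponent of 12), each block spans 8 sectors, so block 100 starts at sector 800.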
    • f2fs: rewrite f2fs_bio_alloc to make it simpler · c212991a
      Committed by Namjae Jeon
      Since GFP_NOFS (__GFP_WAIT) is used for bio allocation requests in f2fs,
      there is no chance of the bio allocation returning NULL.
      
      So, make the bio allocation routine for f2fs simpler.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
    • f2fs: remove unneeded initialization · 1042d60f
      Committed by Namjae Jeon
      There is no need to initialize "struct f2fs_gc_kthread *gc_th = NULL",
      since gc_th is assigned the return value of kmalloc(), which is NULL on
      failure. Also fix similar code in other places.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
    • f2fs: adjust kernel coding style · 0a8165d7
      Committed by Jaegeuk Kim
      As pointed out by Randy Dunlap, this patch removes all usage of "/**" for comment
      blocks. Instead, just use "/*".
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: fix endian conversion bugs reported by sparse · 25ca923b
      Committed by Jaegeuk Kim
      This patch should resolve the bugs reported by the sparse tool.
      The initial reports were written by the "kbuild test robot" managed by
      fengguang.wu.
      
      On my local machines, I also tested by running:
      > make C=2 CF="-D__CHECK_ENDIAN__"
      
      Accordingly, I found lots of warnings and bugs related to endian
      conversion, and I have fixed all of them at this point.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
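The class of bug that endian checking catches is using an on-disk little-endian field as if it were in host byte order. A user-space analogue of the explicit conversions, written here as a sketch with the hypothetical helpers `le32_decode()` and `le32_encode()`, looks like this. In the kernel, the equivalent conversions are le32_to_cpu() and cpu_to_le32() on __le32 fields.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian on-disk field regardless of host order. */
static uint32_t le32_decode(const unsigned char b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Encode a host-order value into little-endian bytes for writing to disk. */
static void le32_encode(uint32_t v, unsigned char b[4])
{
	b[0] = (unsigned char)(v & 0xff);
	b[1] = (unsigned char)((v >> 8) & 0xff);
	b[2] = (unsigned char)((v >> 16) & 0xff);
	b[3] = (unsigned char)((v >> 24) & 0xff);
}
```

Going through an explicit conversion at every read and write of an on-disk field gives the same result on little-endian and big-endian hosts.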
    • f2fs: add segment operations · 351df4b2
      Committed by Jaegeuk Kim
      This adds specific functions not only to manage dirty/free segments, SIT pages,
      a cache for SIT entries, and summary entries, but also to allocate free blocks
      and write three types of pages: data, node, and meta.
      
      - F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
        and dirty segments respectively.
      
      - The key information of an SIT entry consists of a segment number, the number
        of valid blocks in the segment, and a bitmap identifying which blocks in the
        segment are valid or invalid.
      
      - An SIT page is composed of a certain range of SIT entries, which is maintained
        by the address space of meta_inode.
      
      - To cache SIT entries, a simple array is used. The index for the array is the
        segment number.
      
      - A summary entry for data contains the parent node information. A summary entry
        for node contains its node offset from the inode.
      
      - F2FS manages information about six active logs and those summary entries in
        memory. Whenever one of them is changed, its summary entries are flushed to
        its SIT page maintained by the address space of meta_inode.
      
      - This patch adds a default block allocation function which supports heap-based
        allocation policy.
      
      - This patch adds core functions to write data, node, and meta pages. Since LFS
        basically produces a series of sequential writes, F2FS merges sequential bios
        into a single one as much as possible to reduce the IO scheduling overhead.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
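The merging of sequential writes mentioned in the last item can be illustrated with a small model. It is a sketch only: `struct pending_bio`, `add_block()` and `flush_pending()` are hypothetical names, and a counter stands in for actually submitting a bio to the block layer.

```c
/* Hypothetical pending request: a run of contiguous block addresses. */
struct pending_bio {
	unsigned long start;	/* first block address in the run */
	unsigned long count;	/* blocks in the run, 0 if empty */
	unsigned int submitted;	/* how many bios have been issued so far */
};

/* Add one block: extend the run if contiguous, else submit and restart. */
static void add_block(struct pending_bio *bio, unsigned long blkaddr)
{
	if (bio->count && blkaddr != bio->start + bio->count) {
		bio->submitted++;	/* flush the old run as one bio */
		bio->count = 0;
	}
	if (bio->count == 0)
		bio->start = blkaddr;
	bio->count++;
}

/* Submit whatever is still pending. */
static void flush_pending(struct pending_bio *bio)
{
	if (bio->count) {
		bio->submitted++;
		bio->count = 0;
	}
}
```

Writing blocks 100, 101, 102 and then 200 results in two submissions in this model: one for the three contiguous blocks and one for the lone block.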