1. 24 11月, 2013 2 次提交
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
    • K
      block: Convert various code to bio_for_each_segment() · 2c30c71b
      Kent Overstreet 提交于
      With immutable biovecs we don't want code accessing bi_io_vec directly -
      the uses this patch changes weren't incorrect since they all own the
      bio, but it makes the code harder to audit for no good reason - also,
      this will help with multipage bvecs later.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      2c30c71b
  2. 11 11月, 2013 1 次提交
    • C
      f2fs: issue more large discard command · 29e59c14
      Changman Lee 提交于
      o Changes from v1
        Use find_next(_zero)_bit suggested by jg.kim
      
      When f2fs issues discard command, if segment is contiguous,
      let's issue more large segment to gather adjacent segments.
      
      ** blktrace **
      179,1    0     5859    42.619023770   971  C   D 131072 + 2097152 [0]
      179,1    0    33665   108.840475468   971  C   D 2228224 + 2494464 [0]
      179,1    0    33671   109.131616427   971  C   D 14909440 + 344064 [0]
      179,1    0    33677   109.137100677   971  C   D 15261696 + 4096 [0]
      Signed-off-by: NChangman Lee <cm224.lee@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      29e59c14
  3. 08 11月, 2013 1 次提交
  4. 06 11月, 2013 1 次提交
  5. 30 10月, 2013 1 次提交
    • F
      f2fs: change the method of calculating the number summary blocks · 9a47938b
      Fan Li 提交于
      npages_for_summary_flush uses (SUMMARY_SIZE + 1) as the size of a f2fs_summary
      while its actual size is  SUMMARY_SIZE. So the result sometimes is bigger than
      actual number by one, which causes checkpoint can't be written into disk
      contiguously, and sometimes summary blocks can't be compacted like they should.
      Besides, when writing summary blocks into pages, if remain space in a page
      isn't big enough for one f2fs_summary, it will be left unused, current code
      seems not to take it into account.
      Signed-off-by: NFan Li <fanofcode.li@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      9a47938b
  6. 29 10月, 2013 1 次提交
  7. 28 10月, 2013 1 次提交
  8. 25 10月, 2013 4 次提交
  9. 22 10月, 2013 2 次提交
  10. 18 10月, 2013 1 次提交
    • G
      f2fs: avoid wait if IO end up when do_checkpoint for better performance · e2340887
      Gu Zheng 提交于
      Previously, do_checkpoint() will call congestion_wait() for waiting the pages
      (previous submitted node/meta/data pages) to be written back.
      Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting, and
      no additional wake up mechanism was introduced if IO ends up before regular period costed.
      Yuan Zhong found there is a situation that after the pages have been written back,
      but the checkpoint thread still wait for congestion_wait to exit.
      
      So here we store checkpoint task into f2fs_sb when doing checkpoint, it'll wait for IO completes
      if there's IO going on, and in the end IO path, wake up checkpoint task when IO ends up.
      
      Thanks to Yuan Zhong's pre work about this problem.
      Reported-by: NYuan Zhong <yuan.mark.zhong@samsung.com>
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      e2340887
  11. 24 9月, 2013 1 次提交
  12. 19 8月, 2013 1 次提交
  13. 12 8月, 2013 1 次提交
  14. 06 8月, 2013 1 次提交
    • J
      f2fs: fix a deadlock in fsync · a569469e
      Jin Xu 提交于
      This patch fixes a deadlock bug that occurs quite often when there are
      concurrent write and fsync on a same file.
      
      Following is the simplified call trace when tasks get hung.
      
      fsync thread:
      - f2fs_sync_file
       ...
       - f2fs_write_data_pages
       ...
        - update_extent_cache
        ...
         - update_inode
          - wait_on_page_writeback
      
      bdi writeback thread
      - __writeback_single_inode
       - f2fs_write_data_pages
        - mutex_lock(sbi->writepages)
      
      The deadlock happens when the fsync thread waits on a inode page that has
      been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
      no one else could be able to submit the cached bio to block layer for
      writeback. This is because the fsync thread already hold a sbi->fs_lock and
      the sbi->writepages lock, causing the bdi thread being blocked when attempt
      to write data pages for the same inode. At the same time, f2fs_gc thread
      does not notice the situation and could not help. Even the sync syscall
      gets blocked.
      
      To fix it, we could submit the cached bio first before waiting on a inode page
      that is being written back.
      Signed-off-by: NJin Xu <jinuxstyle@gmail.com>
      [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      a569469e
  15. 30 7月, 2013 1 次提交
  16. 02 7月, 2013 2 次提交
    • J
      f2fs: remove reusing any prefree segments · 763bfe1b
      Jaegeuk Kim 提交于
      This patch removes check_prefree_segments initially designed to enhance the
      performance by narrowing the range of LBA usage across the whole block device.
      
      When allocating a new segment, previous f2fs tries to find proper prefree
      segments, and then, if finds a segment, it reuses the segment for further
      data or node block allocation.
      
      However, I found that this was totally wrong approach since the prefree segments
      have several data or node blocks that will be used by the roll-forward mechanism
      operated after sudden-power-off.
      
      Let's assume the following scenario.
      
      /* write 8MB with fsync */
      for (i = 0; i < 2048; i++) {
      	offset = i * 4096;
      	write(fd, offset, 4KB);
      	fsync(fd);
      }
      
      In this case, naive segment allocation sequence will be like:
       data segment: x, x+1, x+2, x+3
       node segment: y, y+1, y+2, y+3.
      
      But, if we can reuse prefree segments, the sequence can be like:
       data segment: x, x+1, y, y+1
       node segment: y, y+1, y+2, y+3.
      Because, y, y+1, and y+2 became prefree segments one by one, and those are
      reused by data allocation.
      
      After conducting this workload, we should consider how to recover the latest
      inode with its data.
      If we reuse the prefree segments such as y or y+1, we lost the old node blocks
      so that f2fs even cannot start roll-forward recovery.
      
      Therefore, I suggest that we should remove reusing prefree segments.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      763bfe1b
    • N
      f2fs: optimize the init_dirty_segmap function · 8736fbf0
      Namjae Jeon 提交于
      Optimize the while loop condition
      
      Since this condition will always be true and while loop will
      be terminated by the following condition in code:
      
      if (segno >= TOTAL_SEGS(sbi))
          break;
      Hence we can replace the while loop condition with while(1)
      instead of always checking for segno to be less than Total segs.
      
      Also we do not need to use TOTAL_SEGS() everytime. We can store
      this value in a local variable since this value is constant.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NPankaj Kumar <pankaj.km@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      8736fbf0
  17. 14 6月, 2013 3 次提交
  18. 28 5月, 2013 2 次提交
  19. 30 4月, 2013 1 次提交
    • J
      f2fs: modify the number of issued pages to merge IOs · ac5d156c
      Jaegeuk Kim 提交于
      When testing f2fs on an SSD, I found some 128 page IOs followed by 1 page IO
      were issued by f2fs_write_node_pages.
      This means that there were some mishandling flows which degrades performance.
      
      Previous f2fs_write_node_pages determines the number of pages to be written,
      nr_to_write, as follows.
      
      1. The bio_get_nr_vecs returns 129 pages.
      2. The bio_alloc makes a room for 128 pages.
      3. The initial 128 pages go into one bio.
      4. The existing bio is submitted, and a new bio is prepared for the last 1 page.
      5. Finally, sync_node_pages submits the last 1 page bio.
      
      The problem is from the use of bio_get_nr_vecs, so this patch replace it
      with max_hw_blocks using queue_max_sectors.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      ac5d156c
  20. 26 4月, 2013 1 次提交
  21. 23 4月, 2013 1 次提交
  22. 03 4月, 2013 5 次提交
    • J
      f2fs: fix the bitmap consistency of dirty segments · b2f2c390
      Jaegeuk Kim 提交于
      Like below, there are 8 segment bitmaps for SSR victim candidates.
      
      enum dirty_type {
      	DIRTY_HOT_DATA,		/* dirty segments assigned as hot data logs */
      	DIRTY_WARM_DATA,	/* dirty segments assigned as warm data logs */
      	DIRTY_COLD_DATA,	/* dirty segments assigned as cold data logs */
      	DIRTY_HOT_NODE,		/* dirty segments assigned as hot node logs */
      	DIRTY_WARM_NODE,	/* dirty segments assigned as warm node logs */
      	DIRTY_COLD_NODE,	/* dirty segments assigned as cold node logs */
      	DIRTY,			/* to count # of dirty segments */
      	PRE,			/* to count # of entirely obsolete segments */
      	NR_DIRTY_TYPE
      };
      
      The upper 6 bitmaps indicates segments dirtied by active log areas respectively.
      And, the DIRTY bitmap integrates all the 6 bitmaps.
      
      For example,
       o DIRTY_HOT_DATA : 1010000
       o DIRTY_WARM_DATA: 0100000
       o DIRTY_COLD_DATA: 0001000
       o DIRTY_HOT_NODE : 0000010
       o DIRTY_WARM_NODE: 0000001
       o DIRTY_COLD_NODE: 0000000
      In this case,
       o DIRTY          : 1111011,
      
       which means that we should guarantee the consistency between DIRTY and other
       bitmaps concreately.
      
      However, the SSR mode selects victims freely from any log types, which can set
      multiple bits across the various bitmap types.
      
      So, this patch eliminates this inconsistency.
      Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      b2f2c390
    • J
      f2fs: allocate remained free segments in the LFS mode · 60374688
      Jaegeuk Kim 提交于
      This patch adds a new condition that allocates free segments in the current
      active section even if SSR is needed.
      Otherwise, f2fs cannot allocate remained free segments in the section since
      SSR finds dirty segments only.
      Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      60374688
    • J
      f2fs: change GC bitmaps to apply the section granularity · 5ec4e49f
      Jaegeuk Kim 提交于
      This patch removes a bitmap for victim segments selected by foreground GC, and
      modifies the other bitmap for victim segments selected by background GC.
      
      1) foreground GC bitmap
       : We don't need to manage this, since we just only one previous victim section
         number instead of the whole victim history.
         The f2fs uses the victim section number in order not to allocate currently
         GC'ed section to current active logs.
      
      2) background GC bitmap
       : This bitmap is used to avoid selecting victims repeatedly by background GCs.
         In addition, the victims are able to be selected by foreground GCs, since
         there is no need to read victim blocks during foreground GCs.
      
         By the fact that the foreground GC reclaims segments in a section unit, it'd
         be better to manage this bitmap based on the section granularity.
      Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      5ec4e49f
    • J
      f2fs: allocate new segment aligned with sections · 33afa7fd
      Jaegeuk Kim 提交于
      When allocating a new segment under the LFS mode, we should keep the section
      boundary.
      Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      33afa7fd
    • J
      f2fs: introduce TOTAL_SECS macro · 53cf9522
      Jaegeuk Kim 提交于
      Let's use a macro to get the total number of sections.
      Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      53cf9522
  23. 31 3月, 2013 1 次提交
  24. 12 2月, 2013 2 次提交
    • J
      f2fs: clarify and enhance the f2fs_gc flow · 43727527
      Jaegeuk Kim 提交于
      This patch makes clearer the ambiguous f2fs_gc flow as follows.
      
      1. Remove intermediate checkpoint condition during f2fs_gc
       (i.e., should_do_checkpoint() and GC_BLOCKED)
      
      2. Remove unnecessary return values of f2fs_gc because of #1.
       (i.e., GC_NODE, GC_OK, etc)
      
      3. Simplify write_checkpoint() because of #2.
      
      4. Clarify the main f2fs_gc flow.
       o monitor how many freed sections during one iteration of do_garbage_collect().
       o do GC more without checkpoints if we can't get enough free sections.
       o do checkpoint once we've got enough free sections through forground GCs.
      
      5. Adopt thread-logging (Slack-Space-Recycle) scheme more aggressively on data
        log types. See. get_ssr_segement()
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      43727527
    • J
      f2fs: prevent checkpoint once any IO failure is detected · 577e3495
      Jaegeuk Kim 提交于
      This patch enhances the checkpoint routine to cope with IO errors.
      
      Basically f2fs detects IO errors from end_io_write, and the errors are able to
      be occurred during one of data, node, and meta page writes.
      
      In the previous code, when an IO error is occurred during writes, f2fs sets a
      flag, CP_ERROR_FLAG, in the raw ckeckpoint buffer which will be written to disk.
      Afterwards, write_checkpoint() will check the flag and remount f2fs as a
      read-only (ro) mode.
      
      However, even once f2fs is remounted as a ro mode, dirty checkpoint pages are
      freely able to be written to disk by flusher or kswapd in background.
      In such a case, after cold reboot, f2fs would restore the checkpoint data having
      CP_ERROR_FLAG, resulting in disabling write_checkpoint and remounting f2fs as
      a ro mode again.
      
      Therefore, let's prevent any checkpoint page (meta) writes once an IO error is
      occurred, and remount f2fs as a ro mode right away at that moment.
      Reported-by: NOliver Winker <oliver@oli1170.net>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com>
      577e3495
  25. 10 1月, 2013 1 次提交
    • J
      f2fs: revisit the f2fs_gc flow · 408e9375
      Jaegeuk Kim 提交于
      I'd like to revisit the f2fs_gc flow and rewrite as follows.
      
      1. In practical, the nGC parameter of f2fs_gc is meaningless. So, let's
        remove it.
      2. Background GC marks victim blocks as dirty one at a time.
      3. Foreground GC should do cleaning job until acquiring enough free
        sections. Afterwards, it needs to do checkpoint.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      408e9375
  26. 28 12月, 2012 1 次提交