1. 01 Oct, 2013: 1 commit
    • nilfs2: fix issue with race condition of competition between segments for dirty blocks · 7f42ec39
      Vyacheslav Dubeyko committed
      Many NILFS2 users have reported strange file system corruption, for
      example:
      
         NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
         NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
      
      However, such error messages are only a consequence of a file system
      issue that occurs earlier.  Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
      and Anton Eliasson <devel@antoneliasson.se> recently reported another
      issue.  These reports describe a crash of the segctor thread:
      
        BUG: unable to handle kernel paging request at 0000000000004c83
        IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
        Call Trace:
         nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
         nilfs_segctor_construct+0x17b/0x290 [nilfs2]
         nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
         kthread+0xc0/0xd0
         ret_from_fork+0x7c/0xb0
      
      These two issues have the same root cause.  That cause can also
      trigger a third issue: the segctor thread hangs while consuming 100%
      CPU.
      
      REPRODUCING PATH:
      
      One possible way of reproducing the issue was described by
      Jerome Poulin <jeromepoulin@gmail.com>:
      
      1. init S to get to single user mode.
      2. sysrq+E to make sure only my shell is running
      3. start network-manager to get my wifi connection up
      4. login as root and launch "screen"
      5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
      6. lscp | xz -9e > lscp.txt.xz
      7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
      8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
      9. start a screen and launch strace -f -o find-cat.log -t find /mnt/nilfs -type f -exec cat {} > /dev/null \;
      10. start a screen and launch strace -f -o apt-get.log -t apt-get update
      11. launch the last command again as it did not crash the first time
      12. apt-get crashes
      13. ps aux > ps-aux-crashed.log
      14. sysrq+W
      15. sysrq+E  wait for everything to terminate
      16. sysrq+SUSB
      
      A simpler way to reproduce the issue is to run a kernel compilation
      and "apt-get update" in parallel.
      
      REPRODUCIBILITY:
      
      The issue does not reproduce reliably (60% - 80% of attempts).  A proper
      environment is very important for reproducing it.  The critical
      conditions for successful reproduction are:
      
      (1) A big file must be modified through mmap().
      
      (2) From time to time during processing, this file must have more
          dirty blocks than fit into several segments (for example, two or
          three).
      
      (3) There must be intensive background file modification activity
          in another thread.
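Conditions (1) and (2) can be approximated from user space with a small helper like the one below. This is a hypothetical sketch and is not part of the report or the patch. dirty_file_via_mmap() is an illustrative name. The right file size (enough dirty blocks to span two or three segments) depends on the segment size of the file system under test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the NILFS2 sources): grow the file at
 * `path` to `size` bytes and dirty every page through a shared, writable
 * mapping, i.e. modify a big file through mmap().
 * Returns 0 on success, -1 on any error. */
static int dirty_file_via_mmap(const char *path, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* One store per page is enough to mark that page dirty. */
    for (size_t off = 0; off < size; off += (size_t)page)
        p[off] ^= 0xff;

    /* No msync(): the pages stay dirty in the page cache and are left
     * to the file system's own writeback (the segctor thread). */
    munmap(p, size);
    close(fd);
    return 0;
}
```

For example, you could call it on a file inside the NILFS2 mount with a size of 24 MiB (the i_size of 25165824 bytes seen in the investigation log) while a kernel build runs in parallel. That combines all three conditions, but it does not guarantee that the race actually triggers.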
      
      INVESTIGATION:
      
      First of all, it can be seen that the crash is caused by an invalid
      page address:
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
        NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
      
      Moreover, the value of b_page (0x1a82) is 6786 in decimal.  This value
      looks like a segment number, and the b_blocknr and b_size values look
      like block numbers.  So, the buffer_head pointer points to an improper
      address.
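The first line below is the hexadecimal conversion. The other two add an observation that rests on an assumption not stated in the log: the file system uses 2048 blocks per segment. Under that assumption, b_blocknr and b_size are exactly the first and last block numbers of segment 6785. This is consistent with the claim that these values look like block numbers.

```latex
\begin{aligned}
\mathtt{0x1a82} &= 1\cdot 16^3 + 10\cdot 16^2 + 8\cdot 16 + 2 = 4096 + 2560 + 128 + 2 = 6786,\\
6785 \times 2048 &= 13895680 = \mathtt{b\_blocknr},\\
6786 \times 2048 - 1 &= 13897727 = \mathtt{b\_size}.
\end{aligned}
```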
      
      A detailed investigation of the issue revealed the following picture:
      
        [-----------------------------SEGMENT 6783-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783
      
        [-----------------------------SEGMENT 6784-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
      
        [-----------------------------SEGMENT 6785-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
      
        NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
      
        BUG: unable to handle kernel paging request at 0000000000001a82
        IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
      Normally, dirty files are collected into a list for every segment.
      Then, the dirty blocks of every dirty file are gathered, prepared for
      write, and submitted by means of nilfs_segbuf_submit_bh().  Finally,
      the complete-write phase takes place after nilfs_end_bio_write() is
      called by the block layer.  In this final phase, buffers/pages are
      marked as not dirty and the processed files are removed from the list
      of dirty files.
      
      It can be seen that there were three prepare_write and submit_bio
      phases before the segbuf_wait and complete_write phases.  Moreover,
      segments compete with each other for dirty blocks, because on every
      iteration of segment processing, dirty buffer_heads are added to
      several payload_buffers lists:
      
        [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
      
      The next pointer is the same, but the prev pointer has changed.  This
      means that the buffer_head has its next pointer from one list but its
      prev pointer from another.  Such a modification can happen several
      times, and it can finally result in various issues: (1) segctor
      hanging, (2) segctor crashing, (3) file system metadata corruption.
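The hang in particular can be reproduced in a toy model. The sketch below is a user-space illustration, not NILFS2 code. It uses a minimal circular doubly linked list with the same insertion semantics as the kernel's list_add_tail(). It then links one node into a second list without first unlinking it from the first, which matches what the log shows for b_assoc_buffers. A forward walk over the first list never returns to its head afterwards, which is one way a list traversal can spin forever.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Same semantics as the kernel's list_add_tail(): insert n before head. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
    struct list_head *prev = head->prev;
    n->next = head;
    n->prev = prev;
    prev->next = n;
    head->prev = n;
}

/* Walk forward from head; return 1 if the walk gets back to head within
 * max_steps, 0 if it never does (what a corrupted list looks like). */
static int walk_returns_to_head(const struct list_head *head, int max_steps)
{
    const struct list_head *p = head->next;
    for (int i = 0; i < max_steps; i++) {
        if (p == head)
            return 1;
        p = p->next;
    }
    return 0;
}
```

In the test, `y` is first linked on list `a` and then added to list `b` without list_del(). Afterwards, `a`'s neighbours still point at `y`, but `y->next` leads into `b`.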
      
      FIX:
      
      This patch adds:
      
      (1) setting of the BH_Async_Write flag in nilfs_segctor_prepare_write()
          for every processed dirty block;
      
      (2) checking of the BH_Async_Write flag in
          nilfs_lookup_dirty_data_buffers() and
          nilfs_lookup_dirty_node_buffers();
      
      (3) clearing of the BH_Async_Write flag in nilfs_segctor_complete_write(),
          nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
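The flag protocol can be modeled in a few lines. In this hypothetical sketch, struct toy_bh and its two booleans stand in for buffer_head state bits (dirty and BH_Async_Write). The function names only echo the kernel functions listed above. The real code operates on buffer_head lists and also handles pages, locking and error paths, all of which are omitted here. Keeping a buffer dirty on abort is a simplification of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head: two booleans model the
 * dirty state and the BH_Async_Write bit. */
struct toy_bh {
    bool dirty;
    bool async_write;
};

/* Like nilfs_lookup_dirty_*_buffers() after the fix: collect dirty
 * buffers, skipping any already claimed by an in-flight segment. */
static size_t collect_dirty(struct toy_bh *bhs, size_t n, struct toy_bh **out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (bhs[i].dirty && !bhs[i].async_write)
            out[k++] = &bhs[i];
    return k;
}

/* Like nilfs_segctor_prepare_write(): claim every collected buffer. */
static void prepare_write(struct toy_bh **bhs, size_t k)
{
    for (size_t i = 0; i < k; i++)
        bhs[i]->async_write = true;
}

/* Like nilfs_segctor_complete_write(): the write finished, so the
 * buffer is clean and the claim is released. */
static void complete_write(struct toy_bh **bhs, size_t k)
{
    for (size_t i = 0; i < k; i++) {
        bhs[i]->dirty = false;
        bhs[i]->async_write = false;
    }
}

/* Like nilfs_abort_logs(): release the claim; in this model the buffer
 * stays dirty so a later segment can pick it up again. */
static void abort_write(struct toy_bh **bhs, size_t k)
{
    for (size_t i = 0; i < k; i++)
        bhs[i]->async_write = false;
}
```

The key property is the second collect_dirty() call. A segment constructed while an earlier one is still in flight no longer picks up blocks that the earlier segment has already claimed. As a result, a buffer is never queued by two segments at once.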
      Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
      Reported-by: Anton Eliasson <devel@antoneliasson.se>
      Cc: Paul Fertser <fercerpav@gmail.com>
      Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
      Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
      Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
      Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
      Cc: Elmer Zhang <freeboy6716@gmail.com>
      Cc: Kenneth Langga <klangga@gmail.com>
      Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 04 Jul, 2013: 1 commit
  3. 31 Jul, 2012: 1 commit
    • nilfs2: Convert to new freezing mechanism · 2c22b337
      Jan Kara committed
      We change nilfs_page_mkwrite() to provide proper freeze protection for
      writeable page faults (we must wait for a frozen filesystem even if the
      page is fully mapped).
      
      We remove all vfs_check_frozen() checks since they are now handled by
      the generic code.
      
      CC: linux-nilfs@vger.kernel.org
      CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 21 Jun, 2012: 1 commit
    • nilfs2: ensure proper cache clearing for gc-inodes · fbb24a3a
      Ryusuke Konishi committed
      A gc-inode is a pseudo inode used to buffer the blocks to be moved by
      garbage collection.
      
      Block caches of gc-inodes must be cleared every time a garbage collection
      function (nilfs_clean_segments) completes.  Otherwise, stale blocks
      buffered in the caches may be wrongly reused in successive calls of the GC
      function.
      
      For user files, this is not a problem because their gc-inodes are
      distinguished by a checkpoint number as well as an inode number.  They
      never buffer different blocks if either an inode number, a checkpoint
      number, or a block offset differs.
      
      However, gc-inodes of sufile, cpfile and DAT file can store different data
      for the same block offset.  Thus, the nilfs_clean_segments function can
      move an incorrect block for these meta-data files if an old block is cached.
      I found that this really causes meta-data corruption in nilfs.
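A toy cache shows why the key matters. This is a hypothetical model, not the NILFS2 page cache. Entries are keyed by (inode number, checkpoint number, block offset). The metadata file is modeled with a fixed checkpoint number of 0, because, per the text above, its cached blocks are not distinguished by checkpoint. Without clearing between GC runs, the second lookup for the metadata key returns the block cached by the previous run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 8

/* Hypothetical model of a gc-inode block cache (not NILFS2 code).
 * An entry remembers which on-disk block was buffered for a key. */
struct cache_entry {
    int valid;
    uint64_t ino, cno, blkoff;   /* lookup key */
    uint64_t blocknr;            /* block buffered for that key */
};

struct gc_cache {
    struct cache_entry e[CACHE_SLOTS];
};

static struct cache_entry *cache_lookup(struct gc_cache *c, uint64_t ino,
                                        uint64_t cno, uint64_t blkoff)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &c->e[i];
        if (e->valid && e->ino == ino && e->cno == cno && e->blkoff == blkoff)
            return e;
    }
    return NULL;
}

/* Return the block the GC would move: the cached one on a key hit,
 * otherwise `disk_blocknr`, which is then cached if a slot is free. */
static uint64_t cache_get(struct gc_cache *c, uint64_t ino, uint64_t cno,
                          uint64_t blkoff, uint64_t disk_blocknr)
{
    struct cache_entry *e = cache_lookup(c, ino, cno, blkoff);
    if (e)
        return e->blocknr;   /* may be stale if the cache was not cleared */
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (!c->e[i].valid) {
            c->e[i] = (struct cache_entry){1, ino, cno, blkoff, disk_blocknr};
            break;
        }
    }
    return disk_blocknr;
}

/* What the fix guarantees: drop every cached block when a GC run ends. */
static void cache_clear(struct gc_cache *c)
{
    memset(c, 0, sizeof(*c));
}
```

For a user file, a new checkpoint number forms a new key, so no stale hit is possible. For the metadata key, only cache_clear() prevents reuse of the old block.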
      
      This fixes the issue by ensuring that the caches of gc-inodes are
      cleared, and resolves reported GC problems including checkpoint file
      corruption, b-tree corruption, and the following warning during GC.
      
        nilfs_palloc_freev: entry number 307234 already freed.
        ...
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>	[2.6.37+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 22 Nov, 2011: 1 commit
    • freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Tejun Heo committed
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or remove any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
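The resulting control flow of try_to_freeze() can be sketched in user space. Everything here is a hypothetical stand-in: freeze_requested replaces freezing(current), model_refrigerator() replaces __refrigerator(), and model_might_sleep() replaces might_sleep(). The point is only that the caller now learns whether it was actually frozen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for kernel freezer state; these are
 * NOT the kernel API, only enough to show the control flow. */
static bool freeze_requested;   /* models freezing(current) */
static int times_frozen;

static void model_might_sleep(void)
{
    /* In the kernel this is a debugging check; nothing to do here. */
}

/* Models __refrigerator(): park the task, then report that it froze. */
static bool model_refrigerator(void)
{
    times_frozen++;
    freeze_requested = false;   /* pretend the system was thawed */
    return true;
}

/* Returns true only if the caller was actually frozen, relaying the
 * refrigerator's result; returns false without sleeping otherwise. */
static bool model_try_to_freeze(void)
{
    model_might_sleep();
    if (!freeze_requested)
        return false;
    return model_refrigerator();
}
```

A kernel-thread loop could, for example, use a true result to re-check its state after being thawed. Whether a particular caller needs that is its own decision.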
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
  6. 11 Jun, 2011: 1 commit
  7. 10 May, 2011: 7 commits
  8. 09 Mar, 2011: 7 commits
  9. 02 Mar, 2011: 1 commit
  10. 10 Jan, 2011: 3 commits
  11. 27 Oct, 2010: 1 commit
    • mm: add account_page_writeback() · f629d1c9
      Michael Rubin committed
      To help developers and applications gain visibility into writeback
      behaviour this patch adds two counters to /proc/vmstat.
      
        # grep nr_dirtied /proc/vmstat
        nr_dirtied 3747
        # grep nr_written /proc/vmstat
        nr_written 3618
      
      These entries allow user apps to understand writeback behaviour over time
      and learn how it is impacting their performance.  Currently there is no
      way to inspect dirty and writeback speed over time.  That is not possible
      with nr_dirty/nr_writeback, which only show how many pages are currently
      dirty or under writeback.
      
      These entries are necessary to give visibility into writeback behaviour.
      We have /proc/diskstats which lets us understand the io in the block
      layer.  We have blktrace for more in depth understanding.  We have
      e2fsprogs and debugfs to give insight into the file system's behaviour,
      but we don't offer our users the ability to understand what writeback is
      doing.  There is no way to know how active it is over the whole system,
      whether it's falling behind, or to quantify its efforts.  With these values
      exported, users can easily see how much data applications are sending
      through writeback and also at what rates writeback is processing this
      data.  Comparing the rates of change between the two allows developers to
      see when writeback is not able to keep up with incoming traffic and the
      rate of dirty memory being sent to the IO back end.  This allows folks to
      understand their io workloads and track kernel issues.  Non kernel
      engineers at Google often use these counters to solve puzzling performance
      problems.
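A consumer of these counters only needs to parse the "name value" lines of /proc/vmstat and compare samples over time. The sketch below parses vmstat-formatted text from a string; in practice the text would be read from /proc/vmstat. vmstat_get() and pages_per_sec() are hypothetical helper names. With the sample values quoted above, nr_dirtied - nr_written is 3747 - 3618 = 129 pages. Since both are running totals, that difference is a rough indication of how far writeback trails page dirtying.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value of counter `name` from text in /proc/vmstat format
 * ("name value" per line), or -1 if the counter is absent. */
static long long vmstat_get(const char *text, const char *name)
{
    size_t len = strlen(name);
    for (const char *line = text; line && *line; ) {
        long long v;
        /* Match the whole name, not just a prefix of a longer one. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ' &&
            sscanf(line + len + 1, "%lld", &v) == 1)
            return v;
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Average rate in pages per second between two samples `secs` apart. */
static double pages_per_sec(long long before, long long after, double secs)
{
    return (double)(after - before) / secs;
}
```

Sampling both counters periodically and comparing their pages_per_sec() values is the kind of comparison the commit message describes: if nr_dirtied grows faster than nr_written, writeback is not keeping up.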
      
      Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written
      
      Patch #5 adds writeback thresholds to /proc/vmstat
      
      Currently these values are in debugfs. But they should be promoted to
      /proc since they are useful for developers who are writing databases
      and file servers and are not debugging the kernel.
      
      The output is as below:
      
       # grep threshold /proc/vmstat
       nr_pages_dirty_threshold 409111
       nr_pages_dirty_background_threshold 818223
      
      This patch:
      
      This allows code outside of the mm core to safely manipulate page
      writeback state and not worry about the other accounting.  Not using these
      routines means that some code will lose track of the accounting and we get
      bugs.
      
      Modify nilfs2 to use the interface.
      Signed-off-by: Michael Rubin <mrubin@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Jiro SEKIBA <jir@unicus.jp>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 23 Oct, 2010: 9 commits
  13. 23 Jul, 2010: 4 commits
    • nilfs2: do not update log cursor for small change · 32502047
      Ryusuke Konishi committed
      Super blocks of nilfs are periodically overwritten in order to record
      the recent log position.  This shortens recovery time after unclean
      unmount, but the current implementation performs the update even for a
      few blocks of change.  If the filesystem gets small changes slowly and
      continually, super blocks may be updated excessively.
      
      This moderates the issue by skipping update of log cursor if it does
      not cross a segment boundary.
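The skip rule reduces to comparing segment numbers. This is a hypothetical helper, not the patch's code. It assumes that a segment number is obtained by dividing a block number by the number of blocks per segment. The test values use 2048 blocks per segment purely as an example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: rewrite the super block's log cursor only when the
 * log head has moved into a different segment. */
static bool cursor_needs_update(uint64_t old_blk, uint64_t new_blk,
                                uint64_t blocks_per_segment)
{
    return old_blk / blocks_per_segment != new_blk / blocks_per_segment;
}
```

Small, slow changes that stay inside one segment then cause no super block write at all.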
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    • nilfs2: sync super blocks in turns · b2ac86e1
      Jiro SEKIBA committed
      This syncs the super blocks in turns instead of syncing duplicate
      super blocks at the same time.  This helps find a valid super root
      when a super block is written to disk before the log is written, which
      can happen when barrier-less block devices are unmounted uncleanly.  In
      that situation, the older super block likely points to a valid log.
      
      This patch introduces the ns_sbwcount member to the nilfs object and adds
      the nilfs_sb_will_flip() function.  ns_sbwcount counts how many times the
      super blocks have been written back to the disk, and nilfs_sb_will_flip()
      decides, based on ns_sbwcount, whether flipping is required in order to
      sync the super blocks asymmetrically.
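A toy model shows why writing in turns helps. This is a hypothetical illustration, not NILFS2 code. Each super block copy records a log position. The even/odd choice of which copy to write is an assumed simplification of nilfs_sb_will_flip(), not its actual rule. Recovery is modeled as picking the newest copy that points at a log that really reached the disk.

```c
#include <assert.h>

/* Hypothetical model, not NILFS2 code: each super block copy records the
 * log position it points to; ns_sbwcount counts super block write-backs. */
struct toy_sb { unsigned long last_log; };

struct toy_nilfs {
    struct toy_sb sb[2];
    unsigned long ns_sbwcount;
};

/* Old behaviour: both copies are overwritten with the same position. */
static void sync_both(struct toy_nilfs *n, unsigned long log_pos)
{
    n->sb[0].last_log = log_pos;
    n->sb[1].last_log = log_pos;
    n->ns_sbwcount++;
}

/* In turns: one copy per write-back. The even/odd rule is an assumed
 * simplification of the real flip decision. */
static void sync_in_turns(struct toy_nilfs *n, unsigned long log_pos)
{
    n->sb[n->ns_sbwcount % 2].last_log = log_pos;
    n->ns_sbwcount++;
}

/* Recovery after an unclean unmount: pick the newest copy that points at
 * a log that actually reached the disk (position <= last_durable).
 * Returns 0 if no copy points at a durable log. */
static unsigned long recover(const struct toy_nilfs *n, unsigned long last_durable)
{
    unsigned long best = 0;
    for (int i = 0; i < 2; i++)
        if (n->sb[i].last_log <= last_durable && n->sb[i].last_log > best)
            best = n->sb[i].last_log;
    return best;
}
```

Suppose the super block for log 30 reached the disk but log 30 itself did not. The old scheme then leaves no usable copy, while the in-turns scheme still has one copy pointing at log 20.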
      
      The following functions are also changed:
      
       - nilfs_prepare_super(): flips super blocks according to the
         argument.  The argument is calculated by nilfs_sb_will_flip()
         function.
      
       - nilfs_cleanup_super(): sets "clean" flag to both super blocks if
         they point to the same checkpoint.
      
      To update the information in both super blocks, the caller of
      nilfs_commit_super must set that information on both super blocks.
      Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    • nilfs2: introduce nilfs_prepare_super · d26493b6
      Jiro SEKIBA committed
      This function checks the validity of the super block pointers.
      If the first super block is invalid, it swaps the super blocks.
      The function should be called before any super block information is updated.
      The caller must hold nilfs->ns_sem.
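The swap step can be sketched as follows. This is a hypothetical user-space model. struct toy_super_block carries only a magic field, and validity is reduced to a magic-number check, which is an assumption about what "valid" means here. The ns_sem locking requirement is not modeled. 0x3434 is the value of NILFS_SUPER_MAGIC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SUPER_MAGIC 0x3434   /* same value as NILFS_SUPER_MAGIC */

/* Hypothetical super block with only the field this sketch checks. */
struct toy_super_block { uint16_t s_magic; };

/* Assumed validity test: magic number only. */
static int sb_valid(const struct toy_super_block *sb)
{
    return sb != NULL && sb->s_magic == TOY_SUPER_MAGIC;
}

/* Make sure sbp[0] is usable, swapping the two pointers if only the second
 * copy is valid. Returns the primary copy, or NULL if neither is valid.
 * (In the real code the caller holds nilfs->ns_sem; not modeled here.) */
static struct toy_super_block *prepare_super(struct toy_super_block *sbp[2])
{
    if (!sb_valid(sbp[0])) {
        if (!sb_valid(sbp[1]))
            return NULL;
        struct toy_super_block *t = sbp[0];
        sbp[0] = sbp[1];
        sbp[1] = t;
    }
    return sbp[0];
}
```

Callers that update super block fields afterwards can then always write through sbp[0].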
      Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    • nilfs2: get rid of macros for segment summary information · 4762077c
      Ryusuke Konishi committed
      This removes macros to test segment summary flags and redefines a few
      relevant macros with inline functions.
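The difference between the two styles looks roughly like this. The flag names and values are hypothetical and chosen only for illustration, as are the struct and helper names. The macro yields the raw masked value and accepts any pointer with a flags member. The inline function has a typed parameter and a bool result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define SS_LOGBGN 0x0001u   /* first segment of a logical segment */
#define SS_LOGEND 0x0002u   /* last segment of a logical segment */
#define SS_SR     0x0004u   /* log carries a super root */

struct toy_segsum_info { uint16_t flags; };

/* Macro style: untyped, and yields the raw masked bit (here 4), not 1. */
#define TOY_SS_HAS_SR(ssi) ((ssi)->flags & SS_SR)

/* Inline-function style: typed argument, boolean result. */
static inline bool toy_ssi_has_sr(const struct toy_segsum_info *ssi)
{
    return (ssi->flags & SS_SR) != 0;
}
```

That typed interface is the usual reason for preferring static inline functions over function-like macros in kernel code.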
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
  14. 10 May, 2010: 2 commits