1. 24 5月, 2016 3 次提交
  2. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  3. 07 11月, 2015 3 次提交
    • R
      nilfs2: fix gcc unused-but-set-variable warnings · 09ef29e0
      Ryusuke Konishi 提交于
      Fix the following build warnings:
      
       $ make W=1
       [...]
         CC [M]  fs/nilfs2/btree.o
       fs/nilfs2/btree.c: In function 'nilfs_btree_split':
       fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable]
         __u64 newptr;
               ^
       fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable]
         __u64 newkey;
               ^
         CC [M]  fs/nilfs2/dat.o
       fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end':
       fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable]
         __u64 start;
               ^
         CC [M]  fs/nilfs2/segment.o
       fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush':
       fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]
         int err;
             ^
         CC [M]  fs/nilfs2/sufile.o
       fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc':
       fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable]
         unsigned long nsegments, ncleansegs, nsus, cnt;
                                  ^
         CC [M]  fs/nilfs2/alloc.o
       fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry':
       fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable]
         unsigned long n, entries_per_group, groups_per_desc_block;
                                             ^
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09ef29e0
    • H
      nilfs2: add a tracepoint for transaction events · 44fda114
      Hitoshi Mitake 提交于
      This patch adds a tracepoint for transaction events of nilfs.  With the
      tracepoint, these events can be tracked: begin, abort, commit, trylock,
      lock, and unlock.  Basically, these events have corresponding functions
      e.g.  begin event corresponds nilfs_transaction_begin().  The unlock event
      is an exception.  It corresponds to the iteration in
      nilfs_transaction_lock().
      
      Only one tracepoint is introcued: nilfs2_transaction_transition.  The
      above events are distinguished with newly introduced enum.  With this
      tracepoint, we can analyse a critical section of segment constructoin.
      
      Sample output by tpoint of perf-tools:
                    cp-4457  [000] ...1    63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
              segctord-4371  [001] ...1    68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
              segctord-4371  [001] ...1    68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
              segctord-4371  [001] ...1    68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
              segctord-4371  [001] ...1    68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
              segctord-4371  [001] ...1    68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
              segctord-4371  [001] ...1   132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
      
      This patch also does trivial cleaning of comma usage in collection stage
      transition event for consistent coding style.
      Signed-off-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      44fda114
    • H
      nilfs2: add a tracepoint for tracking stage transition of segment construction · 58497703
      Hitoshi Mitake 提交于
      This patch adds a tracepoint for tracking stage transition of block
      collection in segment construction.  With the tracepoint, we can analysis
      the behavior of segment construction in depth.  It would be useful for
      bottleneck detection and debugging, etc.
      
      The tracepoint is created with the standard trace API of linux (like ext3,
      ext4, f2fs and btrfs).  So we can analysis with existing tools easily.  Of
      course, more detailed analysis will be possible if we can create nilfs
      specific analysis tools.
      
      Below is an example of event dump with Brendan Gregg's perf-tools
      (https://github.com/brendangregg/perf-tools).  Time consumption between
      each stage can be obtained.
      
      $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
      Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
              segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
              segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
              segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
              segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
              segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
              segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
              segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE
      
      For capturing transition correctly, this patch adds wrappers for the
      member scnt of nilfs_cstage.  With this change, every transition of the
      stage can produce trace event in a correct manner.
      Signed-off-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58497703
  4. 17 4月, 2015 2 次提交
  5. 13 3月, 2015 1 次提交
    • R
      nilfs2: fix deadlock of segment constructor during recovery · 283ee148
      Ryusuke Konishi 提交于
      According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
      during recovery at mount time.  The code path that caused the deadlock was
      as follows:
      
        nilfs_fill_super()
          load_nilfs()
            nilfs_salvage_orphan_logs()
              * Do roll-forwarding, attach segment constructor for recovery,
                and kick it.
      
              nilfs_segctor_thread()
                nilfs_segctor_thread_construct()
                 * A lock is held with nilfs_transaction_lock()
                   nilfs_segctor_do_construct()
                     nilfs_segctor_drop_written_files()
                       iput()
                         iput_final()
                           write_inode_now()
                             writeback_single_inode()
                               __writeback_single_inode()
                                 do_writepages()
                                   nilfs_writepage()
                                     nilfs_construct_dsync_segment()
                                       nilfs_transaction_lock() --> deadlock
      
      This can happen if commit 7ef3ff2f ("nilfs2: fix deadlock of segment
      constructor over I_SYNC flag") is applied and roll-forward recovery was
      performed at mount time.  The roll-forward recovery can happen if datasync
      write is done and the file system crashes immediately after that.  For
      instance, we can reproduce the issue with the following steps:
      
       < nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
       # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
       # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
       count=1 && reboot -nfh
       < the system will immediately reboot >
       # mount -t nilfs2 /dev/sdb1 /nilfs
      
      The deadlock occurs because iput() can run segment constructor through
      writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags.  The
      above commit changed segment constructor so that it calls iput()
      asynchronously for inodes with i_nlink == 0, but that change was
      imperfect.
      
      This fixes the another deadlock by deferring iput() in segment constructor
      even for the case that mount is not finished, that is, for the case that
      MS_ACTIVE flag is not set.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: NYuxuan Shui <yshuiv7@gmail.com>
      Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      283ee148
  6. 06 2月, 2015 1 次提交
    • R
      nilfs2: fix deadlock of segment constructor over I_SYNC flag · 7ef3ff2f
      Ryusuke Konishi 提交于
      Nilfs2 eventually hangs in a stress test with fsstress program.  This
      issue was caused by the following deadlock over I_SYNC flag between
      nilfs_segctor_thread() and writeback_sb_inodes():
      
        nilfs_segctor_thread()
          nilfs_segctor_thread_construct()
            nilfs_segctor_unlock()
              nilfs_dispose_list()
                iput()
                  iput_final()
                    evict()
                      inode_wait_for_writeback()  * wait for I_SYNC flag
      
        writeback_sb_inodes()
           * set I_SYNC flag on inode->i_state
          __writeback_single_inode()
            do_writepages()
              nilfs_writepages()
                nilfs_construct_dsync_segment()
                  nilfs_segctor_sync()
                     * wait for completion of segment constructor
          inode_sync_complete()
             * clear I_SYNC flag after __writeback_single_inode() completed
      
      writeback_sb_inodes() calls do_writepages() for dirty inodes after
      setting I_SYNC flag on inode->i_state.  do_writepages() in turn calls
      nilfs_writepages(), which can run segment constructor and wait for its
      completion.  On the other hand, segment constructor calls iput(), which
      can call evict() and wait for the I_SYNC flag on
      inode_wait_for_writeback().
      
      Since segment constructor doesn't know when I_SYNC will be set, it
      cannot know whether iput() will block or not unless inode->i_nlink has a
      non-zero count.  We can prevent evict() from being called in iput() by
      implementing sop->drop_inode(), but it's not preferable to leave inodes
      with i_nlink == 0 for long periods because it even defers file
      truncation and inode deallocation.  So, this instead resolves the
      deadlock by calling iput() asynchronously with a workqueue for inodes
      with i_nlink == 0.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ef3ff2f
  7. 14 10月, 2014 2 次提交
    • A
      nilfs2: improve the performance of fdatasync() · b9f66140
      Andreas Rohner 提交于
      Support for fdatasync() has been implemented in NILFS2 for a long time,
      but whenever the corresponding inode is dirty the implementation falls
      back to a full-flegded sync().  Since every write operation has to
      update the modification time of the file, the inode will almost always
      be dirty and fdatasync() will fall back to sync() most of the time.  But
      this fallback is only necessary for a change of the file size and not
      for a change of the various timestamps.
      
      This patch adds a new flag NILFS_I_INODE_SYNC to differentiate between
      those two situations.
      
       * If it is set the file size was changed and a full sync is necessary.
       * If it is not set then only the timestamps were updated and
         fdatasync() can go ahead.
      
      There is already a similar flag I_DIRTY_DATASYNC on the VFS layer with
      the exact same semantics.  Unfortunately it cannot be used directly,
      because NILFS2 doesn't implement write_inode() and doesn't clear the VFS
      flags when inodes are written out.  So the VFS writeback thread can
      clear I_DIRTY_DATASYNC at any time without notifying NILFS2.  So
      I_DIRTY_DATASYNC has to be mapped onto NILFS_I_INODE_SYNC in
      nilfs_update_inode().
      Signed-off-by: NAndreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9f66140
    • A
      nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs() · e2c7617a
      Andreas Rohner 提交于
      Under normal circumstances nilfs_sync_fs() writes out the super block,
      which causes a flush of the underlying block device.  But this depends
      on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
      last segment crosses a segment boundary.  So if only a small amount of
      data is written before the call to nilfs_sync_fs(), no flush of the
      block device occurs.
      
      In the above case an additional call to blkdev_issue_flush() is needed.
      To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device
      is introduced, which is cleared whenever new logs are written and set
      whenever the block device is flushed.  For convenience the function
      nilfs_flush_device() is added, which contains the above logic.
      Signed-off-by: NAndreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e2c7617a
  8. 15 1月, 2014 1 次提交
    • A
      nilfs2: fix segctor bug that causes file system corruption · 70f2fe3a
      Andreas Rohner 提交于
      There is a bug in the function nilfs_segctor_collect, which results in
      active data being written to a segment, that is marked as clean.  It is
      possible, that this segment is selected for a later segment
      construction, whereby the old data is overwritten.
      
      The problem shows itself with the following kernel log message:
      
        nilfs_sufile_do_cancel_free: segment 6533 must be clean
      
      Usually a few hours later the file system gets corrupted:
      
        NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
        NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)
      
      The issue can be reproduced with a file system that is nearly full and
      with the cleaner running, while some IO intensive task is running.
      Although it is quite hard to reproduce.
      
      This is what happens:
      
       1. The cleaner starts the segment construction
       2. nilfs_segctor_collect is called
       3. sc_stage is on NILFS_ST_SUFILE and segments are freed
       4. sc_stage is on NILFS_ST_DAT current segment is full
       5. nilfs_segctor_extend_segments is called, which
          allocates a new segment
       6. The new segment is one of the segments freed in step 3
       7. nilfs_sufile_cancel_freev is called and produces an error message
       8. Loop around and the collection starts again
       9. sc_stage is on NILFS_ST_SUFILE and segments are freed
          including the newly allocated segment, which will contain active
          data and can be allocated at a later time
      10. A few hours later another segment construction allocates the
          segment and causes file system corruption
      
      This can be prevented by simply reordering the statements.  If
      nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
      the freed segments are marked as dirty and cannot be allocated any more.
      Signed-off-by: NAndreas Rohner <andreas.rohner@gmx.net>
      Reviewed-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: NAndreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      70f2fe3a
  9. 01 10月, 2013 1 次提交
    • V
      nilfs2: fix issue with race condition of competition between segments for dirty blocks · 7f42ec39
      Vyacheslav Dubeyko 提交于
      Many NILFS2 users were reported about strange file system corruption
      (for example):
      
         NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
         NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
      
      But such error messages are consequence of file system's issue that takes
      place more earlier.  Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
      and Anton Eliasson <devel@antoneliasson.se> were reported about another
      issue not so recently.  These reports describe the issue with segctor
      thread's crash:
      
        BUG: unable to handle kernel paging request at 0000000000004c83
        IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
        Call Trace:
         nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
         nilfs_segctor_construct+0x17b/0x290 [nilfs2]
         nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
         kthread+0xc0/0xd0
         ret_from_fork+0x7c/0xb0
      
      These two issues have one reason.  This reason can raise third issue
      too.  Third issue results in hanging of segctor thread with eating of
      100% CPU.
      
      REPRODUCING PATH:
      
      One of the possible way or the issue reproducing was described by
      Jermoe me Poulin <jeromepoulin@gmail.com>:
      
      1. init S to get to single user mode.
      2. sysrq+E to make sure only my shell is running
      3. start network-manager to get my wifi connection up
      4. login as root and launch "screen"
      5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
      6. lscp | xz -9e > lscp.txt.xz
      7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
      8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
      9. start a screen and launch strace -f -o find-cat.log -t find
      /mnt/nilfs -type f -exec cat {} > /dev/null \;
      10. start a screen and launch strace -f -o apt-get.log -t apt-get update
      11. launch the last command again as it did not crash the first time
      12. apt-get crashes
      13. ps aux > ps-aux-crashed.log
      13. sysrq+W
      14. sysrq+E  wait for everything to terminate
      15. sysrq+SUSB
      
      Simplified way of the issue reproducing is starting kernel compilation
      task and "apt-get update" in parallel.
      
      REPRODUCIBILITY:
      
      The issue is reproduced not stable [60% - 80%].  It is very important to
      have proper environment for the issue reproducing.  The critical
      conditions for successful reproducing:
      
      (1) It should have big modified file by mmap() way.
      
      (2) This file should have the count of dirty blocks are greater that
          several segments in size (for example, two or three) from time to time
          during processing.
      
      (3) It should be intensive background activity of files modification
          in another thread.
      
      INVESTIGATION:
      
      First of all, it is possible to see that the reason of crash is not valid
      page address:
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
        NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
      
      Moreover, value of b_page (0x1a82) is 6786.  This value looks like segment
      number.  And b_blocknr with b_size values look like block numbers.  So,
      buffer_head's pointer points on not proper address value.
      
      Detailed investigation of the issue is discovered such picture:
      
        [-----------------------------SEGMENT 6783-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783
      
        [-----------------------------SEGMENT 6784-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
      
        [-----------------------------SEGMENT 6785-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
      
        NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
      
        BUG: unable to handle kernel paging request at 0000000000001a82
        IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
      Usually, for every segment we collect dirty files in list.  Then, dirty
      blocks are gathered for every dirty file, prepared for write and
      submitted by means of nilfs_segbuf_submit_bh() call.  Finally, it takes
      place complete write phase after calling nilfs_end_bio_write() on the
      block layer.  Buffers/pages are marked as not dirty on final phase and
      processed files removed from the list of dirty files.
      
      It is possible to see that we had three prepare_write and submit_bio
      phases before segbuf_wait and complete_write phase.  Moreover, segments
      compete between each other for dirty blocks because on every iteration
      of segments processing dirty buffer_heads are added in several lists of
      payload_buffers:
      
        [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
      
      The next pointer is the same but prev pointer has changed.  It means
      that buffer_head has next pointer from one list but prev pointer from
      another.  Such modification can be made several times.  And, finally, it
      can be resulted in various issues: (1) segctor hanging, (2) segctor
      crashing, (3) file system metadata corruption.
      
      FIX:
      This patch adds:
      
      (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
          for every proccessed dirty block;
      
      (2) checking of BH_Async_Write flag in
          nilfs_lookup_dirty_data_buffers() and
          nilfs_lookup_dirty_node_buffers();
      
      (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
          nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
      Reported-by: NJerome Poulin <jeromepoulin@gmail.com>
      Reported-by: NAnton Eliasson <devel@antoneliasson.se>
      Cc: Paul Fertser <fercerpav@gmail.com>
      Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
      Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
      Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
      Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
      Cc: Elmer Zhang <freeboy6716@gmail.com>
      Cc: Kenneth Langga <klangga@gmail.com>
      Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com>
      Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f42ec39
  10. 04 7月, 2013 1 次提交
  11. 31 7月, 2012 1 次提交
    • J
      nilfs2: Convert to new freezing mechanism · 2c22b337
      Jan Kara 提交于
      We change nilfs_page_mkwrite() to provide proper freeze protection for
      writeable page faults (we must wait for frozen filesystem even if the
      page is fully mapped).
      
      We remove all vfs_check_frozen() checks since they are now handled by
      the generic code.
      
      CC: linux-nilfs@vger.kernel.org
      CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2c22b337
  12. 21 6月, 2012 1 次提交
    • R
      nilfs2: ensure proper cache clearing for gc-inodes · fbb24a3a
      Ryusuke Konishi 提交于
      A gc-inode is a pseudo inode used to buffer the blocks to be moved by
      garbage collection.
      
      Block caches of gc-inodes must be cleared every time a garbage collection
      function (nilfs_clean_segments) completes.  Otherwise, stale blocks
      buffered in the caches may be wrongly reused in successive calls of the GC
      function.
      
      For user files, this is not a problem because their gc-inodes are
      distinguished by a checkpoint number as well as an inode number.  They
      never buffer different blocks if either an inode number, a checkpoint
      number, or a block offset differs.
      
      However, gc-inodes of sufile, cpfile and DAT file can store different data
      for the same block offset.  Thus, the nilfs_clean_segments function can
      move incorrect block for these meta-data files if an old block is cached.
      I found this is really causing meta-data corruption in nilfs.
      
      This fixes the issue by ensuring cache clear of gc-inodes and resolves
      reported GC problems including checkpoint file corruption, b-tree
      corruption, and the following warning during GC.
      
        nilfs_palloc_freev: entry number 307234 already freed.
        ...
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>	[2.6.37+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fbb24a3a
  13. 22 11月, 2011 1 次提交
    • T
      freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Tejun Heo 提交于
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or removes any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
      a0acae0e
  14. 11 6月, 2011 1 次提交
  15. 10 5月, 2011 7 次提交
  16. 09 3月, 2011 7 次提交
  17. 02 3月, 2011 1 次提交
  18. 10 1月, 2011 3 次提交
  19. 27 10月, 2010 1 次提交
    • M
      mm: add account_page_writeback() · f629d1c9
      Michael Rubin 提交于
      To help developers and applications gain visibility into writeback
      behaviour this patch adds two counters to /proc/vmstat.
      
        # grep nr_dirtied /proc/vmstat
        nr_dirtied 3747
        # grep nr_written /proc/vmstat
        nr_written 3618
      
      These entries allow user apps to understand writeback behaviour over time
      and learn how it is impacting their performance.  Currently there is no
      way to inspect dirty and writeback speed over time.  It's not possible for
      nr_dirty/nr_writeback.
      
      These entries are necessary to give visibility into writeback behaviour.
      We have /proc/diskstats which lets us understand the io in the block
      layer.  We have blktrace for more in depth understanding.  We have
      e2fsprogs and debugsfs to give insight into the file systems behaviour,
      but we don't offer our users the ability understand what writeback is
      doing.  There is no way to know how active it is over the whole system, if
      it's falling behind or to quantify it's efforts.  With these values
      exported users can easily see how much data applications are sending
      through writeback and also at what rates writeback is processing this
      data.  Comparing the rates of change between the two allow developers to
      see when writeback is not able to keep up with incoming traffic and the
      rate of dirty memory being sent to the IO back end.  This allows folks to
      understand their io workloads and track kernel issues.  Non kernel
      engineers at Google often use these counters to solve puzzling performance
      problems.
      
      Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written
      
      Patch #5 add writeback thresholds to /proc/vmstat
      
      Currently these values are in debugfs. But they should be promoted to
      /proc since they are useful for developers who are writing databases
      and file servers and are not debugging the kernel.
      
      The output is as below:
      
       # grep threshold /proc/vmstat
       nr_pages_dirty_threshold 409111
       nr_pages_dirty_background_threshold 818223
      
      This patch:
      
      This allows code outside of the mm core to safely manipulate page
      writeback state and not worry about the other accounting.  Not using these
      routines means that some code will lose track of the accounting and we get
      bugs.
      
      Modify nilfs2 to use interface.
      Signed-off-by: NMichael Rubin <mrubin@google.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Jiro SEKIBA <jir@unicus.jp>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f629d1c9
  20. 23 10月, 2010 1 次提交