1. 21 Nov, 2014 4 commits
    • Btrfs: avoid premature -ENOMEM in clear_extent_bit() · c7bc6319
      Committed by Filipe Manana
      We try to allocate an extent state structure before acquiring the extent
      state tree's spinlock, because we might need a new one later and want to
      avoid an atomic allocation while holding the tree's spinlock. However,
      we returned -ENOMEM if that initial non-atomic allocation failed, which is
      excessive since we might end up not needing the pre-allocated extent
      state at all. That is the case when the tree has no extent state that
      covers part of the input range and also extends beyond it. Therefore don't
      return -ENOMEM if that pre-allocation fails.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
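The pre-allocation pattern described in this commit can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel code: `struct state`, `alloc_state()`, `clear_range()` and the `fail_alloc` test knob are all hypothetical names, and the lock is only indicated by comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent state; not the kernel structure. */
struct state {
	unsigned long start;
	unsigned long end;
};

/* Test knob (an assumption of this sketch): when true, allocations fail. */
static bool fail_alloc;

static struct state *alloc_state(void)
{
	return fail_alloc ? NULL : malloc(sizeof(struct state));
}

/*
 * Clear [start, end] inside the single state *s.
 * Returns 1 if the state had to be split, 0 if not, -ENOMEM on failure.
 */
static int clear_range(const struct state *s, unsigned long start,
		       unsigned long end)
{
	/* Opportunistic allocation before the lock; failure is tolerated. */
	struct state *prealloc = alloc_state();
	bool need_split;
	int ret = 0;

	assert(start <= end);
	need_split = s->start < start || s->end > end;

	/* ... the tree's lock would be taken here ... */
	if (need_split) {
		/* Only now is a spare state required; retry if we have none. */
		if (!prealloc)
			prealloc = alloc_state();
		if (!prealloc)
			return -ENOMEM;
		ret = 1;
	}
	/* ... the lock would be released here ... */

	free(prealloc);
	return ret;
}
```

With allocations failing, a call whose range matches the state exactly still succeeds, while a call that needs a split reports -ENOMEM only at the point where the spare state is actually required.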
    • Btrfs: avoid returning -ENOMEM in convert_extent_bit() too early · c8fd3de7
      Committed by Filipe Manana
      We try to allocate an extent state before acquiring the tree's spinlock
      just in case we end up needing to split an existing extent state into two.
      If that allocation failed, we would return -ENOMEM.
      However, our only caller (the transaction/log commit code) passes in
      an extent state that was cached from a call to find_first_extent_bit() and
      that has a very high chance of matching the input range exactly (always true
      for a transaction commit and very often, but not always, true for a log
      commit). In this case we don't need the initial extent state reserved
      for an eventual split at all. Therefore just don't return -ENOMEM if
      we can't allocate the temporary extent state, since we might not need it
      at all, and if we end up needing one, we'll allocate it later anyway.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: make find_first_extent_bit be able to cache any state · e38e2ed7
      Committed by Filipe Manana
      Right now the only caller of find_first_extent_bit() that is interested
      in caching extent states (transaction or log commit) never gets an extent
      state cached. This is because find_first_extent_bit() only caches states
      that have at least one of the flags EXTENT_IOBITS or EXTENT_BOUNDARY, and
      the transaction/log commit caller always passes a tree whose extent states
      never have any of those flags (they can only have one of the
      following flags: EXTENT_DIRTY, EXTENT_NEW or EXTENT_NEED_WAIT).

      This change, together with the following one in the patch series (titled
      "Btrfs: avoid returning -ENOMEM in convert_extent_bit() too early"), will
      significantly reduce the chance that calls to convert_extent_bit()
      fail with -ENOMEM when called from the transaction/log commit code.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
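The difference between "cache only states with certain flags" and "cache any state" can be illustrated with a small predicate. This is a sketch only: the helper name `cache_if_flags()`, the `struct state` layout and the numeric flag values are assumptions made for the example and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values are placeholders for this sketch, not the kernel's definitions. */
#define EXTENT_DIRTY     (1u << 0)
#define EXTENT_NEW       (1u << 1)
#define EXTENT_NEED_WAIT (1u << 2)
#define EXTENT_IOBITS    (1u << 3)
#define EXTENT_BOUNDARY  (1u << 4)

/* Hypothetical extent state carrying only its flag bits. */
struct state {
	unsigned int bits;
};

/*
 * Cache *s in *cached if nothing is cached yet and either no filter is
 * given (filter == 0) or the state carries at least one filter flag.
 * Returns true if the state was cached.
 */
static bool cache_if_flags(struct state *s, struct state **cached,
			   unsigned int filter)
{
	assert(s && cached);
	if (*cached)
		return false;
	if (filter && !(s->bits & filter))
		return false;
	*cached = s;
	return true;
}
```

A state that is only EXTENT_DIRTY is rejected when the filter is `EXTENT_IOBITS | EXTENT_BOUNDARY`, which mirrors the behavior the commit message describes, and is accepted when the filter is 0.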
    • Btrfs: set page and mapping error on compressed write failure · 704de49d
      Committed by Filipe Manana
      If we fail in submit_compressed_extents() before calling btrfs_submit_compressed_write(),
      we start and end the writeback for the pages (clear their dirty flag, unlock them, etc.)
      but we don't tag the pages, nor the inode's mapping, with an error. This makes it
      impossible for a caller of filemap_fdatawait_range() (fsync or transaction commit,
      for example) to know that there was an error.

      Note that the return value of submit_compressed_extents() is useless, as that function
      is executed by a workqueue task and not directly by the fill_delalloc callback. This
      means the writepage(s) callbacks of the inode's address space operations don't get that
      return value.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  2. 04 Oct, 2014 3 commits
    • Btrfs: be aware of btree inode write errors to avoid fs corruption · 656f30db
      Committed by Filipe Manana
      While we have a transaction ongoing, the VM might decide at any time
      to call btree_inode->i_mapping->a_ops->writepages(), which will start
      writeback of dirty pages belonging to btree nodes/leafs. This call
      might return an error or the writeback might finish with an error
      before we attempt to commit the running transaction. If this happens,
      we might have no way of knowing that such error happened when we are
      committing the transaction - because the pages might no longer be
      marked dirty nor tagged for writeback (if a subsequent modification
      to the extent buffer didn't happen before the transaction commit) which
      makes filemap_fdata[write|wait]_range unable to find such pages (even
      if they're marked with SetPageError).
      So if this happens we must abort the transaction, otherwise we commit
      a super block with btree roots that point to btree nodes/leafs whose
      content on disk is invalid - either garbage or the content of some
      node/leaf from a past generation that got cowed or deleted and is no
      longer valid (for this latter case we end up getting error messages like
      "parent transid verify failed on 10826481664 wanted 25748 found 29562"
      when reading btree nodes/leafs from disk).
      
      Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's
      i_mapping would not be enough because we need to distinguish between
      log tree extents (not fatal) vs non-log tree extents (fatal) and
      because the next call to filemap_fdatawait_range() will catch and clear
      such errors in the mapping - and that call might be from a log sync and
      not from a transaction commit, which means we would not know about the
      error at transaction commit time. Also, checking for the eb flag
      EXTENT_BUFFER_IOERR at transaction commit time isn't done and would
      not be completely reliable, as the eb might be removed from memory and
      read back when trying to get it, which clears that flag right before
      reading the eb's pages from disk, making us not know about the previous
      write error.
      
      The three new flags for the btree inode also achieve what AS_EIO/AS_ENOSPC
      would in the following case: writepages() returns success after starting
      writeback for all dirty pages, and the writeback for all of them finishes
      with errors before filemap_fdatawait_range() is called. Because we were
      not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return
      success, as it could not know that writeback errors happened (the pages
      were no longer tagged for writeback).
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix crash of btrfs_release_extent_buffer_page · 81465028
      Committed by Liu Bo
      This is actually inspired by Filipe's patch.  When write_one_eb() fails on
      submit_extent_page(), it gives up writing this eb and marks it with
      EXTENT_BUFFER_IOERR.  So if the failure does not occur on the last page,
      the remaining pages stay DIRTY, and if the eb is later COWed and freed,
      we hit the BUG_ON in btrfs_release_extent_buffer_page() for the DIRTY
      page, i.e. BUG_ON(PageDirty(page));

      This adds the missing clear_page_dirty_for_io() for the remaining pages of the eb.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: add missing end_page_writeback on submit_extent_page failure · 55e3bd2e
      Committed by Filipe Manana
      If submit_extent_page() fails in write_one_eb(), we end up with the current
      page not marked dirty anymore, unlocked and marked for writeback. But we never
      end up calling end_page_writeback() against the page, which will make calls to
      filemap_fdatawait_range() (e.g. at transaction commit time) hang forever waiting
      for the writeback bit to be cleared from the page.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  3. 02 Oct, 2014 2 commits
  4. 18 Sep, 2014 13 commits
  5. 21 Aug, 2014 1 commit
    • Btrfs: fix crash on endio of reading corrupted block · 38c1c2e4
      Committed by Liu Bo
      The crash is

      ------------[ cut here ]------------
      kernel BUG at fs/btrfs/extent_io.c:2124!
      [...]
      Workqueue: btrfs-endio normal_work_helper [btrfs]
      RIP: 0010:[<ffffffffa02d6055>]  [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]

      This is in fact a regression.

      It happens because we forgot to advance @offset when reading a corrupted block,
      so @offset stays unchanged. This leads to checksum errors while reading the
      remaining blocks queued up in the same bio, and we end up hitting the above
      BUG_ON.
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
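The class of bug described here, an error path that skips the step advancing a running offset, can be shown with a small loop. This is an illustrative sketch only; `struct block` and `walk_blocks()` are hypothetical and do not model the real bio handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block descriptor for this sketch. */
struct block {
	size_t len;
	bool corrupted;
};

/*
 * Walk the blocks of one request and record the byte offset at which each
 * block starts. Returns the number of corrupted blocks seen.
 */
static size_t walk_blocks(const struct block *blocks, size_t n,
			  size_t *offsets)
{
	size_t offset = 0;
	size_t bad = 0;

	assert(blocks && offsets);
	for (size_t i = 0; i < n; i++) {
		offsets[i] = offset;
		if (blocks[i].corrupted) {
			bad++;
			goto next;	/* error path still reaches the advance */
		}
		/* ... normal completion work for a good block ... */
next:
		offset += blocks[i].len;	/* advance on every path */
	}
	return bad;
}
```

If the error path skipped the `offset += ...` line, every block after a corrupted one would be looked up at the wrong offset, which is the kind of follow-on checksum failure the commit message reports.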
  6. 19 Aug, 2014 1 commit
  7. 16 Jul, 2014 1 commit
    • sched: Remove proliferation of wait_on_bit() action functions · 74316201
      Committed by NeilBrown
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{,_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       supply their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible".
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack will.  So
      the distinction will still be visible, only with different
      function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).

      Since the first version of this patch (against 3.15) two new action
      functions appeared, one in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer-aware
      schedule call as NFS.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 14 Jun, 2014 1 commit
    • btrfs: fix use of uninit "ret" in end_extent_writepage() · 3e2426bd
      Committed by Eric Sandeen
      If this condition in end_extent_writepage() is false:

      	if (tree->ops && tree->ops->writepage_end_io_hook)

      we will then test an uninitialized "ret" at:

      	ret = ret < 0 ? ret : -EIO;

      The test for ret is for the case where ->writepage_end_io_hook
      failed, and we'd choose that ret as the error; but if
      there is no ->writepage_end_io_hook, nothing sets ret.

      Initializing ret to 0 should be sufficient; if
      writepage_end_io_hook wasn't set, (!uptodate) means
      non-zero err was passed in, so we choose -EIO in that case.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Chris Mason <clm@fb.com>
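The error-selection logic discussed in this commit can be sketched as a standalone function. This is an assumption-laden illustration rather than the kernel function: `end_write_status()`, `end_io_hook_t` and `failing_hook()` are hypothetical names, the sketch returns the chosen error instead of recording it on a page, and the step that clears `uptodate` when the hook fails is an assumption about the surrounding code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Optional completion hook; NULL means none is installed. */
typedef int (*end_io_hook_t)(int uptodate);

/* Pick the error to report for a finished write (0 on success). */
static int end_write_status(int err, end_io_hook_t hook)
{
	int uptodate = (err == 0);
	int ret = 0;	/* initialized: stays 0 when there is no hook */

	assert(err <= 0);	/* 0 or a negative errno */

	if (hook) {
		ret = hook(uptodate);
		if (ret)
			uptodate = 0;	/* assumed: a hook failure is a write failure */
	}

	if (!uptodate)
		ret = ret < 0 ? ret : -EIO;	/* prefer the hook's error */

	return ret;
}

/* Example hook that always fails, used to show error preference. */
static int failing_hook(int uptodate)
{
	(void)uptodate;
	return -ENOSPC;
}
```

Without a hook, a failed write yields -EIO because `ret` starts at 0; with a failing hook, the hook's own error code is kept.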
  9. 13 Jun, 2014 1 commit
  10. 10 Jun, 2014 6 commits
    • Btrfs: split up __extent_writepage to lower stack usage · 40f76580
      Committed by Chris Mason
      __extent_writepage has two unrelated parts.  First it does the delayed
      allocation dance and second it does the mapping and IO for the page
      we're actually writing.

      This splits it up into those two parts so the stack from one doesn't
      impact the stack from the other.
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: cut down stack usage in btree_write_cache_pages · 0e378df1
      Committed by Chris Mason
      This adds noinline_for_stack to two helpers used by
      btree_write_cache_pages.  It shaves us down from 424 bytes on the
      stack to 280.
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix double free in find_lock_delalloc_range · 7d788742
      Committed by Chris Mason
      We need to NULL the cached_state after freeing it, otherwise
      we might free it again if find_delalloc_range doesn't find anything.
      Signed-off-by: Chris Mason <clm@fb.com>
      Cc: stable@vger.kernel.org
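The "NULL the pointer after freeing it" rule from this commit can be shown in a few lines. This is a generic user-space sketch, not the btrfs code: `struct cached`, `free_cached()`, `drop_cached()` and the `frees` counter are hypothetical and exist only so the behavior can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cached object for this sketch. */
struct cached {
	int dummy;
};

/* Counts real releases so the sketch can be checked; not part of the pattern. */
static int frees;

static void free_cached(struct cached *c)
{
	if (c) {
		frees++;
		free(c);
	}
}

/* Drop the cached object and clear the caller's pointer in one step. */
static void drop_cached(struct cached **cp)
{
	assert(cp);
	free_cached(*cp);
	*cp = NULL;	/* a second drop now sees NULL and does nothing */
}
```

Calling `drop_cached()` twice on the same pointer releases the object once; without the `*cp = NULL` line the second call would free the same memory again.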
    • Btrfs: add sanity tests for new qgroup accounting code · faa2dbf0
      Committed by Josef Bacik
      This exercises the various parts of the new qgroup accounting code.  We do some
      basic stuff and do some things with the shared refs to make sure all that code
      works.  I had to add a bunch of infrastructure because I needed to be able to
      insert items into a fake tree without having to do all the hard work myself;
      hopefully this will be useful in the future.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: mark mapping with error flag to report errors to userspace · 5dca6eea
      Committed by Liu Bo
      According to commit 865ffef3
      (fs: fix fsync() error reporting),
      it's not reliable to check only for error pages, because pages can be
      truncated or invalidated. We should also mark the mapping with an error
      flag so that a later fsync can catch the error.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix hang on error (such as ENOSPC) when writing extent pages · 61391d56
      Committed by Filipe Manana
      When running low on available disk space and having several processes
      doing buffered file IO, I got the following trace in dmesg:
      
      [ 4202.720152] INFO: task kworker/u8:1:5450 blocked for more than 120 seconds.
      [ 4202.720401]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
      [ 4202.720596] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 4202.720874] kworker/u8:1    D 0000000000000001     0  5450      2 0x00000000
      [ 4202.720904] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
      [ 4202.720908]  ffff8801f62ddc38 0000000000000082 ffff880203ac2490 00000000001d3f40
      [ 4202.720913]  ffff8801f62ddfd8 00000000001d3f40 ffff8800c4f0c920 ffff880203ac2490
      [ 4202.720918]  00000000001d4a40 ffff88020fe85a40 ffff88020fe85ab8 0000000000000001
      [ 4202.720922] Call Trace:
      [ 4202.720931]  [<ffffffff816a3cb9>] schedule+0x29/0x70
      [ 4202.720950]  [<ffffffffa01ec48d>] btrfs_start_ordered_extent+0x6d/0x110 [btrfs]
      [ 4202.720956]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
      [ 4202.720972]  [<ffffffffa01ec559>] btrfs_run_ordered_extent_work+0x29/0x40 [btrfs]
      [ 4202.720988]  [<ffffffffa0201987>] normal_work_helper+0x137/0x2c0 [btrfs]
      [ 4202.720994]  [<ffffffff810680e5>] process_one_work+0x1f5/0x530
      (...)
      [ 4202.721027] 2 locks held by kworker/u8:1/5450:
      [ 4202.721028]  #0:  (%s-%s){++++..}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
      [ 4202.721037]  #1:  ((&work->normal_work)){+.+...}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
      [ 4202.721054] INFO: task btrfs:7891 blocked for more than 120 seconds.
      [ 4202.721258]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
      [ 4202.721444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 4202.721699] btrfs           D 0000000000000001     0  7891   7890 0x00000001
      [ 4202.721704]  ffff88018c2119e8 0000000000000086 ffff8800a33d2490 00000000001d3f40
      [ 4202.721710]  ffff88018c211fd8 00000000001d3f40 ffff8802144b0000 ffff8800a33d2490
      [ 4202.721714]  ffff8800d8576640 ffff88020fe85bc0 ffff88020fe85bc8 7fffffffffffffff
      [ 4202.721718] Call Trace:
      [ 4202.721723]  [<ffffffff816a3cb9>] schedule+0x29/0x70
      [ 4202.721727]  [<ffffffff816a2ebc>] schedule_timeout+0x1dc/0x270
      [ 4202.721732]  [<ffffffff8109bd79>] ? mark_held_locks+0xb9/0x140
      [ 4202.721736]  [<ffffffff816a90c0>] ? _raw_spin_unlock_irq+0x30/0x40
      [ 4202.721740]  [<ffffffff8109bf0d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [ 4202.721744]  [<ffffffff816a488f>] wait_for_completion+0xdf/0x120
      [ 4202.721749]  [<ffffffff8107fa90>] ? try_to_wake_up+0x310/0x310
      [ 4202.721765]  [<ffffffffa01ebee4>] btrfs_wait_ordered_extents+0x1f4/0x280 [btrfs]
      [ 4202.721781]  [<ffffffffa020526e>] btrfs_mksubvol.isra.62+0x30e/0x5a0 [btrfs]
      [ 4202.721786]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
      [ 4202.721799]  [<ffffffffa02056a9>] btrfs_ioctl_snap_create_transid+0x1a9/0x1b0 [btrfs]
      [ 4202.721813]  [<ffffffffa020583a>] btrfs_ioctl_snap_create_v2+0x10a/0x170 [btrfs]
      (...)
      
      It turns out that extent_io.c:__extent_writepage(), which ends up being called
      through filemap_fdatawrite_range() in btrfs_start_ordered_extent(), was getting
      -ENOSPC when calling the fill_delalloc callback. In this situation, it returned
      without the writepage_end_io_hook callback (inode.c:btrfs_writepage_end_io_hook)
      ever being called for the respective page, which prevents the ordered extent's
      bytes_left count from ever reaching 0, and therefore a finish_ordered_fn work
      is never queued into the endio_write_workers queue. This makes the task that
      called btrfs_start_ordered_extent() hang forever on the wait queue of the ordered
      extent.
      
      This is fairly easy to reproduce using a small filesystem and fsstress on
      a quad core vm:
      
          mkfs.btrfs -f -b `expr 2100 \* 1024 \* 1024` /dev/sdd
          mount /dev/sdd /mnt
      
          fsstress -p 6 -d /mnt -n 100000 -x \
              "btrfs subvolume snapshot -r /mnt /mnt/mysnap" \
      	    -f allocsp=0 \
      	    -f bulkstat=0 \
      	    -f bulkstat1=0 \
      	    -f chown=0 \
      	    -f creat=1 \
      	    -f dread=0 \
      	    -f dwrite=0 \
      	    -f fallocate=1 \
      	    -f fdatasync=0 \
      	    -f fiemap=0 \
      	    -f freesp=0 \
      	    -f fsync=0 \
      	    -f getattr=0 \
      	    -f getdents=0 \
      	    -f link=0 \
      	    -f mkdir=0 \
      	    -f mknod=0 \
      	    -f punch=1 \
      	    -f read=0 \
      	    -f readlink=0 \
      	    -f rename=0 \
      	    -f resvsp=0 \
      	    -f rmdir=0 \
      	    -f setxattr=0 \
      	    -f stat=0 \
      	    -f symlink=0 \
      	    -f sync=0 \
      	    -f truncate=1 \
      	    -f unlink=0 \
      	    -f unresvsp=0 \
      	    -f write=4
      
      So just ensure that if an error happens while writing the extent page
      we call the writepage_end_io_hook callback. Also make it return the
      error code and ensure the caller (extent_write_cache_pages) processes
      all pages in the page vector even if an error happens only for some
      of them, so that ordered extents end up released.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
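The two rules stated at the end of this commit message (always run the completion hook, even on error, and keep processing the remaining pages after an error) can be sketched as follows. This is a simplified model under stated assumptions: `struct page_rec`, `end_io_hook()`, `write_one()` and `write_all()` are hypothetical names, and on success the hook is called immediately where real code would run it at I/O completion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical page record for this sketch. */
struct page_rec {
	int write_err;	/* 0, or a negative errno the write would produce */
	int ended;	/* set once the completion hook ran for this page */
};

static void end_io_hook(struct page_rec *p)
{
	p->ended = 1;
}

/* Write one page; on error, still run the completion hook. */
static int write_one(struct page_rec *p)
{
	if (p->write_err) {
		end_io_hook(p);
		return p->write_err;
	}
	/* Success: real code would run the hook later, at I/O completion. */
	end_io_hook(p);
	return 0;
}

/* Visit every page; remember the first error but keep going. */
static int write_all(struct page_rec *pages, size_t n)
{
	int first_err = 0;

	assert(pages || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret = write_one(&pages[i]);

		if (ret < 0 && first_err == 0)
			first_err = ret;
	}
	return first_err;
}
```

Because every page reaches its completion hook, a waiter that counts completed pages is never left waiting on a page that failed early.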
  11. 05 Jun, 2014 1 commit
    • mm: non-atomically mark page accessed during page cache allocation where possible · 2457aec6
      Committed by Mel Gorman
      aops->write_begin may allocate a new page and make it visible only to have
      mark_page_accessed called almost immediately after.  Once the page is
      visible the atomic operations are necessary, which is noticeable overhead
      when writing to an in-memory filesystem like tmpfs but should also be
      noticeable with fast storage.  The objective of the patch is to initialise
      the accessed information with non-atomic operations before the page is
      visible.
      
      The bulk of filesystems directly or indirectly use
      grab_cache_page_write_begin or find_or_create_page for the initial
      allocation of a page cache page.  This patch adds an init_page_accessed()
      helper which behaves like the first call to mark_page_accessed() but may
      be called before the page is visible and can be done non-atomically.

      The primary APIs of concern in this case are the following and are used
      by most filesystems.
      
      	find_get_page
      	find_lock_page
      	find_or_create_page
      	grab_cache_page_nowait
      	grab_cache_page_write_begin
      
      All of them are very similar in detail, so the patch creates a core helper
      pagecache_get_page() which takes a flags parameter that affects its
      behavior such as whether the page should be marked accessed or not.  The
      old API is preserved but is basically a thin wrapper around this core
      function.

      Each of the filesystems is then updated to avoid calling
      mark_page_accessed when it is known that the VM interfaces have already
      done the job.  There is a slight snag in that the timing of the
      mark_page_accessed() has now changed so in rare cases it's possible a page
      gets to the end of the LRU as PageReferenced whereas previously it might
      have been repromoted.  This is expected to be rare but it's worth the
      filesystem people thinking about it in case they see a problem with the
      timing change.  It is also the case that some filesystems may be marking
      pages accessed that previously did not but it makes sense that filesystems
      have consistent behaviour in this regard.
      
      The test case used to evaluate this is a simple dd of a large file done
      multiple times with the file deleted on each iteration.  The size of the
      file is 1/10th physical memory to avoid dirty page balancing.  In the
      async case it will be possible that the workload completes without even
      hitting the disk and will have variable results but highlight the impact
      of mark_page_accessed for async IO.  The sync results are expected to be
      more stable.  The exception is tmpfs where the normal case is for the "IO"
      to not hit the disk.

      The test machine was single socket and UMA to avoid any scheduling or NUMA
      artifacts.  Throughput and wall times are presented for sync IO, only wall
      times are shown for async as the granularity reported by dd and the
      variability is unsuitable for comparison.  As async results were variable
      due to writeback timings, I'm only reporting the maximum figures.  The sync
      results were stable enough to make the mean and stddev uninteresting.
      
      The performance results are reported based on a run with no profiling.
      Profile data is based on a separate run with oprofile running.
      
      async dd
                                          3.15.0-rc3            3.15.0-rc3
                                             vanilla           accessed-v2
      ext3    Max      elapsed     13.9900 (  0.00%)     11.5900 ( 17.16%)
      tmpfs   Max      elapsed      0.5100 (  0.00%)      0.4900 (  3.92%)
      btrfs   Max      elapsed     12.8100 (  0.00%)     12.7800 (  0.23%)
      ext4    Max      elapsed     18.6000 (  0.00%)     13.3400 ( 28.28%)
      xfs     Max      elapsed     12.5600 (  0.00%)      2.0900 ( 83.36%)
      
      The XFS figure is a bit strange as it managed to avoid a worst case by
      sheer luck but the average figures looked reasonable.
      
              samples percentage
      ext3       86107    0.9783  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
      ext3       23833    0.2710  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
      ext3        5036    0.0573  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
      ext4       64566    0.8961  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
      ext4        5322    0.0713  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
      ext4        2869    0.0384  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
      xfs        62126    1.7675  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
      xfs         1904    0.0554  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
      xfs          103    0.0030  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
      btrfs      10655    0.1338  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
      btrfs       2020    0.0273  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
      btrfs        587    0.0079  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
      tmpfs      59562    3.2628  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
      tmpfs       1210    0.0696  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
      tmpfs         94    0.0054  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
      
      [akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 18 Apr, 2014 1 commit
  13. 08 Apr, 2014 1 commit
  14. 07 Apr, 2014 1 commit
    • Btrfs: don't clear uptodate if the eb is under IO · a26e8c9f
      Committed by Josef Bacik
      So I have an awful exercise script that will run snapshot, balance and
      send/receive in parallel.  This sometimes would crash spectacularly and when it
      came back up the fs would be completely hosed.  Turns out this is because of a
      bad interaction of balance and send/receive.  Send will hold onto its entire
      path for the whole send, but its blocks could get relocated out from underneath
      it, and because it doesn't hold tree locks there's nothing to keep this from
      happening.  So it will go to read in a slot with an old transid, and we could
      have re-allocated this block for something else and it could have a completely
      different transid.  But because we think it is invalid we clear uptodate and
      re-read in the block.  If we do this before we actually write out the new block
      we could write back stale data to the fs, and boom we're screwed.

      Now we definitely need to fix this disconnect between send and balance, but we
      really really need to not allow ourselves to accidentally read in stale data over
      new data.  So make sure we check that the extent buffer is not under IO before
      clearing uptodate; this will kick back EIO to the caller instead of reading in
      stale data and keep us from corrupting the fs.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  15. 11 Mar, 2014 2 commits
  16. 29 Jan, 2014 1 commit