1. 30 May, 2012 3 commits
    • L
      Btrfs: use fastpath in extent state ops as much as possible · d1ac6e41
      Liu Bo committed
      Fully utilize our extent state's new helper functions to use
      fastpath as much as possible.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      d1ac6e41
    • J
      Btrfs: finish ordered extents in their own thread · 5fd02043
      Josef Bacik committed
      We noticed that the ordered extent completion doesn't really rely on having
      a page and that it could be done independently of ending the writeback on a
      page.  This patch makes us not do the threaded endio stuff for normal
      buffered writes and direct writes so we can end page writeback as soon as
      possible (in irq context) and only start threads to do the ordered work when
      it is actually done.  Compression needs to be reworked some to take
      advantage of this as well, but atm it has to do a find_get_page in its endio
      handler so it must be done in its own thread.  This makes direct writes
      quite a bit faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      5fd02043
    • J
      Btrfs: fix compile warnings in extent_io.c · d7dbe9e7
      Josef Bacik committed
      These warnings are bogus since we will always have at least one page in an
      eb, but to make the compiler happy just set ret = 0 in these two cases.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      d7dbe9e7
  2. 11 May, 2012 4 commits
  3. 05 May, 2012 1 commit
    • J
      Btrfs: fix page leak when allocing extent buffers · 17de39ac
      Josef Bacik committed
      If we happen to alloc an extent buffer and then alloc a page and notice that
      page is already attached to an extent buffer, we will only unlock it and
      free our existing eb.  Any pages currently attached to that eb will be
      properly freed, but we don't do the page_cache_release() on the page where
      we noticed the other extent buffer which can cause us to leak pages and I
      hope cause the weird issues we've been seeing in this area.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      17de39ac
  4. 19 April, 2012 3 commits
    • J
      Btrfs: always store the mirror we read the eb from · 5cf1ab56
      Josef Bacik committed
      A user reported a panic where we were trying to fix a bad mirror but the
      mirror number we were giving was 0, which is invalid.  This is because we
      don't do the transid verification until after the read, so as far as the
      read code is concerned the read was a success.  So instead store the mirror
      we read from so that if there is some failure post read we know which mirror
      to try next and which mirror needs to be fixed if we find a good copy of the
      block.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      5cf1ab56
    • L
      Btrfs: avoid possible use-after-free in clear_extent_bit() · cdc6a395
      Li Zefan committed
      clear_extent_bit()
      {
          next_node = rb_next(&state->rb_node);
          ...
          clear_state_bit(state);  <-- this may free next_node
          if (next_node) {
              state = rb_entry(next_node);
              ...
          }
      }
      
      clear_state_bit() calls merge_state(), which may free the next node
      of the extent_state passed in, so clear_extent_bit() may end up
      referencing freed memory.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      cdc6a395
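The hazard described in this commit can be sketched outside the kernel. The following is a minimal illustration, not the actual patch: `state_sketch`, `new_state`, `merge_forward`, `clear_bits` and `clear_all` are hypothetical names, and a singly linked list stands in for the rb-tree. The point it shows is that the successor is looked up only after the call that may free it, instead of being cached beforehand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for extent_state: a range [start, end] plus flag bits. */
struct state_sketch {
    unsigned long start, end;
    unsigned int bits;
    struct state_sketch *next;
};

static struct state_sketch *new_state(unsigned long start, unsigned long end,
                                      unsigned int bits,
                                      struct state_sketch *next)
{
    struct state_sketch *s = malloc(sizeof(*s));

    assert(start <= end);
    if (!s)
        abort();
    s->start = start;
    s->end = end;
    s->bits = bits;
    s->next = next;
    return s;
}

/* Absorb every adjacent successor that carries identical bits; frees them. */
static void merge_forward(struct state_sketch *s)
{
    struct state_sketch *n;

    while ((n = s->next) && n->bits == s->bits && s->end + 1 == n->start) {
        s->end = n->end;
        s->next = n->next;
        free(n);
    }
}

/* Clearing bits can make a state mergeable, so the successor may be freed. */
static void clear_bits(struct state_sketch *s, unsigned int bits)
{
    s->bits &= ~bits;
    merge_forward(s);
}

/* Clear bits on the whole list; returns how many states remain afterwards. */
static int clear_all(struct state_sketch *head, unsigned int bits)
{
    int remaining = 0;
    struct state_sketch *s = head;

    while (s) {
        clear_bits(s, bits);  /* may free the old s->next */
        remaining++;
        s = s->next;          /* read the successor only after the merge */
    }
    return remaining;
}
```

Caching `s->next` before calling `clear_bits()` would reproduce the bug: with three adjacent states whose bits become equal after clearing, the cached pointer would refer to a node that `merge_forward()` has already freed.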
    • L
      Btrfs: return void from clear_state_bit · 8e52acf7
      Li Zefan committed
      Currently it returns a set of bits that were cleared, but this return
      value is not used at all.
      
      Moreover, it doesn't seem to be useful, because we may clear the bits
      of a few extent_states, but only the cleared bits of the last one are
      returned.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      8e52acf7
  5. 13 April, 2012 2 commits
  6. 27 March, 2012 8 commits
    • J
      Btrfs: deal with read errors on extent buffers differently · ea466794
      Josef Bacik committed
      Since we need to read and write extent buffers in their entirety we can't use
      the normal bio_readpage_error stuff since it only works on a per page basis.  So
      instead make it so that if we see an io error in endio we just mark the eb as
      having an IO error and then in btree_read_extent_buffer_pages we will manually
      try other mirrors and then overwrite the bad mirror if we find a good copy.
      This works with larger than page size blocks.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      ea466794
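The retry strategy this commit describes (try the other mirrors by hand, then overwrite the bad copy once a good one is found) can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in btree_read_extent_buffer_pages: `mirrors_sketch`, `MAX_MIRRORS` and `read_whole_block` are hypothetical, a boolean stands in for "this copy reads back correctly", and every bad mirror seen before the good one is repaired.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MIRRORS 4

/* Hypothetical model: each mirror holds a copy that is either good or bad. */
struct mirrors_sketch {
    int num_copies;               /* mirrors are numbered 1..num_copies */
    bool good[MAX_MIRRORS + 1];   /* index 0 unused: 0 is not a valid mirror */
};

/*
 * Try each mirror in turn for the whole block. On success, store the mirror
 * that supplied the data in *read_mirror, repair the bad mirrors seen so far,
 * and return 0. Return -1 if no copy is good.
 */
static int read_whole_block(struct mirrors_sketch *m, int *read_mirror)
{
    int failed[MAX_MIRRORS];
    int nfailed = 0;

    assert(m->num_copies >= 1 && m->num_copies <= MAX_MIRRORS);

    for (int mirror = 1; mirror <= m->num_copies; mirror++) {
        if (!m->good[mirror]) {
            failed[nfailed++] = mirror;   /* remember which copy was bad */
            continue;
        }
        for (int i = 0; i < nfailed; i++)
            m->good[failed[i]] = true;    /* overwrite bad copies */
        *read_mirror = mirror;
        return 0;
    }
    return -1;
}
```

Numbering mirrors from 1 keeps 0 free as an "invalid mirror" value, which matches the convention mentioned in the "always store the mirror we read the eb from" commit.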
    • C
      Btrfs: loop waiting on writeback · a098d8e8
      Chris Mason committed
      lock_extent_buffer_for_io needs to loop around and make sure the
      writeback bits are not set.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      a098d8e8
    • J
      Btrfs: ensure an entire eb is written at once · 0b32f4bb
      Josef Bacik committed
      This patch simplifies how we track our extent buffers.  Previously we could exit
      writepages with only having written half of an extent buffer, which meant we had
      to track the state of the pages and the state of the extent buffers differently.
      Now we only read in entire extent buffers and write out entire extent buffers,
      this allows us to simply set bits in our bflags to indicate the state of the eb
      and we no longer have to do things like track uptodate with our iotree.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      0b32f4bb
    • J
      Btrfs: introduce mark_extent_buffer_accessed · 5df4235e
      Josef Bacik committed
      Because an eb can have multiple pages we need to make sure that all pages within
      the eb are marked as accessed, since releasepage can be called against any page
      in the eb.  This will keep us from possibly evicting hot eb's when we're doing
      larger than pagesize eb's.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      5df4235e
    • J
      Btrfs: introduce free_extent_buffer_stale · 3083ee2e
      Josef Bacik committed
      Because btrfs cow's we can end up with extent buffers that are no longer
      necessary just sitting around in memory.  So instead of evicting these pages, we
      could end up evicting things we actually care about.  Thus we have
      free_extent_buffer_stale for use when we are freeing tree blocks.  This will
      make it so that the ref for the eb being in the radix tree is dropped as soon as
      possible and then is freed when the refcount hits 0 instead of waiting to be
      released by releasepage.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      3083ee2e
    • J
      Btrfs: only use the existing eb if its count isn't 0 · 115391d2
      Josef Bacik committed
      We can run into a problem where we find an eb for our existing page already on
      the radix tree but it has a ref count of 0.  It hasn't been removed by RCU
      yet, so this can cause issues where we will use the EB after free.  So do
      atomic_inc_not_zero on the exists->refs and if it is zero just do
      synchronize_rcu() and try again.  We won't have to worry about new allocators
      coming in since they will block on the page lock at this point.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      115391d2
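The "increment unless zero" idea behind atomic_inc_not_zero can be illustrated in user space with C11 atomics. This is a sketch of the semantics only, not the kernel's implementation: `eb_sketch` and `ref_get_unless_zero` are hypothetical names, and the synchronize_rcu()-and-retry step from the commit message is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent buffer's reference count. */
struct eb_sketch {
    atomic_int refs;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true when the reference was taken, false when the object is
 * already on its way to being freed and must not be reused.
 */
static bool ref_get_unless_zero(struct eb_sketch *eb)
{
    int old;

    assert(eb != NULL);
    old = atomic_load(&eb->refs);
    while (old != 0) {
        /* On failure, `old` is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&eb->refs, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that gets `false` has found an object whose last reference is gone; in the commit's scheme it would wait for the pending removal to finish and look the buffer up again rather than touch the dying one.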
    • J
      Btrfs: set page->private to the eb · 4f2de97a
      Josef Bacik committed
      We spend a lot of time looking up extent buffers from pages when we could just
      store the pointer to the eb the page is associated with in page->private.  This
      patch does just that, and it makes things a little simpler and reduces a bit of
      CPU overhead involved with doing metadata IO.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      4f2de97a
    • C
      Btrfs: allow metadata blocks larger than the page size · 727011e0
      Chris Mason committed
      A few years ago the btrfs code to support blocks larger than
      the page size was disabled to fix a few corner cases in the
      page cache handling.  This fixes the code to properly support
      large metadata blocks again.
      
      Since current kernels will crash early and often with larger
      metadata blocks, this adds an incompat bit so that older kernels
      can't mount it.
      
      This also does away with different blocksizes for nodes and leaves.
      You get a single block size for all tree blocks.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      727011e0
  7. 22 March, 2012 8 commits
  8. 20 March, 2012 1 commit
  9. 23 February, 2012 1 commit
    • C
      Btrfs: clear the extent uptodate bits during parent transid failures · 50653190
      Chris Mason committed
      If btrfs reads a block and finds a parent transid mismatch, it clears
      the uptodate flags on the extent buffer, and the pages inside it.  But
      we only clear the uptodate bits in the state tree if the block straddles
      more than one page.
      
      This is from an old optimization to reduce contention on the extent
      state tree.  But it is buggy because the code that retries a read from
      a different copy of the block is going to find the uptodate state bits
      set and skip the IO.
      
      The end result of the bug is that we'll never actually read the good
      copy (if there is one).
      
      The fix here is to always clear the uptodate state bits, which is safe
      because this code is only called when the parent transid fails.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      50653190
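Why a stale uptodate bit hides the good copy can be shown with a toy model. Everything here is hypothetical (`cached_block`, `read_block`, `handle_transid_failure`); a single boolean stands in for the uptodate bits in the extent state tree, and a counter stands in for reads actually sent to disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cached metadata block. */
struct cached_block {
    bool uptodate;  /* stand-in for the uptodate bits in the state tree */
    int ios;        /* number of reads actually issued */
};

/* The read path skips the IO whenever the range is marked uptodate. */
static void read_block(struct cached_block *b)
{
    assert(b != NULL);
    if (b->uptodate)
        return;     /* looks cached: no IO is issued */
    b->ios++;
    b->uptodate = true;
}

/*
 * On a parent transid mismatch, always drop the uptodate marking so that
 * the retry against another copy really goes to disk.
 */
static void handle_transid_failure(struct cached_block *b)
{
    assert(b != NULL);
    b->uptodate = false;
}
```

If `handle_transid_failure()` left `uptodate` set, the retry would return without issuing a read, which is the behaviour the commit message describes as never reading the good copy.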
  10. 21 February, 2012 1 commit
  11. 17 February, 2012 4 commits
  12. 15 February, 2012 1 commit
    • J
      btrfs: delalloc for page dirtied out-of-band in fixup worker · 87826df0
      Jeff Mahoney committed
       We encountered an issue that was easily observable on s/390 systems but
       could really happen anywhere. The timing just seemed to hit reliably
       on s/390 with limited memory.
      
       The gist is that when an unexpected set_page_dirty() happened, we'd
       run into the BUG() in btrfs_writepage_fixup_worker since it wasn't
       properly set up for delalloc.
      
       This patch does the following:
       - Performs the missing delalloc in the fixup worker
       - Allows the start hook to return -EBUSY, which informs __extent_writepage
         that it should mark the page skipped and not redirty it. This is
         required since the fixup worker can fail with -ENOSPC and the page
         will have already been redirtied. That causes an Oops in
         drop_outstanding_extents later. Retrying the fixup worker could
         lead to an infinite loop. Deferring the page redirty also saves us
         some cycles since the page would be stuck in a resubmit-redirty loop
         until the fixup worker completes. It's not harmful, just wasteful.
       - If the fixup worker fails, we mark the page and mapping as errored,
         and end the writeback, similar to what we would do had the page
         actually been submitted to writeback.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      87826df0
  13. 27 January, 2012 1 commit
    • M
      Btrfs: Check for NULL page in extent_range_uptodate · 8bedd51b
      Mitch Harder committed
      A user has encountered a NULL pointer kernel oops in btrfs when
      encountering media errors.  The problem has been identified
      as an unhandled NULL pointer returned from find_get_page().
      This modification simply checks for a NULL page, and returns
      with an error if found (the extent_range_uptodate() function
      returns 1 on errors).
      
      After testing this patch, the user reported that the error with
      the NULL pointer oops was solved.  However, there is still a
      remaining problem with a thread becoming stuck in
      wait_on_page_locked(page) in the read_extent_buffer_pages(...)
      function in extent_io.c
      
             for (i = start_i; i < num_pages; i++) {
                     page = extent_buffer_page(eb, i);
                     wait_on_page_locked(page);
                     if (!PageUptodate(page))
                             ret = -EIO;
             }
      
      This patch leaves the issue with the locked page yet to be resolved.
      Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      8bedd51b
  14. 04 January, 2012 1 commit
  15. 22 December, 2011 1 commit
    • S
      Btrfs: integrate integrity check module into btrfs · 21adbd5c
      Stefan Behrens committed
      This is the last part of the patch series. It modifies the btrfs
      code to use the integrity check module if configured to do so
      with the define BTRFS_FS_CHECK_INTEGRITY. If this define is not set,
      the only effective change is that code is added that handles the
      mount option to activate the integrity check. If the mount option is
      set and the define BTRFS_FS_CHECK_INTEGRITY is not set, that code
      complains in the log and the mount fails with EINVAL.
      
      Add the mount option to activate the usage of the integrity check
      code.
      Add invocation of btrfs integrity check code init and cleanup
      function on mount and umount, respectively.
      Add hook to call btrfs integrity check code version of
      submit_bh/submit_bio.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      21adbd5c