1. 24 Jul, 2012: 2 commits
  2. 03 Jul, 2012: 1 commit
    • Btrfs: hold a ref on the inode during writepages · 7fd1a3f7
      Josef Bacik committed
      We can race with unlink and not actually be able to do our igrab in
      btrfs_add_ordered_extent.  This will result in all sorts of problems.
      Instead of doing the complicated work to try and handle returning an error
      properly from btrfs_add_ordered_extent, just hold a ref to the inode during
      writepages.  If we cannot grab a ref we know we're freeing this inode anyway
      and can just drop the dirty pages on the floor, because screw them we're
      going to invalidate them anyway.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
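A minimal single-threaded userspace sketch of the pattern above, assuming a toy inode with a plain reference count and a `freeing` flag standing in for the kernel's I_FREEING state. `toy_igrab()` imitates `igrab()` returning NULL for an inode that is being freed; all `toy_*` names are hypothetical, and the real code also serializes on `inode->i_lock`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy inode: reference count, "being freed" flag
 * (stand-in for I_FREEING/I_WILL_FREE) and a count of dirty pages. */
struct toy_inode {
    int refs;
    bool freeing;
    int dirty_pages;
};

/* Imitates igrab(): take a reference unless the inode is being freed. */
static struct toy_inode *toy_igrab(struct toy_inode *inode)
{
    if (inode->freeing)
        return NULL;
    inode->refs++;
    return inode;
}

static void toy_iput(struct toy_inode *inode)
{
    assert(inode->refs > 0);
    inode->refs--;
}

/* Hold one reference for the whole writepages call, so later steps
 * (such as adding ordered extents) never need their own igrab().
 * Returns the number of pages "written". */
static int toy_writepages(struct toy_inode *inode)
{
    int written;

    if (!toy_igrab(inode)) {
        /* Inode is going away anyway: drop the dirty pages unwritten. */
        inode->dirty_pages = 0;
        return 0;
    }
    written = inode->dirty_pages;   /* pretend every dirty page is written */
    inode->dirty_pages = 0;
    toy_iput(inode);
    return written;
}
```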
  3. 15 Jun, 2012: 1 commit
    • Btrfs: use rcu to protect device->name · 606686ee
      Josef Bacik committed
      Al pointed out that we can just toss out the old name on a device and add a
      new one arbitrarily, so anybody who uses device->name in printk could
      possibly use free'd memory.  Instead of adding locking around all of this he
      suggested doing it with RCU, so I've introduced a struct rcu_string that
      does just that and have gone through and protected all accesses to
      device->name that aren't under the uuid_mutex with rcu_read_lock().  This
      protects us and I will use it for dealing with removing the device that we
      used to mount the file system in a later patch.  Thanks,
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
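A userspace sketch of the "replace, never edit in place" name described above, assuming C11 atomics as a stand-in for `rcu_assign_pointer()`/`rcu_dereference()`. `struct toy_rcu_string`, `struct toy_device` and the helper names are hypothetical. In the kernel, a `struct rcu_head` is embedded as well so the old copy can be freed after a grace period (for example with `kfree_rcu()`); this sketch leaves that wait to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Immutable string: a new name means a new allocation. */
struct toy_rcu_string {
    size_t len;
    char str[];            /* flexible array member holds the bytes */
};

static struct toy_rcu_string *toy_rcu_string_strdup(const char *src)
{
    size_t len = strlen(src);
    struct toy_rcu_string *s = malloc(sizeof(*s) + len + 1);

    if (s) {
        s->len = len;
        memcpy(s->str, src, len + 1);
    }
    return s;
}

/* Hypothetical device whose name pointer is published atomically. */
struct toy_device {
    _Atomic(struct toy_rcu_string *) name;
};

/* Reader side. In the kernel this runs between rcu_read_lock() and
 * rcu_read_unlock(), which keeps the old string alive while in use. */
static const char *toy_device_name(struct toy_device *dev)
{
    struct toy_rcu_string *s =
        atomic_load_explicit(&dev->name, memory_order_acquire);
    return s ? s->str : NULL;
}

/* Updater side: publish the new name and hand back the old one. The
 * caller may free the old copy only once pre-existing readers are done
 * (an RCU grace period); this sketch does not implement that wait. */
static struct toy_rcu_string *toy_device_rename(struct toy_device *dev,
                                                struct toy_rcu_string *new_name)
{
    assert(new_name != NULL);
    return atomic_exchange_explicit(&dev->name, new_name, memory_order_acq_rel);
}
```

The single-threaded test below frees the old name immediately only because no readers can exist at that point.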
  4. 30 May, 2012: 4 commits
    • Btrfs: add device counters for detected IO and checksum errors · 442a4f63
      Stefan Behrens committed
      The goal is to detect when drives start to show an increased error rate,
      i.e. when they should be replaced soon. Therefore statistic counters are
      added that count IO errors (read, write and flush). Additionally,
      software-detected errors such as checksum errors and corrupted blocks are
      counted.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
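A minimal sketch of per-device error counters, assuming the categories listed in the message above. The enum values, `struct toy_device_stats` and the helpers are hypothetical names, not the btrfs ones; atomics are used because I/O completions may update counters concurrently.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter indices: I/O errors plus software-detected errors. */
enum toy_dev_stat {
    TOY_STAT_WRITE_ERRS,
    TOY_STAT_READ_ERRS,
    TOY_STAT_FLUSH_ERRS,
    TOY_STAT_CORRUPTION_ERRS,   /* e.g. checksum mismatch, corrupted block */
    TOY_STAT_NR
};

struct toy_device_stats {
    atomic_int values[TOY_STAT_NR];
};

static void toy_dev_stats_init(struct toy_device_stats *s)
{
    for (int i = 0; i < TOY_STAT_NR; i++)
        atomic_init(&s->values[i], 0);
}

/* Called from completion paths; relaxed ordering is enough for counters. */
static void toy_dev_stat_inc(struct toy_device_stats *s, enum toy_dev_stat idx)
{
    assert(idx < TOY_STAT_NR);
    atomic_fetch_add_explicit(&s->values[idx], 1, memory_order_relaxed);
}

static int toy_dev_stat_read(struct toy_device_stats *s, enum toy_dev_stat idx)
{
    return atomic_load_explicit(&s->values[idx], memory_order_relaxed);
}

/* Total of all error types: a rough "drive may be degrading" signal. */
static int toy_dev_stat_total(struct toy_device_stats *s)
{
    int total = 0;

    for (int i = 0; i < TOY_STAT_NR; i++)
        total += toy_dev_stat_read(s, (enum toy_dev_stat)i);
    return total;
}
```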
    • Btrfs: use fastpath in extent state ops as much as possible · d1ac6e41
      Liu Bo committed
      Fully utilize our extent state's new helper functions to use
      fastpath as much as possible.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
    • Btrfs: finish ordered extents in their own thread · 5fd02043
      Josef Bacik committed
      We noticed that the ordered extent completion doesn't really rely on having
      a page and that it could be done independently of ending the writeback on a
      page.  This patch makes us not do the threaded endio stuff for normal
      buffered writes and direct writes so we can end page writeback as soon as
      possible (in irq context) and only start threads to do the ordered work when
      it is actually done.  Compression needs to be reworked some to take
      advantage of this as well, but atm it has to do a find_get_page in its endio
      handler so it must be done in its own thread.  This makes direct writes
      quite a bit faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
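A single-threaded model of the split described above, assuming a simple FIFO as a stand-in for the btrfs worker threads: the write completion ends page writeback immediately and only queues the ordered-extent bookkeeping, which a worker runs later. All `toy_*` types and functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 16

struct toy_page { bool writeback; };
struct toy_ordered { int id; bool finished; };

/* Stand-in for a workqueue: items are queued from "irq context"
 * and processed later by a worker. */
struct toy_workqueue {
    struct toy_ordered *items[TOY_QUEUE_MAX];
    size_t head, tail;
};

static void toy_queue_work(struct toy_workqueue *wq, struct toy_ordered *oe)
{
    assert(wq->tail < TOY_QUEUE_MAX);
    wq->items[wq->tail++] = oe;
}

/* Write completion: the cheap page-state update happens right away; the
 * ordered-extent work (which may sleep in the real code) is deferred.
 * `oe` is non-NULL only when this completion finishes the ordered extent. */
static void toy_end_write_io(struct toy_page *page, struct toy_ordered *oe,
                             struct toy_workqueue *wq)
{
    page->writeback = false;
    if (oe)
        toy_queue_work(wq, oe);
}

/* Worker body: drain the queue. Returns how many extents were finished. */
static size_t toy_run_ordered_work(struct toy_workqueue *wq)
{
    size_t n = 0;

    while (wq->head < wq->tail) {
        wq->items[wq->head++]->finished = true;
        n++;
    }
    return n;
}
```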
    • Btrfs: fix compile warnings in extent_io.c · d7dbe9e7
      Josef Bacik committed
      These warnings are bogus since we will always have at least one page in an
      eb, but to make the compiler happy just set ret = 0 in these two cases.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  5. 26 May, 2012: 1 commit
  6. 11 May, 2012: 4 commits
  7. 05 May, 2012: 1 commit
    • Btrfs: fix page leak when allocing extent buffers · 17de39ac
      Josef Bacik committed
      If we happen to alloc an extent buffer and then alloc a page and notice that
      page is already attached to an extent buffer, we will only unlock it and
      free our existing eb.  Any pages currently attached to that eb will be
      properly freed, but we don't do the page_cache_release() on the page where
      we noticed the other extent buffer, which can cause us to leak pages and, I
      hope, is the cause of the weird issues we've been seeing in this area.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
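A toy model of the leak and its fix, assuming the caller's page-cache lookup already holds one page reference. `toy_page_release()` plays the role of `page_cache_release()`; every `toy_*` name is hypothetical and refcounting is single-threaded for clarity.

```c
#include <assert.h>
#include <stddef.h>

struct toy_eb { int id; };

/* Toy page: a reference count and the extent buffer that owns it. */
struct toy_page {
    int refs;
    struct toy_eb *owner;
};

/* page_cache_release() analogue. */
static void toy_page_release(struct toy_page *p)
{
    assert(p->refs > 0);
    p->refs--;
}

/* Attach `page` to `eb`. If another eb already owns the page, drop the
 * reference our lookup took before bailing out; skipping this release
 * is the leak described above. Returns the eb that owns the page. */
static struct toy_eb *toy_attach_page(struct toy_eb *eb, struct toy_page *page)
{
    if (page->owner && page->owner != eb) {
        struct toy_eb *exists = page->owner;

        toy_page_release(page);   /* the fix */
        return exists;            /* caller frees its own eb, uses `exists` */
    }
    page->owner = eb;             /* our lookup reference now belongs to eb */
    return eb;
}
```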
  8. 19 Apr, 2012: 3 commits
    • Btrfs: always store the mirror we read the eb from · 5cf1ab56
      Josef Bacik committed
      A user reported a panic where we were trying to fix a bad mirror but the
      mirror number we were giving was 0, which is invalid.  This is because we
      don't do the transid verification until after the read, so as far as the
      read code is concerned the read was a success.  So instead store the mirror
      we read from so that if there is some failure post read we know which mirror
      to try next and which mirror needs to be fixed if we find a good copy of the
      block.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
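A small sketch of why recording the mirror helps, assuming mirrors are numbered from 1 and 0 means "unknown" (the invalid value from the bug report). The read completion records the mirror even though verification such as the transid check happens afterwards, so a later failure can pick the next copy. `struct toy_eb`, `toy_read_done()` and `toy_next_mirror()` are hypothetical.

```c
#include <assert.h>

/* Hypothetical eb field: mirror used by the last read (1-based, 0 = unknown). */
struct toy_eb {
    int read_mirror;
};

/* Read completion: remember which copy we got, before any verification. */
static void toy_read_done(struct toy_eb *eb, int mirror)
{
    assert(mirror >= 1);
    eb->read_mirror = mirror;
}

/* Next mirror to try after a post-read failure, wrapping around.
 * Returns 0 once we are back at `first`, i.e. every copy was tried. */
static int toy_next_mirror(const struct toy_eb *eb, int num_copies, int first)
{
    int next = eb->read_mirror % num_copies + 1;

    return next == first ? 0 : next;
}
```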
    • Btrfs: avoid possible use-after-free in clear_extent_bit() · cdc6a395
      Li Zefan committed
      clear_extent_bit()
      {
          next_node = rb_next(&state->rb_node);
          ...
          clear_state_bit(state);  <-- this may free next_node
          if (next_node) {
              state = rb_entry(next_node);
              ...
          }
      }
      
      clear_state_bit() calls merge_state(), which may free the node following
      the extent_state passed in, so clear_extent_bit() may end up
      referencing freed memory.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
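A minimal sketch of the hazard and one way to avoid it, using a sorted singly linked list instead of an rbtree. Merging a state with an adjacent successor frees that successor, so the loop reads `s->next` only after the clear-and-merge step. This illustrates the general rule rather than the exact shape of the btrfs fix; `toy_merge()` only merges forward (the real `merge_state()` also looks at the previous node), and all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy extent state: an inclusive [start, end] range with state bits. */
struct toy_state {
    unsigned long start, end;
    unsigned bits;
    struct toy_state *next;
};

/* Merge `s` with its successor when they are adjacent and carry the same
 * bits; the successor is freed, as a neighbour can be in merge_state(). */
static void toy_merge(struct toy_state *s)
{
    struct toy_state *n = s->next;

    if (n && n->start == s->end + 1 && n->bits == s->bits) {
        s->end = n->end;
        s->next = n->next;
        free(n);
    }
}

/* Clear `bits` on every state. Caching s->next before toy_merge() would
 * be the use-after-free; reading it afterwards is always safe because
 * `s` itself is never freed here. */
static void toy_clear_bits(struct toy_state *head, unsigned bits)
{
    for (struct toy_state *s = head; s; s = s->next) {
        s->bits &= ~bits;
        toy_merge(s);
        assert(!s->next || s->next->start > s->end);
    }
}
```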
    • Btrfs: return void from clear_state_bit · 8e52acf7
      Li Zefan committed
      Currently it returns a set of bits that were cleared, but this return
      value is not used at all.
      
      Moreover it doesn't seem to be useful, because we may clear the bits
      of a few extent_states, but only the bits cleared from the last one are
      returned.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
  9. 13 Apr, 2012: 2 commits
  10. 27 Mar, 2012: 8 commits
    • Btrfs: deal with read errors on extent buffers differently · ea466794
      Josef Bacik committed
      Since we need to read and write extent buffers in their entirety we can't use
      the normal bio_readpage_error stuff since it only works on a per page basis.  So
      instead make it so that if we see an io error in endio we just mark the eb as
      having an IO error and then in btree_read_extent_buffer_pages we will manually
      try other mirrors and then overwrite the bad mirror if we find a good copy.
      This works with larger than page size blocks.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
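A toy version of the retry-then-repair loop described above, assuming each mirror is modelled by a single "good copy" flag. The whole block is tried on each mirror in turn; once a good copy is found, the mirrors that failed earlier are marked repaired, standing in for rewriting them from the good data. `struct toy_block` and `toy_read_with_repair()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_COPIES 4

/* Hypothetical per-mirror state, indexed by 1-based mirror number. */
struct toy_block {
    int num_copies;
    bool good[TOY_MAX_COPIES + 1];
};

/* Returns the mirror that produced a good copy, or -1 if none did. */
static int toy_read_with_repair(struct toy_block *blk)
{
    int failed[TOY_MAX_COPIES];
    int nfailed = 0;

    assert(blk->num_copies <= TOY_MAX_COPIES);
    for (int mirror = 1; mirror <= blk->num_copies; mirror++) {
        if (!blk->good[mirror]) {
            failed[nfailed++] = mirror;    /* IO error on this copy */
            continue;
        }
        for (int i = 0; i < nfailed; i++)
            blk->good[failed[i]] = true;   /* overwrite the bad mirror */
        return mirror;
    }
    return -1;
}
```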
    • Btrfs: loop waiting on writeback · a098d8e8
      Chris Mason committed
      lock_extent_buffer_for_io needs to loop around and make sure the
      writeback bits are not set.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
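A userspace illustration of why the wait has to be a loop, assuming a pthread mutex and condition variable in place of the kernel's bit wait queues. After a wakeup another writer may already have set the writeback flag again, so the condition is re-checked before the buffer is claimed. `struct toy_eb` and the helpers are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical eb with a writeback flag protected by `lock`. */
struct toy_eb {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool writeback;
};

/* Wait until no writeback is in flight, then claim the eb for our IO.
 * A single wait is not enough: re-test the flag after every wakeup. */
static void toy_lock_for_io(struct toy_eb *eb)
{
    pthread_mutex_lock(&eb->lock);
    while (eb->writeback)
        pthread_cond_wait(&eb->cond, &eb->lock);
    eb->writeback = true;
    pthread_mutex_unlock(&eb->lock);
}

/* Writeback completion: clear the flag and wake every waiter. */
static void toy_end_writeback(struct toy_eb *eb)
{
    pthread_mutex_lock(&eb->lock);
    assert(eb->writeback);
    eb->writeback = false;
    pthread_cond_broadcast(&eb->cond);
    pthread_mutex_unlock(&eb->lock);
}
```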
    • Btrfs: ensure an entire eb is written at once · 0b32f4bb
      Josef Bacik committed
      This patch simplifies how we track our extent buffers.  Previously we could exit
      writepages with only having written half of an extent buffer, which meant we had
      to track the state of the pages and the state of the extent buffers differently.
      Now we only read in entire extent buffers and write out entire extent buffers.
      This allows us to simply set bits in our bflags to indicate the state of the eb
      and we no longer have to do things like track uptodate with our iotree.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
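A sketch of tracking eb state purely in a flags word, assuming C11 atomics in place of the kernel's `set_bit()`/`test_bit()` helpers. Because the whole eb is always read or written as a unit, one set of bits describes every page in it. The flag names and helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-eb state bits. */
enum {
    TOY_EB_UPTODATE  = 1u << 0,
    TOY_EB_DIRTY     = 1u << 1,
    TOY_EB_WRITEBACK = 1u << 2,
};

struct toy_eb { atomic_uint bflags; };

static void toy_set_flag(struct toy_eb *eb, unsigned f)
{
    assert(f != 0);
    atomic_fetch_or(&eb->bflags, f);
}

static void toy_clear_flag(struct toy_eb *eb, unsigned f)
{
    atomic_fetch_and(&eb->bflags, ~f);
}

static bool toy_test_flag(struct toy_eb *eb, unsigned f)
{
    return atomic_load(&eb->bflags) & f;
}

/* Atomically move the whole eb from "dirty" to "under writeback".
 * Fails if it is not dirty or is already being written. */
static bool toy_start_write(struct toy_eb *eb)
{
    unsigned old = atomic_load(&eb->bflags);

    do {
        if (!(old & TOY_EB_DIRTY) || (old & TOY_EB_WRITEBACK))
            return false;
    } while (!atomic_compare_exchange_weak(&eb->bflags, &old,
                                           (old & ~TOY_EB_DIRTY) | TOY_EB_WRITEBACK));
    return true;
}
```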
    • Btrfs: introduce mark_extent_buffer_accessed · 5df4235e
      Josef Bacik committed
      Because an eb can have multiple pages we need to make sure that all pages within
      the eb are marked as accessed, since releasepage can be called against any page
      in the eb.  This will keep us from possibly evicting hot eb's when we're doing
      larger than pagesize eb's.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
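The idea above reduces to a loop over all pages of the eb rather than touching only the first one. This is a toy sketch: the `referenced` field stands in for what `mark_page_accessed()` records, and `struct toy_eb` and `toy_mark_eb_accessed()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 16

struct toy_page { bool referenced; };   /* stand-in for the referenced bit */

struct toy_eb {
    size_t num_pages;
    struct toy_page *pages[TOY_MAX_PAGES];
};

/* Mark every page as recently used, because reclaim (and releasepage)
 * may look at any page of the eb, not just the first. */
static void toy_mark_eb_accessed(struct toy_eb *eb)
{
    assert(eb->num_pages <= TOY_MAX_PAGES);
    for (size_t i = 0; i < eb->num_pages; i++)
        eb->pages[i]->referenced = true;   /* mark_page_accessed() analogue */
}
```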
    • Btrfs: introduce free_extent_buffer_stale · 3083ee2e
      Josef Bacik committed
      Because btrfs COWs, we can end up with extent buffers that are no longer
      necessary just sitting around in memory.  So instead of evicting these pages, we
      could end up evicting things we actually care about.  Thus we have
      free_extent_buffer_stale for use when we are freeing tree blocks.  This will
      make it so that the ref for the eb being in the radix tree is dropped as soon as
      possible and then is freed when the refcount hits 0 instead of waiting to be
      released by releasepage.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
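A refcounting sketch of the difference between a normal free and a stale free, assuming the lookup tree (the radix tree in btrfs) holds one of the eb's references. A normal free drops only the caller's reference, leaving the eb cached until reclaim; a stale free also drops the tree's reference, so the eb goes away as soon as the count reaches zero. All `toy_*` names are hypothetical and the code is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* `refs` counts all holders; `tree_ref` says one of them is the tree. */
struct toy_eb {
    int refs;
    bool tree_ref;
    bool freed;
};

static void toy_put(struct toy_eb *eb)
{
    assert(eb->refs > 0);
    if (--eb->refs == 0)
        eb->freed = true;   /* would unlink from the tree and free memory */
}

/* Normal release: the tree's ref keeps the eb cached. */
static void toy_free_eb(struct toy_eb *eb)
{
    toy_put(eb);
}

/* Stale release for a block COW has made obsolete: drop the tree's ref
 * too, so the eb is freed once the last user lets go. */
static void toy_free_eb_stale(struct toy_eb *eb)
{
    if (eb->tree_ref) {
        eb->tree_ref = false;
        toy_put(eb);
    }
    toy_put(eb);
}
```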
    • Btrfs: only use the existing eb if its count isn't 0 · 115391d2
      Josef Bacik committed
      We can run into a problem where we find an eb for our existing page already on
      the radix tree but it has a ref count of 0.  It hasn't been removed by RCU
      yet, so this can cause issues where we will use the eb after it is freed.  So do
      atomic_inc_not_zero on the exists->refs and if it is zero just do
      synchronize_rcu() and try again.  We won't have to worry about new allocators
      coming in since they will block on the page lock at this point.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
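A userspace equivalent of the kernel's `atomic_inc_not_zero()` used above, written as a C11 compare-and-swap loop. A count of zero means the object is already on its way to being freed, so the caller must not take a reference (per the message, btrfs then calls `synchronize_rcu()` and retries). The function name `toy_inc_not_zero()` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *refs only if it is non-zero; return whether it was taken. */
static bool toy_inc_not_zero(atomic_int *refs)
{
    int old = atomic_load(refs);

    do {
        assert(old >= 0);
        if (old == 0)
            return false;   /* object is dying: do not resurrect it */
    } while (!atomic_compare_exchange_weak(refs, &old, old + 1));
    return true;
}
```

On failure, `atomic_compare_exchange_weak()` reloads `old` with the current value, so the zero check is repeated against fresh data on every iteration.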
    • Btrfs: set page->private to the eb · 4f2de97a
      Josef Bacik committed
      We spend a lot of time looking up extent buffers from pages when we could just
      store the pointer to the eb the page is associated with in page->private.  This
      patch does just that, and it makes things a little simpler and reduces a bit of
      CPU overhead involved with doing metadata IO.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
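A toy illustration of storing the owning eb directly in the page, so a lookup from a page is a single pointer load instead of a search. In the kernel, `page->private` is an `unsigned long`; this sketch uses a `void *` field for simplicity, and `struct toy_page`, `toy_attach_eb()` and `toy_page_eb()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_eb { unsigned long start; };

/* Stand-in for struct page; `private` plays the role of page->private. */
struct toy_page { void *private; };

/* Attach: remember the owning eb in the page itself. */
static void toy_attach_eb(struct toy_page *page, struct toy_eb *eb)
{
    assert(page->private == NULL || page->private == eb);
    page->private = eb;
}

/* Lookup (e.g. from IO completion): one load, no tree search. */
static struct toy_eb *toy_page_eb(const struct toy_page *page)
{
    return page->private;
}
```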
    • Btrfs: allow metadata blocks larger than the page size · 727011e0
      Chris Mason committed
      A few years ago the btrfs code to support blocks larger than
      the page size was disabled to fix a few corner cases in the
      page cache handling.  This fixes the code to properly support
      large metadata blocks again.
      
      Since current kernels will crash early and often with larger
      metadata blocks, this adds an incompat bit so that older kernels
      can't mount it.
      
      This also does away with different blocksizes for nodes and leaves.
      You get a single block size for all tree blocks.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
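Two pieces of arithmetic behind large metadata blocks, sketched under stated assumptions: how many pages a tree block spans (assuming 4 KiB pages), and how an incompat bit keeps older kernels from mounting. A mount is refused when the superblock carries any incompat bit the kernel does not support. `TOY_INCOMPAT_BIG_METADATA` and its value are hypothetical placeholders, not the on-disk definition.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)   /* assumes 4 KiB pages */

/* Pages touched by a tree block covering [start, start + len). */
static uint64_t toy_num_extent_pages(uint64_t start, uint64_t len)
{
    assert(len > 0);
    return ((start + len + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT) -
           (start >> TOY_PAGE_SHIFT);
}

/* Hypothetical incompat bit for "metadata blocks larger than a page". */
#define TOY_INCOMPAT_BIG_METADATA (1ULL << 5)

/* Refuse to mount if the filesystem uses an incompat feature we lack. */
static int toy_can_mount(uint64_t sb_incompat, uint64_t supported)
{
    return (sb_incompat & ~supported) == 0;
}
```

For example, a 16 KiB tree block starting at offset 0 spans 4 pages, and a kernel whose supported mask lacks the bit rejects such a filesystem.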
  11. 22 Mar, 2012: 8 commits
  12. 20 Mar, 2012: 1 commit
  13. 23 Feb, 2012: 1 commit
    • Btrfs: clear the extent uptodate bits during parent transid failures · 50653190
      Chris Mason committed
      If btrfs reads a block and finds a parent transid mismatch, it clears
      the uptodate flags on the extent buffer, and the pages inside it.  But
      we only clear the uptodate bits in the state tree if the block straddles
      more than one page.
      
      This is from an old optimization to reduce contention on the extent
      state tree.  But it is buggy because the code that retries a read from
      a different copy of the block is going to find the uptodate state bits
      set and skip the IO.
      
      The end result of the bug is that we'll never actually read the good
      copy (if there is one).
      
      The fix here is to always clear the uptodate state bits, which is safe
      because this code is only called when the parent transid fails.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
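A toy model of the bug and the fix, assuming a single cached "uptodate" flag that lets a reader skip IO, as the state-tree bits do. If a transid failure leaves the flag set, the retry against another mirror never touches the disk; clearing it unconditionally forces a real read. `struct toy_cached_block` and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* `uptodate` lets readers skip IO; `io_count` counts real disk reads. */
struct toy_cached_block {
    bool uptodate;
    int io_count;
    int mirror_read;
};

/* Read from `mirror` unless the cache claims to be up to date. This
 * skip is what hid the good copy in the bug described above. */
static void toy_read_block(struct toy_cached_block *b, int mirror)
{
    if (b->uptodate)
        return;
    b->io_count++;
    b->mirror_read = mirror;
    b->uptodate = true;
}

/* Parent transid mismatch: always clear the uptodate state so that
 * the retry on another mirror really issues IO. */
static void toy_transid_failed(struct toy_cached_block *b)
{
    assert(b->uptodate);
    b->uptodate = false;
}
```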
  14. 21 Feb, 2012: 1 commit
  15. 17 Feb, 2012: 2 commits