1. 18 5月, 2013 3 次提交
    • C
      Btrfs: use a btrfs bioset instead of abusing bio internals · 9be3395b
      Chris Mason 提交于
      Btrfs has been pointer tagging bi_private and using bi_bdev
      to store the stripe index and mirror number of failed IOs.
      
      As bios bubble back up through the call chain, we use these
      to decide if and how to retry our IOs.  They are also used
      to count IO failures on a per device basis.
      
      Recently a bio tracepoint was added lead to crashes because
      we were abusing bi_bdev.
      
      This commit adds a btrfs bioset, and creates explicit fields
      for the mirror number and stripe index.  The plan is to
      extend this structure for all of the fields currently in
      struct btrfs_bio, which will mean one less kmalloc in
      our IO path.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      Reported-by: NTejun Heo <tj@kernel.org>
      9be3395b
    • A
      btrfs: do away with non-whole_page extent I/O · 17a5adcc
      Alexandre Oliva 提交于
      end_bio_extent_readpage computes whole_page based on bv_offset and
      bv_len, without taking into account that blk_update_request may modify
      them when some of the blocks to be read into a page produce a read
      error.  This would cause the read to unlock only part of the file
      range associated with the page, which would in turn leave the entire
      page locked, which would not only keep the process blocked instead of
      returning -EIO to it, but also prevent any further access to the file.
      
      It turns out that btrfs always issues whole-page reads and writes.
      The special handling of non-whole_page appears to be a mistake or a
      left-over from a time when this wasn't the case.  Indeed,
      end_bio_extent_writepage distinguished between whole_page and
      non-whole_page writes but behaved identically in both cases!
      
      I've replaced the whole_page computations with warnings, just to be
      sure that we're not issuing partial page reads or writes.  The
      warnings should probably just go away some time.
      Signed-off-by: NAlexandre Oliva <oliva@gnu.org>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      17a5adcc
    • L
      Btrfs: fix off-by-one in fiemap · a52f4cd2
      Liu Bo 提交于
      lock_extent/unlock_extent expect an exclusive end.
      Tested-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      a52f4cd2
  2. 07 5月, 2013 8 次提交
  3. 27 3月, 2013 1 次提交
    • C
      Btrfs: fix race between mmap writes and compression · 4adaa611
      Chris Mason 提交于
      Btrfs uses page_mkwrite to ensure stable pages during
      crc calculations and mmap workloads.  We call clear_page_dirty_for_io
      before we do any crcs, and this forces any application with the file
      mapped to wait for the crc to finish before it is allowed to change
      the file.
      
      With compression on, the clear_page_dirty_for_io step is happening after
      we've compressed the pages.  This means the applications might be
      changing the pages while we are compressing them, and some of those
      modifications might not hit the disk.
      
      This commit adds the clear_page_dirty_for_io before compression starts
      and makes sure to redirty the page if we have to fallback to
      uncompressed IO as well.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      Reported-by: NAlexandre Oliva <oliva@gnu.org>
      cc: stable@vger.kernel.org
      4adaa611
  4. 02 3月, 2013 1 次提交
  5. 01 3月, 2013 1 次提交
  6. 27 2月, 2013 1 次提交
    • Q
      btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo 提交于
      Though most of the btrfs codes are using ALIGN macro for page alignment,
      there are still some codes using open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or even hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes these open-coded alignment is not so easy to understand for
      newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for a
      better readability.
      
      Also there is a previous patch from David Sterba with similar changes,
      but the patch is for 3.2 kernel and seems not merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fda2832f
  7. 21 2月, 2013 2 次提交
  8. 20 2月, 2013 1 次提交
  9. 02 2月, 2013 3 次提交
    • C
      Btrfs: reduce lock contention on extent buffer locks · 242e18c7
      Chris Mason 提交于
      The extent buffers have a refs_lock which we use to make coordinate freeing
      the extent buffer with operations on the radix tree.  On tree roots and
      other extent buffers that very cache hot, this can be highly contended.
      
      These are also the extent buffers that are basically pinned in memory.
      This commit adds code to cmpxchg our way through the ref modifications,
      and as long as the result of the reference change is still pinned in
      ram, we skip the expensive spinlock.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      242e18c7
    • D
      Btrfs: RAID5 and RAID6 · 53b381b3
      David Woodhouse 提交于
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair crc errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet)
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      53b381b3
    • D
      Btrfs: add rw argument to merge_bio_hook() · 64a16701
      David Woodhouse 提交于
      We'll want to merge writes so they can fill a full RAID[56] stripe, but
      not necessarily reads.
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      64a16701
  10. 13 12月, 2012 4 次提交
  11. 26 10月, 2012 1 次提交
  12. 09 10月, 2012 7 次提交
  13. 04 10月, 2012 1 次提交
    • J
      Btrfs: fix race when getting the eb out of page->private · b5bae261
      Josef Bacik 提交于
      We can race when checking wether PagePrivate is set on a page and we
      actually have an eb saved in the pages private pointer.  We could have
      easily written out this page and released it in the time that we did the
      pagevec lookup and actually got around to looking at this page.  So use
      mapping->private_lock to ensure we get a consistent view of the
      page->private pointer.  This is inline with the alloc and releasepage paths
      which use private_lock when manipulating page->private.  Thanks,
      Reported-by: NDavid Sterba <dave@jikos.cz>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      b5bae261
  14. 03 10月, 2012 1 次提交
  15. 02 10月, 2012 4 次提交
    • K
      btrfs: Kill some bi_idx references · be3940c0
      Kent Overstreet 提交于
      For immutable bio vecs, I've been auditing and removing bi_idx
      references. These were harmless, but removing them will make auditing
      easier.
      
      scrub_bio_end_io_worker() was open coding a bio_reset() - but this
      doesn't appear to have been needed for anything as right after it does a
      bio_put(), and perusing the code it doesn't appear anything else was
      holding a reference to the bio.
      
      The other use end_bio_extent_readpage() was just for a pr_debug() -
      changed it to something that might be a bit more useful.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Chris Mason <chris.mason@oracle.com>
      CC: Stefan Behrens <sbehrens@giantdisaster.de>
      be3940c0
    • D
      btrfs: polish names of kmem caches · 837e1972
      David Sterba 提交于
      Usecase:
      
        watch 'grep btrfs < /proc/slabinfo'
      
      easy to watch all caches in one go.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      837e1972
    • L
      Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag · 9e8a4a8b
      Liu Bo 提交于
      We're going to use this flag EXTENT_DEFRAG to indicate which range
      belongs to defragment so that we can implement snapshow-aware defrag:
      
      We set the EXTENT_DEFRAG flag when dirtying the extents that need
      defragmented, so later on writeback thread can differentiate between
      normal writeback and writeback started by defragmentation.
      Original-Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      9e8a4a8b
    • C
      Btrfs: fix btrfs send for inline items and compression · 74dd17fb
      Chris Mason 提交于
      The btrfs send code was assuming the offset of the file item into the
      extent translated to bytes on disk.  If we're compressed, this isn't
      true, and so it was off into extents owned by other files.
      
      It was also improperly handling inline extents.  This solves a crash
      where we may have gone past the end of the file extent item by not
      testing early enough for an inline extent.  It also solves problems
      where we have a whole between the end of the inline item and the start
      of the full extent.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      74dd17fb
  16. 29 8月, 2012 1 次提交