  1. 18 May, 2013 2 commits
    • btrfs: do away with non-whole_page extent I/O · 17a5adcc
      Alexandre Oliva committed
      end_bio_extent_readpage computes whole_page based on bv_offset and
      bv_len, without taking into account that blk_update_request may modify
      them when some of the blocks to be read into a page produce a read
      error.  This would cause the read to unlock only part of the file
      range associated with the page, which would in turn leave the entire
      page locked, which would not only keep the process blocked instead of
      returning -EIO to it, but also prevent any further access to the file.
      
      It turns out that btrfs always issues whole-page reads and writes.
      The special handling of non-whole_page appears to be a mistake or a
      left-over from a time when this wasn't the case.  Indeed,
      end_bio_extent_writepage distinguished between whole_page and
      non-whole_page writes but behaved identically in both cases!
      
      I've replaced the whole_page computations with warnings, just to be
      sure that we're not issuing partial page reads or writes.  The
      warnings should probably just go away some time.
      Signed-off-by: Alexandre Oliva <oliva@gnu.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
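
The failure mode described above can be illustrated with a small userspace sketch. It is not the btrfs code: `toy_bvec`, `TOY_PAGE_SIZE`, `toy_is_whole_page` and `toy_partial_complete` are hypothetical names, and the model assumes a partial completion simply advances the segment's offset and shrinks its length.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size for the illustration */

/* Hypothetical stand-in for the two bio_vec fields the message mentions. */
struct toy_bvec {
    unsigned int bv_len;     /* bytes still described by this segment */
    unsigned int bv_offset;  /* offset of the segment within its page */
};

/* A segment counts as a whole page only if it starts at offset 0
 * and spans the full page. */
static inline bool toy_is_whole_page(const struct toy_bvec *bv)
{
    return bv->bv_offset == 0 && bv->bv_len == TOY_PAGE_SIZE;
}

/* Model of a partial completion: the first `done` bytes are consumed,
 * so the segment is advanced in place. */
static inline void toy_partial_complete(struct toy_bvec *bv, unsigned int done)
{
    assert(done <= bv->bv_len);
    bv->bv_offset += done;
    bv->bv_len -= done;
}
```

In this model, a segment that was issued as a whole page no longer passes the check once part of it has completed, so code that branches on the check at completion time would treat it as a partial-page read.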
    • Btrfs: fix off-by-one in fiemap · a52f4cd2
      Liu Bo committed
      lock_extent/unlock_extent expect an exclusive end.
      Tested-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  2. 07 May, 2013 8 commits
  3. 27 Mar, 2013 1 commit
    • Btrfs: fix race between mmap writes and compression · 4adaa611
      Chris Mason committed
      Btrfs uses page_mkwrite to ensure stable pages during
      crc calculations and mmap workloads.  We call clear_page_dirty_for_io
      before we do any crcs, and this forces any application with the file
      mapped to wait for the crc to finish before it is allowed to change
      the file.
      
      With compression on, the clear_page_dirty_for_io step is happening after
      we've compressed the pages.  This means the applications might be
      changing the pages while we are compressing them, and some of those
      modifications might not hit the disk.
      
      This commit adds the clear_page_dirty_for_io before compression starts
      and makes sure to redirty the page if we have to fall back to
      uncompressed IO as well.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Reported-by: Alexandre Oliva <oliva@gnu.org>
      cc: stable@vger.kernel.org
  4. 02 Mar, 2013 1 commit
  5. 01 Mar, 2013 1 commit
  6. 27 Feb, 2013 1 commit
    • btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo committed
      Though most of the btrfs code uses the ALIGN macro for page alignment,
      some code still uses open-coded alignment like the following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or even hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes this open-coded alignment is not so easy to understand for
      a newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for
      better readability.
      
      There is also a previous patch from David Sterba with similar changes,
      but that patch is for the 3.2 kernel and seems not to have been merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
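
The equivalence between the open-coded forms and an ALIGN-style macro can be checked in userspace. The sketch below is illustrative only: `TOY_ALIGN` is a stand-in written for this example rather than the kernel's macro, the helper names are hypothetical, and the second helper assumes `end` is an inclusive offset.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for an ALIGN-style macro: round x up to a multiple
 * of a, where a must be a power of two. */
#define TOY_ALIGN(x, a) \
    (((uint64_t)(x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))

/* Open-coded form, mirroring the first snippet in the message. */
static inline uint64_t toy_align_open_coded(uint64_t val, uint64_t size)
{
    uint64_t mask = size - 1;

    assert(size != 0 && (size & mask) == 0); /* power of two only */
    return (val + mask) & ~mask;
}

/* "Hidden" form from the second snippet.  With an inclusive end,
 * end - start + blocksize equals (end - start + 1) + (blocksize - 1),
 * so this rounds the range length up to a multiple of blocksize. */
static inline uint64_t toy_range_len_hidden(uint64_t start, uint64_t end,
                                            uint64_t blocksize)
{
    return (end - start + blocksize) & ~(blocksize - 1);
}
```

Under these assumptions, `toy_range_len_hidden(start, end, bs)` gives the same value as `TOY_ALIGN(end - start + 1, bs)`, which is the form a reader can recognize at a glance.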
  7. 21 Feb, 2013 2 commits
  8. 20 Feb, 2013 1 commit
  9. 02 Feb, 2013 3 commits
    • Btrfs: reduce lock contention on extent buffer locks · 242e18c7
      Chris Mason committed
      The extent buffers have a refs_lock which we use to coordinate freeing
      the extent buffer with operations on the radix tree.  On tree roots and
      other extent buffers that are very cache hot, this can be highly contended.
      
      These are also the extent buffers that are basically pinned in memory.
      This commit adds code to cmpxchg our way through the ref modifications,
      and as long as the result of the reference change is still pinned in
      ram, we skip the expensive spinlock.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
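
A lock-free reference drop of the kind described above might look roughly like the following C11 sketch. It is a userspace illustration, not the btrfs implementation: `toy_put_ref_fast` and the `TOY_PINNED_MIN` threshold are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical threshold: while the count stays at or above this value,
 * the object is treated as pinned and the lock can be skipped. */
#define TOY_PINNED_MIN 2

/* Try to drop one reference without taking a lock.  Succeeds only if the
 * count is still at least TOY_PINNED_MIN afterwards.  Otherwise it leaves
 * the count untouched and returns false, and the caller is expected to
 * fall back to a locked slow path. */
static inline bool toy_put_ref_fast(atomic_int *refs)
{
    int old = atomic_load(refs);

    while (old - 1 >= TOY_PINNED_MIN) {
        /* On failure, `old` is reloaded with the current value. */
        if (atomic_compare_exchange_weak(refs, &old, old - 1))
            return true;
    }
    return false;
}
```

The compare-and-exchange loop only commits a decrement whose result it has already checked, so a drop that could lead to freeing the object never happens outside the lock.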
    • Btrfs: RAID5 and RAID6 · 53b381b3
      David Woodhouse committed
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair crc errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet)
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
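
For readers unfamiliar with read/modify/write on parity RAID, here is a generic sketch of the single-parity (XOR) arithmetic. It is textbook RAID5 math rather than code from this commit, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read/modify/write update for XOR parity:
 * new_parity = old_parity ^ old_data ^ new_data, byte by byte. */
static inline void toy_rmw_parity(uint8_t *parity, const uint8_t *old_data,
                                  const uint8_t *new_data, size_t len)
{
    assert(parity && old_data && new_data);
    for (size_t i = 0; i < len; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}

/* Full recomputation for comparison: parity is the XOR of all data blocks. */
static inline void toy_full_parity(uint8_t *parity,
                                   const uint8_t *const *blocks,
                                   size_t nblocks, size_t len)
{
    assert(parity && blocks);
    for (size_t i = 0; i < len; i++) {
        uint8_t p = 0;

        for (size_t b = 0; b < nblocks; b++)
            p ^= blocks[b][i];
        parity[i] = p;
    }
}
```

The update touches two on-disk blocks, the data block and the parity block. If only one of the two writes reaches the disk before a crash, the stored parity no longer matches the data, which is the exposure the message mentions.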
    • Btrfs: add rw argument to merge_bio_hook() · 64a16701
      David Woodhouse committed
      We'll want to merge writes so they can fill a full RAID[56] stripe, but
      not necessarily reads.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  10. 13 Dec, 2012 4 commits
  11. 26 Oct, 2012 1 commit
  12. 09 Oct, 2012 7 commits
  13. 04 Oct, 2012 1 commit
    • Btrfs: fix race when getting the eb out of page->private · b5bae261
      Josef Bacik committed
      We can race when checking whether PagePrivate is set on a page and we
      actually have an eb saved in the page's private pointer.  We could have
      easily written out this page and released it in the time between doing the
      pagevec lookup and actually getting around to looking at this page.  So use
      mapping->private_lock to ensure we get a consistent view of the
      page->private pointer.  This is in line with the alloc and releasepage paths
      which use private_lock when manipulating page->private.  Thanks,
      Reported-by: David Sterba <dave@jikos.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  14. 03 Oct, 2012 1 commit
  15. 02 Oct, 2012 4 commits
    • btrfs: Kill some bi_idx references · be3940c0
      Kent Overstreet committed
      For immutable bio vecs, I've been auditing and removing bi_idx
      references. These were harmless, but removing them will make auditing
      easier.
      
      scrub_bio_end_io_worker() was open coding a bio_reset() - but this
      doesn't appear to have been needed for anything as right after it does a
      bio_put(), and perusing the code it doesn't appear anything else was
      holding a reference to the bio.
      
      The other use, in end_bio_extent_readpage(), was just for a pr_debug() -
      changed it to something that might be a bit more useful.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Chris Mason <chris.mason@oracle.com>
      CC: Stefan Behrens <sbehrens@giantdisaster.de>
    • btrfs: polish names of kmem caches · 837e1972
      David Sterba committed
      Use case:
      
        watch 'grep btrfs < /proc/slabinfo'
      
      easy to watch all caches in one go.
      Signed-off-by: David Sterba <dsterba@suse.cz>
    • Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag · 9e8a4a8b
      Liu Bo committed
      We're going to use the EXTENT_DEFRAG flag to indicate which ranges
      belong to defragmentation so that we can implement snapshot-aware defrag:
      
      We set the EXTENT_DEFRAG flag when dirtying the extents that need to be
      defragmented, so that later the writeback thread can differentiate between
      normal writeback and writeback started by defragmentation.
      Original-Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
    • Btrfs: fix btrfs send for inline items and compression · 74dd17fb
      Chris Mason committed
      The btrfs send code was assuming the offset of the file item into the
      extent translated to bytes on disk.  If we're compressed, this isn't
      true, and so it was off into extents owned by other files.
      
      It was also improperly handling inline extents.  This solves a crash
      where we may have gone past the end of the file extent item by not
      testing early enough for an inline extent.  It also solves problems
      where we have a hole between the end of the inline item and the start
      of the full extent.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  16. 29 Aug, 2012 1 commit
  17. 24 Jul, 2012 1 commit
    • Btrfs: improve multi-thread buffer read · 67c9684f
      Liu Bo committed
      While testing with my buffer read fio jobs[1], I find that btrfs does not
      perform well enough.
      
      Here is a scenario in fio jobs:
      
      We have 4 threads, "t1 t2 t3 t4", starting to buffer read the same file.
      All of them race on add_to_page_cache_lru(), and whichever thread
      successfully puts its page into the page cache takes responsibility
      for reading that page's data.
      
      What's more, reading a page takes some time to finish, during which
      other threads can slide in and process the remaining pages:
      
           t1          t2          t3          t4
         add Page1
         read Page1  add Page2
           |         read Page2  add Page3
           |            |        read Page3  add Page4
           |            |           |        read Page4
      -----|------------|-----------|-----------|--------
           v            v           v           v
          bio          bio         bio         bio
      
      Now we have four bios, each of which holds only one page since we need to
      maintain consecutive pages in bio.  Thus, we can end up with far more bios
      than we need.
      
      Here we're going to
      a) delay the real read-page section and
      b) try to put more pages into page cache.
      
      With that said, we can make each bio hold more pages and reduce the number
      of bios we need.
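
The effect of batching can be modelled with a toy counter. This is a simplified illustration rather than the patch itself: `toy_count_bios` is a hypothetical helper, and it assumes the only constraint on a bio is that its pages have consecutive indices.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many bios one submission needs if each bio may only hold
 * pages with consecutive indices (a simplifying assumption). */
static inline size_t toy_count_bios(const unsigned long *page_index, size_t n)
{
    size_t bios = 0;

    assert(page_index != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* The first page, or a gap in the indices, starts a new bio. */
        if (i == 0 || page_index[i] != page_index[i - 1] + 1)
            bios++;
    }
    return bios;
}
```

In this model, four threads that each submit a single page as soon as they add it produce four bios, while one submission that has first collected pages 1 through 4 produces a single bio.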
      
      Here are some numbers taken from fio results:
               w/o patch                 w patch
             -------------  --------  ---------------
      READ:    745MB/s        +25%       934MB/s
      
      [1]:
      [global]
      group_reporting
      thread
      numjobs=4
      bs=32k
      rw=read
      ioengine=sync
      directory=/mnt/btrfs/
      
      [READ]
      filename=foobar
      size=2000M
      invalidate=1
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>