1. 07 May, 2013 2 commits
  2. 27 March, 2013 1 commit
    • Btrfs: fix race between mmap writes and compression · 4adaa611
      Chris Mason committed
      Btrfs uses page_mkwrite to ensure stable pages during
      crc calculations and mmap workloads.  We call clear_page_dirty_for_io
      before we do any crcs, and this forces any application with the file
      mapped to wait for the crc to finish before it is allowed to change
      the file.
      
      With compression on, the clear_page_dirty_for_io step is happening after
      we've compressed the pages.  This means the applications might be
      changing the pages while we are compressing them, and some of those
      modifications might not hit the disk.
      
      This commit adds the clear_page_dirty_for_io call before compression
      starts, and also makes sure to redirty the page if we have to fall back
      to uncompressed IO.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Reported-by: Alexandre Oliva <oliva@gnu.org>
      cc: stable@vger.kernel.org
  3. 02 March, 2013 1 commit
  4. 01 March, 2013 1 commit
  5. 27 February, 2013 1 commit
    • btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo committed
      Though most of the btrfs code uses the ALIGN macro for page alignment,
      some code still uses open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or an even more hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      This open-coded alignment is not always easy to understand for a
      newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for
      better readability.
      
      There is also a previous patch from David Sterba with similar changes,
      but that patch is for the 3.2 kernel and does not seem to have been merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
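The equivalence this cleanup relies on can be checked in userspace. The sketch below is only an illustration: `ALIGN_UP` is a hypothetical stand-in for the kernel's `ALIGN` macro, the function names are invented for this example, and the alignment is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's ALIGN macro.
 * Assumes `a` is a power of two. */
#define ALIGN_UP(x, a) (((uint64_t)(x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))

/* First open-coded form quoted in the commit message. */
uint64_t align_open_coded(uint64_t val, uint64_t stripesize)
{
    uint64_t mask = stripesize - 1;
    return (val + mask) & ~mask;
}

/* Second ("hidden") open-coded form: length of [start, end] rounded up. */
uint64_t range_len_open_coded(uint64_t start, uint64_t end, uint64_t blocksize)
{
    return (end - start + blocksize) & ~(blocksize - 1);
}

/* The same computation written with the macro: the inclusive range has
 * end - start + 1 bytes, which is then rounded up to a block multiple. */
uint64_t range_len_aligned(uint64_t start, uint64_t end, uint64_t blocksize)
{
    return ALIGN_UP(end - start + 1, blocksize);
}
```

Written with the macro, the hidden form makes its intent visible: it is the byte count of an inclusive range, rounded up to the block size.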
  6. 21 February, 2013 2 commits
  7. 20 February, 2013 1 commit
  8. 02 February, 2013 3 commits
    • Btrfs: reduce lock contention on extent buffer locks · 242e18c7
      Chris Mason committed
      The extent buffers have a refs_lock which we use to coordinate freeing
      the extent buffer with operations on the radix tree.  On tree roots and
      other extent buffers that are very cache hot, this can be highly contended.
      
      These are also the extent buffers that are basically pinned in memory.
      This commit adds code to cmpxchg our way through the ref modifications,
      and as long as the result of the reference change is still pinned in
      ram, we skip the expensive spinlock.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
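The lock-free fast path described in this commit can be sketched in userspace with C11 atomics. This is not the kernel code: `struct eb_sketch`, `eb_get_fast`, `eb_put_fast`, and the `PINNED_REFS` threshold are all assumptions made for illustration. The idea shown is that a reference change is applied with a compare-and-exchange loop only while the resulting count still means "pinned"; otherwise the caller falls back to the spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical threshold: at or above this count the buffer is treated
 * as pinned in memory, so no free can race with us. */
#define PINNED_REFS 2

/* Hypothetical stand-in for the ref-counted part of an extent buffer. */
struct eb_sketch {
    atomic_int refs;
};

/* Take a reference without the lock; only allowed while already pinned. */
bool eb_get_fast(struct eb_sketch *eb)
{
    int old = atomic_load(&eb->refs);

    while (old >= PINNED_REFS) {
        /* On failure, `old` is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&eb->refs, &old, old + 1))
            return true;
    }
    return false; /* caller must use the locked slow path */
}

/* Drop a reference without the lock, as long as the result stays pinned. */
bool eb_put_fast(struct eb_sketch *eb)
{
    int old = atomic_load(&eb->refs);

    while (old - 1 >= PINNED_REFS) {
        if (atomic_compare_exchange_weak(&eb->refs, &old, old - 1))
            return true;
    }
    return false; /* this drop could lead to a free: take the lock */
}
```

A caller would try the fast function first and take the spinlock only when it returns false, so hot buffers such as tree roots rarely touch the lock.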
    • Btrfs: RAID5 and RAID6 · 53b381b3
      David Woodhouse committed
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair crc errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet)
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
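To see why an interrupted read/modify/write cycle can leave parity incorrect, consider a minimal single-parity (RAID5-style) sketch. It is not taken from the btrfs implementation: the function names are hypothetical, and the second RAID6 syndrome is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Full-stripe parity: XOR of every data block, byte by byte. */
void parity_full(uint8_t *parity, const uint8_t *const *data,
                 size_t ndata, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < ndata; d++)
            p ^= data[d][i];
        parity[i] = p;
    }
}

/* Read/modify/write of one block: fold the old data out of the parity
 * and the new data in, then overwrite the block. On real disks these
 * are two separate writes; a crash between them leaves data and parity
 * out of sync. */
void parity_rmw(uint8_t *parity, uint8_t *old_data,
                const uint8_t *new_data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        parity[i] ^= old_data[i] ^ new_data[i];
        old_data[i] = new_data[i];
    }
}
```

After `parity_rmw` completes, the parity equals what `parity_full` would compute over the updated stripe. The exposure the commit message mentions is the window in which only one of the two updates has reached its disk.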
    • Btrfs: add rw argument to merge_bio_hook() · 64a16701
      David Woodhouse committed
      We'll want to merge writes so they can fill a full RAID[56] stripe, but
      not necessarily reads.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  9. 13 December, 2012 4 commits
  10. 26 October, 2012 1 commit
  11. 09 October, 2012 7 commits
  12. 04 October, 2012 1 commit
    • Btrfs: fix race when getting the eb out of page->private · b5bae261
      Josef Bacik committed
      We can race when checking whether PagePrivate is set on a page and we
      actually have an eb saved in the page's private pointer.  We could have
      easily written out this page and released it in the time between the
      pagevec lookup and actually getting around to looking at this page.  So use
      mapping->private_lock to ensure we get a consistent view of the
      page->private pointer.  This is in line with the alloc and releasepage paths
      which use private_lock when manipulating page->private.  Thanks,
      Reported-by: David Sterba <dave@jikos.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  13. 03 October, 2012 1 commit
  14. 02 October, 2012 4 commits
    • btrfs: Kill some bi_idx references · be3940c0
      Kent Overstreet committed
      For immutable bio vecs, I've been auditing and removing bi_idx
      references. These were harmless, but removing them will make auditing
      easier.
      
      scrub_bio_end_io_worker() was open coding a bio_reset() - but this
      doesn't appear to have been needed for anything as right after it does a
      bio_put(), and perusing the code it doesn't appear anything else was
      holding a reference to the bio.
      
      The other use, in end_bio_extent_readpage(), was just for a pr_debug() -
      changed it to something that might be a bit more useful.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Chris Mason <chris.mason@oracle.com>
      CC: Stefan Behrens <sbehrens@giantdisaster.de>
    • btrfs: polish names of kmem caches · 837e1972
      David Sterba committed
      Usecase:
      
        watch 'grep btrfs < /proc/slabinfo'
      
      easy to watch all caches in one go.
      Signed-off-by: David Sterba <dsterba@suse.cz>
    • Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag · 9e8a4a8b
      Liu Bo committed
      We're going to use the EXTENT_DEFRAG flag to indicate which ranges
      belong to defragmentation so that we can implement snapshot-aware defrag:
      
      We set the EXTENT_DEFRAG flag when dirtying the extents that need to be
      defragmented, so later on the writeback thread can differentiate between
      normal writeback and writeback started by defragmentation.
      Original-Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
    • Btrfs: fix btrfs send for inline items and compression · 74dd17fb
      Chris Mason committed
      The btrfs send code was assuming the offset of the file item into the
      extent translated to bytes on disk.  If we're compressed, this isn't
      true, and so it was off into extents owned by other files.
      
      It was also improperly handling inline extents.  This solves a crash
      where we may have gone past the end of the file extent item by not
      testing early enough for an inline extent.  It also solves problems
      where we have a hole between the end of the inline item and the start
      of the full extent.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  15. 29 August, 2012 1 commit
  16. 24 July, 2012 5 commits
    • Btrfs: improve multi-thread buffer read · 67c9684f
      Liu Bo committed
      While testing with my buffer read fio jobs[1], I find that btrfs does not
      perform well enough.
      
      Here is a scenario in fio jobs:
      
      We have 4 threads, "t1 t2 t3 t4", starting a buffered read of the same file.
      All of them race on add_to_page_cache_lru(), and whichever thread
      successfully puts its page into the page cache takes responsibility
      for reading that page's data.
      
      What's more, reading a page takes some time to finish, during which
      other threads can slide in and process the remaining pages:
      
           t1          t2          t3          t4
         add Page1
         read Page1  add Page2
           |         read Page2  add Page3
           |            |        read Page3  add Page4
           |            |           |        read Page4
      -----|------------|-----------|-----------|--------
           v            v           v           v
          bio          bio         bio         bio
      
      Now we have four bios, each of which holds only one page since we need to
      maintain consecutive pages in bio.  Thus, we can end up with far more bios
      than we need.
      
      Here we're going to
      a) delay the real read-page section and
      b) try to put more pages into page cache.
      
      With that said, we can make each bio hold more pages and reduce the number
      of bios we need.
      
      Here are some numbers taken from fio results:
               w/o patch                 w patch
             -------------  --------  ---------------
      READ:    745MB/s        +25%       934MB/s
      
      [1]:
      [global]
      group_reporting
      thread
      numjobs=4
      bs=32k
      rw=read
      ioengine=sync
      directory=/mnt/btrfs/
      
      [READ]
      filename=foobar
      size=2000M
      invalidate=1
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
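The effect of batching on the number of bios can be modelled with a small counting function. This is only a toy model, not the patch itself: `count_bios` is a hypothetical helper, and it assumes that one bio can hold any run of consecutive page indices.

```c
#include <stddef.h>

/* Count how many bios one submitter needs for the pages it owns, given
 * that a bio may only hold consecutive page indices. Each break in the
 * sequence starts a new bio. */
size_t count_bios(const unsigned long *pages, size_t n)
{
    size_t bios = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || pages[i] != pages[i - 1] + 1)
            bios++;
    }
    return bios;
}
```

In the scenario from the commit message, each of the four threads owns a single page, so four calls with one page each give four bios in total. If one thread instead collects pages 1 to 4 before starting the read, a single call over `{1, 2, 3, 4}` gives one bio.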
    • Btrfs: lock the transition from dirty to writeback for an eb · 51561ffe
      Josef Bacik committed
      There is a small window where an eb can have no IO bits set on it, which
      could potentially result in extent_buffer_under_io() returning false when we
      want it to return true, which could result in not fun things happening.  So
      in order to protect this case we need to hold the refs_lock when we make
      this transition to make sure we get reliable results out of
      extent_buffer_under_io().  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix potential race in extent buffer freeing · 594831c4
      Josef Bacik committed
      This sounds sort of impossible but it is the only thing I can think of and
      at the very least it is theoretically possible so here it goes.
      
      If we are in try_release_extent_buffer we will check that the ref count on
      the extent buffer is 1 and not under IO, and then go down and clear the tree
      ref.  If between this check and clearing the tree ref somebody else comes in
      and grabs a ref on the eb and then marks it dirty before
      try_release_extent_buffer() does its tree ref clear, we can end up with a
      dirty eb that will be freed while it is still dirty, which will result in a
      panic.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: don't return true in releasepage unless we actually freed the eb · e64860aa
      Josef Bacik committed
      I noticed while looking at an extent_buffer race that we will
      unconditionally return 1 if we get down to release_extent_buffer after
      clearing the tree ref.  However we can easily race in here and get a ref on
      the eb and not actually free the eb.  So make release_extent_buffer return 1
      if it free'd the eb and 0 if not so we can be a little kinder to the vm.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs read error corrected message floods the console during recovery · d5b025d5
      Anand Jain committed
      Changing printk_in_rcu to printk_ratelimited_in_rcu will suffice
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  17. 12 July, 2012 1 commit
  18. 03 July, 2012 1 commit
    • Btrfs: hold a ref on the inode during writepages · 7fd1a3f7
      Josef Bacik committed
      We can race with unlink and not actually be able to do our igrab in
      btrfs_add_ordered_extent.  This will result in all sorts of problems.
      Instead of doing the complicated work to try and handle returning an error
      properly from btrfs_add_ordered_extent, just hold a ref to the inode during
      writepages.  If we cannot grab a ref we know we're freeing this inode anyway
      and can just drop the dirty pages on the floor, because screw them we're
      going to invalidate them anyway.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  19. 15 June, 2012 1 commit
    • Btrfs: use rcu to protect device->name · 606686ee
      Josef Bacik committed
      Al pointed out that we can just toss out the old name on a device and add a
      new one arbitrarily, so anybody who uses device->name in printk could
      possibly use free'd memory.  Instead of adding locking around all of this he
      suggested doing it with RCU, so I've introduced a struct rcu_string that
      does just that and have gone through and protected all accesses to
      device->name that aren't under the uuid_mutex with rcu_read_lock().  This
      protects us and I will use it for dealing with removing the device that we
      used to mount the file system in a later patch.  Thanks,
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
  20. 30 May, 2012 1 commit