1. 24 Jul 2012, 7 commits
    • Btrfs: add ro notification to dump_space_info · 799ffc3c
      Liu Bo committed
      Block groups have a ro attribute; make dump_space_info show it.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
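
      As a rough userspace sketch of what the change amounts to (the struct and
      its fields are simplified stand-ins, not the kernel's definitions):

        #include <stdio.h>

        /* simplified stand-in for a block group's dump-relevant state */
        struct block_group {
            unsigned long long start, len, used;
            int ro;                              /* the newly reported attribute */
        };

        static void dump_space_info(const struct block_group *bg)
        {
            printf("block group %llu has %llu bytes, %llu used%s\n",
                   bg->start, bg->len, bg->used, bg->ro ? " [readonly]" : "");
        }

        int main(void)
        {
            struct block_group bg = { 1048576, 8388608, 4096, 1 };
            dump_space_info(&bg);
            return 0;
        }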
    • Btrfs: fix a bug of writing free space cache during balance · cf7c1ef6
      Liu Bo committed
      Here is the whole story:
      1)
      A free space cache consists of two parts:
      o  the free space cache inode, which is special because it's stored in the root tree.
      o  the free space info, which is stored as the above inode's file data.

      But when we _clear and set up_ a free space cache, we only build up a new
      inode and do not flush its free space info onto disk, so the block group's
      cache_state remains DC_SETUP instead of DC_WRITTEN.

      And holding DC_SETUP means that we will not truncate this free space cache inode,
      which means the disk offset of its file extent will remain _unchanged_ at least
      until the next transaction finishes committing.
      
      2)
      We can set a block group readonly when we relocate it.

      However, if the readonly block group covers the disk offset where our free
      space cache inode is going to write, it forces the free space cache inode
      into cow_file_range(), where it ends up hitting a BUG_ON.
      
      3)
      Given the above analysis, we fix this bug by adding the missing dirty flag.

      4)
      However, that's not the whole story: there is still the nospace_cache case.

      With nospace_cache, we do not want to set the dirty flag; instead we just
      truncate the free space cache inode and bail out, setting the cache state
      to DC_WRITTEN.

      We benefit from this since it saves us the 'pre-allocation' step, which
      usually costs a lot.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
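
      A toy model of the fixed setup path, following the commit message; the
      state names mirror the description (DC_SETUP/DC_WRITTEN, a dirty flag),
      but the types are simplified stand-ins, not the kernel's:

        #include <stdbool.h>

        enum cache_state { DC_CLEAR, DC_SETUP, DC_WRITTEN };

        struct block_group {
            enum cache_state cache_state;
            bool dirty;                     /* the flag the fix adds */
        };

        /* With nospace_cache we truncate the cache inode and go straight to
         * DC_WRITTEN; otherwise we mark the group dirty so its free space
         * info gets flushed, rather than leaving a DC_SETUP inode whose
         * extent can later sit under a readonly block group. */
        void setup_free_space_cache(struct block_group *bg, bool nospace_cache)
        {
            if (nospace_cache) {
                /* truncate the cache inode (elided) */
                bg->cache_state = DC_WRITTEN;
                return;
            }
            bg->cache_state = DC_SETUP;
            bg->dirty = true;               /* missing before the fix */
        }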
    • Btrfs: do not abort transaction in prealloc case · 06789384
      Liu Bo committed
      During disk balance, we prealloc a new file extent for file data relocation,
      but we may fail in the 'no available space' case, and that flips btrfs
      into readonly.

      It is not necessary to bail out and abort the transaction, since we have
      several ways to rescue ourselves from the ENOSPC case.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
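
      The shape of the change, as a hedged sketch (the error code is real,
      everything else is a stand-in): hand ENOSPC back to the caller, which has
      fallbacks, rather than aborting the transaction and going read-only.

        #include <errno.h>

        /* before the fix this path hit the transaction-abort machinery,
         * flipping the filesystem read-only; now the error propagates */
        int prealloc_relocation_extent(long free_bytes, long want)
        {
            if (free_bytes < want)
                return -ENOSPC;   /* caller can flush/commit and retry */
            /* ... insert the preallocated extent (elided) ... */
            return 0;
        }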
    • Btrfs: kill root from btrfs_is_free_space_inode · 83eea1f1
      Liu Bo committed
      Since the root can be fetched directly via the BTRFS_I macro, we can drop
      an argument from btrfs_is_free_space_inode().
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
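
      A simplified model of the signature change; the stand-in types below are
      not the kernel's, and the free-space check itself is reduced to a
      placeholder:

        #include <stdbool.h>

        struct btrfs_root  { unsigned long long objectid; };
        struct btrfs_inode { struct btrfs_root *root; };  /* what BTRFS_I() yields */

        /* before: btrfs_is_free_space_inode(root, inode);
         * after: the root parameter is gone, derived from the inode */
        bool btrfs_is_free_space_inode(struct btrfs_inode *inode)
        {
            struct btrfs_root *root = inode->root;   /* was a separate argument */
            (void)root;
            /* ... the actual free-space-inode checks (elided) ... */
            return false;
        }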
    • Btrfs: rework shrink_delalloc · f4c738c2
      Josef Bacik committed
      So shrink_delalloc has grown all sorts of cruft over the years thanks to
      many reworkings of how we track enospc.  What happens now as we fill up the
      disk is that we loop for freaking ever hoping to reclaim an arbitrary amount
      of metadata space; this dates from when everybody flushed at the same time.
      Now we only have people flushing one at a time.  So instead of trying to
      reclaim a huge amount of space, just try to flush a decent chunk of space,
      and stop looping as soon as we have enough free space to satisfy our
      reservation.  This makes xfstests 224 go much faster.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
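
      A toy model of the reworked loop (all names and the batch size are
      illustrative): flush in modest batches and stop the moment the
      reservation is satisfiable.

        /* flush delalloc in ~1MB batches, stopping as soon as we have
         * reclaimed enough for the reservation at hand */
        long shrink_delalloc(long to_reclaim, long *dirty_delalloc_bytes)
        {
            const long batch = 1L << 20;
            long freed = 0;

            while (freed < to_reclaim && *dirty_delalloc_bytes > 0) {
                long n = *dirty_delalloc_bytes < batch ? *dirty_delalloc_bytes : batch;
                *dirty_delalloc_bytes -= n;   /* stands in for writing back pages */
                freed += n;
            }
            return freed;                     /* caller checks freed >= to_reclaim */
        }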
    • Btrfs: change how we indicate we're adding csums · 0e721106
      Josef Bacik committed
      There is weird logic I had to put in place to make sure that when we were
      adding csums that we'd used the delalloc block rsv instead of the global
      block rsv.  Part of this meant that we had to free up our transaction
      reservation before we ran the delayed refs since csum deletion happens
      during the delayed ref work.  The problem with this is that when we release
      a reservation we will add it to the global reserve if it is not full in
      order to keep us going along longer before we have to force a transaction
      commit.  By releasing our reservation before we run delayed refs we don't
      get the opportunity to drain down the global reserve for the work we did, so
      we won't refill it as often.  This isn't a problem per se; it just results
      in us possibly committing transactions more and more often, and in rare
      cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran
      out of space in our block rsv.
      
      This also helps us by holding onto space while the delayed refs run so we
      don't end up with as many people trying to do things at the same time, which
      again will help us not force commits or hit the use_block_rsv warnings.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
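
      A toy model of the ordering argument in this message (the reserves are
      reduced to plain counters; nothing here is the kernel's API): hold the
      transaction reservation across the delayed-ref run, so the global
      reserve is drained by the work and refilled by our release afterwards.

        struct rsv { long bytes; };

        /* delayed-ref work (csum deletion included) consumes global reserve */
        void run_delayed_refs(struct rsv *global) { global->bytes -= 512; }

        void end_transaction(struct rsv *ours, struct rsv *global)
        {
            run_delayed_refs(global);       /* drain the global reserve first */
            global->bytes += ours->bytes;   /* ...then our release refills it */
            ours->bytes = 0;
        }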
    • Btrfs: flush delayed inodes if we're short on space · 96c3f433
      Josef Bacik committed
      Those crazy gentoo guys have been complaining about ENOSPC errors on their
      portage volumes.  This is because doing things like untar tends to create
      lots of new files, which soak up all the reservation space in the
      delayed inodes.  Usually this gets papered over by the fact that we will try
      to commit the transaction; however, if this happens in the wrong spot or we
      choose not to commit the transaction, you will be screwed.  So add the
      ability to explicitly flush delayed inodes to free up space.  Please test
      this out guys to make sure it works, since as usual I cannot reproduce.
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
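
      A sketch of the idea under simplified types (the state name and counters
      are illustrative, not the kernel's): add an explicit "flush delayed inode
      items" step to the space-reclaim ladder before resorting to a commit.

        enum flush_state { FLUSH_DELALLOC, FLUSH_DELAYED_ITEMS, FLUSH_COMMIT };

        /* each newly created file parks a metadata reservation in the
         * delayed inode machinery; running those items returns it */
        long flush_space(enum flush_state state, long *delayed_item_rsv)
        {
            if (state == FLUSH_DELAYED_ITEMS) {   /* the newly added step */
                long freed = *delayed_item_rsv;
                *delayed_item_rsv = 0;            /* stands in for running the items */
                return freed;
            }
            /* FLUSH_DELALLOC / FLUSH_COMMIT paths elided */
            return 0;
        }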
  2. 27 Jun 2012, 1 commit
    • Btrfs: avoid waiting for delayed refs when we must not · 8ca78f3e
      Jan Schmidt committed
      We track two conditions to decide if we should sleep while waiting for more
      delayed refs, the number of delayed refs (num_refs) and the first entry in
      the list of blockers (first_seq).
      
      When we suspect staleness, we save num_refs and do one more cycle. If
      nothing changes, we then save first_seq for later comparison and do
      wait_event. We ought to save first_seq the very same moment we're saving
      num_refs. Otherwise we cannot be sure that nothing has changed and we might
      start waiting when we shouldn't, which could lead to starvation.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
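
      A toy model of the race and fix (stand-in types; C11 atomics in place of
      the kernel's primitives): num_refs and first_seq are snapshotted at the
      same moment, so a change between the two reads can no longer make us
      start waiting on stale data.

        #include <stdatomic.h>

        struct ref_state { atomic_long num_refs; atomic_long first_seq; };

        int should_wait(struct ref_state *s, long *saved_refs, long *saved_seq)
        {
            long refs = atomic_load(&s->num_refs);
            long seq  = atomic_load(&s->first_seq);   /* saved together now */

            if (refs == *saved_refs && seq == *saved_seq)
                return 1;                             /* nothing moved: safe to wait */
            *saved_refs = refs;                       /* otherwise re-arm and retry */
            *saved_seq  = seq;
            return 0;
        }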
  3. 30 May 2012, 1 commit
    • Btrfs: convert the inode bit field to use the actual bit operations · 72ac3c0d
      Josef Bacik committed
      Miao pointed out, while I was working on an orphan problem, that messing
      with a bitfield where different ranges are protected by different locks
      doesn't work out right.  Turns out we've been doing this forever, with
      different parts of the bit field protected either by no lock at all or by
      different locks, which could cause all sorts of weird problems including the
      issue I was hitting.  So instead make a runtime_flags field that we use the
      normal atomic bit operations on, so we can keep having our no/different
      locking for the different flags, and make force_compress its own field so
      it can be treated normally.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
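
      A userspace model of the conversion (C11 atomics standing in for the
      kernel's set_bit()/test_bit(); the flag names are illustrative): one
      flags word, each bit touched atomically, so neighbouring flags need no
      shared lock and a read-modify-write can't clobber a bit owned by a
      different lock.

        #include <stdatomic.h>
        #include <stdbool.h>

        enum { INODE_ORPHAN_PENDING, INODE_DUMMY_FLAG };  /* illustrative names */

        struct my_inode { atomic_ulong runtime_flags; };

        void set_flag(struct my_inode *i, int bit)
        {
            atomic_fetch_or(&i->runtime_flags, 1UL << bit);
        }

        bool test_flag(struct my_inode *i, int bit)
        {
            return atomic_load(&i->runtime_flags) & (1UL << bit);
        }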
  4. 26 May 2012, 1 commit
  5. 11 May 2012, 1 commit
  6. 06 May 2012, 1 commit
    • Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919
      Chris Mason committed
      verify_parent_transid needs to lock the extent range to make
      sure no IO is underway, and so it can safely clear the
      uptodate bits if our checks fail.
      
      But, a few callers are using it with spinlocks held.  Most
      of the time, the generation numbers are going to match, and
      we don't want to switch to a blocking lock just for the error
      case.  This adds an atomic flag to verify_parent_transid,
      and changes it to return EAGAIN if it needs to block to
      properly verify things.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
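
      A reduced sketch of the new contract (the signature and the check are
      simplified; EAGAIN and EIO are the real error codes): callers that hold a
      spinlock pass atomic=1 and retry from a blocking context on -EAGAIN.

        #include <errno.h>

        int verify_parent_transid(unsigned long long gen,
                                  unsigned long long parent_gen, int atomic)
        {
            if (gen == parent_gen)
                return 0;           /* fast path: no locking needed */
            if (atomic)
                return -EAGAIN;     /* would block; caller retries later */
            /* blocking path: lock the extent range, clear uptodate bits... */
            return -EIO;
        }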
  7. 28 Apr 2012, 1 commit
  8. 19 Apr 2012, 2 commits
    • btrfs: don't return EINTR · b9688bb8
      Arne Jansen committed
      It is basically a good thing if we are interruptible when waiting for
      free space, but the generality with which it is currently implemented
      leads to system calls being interruptible that are not documented as
      such. For example, git can't handle an interrupted unlink(), leading to
      corrupt repos under space pressure.
      Instead we raise the bar so that only SIGKILL interrupts the wait.
      Thanks to David Sterba for suggesting this.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
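
      A userspace model of the policy change (the kernel uses its killable-wait
      primitives; here a flag stands in for a pending fatal signal): ordinary
      signals no longer break the wait, so -EINTR never reaches the syscall.

        #include <signal.h>
        #include <stdbool.h>

        volatile sig_atomic_t fatal_signal;   /* models SIGKILL-only intent */

        bool fatal_signal_pending(void) { return fatal_signal != 0; }

        /* previously any pending signal returned -EINTR here, which leaked
         * out of syscalls like unlink(); now only a fatal signal aborts */
        int wait_for_free_space(bool (*space_available)(void))
        {
            while (!space_available()) {
                if (fatal_signal_pending())
                    return -1;    /* abort only for a fatal signal */
                /* sleep/wakeup elided; an ordinary signal just loops */
            }
            return 0;
        }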
    • Btrfs: double unlock bug in error handling · 253beebd
      Dan Carpenter committed
      The caller expects this function to return with the lock held, and
      releases it immediately on error; an error path that returned without
      re-taking the lock therefore led to a double unlock.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
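
      A userspace model of the contract (pthread mutexes standing in for the
      kernel lock): since the caller unlocks unconditionally after an error,
      every path out of the function must hold the lock.

        #include <pthread.h>

        pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

        /* called with `lock` held; may drop it for blocking work */
        int do_work_locked(int fail)
        {
            pthread_mutex_unlock(&lock);
            /* ... blocking work ... */
            pthread_mutex_lock(&lock);  /* the fix: re-lock on ALL paths */
            if (fail)
                return -1;              /* still holding the lock, as promised */
            return 0;
        }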
  9. 13 Apr 2012, 3 commits
  10. 29 Mar 2012, 2 commits
  11. 27 Mar 2012, 10 commits
  12. 22 Mar 2012, 4 commits
  13. 24 Feb 2012, 1 commit
  14. 23 Feb 2012, 1 commit
  15. 17 Feb 2012, 1 commit
  16. 15 Feb 2012, 1 commit
    • Btrfs: fix trim 0 bytes after a device delete · 2cac13e4
      Liu Bo committed
      A user reported a bug in btrfs's trim: we trim 0 bytes
      after a device delete.
      
      The reproducer:
      
      $ mkfs.btrfs disk1
      $ mkfs.btrfs disk2
      $ mount disk1 /mnt
      $ fstrim -v /mnt
      $ btrfs device add disk2 /mnt
      $ btrfs device del disk1 /mnt
      $ fstrim -v /mnt
      
      This is because after we delete the device, the first block group may start
      at a non-zero offset, which confuses trim into discarding nothing.
      Reported-by: Lutz Euler <lutz.euler@freenet.de>
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
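
      A hedged sketch of the fix's core (stand-in types; the real patch
      adjusts the block group walk in the trim path): start the walk at the
      first block group rather than blindly at the requested offset 0.

        struct trim_range { unsigned long long start, len; };

        /* with the first block group starting above 0, a walk beginning at
         * range->start (0) found no overlap and discarded nothing; clamp
         * the starting point up to the first block group instead */
        unsigned long long trim_walk_start(const struct trim_range *r,
                                           unsigned long long first_bg_start)
        {
            return r->start > first_bg_start ? r->start : first_bg_start;
        }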
  17. 27 Jan 2012, 1 commit
    • Btrfs: fix enospc error caused by wrong checks of the chunk · 9e622d6b
      Miao Xie committed
      When we ran a sysbench test on inline files, an enospc error happened
      easily, even though there was plenty of free disk space that could be
      allocated for new chunks.
      
      Reproduce steps:
       # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
       # mount <test partition> /mnt
       # ulimit -n 102400
       # cd /mnt
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr prepare
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr run
       <soon later, BUG_ON() was triggered by enospc error>
      
      The reason for this bug:
      Now, we can reserve space that is larger than the free space in the chunks
      if we have enough free disk space that can be used for new chunks. In that
      case, the space allocator should allocate a new chunk by force when there is
      no free space in the free space cache. But two wrong checks break this
      operation.
      
      One is
      	if (ret == -ENOSPC && num_bytes > min_alloc_size)
      in btrfs_reserve_extent(); it is wrong because we should try to allocate a new
      chunk even if we fail to allocate free space at the minimum allocatable size.
      
      The other is
      	if (space_info->force_alloc)
      		force = space_info->force_alloc;
      in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if someone
      has set ->force_alloc to CHUNK_ALLOC_LIMITED, and causes the enospc error.
      
      Fix these two wrong checks. For the second one in particular, we fix it by
      changing the values of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, making
      CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has
      higher priority. And if the value passed in by the caller is greater
      than ->force_alloc, use the passed value.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
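
      A sketch of the second fix as the commit message describes it (the enum
      ordering follows the message; the helper name is illustrative): the force
      levels get an order, and the stronger of the stored and passed values wins.

        enum chunk_alloc_enum {
            CHUNK_ALLOC_NO_FORCE = 0,
            CHUNK_ALLOC_LIMITED  = 1,
            CHUNK_ALLOC_FORCE    = 2,   /* highest priority */
        };

        /* before the fix, a stored CHUNK_ALLOC_LIMITED silently replaced a
         * caller's CHUNK_ALLOC_FORCE; now the stronger request wins */
        enum chunk_alloc_enum effective_force(enum chunk_alloc_enum stored,
                                              enum chunk_alloc_enum passed)
        {
            return passed > stored ? passed : stored;
        }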
  18. 17 Jan 2012, 1 commit