1. 01 9月, 2013 6 次提交
  2. 20 7月, 2013 3 次提交
  3. 02 7月, 2013 4 次提交
    • J
      Btrfs: make the chunk allocator completely tree lockless · 6df9a95e
      Josef Bacik 提交于
      When adjusting the enospc rules for relocation I ran into a deadlock because we
      were relocating the only system chunk and that forced us to try and allocate a
      new system chunk while holding locks in the chunk tree, which caused us to
      deadlock.  To fix this I've moved all of the dev extent addition and chunk
      addition out to the delayed chunk completion stuff.  We still keep the in-memory
      stuff which makes sure everything is consistent.
      
      One change I had to make was to search the commit root of the device tree to
      find a free dev extent, and hold onto any chunk em's that we allocated in that
      transaction so we do not allocate the same dev extent twice.  This has the side
      effect of fixing a bug with balance that has been there ever since balance
      existed.  Basically you can free a block group and it's dev extent and then
      immediately allocate that dev extent for a new block group and write stuff to
      that dev extent, all within the same transaction.  So if you happen to crash
      during a balance you could come back to a completely broken file system.  This
      patch should keep these sort of things from happening in the future since we
      won't be able to allocate free'd dev extents until after the transaction
      commits.  This has passed all of the xfstests and my super annoying stress test
      followed by a balance.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      6df9a95e
    • J
      Btrfs: check if we can nocow if we don't have data space · 7ee9e440
      Josef Bacik 提交于
      We always just try and reserve data space when we write, but if we are out of
      space but have prealloc'ed extents we should still successfully write.  This
      patch will try and see if we can write to prealloc'ed space and if we can go
      ahead and allow the write to continue.  With this patch we now pass xfstests
      generic/274.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      7ee9e440
    • J
      Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc · 925a6efb
      Josef Bacik 提交于
      try_to_writeback_inodes_sb_nr returns 1 if writeback is already underway, which
      is completely fraking useless for us as we need to make sure pages are actually
      written before we go and check if there are ordered extents.  So replace this
      with an open coding of try_to_writeback_inodes_sb_nr minus the writeback
      underway check so that we are sure to actually have flushed some dirty pages out
      and will have ordered extents to use.  With this patch xfstests generic/273 now
      passes.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      925a6efb
    • J
      Btrfs: use a percpu to keep track of possibly pinned bytes · b150a4f1
      Josef Bacik 提交于
      There are all of these checks in the ENOSPC code to see if committing the
      transaction would free up enough space to make the allocation.  This is because
      early on we just committed the transaction and hoped and prayed, which resulted
      in cases where it took _forever_ to get an ENOSPC when we really were out of
      space.  So we check space_info->bytes_pinned, except this isn't completely true
      because it doesn't account for space we may free but are stuck in delayed refs.
      So tests like xfstests 226 would fail because we wouldn't commit the transaction
      to free up the data space.  So instead add a percpu counter that will be a
      little fuzzier, it will add bytes as soon as we try to free up the space, and
      remove any space it doesn't actually free up when we get around to doing the
      actual free.  We then 0 out this counter every transaction period so we have a
      better idea of how much space we will actually free up by committing this
      transaction.  With this patch we now pass xfstests 226.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      b150a4f1
  4. 01 7月, 2013 2 次提交
    • J
      Btrfs: fix transaction throttling for delayed refs · 1be41b78
      Josef Bacik 提交于
      Dave has this fs_mark script that can make btrfs abort with sufficient amount of
      ram.  This is because with more ram we can keep more dirty metadata in cache
      which in a round about way makes for many more pending delayed refs.  What
      happens is we end up not throttling the transaction enough so when we go to
      commit the transaction when we've completely filled the file system we'll
      abort() because we use all of the space in the global reserve and we still have
      delayed refs to run.  To fix this we need to make the delayed ref flushing and
      the transaction throttling dependant upon the number of delayed refs that we
      have instead of how much reserved space is left in the global reserve.  With
      this patch we not only stop aborting transactions but we also get a smoother run
      speed with fs_mark and it makes us about 10% faster.  Thanks,
      Reported-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      1be41b78
    • J
      Btrfs: wake up delayed ref flushing waiters on abort · f971fe29
      Josef Bacik 提交于
      I hit a deadlock because we aborted when flushing delayed refs but didn't wake
      any of the other flushers up and so everybody was just sleeping forever.  This
      should fix the problem.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      f971fe29
  5. 14 6月, 2013 7 次提交
  6. 18 5月, 2013 6 次提交
  7. 07 5月, 2013 12 次提交
    • D
      btrfs: fix misleading variable name for flags · b6919a58
      David Sterba 提交于
      The variable was named 'data' in btrfs_reserve_extent and that's the
      only function that actually uses it to let btrfs_get_alloc_profile know
      what profile we want. Then it's passed down as u64 flags.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      b6919a58
    • E
      btrfs: make static code static & remove dead code · 48a3b636
      Eric Sandeen 提交于
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      48a3b636
    • J
      Btrfs: deal with free space cache errors while replaying log · b50c6e25
      Josef Bacik 提交于
      So everybody who got hit by my fsync bug will still continue to hit this
      BUG_ON() in the free space cache, which is pretty heavy handed.  So I took a
      file system that had this bug and fixed up all the BUG_ON()'s and leaks that
      popped up when I tried to mount a broken file system like this.  With this patch
      we just fail to mount instead of panicing.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      b50c6e25
    • M
      Btrfs: allocate new chunks if the space is not enough for global rsv · 3c76cd84
      Miao Xie 提交于
      When running the 208th of xfstests, the fs returned the enospc
      error when there was lots of free space in the disk.
      
      By bisect debug, we found it was introduced by commit 96f1bb57.
      This commit makes the space check for the global reservation in
      can_overcommit() be inconsistent with should_alloc_chunk().
      can_overcommit() requires that the free space is 2 times the size
      of the global reservation, or we can't do overcommit. And instead,
      we need reclaim some reserved space, and if we still don't have
      enough free space, we need allocate a new chunk. But unfortunately,
      should_alloc_chunk() just requires that the free space is 1 time
      the size of the global reservation, that is we would not try to
      allocate a new chunk if the free space size is in the middle of
      these two requires, and just return the enospc error. Fix it.
      
      Cc: Jim Schutt <jaschut@sandia.gov>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      3c76cd84
    • J
      Btrfs: separate sequence numbers for delayed ref tracking and tree mod log · fc36ed7e
      Jan Schmidt 提交于
      Sequence numbers for delayed refs have been introduced in the first version
      of the qgroup patch set. To solve the problem of find_all_roots on a busy
      file system, the tree mod log was introduced. The sequence numbers for that
      were simply shared between those two users.
      
      However, at one point in qgroup's quota accounting, there's a statement
      accessing the previous sequence number, that's still just doing (seq - 1)
      just as it would have to in the very first version.
      
      To satisfy that requirement, this patch makes the sequence number counter 64
      bit and splits it into a major part (used for qgroup sequence number
      counting) and a minor part (incremented for each tree modification in the
      log). This enables us to go exactly one major step backwards, as required
      for qgroups, while still incrementing the sequence counter for tree mod log
      insertions to keep track of their order. Keeping them in a single variable
      means there's no need to change all the code dealing with comparisons of two
      sequence numbers.
      
      The sequence number is reset to 0 on commit (not new in this patch), which
      ensures we won't overflow the two 32 bit counters.
      
      Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
      from the tree mod log code may happen.
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fc36ed7e
    • J
      Btrfs: don't panic if we're trying to drop too many refs · 32b02538
      Josef Bacik 提交于
      This is just obnoxious.  Just print a message, abort the transaction, and return
      an error.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      32b02538
    • J
      Btrfs: fix all callers of read_tree_block · 416bc658
      Josef Bacik 提交于
      We kept leaking extent buffers when mounting a broken file system and it turns
      out it's because not everybody uses read_tree_block properly.  You need to check
      and make sure the extent_buffer is uptodate before you use it.  This patch fixes
      everybody who calls read_tree_block directly to make sure they check that it is
      uptodate and free it and return an error if it is not.  With this we no longer
      leak EB's when things go horribly wrong.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      416bc658
    • J
      Btrfs: only exclude supers in the range of our block group · 51bf5f0b
      Josef Bacik 提交于
      If we fail to load block groups halfway through we can leave extent_state's on
      the excluded tree.  This is because we just lookup the supers and add them to
      the excluded tree regardless of which block group we are looking at currently.
      This is a problem because we remove the excluded extents for the range of the
      block group only, so if we don't ever load a block group for one of the excluded
      extents we won't ever free it.  This fixes the problem by only adding excluded
      extents if it falls in the block group range we care about.  With this patch
      we're no longer leaking space when we fail to read all of the block groups.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      51bf5f0b
    • J
      Btrfs: fix possible infinite loop in slow caching · 0a3896d0
      Josef Bacik 提交于
      So I noticed there is an infinite loop in the slow caching code.  If we return 1
      when we hit the end of the tree, so we could end up caching the last block group
      the slow way and suddenly we're looping forever because we just keep
      re-searching and trying again.  Fix this by only doing btrfs_next_leaf() if we
      don't need_resched().  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      0a3896d0
    • T
      Btrfs: cleanup of function where btrfs_extend_item() is called · fd279fae
      Tsutomu Itoh 提交于
      Argument 'trans' became unnecessary from setup_inline_extent_backref()
      that called btrfs_extend_item().
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fd279fae
    • T
      Btrfs: remove unused argument of btrfs_extend_item() · 4b90c680
      Tsutomu Itoh 提交于
      Argument 'trans' is not used in btrfs_extend_item().
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      4b90c680
    • T
      Btrfs: cleanup of function where fixup_low_keys() is called · afe5fea7
      Tsutomu Itoh 提交于
      If argument 'trans' is unnecessary in the function where
      fixup_low_keys() is called, 'trans' is deleted.
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      afe5fea7