  1. 25 Jul 2012: 1 commit
  2. 27 Jun 2012: 4 commits
    • Btrfs: resolve tree mod log locking issue in btrfs_next_leaf · d42244a0
      Committed by Jan Schmidt
      With the tree mod log, we may end up with two roots (the current root and a
      rewound version of it) both pointing to the same two leaves, l1 and l2, of
      which l2 had already been COWed in the current transaction. If we don't
      rewind any tree blocks, we cannot have two roots both pointing to an already
      COWed tree block.
      
      Now there is btrfs_next_leaf, which has a leaf locked and wants a lock on
      the next (right) leaf. And there is push_leaf_left, which has a (COWed!)
      leaf locked and wants a lock on the previous (left) leaf. The two can thus
      take the same pair of leaf locks in opposite order.
      
      To resolve this deadlock, we use try_lock in btrfs_next_leaf (only when it
      is called with a tree mod log time_seq parameter), and if we fail to get a
      lock on the next leaf, we give up our lock on the current leaf and retry
      from the very beginning.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
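      A minimal userspace sketch of the try-lock-and-restart pattern described above, with POSIX mutexes standing in for the btrfs tree locks. `struct leaf`, `lock_next_or_back_off` and `step_to_next` are hypothetical names for illustration, not kernel functions, and the sketch ignores the time_seq condition.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a tree leaf; not the real extent_buffer. */
struct leaf {
	pthread_mutex_t lock;
	int id;
};

/*
 * Caller holds cur->lock and wants next->lock. Another path may hold
 * next->lock while waiting for cur->lock (the push_leaf_left direction),
 * so blocking here could deadlock. Try once; on failure, release our own
 * lock so the other side can finish, and tell the caller to restart.
 */
static int lock_next_or_back_off(struct leaf *cur, struct leaf *next)
{
	assert(cur != next);
	if (pthread_mutex_trylock(&next->lock) == 0)
		return 0;               /* now holding both cur and next */
	pthread_mutex_unlock(&cur->lock);
	return -EAGAIN;             /* holding nothing; restart from scratch */
}

/* Restart loop: reacquire cur from the beginning after every back-off. */
static struct leaf *step_to_next(struct leaf *cur, struct leaf *next)
{
	for (;;) {
		pthread_mutex_lock(&cur->lock);
		if (lock_next_or_back_off(cur, next) == 0) {
			pthread_mutex_unlock(&cur->lock);
			return next;    /* returns with next->lock held */
		}
		/* a real implementation might yield or back off here */
	}
}
```

      The key property is that the waiting side never blocks while holding a lock the other side needs, so one of the two paths always makes progress.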
    • Btrfs: fix tree mod log rewind of ADD operations · 19956c7e
      Committed by Jan Schmidt
      When a MOD_LOG_KEY_ADD operation is rewound, we remove the key from the
      tree block. If it's not the last key, removal involves a move operation.
      Before this commit, that move was performed explicitly.
      
      However, at insertion time, a move operation precedes the actual addition
      to make room for the new key, and that move is recorded in the tree mod log
      as well. This means we must drop the explicit move when rewinding the ADD
      operation, because the next operation we rewind will be the corresponding
      MOD_LOG_MOVE_KEYS operation, which restores the keys itself. Doing both
      would shift the keys twice.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
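      A toy model of why the explicit move had to go, assuming a node is just an array of integer keys: insertion logs a MOVE followed by an ADD, and rewinding newest-first undoes the ADD by shrinking the item count, leaving the logged MOVE to slide the keys back. The types and functions here (`struct mod_op`, `insert_key`, `rewind_log`) are hypothetical and far simpler than the kernel's tree mod log.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 8
#define MAX_OPS 16

/* Hypothetical, simplified mod-log entry; not the kernel's structure. */
enum op_type { OP_KEY_ADD, OP_MOVE_KEYS };

struct mod_op {
	enum op_type type;
	int slot;     /* ADD: slot that was filled; MOVE: destination slot */
	int src_slot; /* MOVE only: where the moved keys came from */
	int nr_items; /* MOVE only: how many keys were moved */
};

struct node {
	int keys[MAX_KEYS];
	int nritems;
};

struct mod_log {
	struct mod_op ops[MAX_OPS];
	int n;
};

/* Insert key at slot: log the make-room move first, then the add. */
static void insert_key(struct node *nd, struct mod_log *ml, int slot, int key)
{
	int tail = nd->nritems - slot;

	assert(nd->nritems < MAX_KEYS && ml->n + 2 <= MAX_OPS);
	if (tail > 0) {
		memmove(&nd->keys[slot + 1], &nd->keys[slot], tail * sizeof(int));
		ml->ops[ml->n++] = (struct mod_op){ OP_MOVE_KEYS, slot + 1, slot, tail };
	}
	nd->keys[slot] = key;
	nd->nritems++;
	ml->ops[ml->n++] = (struct mod_op){ OP_KEY_ADD, slot, 0, 0 };
}

/* Undo logged operations, newest first. */
static void rewind_log(struct node *nd, struct mod_log *ml)
{
	while (ml->n > 0) {
		const struct mod_op *op = &ml->ops[--ml->n];

		switch (op->type) {
		case OP_KEY_ADD:
			/* Only shrink the node; the logged MOVE restores the layout. */
			nd->nritems--;
			break;
		case OP_MOVE_KEYS:
			/* Copy the keys back from the destination to the source slots. */
			memmove(&nd->keys[op->src_slot], &nd->keys[op->slot],
				op->nr_items * sizeof(int));
			break;
		}
	}
}
```

      If the ADD case also performed its own memmove, the keys after the insertion slot would be shifted left twice and the rewound node would be corrupt.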
    • Btrfs: always put insert_ptr modifications into the tree mod log · c3e06965
      Committed by Jan Schmidt
      Several callers of insert_ptr set the tree_mod_log parameter to 0 to keep
      their changes out of the tree mod log. In fact, all of those operations
      must be logged. This commit simply removes the parameter and makes logging
      to the tree mod log unconditional.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: fix tree mod log for root replacements at leaf level · 28da9fb4
      Committed by Jan Schmidt
      For the tree mod log, we don't log any operations at leaf level. If the root
      is at the leaf level (i.e. the tree consists only of the root), then
      __tree_mod_log_oldest_root will find a ROOT_REPLACE operation in the log
      (because we always log that one no matter which level), but no other
      operations.
      
      With this patch, __tree_mod_log_oldest_root exits cleanly instead of
      hitting a BUG in this situation. When there are no other operations,
      get_old_root checks whether the root really is at leaf level and WARNs if
      this assumption breaks.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
  3. 16 Jun 2012: 1 commit
  4. 15 Jun 2012: 4 commits
    • Btrfs: fix race in tree mod log addition · 3310c36e
      Committed by Jan Schmidt
      When adding to the tree modification log, we grab two locks at different
      stages. We must not drop the outer lock until we're done with the section
      protected by the inner lock. This commit moves the unlock call for the
      outer lock to the appropriate position.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: add btrfs_next_old_leaf · 3d7806ec
      Committed by Jan Schmidt
      To make sense of the tree mod log, the backref walker needs
      btrfs_search_old_slot, but it also called btrfs_next_leaf, which in turn
      called btrfs_search_slot and therefore searched the current tree instead
      of the rewound one. This obviously didn't give the correct result.
      
      This commit adds btrfs_next_old_leaf, a drop-in replacement for
      btrfs_next_leaf with a time_seq parameter. If it is zero, it behaves exactly
      like btrfs_next_leaf. If it is non-zero, it will use btrfs_search_old_slot
      with this time_seq parameter.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: fix return value for __tree_mod_log_oldest_root · a95236d9
      Committed by Jan Schmidt
      In __tree_mod_log_oldest_root() we must return the found operation even if
      it's not a ROOT_REPLACE operation. Otherwise, the caller assumes that there
      are no operations to be rewound and returns immediately.
      
      The code in the caller is modified to improve readability.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: use btrfs_read_lock_root_node in get_old_root · 8ba97a15
      Committed by Jan Schmidt
      get_old_root could race with root node updates because we weren't locking
      the node early enough. Use btrfs_read_lock_root_node to grab the root locked
      in the very beginning and release the lock as soon as possible (just like
      btrfs_search_slot does).
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
  5. 04 Jun 2012: 1 commit
  6. 01 Jun 2012: 4 commits
  7. 30 May 2012: 5 commits
  8. 26 May 2012: 1 commit
  9. 11 May 2012: 1 commit
  10. 06 May 2012: 1 commit
    • Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919
      Committed by Chris Mason
      verify_parent_transid needs to lock the extent range to make
      sure no IO is underway, and so it can safely clear the
      uptodate bits if our checks fail.
      
      But a few callers are using it with spinlocks held.  Most
      of the time, the generation numbers are going to match, and
      we don't want to switch to a blocking lock just for the error
      case.  This adds an atomic flag to verify_parent_transid,
      and changes it to return EAGAIN if it needs to block to
      properly verify things.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
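      A userspace sketch of the pattern, assuming a mutex stands in for the extent range lock: the generation check needs no lock, and only the mismatch path, which must block, is refused in atomic context with -EAGAIN so the caller can retry where sleeping is allowed. `struct buf` and `verify_gen` are hypothetical; returning the negative errno value and returning 1 on a mismatch are this sketch's conventions.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical buffer: a lock guarding IO plus the fields being checked. */
struct buf {
	pthread_mutex_t io_lock;    /* stands in for the extent range lock */
	unsigned long long gen;     /* generation recorded in the block */
	bool uptodate;
};

/* Returns 0 if gen matches, 1 on mismatch, -EAGAIN if it would have to block. */
static int verify_gen(struct buf *b, unsigned long long parent_gen, bool atomic)
{
	int ret;

	if (b->gen == parent_gen)
		return 0;               /* common case: no lock needed */
	if (atomic)
		return -EAGAIN;         /* caller must retry where it may sleep */

	pthread_mutex_lock(&b->io_lock);
	if (b->gen == parent_gen) {
		ret = 0;
	} else {
		b->uptodate = false;    /* safe: no IO can be in flight */
		ret = 1;
	}
	pthread_mutex_unlock(&b->io_lock);
	return ret;
}
```

      Because the fast path never touches the lock, callers holding spinlocks only pay for the retry in the rare mismatch case.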
  11. 05 May 2012: 1 commit
  12. 27 Mar 2012: 3 commits
    • Btrfs: adjust the write_lock_level as we unlock · f7c79f30
      Committed by Chris Mason
      btrfs_search_slot sometimes needs write locks on high levels of
      the tree.  It remembers the highest level that needs a write lock
      and will use that for all future searches through the tree in a given
      call.
      
      But very often we'll only COW the top level or the level below, and we
      won't really need write locks on the root again after that.  This patch
      changes things to adjust the write lock requirement as it unlocks
      levels.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: add the ability to cache a pointer into the eb · cfed81a0
      Committed by Chris Mason
      This cuts down on the CPU time used by map_private_extent_buffer.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: introduce free_extent_buffer_stale · 3083ee2e
      Committed by Josef Bacik
      Because btrfs COWs, we can end up with extent buffers that are no longer
      needed just sitting around in memory.  So instead of evicting these pages, we
      could end up evicting things we actually care about.  Thus we add
      free_extent_buffer_stale for use when we are freeing tree blocks.  It makes
      sure the ref held for the eb being in the radix tree is dropped as soon as
      possible, and the eb is then freed when its refcount hits 0 instead of
      waiting to be released by releasepage.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
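      A single-threaded model of the refcounting idea, assuming the radix tree simply owns one reference: a normal release leaves the cache's reference in place, while the stale variant drops it too, so the buffer is freed as soon as the last user lets go. `struct ebuf`, `ebuf_free` and `ebuf_free_stale` are hypothetical; the kernel uses atomic counters and page-cache release, not these plain fields.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of an extent buffer's refcount. "in_tree" means the
 * lookup cache owns one of the refs; "freed" marks where memory would go.
 */
struct ebuf {
	int refs;
	bool in_tree;
	bool freed;
};

static void ebuf_put(struct ebuf *eb)
{
	assert(eb->refs > 0);
	if (--eb->refs == 0)
		eb->freed = true;       /* last ref gone: release the buffer */
}

/* Ordinary release: drop only the caller's ref; the cache ref lingers. */
static void ebuf_free(struct ebuf *eb)
{
	ebuf_put(eb);
}

/* Stale release: the block is dead, so also drop the cache's ref now. */
static void ebuf_free_stale(struct ebuf *eb)
{
	if (eb->in_tree) {
		eb->in_tree = false;
		ebuf_put(eb);
	}
	ebuf_put(eb);
}
```

      With the ordinary path, a dead buffer stays cached until memory pressure reclaims it; the stale path frees it immediately.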
  13. 22 Mar 2012: 6 commits
  14. 22 Dec 2011: 1 commit
    • Btrfs: mark delayed refs as for cow · 66d7e7f0
      Committed by Arne Jansen
      Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
      from every call site. The for_cow parameter will later on be used to
      determine if a ref will change anything with respect to qgroups.
      
      Delayed refs coming from relocation are always counted as for_cow, as they
      don't change subvol quota.
      
      Also pass in the fs_info for later use.
      
      btrfs_find_all_roots() will use this as an optimization, as changes that are
      for_cow will not change anything with respect to which root points to a
      certain leaf. Thus, we don't need to add the current sequence number to
      those delayed refs.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
  15. 15 Nov 2011: 1 commit
    • Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush · f1ebcc74
      Committed by Liu Bo
      The btrfs snapshotting code requires that once a root has been
      snapshotted, we don't change it during a commit.
      
      But there are two cases that lead to tree corruption:
      
      1) multi-thread snapshots can commit several snapshots in one transaction,
         and this may change the source root while processing the following
         pending snapshots, which leads to corruption of the earlier snapshots;
      
      2) the free inode cache was changing the roots when it wrote the cache,
         which led to corruption.
      
      This fixes things by making sure we force COW of the block after we create
      a snapshot while committing a transaction; then any change to the roots
      will result in a COW, and all the fs roots and snapshot roots stay
      consistent.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
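      A sketch of the decision this fix affects, assuming a per-root force-COW flag: a block created earlier in the running transaction and not yet written is normally modified in place, and the flag makes the tree copy it instead once that root has been snapshotted in this transaction. `struct block_hdr`, `needs_cow` and the `force_cow` parameter are hypothetical names; the commit message only says the block is forced to be COWed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down block header: only the fields this sketch needs. */
struct block_hdr {
	unsigned long long generation;  /* transaction that created the block */
	bool written;                   /* already written out to disk */
};

/*
 * A block from the running transaction that has not been written yet can
 * usually be changed in place. force_cow overrides that, so a snapshot
 * taken earlier in the same transaction keeps seeing the old contents.
 */
static bool needs_cow(const struct block_hdr *b,
		      unsigned long long transid, bool force_cow)
{
	if (b->generation == transid && !b->written && !force_cow)
		return false;   /* safe to modify in place */
	return true;            /* must copy-on-write */
}
```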
  16. 21 Oct 2011: 1 commit
  17. 28 Jul 2011: 3 commits
    • Btrfs: remove lockdep magic from btrfs_next_leaf · 31533fb2
      Committed by Chris Mason
      Before the reader/writer locks, btrfs_next_leaf needed to keep
      the path blocking to avoid making lockdep upset.
      
      Now that btrfs_next_leaf only takes read locks, this isn't required.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: switch the btrfs tree locks to reader/writer · bd681513
      Committed by Chris Mason
      The btrfs metadata btree is the source of significant
      lock contention, especially in the root node.   This
      commit changes our locking to use a reader/writer
      lock.
      
      The lock is built on top of rw spinlocks, and it
      extends the lock tracking to remember if we have a
      read lock or a write lock when we go to blocking.  Atomics
      count the number of blocking readers or writers at any
      given time.
      
      It removes all of the adaptive spinning from the old code
      and uses only the spinning/blocking hints inside of btrfs
      to decide when it should continue spinning.
      
      In read-heavy workloads this is dramatically faster.  In write-heavy
      workloads we're still faster because of less contention
      on the root node lock.
      
      We suffer slightly in dbench because we schedule more often
      during write locks, but all other benchmarks so far are improved.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: stop using highmem for extent_buffers · a6591715
      Committed by Chris Mason
      The extent_buffers have a very complex interface where
      we use HIGHMEM for metadata and try to cache a kmap mapping
      to access the memory.
      
      The next commit adds reader/writer locks, and concurrent use
      of this kmap cache would make it even more complex.
      
      This commit drops the ability to use HIGHMEM with extent buffers,
      and rips out all of the related code.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  18. 10 Jun 2011: 1 commit
    • Btrfs: don't map extent buffer if path->skip_locking is set · ad3e34bb
      Committed by Josef Bacik
      Arne's scrub stuff exposed a problem with mapping the extent buffer in
      reada_for_search.  He searches the commit root with multiple threads and with
      skip_locking set, so we can race and overwrite node->map_token since node isn't
      locked.  So fix this so that we only map the extent buffer if we don't already
      have a map_token and skip_locking isn't set.  Without this patch scrub would
      panic almost immediately, with the patch it doesn't panic anymore.  Thanks,
      Reported-by: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Josef Bacik <josef@redhat.com>