1. 25 Mar, 2009 1 commit
    • C
      Btrfs: do extent allocation and reference count updates in the background · 56bec294
      Chris Mason authored
      The extent allocation tree maintains a reference count and full
      back reference information for every extent allocated in the
      filesystem.  For subvolume and snapshot trees, every time
      a block goes through COW, the new copy of the block adds a reference
      on every block it points to.
      
      If a btree node points to 150 leaves, then the COW code needs to go
      and add backrefs on 150 different extents, which might be spread all
      over the extent allocation tree.
      
      These updates currently happen during btrfs_cow_block, and most COWs
      happen during btrfs_search_slot.  btrfs_search_slot has locks held
      on both the parent and the node we are COWing, and so we really want
      to avoid IO during the COW if we can.
      
      This commit adds an rbtree of pending reference count updates and extent
      allocations.  The tree is ordered by byte number of the extent and byte number
      of the parent for the back reference.  The tree allows us to:
      
      1) Modify back references in something close to disk order, reducing seeks
      2) Significantly reduce the number of modifications made as block pointers
      are balanced around
      3) Do all of the extent insertion and back reference modifications outside
      of the performance critical btrfs_search_slot code.
      
      #3 has the added benefit of greatly reducing the btrfs stack footprint.
      The extent allocation tree modifications are done without the deep
      (and somewhat recursive) call chains used in the past.
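
The ordering and merging idea described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct pending_ref`, `pending_ref_cmp` and `pending_ref_merge` are hypothetical names, and a sorted array stands in for the rbtree the commit actually adds. It shows how keying updates by (extent byte number, parent byte number) lets opposing modifications cancel before anything touches the extent allocation tree, and leaves the survivors in roughly disk order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pending update; not the kernel's actual structure. */
struct pending_ref {
    uint64_t bytenr;  /* byte number of the extent */
    uint64_t parent;  /* byte number of the parent for the back reference */
    int ref_mod;      /* net change: +1 per added ref, -1 per dropped ref */
};

/* Order by extent byte number, then by parent byte number. */
static int pending_ref_cmp(const void *a, const void *b)
{
    const struct pending_ref *x = a, *y = b;

    if (x->bytenr != y->bytenr)
        return x->bytenr < y->bytenr ? -1 : 1;
    if (x->parent != y->parent)
        return x->parent < y->parent ? -1 : 1;
    return 0;
}

/*
 * Sort the queue, fold updates that share a key into one entry, and drop
 * entries whose net change is zero. Returns the number of entries left.
 */
static size_t pending_ref_merge(struct pending_ref *refs, size_t n)
{
    size_t out = 0;

    qsort(refs, n, sizeof(*refs), pending_ref_cmp);
    for (size_t i = 0; i < n; ) {
        struct pending_ref cur = refs[i++];

        while (i < n && pending_ref_cmp(&cur, &refs[i]) == 0)
            cur.ref_mod += refs[i++].ref_mod;
        if (cur.ref_mod != 0)
            refs[out++] = cur;
    }
    return out;
}
```

A tree keeps the queue ordered on every insert, which is what makes the merge cheap while block pointers are still being balanced; the array version above only sorts once, at flush time.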
      
      These delayed back reference updates must be done before the transaction
      commits, and so the rbtree is tied to the transaction.  Throttling is
      implemented to help keep the queue of backrefs at a reasonable size.
      
      Since there was a similar mechanism in place for the extent tree
      extents, that is removed and replaced by the delayed reference tree.
      
      Yan Zheng <yan.zheng@oracle.com> helped review and fixup this code.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  2. 06 Jan, 2009 1 commit
  3. 12 Dec, 2008 1 commit
    • Y
      Btrfs: fix leaking block group on balance · d2fb3437
      Yan Zheng authored
      The block group structs are referenced in many different
      places, and it's not safe to free while balancing.  So, those block
      group structs were simply leaked instead.
      
      This patch replaces the block group pointer in the inode with the starting byte
      offset of the block group and adds reference counting to the block group
      struct.
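
A minimal sketch of the pattern this patch describes, assuming a single-threaded toy cache: holders store only the starting byte offset, look the block group up when they need it, and drop their reference when done. All names here (`struct block_group`, `group_lookup`, `group_put`, and so on) are hypothetical, and a plain `int` stands in for whatever atomic counter the kernel code uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_GROUPS 8

/* Hypothetical block group: identified by its starting byte offset. */
struct block_group {
    uint64_t start;
    int refs;  /* plain int for the sketch; not safe for concurrent use */
};

struct group_cache {
    struct block_group *groups[MAX_GROUPS];
};

/* Drop one reference and free the struct when the last one goes away. */
static void group_put(struct block_group *bg)
{
    if (--bg->refs == 0)
        free(bg);
}

/* Insert a new group; the cache itself holds the initial reference. */
static struct block_group *group_add(struct group_cache *c, uint64_t start)
{
    for (int i = 0; i < MAX_GROUPS; i++) {
        if (!c->groups[i]) {
            struct block_group *bg = malloc(sizeof(*bg));

            if (!bg)
                return NULL;
            bg->start = start;
            bg->refs = 1;
            c->groups[i] = bg;
            return bg;
        }
    }
    return NULL;
}

/* Look up by starting offset; the caller must group_put() the result. */
static struct block_group *group_lookup(struct group_cache *c, uint64_t start)
{
    for (int i = 0; i < MAX_GROUPS; i++) {
        if (c->groups[i] && c->groups[i]->start == start) {
            c->groups[i]->refs++;
            return c->groups[i];
        }
    }
    return NULL;
}

/* Remove from the cache (e.g. during balance); current holders stay valid. */
static void group_remove(struct group_cache *c, uint64_t start)
{
    for (int i = 0; i < MAX_GROUPS; i++) {
        if (c->groups[i] && c->groups[i]->start == start) {
            struct block_group *bg = c->groups[i];

            c->groups[i] = NULL;
            group_put(bg);
            return;
        }
    }
}
```

Because an inode would keep only the offset, a stale offset simply fails the lookup after removal instead of dereferencing freed memory.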
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
  4. 18 Nov, 2008 1 commit
    • C
      Btrfs: Allow subvolumes and snapshots anywhere in the directory tree · 3de4586c
      Chris Mason authored
      Before, all snapshots and subvolumes lived in a single flat directory.  This
      was awkward and confusing because the single flat directory was only writable
      with the ioctls.
      
      This commit changes the ioctls to create subvols and snapshots at any
      point in the directory tree.  This requires making separate ioctls for
      snapshot and subvol creation instead of combining them into one.
      
      The subvol ioctl does:
      
      btrfsctl -S subvol_name parent_dir
      
      After the ioctl is done subvol_name lives inside parent_dir.
      
      The snapshot ioctl does:
      
      btrfsctl -s path_for_snapshot root_to_snapshot
      
      path_for_snapshot can be an absolute or relative path.  btrfsctl breaks it up
      into directory and basename components.
      
      root_to_snapshot can be any file or directory in the FS.  The snapshot
      is taken of the entire root where that file lives.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 25 Sep, 2008 16 commits
    • C
      Btrfs: Record dirty pages tree-log pages in an extent_io tree · d0c803c4
      Chris Mason authored
      This is the same way the transaction code makes sure that all the
      other tree blocks are safely on disk.  There's an extent_io tree
      for each root, and any blocks allocated to the tree logs are
      recorded in that tree.
      
      At tree-log sync, the extent_io tree is walked to flush down the
      dirty pages and wait for them.
      
      The main benefit is less time spent walking the tree log and skipping
      clean pages, and getting sequential IO down to the drive.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: Add a write ahead tree log to optimize synchronous operations · e02119d5
      Chris Mason authored
      File syncs and directory syncs are optimized by copying their
      items into a special (copy-on-write) log tree.  There is one log tree per
      subvolume and the btrfs super block points to a tree of log tree roots.
      
      After a crash, items are copied out of the log tree and back into the
      subvolume.  See tree-log.c for all the details.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Y
      Btrfs: Various small fixes. · b48652c1
      Yan Zheng authored
      This trivial patch contains two locking fixes and an off-by-one fix.
      
      ---
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • S
      Btrfs: fix ioctl-initiated transactions vs wait_current_trans() · 9ca9ee09
      Sage Weil authored
      Commit 597:466b27332893 (btrfs_start_transaction: wait for commits in
      progress) breaks the transaction start/stop ioctls by making
      btrfs_start_transaction conditionally wait for the next transaction to
      start.  If an application is artificially holding a transaction open,
      things deadlock.
      
      This workaround maintains a count of open ioctl-initiated transactions in
      fs_info, and avoids wait_current_trans() if any are currently open (in
      start_transaction() and btrfs_throttle()).  The start transaction ioctl
      uses a new btrfs_start_ioctl_transaction() that _does_ call
      wait_current_trans(), effectively pushing the join/wait decision to the
      outer ioctl-initiated transaction.
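
The join/wait decision described above can be sketched as a small predicate. This is an illustration, not the patch's code: `struct fs_state`, `enum start_kind` and `should_wait_for_commit` are hypothetical names, and modelling the condition that `wait_current_trans()` waits on as a single `commit_in_progress` flag is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fs_info state. */
struct fs_state {
    int open_ioctl_trans;     /* count of open ioctl-initiated transactions */
    bool commit_in_progress;  /* assumed: what wait_current_trans() waits on */
};

enum start_kind {
    START_NORMAL,  /* ordinary start_transaction() or throttle path */
    START_IOCTL,   /* the start transaction ioctl itself */
};

/* Decide whether a caller should wait for the commit or just join. */
static bool should_wait_for_commit(const struct fs_state *fs,
                                   enum start_kind kind)
{
    if (!fs->commit_in_progress)
        return false;  /* nothing to wait for */
    if (kind == START_IOCTL)
        return true;   /* the outer ioctl transaction always waits */
    /* Inner operations must not wait while an ioctl transaction is open. */
    return fs->open_ioctl_trans == 0;
}
```

The point of the split is that only the outermost ioctl start can safely block; anything started while the application holds a transaction open would otherwise wait on a commit that cannot finish.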
      
      This more or less neuters btrfs_throttle() when ioctl-initiated
      transactions are in use, but that seems like a pretty fundamental
      consequence of wrapping lots of write()'s in a transaction.  Btrfs has no
      way to tell if the application considers a given operation as part of its
      transaction.
      
      Obviously, if the transaction start/stop ioctls aren't being used, there
      is no effect on current behavior.
      Signed-off-by: Sage Weil <sage@newdream.net>
      ---
       ctree.h       |    1 +
       ioctl.c       |   12 +++++++++++-
       transaction.c |   18 +++++++++++++-----
       transaction.h |    2 ++
       4 files changed, 27 insertions(+), 6 deletions(-)
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Y
      Btrfs: Update and fix mount -o nodatacow · f321e491
      Yan Zheng authored
      To check whether a given file extent is referenced by multiple snapshots, the
      checker walks down the fs tree through dead root and checks all tree blocks in
      the path.
      
      We can easily detect whether a given tree block is directly referenced by
      another snapshot. We can also detect any indirect reference from another
      snapshot by checking the reference's generation. The checker can always detect
      multiple references, but can't reliably detect the single-reference case. So
      btrfs may do file data COW even when there is only one reference.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: Throttle operations if the reference cache gets too large · ab78c84d
      Chris Mason authored
      A large reference cache is directly related to a lot of work pending
      for the cleaner thread.  This throttles back new operations based on
      the size of the reference cache so the cleaner thread will be able to keep
      up.
      
      Overall, this actually makes the FS faster because the cleaner thread will
      be more likely to find things in cache.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      btrfs_start_transaction: wait for commits in progress to finish · f9295749
      Chris Mason authored
      btrfs_commit_transaction has to loop waiting for any writers in the
      transaction to finish before it can proceed.  btrfs_start_transaction
      should be polite and not join a transaction that is in the process
      of being finished off.
      
      There are a few places that can't wait, basically the ones doing IO that
      might be needed to finish the transaction.  For them, btrfs_join_transaction
      is added.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: New data=ordered implementation · e6dcd2dc
      Chris Mason authored
      The old data=ordered code would force commit to wait until
      all the data extents from the transaction were fully on disk.  This
      introduced large latencies into the commit and stalled new writers
      in the transaction for a long time.
      
      The new code changes the way data allocations and extents work:
      
      * When delayed allocation is filled, data extents are reserved, and
        the extent bit EXTENT_ORDERED is set on the entire range of the extent.
        A struct btrfs_ordered_extent is allocated and inserted into a per-inode
        rbtree to track the pending extents.
      
      * As each page is written EXTENT_ORDERED is cleared on the bytes corresponding
        to that page.
      
      * When all of the bytes corresponding to a single struct btrfs_ordered_extent
        are written, the previously reserved extent is inserted into the FS
        btree and into the extent allocation trees.  The checksums for the file
        data are also updated.
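
The three steps above amount to tracking how much of each reserved extent is still unwritten and acting only when that reaches zero. The sketch below is a simplified model, not the kernel's implementation: `struct ordered_extent` and `ordered_extent_page_done` are hypothetical, and a byte counter replaces the EXTENT_ORDERED bits, so it does not detect the same page being reported twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one pending ordered extent. */
struct ordered_extent {
    uint64_t file_offset;  /* start of the extent in the file */
    uint64_t len;          /* length of the reserved extent */
    uint64_t bytes_left;   /* bytes not yet written; starts equal to len */
};

/*
 * Record that [start, start + len) finished writeback. Returns true once
 * the whole extent is on disk, which is the point where the reserved
 * extent would be inserted into the btrees and checksums updated.
 */
static bool ordered_extent_page_done(struct ordered_extent *oe,
                                     uint64_t start, uint64_t len)
{
    uint64_t end = start + len;
    uint64_t oe_end = oe->file_offset + oe->len;
    uint64_t done;

    /* Clamp the completed range to this extent. */
    if (start < oe->file_offset)
        start = oe->file_offset;
    if (end > oe_end)
        end = oe_end;
    if (start >= end)
        return false;

    done = end - start;
    if (done > oe->bytes_left)
        done = oe->bytes_left;  /* guard against underflow */
    oe->bytes_left -= done;
    return oe->bytes_left == 0;
}
```

Deferring the metadata insertion until the last page completes is what lets the commit proceed without waiting for all of the transaction's data to reach disk.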
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: Online btree defragmentation fixes · 3f157a2f
      Chris Mason authored
      The btree defragger wasn't making forward progress because the new key wasn't
      being saved by the btrfs_search_forward function.
      
      This also disables the automatic btree defrag because it wasn't scaling well to
      huge filesystems.  The auto-defrag needs to be done differently.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: Replace the transaction work queue with kthreads · a74a4b97
      Chris Mason authored
      This creates one kthread for commits and one kthread for
      deleting old snapshots.  All the work queues are removed.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Add btrfs_end_transaction_throttle to force writers to wait for pending commits · 89ce8a63
      Chris Mason authored
      The existing throttle mechanism was often not sufficient to prevent
      new writers from coming in and making a given transaction run forever.
      This adds an explicit wait at the end of most operations so they will
      allow the current transaction to close.
      
      There is no wait inside file_write, inode updates, or cow filling, all of which
      have different deadlock possibilities.
      
      This is a temporary measure until better asynchronous commit support is
      added.  This code leads to stalls as it waits for data=ordered
      writeback, and it really needs to be fixed.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: Split the extent_map code into two parts · d1310b2e
      Chris Mason authored
      There is now extent_map for mapping offsets in the file to disk and
      extent_io for state tracking, IO submission and extent_buffers.
      
      The new extent_map code shifts from [start,end] pairs to [start,len], and
      pushes the locking out into the caller.  This allows a few performance
      optimizations and is easier to use.
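
As a small illustration of the representation change, the sketch below converts between the two forms. It assumes the old `end` was inclusive, which the commit message does not state; `struct range_end`, `struct range_len` and both helpers are hypothetical names rather than the extent_map API.

```c
#include <assert.h>
#include <stdint.h>

/* Old style: [start, end], with end assumed to be inclusive. */
struct range_end {
    uint64_t start;
    uint64_t end;
};

/* New style: [start, len]. */
struct range_len {
    uint64_t start;
    uint64_t len;
};

/* Convert an inclusive [start, end] pair into [start, len]. */
static struct range_len to_start_len(struct range_end r)
{
    struct range_len out = { r.start, r.end - r.start + 1 };

    return out;
}

/* First byte past the range, handy for adjacency and overlap checks. */
static uint64_t range_len_end(struct range_len r)
{
    return r.start + r.len;
}
```

With a length stored directly, callers no longer sprinkle `+ 1` and `- 1` adjustments around every size computation.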
      
      A number of extent_map usage bugs were fixed, mostly with failing
      to remove extent_map entries when changing the file.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: Move snapshot creation to commit time · 3063d29f
      Chris Mason authored
      It is very difficult to create a consistent snapshot of the btree when
      other writers may update the btree before the commit is done.
      
      This changes the snapshot creation to happen during the commit, while
      no other updates are possible.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: Add data=ordered support · dc17ff8f
      Chris Mason authored
      This forces file data extents down to the disk along with the metadata that
      references them.  The current implementation is fairly simple, and just
      writes out all of the dirty pages in an inode before the commit.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: Back port to 2.6.18-el kernels · 6da6abae
      Chris Mason authored
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      5f39d397
  6. 11 Sep, 2007 1 commit
    • C
      Btrfs: Find and remove dead roots the first time a root is loaded. · 5ce14bbc
      Chris Mason authored
      Dead roots are trees left over after a crash, and they were either in the
      process of being removed or were waiting to be removed when the box crashed.
      Before, a search of the entire tree of root pointers was done on mount
      looking for dead roots.  Now, the search is done the first time we load
      a root.
      
      This makes mount faster when there are a large number of snapshots, and it
      enables the block accounting code to properly update the block counts on
      the latest root as old versions of the root are reaped after a crash.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      5ce14bbc
  7. 11 8月, 2007 2 次提交
  8. 09 8月, 2007 1 次提交
  9. 08 8月, 2007 1 次提交
  10. 23 6月, 2007 1 次提交
  11. 12 6月, 2007 1 次提交
  12. 09 6月, 2007 2 次提交
  13. 01 5月, 2007 1 次提交
  14. 28 4月, 2007 1 次提交
  15. 20 4月, 2007 1 次提交
  16. 02 4月, 2007 1 次提交
  17. 23 3月, 2007 2 次提交
  18. 17 3月, 2007 1 次提交