- 26 9月, 2008 3 次提交
-
-
由 Zheng Yan 提交于
Btrfs has a cache of reference counts in leaves, allowing it to avoid reading tree leaves while deleting snapshots. To reduce contention with multiple subvolumes, this cache is private to each subvolume. This patch adds shared reference cache support. The new space balancing code plays with multiple subvols at the same time, So the old per-subvol reference cache is not well suited. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Zheng Yan 提交于
* Reserved extent accounting: reserved extents have been allocated in the rbtrees that track free space but have not been allocated on disk. They were never properly accounted for in the past, making it hard to know how much space was really free. * btrfs_find_block_group used to return NULL for block groups that had been removed by the space balancing code. This made it hard to account for space during the final stages of a balance run. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Btrfs had compatibility code for kernels back to 2.6.18. These have been removed, and will be maintained in a separate backport git tree from now on. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 25 9月, 2008 37 次提交
-
-
由 Zheng Yan 提交于
This patch makes the back reference system to explicit record the location of parent node for all types of extents. The location of parent node is placed into the offset field of backref key. Every time a tree block is balanced, the back references for the affected lower level extents are updated. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
1) replace the per fs_info extent_io_tree that tracked free space with two rb-trees per block group to track free space areas via offset and size. The reason to do this is because most allocations come with a hint byte where to start, so we can usually find a chunk of free space at that hint byte to satisfy the allocation and get good space packing. If we cannot find free space at or after the given offset we fall back on looking for a chunk of the given size as close to that given offset as possible. When we fall back on the size search we also try to find a slot as close to the size we want as possible, to avoid breaking small chunks off of huge areas if possible. 2) remove the extent_io_tree that tracked the block group cache from fs_info and replaced it with an rb-tree thats tracks block group cache via offset. also added a per space_info list that tracks the block group cache for the particular space so we can lookup related block groups easily. 3) cleaned up the allocation code to make it a little easier to read and a little less complicated. Basically there are 3 steps, first look from our provided hint. If we couldn't find from that given hint, start back at our original search start and look for space from there. If that fails try to allocate space if we can and start looking again. If not we're screwed and need to start over again. 4) small fixes. there were some issues in volumes.c where we wouldn't allocate the rest of the disk. fixed cow_file_range to actually pass the alloc_hint, which has helped a good bit in making the fs_mark test I run have semi-normal results as we run out of space. Generally with data allocations we don't track where we last allocated from, so everytime we did a data allocation we'd search through every block group that we have looking for free space. Now searching a block group with no free space isn't terribly time consuming, it was causing a slight degradation as we got more data block groups. The alloc_hint has fixed this slight degredation and made things semi-normal. There is still one nagging problem I'm working on where we will get ENOSPC when there is definitely plenty of space. This only happens with metadata allocations, and only when we are almost full. So you generally hit the 85% mark first, but sometimes you'll hit the BUG before you hit the 85% wall. I'm still tracking it down, but until then this seems to be pretty stable and make a significant performance gain. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
This is the same way the transaction code makes sure that all the other tree blocks are safely on disk. There's an extent_io tree for each root, and any blocks allocated to the tree logs are recorded in that tree. At tree-log sync, the extent_io tree is walked to flush down the dirty pages and wait for them. The main benefit is less time spent walking the tree log and skipping clean pages, and getting sequential IO down to the drive. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Zheng Yan 提交于
Orphan items use BTRFS_ORPHAN_OBJECTID (-5UUL) as key objectid. This affects the find free objectid functions, inode objectid can easily overflow after orphan file cleanup. --- Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Christoph Hellwig 提交于
btrfs_ilookup is unused, which is good because a normal filesystem should never have to use ilookup anyway. Remove it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
File syncs and directory syncs are optimized by copying their items into a special (copy-on-write) log tree. There is one log tree per subvolume and the btrfs super block points to a tree of log tree roots. After a crash, items are copied out of the log tree and back into the subvolume. See tree-log.c for all the details. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Balaji Rao 提交于
Date: Mon, 21 Jul 2008 02:01:04 +0530 This patch introduces a btrfs_iget helper to be used in NFS support. Signed-off-by: NBalaji Rao <balajirrao@gmail.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Before this change, btrfs would use a bdi congestion function to make sure there weren't too many pending async checksum work items. This change makes the process creating async work items wait instead, leading to fewer congestion returns from the bdi. This improves pdflush background_writeout scanning. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Eric Sandeen 提交于
Newer RHEL5 kernels define both ClearPageFSMisc and ClearPageChecked, so test for both before redefining. Signed-off-by: NEric Sandeen <sandeen@redhat.com> --- Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan Zheng 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sage Weil 提交于
Commit 597:466b27332893 (btrfs_start_transaction: wait for commits in progress) breaks the transaction start/stop ioctls by making btrfs_start_transaction conditionally wait for the next transaction to start. If an application artificially is holding a transaction open, things deadlock. This workaround maintains a count of open ioctl-initiated transactions in fs_info, and avoids wait_current_trans() if any are currently open (in start_transaction() and btrfs_throttle()). The start transaction ioctl uses a new btrfs_start_ioctl_transaction() that _does_ call wait_current_trans(), effectively pushing the join/wait decision to the outer ioctl-initiated transaction. This more or less neuters btrfs_throttle() when ioctl-initiated transactions are in use, but that seems like a pretty fundamental consequence of wrapping lots of write()'s in a transaction. Btrfs has no way to tell if the application considers a given operation as part of it's transaction. Obviously, if the transaction start/stop ioctls aren't being used, there is no effect on current behavior. Signed-off-by: NSage Weil <sage@newdream.net> --- ctree.h | 1 + ioctl.c | 12 +++++++++++- transaction.c | 18 +++++++++++++----- transaction.h | 2 ++ 4 files changed, 27 insertions(+), 6 deletions(-) Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
A btree block cow has two parts, the first is to allocate a destination block and the second is to copy the old bock over. The first part needs locks in the extent allocation tree, and may need to do IO. This changeset splits that into a separate function that can be called without any tree locks held. btrfs_search_slot is changed to drop its path and start over if it has to COW a contended block. This often means that many writers will pre-alloc a new destination for a the same contended block, but they cache their prealloc for later use on lower levels in the tree. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Large streaming reads make for large bios, which means each entry on the list async work queues represents a large amount of data. IO congestion throttling on the device was kicking in before the async worker threads decided a single thread was busy and needed some help. The end result was that a streaming read would result in a single CPU running at 100% instead of balancing the work off to other CPUs. This patch also changes the pre-IO checksum lookup done by reads to work on a per-bio basis instead of a per-page. This results in many extra btree lookups on large streaming reads. Doing the checksum lookup right before bio submit allows us to reuse searches while processing adjacent offsets. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan 提交于
The memory reclaiming issue happens when snapshot exists. In that case, some cache entries may not be used during old snapshot dropping, so they will remain in the cache until umount. The patch adds a field to struct btrfs_leaf_ref to record create time. Besides, the patch makes all dead roots of a given snapshot linked together in order of create time. After a old snapshot was completely dropped, we check the dead root list and remove all cache entries created before the oldest dead root in the list. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan Zheng 提交于
To check whether a given file extent is referenced by multiple snapshots, the checker walks down the fs tree through dead root and checks all tree blocks in the path. We can easily detect whether a given tree block is directly referenced by other snapshot. We can also detect any indirect reference from other snapshot by checking reference's generation. The checker can always detect multiple references, but can't reliably detect cases of single reference. So btrfs may do file data cow even there is only one reference. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
A large reference cache is directly related to a lot of work pending for the cleaner thread. This throttles back new operations based on the size of the reference cache so the cleaner thread will be able to keep up. Overall, this actually makes the FS faster because the cleaner thread will be more likely to find things in cache. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
This changes the reference cache to make a single cache per root instead of one cache per transaction, and to key by the byte number of the disk block instead of the keys inside. This makes it much less likely to have cache misses if a snapshot or something has an extra reference on a higher node or a leaf while the first transaction that added the leaf into the cache is dropping. Some throttling is added to functions that free blocks heavily so they wait for old transactions to drop. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan Zheng 提交于
Much of the IO done while dropping snapshots is done looking up leaves in the filesystem trees to see if they point to any extents and to drop the references on any extents found. This creates a cache so that IO isn't required. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Before setting an extent to delalloc, the code needs to wait for pending ordered extents. Also, the relocation code needs to wait for ordered IO before scanning the block group again. This is because the extents are not removed until the IO for the new extents is finished Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
This releases the alloc_mutex in a few places that hold it for over long operations. btrfs_lookup_block_group is changed so that it doesn't need the mutex at all. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Lockdep has the notion of locking subclasses so that you can identify locks you expect to be taken after other locks of the same class. This changes the per-extent buffer btree locking routines to use a subclass based on the level in the tree. Unfortunately, lockdep can only handle 8 total subclasses, and the btrfs max level is also 8. So when lockdep is on, use a lower max level. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Stress testing was showing data checksum errors, most of which were caused by a lookup bug in the extent_map tree. The tree was caching the last pointer returned, and searches would check the last pointer first. But, search callers also expect the search to return the very first matching extent in the range, which wasn't always true with the last pointer usage. For now, the code to cache the last return value is just removed. It is easy to fix, but I think lookups are rare enough that it isn't required anymore. This commit also replaces do_sync_mapping_range with a local copy of the related functions. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Data checksumming is done right before the bio is sent down the IO stack, which means a single bio might span more than one ordered extent. In this case, the checksumming data is split between two ordered extents. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
btrfs_commit_transaction has to loop waiting for any writers in the transaction to finish before it can proceed. btrfs_start_transaction should be polite and not join a transaction that is in the process of being finished off. There are a few places that can't wait, basically the ones doing IO that might be needed to finish the transaction. For them, btrfs_join_transaction is added. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Higher layers sometimes call set_page_dirty without asking the filesystem to help. This causes many problems for the data=ordered and cow code. This commit detects pages that haven't been properly setup for IO and kicks off an async helper to deal with them. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
The old data=ordered code would force commit to wait until all the data extents from the transaction were fully on disk. This introduced large latencies into the commit and stalled new writers in the transaction for a long time. The new code changes the way data allocations and extents work: * When delayed allocation is filled, data extents are reserved, and the extent bit EXTENT_ORDERED is set on the entire range of the extent. A struct btrfs_ordered_extent is allocated an inserted into a per-inode rbtree to track the pending extents. * As each page is written EXTENT_ORDERED is cleared on the bytes corresponding to that page. * When all of the bytes corresponding to a single struct btrfs_ordered_extent are written, The previously reserved extent is inserted into the FS btree and into the extent allocation trees. The checksums for the file data are also updated. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-