Commits · ad48fd754676bfae4139be1a897b1ea58f9aaf21 · openeuler / Kernel

16 Dec, 2009 1 commit

Btrfs: Add btrfs_duplicate_item · ad48fd75

Authored by Yan, Zheng on Nov 12, 2009

btrfs_duplicate_item duplicates an item with a new key, guaranteeing
that the source item and the new item are in the same tree leaf and
contiguous. It allows us to split a file extent in place, without
using lock_extent to prevent the bookend extent race.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ad48fd75
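
The guarantee described above (copy lands in the same leaf, directly next to its source) can be illustrated with a toy leaf held in a fixed array. This is only a sketch: `toy_leaf`, `toy_item` and `toy_duplicate_item` are hypothetical names, the real leaf layout is more involved, and the caller is assumed to pass a key that keeps the leaf sorted.

```c
#include <stddef.h>
#include <string.h>

#define TOY_LEAF_SLOTS 8

/* Hypothetical, simplified item; not the on-disk btrfs layout. */
struct toy_item {
    unsigned long long key;
    int data;
};

struct toy_leaf {
    size_t nritems;
    struct toy_item items[TOY_LEAF_SLOTS];
};

/*
 * Copy the item at `slot` into slot + 1 under `new_key`.
 * Returns 0 on success, -1 if the slot is invalid or the leaf is full.
 * A real implementation would make room by splitting instead of failing.
 */
static int toy_duplicate_item(struct toy_leaf *leaf, size_t slot,
                              unsigned long long new_key)
{
    if (slot >= leaf->nritems || leaf->nritems == TOY_LEAF_SLOTS)
        return -1;
    /* Shift the later items right by one to open a gap next to the source. */
    memmove(&leaf->items[slot + 2], &leaf->items[slot + 1],
            (leaf->nritems - slot - 1) * sizeof(leaf->items[0]));
    leaf->items[slot + 1] = leaf->items[slot];
    leaf->items[slot + 1].key = new_key;
    leaf->nritems++;
    return 0;
}
```

Because source and copy are adjacent in one leaf, a caller can adjust both halves of a split extent while holding a single leaf lock.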

24 Sep, 2009 1 commit

Btrfs: check size of inode backref before adding hardlink · a5719521

Authored by Yan, Zheng on Sep 24, 2009

For every hardlink in btrfs, there is a corresponding inode back
reference. All inode back references for hardlinks in a given
directory are stored in a single b-tree item. The size of a b-tree item
is limited by the size of a b-tree leaf, so only a limited number of
hardlinks to a given file can be created in one directory.

The original code lacks this check and oopses if the number of
hardlinks goes over the limit. This patch fixes the issue by adding
the check to btrfs_link and btrfs_rename.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a5719521
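
A minimal sketch of the kind of size check this message describes. The limit and per-name overhead below are made-up numbers for illustration; the real values depend on the leaf size and the on-disk structures, and `toy_check_backref_room` is a hypothetical helper.

```c
#include <stddef.h>

/* Assumed numbers for illustration only. */
#define TOY_MAX_ITEM_SIZE   3900u  /* largest item a leaf can hold (assumed) */
#define TOY_REF_HEADER_SIZE 10u    /* fixed overhead per back reference (assumed) */

/*
 * Return 0 if one more back reference for a name of `name_len` bytes fits
 * in an item that currently holds `cur_item_size` bytes, -1 otherwise.
 */
static int toy_check_backref_room(size_t cur_item_size, size_t name_len)
{
    size_t needed = TOY_REF_HEADER_SIZE + name_len;

    if (cur_item_size > TOY_MAX_ITEM_SIZE ||
        needed > TOY_MAX_ITEM_SIZE - cur_item_size)
        return -1;  /* the caller would fail the link with an error code */
    return 0;
}
```

Running the check before the item is extended turns an overflow into an ordinary error return instead of a crash.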

25 Jul, 2009 1 commit

Btrfs: Avoid delayed reference update looping · d717aa1d

Authored by Yan Zheng on Jul 24, 2009

btrfs_split_leaf and btrfs_del_items can end up in a loop
where one is constantly splitting a given leaf and the other
is constantly merging it back with the adjacent nodes.

There is a better fix for this, but in the interest of something
small, this patch just changes btrfs_del_items back to balancing less
often.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d717aa1d

24 Jul, 2009 2 commits

Btrfs: Fix ordering of key field checks in btrfs_previous_item · 0a4eefbb

Authored by Yan Zheng on Jul 24, 2009

Check objectid of item before checking the item type, otherwise we may return
zero for a key that is actually too low.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0a4eefbb
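
The ordering issue can be sketched with a backwards walk over a sorted key array. This is a simplified model with hypothetical names (`toy_key`, `toy_previous_item`), not the kernel function: testing the objectid first means an item below the minimum can never be reported as a match just because its type happens to be the one requested.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_key {
    uint64_t objectid;
    uint8_t type;
};

/*
 * Walk backwards from `*slot` looking for an item of `type` whose
 * objectid is at least `min_objectid`. Returns 0 and updates `*slot`
 * on a match, 1 if no such item exists.
 */
static int toy_previous_item(const struct toy_key *keys, size_t *slot,
                             uint64_t min_objectid, uint8_t type)
{
    size_t i = *slot;

    while (i > 0) {
        i--;
        /* Objectid first: a key below the minimum ends the walk. */
        if (keys[i].objectid < min_objectid)
            break;
        if (keys[i].type == type) {
            *slot = i;
            return 0;
        }
    }
    return 1;
}
```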

Btrfs: Remove code duplication in comp_keys · 20736aba

Authored by Diego Calleja on Jul 24, 2009

comp_keys is duplicating what is done in btrfs_comp_cpu_keys, so just
call it.
Signed-off-by: Diego Calleja <diegocg@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

20736aba
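
A sketch of the deduplication idea: convert the on-disk key to CPU byte order once, then delegate to a single comparison routine. The `toy_*` names are hypothetical, and the disk key is modelled here as 17 raw little-endian bytes (objectid, type, offset).

```c
#include <stdint.h>

/* CPU-order key: compared by objectid, then type, then offset. */
struct toy_key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* The one comparison routine: returns <0, 0 or >0. */
static int toy_comp_cpu_keys(const struct toy_key *a, const struct toy_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Little-endian on-disk form, kept as raw bytes in this sketch. */
struct toy_disk_key {
    uint8_t raw[17];
};

static uint64_t toy_le64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Convert, then reuse the CPU-key comparison instead of repeating it. */
static int toy_comp_keys(const struct toy_disk_key *disk,
                         const struct toy_key *k2)
{
    struct toy_key k1 = {
        toy_le64(disk->raw), disk->raw[8], toy_le64(disk->raw + 9)
    };

    return toy_comp_cpu_keys(&k1, k2);
}
```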

22 Jul, 2009 2 commits

Btrfs: fix locking issue in btrfs_find_next_key · 33c66f43

Authored by Yan Zheng on Jul 22, 2009

When walking up the tree, btrfs_find_next_key assumes the upper level tree
block is properly locked. This isn't always true, even if path->keep_locks is 1.
This is because btrfs_find_next_key may advance path->slots[] several times
instead of only once.

When 'path->slots[level] >= btrfs_header_nritems(path->nodes[level])' is found,
we can't guarantee the original value of 'path->slots[level]' is
'btrfs_header_nritems(path->nodes[level]) - 1'. If it's not, the tree block at
'level + 1' isn't locked.

This patch fixes the issue by explicitly checking the locking state,
re-searching the tree if it's not locked.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

33c66f43

Btrfs: fix double increment of path->slots[0] in btrfs_next_leaf · e457afec

Authored by Yan Zheng on Jul 22, 2009

If 1 is returned by btrfs_search_slot, the path already points to the
first item with 'key > searching key'. So increasing path->slots[0] by
one is superfluous in that case.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e457afec
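
The return convention at issue can be modelled with a binary search over a sorted array. In this sketch (hypothetical `toy_*` names, a flat array instead of a tree path), a return of 0 means an exact hit and 1 means the slot already points at the first larger key, so only the exact-hit case needs the extra step.

```c
#include <stddef.h>

/*
 * Binary search over sorted keys. Returns 0 and the matching slot on an
 * exact hit, or 1 and the slot of the first key greater than `key`.
 */
static int toy_search_slot(const unsigned long *keys, size_t n,
                           unsigned long key, size_t *slot)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    *slot = lo;
    return (lo < n && keys[lo] == key) ? 0 : 1;
}

/* Slot of the first item strictly after `key`. */
static size_t toy_next_slot(const unsigned long *keys, size_t n,
                            unsigned long key)
{
    size_t slot;

    if (toy_search_slot(keys, n, key, &slot) == 0)
        slot++;  /* exact hit: step past the item itself */
    /* On a return of 1 the slot already points at the first larger key. */
    return slot;
}
```

Incrementing unconditionally would skip one item whenever the search key is absent, which is the double increment the commit title refers to.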

10 Jun, 2009 3 commits

Btrfs: balance btree more often · cfbb9308

Authored by Chris Mason on May 18, 2009

With the new back reference code, the cost of a balance has gone down
in terms of the number of back reference updates done.  This commit
makes us more aggressively balance leaves and nodes as they become
less full.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

cfbb9308

Btrfs: stop avoiding balancing at the end of the transaction. · b3612421

Authored by Chris Mason on May 13, 2009

When the delayed reference code was added, some checks were added
to avoid extra balancing while the delayed references were being flushed.
This made for less efficient btrees, but it reduced the chances of
loops where no forward progress was made because the balances made
more delayed ref updates.

With the new dead root removal code and the mixed back references,
the extent allocation tree is no longer using precise back refs, and
the delayed reference updates don't carry the risk of looping forever
anymore.  So, the balance avoidance is no longer required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b3612421

Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE) · 5d4f98a2

Authored by Yan Zheng on Jun 10, 2009

This commit introduces a new kind of back reference for btrfs metadata.
Once a filesystem has been mounted with this commit, IT WILL NO LONGER
BE MOUNTABLE BY OLDER KERNELS.

When a tree block in a subvolume tree is cow'd, the reference counts of all
extents it points to are increased by one. At transaction commit time,
the old root of the subvolume is recorded in a "dead root" data structure,
and the btree it points to is later walked, dropping reference counts
and freeing any blocks where the reference count goes to 0.

The increments done during cow and decrements done after commit cancel out,
and the walk is a very expensive way to go about freeing the blocks that
are no longer referenced by the new btree root. This commit reduces the
transaction overhead by avoiding the need for dead root records.

When a non-shared tree block is cow'd, we free the old block at once, and the
new block inherits old block's references. When a tree block with reference
count > 1 is cow'd, we increase the reference counts of all extents
the new block points to by one, and decrease the old block's reference count by
one.

This dead tree avoidance code removes the need to modify the reference
counts of lower level extents when a non-shared tree block is cow'd.
But we still need to update back ref for all pointers in the block.
This is because the location of the block is recorded in the back ref
item.

We can solve this by introducing a new type of back ref. The new
back ref provides information about the pointer's key, its level and the
tree in which the pointer lives. This information allows us to find the
pointer by searching the tree. The shortcoming of the new back ref is that it
only works for pointers in tree blocks referenced by their owner trees.

This is mostly a problem for snapshots, where resolving one of these
fuzzy back references would be O(number_of_snapshots) and quite slow.
The solution used here is to use the fuzzy back references in the common
case where a given tree block is only referenced by one root,
and use the full back references when multiple roots have a reference
on a given block.

This commit adds a per-subvolume red-black tree to keep track of cached
inodes. The red-black tree helps the balancing code to find cached
inodes whose inode numbers fall within a given range.

This commit improves the balancing code by introducing several data
structures to keep the state of balancing. The most important one
is the back ref cache. It caches how the upper level tree blocks are
referenced. This greatly reduces the overhead of checking back refs.

The improved balancing code scales significantly better with a large
number of snapshots.

This is a very large commit and was written in a number of
pieces. But, they depend heavily on the disk format change and were
squashed together to make sure git bisect didn't end up in a
bad state wrt space balancing or the format change.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5d4f98a2
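
The two COW rules in this message (non-shared block versus block with reference count > 1) can be sketched on an in-memory model. Everything here is hypothetical (`toy_block`, `toy_cow_block`, the `freed` flag); it only mirrors the reference counting, not the back reference items themselves.

```c
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

struct toy_block {
    int refs;                                   /* reference count */
    size_t nr_children;
    struct toy_block *children[TOY_MAX_CHILDREN];
    int freed;                                  /* set once the block is released */
};

/*
 * Copy-on-write `old` into the caller-provided `cow` block, applying the
 * reference rules described in the commit message.
 */
static void toy_cow_block(struct toy_block *old, struct toy_block *cow)
{
    size_t i;

    cow->refs = 1;
    cow->freed = 0;
    cow->nr_children = old->nr_children;
    for (i = 0; i < old->nr_children; i++)
        cow->children[i] = old->children[i];

    if (old->refs == 1) {
        /* Not shared: free the old block at once. The copy inherits its
         * references, so the children's counts stay untouched. */
        old->refs = 0;
        old->freed = 1;
    } else {
        /* Shared: old and new both point at the children now. */
        for (i = 0; i < old->nr_children; i++)
            old->children[i]->refs++;
        old->refs--;
    }
}
```

In the non-shared case no child count changes at all, which is why no later walk is needed to undo increments made during COW.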

15 May, 2009 1 commit

Btrfs: Don't loop forever on metadata IO failures · 76a05b35

Authored by Chris Mason on May 14, 2009

When a btrfs metadata read fails, the first thing we try to do is find
a good copy on another mirror of the block.  If this fails, read_tree_block()
ends up returning a buffer that isn't up to date.

The btrfs btree reading code was reworked to drop locks and repeat
the search when IO was done, but the changes didn't add a check for failed
reads.  The end result was looping forever on buffers that were never
going to become up to date.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

76a05b35
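
The failure mode can be reproduced in a few lines: a search that repeats after IO must check whether the read actually produced a valid buffer. This is a simulation with hypothetical names (`toy_buffer`, `toy_read_block`, `toy_search`), not the kernel code path.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_buffer {
    bool uptodate;    /* contents are valid */
    bool read_fails;  /* simulate a block with no good mirror */
};

/* Simulated read: marks the buffer up to date unless the read fails. */
static void toy_read_block(struct toy_buffer *buf)
{
    if (!buf->read_fails)
        buf->uptodate = true;
}

/* Returns 0 once the block is usable, -EIO if the read failed. */
static int toy_search(struct toy_buffer *buf)
{
    for (;;) {
        if (buf->uptodate)
            return 0;
        /* Not cached: drop locks (omitted here), read, repeat the search. */
        toy_read_block(buf);
        if (!buf->uptodate)
            return -EIO;  /* without this check the loop never terminates */
    }
}
```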

21 Apr, 2009 1 commit

Btrfs: use the right node in reada_for_balance · 8c594ea8

Authored by Chris Mason on Apr 20, 2009

reada_for_balance was using the wrong index into the path node array,
so it wasn't reading the right blocks.  We never directly used the
results of the read done by this function because the btree search is
started over at the end.

This fixes reada_for_balance to read ahead in the correct node and to
avoid searching past the last slot in the node.  It also makes sure to
hold the parent lock while we are finding the nodes to read.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8c594ea8

03 Apr, 2009 3 commits

Btrfs: BUG to BUG_ON changes · c293498b

Authored by Stoyan Gaydarov on Apr 02, 2009

Signed-off-by: Chris Mason <chris.mason@oracle.com>

c293498b

Btrfs: Optimize locking in btrfs_next_leaf() · 8e73f275

Authored by Chris Mason on Apr 03, 2009

btrfs_next_leaf was using blocking locks when it could have been using
faster spinning ones instead. This adds a few extra checks around
the pieces that block and switches over to spinning locks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8e73f275

Btrfs: break up btrfs_search_slot into smaller pieces · c8c42864

Authored by Chris Mason on Apr 03, 2009

btrfs_search_slot was doing too many things at once.  This breaks
it up into more reasonable units.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c8c42864

25 Mar, 2009 5 commits

Btrfs: limit balancing work while flushing delayed refs · a4b6e07d

Authored by Chris Mason on Mar 16, 2009

The delayed reference mechanism is responsible for all updates to the
extent allocation trees, including those updates created while processing
the delayed references.

This commit tries to limit the amount of work that gets created during
the final run of delayed refs before a commit.  It avoids cowing new blocks
unless it is required to finish the commit, and so it avoids new allocations
that were not really required.

The goal is to avoid infinite loops where we are always making more work
on the final run of delayed refs.  Over the long term we'll make a
special log for the last delayed ref updates as well.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a4b6e07d

Btrfs: leave btree locks spinning more often · b9473439

Authored by Chris Mason on Mar 13, 2009

btrfs_mark_buffer_dirty would set dirty bits in the extent_io tree
for the buffers it was dirtying. This may require a kmalloc and it
was not atomic. So, anyone who called btrfs_mark_buffer_dirty had to
set any btree locks they were holding to blocking first.

This commit changes dirty tracking for extent buffers to just use a flag
in the extent buffer. Now that we have one and only one extent buffer
per page, this can be safely done without losing dirty bits along the way.

This also introduces a path->leave_spinning flag that callers of
btrfs_search_slot can use to indicate they will properly deal with a
path returned where all the locks are spinning instead of blocking.

Many of the btree search callers now expect spinning paths,
resulting in better btree concurrency overall.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b9473439

Btrfs: reduce stack usage in some crucial tree balancing functions · 44871b1b

Authored by Chris Mason on Mar 13, 2009

Many of the tree balancing functions follow the same pattern.

1) cow a block
2) do something to the result

This commit breaks them up into two functions so the variables and
code required for part two don't suck down stack during part one.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

44871b1b

Btrfs: do extent allocation and reference count updates in the background · 56bec294

Authored by Chris Mason on Mar 13, 2009

The extent allocation tree maintains a reference count and full
back reference information for every extent allocated in the
filesystem.  For subvolume and snapshot trees, every time
a block goes through COW, the new copy of the block adds a reference
on every block it points to.

If a btree node points to 150 leaves, then the COW code needs to go
and add backrefs on 150 different extents, which might be spread all
over the extent allocation tree.

These updates currently happen during btrfs_cow_block, and most COWs
happen during btrfs_search_slot.  btrfs_search_slot has locks held
on both the parent and the node we are COWing, and so we really want
to avoid IO during the COW if we can.

This commit adds an rbtree of pending reference count updates and extent
allocations.  The tree is ordered by byte number of the extent and byte number
of the parent for the back reference.  The tree allows us to:

1) Modify back references in something close to disk order, reducing seeks
2) Significantly reduce the number of modifications made as block pointers
are balanced around
3) Do all of the extent insertion and back reference modifications outside
of the performance critical btrfs_search_slot code.

#3 has the added benefit of greatly reducing the btrfs stack footprint.
The extent allocation tree modifications are done without the deep
(and somewhat recursive) call chains used in the past.

These delayed back reference updates must be done before the transaction
commits, and so the rbtree is tied to the transaction.  Throttling is
implemented to help keep the queue of backrefs at a reasonable size.

Since there was a similar mechanism in place for the extent tree
extents, that is removed and replaced by the delayed reference tree.

Yan Zheng <yan.zheng@oracle.com> helped review and fixup this code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

56bec294
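
A rough sketch of the queueing idea: pending reference changes are kept ordered by extent byte number and parent, and a change that hits an already queued entry is merged into it. A sorted array stands in for the rbtree to keep the example short, and all `toy_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_PENDING 16

struct toy_delayed_ref {
    uint64_t bytenr;  /* start of the extent */
    uint64_t parent;  /* block holding the back reference */
    int ref_mod;      /* net change to the reference count */
};

struct toy_delayed_refs {
    size_t nr;
    struct toy_delayed_ref refs[TOY_MAX_PENDING];
};

static int toy_ref_cmp(const struct toy_delayed_ref *a,
                       uint64_t bytenr, uint64_t parent)
{
    if (a->bytenr != bytenr)
        return a->bytenr < bytenr ? -1 : 1;
    if (a->parent != parent)
        return a->parent < parent ? -1 : 1;
    return 0;
}

/* Queue a change; returns 0, or -1 if the queue is full. */
static int toy_queue_ref(struct toy_delayed_refs *q, uint64_t bytenr,
                         uint64_t parent, int mod)
{
    size_t i = 0;

    while (i < q->nr && toy_ref_cmp(&q->refs[i], bytenr, parent) < 0)
        i++;
    if (i < q->nr && toy_ref_cmp(&q->refs[i], bytenr, parent) == 0) {
        q->refs[i].ref_mod += mod;  /* merge with the pending entry */
        return 0;
    }
    if (q->nr == TOY_MAX_PENDING)
        return -1;                  /* a real queue would throttle here */
    memmove(&q->refs[i + 1], &q->refs[i],
            (q->nr - i) * sizeof(q->refs[0]));
    q->refs[i] = (struct toy_delayed_ref){ bytenr, parent, mod };
    q->nr++;
    return 0;
}
```

An add followed by a drop on the same (bytenr, parent) pair nets out to `ref_mod == 0`, so a later flush could skip it, and walking the array in order visits extents in something close to disk order.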

Btrfs: don't preallocate metadata blocks during btrfs_search_slot · 9fa8cfe7

Authored by Chris Mason on Mar 13, 2009

In order to avoid doing expensive extent management with tree locks held,
btrfs_search_slot will preallocate tree blocks for use by COW without
any tree locks held.

A later commit moves all of the extent allocation work for COW into
a delayed update mechanism, and this preallocation will no longer be
required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9fa8cfe7

09 Mar, 2009 1 commit

Btrfs: fix spinlock assertions on UP systems · b9447ef8

Authored by Chris Mason on Mar 09, 2009

btrfs_tree_locked was being used to make sure a given extent_buffer was
properly locked in a few places.  But, it wasn't correct for UP compiled
kernels.

This switches it to using assert_spin_locked instead, and renames it to
btrfs_assert_tree_locked to better reflect how it was really being used.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b9447ef8

13 Feb, 2009 2 commits

Btrfs: make a lockdep class for the extent buffer locks · 4008c04a

Authored by Chris Mason on Feb 12, 2009

Btrfs is currently using spin_lock_nested with a nested value based
on the tree depth of the block. But, this doesn't quite work because
the max tree depth is bigger than what spin_lock_nested can deal with,
and because locks are sometimes taken before the level field is filled in.

The solution here is to use lockdep_set_class_and_name instead, and to
set the class before unlocking the pages when the block is read from the
disk and just after init of a freshly allocated tree block.

btrfs_clear_path_blocking is also changed to take the locks in the proper
order, and it also makes sure all the locks currently held are properly
set to blocking before it tries to retake the spinlocks. Otherwise, lockdep
gets upset about bad lock ordering.

The lockdep magic came from Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4008c04a

Btrfs: remove btrfs_init_path · e00f7308

Authored by Jeff Mahoney on Feb 12, 2009

btrfs_init_path was initially used when the path objects were on the
stack.  Now all the work is done by btrfs_alloc_path and btrfs_init_path
isn't required.

This patch removes it, and just uses kmem_cache_zalloc to zero out the object.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e00f7308

12 Feb, 2009 1 commit

Btrfs: balance_level checks !child after access · 7951f3ce

Authored by Jeff Mahoney on Feb 12, 2009

The BUG_ON() is in the wrong spot.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7951f3ce

10 Feb, 2009 1 commit

Btrfs: don't use spin_is_contended · 284b066a

Authored by Chris Mason on Feb 09, 2009

Btrfs was using spin_is_contended to see if it should drop locks before
doing extent allocations during btrfs_search_slot.  The idea was to avoid
expensive searches in the tree unless the lock was actually contended.

But, spin_is_contended is specific to the ticket spinlocks on x86, so this
is causing compile errors everywhere else.

In practice, the contention could easily appear some time after we started
doing the extent allocation, and it makes more sense to always drop the lock
instead.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

284b066a

04 Feb, 2009 5 commits

Btrfs: Only prep for btree deletion balances when nodes are mostly empty · 7b78c170

Authored by Chris Mason on Feb 04, 2009

Whenever an item deletion is done, we need to balance all the nodes
in the tree to make sure we don't end up with an empty node if a pointer
is deleted.  This balance prep happens from the root of the tree down
so we can drop our locks as we go.

reada_for_balance was triggering read-ahead on neighboring nodes even
when no balancing was required.  This adds an extra check to avoid
calling balance_level() and avoid reada_for_balance() when a balance
won't be required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7b78c170

Btrfs: fix btrfs_unlock_up_safe to walk the entire path · 12f4dacc

Authored by Chris Mason on Feb 04, 2009

btrfs_unlock_up_safe would break out at the first NULL node entry or
unlocked node it found in the path.

Some of the callers have missing nodes at the lower levels of the path, so this
commit fixes things to check all the nodes in the path before returning.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

12f4dacc
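
The fix amounts to skipping holes in the path rather than stopping at them. A minimal model follows, assuming a path of eight levels and hypothetical `toy_*` types; the real helper calls the tree unlock function where this sketch only clears a flag.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 8

struct toy_node {
    int id;
};

struct toy_path {
    struct toy_node *nodes[TOY_MAX_LEVEL];
    bool locks[TOY_MAX_LEVEL];
};

/* Unlock every locked node at `level` or above; returns how many. */
static int toy_unlock_up_safe(struct toy_path *path, int level)
{
    int i, unlocked = 0;

    for (i = level; i < TOY_MAX_LEVEL; i++) {
        if (path->nodes[i] == NULL)
            continue;  /* a hole must not end the walk */
        if (!path->locks[i])
            continue;  /* neither must an already unlocked node */
        path->locks[i] = false;
        unlocked++;
    }
    return unlocked;
}
```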

Btrfs: change btrfs_del_leaf to drop locks earlier · 4d081c41

Authored by Chris Mason on Feb 04, 2009

btrfs_del_leaf does two things.  First it removes the pointer in the
parent, and then it frees the block that has the leaf.  It has the
parent node locked for both operations.

But, it only needs the parent locked while it is deleting the pointer.
After that it can safely free the block without the parent locked.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4d081c41

Btrfs: Change btree locking to use explicit blocking points · b4ce94de

Authored by Chris Mason on Feb 04, 2009

Most of the btrfs metadata operations can be protected by a spinlock,
but some operations still need to schedule.

So far, btrfs has been using a mutex along with a trylock loop.
Most of the time it is able to avoid going for the full mutex, so
the trylock loop is a big performance gain.

This commit is step one for getting rid of the blocking locks entirely.
btrfs_tree_lock takes a spinlock, and the code explicitly switches
to a blocking lock when it starts an operation that can schedule.

We'll be able to get rid of the blocking locks in smaller pieces over time.
Tracing allows us to find the most common cause of blocking, so we
can start with the hot spots first.

The basic idea is:

btrfs_tree_lock() returns with the spin lock held

btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in
the extent buffer flags, and then drops the spin lock.  The buffer is
still considered locked by all of the btrfs code.

If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops
the spin lock and waits on a wait queue for the blocking bit to go away.

Much of the code that needs to set the blocking bit finishes without actually
blocking a good percentage of the time.  So, an adaptive spin is still
used against the blocking bit to avoid very high context switch rates.

btrfs_clear_lock_blocking() clears the blocking bit and returns
with the spinlock held again.

btrfs_tree_unlock() can be called on either blocking or spinning locks;
it does the right thing based on the blocking bit.

ctree.c has a helper function to set/clear all the locked buffers in a
path as blocking.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b4ce94de
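
The protocol listed above can be modelled in user space. In this sketch a pthread mutex stands in for the spinlock and a condition variable for the wait queue; the `toy_*` names are hypothetical and the adaptive spin mentioned in the message is left out.

```c
#include <pthread.h>
#include <stdbool.h>

struct toy_tree_lock {
    pthread_mutex_t spin;  /* stands in for the spinlock */
    pthread_cond_t wq;     /* stands in for the wait queue */
    bool blocking;         /* models EXTENT_BUFFER_BLOCKING */
};

#define TOY_TREE_LOCK_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Returns with the "spinlock" held and the blocking bit clear. */
static void toy_tree_lock(struct toy_tree_lock *l)
{
    pthread_mutex_lock(&l->spin);
    while (l->blocking)
        pthread_cond_wait(&l->wq, &l->spin);  /* drops spin while waiting */
}

/* About to schedule: keep ownership via the bit, drop the spinlock. */
static void toy_set_lock_blocking(struct toy_tree_lock *l)
{
    l->blocking = true;
    pthread_mutex_unlock(&l->spin);
}

/* Done sleeping: retake the spinlock and clear the bit. */
static void toy_clear_lock_blocking(struct toy_tree_lock *l)
{
    pthread_mutex_lock(&l->spin);
    l->blocking = false;
}

/* Valid in either state; only the owner may call it. */
static void toy_tree_unlock(struct toy_tree_lock *l)
{
    if (l->blocking) {
        pthread_mutex_lock(&l->spin);
        l->blocking = false;
    }
    pthread_cond_broadcast(&l->wq);  /* wake anyone parked on the bit */
    pthread_mutex_unlock(&l->spin);
}
```

While the bit is set the owner holds no mutex, yet other lockers still wait, which matches the statement that the buffer is still considered locked.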

Btrfs: hash_lock is no longer needed · c487685d

Authored by Chris Mason on Feb 04, 2009

Before metadata is written to disk, it is updated to reflect that writeout
has begun.  Once this update is done, the block must be cow'd before it
can be modified again.

This update was originally synchronized by using a per-fs spinlock.  Today
the buffers for the metadata blocks are locked before writeout begins,
and everyone that tests the flag has the buffer locked as well.

So, the per-fs spinlock (called hash_lock for no good reason) is no
longer required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c487685d

22 Jan, 2009 1 commit

Btrfs: do less aggressive btree readahead · a7175319

Authored by Chris Mason on Jan 22, 2009

Just before reading a leaf, btrfs scans the node for blocks that are
close by and reads them too.  It tries to build up a large window
of IO looking for blocks that are within a max distance from the top
and bottom of the IO window.

This patch changes things to just look for blocks within 64k of the
target block.  It will trigger less IO and make for lower latencies on
the read side.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a7175319
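
The new selection rule is a plain distance filter. A sketch follows, with `toy_pick_readahead` as a hypothetical helper working on byte addresses; the inclusive 64 KiB boundary is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_READA_WINDOW (64u * 1024u)

/*
 * Copy into `out` the block addresses from `blocks` that lie within
 * 64 KiB of `target`, skipping the target itself. `out` must have room
 * for `n` entries. Returns the number of blocks selected.
 */
static size_t toy_pick_readahead(const uint64_t *blocks, size_t n,
                                 uint64_t target, uint64_t *out)
{
    size_t i, picked = 0;

    for (i = 0; i < n; i++) {
        uint64_t dist = blocks[i] > target ? blocks[i] - target
                                           : target - blocks[i];

        if (dist == 0 || dist > TOY_READA_WINDOW)
            continue;
        out[picked++] = blocks[i];
    }
    return picked;
}
```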

06 Jan, 2009 1 commit

Btrfs: Fix checkpatch.pl warnings · d397712b

Authored by Chris Mason on Jan 05, 2009

There were many, most are fixed now.  struct-funcs.c generates some warnings
but these are bogus.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d397712b

17 Dec, 2008 1 commit

Btrfs: properly check free space for tree balancing · 87b29b20

Authored by Yan Zheng on Dec 17, 2008

btrfs_insert_empty_items takes the space needed by the btrfs_item
structure into account when calculating the required free space.

So the tree balancing code shouldn't add sizeof(struct btrfs_item)
to the size when checking the free space. This patch removes these
superfluous additions.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

87b29b20

16 Dec, 2008 1 commit

Btrfs: Fix compressed writes on truncated pages · 42dc7bab

Authored by Chris Mason on Dec 15, 2008

The compression code was using i_size to limit the amount of data it
sent through zlib.  But, it wasn't properly limiting the looping to
just the pages inside i_size.  The end result was trying to compress
too many pages, including those that had not been setup and properly locked
down.  This made the compression code oops while trying find_get_page on a
page that didn't exist.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

42dc7bab
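
One way to picture the fix: the page loop has to be clamped by i_size in the same way the byte count already was. The helper below is a hypothetical sketch, assuming 4 KiB pages, a page-aligned `start`, and an `end` small enough that `end + 1` does not overflow.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/*
 * Number of pages in the inclusive byte range [start, end] that hold
 * data below i_size. Pages wholly past i_size are not counted.
 */
static uint64_t toy_pages_to_compress(uint64_t start, uint64_t end,
                                      uint64_t i_size)
{
    uint64_t actual_end;  /* exclusive */

    if (i_size <= start)
        return 0;
    actual_end = end + 1 < i_size ? end + 1 : i_size;
    return (actual_end - start + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}
```

Looping over this count instead of the full range keeps the code away from pages that were never set up or locked.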

10 Dec, 2008 1 commit

Btrfs: Delete csum items when freeing extents · 459931ec

Authored by Chris Mason on Dec 10, 2008

This finishes off the new checksumming code by removing csum items
for extents that are no longer in use.

The trick is doing it without racing because a single csum item may
hold csums for more than one extent.  Extra checks are added to
btrfs_csum_file_blocks to make sure that we are using the correct
csum item after dropping locks.

A new btrfs_split_item is added to split a single csum item so it
can be split without dropping the leaf lock.  This is used to
remove csum bytes from the middle of an item.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

459931ec

09 Dec, 2008 1 commit

Btrfs: Use map_private_extent_buffer during generic_bin_search · 934d375b

Authored by Chris Mason on Dec 08, 2008

It is possible that generic_bin_search will be called on a tree block
that has not been locked.  This happens because cache_block_group skips
locking on the tree blocks.

Since the tree block isn't locked, we aren't allowed to change
the extent_buffer->map_token field.  Using map_private_extent_buffer
avoids any changes to the internal extent buffer fields.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

934d375b

02 Dec, 2008 1 commit

Btrfs: make things static and include the right headers · b2950863

Authored by Christoph Hellwig on Dec 02, 2008

Shut up various sparse warnings about symbols that should be either
static or have their declarations in scope.
Signed-off-by: Christoph Hellwig <hch@lst.de>

b2950863

19 Nov, 2008 1 commit

Btrfs: Some fixes for batching extent insert. · b4eec2ca

Authored by Liu Hui on Nov 18, 2008

In insert_extents(), when ret==1 and last is not zero, it should
check if the currently inserted item is the last item in this batch of
inserts. If so, it should just break from the loop. If not, 'cur =
insert_list->next' will make no sense because the list is empty now,
and 'op' will point to an unexpected place.

There are also some trivial fixes in this patch, including one comment
typo and the deletion of two redundant lines.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b4eec2ca

18 Nov, 2008 1 commit

Btrfs: Seed device support · 2b82032c

Authored by Yan Zheng on Nov 17, 2008

A seed device is a special btrfs filesystem with the SEEDING super flag
set; it can only be mounted in read-only mode. Seed
devices allow people to create a new btrfs on top of them.

The new FS contains the same contents as the seed device,
but it can be mounted in read-write mode.

This patch does the following:

1) Split the code in btrfs_alloc_chunk into two parts. The first part makes
the newly allocated chunk usable, but does not do any operation that modifies
the chunk tree. The second part does the chunk tree modifications. This
division is for the bootstrap step of adding storage to the seed device.

2) Update device management code to handle seed device.
The basic idea is: For an FS grown from seed devices, its
seed devices are put into a list. Seed devices are
opened on demand at mounting time. If any seed device is
missing or has been changed, btrfs kernel module will
refuse to mount the FS.

3) Make btrfs_find_block_group not return NULL when all
block groups are read-only.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

2b82032c

13 Nov, 2008 1 commit

Btrfs: batch extent inserts/updates/deletions on the extent root · f3465ca4

Authored by Josef Bacik on Nov 12, 2008

While profiling the allocator I noticed a good amount of time was being spent in
finish_current_insert and del_pending_extents, and as the filesystem filled up
more and more time was being spent in those functions. This patch aims to
reduce that problem in two ways:

1) track if we tried to delete an extent that we are going to update or insert.
Once we get into finish_current_insert we discard any of the extents that were
marked for deletion. This saves us from doing unnecessary work almost every
time finish_current_insert runs.

2) Batch insertion/updates/deletions. Instead of doing a btrfs_search_slot for
each individual extent and doing the needed operation, we instead keep the leaf
around and see if there is anything else we can do on that leaf. On the insert
case I introduced a btrfs_insert_some_items, which will take an array of keys
with an array of data_sizes and try and squeeze in as many of those keys as
possible, and then return how many keys it was able to insert. In the update
case we search for an extent ref, update the ref and then loop through the leaf
to see if any of the other refs we are looking to update are on that leaf, and
then once we are done we release the path and search for the next ref we need to
update. And finally for the deletion we try and delete the extent+ref in pairs,
so we will try to find extent+ref pairs next to the extent we are trying to free
and free them in bulk if possible.

This along with the other cluster fix that Chris pushed out a bit ago helps make
the allocator perform more uniformly as it fills up the disk. There is still a
slight drop as we fill up the disk since we start having to stick new blocks in
odd places which results in more COWs than on an empty fs, but the drop is not
nearly as severe as it was before.
Signed-off-by: Josef Bacik <jbacik@redhat.com>

f3465ca4
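
The "insert as many as fit, report the count" contract described for btrfs_insert_some_items can be sketched as a space calculation. The function below is a hypothetical stand-in: it models only the free-space accounting, with `item_overhead` as an assumed fixed per-item header cost, and ignores key ordering and the actual copy into the leaf.

```c
#include <stddef.h>

/*
 * Work out how many of the `nr` items, taken in order, fit into
 * `free_space` bytes. Returns that count; the caller would retry the
 * remaining items against another leaf.
 */
static size_t toy_insert_some_items(size_t free_space, size_t item_overhead,
                                    const size_t *data_sizes, size_t nr)
{
    size_t i, used = 0;

    for (i = 0; i < nr; i++) {
        size_t need = data_sizes[i] + item_overhead;

        if (need > free_space - used)
            break;  /* this one does not fit: stop here */
        used += need;
    }
    return i;
}
```

A caller would advance its key and size arrays by the returned count and search again only for the remainder, which is where the saving over one search per extent comes from.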

openeuler / Kernel synced successfully about 1 year ago