Commits · 2dd3e67b1eaec8504da7e12b8afee77323a49f38 · OpenHarmony / kernel_linux

25 Sep, 2008 · 40 commits

Committed by Chris Mason on Aug 04, 2008

* Make walk_down_tree wake up throttled tasks more often
* Make walk_down_tree call cond_resched during long loops
* As the size of the ref cache grows, wait longer in throttle
* Get rid of the reada code in walk_down_tree, the leaves don't get
  read anymore, thanks to the ref cache.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

2dd3e67b

btrfs_search_slot: reduce lock contention by cowing in two stages · 65b51a00

Committed by Chris Mason on Aug 01, 2008

A btree block COW has two parts: the first is to allocate a destination
block, and the second is to copy the old block over.

The first part needs locks in the extent allocation tree, and may need to
do IO. This changeset splits that into a separate function that can be
called without any tree locks held.

btrfs_search_slot is changed to drop its path and start over if it has
to COW a contended block. This often means that many writers will
pre-alloc a new destination for the same contended block, but they
cache their prealloc for later use on lower levels in the tree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

65b51a00
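
A minimal single-threaded sketch of the two-stage COW described above, assuming hypothetical names throughout (`struct block`, `struct prealloc_cache`, `prealloc_cow_block`, `cow_block_locked`): the destination is allocated while no tree lock is held, and only the copy happens under the lock. A boolean stands in for the tree lock; this is not the btrfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 16   /* hypothetical, tiny block size */

/* Hypothetical tree block: only a payload buffer. */
struct block {
    char data[BLOCK_SIZE];
};

/* Hypothetical per-search cache for a destination block that was
 * allocated but not used yet; a lower tree level can reuse it. */
struct prealloc_cache {
    struct block *spare;
};

/* Stands in for "a tree lock is held" so the sketch stays single-threaded. */
static bool tree_locked;

/* Stage one: allocate the destination with no tree lock held.
 * In btrfs this step needs extent-allocation locks and may do IO. */
static void prealloc_cow_block(struct prealloc_cache *pc)
{
    assert(!tree_locked);
    if (!pc->spare)
        pc->spare = calloc(1, sizeof(*pc->spare));
}

/* Stage two: under the tree lock, copy the old block into the
 * preallocated destination. Returns NULL when nothing was preallocated;
 * the caller would then drop its path, preallocate, and start over. */
static struct block *cow_block_locked(struct prealloc_cache *pc,
                                      const struct block *old)
{
    struct block *dst = pc->spare;

    assert(tree_locked);
    if (!dst)
        return NULL;
    pc->spare = NULL;                 /* consume the cached prealloc */
    memcpy(dst->data, old->data, BLOCK_SIZE);
    return dst;
}

static struct block *cow_two_stage(struct prealloc_cache *pc,
                                   const struct block *old)
{
    struct block *dst;

    prealloc_cow_block(pc);           /* no tree lock held */
    tree_locked = true;
    dst = cow_block_locked(pc, old);
    tree_locked = false;
    return dst;
}
```

Keeping the spare in a cache rather than freeing it is what lets a writer that lost the race reuse its allocation further down the tree.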

Btrfs: Throttle less often waiting for snapshots to delete · 18e35e0a

Committed by Chris Mason on Aug 01, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
18e35e0a

Btrfs: Throttle tuning · 37d1aeee

Committed by Chris Mason on Jul 31, 2008

This avoids waiting for transactions with pages locked by breaking out
the code to wait for the current transaction to close into a function
called by btrfs_throttle.

It also lowers the limits for where we start throttling.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

37d1aeee

Btrfs: implement memory reclaim for leaf reference cache · bcc63abb

Committed by Yan on Jul 30, 2008

The memory reclaiming issue happens when a snapshot exists. In that
case, some cache entries may not be used while the old snapshot is being
dropped, so they remain in the cache until umount.

The patch adds a field to struct btrfs_leaf_ref to record its create time. In addition,
the patch links all dead roots of a given snapshot together in order of
create time. After an old snapshot is completely dropped, we check the dead
root list and remove all cache entries created before the oldest dead root in
the list.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

bcc63abb
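
The reclaim rule above can be sketched as a prune over a linked list, assuming a simplified, hypothetical `struct leaf_ref` with a `create_time` field and hypothetical helpers `prune_leaf_refs` and `push_ref`; the real cache structure differs. Every entry created before the oldest remaining dead root is freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified cache entry; the real struct btrfs_leaf_ref
 * carries more fields. */
struct leaf_ref {
    uint64_t bytenr;        /* disk location of the cached leaf */
    uint64_t create_time;   /* recorded when the entry is inserted */
    struct leaf_ref *next;
};

/* Free every entry created before oldest_dead, the create time of the
 * oldest dead root still in the list. Returns the number freed. */
static size_t prune_leaf_refs(struct leaf_ref **head, uint64_t oldest_dead)
{
    size_t freed = 0;
    struct leaf_ref **pp = head;

    while (*pp) {
        struct leaf_ref *ref = *pp;

        if (ref->create_time < oldest_dead) {
            *pp = ref->next;    /* unlink and free the stale entry */
            free(ref);
            freed++;
        } else {
            pp = &ref->next;
        }
    }
    return freed;
}

/* Hypothetical helper: push a new entry onto the list head. */
static struct leaf_ref *push_ref(struct leaf_ref *head, uint64_t bytenr,
                                 uint64_t create_time)
{
    struct leaf_ref *ref = malloc(sizeof(*ref));

    assert(ref != NULL);
    ref->bytenr = bytenr;
    ref->create_time = create_time;
    ref->next = head;
    return ref;
}
```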

Btrfs: Update and fix mount -o nodatacow · f321e491

Committed by Yan Zheng on Jul 30, 2008

To check whether a given file extent is referenced by multiple snapshots, the
checker walks down the fs tree through the dead root and checks all tree blocks in
the path.

We can easily detect whether a given tree block is directly referenced by another
snapshot. We can also detect any indirect reference from other snapshots by
checking the reference's generation. The checker can always detect multiple
references, but can't reliably detect the single-reference case. So btrfs may
do file data COW even when there is only one reference.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f321e491

Btrfs: Throttle operations if the reference cache gets too large · ab78c84d

Committed by Chris Mason on Jul 29, 2008

A large reference cache is directly related to a lot of work pending
for the cleaner thread.  This throttles back new operations based on
the size of the reference cache so the cleaner thread will be able to keep
up.

Overall, this actually makes the FS faster because the cleaner thread will
be more likely to find things in cache.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ab78c84d
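
A minimal sketch of throttling new operations by reference-cache size. `THROTTLE_START` and the linear growth of the wait are hypothetical choices for illustration, not btrfs's actual limits or formula: nothing waits below the threshold, and the wait grows as the cleaner thread's backlog grows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold, in cached entries; not a btrfs value. */
#define THROTTLE_START 1000u

/* How many rounds a new operation should wait for the cleaner thread.
 * Below THROTTLE_START nothing waits; above it, every additional
 * multiple of the threshold adds one more round, so a bigger backlog
 * slows new writers down more. */
static unsigned throttle_rounds(size_t ref_cache_entries)
{
    if (ref_cache_entries < THROTTLE_START)
        return 0;
    return (unsigned)(ref_cache_entries / THROTTLE_START);
}
```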

Btrfs: Leaf reference cache update · 017e5369

Committed by Chris Mason on Jul 28, 2008

This changes the reference cache to make a single cache per root
instead of one cache per transaction, and to key by the byte number
of the disk block instead of the keys inside.

This makes it much less likely to have cache misses if a snapshot
or something has an extra reference on a higher node or a leaf while
the first transaction that added the leaf into the cache is dropping.

Some throttling is added to functions that free blocks heavily so they
wait for old transactions to drop.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

017e5369
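
Keying the cache by the disk block's byte number, with one cache per root, can be sketched as a chained hash table. Everything here is hypothetical (`struct root_ref_cache`, `ref_lookup`, `ref_insert`, the bucket count, and the assumption of 4 KiB-aligned blocks in `ref_hash`); it only illustrates the keying choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define REF_HASH_BUCKETS 64   /* hypothetical bucket count */

struct ref_entry {
    uint64_t bytenr;          /* key: byte number of the disk block */
    struct ref_entry *next;
};

/* Hypothetical layout: one cache per root, not per transaction. */
struct root_ref_cache {
    struct ref_entry *buckets[REF_HASH_BUCKETS];
};

static size_t ref_hash(uint64_t bytenr)
{
    /* Assumes 4 KiB-aligned blocks, so the low 12 bits carry no entropy. */
    return (size_t)((bytenr >> 12) % REF_HASH_BUCKETS);
}

static struct ref_entry *ref_lookup(struct root_ref_cache *c, uint64_t bytenr)
{
    struct ref_entry *e;

    for (e = c->buckets[ref_hash(bytenr)]; e; e = e->next)
        if (e->bytenr == bytenr)
            return e;
    return NULL;
}

/* Insert if absent; returns the existing or new entry (NULL on OOM). */
static struct ref_entry *ref_insert(struct root_ref_cache *c, uint64_t bytenr)
{
    struct ref_entry *e = ref_lookup(c, bytenr);
    size_t h;

    if (e)
        return e;             /* already cached for this root */
    e = malloc(sizeof(*e));
    if (!e)
        return NULL;
    h = ref_hash(bytenr);
    e->bytenr = bytenr;
    e->next = c->buckets[h];
    c->buckets[h] = e;
    return e;
}
```

Because the key is the block's location rather than the keys stored inside it, a lookup still hits no matter which transaction first inserted the leaf.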

Btrfs: Add a leaf reference cache · 31153d81

Committed by Yan Zheng on Jul 28, 2008

Much of the IO done while dropping snapshots is done looking up
leaves in the filesystem trees to see if they point to any extents and
to drop the references on any extents found.

This creates a cache so that IO isn't required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

31153d81

Btrfs: Implement new dir index format · aec7477b

Committed by Josef Bacik on Jul 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
aec7477b

Btrfs: Take the csum mutex while reading checksums · ed98b56a

Committed by Chris Mason on Jul 22, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
ed98b56a

Btrfs: Fix some data=ordered related data corruptions · f421950f

Committed by Chris Mason on Jul 22, 2008

Stress testing was showing data checksum errors, most of which were caused
by a lookup bug in the extent_map tree.  The tree was caching the last
pointer returned, and searches would check the last pointer first.

But, search callers also expect the search to return the very first
matching extent in the range, which wasn't always true with the last
pointer usage.

For now, the code to cache the last return value is just removed.  It is
easy to fix, but I think lookups are rare enough that it isn't required anymore.

This commit also replaces do_sync_mapping_range with a local copy of the
related functions.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f421950f
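
The lookup bug above can be reproduced with a small model: a sorted array stands in for the extent_map tree, and all names (`struct extent`, `lookup_first`, `lookup_with_last_cache`) are hypothetical. Checking the last returned extent first finds *a* match, but not necessarily the first one in the range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent {
    uint64_t start;
    uint64_t len;
};

/* Does extent e overlap [start, start + len)? */
static int overlaps(const struct extent *e, uint64_t start, uint64_t len)
{
    return e->start < start + len && start < e->start + e->len;
}

/* Correct behaviour: index of the FIRST (lowest start) extent overlapping
 * the range, or -1. Extents are sorted by start and do not overlap. */
static long lookup_first(const struct extent *map, size_t n,
                         uint64_t start, uint64_t len)
{
    for (size_t i = 0; i < n; i++)
        if (overlaps(&map[i], start, len))
            return (long)i;
    return -1;
}

/* Simplified version of the removed pattern: try the last hit first. */
static long last_hit = -1;

static long lookup_with_last_cache(const struct extent *map, size_t n,
                                   uint64_t start, uint64_t len)
{
    if (last_hit >= 0 && overlaps(&map[last_hit], start, len))
        return last_hit;      /* may skip an earlier overlapping extent */
    last_hit = lookup_first(map, n, start, len);
    return last_hit;
}
```

With extents [0,8), [8,16), and [16,24), a query for [8,12) caches index 1; a following query for [0,16) then returns index 1 from the cache, although the first overlapping extent is index 0.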

btrfs_start_transaction: wait for commits in progress to finish · f9295749

Committed by Chris Mason on Jul 17, 2008

btrfs_commit_transaction has to loop waiting for any writers in the
transaction to finish before it can proceed.  btrfs_start_transaction
should be polite and not join a transaction that is in the process
of being finished off.

There are a few places that can't wait, basically the ones doing IO that
might be needed to finish the transaction.  For them, btrfs_join_transaction
is added.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f9295749
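
The difference between starting and joining a transaction can be modelled deterministically. This is a hypothetical, single-threaded sketch (`struct txn` and all functions are illustrative names): where real code would sleep until the commit finishes, `start_transaction` here just reports that the caller must wait, while `join_transaction` never waits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of the running transaction. */
struct txn {
    bool in_commit;      /* a commit has started finishing this txn */
    int num_writers;     /* writers the commit must wait for */
};

/* Polite entry: refuse to join a transaction that is being committed. */
static bool start_transaction(struct txn *t)
{
    if (t->in_commit)
        return false;    /* caller would wait for the commit, then retry */
    t->num_writers++;
    return true;
}

/* For IO paths the commit itself may depend on: never wait. */
static void join_transaction(struct txn *t)
{
    t->num_writers++;
}

static void end_transaction(struct txn *t)
{
    assert(t->num_writers > 0);
    t->num_writers--;
}

/* The commit can only proceed once every writer has left. */
static bool commit_can_proceed(const struct txn *t)
{
    return t->in_commit && t->num_writers == 0;
}
```

If an IO path needed by the commit used the polite entry instead, it would wait on a commit that is itself waiting for that IO, which is the deadlock the join variant avoids.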

Btrfs: New data=ordered implementation · e6dcd2dc

Committed by Chris Mason on Jul 17, 2008

The old data=ordered code would force commit to wait until
all the data extents from the transaction were fully on disk.  This
introduced large latencies into the commit and stalled new writers
in the transaction for a long time.

The new code changes the way data allocations and extents work:

* When delayed allocation is filled, data extents are reserved, and
  the extent bit EXTENT_ORDERED is set on the entire range of the extent.
  A struct btrfs_ordered_extent is allocated and inserted into a per-inode
  rbtree to track the pending extents.

* As each page is written, EXTENT_ORDERED is cleared on the bytes corresponding
  to that page.

* When all of the bytes corresponding to a single struct btrfs_ordered_extent
  are written, the previously reserved extent is inserted into the FS
  btree and into the extent allocation trees.  The checksums for the file
  data are also updated.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e6dcd2dc
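
The per-extent accounting in the list above can be sketched with a byte counter. This is a simplification under stated assumptions: `struct ordered_extent`, `bytes_left`, `inserted`, and `PAGE_SIZE_SKETCH` are hypothetical stand-ins, and a counter replaces the per-byte EXTENT_ORDERED bits. The extent is "inserted" exactly once, when its last byte is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u   /* hypothetical page size */

/* Simplified, hypothetical stand-in for struct btrfs_ordered_extent. */
struct ordered_extent {
    uint64_t file_offset;
    uint64_t len;
    uint64_t bytes_left;   /* bytes still pending (EXTENT_ORDERED set) */
    bool inserted;         /* reserved extent added to the trees yet? */
};

static void ordered_init(struct ordered_extent *oe, uint64_t off, uint64_t len)
{
    oe->file_offset = off;
    oe->len = len;
    oe->bytes_left = len;
    oe->inserted = false;
}

/* Called as each page's write completes. When the last pending byte is
 * written, the reserved extent is "inserted" (only a flag here; real
 * code updates the FS btree, the extent trees, and the checksums).
 * Returns true when this call finished the extent. */
static bool ordered_page_written(struct ordered_extent *oe, uint64_t bytes)
{
    assert(bytes <= oe->bytes_left);
    oe->bytes_left -= bytes;
    if (oe->bytes_left == 0 && !oe->inserted) {
        oe->inserted = true;
        return true;
    }
    return false;
}
```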

Btrfs: Drop some verbose printks · 77a41afb

Committed by Chris Mason on Jul 08, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
77a41afb

Btrfs: Online btree defragmentation fixes · 3f157a2f

Committed by Chris Mason on Jun 25, 2008

The btree defragger wasn't making forward progress because the new key wasn't
being saved by the btrfs_search_forward function.

This also disables the automatic btree defrag, because it wasn't scaling well to
huge filesystems. The auto-defrag needs to be done differently.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3f157a2f
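
Why saving the new key matters for forward progress can be shown with a toy key space. `keys`, `search_forward`, and `defrag_walk` are hypothetical and only mimic the idea: the search writes the key it found back to the caller, who then advances past it. Without that write-back the walk would revisit the same position forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sorted key space standing in for a btree. */
static const uint64_t keys[] = { 3, 7, 11, 20 };
#define NKEYS (sizeof(keys) / sizeof(keys[0]))

/* Find the first key >= *min_key and write it back through min_key.
 * Returns 0 on success, 1 when no keys remain. */
static int search_forward(uint64_t *min_key)
{
    for (size_t i = 0; i < NKEYS; i++) {
        if (keys[i] >= *min_key) {
            *min_key = keys[i];   /* save the new key for the caller */
            return 0;
        }
    }
    return 1;
}

/* Visit every key once; returns the number of keys visited. */
static size_t defrag_walk(void)
{
    uint64_t key = 0;
    size_t visited = 0;

    while (search_forward(&key) == 0) {
        visited++;                /* "defragment" the block holding key */
        key++;                    /* advance past it: forward progress */
    }
    return visited;
}
```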

Btrfs: Add a per-inode csum mutex to avoid races creating csum items · 1b1e2135

Committed by Chris Mason on Jun 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
1b1e2135

Btrfs: Replace the transaction work queue with kthreads · a74a4b97

Committed by Chris Mason on Jun 25, 2008

This creates one kthread for commits and one kthread for
deleting old snapshots.  All the work queues are removed.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a74a4b97

Add btrfs_end_transaction_throttle to force writers to wait for pending commits · 89ce8a63

Committed by Chris Mason on Jun 25, 2008

The existing throttle mechanism was often not sufficient to prevent
new writers from coming in and making a given transaction run forever.
This adds an explicit wait at the end of most operations so they will
allow the current transaction to close.

There is no wait inside file_write, inode updates, or cow filling, all of which
have different deadlock possibilities.

This is a temporary measure until better asynchronous commit support is
added.  This code leads to stalls as it waits for data=ordered
writeback, and it really needs to be fixed.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

89ce8a63

Btrfs: Replace the big fs_mutex with a collection of other locks · a2135011

Committed by Chris Mason on Jun 25, 2008

Extent allocations are still protected by a large alloc_mutex.
Objectid allocations are covered by an objectid mutex.
Other btree operations are protected by a lock on individual btree nodes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a2135011

Btrfs: Start btree concurrency work. · 925baedd

Committed by Chris Mason on Jun 25, 2008

The allocation trees and the chunk trees are serialized via their own
dedicated mutexes.  This means allocation locking is still not very
fine-grained.

The main FS btree is protected by locks on each block in the btree.  Locks
are taken top-down, and as processing finishes on a given level of the
tree, the lock is released after locking the lower level.

The end result of a search is now a path where only the lowest level
is locked.  Releasing or freeing the path drops any locks held.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

925baedd
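
The top-down locking described above is lock coupling (hand-over-hand locking). A minimal single-threaded sketch follows, assuming hypothetical types (`struct node`, `struct path`) where a plain bool stands in for each btree block's lock and a counter checks the invariant; it is not the btrfs locking code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LEVEL 8

/* Hypothetical btree node; a bool stands in for the per-block lock. */
struct node {
    bool locked;
    struct node *child;   /* next node on the search path, NULL at a leaf */
};

struct path {
    struct node *nodes[MAX_LEVEL];
    int depth;
};

static int locks_held;    /* lets the sketch check its invariant */

static void node_lock(struct node *n)
{
    assert(!n->locked);
    n->locked = true;
    locks_held++;
}

static void node_unlock(struct node *n)
{
    assert(n->locked);
    n->locked = false;
    locks_held--;
}

/* Top-down search with lock coupling: lock the lower level first, then
 * release the upper one. At most two locks are held at any moment, and
 * the finished path has only its lowest node locked. */
static void search_path(struct node *root, struct path *p)
{
    struct node *cur = root;

    p->depth = 0;
    node_lock(cur);
    p->nodes[p->depth++] = cur;
    while (cur->child) {
        struct node *next = cur->child;

        assert(p->depth < MAX_LEVEL);
        node_lock(next);          /* lock the lower level ... */
        node_unlock(cur);         /* ... then drop the upper one */
        assert(locks_held == 1);
        p->nodes[p->depth++] = next;
        cur = next;
    }
}

/* Releasing the path drops whatever locks it still holds. */
static void release_path(struct path *p)
{
    for (int i = 0; i < p->depth; i++)
        if (p->nodes[i]->locked)
            node_unlock(p->nodes[i]);
    p->depth = 0;
}
```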

Btrfs: Invalidate dcache entry after creating snapshot and · 3b96362c

Committed by Sven Wegener on Jun 09, 2008

We need to invalidate an existing dcache entry after creating a new
snapshot or subvolume, because a negative dcache entry will stop us from
accessing the new snapshot or subvolume.

---
  ctree.h       |   23 +++++++++++++++++++++++
  inode.c       |    4 ++++
  transaction.c |    4 ++++
  3 files changed, 31 insertions(+)
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3b96362c

Btrfs: Fix race in running_transaction checks · 48ec2cf8

Committed by Chris Mason on Jun 09, 2008

When a new transaction was started, the code would incorrectly
set the pointer in fs_info before all the data structures were set up.
fsync-heavy workloads hit races on the setup of the ordered inode spinlock.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

48ec2cf8
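
The fix amounts to "initialize fully, then publish". A userspace sketch of that ordering follows, using C11 `<stdatomic.h>` release/acquire as a stand-in for the kernel's own locking; `struct transaction`, `running_transaction`, and the helper functions are hypothetical, cut-down names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down transaction. */
struct transaction {
    int ordered_lock_ready;   /* stands in for the ordered-inode spinlock setup */
    unsigned long transid;
};

/* Hypothetical stand-in for the fs_info running_transaction pointer. */
static _Atomic(struct transaction *) running_transaction;

static struct transaction *start_new_transaction(unsigned long transid)
{
    struct transaction *t = malloc(sizeof(*t));

    if (!t)
        return NULL;
    /* Fully initialize everything first ... */
    t->transid = transid;
    t->ordered_lock_ready = 1;
    /* ... and only then publish the pointer. The release store keeps the
     * initialization above from being reordered after the publish. */
    atomic_store_explicit(&running_transaction, t, memory_order_release);
    return t;
}

static struct transaction *current_transaction(void)
{
    /* Pairs with the release store: a non-NULL result is fully set up. */
    return atomic_load_explicit(&running_transaction, memory_order_acquire);
}
```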

Btrfs: Add support for online device removal · a061fc8d

Committed by Chris Mason on May 07, 2008

This required a few structural changes to the code that manages bdev pointers:

The VFS super block now gets an anon-bdev instead of a pointer to the
lowest bdev.  This allows us to avoid swapping the super block bdev pointer
around at run time.

The code to read in the super block no longer goes through the extent
buffer interface.  Things got ugly keeping the mapping constant.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a061fc8d

Btrfs: Fixes for 2.6.18 enterprise kernels · d6bfde87

Committed by Chris Mason on Apr 30, 2008

2.6.18 seems to get caught in an infinite loop when
cancel_rearming_delayed_workqueue is called more than once, so this switches
to cancel_delayed_work, which is arguably more correct.

Also, balance_dirty_pages can run into problems with 2.6.18 based kernels
because it doesn't have the per-bdi dirty limits. This avoids calling
balance_dirty_pages on the btree inode unless there is actually something
to balance, which is a good optimization in general.

Finally there's a compile fix for ordered-data.h
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d6bfde87

Btrfs: Throttle file_write when data=ordered is flushing the inode · 81d7ed29

Committed by Chris Mason on Apr 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
81d7ed29

Btrfs: Do metadata checksums for reads via a workqueue · ce9adaa5

Committed by Chris Mason on Apr 09, 2008

Before, metadata checksumming was done by the callers of read_tree_block,
which would set EXTENT_CSUM bits in the extent tree to show that a given
range of pages was already checksummed and didn't need to be verified
again.

But, those bits could go away via try_to_releasepage, and the end
result was bogus checksum failures on pages that never left the cache.

The new code validates checksums when the page is read.  It is a little
tricky because metadata blocks can span pages and a single read may
end up going via multiple bios.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ce9adaa5
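
Verifying a metadata block that spans pages, where pages may complete via different bios, can be sketched by checksumming only once the last page of the block has arrived. Assumptions: `struct meta_block`, `page_read_done`, and the tiny page size are hypothetical; a page counter replaces real per-page state; and `crc32c` below is the standard bitwise CRC-32C with the usual seeding, not necessarily btrfs's on-disk convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE 8        /* tiny hypothetical page size */
#define PAGES_PER_BLOCK 2    /* one metadata block spans two pages */

/* Standard bitwise CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t crc32c(uint32_t crc, const unsigned char *p, size_t n)
{
    crc = ~crc;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

struct meta_block {
    unsigned char data[SKETCH_PAGE * PAGES_PER_BLOCK];
    unsigned pages_read;     /* pages whose read has completed */
    uint32_t expected_csum;
};

/* Called from each read completion, possibly from different bios.
 * Returns 1 when verified, -1 on checksum failure, 0 while waiting. */
static int page_read_done(struct meta_block *b, unsigned page,
                          const unsigned char *src)
{
    memcpy(b->data + page * SKETCH_PAGE, src, SKETCH_PAGE);
    if (++b->pages_read < PAGES_PER_BLOCK)
        return 0;            /* block incomplete: cannot checksum yet */
    return crc32c(0, b->data, sizeof(b->data)) == b->expected_csum ? 1 : -1;
}
```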

Btrfs: Add support for multiple devices per filesystem · 0b86a832

Committed by Chris Mason on Mar 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
0b86a832

Btrfs: Lower stack usage in transaction.c · 80b6794d

Committed by Chris Mason on Feb 01, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
80b6794d

Btrfs: Add data block hints to SSD mode too · 4529ba49

Committed by Chris Mason on Jan 31, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
4529ba49

Btrfs: Split the extent_map code into two parts · d1310b2e

Committed by Chris Mason on Jan 24, 2008

There is now extent_map for mapping offsets in the file to disk and
extent_io for state tracking, IO submission and extent_buffers.

The new extent_map code shifts from [start,end] pairs to [start,len], and
pushes the locking out into the caller.  This allows a few performance
optimizations and is easier to use.

A number of extent_map usage bugs were fixed, mostly failures
to remove extent_map entries when changing the file.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d1310b2e
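
A small sketch of the shift from inclusive [start,end] pairs to [start,len], using hypothetical types and helpers (`struct range_inclusive`, `struct range_len`, `range_end`, `ranges_adjacent`). With a length, the exclusive end is `start + len`, which makes adjacency a plain equality test; the helper saturates instead of wrapping past `UINT64_MAX`.

```c
#include <assert.h>
#include <stdint.h>

/* Old style: inclusive range [start, end]. */
struct range_inclusive {
    uint64_t start, end;
};

/* New style: start plus length, covering [start, start + len). */
struct range_len {
    uint64_t start, len;
};

/* Note: the full range [0, UINT64_MAX] cannot be expressed this way,
 * because its length would wrap to 0. */
static struct range_len to_start_len(struct range_inclusive r)
{
    struct range_len out = { r.start, r.end - r.start + 1 };
    return out;
}

/* Exclusive end, saturating instead of wrapping past UINT64_MAX. */
static uint64_t range_end(struct range_len r)
{
    if (r.start + r.len < r.start)
        return UINT64_MAX;
    return r.start + r.len;
}

/* Two ranges are adjacent when one ends exactly where the next starts. */
static int ranges_adjacent(struct range_len a, struct range_len b)
{
    return range_end(a) == b.start;
}
```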

Btrfs: Add mount -o ssd, which includes optimizations for seek free storage · e18e4809

Committed by Chris Mason on Jan 18, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
e18e4809

Btrfs: Fix data=ordered vs wait_on_inode deadlock on older kernels · 4d5e74bc

Committed by Chris Mason on Jan 16, 2008

Using ilookup5 during data=ordered writeback could deadlock on I_LOCK.  This
saves a pointer to the inode instead.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4d5e74bc

Btrfs: Run igrab on data=ordered inodes to prevent deadlocks during writeout · 2da98f00

Committed by Chris Mason on Jan 16, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2da98f00

Rework btrfs_drop_inode to avoid scheduling · cee36a03

Committed by Chris Mason on Jan 15, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
cee36a03

Btrfs: Add some simple throttling to wait for data=ordered and snapshot deletion · e2008b61

Committed by Chris Mason on Jan 08, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
e2008b61

Btrfs: Move snapshot creation to commit time · 3063d29f

Committed by Chris Mason on Jan 08, 2008

It is very difficult to create a consistent snapshot of the btree when
other writers may update the btree before the commit is done.

This changes the snapshot creation to happen during the commit, while
no other updates are possible.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3063d29f

Btrfs: Add data=ordered support · dc17ff8f

Committed by Chris Mason on Jan 08, 2008

This forces file data extents down to the disk along with the metadata that
references them. The current implementation is fairly simple, and just
writes out all of the dirty pages in an inode before the commit.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

dc17ff8f

Btrfs: Reduce stack usage in the resizer, fix 32 bit compiles · 4313b399

Committed by Chris Mason on Jan 03, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>
4313b399

Btrfs: Back port to 2.6.18-el kernels · 6da6abae

Committed by Chris Mason on Dec 18, 2007

Signed-off-by: Chris Mason <chris.mason@oracle.com>
6da6abae

OpenHarmony / kernel_linux · last synced about 4 years ago
