Commits · 31153d81284934601d08110ac7698fd9a535e4c0 · openeuler / raspberrypi-kernel

25 Sep, 2008 40 commits

Btrfs: Add a leaf reference cache · 31153d81

Committed by Yan Zheng on Jul 28, 2008

Much of the IO done while dropping snapshots is done looking up
leaves in the filesystem trees to see if they point to any extents and
to drop the references on any extents found.

This creates a cache so that IO isn't required.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Create orphan inode records to prevent lost files after a crash · 7b128766

Committed by Josef Bacik on Jul 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Fix the defragmention code and the block relocation code for data=ordered · 3eaa2885

Committed by Chris Mason on Jul 24, 2008

Before setting an extent to delalloc, the code needs to wait for
pending ordered extents.

Also, the relocation code needs to wait for ordered IO before scanning
the block group again.  This is because the extents are not removed
until the IO for the new extents is finished.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Search data ordered extents first for checksums on read · 89642229

Committed by Chris Mason on Jul 24, 2008

Checksum items are not inserted into the tree until all of the io from a
given extent is complete. This means one dirty page from an extent may
be written, freed, and then read again before the entire extent is on disk
and the checksum item is inserted.

The checksums themselves are stored in the ordered extent so they can
be inserted in bulk when IO is complete. On read, if a checksum item isn't
found, the ordered extents were being searched for a checksum record.

This all worked most of the time, but the checksum insertion code tries
to reduce the number of tree operations by pre-inserting checksum items
based on i_size and a few other factors. This means the read code might
find a checksum item that hasn't yet really been filled in.

This commit changes things to check the ordered extents first and only
dive into the btree if nothing was found. This removes the need for
extra locking and is more reliable.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
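
The lookup order described in this commit message can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not the kernel code: `lookup_csum`, the two dictionaries, and the placeholder value `0` are hypothetical stand-ins for the ordered extent records and the checksum btree.

```python
from typing import Dict, Optional


def lookup_csum(offset: int,
                ordered_sums: Dict[int, int],
                tree_sums: Dict[int, int]) -> Optional[int]:
    """Check in-flight ordered extents first; only then consult the tree.

    Both dictionaries map a file offset to a checksum and are hypothetical
    stand-ins for the real data structures.
    """
    if offset in ordered_sums:
        return ordered_sums[offset]
    return tree_sums.get(offset)


# A pre-inserted tree item that has not been filled in yet (placeholder 0),
# while the real checksum still lives in the ordered extent.
tree_sums = {4096: 0}
ordered_sums = {4096: 0xDEADBEEF}
csum = lookup_csum(4096, ordered_sums, tree_sums)
```

Searching the tree first would have returned the unfilled placeholder in this scenario, which is the failure mode the commit message describes.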

Btrfs: Index extent buffers in an rbtree · 6af118ce

Committed by Chris Mason on Jul 22, 2008

Before, extent buffers were a temporary object, meant to map a number of pages
at once and collect operations on them.

But a few extra fields have crept in, and extent buffers are also the best
place to store a per-tree-block lock field.  This commit puts the extent
buffers into an rbtree, and ensures a single extent buffer for each
tree block.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
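
The "single extent buffer for each tree block" guarantee can be sketched in Python. This is a simplified model, not the kernel implementation: `ExtentBuffer`, `BufferIndex`, and `find_or_create` are hypothetical names, and a dictionary keyed by the block's start offset stands in for the rbtree.

```python
import threading
from typing import Dict


class ExtentBuffer:
    """One object per tree block, so it can carry that block's lock."""

    def __init__(self, start: int, length: int) -> None:
        self.start = start
        self.length = length
        self.lock = threading.Lock()  # per-tree-block lock


class BufferIndex:
    """Index of extent buffers keyed by start offset (dict replaces rbtree)."""

    def __init__(self) -> None:
        self._by_start: Dict[int, ExtentBuffer] = {}
        self._mutex = threading.Lock()  # protects the index itself

    def find_or_create(self, start: int, length: int) -> ExtentBuffer:
        with self._mutex:
            eb = self._by_start.get(start)
            if eb is None:
                eb = ExtentBuffer(start, length)
                self._by_start[start] = eb
            return eb


index = BufferIndex()
first = index.find_or_create(16384, 4096)
second = index.find_or_create(16384, 4096)
```

Because both lookups return the same object, a lock stored on the buffer is shared by every user of that tree block.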

btrfs_start_transaction: wait for commits in progress to finish · f9295749

Committed by Chris Mason on Jul 17, 2008

btrfs_commit_transaction has to loop waiting for any writers in the
transaction to finish before it can proceed.  btrfs_start_transaction
should be polite and not join a transaction that is in the process
of being finished off.

There are a few places that can't wait, basically the ones doing IO that
might be needed to finish the transaction.  For them, btrfs_join_transaction
is added.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Use async helpers to deal with pages that have been improperly dirtied · 247e743c

Committed by Chris Mason on Jul 17, 2008

Higher layers sometimes call set_page_dirty without asking the filesystem
to help. This causes many problems for the data=ordered and cow code.
This commit detects pages that haven't been properly set up for IO and
kicks off an async helper to deal with them.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: New data=ordered implementation · e6dcd2dc

Committed by Chris Mason on Jul 17, 2008

The old data=ordered code would force commit to wait until
all the data extents from the transaction were fully on disk.  This
introduced large latencies into the commit and stalled new writers
in the transaction for a long time.

The new code changes the way data allocations and extents work:

* When delayed allocation is filled, data extents are reserved, and
  the extent bit EXTENT_ORDERED is set on the entire range of the extent.
  A struct btrfs_ordered_extent is allocated and inserted into a per-inode
  rbtree to track the pending extents.

* As each page is written, EXTENT_ORDERED is cleared on the bytes corresponding
  to that page.

* When all of the bytes corresponding to a single struct btrfs_ordered_extent
  are written, the previously reserved extent is inserted into the FS
  btree and into the extent allocation trees.  The checksums for the file
  data are also updated.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
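
The three steps listed in this commit message can be modelled in a few lines of Python. This is a sketch under stated assumptions and not the kernel code: `OrderedExtent`, `InodeModel`, `fill_delalloc`, and `page_written` are hypothetical names, a dictionary stands in for the per-inode rbtree, and a byte counter stands in for the EXTENT_ORDERED bits.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OrderedExtent:
    start: int        # file offset of the reserved extent
    length: int       # total bytes covered by the extent
    bytes_left: int   # bytes still marked "ordered" (not yet written)


class InodeModel:
    def __init__(self) -> None:
        # Pending extents, keyed by start offset (dict replaces the rbtree).
        self.ordered: Dict[int, OrderedExtent] = {}
        # Extents that have been inserted into the FS tree.
        self.fs_tree: List[Tuple[int, int]] = []

    def fill_delalloc(self, start: int, length: int) -> None:
        """Step 1: reserve the extent and start tracking it."""
        self.ordered[start] = OrderedExtent(start, length, length)

    def page_written(self, extent_start: int, nbytes: int) -> bool:
        """Steps 2 and 3: clear written bytes; finish the extent at zero."""
        ext = self.ordered[extent_start]
        if nbytes > ext.bytes_left:
            raise ValueError("wrote more bytes than remain in the extent")
        ext.bytes_left -= nbytes
        if ext.bytes_left:
            return False
        del self.ordered[extent_start]
        self.fs_tree.append((ext.start, ext.length))
        return True


inode = InodeModel()
inode.fill_delalloc(0, 8192)
done_after_first_page = inode.page_written(0, 4096)
done_after_second_page = inode.page_written(0, 4096)
```

In this model the extent only becomes visible in `fs_tree` once its last page has been written, so a commit never has to wait for data that is still in flight.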

Btrfs: Drop some verbose printks · 77a41afb

Committed by Chris Mason on Jul 08, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add locking around volume management (device add/remove/balance) · 7d9eb12c

Committed by Chris Mason on Jul 08, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Online btree defragmentation fixes · 3f157a2f

Committed by Chris Mason on Jun 25, 2008

The btree defragger wasn't making forward progress because the new key wasn't
being saved by the btrfs_search_forward function.

This also disables the automatic btree defrag, which wasn't scaling well to
huge filesystems. The auto-defrag needs to be done differently.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Replace the transaction work queue with kthreads · a74a4b97

Committed by Chris Mason on Jun 25, 2008

This creates one kthread for commits and one kthread for
deleting old snapshots.  All the work queues are removed.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Add btrfs_end_transaction_throttle to force writers to wait for pending commits · 89ce8a63

Committed by Chris Mason on Jun 25, 2008

The existing throttle mechanism was often not sufficient to prevent
new writers from coming in and making a given transaction run forever.
This adds an explicit wait at the end of most operations so they will
allow the current transaction to close.

There is no wait inside file_write, inode updates, or cow filling, all of which
have different deadlock possibilities.

This is a temporary measure until better asynchronous commit support is
added.  This code leads to stalls as it waits for data=ordered
writeback, and it really needs to be fixed.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Fix snapshot deletion to release the alloc_mutex much more often. · 333db94c

Committed by Chris Mason on Jun 25, 2008

This lowers the impact of snapshot deletion on the rest of the FS.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Drop locks in btrfs_search_slot when reading a tree block. · 051e1b9f

Committed by Chris Mason on Jun 25, 2008

One lock per btree block can make for significant congestion if everyone
has to wait for IO at the high levels of the btree. This drops
locks held by a path when doing reads during a tree search.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Replace the big fs_mutex with a collection of other locks · a2135011

Committed by Chris Mason on Jun 25, 2008

Extent allocations are still protected by a large alloc_mutex.
Objectid allocations are covered by an objectid mutex.
Other btree operations are protected by a lock on individual btree nodes.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Start btree concurrency work. · 925baedd

Committed by Chris Mason on Jun 25, 2008

The allocation trees and the chunk trees are serialized via their own
dedicated mutexes.  This means allocation location is still not very
fine grained.

The main FS btree is protected by locks on each block in the btree.  Locks
are taken top / down, and as processing finishes on a given level of the
tree, the lock is released after locking the lower level.

The end result of a search is now a path where only the lowest level
is locked.  Releasing or freeing the path drops any locks held.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
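
The top-down locking described in this commit message (often called lock coupling or hand-over-hand locking) can be sketched in Python. This is a toy model and not the kernel code: `Node`, `search`, and `release_path` are hypothetical names, and the assumption that `keys[i]` in an interior node is the smallest key under `children[i]` is made only for the example.

```python
import threading
from bisect import bisect_right
from typing import List, Optional


class Node:
    def __init__(self, keys: List[int],
                 children: Optional[List["Node"]] = None) -> None:
        self.keys = keys                  # sorted keys
        self.children = children or []    # empty list means this is a leaf
        self.lock = threading.Lock()      # one lock per tree block


def search(root: Node, key: int) -> Node:
    """Walk down to the leaf for `key`; only the leaf is locked on return."""
    node = root
    node.lock.acquire()
    while node.children:
        idx = max(bisect_right(node.keys, key) - 1, 0)
        child = node.children[idx]
        child.lock.acquire()    # lock the lower level first...
        node.lock.release()     # ...then release the level above it
        node = child
    return node


def release_path(leaf: Node) -> None:
    """Releasing the path drops the one lock still held."""
    leaf.lock.release()


left = Node([1, 5])
right = Node([10, 20])
root = Node([1, 10], [left, right])
leaf = search(root, 12)
```

Taking the child lock before dropping the parent lock means no other walker can slip between the two levels while the search is in transit.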

Btrfs: Add a thread pool just for submit_bio · 1cc127b5

Committed by Chris Mason on Jun 12, 2008

If a bio submission is queued on the work queue behind a lock holder
that is waiting for that bio, it is possible to deadlock.  Move the bios
into their own pool.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add a mount option to control worker thread pool size · 4543df7e

Committed by Chris Mason on Jun 11, 2008

mount -o thread_pool_size changes the default, which is
min(num_cpus + 2, 8).  Larger thread pools would make more sense on
very large disk arrays.

This mount option controls the max size of each thread pool.  There
are multiple thread pools, so the total worker count will be larger
than the mount option.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
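
The default stated in this commit message, min(num_cpus + 2, 8), can be checked with a short Python sketch. The function names and the `thread_pool_size` parameter are hypothetical; only the formula and the fact that the mount option replaces the default come from the message.

```python
from typing import Optional


def default_thread_pool_size(num_cpus: int) -> int:
    """Default per-pool limit from the commit message: min(num_cpus + 2, 8)."""
    return min(num_cpus + 2, 8)


def effective_pool_size(num_cpus: int,
                        thread_pool_size: Optional[int] = None) -> int:
    """Per-pool maximum; the mount option, when given, replaces the default."""
    if thread_pool_size is not None:
        return thread_pool_size
    return default_thread_pool_size(num_cpus)


small_machine = effective_pool_size(2)          # 2 CPUs, no mount option
large_machine = effective_pool_size(16)         # capped by the default
large_array = effective_pool_size(16, 32)       # explicit mount option
```

Since the limit applies to each pool separately, the total number of workers is this value multiplied by however many pools exist.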

Btrfs: Add async worker threads for pre and post IO checksumming · 8b712842

Committed by Chris Mason on Jun 11, 2008

Btrfs has been using workqueues to spread the checksumming load across
other CPUs in the system.  But, workqueues only schedule work on the
same CPU that queued the work, giving them a limited benefit for systems with
higher CPU counts.

This code adds a generic facility to schedule work with pools of kthreads,
and changes the bio submission code to queue bios up.  The queueing is
important to make sure large numbers of procs on the system don't
turn streaming workloads into random workloads by sending IO down
concurrently.

The end result of all of this is much higher performance (and CPU usage) when
doing checksumming on large machines.  Two worker pools are created,
one for writes and one for endio processing.  The two could deadlock if
we tried to service both from a single pool.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: sanity mount option parsing and early mount code · edf24abe

Committed by Christoph Hellwig on Jun 10, 2008

Also adds lots of comments to describe what's going on here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: bdi_init and bdi_destroy come with 2.6.23 · 51ebc0d3

Committed by Jan Engelhardt on Jun 09, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Always use the async submission queue for checksummed writes · da496f2a

Committed by Chris Mason on May 27, 2008

This avoids IO stalls and poorly ordered IO from inline writers mixing in
with the async submission queue.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Enable btree balancing on old kernels again · 1c8cfcc1

Committed by Chris Mason on May 16, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Change the congestion functions to meter the number of async submits as well · cb03c743

Committed by Chris Mason on May 15, 2008

The async submit workqueue was absorbing too many requests, leading to long
stalls in the async submitters.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Fix btrfs_open_devices to deal with changes since the scan ioctls · a0af469b

Committed by Chris Mason on May 13, 2008

Devices can change after the scan ioctls are done, and btrfs_open_devices
needs to be able to verify them as they are opened and used by the FS.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add mount -o degraded to allow mounts to continue with missing devices · dfe25020

Committed by Chris Mason on May 13, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Handle write errors on raid1 and raid10 · 1259ab75

Committed by Chris Mason on May 12, 2008

When duplicate copies exist, writes are allowed to fail to one of those
copies.  This changeset includes a few changes that allow the FS to
continue even when some IOs fail.

It also adds verification of the parent generation number for btree blocks.
This generation is stored in the pointer to a block, and it ensures
that missed writes are detected.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
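
The parent generation check described in this commit message can be sketched in Python. This is a simplified model, not the kernel code: `BlockPointer`, `TreeBlock`, `generation_matches`, and `read_verified` are hypothetical names, and trying the copies in order until one passes is an assumed fallback policy chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BlockPointer:
    bytenr: int       # location of the child block
    generation: int   # generation the parent recorded for the child


@dataclass
class TreeBlock:
    bytenr: int
    generation: int   # generation found in the block that was read


def generation_matches(ptr: BlockPointer, block: TreeBlock) -> bool:
    """A block from a different generation means a write was missed."""
    return block.bytenr == ptr.bytenr and block.generation == ptr.generation


def read_verified(ptr: BlockPointer,
                  copies: Iterable[Optional[TreeBlock]]) -> TreeBlock:
    """Return the first copy that passes the check (assumed policy).

    A copy of None models a mirror whose read failed outright.
    """
    for block in copies:
        if block is not None and generation_matches(ptr, block):
            return block
    raise OSError(f"no valid copy of block {ptr.bytenr}")


ptr = BlockPointer(bytenr=4096, generation=7)
stale = TreeBlock(bytenr=4096, generation=5)   # this mirror missed a write
good = TreeBlock(bytenr=4096, generation=7)
result = read_verified(ptr, [stale, good])
```

Because the expected generation lives in the parent's pointer rather than in the block itself, a stale copy cannot vouch for its own validity.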

Btrfs: Pass down the expected generation number when reading tree blocks · ca7a79ad

Committed by Chris Mason on May 12, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Don't do btree balance_dirty_pages on old kernels, it stalls forever · 188de649

Committed by Chris Mason on May 09, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add support for online device removal · a061fc8d

Committed by Chris Mason on May 07, 2008

This required a few structural changes to the code that manages bdev pointers:

The VFS super block now gets an anon-bdev instead of a pointer to the
lowest bdev.  This allows us to avoid swapping the super block bdev pointer
around at run time.

The code to read in the super block no longer goes through the extent
buffer interface.  Things got ugly keeping the mapping constant.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Fixes for 2.6.18 enterprise kernels · d6bfde87

Committed by Chris Mason on Apr 30, 2008

2.6.18 seems to get caught in an infinite loop when
cancel_rearming_delayed_workqueue is called more than once, so this switches
to cancel_delayed_work, which is arguably more correct.

Also, balance_dirty_pages can run into problems with 2.6.18 based kernels
because it doesn't have the per-bdi dirty limits. This avoids calling
balance_dirty_pages on the btree inode unless there is actually something
to balance, which is a good optimization in general.

Finally, there's a compile fix for ordered-data.h.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Deal with failed writes in mirrored configurations · a236aed1

Committed by Chris Mason on Apr 29, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Drop some verbose printks · 4235298e

Committed by Chris Mason on Apr 28, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Make the resizer work based on shrinking and growing devices · 8f18cf13

Committed by Chris Mason on Apr 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add failure handling for read_sys_array · 84eed90f

Committed by Chris Mason on Apr 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Fix the unplug_io_fn to grab a consistent copy of page->mapping · bcbfce8a

Committed by Chris Mason on Apr 22, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Deal with page == NULL in the btrfs_unplug_io_fn · 38b66988

Committed by Chris Mason on Apr 22, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Make an unplug function that doesn't unplug every spindle · f2d8d74d

Committed by Chris Mason on Apr 21, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Remove debugging statements from the invalidatepage calls · 4ef64eae

Committed by Chris Mason on Apr 21, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>