Commits · e7a84565bcdb239caad29ccbe559ef978090ac7e · openanolis / cloud-kernel

25 Sep, 2008 · 40 commits

Btrfs: Add btree locking to the tree defragmentation code · e7a84565

Authored by Chris Mason on Jun 25, 2008

The online btree defragger is simplified and rewritten to use
standard btree searches instead of a walk up / down mechanism.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e7a84565

Btrfs: Replace the transaction work queue with kthreads · a74a4b97

Authored by Chris Mason on Jun 25, 2008

This creates one kthread for commits and one kthread for
deleting old snapshots.  All the work queues are removed.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a74a4b97

Btrfs: Add a skip_locking parameter to struct path, and make various funcs honor it · 5cd57b2c

Authored by Chris Mason on Jun 25, 2008

Allocations may need to read in block groups from the extent allocation tree,
which will require a tree search and take locks on the extent allocation
tree.  But, those locks might already be held in other places, leading
to deadlocks.

Since the alloc_mutex serializes everything right now, it is safe to
skip the btree locking while caching block groups.  A better fix will be
to either create a recursive lock or find a way to back off existing
locks while caching block groups.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5cd57b2c

Drop locks in btrfs_search_slot when reading a tree block. · 051e1b9f

Authored by Chris Mason on Jun 25, 2008

One lock per btree block can make for significant congestion if everyone
has to wait for IO at the high levels of the btree. This drops
locks held by a path when doing reads during a tree search.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

051e1b9f

Btrfs: Replace the big fs_mutex with a collection of other locks · a2135011

Authored by Chris Mason on Jun 25, 2008

Extent allocations are still protected by a large alloc_mutex.
Objectid allocations are covered by an objectid mutex.
Other btree operations are protected by a lock on individual btree nodes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a2135011

Btrfs: Start btree concurrency work. · 925baedd

Authored by Chris Mason on Jun 25, 2008

The allocation trees and the chunk trees are serialized via their own
dedicated mutexes.  This means allocation location is still not very
fine grained.

The main FS btree is protected by locks on each block in the btree.  Locks
are taken top / down, and as processing finishes on a given level of the
tree, the lock is released after locking the lower level.

The end result of a search is now a path where only the lowest level
is locked.  Releasing or freeing the path drops any locks held.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

925baedd
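
The top-down locking described in this commit resembles hand-over-hand lock coupling. The following is a minimal userspace sketch of that idea, assuming POSIX mutexes and a toy tree with a single child per level. `toy_node`, `toy_search`, and `toy_release_path` are hypothetical names chosen for illustration and are not Btrfs functions.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical toy node: one lock per block, one child per level. */
struct toy_node {
	pthread_mutex_t lock;
	struct toy_node *child;	/* NULL at the lowest level */
	int level;
};

/*
 * Walk from the root to the lowest level. The child is locked before
 * the parent is released, so only the lowest node is locked on return.
 */
static struct toy_node *toy_search(struct toy_node *root)
{
	struct toy_node *cur = root;

	pthread_mutex_lock(&cur->lock);
	while (cur->child != NULL) {
		struct toy_node *next = cur->child;

		pthread_mutex_lock(&next->lock);	/* take the lower level first */
		pthread_mutex_unlock(&cur->lock);	/* then drop the upper level */
		cur = next;
	}
	return cur;
}

/* Releasing the "path" drops the one lock that is still held. */
static void toy_release_path(struct toy_node *lowest)
{
	pthread_mutex_unlock(&lowest->lock);
}
```

Taking the lower lock before dropping the upper one means another thread cannot slip between the two levels during the descent.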

Btrfs: Add a thread pool just for submit_bio · 1cc127b5

Authored by Chris Mason on Jun 12, 2008

If a bio submission is after a lock holder waiting for the bio
on the work queue, it is possible to deadlock.  Move the bios
into their own pool.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1cc127b5

Btrfs: split out ioctl.c · f46b5a66

Authored by Christoph Hellwig on Jun 11, 2008

Split the ioctl handling out of inode.c into a file of its own.
Also fix up checkpatch.pl warnings for the moved code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f46b5a66

Btrfs: Add a mount option to control worker thread pool size · 4543df7e

Authored by Chris Mason on Jun 11, 2008

mount -o thread_pool_size changes the default, which is
min(num_cpus + 2, 8).  Larger thread pools would make more sense on
very large disk arrays.

This mount option controls the max size of each thread pool.  There
are multiple thread pools, so the total worker count will be larger
than the mount option.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4543df7e
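
The default quoted above can be worked through with a small sketch. The helper names below are hypothetical and only mirror the arithmetic in the commit message (`min(num_cpus + 2, 8)` per pool, with several pools each capped separately); they are not the kernel's own functions.

```c
/* Hypothetical helper mirroring the default quoted in the commit
 * message: min(num_cpus + 2, 8). Not the kernel's own function. */
static int toy_default_pool_size(int num_cpus)
{
	int size = num_cpus + 2;

	return size < 8 ? size : 8;
}

/* The option caps each pool separately, so with several pools the
 * total worker count can reach pool_size * num_pools. */
static int toy_max_total_workers(int pool_size, int num_pools)
{
	return pool_size * num_pools;
}
```

Under these assumptions a 4-CPU machine gets pools of up to 6 workers each, and any machine with 6 or more CPUs hits the cap of 8 per pool.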

Btrfs: Add async worker threads for pre and post IO checksumming · 8b712842

Authored by Chris Mason on Jun 11, 2008

Btrfs has been using workqueues to spread the checksumming load across
other CPUs in the system.  But, workqueues only schedule work on the
same CPU that queued the work, giving them a limited benefit for systems with
higher CPU counts.

This code adds a generic facility to schedule work with pools of kthreads,
and changes the bio submission code to queue bios up.  The queueing is
important to make sure large numbers of procs on the system don't
turn streaming workloads into random workloads by sending IO down
concurrently.

The end result of all of this is much higher performance (and CPU usage) when
doing checksumming on large machines.  Two worker pools are created,
one for writes and one for endio processing.  The two could deadlock if
we tried to service both from a single pool.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8b712842

btrfs: sanity mount option parsing and early mount code · edf24abe

Authored by Christoph Hellwig on Jun 10, 2008

Also adds lots of comments to describe what's going on here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

edf24abe

Btrfs: transaction ioctls · 6bf13c0c

Authored by Sage Weil on Jun 10, 2008

These ioctls let a user application hold a transaction open while it
performs a series of operations.  A final ioctl does a sync on the fs
(closing the current transaction).  This is the main requirement for
Ceph's OSD to be able to keep the data it's storing in a btrfs volume
consistent, and AFAICS it works just fine.  The application would do
something like

	fd = ::open("some/file", O_RDONLY);
	::ioctl(fd, BTRFS_IOC_TRANS_START);
	/* do a bunch of stuff */
	::ioctl(fd, BTRFS_IOC_TRANS_END);
or just
	::close(fd);

And to ensure it commits to disk,

	::ioctl(fd, BTRFS_IOC_SYNC);

When a transaction is held open, the trans_handle is attached to the
struct file (via private_data) so that it will get cleaned up if the
process dies unexpectedly.  A held transaction is also ended on fsync() to
avoid a deadlock.

A misbehaving application could also deliberately hold a transaction open,
effectively locking up the FS, so it may make sense to restrict something
like this to root or something.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

6bf13c0c

Btrfs: Invalidate dcache entry after creating snapshot and · 3b96362c

Authored by Sven Wegener on Jun 09, 2008

We need to invalidate an existing dcache entry after creating a new
snapshot or subvolume, because a negative dcache entry will stop us from
accessing the new snapshot or subvolume.

---
  ctree.h       |   23 +++++++++++++++++++++++
  inode.c       |    4 ++++
  transaction.c |    4 ++++
  3 files changed, 31 insertions(+)
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3b96362c

Btrfs: Allocator fix variety pack · 0ef3e66b

Authored by Chris Mason on May 24, 2008

* Force chunk allocation when find_free_extent has to do a full scan
* Record the max key at the start of defrag so it doesn't run forever
* Block groups might not be contiguous, make a forward search for the
  next block group in extent-tree.c
* Get rid of extra checks for total fs size
* Fix relocate_one_reference to avoid relocating the same file data block
  twice when referenced by an older transaction
* Use the open device count when allocating chunks so that we don't
  try to allocate from devices that don't exist
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0ef3e66b

Btrfs: Change the congestion functions to meter the number of async submits as well · cb03c743

Authored by Chris Mason on May 15, 2008

The async submit workqueue was absorbing too many requests, leading to long
stalls where the async submitters were stalling.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

cb03c743

Btrfs: Add mount -o degraded to allow mounts to continue with missing devices · dfe25020

Authored by Chris Mason on May 13, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

dfe25020

Btrfs: Update nodatacow mode to support cloned single files and resizing · a68d5933

Authored by Chris Mason on May 08, 2008

Before, nodatacow only checked to make sure multiple roots didn't have
references on a single extent.  This check makes sure that multiple
inodes don't have references.

nodatacow needed an extra check to see if the block group was currently
readonly.  This way cows forced by the chunk relocation code are honored.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a68d5933

Btrfs: Properly find the root for snapshotted blocks during chunk relocation · bf4ef679

Authored by Chris Mason on May 08, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

bf4ef679

Btrfs: Add support for online device removal · a061fc8d

Authored by Chris Mason on May 07, 2008

This required a few structural changes to the code that manages bdev pointers:

The VFS super block now gets an anon-bdev instead of a pointer to the
lowest bdev.  This allows us to avoid swapping the super block bdev pointer
around at run time.

The code to read in the super block no longer goes through the extent
buffer interface.  Things got ugly keeping the mapping constant.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a061fc8d

Btrfs: Clone file data ioctl · f2eb0a24

Authored by Sage Weil on May 02, 2008

Add a new ioctl to clone file data.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f2eb0a24

Btrfs: Add balance ioctl to restripe the chunks · ec44a35c

Authored by Chris Mason on Apr 28, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

ec44a35c

Btrfs: Add new ioctl to add devices · 788f20eb

Authored by Chris Mason on Apr 28, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

788f20eb

Btrfs: Make the resizer work based on shrinking and growing devices · 8f18cf13

Authored by Chris Mason on Apr 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

8f18cf13

Btrfs: Add support for labels in the super block · 7ae9c09d

Authored by Chris Mason on Apr 18, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

7ae9c09d

Btrfs: Check device uuids along with devids · a443755f

Authored by Chris Mason on Apr 18, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

a443755f

Btrfs: Write bio checksumming outside the FS mutex · e015640f

Authored by Chris Mason on Apr 16, 2008

This significantly improves streaming write performance by allowing
concurrency in the data checksumming.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e015640f

Btrfs: Create a work queue for bio writes · 44b8bd7e

Authored by Chris Mason on Apr 16, 2008

This allows checksumming to happen in parallel among many cpus, and
keeps us from bogging down pdflush with the checksumming code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

44b8bd7e

Btrfs: Add RAID10 support · 321aecc6

Authored by Chris Mason on Apr 16, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

321aecc6

Btrfs: Add chunk uuids and update multi-device back references · e17cade2

Authored by Chris Mason on Apr 15, 2008

Block headers now store the chunk tree uuid

Chunk items record the device uuid for each stripe

Device extent items record better back refs to the chunk tree

Block groups record better back refs to the chunk tree

The chunk tree format has also changed.  The objectid of BTRFS_CHUNK_ITEM_KEY
used to be the logical offset of the chunk.  Now it is a chunk tree id,
with the logical offset being stored in the offset field of the key.

This allows a single chunk tree to record multiple logical address spaces,
upping the number of bytes indexed by a chunk tree from 2^64 to
2^128.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e17cade2
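
The key layout change can be illustrated with a small sketch. This assumes a generic (objectid, type, offset) key with two 64-bit fields, which is where the 2^64 x 2^64 = 2^128 figure comes from. `toy_key`, `toy_chunk_key`, `toy_key_equal`, and the `TOY_CHUNK_ITEM_KEY` value are hypothetical and do not reproduce the on-disk definitions.

```c
#include <stdint.h>

/* Hypothetical generic key: two 64-bit fields plus a type byte. */
struct toy_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Placeholder value for illustration only; not the real constant. */
#define TOY_CHUNK_ITEM_KEY 1

/*
 * New layout from the commit message: objectid carries a chunk tree
 * id and the chunk's logical offset moves into the offset field.
 */
static struct toy_key toy_chunk_key(uint64_t chunk_tree_id, uint64_t logical)
{
	struct toy_key key;

	key.objectid = chunk_tree_id;
	key.type = TOY_CHUNK_ITEM_KEY;
	key.offset = logical;
	return key;
}

/* Two keys match only if every field matches. */
static int toy_key_equal(struct toy_key a, struct toy_key b)
{
	return a.objectid == b.objectid && a.type == b.type &&
	       a.offset == b.offset;
}
```

With this layout, the same logical offset in two different address spaces yields two distinct keys, so both can live in one chunk tree.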

Add a min size parameter to btrfs_alloc_extent · 98d20f67

Authored by Chris Mason on Apr 14, 2008

On huge machines, delayed allocation may try to allocate massive extents.
This change allows btrfs_alloc_extent to return something smaller than
the caller asked for, and the data allocation routines will loop over
the allocations until they fill the whole delayed alloc.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

98d20f67
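
The loop described in this commit can be sketched as follows. This is a toy model under stated assumptions, not the Btrfs code: `toy_alloc_extent` stands in for an allocator that may return less than requested but never less than the minimum, and `max_extent` is an invented parameter that models how much contiguous space a single allocation can provide.

```c
#include <stdint.h>

/*
 * Hypothetical allocator stand-in: returns at most max_extent bytes,
 * or 0 if it cannot provide at least min_size bytes.
 */
static uint64_t toy_alloc_extent(uint64_t num_bytes, uint64_t min_size,
				 uint64_t max_extent)
{
	uint64_t got = num_bytes < max_extent ? num_bytes : max_extent;

	if (got < min_size)
		return 0;
	return got;
}

/*
 * Caller side: keep allocating until the whole range is covered.
 * Returns 0 on success and -1 if an allocation fails; *extents
 * counts how many extents were needed.
 */
static int toy_fill_delalloc(uint64_t total, uint64_t min_size,
			     uint64_t max_extent, unsigned int *extents)
{
	uint64_t remaining = total;

	*extents = 0;
	while (remaining > 0) {
		/* The tail of the range may be smaller than min_size. */
		uint64_t want_min = remaining < min_size ? remaining : min_size;
		uint64_t got = toy_alloc_extent(remaining, want_min, max_extent);

		if (got == 0)
			return -1;
		remaining -= got;
		(*extents)++;
	}
	return 0;
}
```

In this model a 10 MiB delayed allocation with a 4 MiB cap per extent is satisfied by three allocations of 4 MiB, 4 MiB, and 2 MiB.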

Btrfs: Do metadata checksums for reads via a workqueue · ce9adaa5

Authored by Chris Mason on Apr 09, 2008

Before, metadata checksumming was done by the callers of read_tree_block,
which would set EXTENT_CSUM bits in the extent tree to show that a given
range of pages was already checksummed and didn't need to be verified
again.

But, those bits could go away via try_to_releasepage, and the end
result was bogus checksum failures on pages that never left the cache.

The new code validates checksums when the page is read.  It is a little
tricky because metadata blocks can span pages and a single read may
end up going via multiple bios.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ce9adaa5

Btrfs: Fix allocation profile init · d18a2c44

Authored by Chris Mason on Apr 04, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

d18a2c44

Btrfs: Add support for duplicate blocks on a single spindle · 611f0e00

Authored by Chris Mason on Apr 03, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

611f0e00

Btrfs: Add support for mirroring across drives · 8790d502

Authored by Chris Mason on Apr 03, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

8790d502

Reorder the flags field in struct btrfs_header and record a flag on writeout · 63b10fc4

Authored by Chris Mason on Apr 01, 2008

This allows detection of blocks that have already been written in the
running transaction so they can be recowed instead of modified again.
It is step one in trusting the transid field of the block pointers.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

63b10fc4

Create a btrfs backing dev info · 04160088

Authored by Chris Mason on Mar 26, 2008

This allows intelligent versions of the unplug and congestion functions.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

04160088

Btrfs: Implement raid0 when multiple devices are present · 593060d7

Authored by Chris Mason on Mar 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

593060d7

Btrfs: Add support for device scanning and detection ioctls · 8a4b83cc

Authored by Chris Mason on Mar 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

8a4b83cc

Btrfs: Bring back mount -o ssd optimizations · 239b14b3

Authored by Chris Mason on Mar 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

239b14b3

Btrfs: Move device information into the super block so it can be scanned · 0d81ba5d

Authored by Chris Mason on Mar 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

0d81ba5d
