Commits · a213501153fd66e2359e091b1612841305ba6551 · openanolis / cloud-kernel

25 Sep, 2008 40 commits

Btrfs: Replace the big fs_mutex with a collection of other locks · a2135011

Authored by Chris Mason on Jun 25, 2008

Extent allocations are still protected by a large alloc_mutex.
Objectid allocations are covered by an objectid mutex.
Other btree operations are protected by a lock on individual btree nodes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a2135011

Btrfs: Start btree concurrency work. · 925baedd

Authored by Chris Mason on Jun 25, 2008

The allocation trees and the chunk trees are serialized via their own
dedicated mutexes.  This means allocation location is still not very
fine-grained.

The main FS btree is protected by locks on each block in the btree.  Locks
are taken top-down, and as processing finishes on a given level of the
tree, the lock is released after locking the lower level.

The end result of a search is now a path where only the lowest level
is locked.  Releasing or freeing the path drops any locks held.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

925baedd
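
The top-down locking described in this commit message resembles what is often called lock coupling. Below is a minimal Python sketch of that idea, not the kernel code: `Node`, `Path`, and `search` are hypothetical names, and a `threading.Lock` stands in for the per-block lock.

```python
import bisect
import threading


class Node:
    """Hypothetical stand-in for a btree block that carries its own lock."""

    def __init__(self, keys, children=None):
        self.keys = keys                # sorted; keys[i] is the smallest key under children[i]
        self.children = children or []  # empty list means this node is a leaf
        self.lock = threading.Lock()


class Path:
    """Nodes visited by a search, plus the locks the search still holds."""

    def __init__(self):
        self.nodes = []    # root first, leaf last
        self.held = set()  # ids of nodes whose lock is still held by this path

    def release(self):
        # Releasing the path drops any locks still held.
        for node in self.nodes:
            if id(node) in self.held:
                node.lock.release()
        self.nodes.clear()
        self.held.clear()


def search(root, key):
    """Walk top-down; lock the lower level first, then release the upper one."""
    path = Path()
    node = root
    node.lock.acquire()
    path.held.add(id(node))
    while True:
        path.nodes.append(node)
        if not node.children:
            return path  # only the leaf is locked at this point
        idx = max(bisect.bisect_right(node.keys, key) - 1, 0)
        child = node.children[idx]
        child.lock.acquire()         # take the lower-level lock ...
        path.held.add(id(child))
        node.lock.release()          # ... before dropping the upper-level lock
        path.held.discard(id(node))
        node = child


# Usage: a two-level tree; after the search only the leaf stays locked.
leaf_a, leaf_b = Node([1, 3]), Node([5, 7])
root = Node([1, 5], [leaf_a, leaf_b])
path = search(root, 6)
print(root.lock.locked(), leaf_b.lock.locked())
path.release()
```

Locking the child before unlocking the parent means another thread cannot restructure the link between the two levels while the search is crossing it.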

Btrfs: Add a thread pool just for submit_bio · 1cc127b5

Authored by Chris Mason on Jun 12, 2008

If a bio submission is queued behind a lock holder that is waiting for
that bio on the same work queue, a deadlock is possible.  Move the bios
into their own pool.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1cc127b5

Btrfs: Add a mount option to control worker thread pool size · 4543df7e

Authored by Chris Mason on Jun 11, 2008

mount -o thread_pool_size changes the default, which is
min(num_cpus + 2, 8).  Larger thread pools would make more sense on
very large disk arrays.

This mount option controls the max size of each thread pool.  There
are multiple thread pools, so the total worker count will be larger
than the mount option.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4543df7e
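
The sizing rule in this commit message can be written out as a small calculation. This is an illustrative Python sketch: the function names and the `num_pools` parameter are assumptions, and only the formula `min(num_cpus + 2, 8)` comes from the message.

```python
def default_thread_pool_size(num_cpus):
    """Default per-pool size quoted in the commit message."""
    return min(num_cpus + 2, 8)


def thread_pool_size(num_cpus, mount_option=None):
    """A thread_pool_size mount option, when given, overrides the default."""
    if mount_option is not None:
        return mount_option
    return default_thread_pool_size(num_cpus)


def max_total_workers(pool_size, num_pools):
    """The option caps each pool, so the total scales with the pool count.

    num_pools is a hypothetical parameter; the message only says there
    are multiple pools.
    """
    return pool_size * num_pools


print(default_thread_pool_size(2))    # small machine
print(default_thread_pool_size(16))   # capped at 8
print(max_total_workers(thread_pool_size(16, mount_option=32), num_pools=3))
```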

Btrfs: Add async worker threads for pre and post IO checksumming · 8b712842

Authored by Chris Mason on Jun 11, 2008

Btrfs has been using workqueues to spread the checksumming load across
other CPUs in the system.  But, workqueues only schedule work on the
same CPU that queued the work, giving them a limited benefit for systems with
higher CPU counts.

This code adds a generic facility to schedule work with pools of kthreads,
and changes the bio submission code to queue bios up.  The queueing is
important to make sure large numbers of procs on the system don't
turn streaming workloads into random workloads by sending IO down
concurrently.

The end result of all of this is much higher performance (and CPU usage) when
doing checksumming on large machines.  Two worker pools are created,
one for writes and one for endio processing.  The two could deadlock if
we tried to service both from a single pool.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8b712842
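
To illustrate why a single pool could deadlock, here is a contrived Python sketch rather than the kernel implementation. It assumes a write worker that waits for end-IO processing of the same block; `zlib.crc32` is only a stand-in checksum (Btrfs uses crc32c), and all function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# One worker per pool keeps the example small. If both kinds of work
# shared a single one-worker pool, the nested submit below could never
# run, because the only worker would be blocked waiting for it.
write_pool = ThreadPoolExecutor(max_workers=1)
endio_pool = ThreadPoolExecutor(max_workers=1)


def endio(block, csum):
    """End-IO processing: verify the checksum of the completed block."""
    return zlib.crc32(block) == csum


def submit_write(block):
    """Write-side work: checksum the block, then wait for end-IO (assumed)."""
    csum = zlib.crc32(block)
    done = endio_pool.submit(endio, block, csum)
    return csum, done.result()  # safe only because endio has its own pool


def write_block(block):
    return write_pool.submit(submit_write, block).result()


csum, ok = write_block(b"some file data")
print(ok)
```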

btrfs: sanity mount option parsing and early mount code · edf24abe

Authored by Christoph Hellwig on Jun 10, 2008

Also adds lots of comments to describe what's going on here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

edf24abe

Btrfs: bdi_init and bdi_destroy come with 2.6.23 · 51ebc0d3

Authored by Jan Engelhardt on Jun 09, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

51ebc0d3

Btrfs: Always use the async submission queue for checksummed writes · da496f2a

Authored by Chris Mason on May 27, 2008

This avoids IO stalls and poorly ordered IO from inline writers mixing in
with the async submission queue.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

da496f2a

Btrfs: Enable btree balancing on old kernels again · 1c8cfcc1

Authored by Chris Mason on May 16, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

1c8cfcc1

Btrfs: Change the congestion functions to meter the number of async submits as well · cb03c743

Authored by Chris Mason on May 15, 2008

The async submit workqueue was absorbing too many requests, leading to long
stalls for the async submitters.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

cb03c743

Fix btrfs_open_devices to deal with changes since the scan ioctls · a0af469b

Authored by Chris Mason on May 13, 2008

Devices can change after the scan ioctls are done, and btrfs_open_devices
needs to be able to verify them as they are opened and used by the FS.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a0af469b

Btrfs: Add mount -o degraded to allow mounts to continue with missing devices · dfe25020

Authored by Chris Mason on May 13, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

dfe25020

Btrfs: Handle write errors on raid1 and raid10 · 1259ab75

Authored by Chris Mason on May 12, 2008

When duplicate copies exist, writes are allowed to fail to one of those
copies.  This changeset includes a few changes that allow the FS to
continue even when some IOs fail.

It also adds verification of the parent generation number for btree blocks.
This generation is stored in the pointer to a block, and it ensures
that missed writes are detected.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1259ab75
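
The parent generation check can be pictured as comparing the generation stored in a parent's pointer with the generation found in the block that is actually read. The Python sketch below models that idea with hypothetical types (`BlockPtr`, `TreeBlock`, `StaleBlockError`) and a dict standing in for the disk; it is not the on-disk format.

```python
from dataclasses import dataclass


@dataclass
class BlockPtr:
    """Pointer held by a parent node (hypothetical model)."""
    blocknr: int
    generation: int  # generation the parent expects the child to have


@dataclass
class TreeBlock:
    """Block as found on disk (hypothetical model)."""
    blocknr: int
    generation: int  # generation recorded in the block's own header


class StaleBlockError(Exception):
    pass


def read_tree_block(disk, ptr):
    """Return the block, or raise if its generation shows a missed write."""
    block = disk[ptr.blocknr]
    if block.generation != ptr.generation:
        raise StaleBlockError(
            f"block {ptr.blocknr}: expected generation {ptr.generation}, "
            f"found {block.generation}"
        )
    return block


# A mirror that missed the latest write still holds the older generation.
good_mirror = {7: TreeBlock(7, generation=42)}
stale_mirror = {7: TreeBlock(7, generation=41)}
ptr = BlockPtr(blocknr=7, generation=42)

print(read_tree_block(good_mirror, ptr).generation)
try:
    read_tree_block(stale_mirror, ptr)
except StaleBlockError as err:
    print("detected:", err)
```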

Btrfs: Pass down the expected generation number when reading tree blocks · ca7a79ad

Authored by Chris Mason on May 12, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

ca7a79ad

Btrfs: Don't do btree balance_dirty_pages on old kernels, it stalls forever · 188de649

Authored by Chris Mason on May 09, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

188de649

Btrfs: Add support for online device removal · a061fc8d

Authored by Chris Mason on May 07, 2008

This required a few structural changes to the code that manages bdev pointers:

The VFS super block now gets an anon-bdev instead of a pointer to the
lowest bdev.  This allows us to avoid swapping the super block bdev pointer
around at run time.

The code to read in the super block no longer goes through the extent
buffer interface.  Things got ugly keeping the mapping constant.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a061fc8d

Btrfs: Fixes for 2.6.18 enterprise kernels · d6bfde87

Authored by Chris Mason on Apr 30, 2008

2.6.18 seems to get caught in an infinite loop when
cancel_rearming_delayed_workqueue is called more than once, so this switches
to cancel_delayed_work, which is arguably more correct.

Also, balance_dirty_pages can run into problems with 2.6.18-based kernels
because they don't have the per-bdi dirty limits. This avoids calling
balance_dirty_pages on the btree inode unless there is actually something
to balance, which is a good optimization in general.

Finally, there's a compile fix for ordered-data.h.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d6bfde87

Btrfs: Deal with failed writes in mirrored configurations · a236aed1

Authored by Chris Mason on Apr 29, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

a236aed1

Btrfs: Drop some verbose printks · 4235298e

Authored by Chris Mason on Apr 28, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

4235298e

Btrfs: Make the resizer work based on shrinking and growing devices · 8f18cf13

Authored by Chris Mason on Apr 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

8f18cf13

Btrfs: Add failure handling for read_sys_array · 84eed90f

Authored by Chris Mason on Apr 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

84eed90f

Btrfs: Fix the unplug_io_fn to grab a consistent copy of page->mapping · bcbfce8a

Authored by Chris Mason on Apr 22, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

bcbfce8a

Deal with page == NULL in the btrfs_unplug_io_fn · 38b66988

Authored by Chris Mason on Apr 22, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

38b66988

Btrfs: Make an unplug function that doesn't unplug every spindle · f2d8d74d

Authored by Chris Mason on Apr 21, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

f2d8d74d

Btrfs: Remove debugging statements from the invalidatepage calls · 4ef64eae

Authored by Chris Mason on Apr 21, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

4ef64eae

Btrfs: Scale the bdi ra_pages by the number of devices in the FS · 4575c9cc

Authored by Chris Mason on Apr 18, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

4575c9cc

Force page->private removal in btrfs_invalidatepage · 9ad6b7bc

Authored by Chris Mason on Apr 18, 2008

btrfs_invalidatepage is not allowed to leave pages around on the lru.
Any such pages will trigger an oops later on because the VM will see
page->private and assume it is a buffer head.

This also forces extra flushes of the async work queues before
dropping all the pages on the btree inode during unmount.  Leftover
items on the work queues are one possible cause of busy state ranges
during truncate_inode_pages.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9ad6b7bc

Btrfs: Set the btree inode i_size to OFFSET_MAX · 0afbaf8c

Authored by Chris Mason on Apr 18, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

0afbaf8c

Btrfs: Don't drop extent_map cache during releasepage on the btree inode · 7b13b7b1

Authored by Chris Mason on Apr 18, 2008

The btree inode should only have a single extent_map in the cache,
so it doesn't make sense to ever drop it.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7b13b7b1

Btrfs: Only do async bio submission for pdflush · 7b859fe7

Authored by Chris Mason on Apr 16, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

7b859fe7

Btrfs: Create a work queue for bio writes · 44b8bd7e

Authored by Chris Mason on Apr 16, 2008

This allows checksumming to happen in parallel among many CPUs, and
keeps us from bogging down pdflush with the checksumming code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

44b8bd7e

Btrfs: Add chunk uuids and update multi-device back references · e17cade2

Authored by Chris Mason on Apr 15, 2008

Block headers now store the chunk tree uuid

Chunk items record the device uuid for each stripe

Device extent items record better back refs to the chunk tree

Block groups record better back refs to the chunk tree

The chunk tree format has also changed.  The objectid of BTRFS_CHUNK_ITEM_KEY
used to be the logical offset of the chunk.  Now it is a chunk tree id,
with the logical offset being stored in the offset field of the key.

This allows a single chunk tree to record multiple logical address spaces,
upping the number of bytes indexed by a chunk tree from 2^64 to
2^128.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e17cade2
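
The key layout change can be sketched with a three-field key. This Python model is illustrative only: `Key`, `chunk_key`, and the symbolic `CHUNK_ITEM_KEY` value are assumptions, and the real numeric key type is deliberately not reproduced.

```python
from collections import namedtuple

# Keys are modeled as (objectid, type, offset) and compared in that order.
Key = namedtuple("Key", ["objectid", "type", "offset"])

CHUNK_ITEM_KEY = "CHUNK_ITEM"  # symbolic placeholder, not the real constant
U64_LIMIT = 2 ** 64


def chunk_key(chunk_tree_id, logical_offset):
    """New layout: objectid is a chunk tree id, offset is the logical offset."""
    for value in (chunk_tree_id, logical_offset):
        if not 0 <= value < U64_LIMIT:
            raise ValueError("both fields must fit in 64 bits")
    return Key(chunk_tree_id, CHUNK_ITEM_KEY, logical_offset)


# With one 64-bit field for the logical offset, one tree indexes 2^64 bytes.
# Using two 64-bit fields gives 2^64 address spaces of 2^64 bytes each.
BYTES_INDEXED_OLD = U64_LIMIT
BYTES_INDEXED_NEW = U64_LIMIT * U64_LIMIT

keys = [chunk_key(2, 0), chunk_key(1, 1 << 30), chunk_key(1, 0)]
for key in sorted(keys):
    print(key)
print(BYTES_INDEXED_NEW == 2 ** 128)
```

Because tuples sort field by field, chunks of one address space stay adjacent and ordered by logical offset in this model.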

Btrfs: A few updates for 2.6.18 and versions older than 2.6.25 · b248a415

Authored by Chris Mason on Apr 14, 2008

This includes fixing a missing spinlock init call that caused an oops on mount
for most kernels other than 2.6.25.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b248a415

Btrfs: bio_endio support for linux 2.6.23 and older. · 73f61b2a

Authored by Miguel on Apr 11, 2008

bio_endio() changed its prototype in Linux 2.6.24; support older kernels
by using the older prototype.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

73f61b2a

Btrfs: Endianness bug fix for v0.13 with kernels · a5eb62e3

Authored by Miguel on Apr 11, 2008

Fix for an endianness bug when using btrfs v0.13 with kernels older than 2.6.23

Problem:

As of v0.13, btrfs-progs is using a crc32c.c equivalent to the one found in
linux-2.6.23/lib/libcrc32c.c. Since crc32c_le() changed in linux-2.6.23, when
running btrfs v0.13 with older kernels we have a mismatch between the versions
of crc32c_le() from btrfs-progs and libcrc32c in the kernel.  This mismatch
causes a bug when using btrfs on big endian machines.

Solution:
A btrfs_crc32c() macro that, when compiling for kernels older than 2.6.23, does
endianness conversion on the parameters and return value of crc32c().
This endianness conversion nullifies the differences in implementation
of crc32c_le().
On kernel 2.6.23 or later, it calls crc32c() directly.
Signed-off-by: Miguel Sousa Filipe <miguel.filipe@gmail.com>
---
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a5eb62e3

Btrfs: Add extra checks to avoid removing extent_state from pages we can't free · 3dd39914

Authored by Chris Mason on Apr 11, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

3dd39914

Btrfs: Write out all super blocks on commit, and bring back proper barrier support · f2984462

Authored by Chris Mason on Apr 10, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

f2984462

Btrfs: Retry metadata reads in the face of checksum failures · f188591e

Authored by Chris Mason on Apr 09, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

f188591e

Btrfs: Handle data block end_io through the async work queue · 22c59948

Authored by Chris Mason on Apr 09, 2008

Before, it was done by the bio end_io routine.  The work queue code is able
to scale much better with faster IO subsystems.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

22c59948

Btrfs: Do metadata checksums for reads via a workqueue · ce9adaa5

Authored by Chris Mason on Apr 09, 2008

Before, metadata checksumming was done by the callers of read_tree_block,
which would set EXTENT_CSUM bits in the extent tree to show that a given
range of pages was already checksummed and didn't need to be verified
again.

But, those bits could go away via try_to_releasepage, and the end
result was bogus checksum failures on pages that never left the cache.

The new code validates checksums when the page is read.  It is a little
tricky because metadata blocks can span pages and a single read may
end up going via multiple bios.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ce9adaa5
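
The tricky part mentioned in this commit message, a metadata block spanning several pages that may arrive through separate bios, can be modeled by deferring the checksum until every page is present. The Python sketch below is a simplified illustration: `PendingBlock` and `page_read` are hypothetical names, and `zlib.crc32` stands in for the real checksum.

```python
import zlib


class PendingBlock:
    """Tracks the pages of one metadata block while reads complete."""

    def __init__(self, num_pages, expected_csum):
        self.pages = [None] * num_pages
        self.expected_csum = expected_csum

    def page_read(self, index, data):
        """Record a completed page read.

        Returns None while pages are still outstanding; once the whole
        block is present, returns True or False for the checksum result.
        """
        self.pages[index] = data
        if any(page is None for page in self.pages):
            return None
        return zlib.crc32(b"".join(self.pages)) == self.expected_csum


# A two-page block whose pages complete out of order, as they might if
# they were carried by different bios.
page0, page1 = b"A" * 4096, b"B" * 4096
block = PendingBlock(num_pages=2, expected_csum=zlib.crc32(page0 + page1))
print(block.page_read(1, page1))  # still waiting for page 0
print(block.page_read(0, page0))  # whole block present, checksum verified
```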

openanolis / cloud-kernel synced successfully about 1 year ago