Commits · 26ce34a9c47334ff7984769e4661b2f1883594ff · openeuler / Kernel

30 Sep, 2008 1 commit

Btrfs: add and improve comments · d352ac68

Committed by Chris Mason on Sep 29, 2008

This improves the comments at the top of many functions.  It didn't
dive into the guts of functions because I was trying to
avoid merging problems with the new allocator and back reference work.

extent-tree.c and volumes.c were both skipped, and there is definitely
more work to do in cleaning and commenting the code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

26 Sep, 2008 2 commits

Btrfs: extent_map and data=ordered fixes for space balancing · 5b21f2ed

Committed by Zheng Yan on Sep 26, 2008

* Add an EXTENT_BOUNDARY state bit to keep the writepage code
from merging data extents that are in the process of being
relocated.  This allows us to do accounting for them properly.

* The balancing code relocates data extents independent of the underlying
inode.  The extent_map code was modified to properly account for
things moving around (invalidating extent_map caches in the inode).

* Don't take the drop_mutex in the create_subvol ioctl.  It isn't
required.

* Fix walking of the ordered extent list to avoid races with sys_unlink

* Change the lock ordering rules.  Transaction start goes outside
the drop_mutex.  This allows btrfs_commit_transaction to directly
drop the relocation trees.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Remove Btrfs compat code for older kernels · 2b1f55b0

Committed by Chris Mason on Sep 24, 2008

Btrfs had compatibility code for kernels back to 2.6.18.  It has
been removed, and will be maintained in a separate backport
git tree from now on.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

25 Sep, 2008 37 commits

Btrfs: Full back reference support · 31840ae1

Committed by Zheng Yan on Sep 23, 2008

This patch makes the back reference system explicitly record the
location of the parent node for all types of extents. The location of
the parent node is placed into the offset field of the backref key. Every
time a tree block is balanced, the back references for the affected
lower level extents are updated.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: free space accounting redo · 0f9dd46c

Committed by Josef Bacik on Sep 23, 2008

1) replace the per fs_info extent_io_tree that tracked free space with two
rb-trees per block group to track free space areas via offset and size. The
reason to do this is because most allocations come with a hint byte where to
start, so we can usually find a chunk of free space at that hint byte to satisfy
the allocation and get good space packing. If we cannot find free space at or
after the given offset we fall back on looking for a chunk of the given size as
close to that given offset as possible. When we fall back on the size search we
also try to find a slot as close to the size we want as possible, to avoid
breaking small chunks off of huge areas if possible.

2) remove the extent_io_tree that tracked the block group cache from fs_info and
replace it with an rb-tree that tracks the block group cache via offset. Also
add a per space_info list that tracks the block group cache for the particular
space so we can look up related block groups easily.

3) cleaned up the allocation code to make it a little easier to read and a
little less complicated. Basically there are 3 steps, first look from our
provided hint. If we couldn't find from that given hint, start back at our
original search start and look for space from there. If that fails try to
allocate space if we can and start looking again. If not we're screwed and need
to start over again.

4) small fixes. There were some issues in volumes.c where we wouldn't allocate
the rest of the disk. Fixed cow_file_range to actually pass the alloc_hint,
which has helped a good bit in making the fs_mark test I run have semi-normal
results as we run out of space. Generally with data allocations we don't track
where we last allocated from, so every time we did a data allocation we'd search
through every block group that we have looking for free space. While searching a
block group with no free space isn't terribly time consuming, it was causing a
slight degradation as we got more data block groups. The alloc_hint has fixed
this slight degradation and made things semi-normal.

There is still one nagging problem I'm working on where we will get ENOSPC when
there is definitely plenty of space. This only happens with metadata
allocations, and only when we are almost full. So you generally hit the 85%
mark first, but sometimes you'll hit the BUG before you hit the 85% wall. I'm
still tracking it down, but until then this seems to be pretty stable and makes a
significant performance gain.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
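
The two-stage search described in point 1 of the commit above can be sketched roughly as follows. This is a minimal illustration, not the kernel code: the `FreeSpace` class and its methods are hypothetical names, a sorted Python list stands in for the per-block-group rb-trees, and the sketch simplifies by ignoring free areas that start before the hint but extend past it.

```python
import bisect


class FreeSpace:
    """Hypothetical free space index for one block group (illustration only)."""

    def __init__(self):
        self.by_offset = []  # sorted list of (offset, size) free areas

    def add(self, offset, size):
        bisect.insort(self.by_offset, (offset, size))

    def find(self, hint, size):
        # Stage 1: first free area starting at or after the hint that fits.
        start = bisect.bisect_left(self.by_offset, (hint, 0))
        for offset, length in self.by_offset[start:]:
            if length >= size:
                return offset, length
        # Stage 2: size search. Prefer the tightest fit so huge areas are
        # not broken up, then the area closest to the hint.
        fits = [(length, abs(offset - hint), offset)
                for offset, length in self.by_offset if length >= size]
        if not fits:
            return None  # caller would have to allocate more space
        length, _, offset = min(fits)
        return offset, length
```

With free areas at offsets 0, 8192 and 16384, a request hinted at 8000 lands on the area at 8192, while a request hinted past the last area falls back to the smallest area that still fits.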

Btrfs: Tree logging fixes · 4bef0848

Committed by Chris Mason on Sep 08, 2008

* Pin down data blocks to prevent them from being reallocated like so:

trans 1: allocate file extent
trans 2: free file extent
trans 3: free file extent during old snapshot deletion
trans 3: allocate file extent to new file
trans 3: fsync new file

Before the tree logging code, this was legal because the fsync
would commit the transaction that did the final data extent free
and the transaction that allocated the extent to the new file
at the same time.

With the tree logging code, the tree log subtransaction can commit
before the transaction that freed the extent.  If we crash,
we're left with two different files using the extent.

* Don't wait in start_transaction if log replay is going on.  This
avoids deadlocks from iput while we're cleaning up link counts in the
replay code.

* Don't deadlock in replay_one_name by trying to read an inode off
the disk while holding paths for the directory

* Hold the buffer lock while we mark a buffer as written.  This
closes a race where someone is changing a buffer while we write it.
They are supposed to mark it dirty again after they change it, but
this violates the cow rules.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: trivial sparse fixes · b214107e

Committed by Christoph Hellwig on Sep 05, 2008

Fix a bunch of trivial sparse complaints.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add debugging checks to track down corrupted metadata · a1b32a59

Committed by Chris Mason on Sep 05, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Remove broken optimisations in end_bio functions. · 902b22f3

Committed by David Woodhouse on Aug 20, 2008

These ended up freeing objects while they were still using them. Under
guidance from Chris, just rip out the 'clever' bits and do things the
simple way.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Change TestSetPageLocked() to trylock_page() · 2db04966

Committed by David Woodhouse on Aug 07, 2008

Add backwards compatibility in compat.h
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 compat.h    |    3 +++
 extent_io.c |    3 ++-
 2 files changed, 5 insertions(+), 1 deletions(-)
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add compatibility for kernels >= 2.6.27-rc1 · 0ee0fda0

Committed by Sven Wegener on Jul 30, 2008

Add a couple of #if's to follow API changes.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: implement memory reclaim for leaf reference cache · bcc63abb

Committed by Yan on Jul 30, 2008

The memory reclaiming issue happens when a snapshot exists. In that
case, some cache entries may not be used during old snapshot dropping,
so they will remain in the cache until umount.

The patch adds a field to struct btrfs_leaf_ref to record its creation time. In
addition, the patch links all dead roots of a given snapshot together in order of
creation time. After an old snapshot is completely dropped, we check the dead
root list and remove all cache entries created before the oldest dead root in
the list.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
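
The reclaim rule in the commit above can be illustrated with a small sketch. It makes several assumptions: `reclaim` is a hypothetical helper, a dict maps each cached leaf ref to its creation time, and the dead root list holds creation times, oldest first. Leaving the cache untouched when the list is empty is a choice made only for this sketch.

```python
def reclaim(cache, dead_root_times):
    """Drop cache entries created before the oldest remaining dead root.

    cache: dict mapping a block number to the creation time of its leaf ref.
    dead_root_times: creation times of the remaining dead roots, oldest first.
    """
    if not dead_root_times:
        return dict(cache)  # sketch-only choice: nothing to compare against
    oldest = dead_root_times[0]
    # Entries older than every remaining dead root can no longer be used.
    return {block: created for block, created in cache.items()
            if created >= oldest}
```

For example, with entries created at times 1, 5 and 9 and remaining dead roots created at times 5 and 8, only the entry created at time 1 is removed.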

Btrfs: Fix verify_parent_transid · 33958dc6

Committed by Chris Mason on Jul 30, 2008

It was incorrectly clearing the up to date flag on the buffer even
when the buffer properly verified.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Search data ordered extents first for checksums on read · 89642229

Committed by Chris Mason on Jul 24, 2008

Checksum items are not inserted into the tree until all of the io from a
given extent is complete. This means one dirty page from an extent may
be written, freed, and then read again before the entire extent is on disk
and the checksum item is inserted.

The checksums themselves are stored in the ordered extent so they can
be inserted in bulk when IO is complete. On read, if a checksum item isn't
found, the ordered extents were being searched for a checksum record.

This all worked most of the time, but the checksum insertion code tries
to reduce the number of tree operations by pre-inserting checksum items
based on i_size and a few other factors. This means the read code might
find a checksum item that hasn't yet really been filled in.

This commit changes things to check the ordered extents first and only
dive into the btree if nothing was found. This removes the need for
extra locking and is more reliable.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
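
A rough sketch of the lookup order this commit describes, assuming plain dicts in place of the ordered extent records and the checksum btree. The name `lookup_csum` and the data layout are hypothetical.

```python
def lookup_csum(offset, ordered_extents, csum_tree):
    """Return the checksum for the block at offset, or None if unknown.

    ordered_extents: list of dicts with "start", "len" and "csums"
                     (offset -> checksum) for IO that is still pending.
    csum_tree: dict standing in for checksum items already in the btree.
    """
    # Check pending ordered extents first: a pre-inserted btree item
    # for this range may exist but not be filled in yet.
    for ext in ordered_extents:
        if ext["start"] <= offset < ext["start"] + ext["len"]:
            csum = ext["csums"].get(offset)
            if csum is not None:
                return csum
    # Only dive into the btree if nothing was found above.
    return csum_tree.get(offset)
```

In this sketch a placeholder value of 0 in `csum_tree` models a pre-inserted item. It is never returned for a block that still has a pending ordered extent.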

Btrfs: Fix some data=ordered related data corruptions · f421950f

Committed by Chris Mason on Jul 22, 2008

Stress testing was showing data checksum errors, most of which were caused
by a lookup bug in the extent_map tree.  The tree was caching the last
pointer returned, and searches would check the last pointer first.

But, search callers also expect the search to return the very first
matching extent in the range, which wasn't always true with the last
pointer usage.

For now, the code to cache the last return value is just removed.  It is
easy to fix, but I think lookups are rare enough that it isn't required anymore.

This commit also replaces do_sync_mapping_range with a local copy of the
related functions.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
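
The lookup bug described above can be reproduced in a few lines. This is an illustrative model only: extents are `(start, length)` tuples in a sorted list, and both function names are hypothetical.

```python
def overlaps(extent, start, length):
    ext_start, ext_len = extent
    return ext_start < start + length and start < ext_start + ext_len


def lookup_first(extents, start, length):
    """Return the first extent (by start offset) overlapping the range."""
    for extent in extents:  # extents are sorted by start offset
        if overlaps(extent, start, length):
            return extent
    return None


def lookup_with_last_cache(extents, start, length, last):
    """Buggy variant: trusts the cached pointer if it overlaps at all."""
    if last is not None and overlaps(last, start, length):
        return last  # may not be the first matching extent
    return lookup_first(extents, start, length)
```

With three adjacent 4 KiB extents and a cached pointer to the third, a search over all three returns the third extent from the cached variant. Callers expect the first.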

Btrfs: Use a mutex in the extent buffer for tree block locking · a61e6f29

Committed by Chris Mason on Jul 22, 2008

This replaces the use of the page cache lock bit for locking, which wasn't
suitable for block size < page size and couldn't be used recursively.

The mutexes alone don't fix either problem, but they are the first step.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Index extent buffers in an rbtree · 6af118ce

Committed by Chris Mason on Jul 22, 2008

Before, extent buffers were a temporary object, meant to map a number of pages
at once and collect operations on them.

But, a few extra fields have crept in, and they are also the best place to
store a per-tree block lock field.  This commit puts the extent
buffers into an rbtree, and ensures a single extent buffer for each
tree block.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Keep extent mappings in ram until pending ordered extents are done · 7f3c74fb

Committed by Chris Mason on Jul 18, 2008

It was possible for stale mappings from disk to be used instead of the
new pending ordered extent. This adds a flag to the extent map struct
to keep it pinned until the pending ordered extent is actually on disk.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Don't allow releasepage to succeed if EXTENT_ORDERED is set · 211f90e6

Committed by Chris Mason on Jul 18, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Use async helpers to deal with pages that have been improperly dirtied · 247e743c

Committed by Chris Mason on Jul 17, 2008

Higher layers sometimes call set_page_dirty without asking the filesystem
to help. This causes many problems for the data=ordered and cow code.
This commit detects pages that haven't been properly set up for IO and
kicks off an async helper to deal with them.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: New data=ordered implementation · e6dcd2dc

Committed by Chris Mason on Jul 17, 2008

The old data=ordered code would force commit to wait until
all the data extents from the transaction were fully on disk.  This
introduced large latencies into the commit and stalled new writers
in the transaction for a long time.

The new code changes the way data allocations and extents work:

* When delayed allocation is filled, data extents are reserved, and
  the extent bit EXTENT_ORDERED is set on the entire range of the extent.
  A struct btrfs_ordered_extent is allocated and inserted into a per-inode
  rbtree to track the pending extents.

* As each page is written EXTENT_ORDERED is cleared on the bytes corresponding
  to that page.

* When all of the bytes corresponding to a single struct btrfs_ordered_extent
  are written, the previously reserved extent is inserted into the FS
  btree and into the extent allocation trees.  The checksums for the file
  data are also updated.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
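
The three steps listed in the commit above can be modelled with a toy sketch. It assumes page-aligned extents, uses a dict in place of the per-inode rbtree, and records finished extents in a list where the real code would insert them into the FS btree. `OrderedExtent`, `Inode` and their methods are hypothetical names.

```python
PAGE_SIZE = 4096


class OrderedExtent:
    """Tracks which pages of one reserved extent are still unwritten."""

    def __init__(self, file_offset, length):
        self.file_offset = file_offset
        self.length = length
        # Stand-in for EXTENT_ORDERED being set on the whole range.
        self.pending = set(range(file_offset, file_offset + length, PAGE_SIZE))

    def page_written(self, page_offset):
        """Clear the ordered state for one page; True once all are written."""
        self.pending.discard(page_offset)
        return not self.pending


class Inode:
    def __init__(self):
        self.ordered = {}    # file_offset -> OrderedExtent (rbtree stand-in)
        self.committed = []  # extents "inserted into the FS btree"

    def fill_delalloc(self, file_offset, length):
        self.ordered[file_offset] = OrderedExtent(file_offset, length)

    def end_page_write(self, extent_offset, page_offset):
        ext = self.ordered[extent_offset]
        if ext.page_written(page_offset):
            # Last page is on disk: only now does the extent become visible.
            del self.ordered[extent_offset]
            self.committed.append((ext.file_offset, ext.length))
```

The point of the design is that nothing in this model waits on IO at commit time. The extent is recorded when its last page completes.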

Btrfs: Change find_extent_buffer to use TestSetPageLocked · 079899c2

Committed by Chris Mason on Jun 25, 2008

This makes it possible for callers to check for extent_buffers in cache
without deadlocking against any btree locks held.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Start btree concurrency work. · 925baedd

Committed by Chris Mason on Jun 25, 2008

The allocation trees and the chunk trees are serialized via their own
dedicated mutexes.  This means allocation location is still not very
fine grained.

The main FS btree is protected by locks on each block in the btree.  Locks
are taken top / down, and as processing finishes on a given level of the
tree, the lock is released after locking the lower level.

The end result of a search is now a path where only the lowest level
is locked.  Releasing or freeing the path drops any locks held.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
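
The top-down locking scheme in this commit resembles what is often called lock coupling. A minimal sketch, assuming a toy tree in which each interior key is the smallest key under the matching child. The `Node` class, `search` and `release_path` are hypothetical and use Python's `threading.Lock`, not the kernel's locking primitives.

```python
import bisect
import threading


class Node:
    """Hypothetical btree node; interior nodes hold one key per child."""

    def __init__(self, keys, children=None):
        self.keys = keys          # sorted; keys[i] is the smallest key under children[i]
        self.children = children  # None marks a leaf
        self.lock = threading.Lock()


def search(root, key):
    """Walk top-down and return the leaf that would hold key, still locked."""
    node = root
    node.lock.acquire()
    while node.children is not None:
        slot = max(bisect.bisect_right(node.keys, key) - 1, 0)
        child = node.children[slot]
        child.lock.acquire()  # lock the lower level first
        node.lock.release()   # then drop the level above
        node = child
    return node


def release_path(leaf):
    """Releasing the path drops the one lock still held."""
    leaf.lock.release()
```

After `search` returns, only the leaf is locked, which matches the description of a path where only the lowest level is locked.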

Fix corners in writepage and btrfs_truncate_page · 211c17f5

Committed by Chris Mason on May 15, 2008

The extent_io writepage calls needed an extra check for discarding
pages that started on the last byte in the file.

btrfs_truncate_page needed checks to make sure the page was still part
of the file after reading it, and most importantly, needed to wait for
all IO to the page to finish before freeing the corresponding extents on
disk.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Handle write errors on raid1 and raid10 · 1259ab75

Committed by Chris Mason on May 12, 2008

When duplicate copies exist, writes are allowed to fail to one of those
copies.  This changeset includes a few changes that allow the FS to
continue even when some IOs fail.

It also adds verification of the parent generation number for btree blocks.
This generation is stored in the pointer to a block, and it ensures
that missed writes are detected.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
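
The parent generation check can be pictured as follows. This is a sketch under assumptions: `BlockPtr`, `Block` and `read_block` are hypothetical names, and trying the next copy when one mirror is stale is assumed here for illustration, not taken from the commit text.

```python
from dataclasses import dataclass


@dataclass
class BlockPtr:
    """Pointer held by the parent: where the child is and its generation."""
    bytenr: int
    generation: int


@dataclass
class Block:
    bytenr: int
    generation: int


def verify_parent_generation(ptr, block):
    # A copy that missed a write still carries an older generation.
    return block.bytenr == ptr.bytenr and block.generation == ptr.generation


def read_block(ptr, mirrors):
    """Return the first copy matching the parent's expectation, else None."""
    for block in mirrors:
        if verify_parent_generation(ptr, block):
            return block
    return None
```

If one copy missed its write, its generation lags behind the value in the parent pointer, so this sketch skips it and uses the other copy.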

Btrfs: Drop some verbose printks · 4235298e

Committed by Chris Mason on Apr 28, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: write_cache_pages came in 2.6.22 · 5e478dc9

Committed by Chris Mason on Apr 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: write_extent_pages came in 2.6.23 · 004fb575

Committed by Chris Mason on Apr 25, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Fix btrfs_get_extent and get_block corner cases, and disable O_DIRECT reads · e1c4b745

Committed by Chris Mason on Apr 22, 2008

The generic O_DIRECT code assumes all the bios have the same bdev,
which isn't true for multi-device btrfs.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Don't drop extent_map cache during releasepage on the btree inode · 7b13b7b1

Committed by Chris Mason on Apr 18, 2008

The btree inode should only have a single extent_map in the cache,
so it doesn't make sense to ever drop it.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Remove bogus max_sector warnings from the extent_io code · 41471e83

Committed by Chris Mason on Apr 17, 2008

It was testing the bio before doing logical->physical mapping, so the
test was always wrong.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Use the extent map cache to find the logical disk block during data retries · 3b951516

Committed by Chris Mason on Apr 17, 2008

The data read retry code needs to find the logical disk block before it
can resubmit new bios. But, finding this block isn't allowed to take
the fs_mutex because that will deadlock with a number of different callers.

This changes the retry code to use the extent map cache instead, but
that requires the extent map cache to have the extent we're looking for.
This is a problem because btrfs_drop_extent_cache just drops the entire
extent instead of the little tiny part it is invalidating.

The bulk of the code in this patch changes btrfs_drop_extent_cache to
invalidate only a portion of the extent cache, and changes btrfs_get_extent
to deal with the results.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
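
Invalidating only part of a cached extent, as this commit describes for btrfs_drop_extent_cache, amounts to trimming or splitting the cached ranges. A minimal sketch, assuming half-open `(start, end)` tuples. `drop_range` is a hypothetical helper, not the kernel function.

```python
def drop_range(extents, start, end):
    """Remove [start, end) from a list of half-open (ext_start, ext_end) ranges.

    Extents that only partly overlap are trimmed or split, so the parts
    outside the dropped range stay in the cache.
    """
    result = []
    for ext_start, ext_end in extents:
        if ext_end <= start or ext_start >= end:
            result.append((ext_start, ext_end))  # untouched
            continue
        if ext_start < start:
            result.append((ext_start, start))    # keep the front piece
        if ext_end > end:
            result.append((end, ext_end))        # keep the tail piece
    return result
```

Dropping one 4 KiB block from the middle of a 16 KiB cached extent leaves two smaller cached extents in place of an empty cache.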

Btrfs: define write_cache_pages for linux kernel <= 2.6.20 instead · 594994aa

Committed by Miguel on Apr 11, 2008

write_cache_pages doesn't exist in Linux 2.6.20, so change the #if
condition to match that.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Handle checksumming errors while reading data blocks · 7e38326f

Committed by Chris Mason on Apr 09, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Retry metadata reads in the face of checksum failures · f188591e

Committed by Chris Mason on Apr 09, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Do metadata checksums for reads via a workqueue · ce9adaa5

Committed by Chris Mason on Apr 09, 2008

Before, metadata checksumming was done by the callers of read_tree_block,
which would set EXTENT_CSUM bits in the extent tree to show that a given
range of pages was already checksummed and didn't need to be verified
again.

But, those bits could go away via try_to_releasepage, and the end
result was bogus checksum failures on pages that never left the cache.

The new code validates checksums when the page is read.  It is a little
tricky because metadata blocks can span pages and a single read may
end up going via multiple bios.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add additional debugging for metadata checksum failures · 728131d8

Committed by Chris Mason on Apr 09, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Correct usage of IS_ERR() in extent_io.c · 2b114d1d

Committed by tthtlc on Apr 01, 2008

Signed-off-by: Peter Teoh <htmldeveloper@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add leak debugging for extent_buffer and extent_state · 2d2ae547

Committed by Chris Mason on Mar 26, 2008

This also fixes one leak around the super block when failing to mount the
FS.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Bring back mount -o ssd optimizations · 239b14b3

Committed by Chris Mason on Mar 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

openeuler / Kernel synced successfully about 1 year ago