提交 · 6a63209fc02d5483371f07e4913ee8abad608051 · openanolis / cloud-kernel

21 2月, 2009 1 次提交

Btrfs: add better -ENOSPC handling · 6a63209f

由 Josef Bacik 提交于 2月 20, 2009

This is a step in the direction of better -ENOSPC handling. Instead of
checking the global bytes counter we check the space_info bytes counters to
make sure we have enough space.

If we don't we go ahead and try to allocate a new chunk, and then if that fails
we return -ENOSPC. This patch adds two counters to btrfs_space_info,
bytes_delalloc and bytes_may_use.

bytes_delalloc account for extents we've actually setup for delalloc and will
be allocated at some point down the line.

bytes_may_use is to keep track of how many bytes we may use for delalloc at
some point. When we actually set the extent_bit for the delalloc bytes we
subtract the reserved bytes from the bytes_may_use counter. This keeps us from
not actually being able to allocate space for any delalloc bytes.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>

6a63209f

20 2月, 2009 1 次提交

Btrfs: check file pointer in btrfs_sync_file · 2cfbd50b

由 Chris Mason 提交于 2月 20, 2009

fsync can be called by NFS with a null file pointer, and btrfs was
oopsing in this case.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

2cfbd50b

22 1月, 2009 1 次提交

Btrfs: fix tree logs parallel sync · 7237f183

由 Yan Zheng 提交于 1月 21, 2009

To improve performance, btrfs_sync_log merges tree log sync
requests. But it wrongly merges sync requests for different
tree logs. If multiple tree logs are synced at the same time,
only one of them actually gets synced.

This patch has following changes to fix the bug:

Move most tree log related fields in btrfs_fs_info to
btrfs_root. This allows merging sync requests separately
for each tree log.

Don't insert root item into the log root tree immediately
after log tree is allocated. Root item for log tree is
inserted when log tree get synced for the first time. This
allows syncing the log root tree without first syncing all
log trees.

At tree-log sync, btrfs_sync_log first sync the log tree;
then updates corresponding root item in the log root tree;
sync the log root tree; then update the super block.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

7237f183

21 1月, 2009 1 次提交

Btrfs: removed unused #include <version.h>'s · 7eaebe7d

由 Huang Weiyi 提交于 1月 21, 2009

Removed unused #include <version.h>'s in btrfs
Signed-off-by: NHuang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

7eaebe7d

06 1月, 2009 3 次提交

Btrfs: don't change file extent's ram_bytes in btrfs_drop_extents · 1ba12553

由 Yan Zheng 提交于 1月 06, 2009

btrfs_drop_extents doesn't change file extent's ram_bytes
in the case of booked extent. To be consistent, we should
also not change ram_bytes when truncating existing extent.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

1ba12553

Btrfs: Fix checkpatch.pl warnings · d397712b

由 Chris Mason 提交于 1月 05, 2009

There were many, most are fixed now.  struct-funcs.c generates some warnings
but these are bogus.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d397712b

Y
Btrfs: Fix memset length in btrfs_file_write · 9aead435
由 yanhai zhu 提交于 1月 05, 2009
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
9aead435

12 12月, 2008 1 次提交

Btrfs: fix nodatasum handling in balancing code · 17d217fe

由 Yan Zheng 提交于 12月 12, 2008

Checksums on data can be disabled by mount option, so it's
possible some data extents don't have checksums or have
invalid checksums. This causes trouble for data relocation.
This patch contains following things to make data relocation
work.

1) make nodatasum/nodatacow mount option only affects new
files. Checksums and COW on data are only controlled by the
inode flags.

2) check the existence of checksum in the nodatacow checker.
If checksums exist, force COW the data extent. This ensure that
checksum for a given block is either valid or does not exist.

3) update data relocation code to properly handle the case
of checksum missing.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

17d217fe

09 12月, 2008 2 次提交

Btrfs: Fix compressed checksum fsync log copies · 580afd76

由 Chris Mason 提交于 12月 08, 2008

The fsync logging code makes sure to onl copy the relevant checksum for each
extent based on the file extent pointers it finds.

But for compressed extents, it needs to copy the checksum for the
entire extent.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

580afd76

Btrfs: Add inode sequence number for NFS and reserved space in a few structs · c3027eb5

由 Chris Mason 提交于 12月 08, 2008

This adds a sequence number to the btrfs inode that is increased on
every update.  NFS will be able to use that to detect when an inode has
changed, without relying on inaccurate time fields.

While we're here, this also:

Puts reserved space into the super block and inode

Adds a log root transid to the super so we can pick the newest super
based on the fsync log as well as the main transaction ID.  For now
the log root transid is always zero, but that'll get fixed.

Adds a starting offset to the dev_item.  This will let us do better
alignment calculations if we know the start of a partition on the disk.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c3027eb5

02 12月, 2008 1 次提交
- C
  Btrfs: fix shadowed variable declarations · 6e430f94
  由 Christoph Hellwig 提交于 12月 02, 2008
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
  6e430f94
13 11月, 2008 1 次提交

Btrfs: Fix race in btrfs_mark_extent_written · c36047d7

由 Yan Zheng 提交于 11月 12, 2008

When extent needs to be split, btrfs_mark_extent_written truncates the extent
first, then inserts a new extent and increases the reference count.

The race happens if someone else deletes the old extent before the new extent
is inserted. The fix here is increase the reference count in advance. This race
is similar to the race in btrfs_drop_extents that was recently fixed.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

c36047d7

11 11月, 2008 2 次提交

Btrfs: Fix starting search offset inside btrfs_drop_extents · 8247b41a

由 Yan Zheng 提交于 11月 11, 2008

btrfs_drop_extents will drop paths and search again when it needs to
force COW of higher nodes.  It was using the key it found during the last
search as the offset for the next search.

But, this wasn't always correct.  The key could be from before our desired
range, and because we're dropping the path, it is possible for file's items
to change while we do the search again.

The fix here is to make sure we don't search for something smaller than
the offset btrfs_drop_extents was called with.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

8247b41a

Btrfs: Fix usage of struct extent_map->orig_start · 445a6944

由 Chris Mason 提交于 11月 10, 2008

This makes sure the orig_start field in struct extent_map gets set
everywhere the extent_map structs are created or modified.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

445a6944

10 11月, 2008 1 次提交

Btrfs: Fix csum error for compressed data · ff5b7ee3

由 Yan Zheng 提交于 11月 10, 2008

The decompress code doesn't take the logical offset in extent
pointer into account. If the logical offset isn't zero, data
will be decompressed into wrong pages.

The solution used here is to record the starting offset of the extent
in the file separately from the logical start of the extent_map struct.
This allows us to avoid problems inserting overlapping extents.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

ff5b7ee3

07 11月, 2008 1 次提交

Btrfs: Optimize compressed writeback and reads · 771ed689

由 Chris Mason 提交于 11月 06, 2008

When reading compressed extents, try to put pages into the page cache
for any pages covered by the compressed extent that readpages didn't already
preload.

Add an async work queue to handle transformations at delayed allocation processing
time.  Right now this is just compression.  The workflow is:

1) Find offsets in the file marked for delayed allocation
2) Lock the pages
3) Lock the state bits
4) Call the async delalloc code

The async delalloc code clears the state lock bits and delalloc bits.  It is
important this happens before the range goes into the work queue because
otherwise it might deadlock with other work queue items that try to lock
those extent bits.

The file pages are compressed, and if the compression doesn't work the
pages are written back directly.

An ordered work queue is used to make sure the inodes are written in the same
order that pdflush or writepages sent them down.

This changes extent_write_cache_pages to let the writepage function
update the wbc nr_written count.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

771ed689

01 11月, 2008 1 次提交

Btrfs: Compression corner fixes · 70b99e69

由 Chris Mason 提交于 10月 31, 2008

Make sure we keep page->mapping NULL on the pages we're getting
via alloc_page.  It gets set so a few of the callbacks can do the right
thing, but in general these pages don't have a mapping.

Don't try to truncate compressed inline items in btrfs_drop_extents.
The whole compressed item must be preserved.

Don't try to create multipage inline compressed items.  When we try to
overwrite just the first page of the file, we would have to read in and recow
all the pages after it in the same compressed inline items.  For now, only
create single page inline items.

Make sure we lock pages in the correct order during delalloc.  The
search into the state tree for delalloc bytes can return bytes before
the page we already have locked.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

70b99e69

31 10月, 2008 3 次提交

Btrfs: Add fallocate support v2 · d899e052

由 Yan Zheng 提交于 10月 30, 2008

This patch updates btrfs-progs for fallocate support.

fallocate is a little different in Btrfs because we need to tell the
COW system that a given preallocated extent doesn't need to be
cow'd as long as there are no snapshots of it.  This leverages the
-o nodatacow checks.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

d899e052

Btrfs: Fix bookend extent race v2 · 6643558d

由 Yan Zheng 提交于 10月 30, 2008

When dropping middle part of an extent, btrfs_drop_extents truncates
the extent at first, then inserts a bookend extent.

Since truncation and insertion can't be done atomically, there is a small
period that the bookend extent isn't in the tree. This causes problem for
functions that search the tree for file extent item. The way to fix this is
lock the range of the bookend extent before truncation.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

6643558d

Btrfs: update hole handling v2 · 9036c102

由 Yan Zheng 提交于 10月 30, 2008

This patch splits the hole insertion code out of btrfs_setattr
into btrfs_cont_expand and updates btrfs_get_extent to properly
handle the case that file extent items are not continuous.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

9036c102

30 10月, 2008 1 次提交

Btrfs: Add zlib compression support · c8b97818

由 Chris Mason 提交于 10月 29, 2008

This is a large change for adding compression on reading and writing,
both for inline and regular extents.  It does some fairly large
surgery to the writeback paths.

Compression is off by default and enabled by mount -o compress.  Even
when the -o compress mount option is not used, it is possible to read
compressed extents off the disk.

If compression for a given set of pages fails to make them smaller, the
file is flagged to avoid future compression attempts later.

* While finding delalloc extents, the pages are locked before being sent down
to the delalloc handler.  This allows the delalloc handler to do complex things
such as cleaning the pages, marking them writeback and starting IO on their
behalf.

* Inline extents are inserted at delalloc time now.  This allows us to compress
the data before inserting the inline extent, and it allows us to insert
an inline extent that spans multiple pages.

* All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
are changed to record both an in-memory size and an on disk size, as well
as a flag for compression.

From a disk format point of view, the extent pointers in the file are changed
to record the on disk size of a given extent and some encoding flags.
Space in the disk format is allocated for compression encoding, as well
as encryption and a generic 'other' field.  Neither the encryption or the
'other' field are currently used.

In order to limit the amount of data read for a single random read in the
file, the size of a compressed extent is limited to 128k.  This is a
software only limit, the disk format supports u64 sized compressed extents.

In order to limit the ram consumed while processing extents, the uncompressed
size of a compressed extent is limited to 256k.  This is a software only limit
and will be subject to tuning later.

Checksumming is still done on compressed extents, and it is done on the
uncompressed version of the data.  This way additional encodings can be
layered on without having to figure out which encoding to checksum.

Compression happens at delalloc time, which is basically singled threaded because
it is usually done by a single pdflush thread.  This makes it tricky to
spread the compression load across all the cpus on the box.  We'll have to
look at parallel pdflush walks of dirty inodes at a later time.

Decompression is hooked into readpages and it does spread across CPUs nicely.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c8b97818

09 10月, 2008 2 次提交

Btrfs: Remove offset field from struct btrfs_extent_ref · 3bb1a1bc

由 Yan Zheng 提交于 10月 09, 2008

The offset field in struct btrfs_extent_ref records the position
inside file that file extent is referenced by. In the new back
reference system, tree leaves holding references to file extent
are recorded explicitly. We can scan these tree leaves very quickly, so the
offset field is not required.

This patch also makes the back reference system check the objectid
when extents are in deleting.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

3bb1a1bc

Btrfs: Count space allocated to file in bytes · a76a3cd4

由 Yan Zheng 提交于 10月 09, 2008

This patch makes btrfs count space allocated to file in bytes instead
of 512 byte sectors.

Everything else in btrfs uses a byte count instead of sector sizes or
blocks sizes, so this fits better.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>

a76a3cd4

04 10月, 2008 1 次提交

Btrfs: O_DIRECT writes via buffered writes + invaldiate · cb843a6f

由 Chris Mason 提交于 10月 03, 2008

This reworks the btrfs O_DIRECT write code a bit. It had always fallen
back to buffered IO and done an invalidate, but needed to be updated
for the data=ordered code. The invalidate wasn't actually removing pages
because they were still inside an ordered extent.

This also combines the O_DIRECT/O_SYNC paths where possible, and kicks
off IO in the main btrfs_file_write loop to keep the pipe down the the
disk full as we process long writes.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

cb843a6f

30 9月, 2008 1 次提交

Btrfs: add and improve comments · d352ac68

由 Chris Mason 提交于 9月 29, 2008

This improves the comments at the top of many functions.  It didn't
dive into the guts of functions because I was trying to
avoid merging problems with the new allocator and back reference work.

extent-tree.c and volumes.c were both skipped, and there is definitely
more work todo in cleaning and commenting the code.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d352ac68

26 9月, 2008 2 次提交

Btrfs: extent_map and data=ordered fixes for space balancing · 5b21f2ed

由 Zheng Yan 提交于 9月 26, 2008

* Add an EXTENT_BOUNDARY state bit to keep the writepage code
from merging data extents that are in the process of being
relocated.  This allows us to do accounting for them properly.

* The balancing code relocates data extents indepdent of the underlying
inode.  The extent_map code was modified to properly account for
things moving around (invalidating extent_map caches in the inode).

* Don't take the drop_mutex in the create_subvol ioctl.  It isn't
required.

* Fix walking of the ordered extent list to avoid races with sys_unlink

* Change the lock ordering rules.  Transaction start goes outside
the drop_mutex.  This allows btrfs_commit_transaction to directly
drop the relocation trees.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

5b21f2ed

Remove Btrfs compat code for older kernels · 2b1f55b0

由 Chris Mason 提交于 9月 24, 2008

Btrfs had compatibility code for kernels back to 2.6.18.  These have
been removed, and will be maintained in a separate backport
git tree from now on.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

2b1f55b0

25 9月, 2008 13 次提交

Btrfs: Full back reference support · 31840ae1

由 Zheng Yan 提交于 9月 23, 2008

This patch makes the back reference system to explicit record the
location of parent node for all types of extents. The location of
parent node is placed into the offset field of backref key. Every
time a tree block is balanced, the back references for the affected
lower level extents are updated.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

31840ae1

Btrfs: Dir fsync optimizations · 49eb7e46

由 Chris Mason 提交于 9月 11, 2008

Drop i_mutex during the commit

Don't bother doing the fsync at all unless the dir is marked as dirtied
and needing fsync in this transaction.  For directories, this means
that someone has unlinked a file from the dir without fsyncing the
file.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

49eb7e46

Btrfs: Add a write ahead tree log to optimize synchronous operations · e02119d5

由 Chris Mason 提交于 9月 05, 2008

File syncs and directory syncs are optimized by copying their
items into a special (copy-on-write) log tree. There is one log tree per
subvolume and the btrfs super block points to a tree of log tree roots.

After a crash, items are copied out of the log tree and back into the
subvolume. See tree-log.c for all the details.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e02119d5

C
Btrfs: Add debugging checks to track down corrupted metadata · a1b32a59
由 Chris Mason 提交于 9月 05, 2008
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
a1b32a59
C
Btrfs: Maintain a list of inodes that are delalloc and a way to wait on them · ea8c2819
由 Chris Mason 提交于 8月 04, 2008
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
ea8c2819

Btrfs: Improve and cleanup locking done by walk_down_tree · f87f057b

由 Chris Mason 提交于 8月 01, 2008

While dropping snapshots, walk_down_tree does most of the work of checking
reference counts and limiting tree traversal to just the blocks that
we are freeing.

It dropped and held the allocation mutex in strange and confusing ways,
this commit changes it to only hold the mutex while actually freeing a block.

The rest of the checks around reference counts should be safe without the lock
because we only allow one process in btrfs_drop_snapshot at a time. Other
processes dropping reference counts should not drop it to 1 because
their tree roots already have an extra ref on the block.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f87f057b

C
Btrfs: Drop some debugging around the extent_map pinned flag · 3ce7e67a
由 Chris Mason 提交于 7月 31, 2008
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
3ce7e67a

Btrfs: Throttle tuning · 37d1aeee

由 Chris Mason 提交于 7月 31, 2008

This avoids waiting for transactions with pages locked by breaking out
the code to wait for the current transaction to close into a function
called by btrfs_throttle.

It also lowers the limits for where we start throttling.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

37d1aeee

Btrfs: Add compatibility for kernels >= 2.6.27-rc1 · 0ee0fda0

由 Sven Wegener 提交于 7月 30, 2008

Add a couple of #if's to follow API changes.
Signed-off-by: NSven Wegener <sven.wegener@stealer.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0ee0fda0

Btrfs: implement memory reclaim for leaf reference cache · bcc63abb

由 Yan 提交于 7月 30, 2008

The memory reclaiming issue happens when snapshot exists. In that
case, some cache entries may not be used during old snapshot dropping,
so they will remain in the cache until umount.

The patch adds a field to struct btrfs_leaf_ref to record create time. Besides,
the patch makes all dead roots of a given snapshot linked together in order of
create time. After a old snapshot was completely dropped, we check the dead
root list and remove all cache entries created before the oldest dead root in
the list.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

bcc63abb

Btrfs: Throttle operations if the reference cache gets too large · ab78c84d

由 Chris Mason 提交于 7月 29, 2008

A large reference cache is directly related to a lot of work pending
for the cleaner thread.  This throttles back new operations based on
the size of the reference cache so the cleaner thread will be able to keep
up.

Overall, this actually makes the FS faster because the cleaner thread will
be more likely to find things in cache.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ab78c84d

Btrfs: Leaf reference cache update · 017e5369

由 Chris Mason 提交于 7月 28, 2008

This changes the reference cache to make a single cache per root
instead of one cache per transaction, and to key by the byte number
of the disk block instead of the keys inside.

This makes it much less likely to have cache misses if a snapshot
or something has an extra reference on a higher node or a leaf while
the first transaction that added the leaf into the cache is dropping.

Some throttling is added to functions that free blocks heavily so they
wait for old transactions to drop.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

017e5369

Btrfs: Fix some data=ordered related data corruptions · f421950f

由 Chris Mason 提交于 7月 22, 2008

Stress testing was showing data checksum errors, most of which were caused
by a lookup bug in the extent_map tree.  The tree was caching the last
pointer returned, and searches would check the last pointer first.

But, search callers also expect the search to return the very first
matching extent in the range, which wasn't always true with the last
pointer usage.

For now, the code to cache the last return value is just removed.  It is
easy to fix, but I think lookups are rare enough that it isn't required anymore.

This commit also replaces do_sync_mapping_range with a local copy of the
related functions.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f421950f

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功