openeuler / raspberrypi-kernel

27 Mar, 2012 · 2 commits

Btrfs: allow metadata blocks larger than the page size · 727011e0

Committed by Chris Mason on Aug 06, 2010

A few years ago the btrfs code to support blocks larger than
the page size was disabled to fix a few corner cases in the
page cache handling.  This fixes the code to properly support
large metadata blocks again.

Since current kernels will crash early and often with larger
metadata blocks, this adds an incompat bit so that older kernels
can't mount it.

This also does away with different blocksizes for nodes and leaves.
You get a single block size for all tree blocks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: remove search_start and search_end from find_free_extent and callers · 81c9ad23

Committed by Josef Bacik on Jan 18, 2012

We have been passing nothing but (u64)-1 to find_free_extent for search_end in
all of the callers, so it's completely useless, and we've always been passing 0
in as search_start, so just remove them as function arguments and move
search_start into find_free_extent. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

15 Feb, 2012 · 1 commit

btrfs: fix structs where bitfields and spinlock/atomic share 8B word · c08782da

Committed by David Sterba on Jan 26, 2012

On ia64, powerpc64 and sparc64 the bitfield is modified through an RMW cycle and current
gcc rewrites the adjacent 4B word, which in the case of a spinlock or atomic has a
disastrous effect.

https://lkml.org/lkml/2012/2/1/220

Signed-off-by: David Sterba <dsterba@suse.cz>
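
The hazard this commit describes can be illustrated with a small userspace sketch. The struct names (`risky_layout`, `safer_layout`), the `uint32_t` stand-in for a spinlock or atomic, and the helper `shares_8b_word` are hypothetical and not taken from the btrfs sources; the actual patch may restructure the affected fields differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: on common ABIs the bitfield occupies the
 * first 4 bytes and the lock stand-in the next 4, so both live in the same
 * aligned 8-byte word. A read-modify-write of the bitfield performed as an
 * 8-byte access would also rewrite the lock. */
struct risky_layout {
	unsigned int dirty:1;
	unsigned int cached:1;
	uint32_t lock;              /* stand-in for spinlock_t or atomic_t */
};

/* Hypothetical "after" layout: the flags get a full, 8-byte-aligned word
 * of their own, so an RMW cycle on them cannot touch the lock. */
struct safer_layout {
	uint32_t lock;              /* stand-in for spinlock_t or atomic_t */
	_Alignas(8) uint64_t flags;
};

/* Returns 1 when two byte offsets fall into the same aligned 8-byte word. */
static int shares_8b_word(size_t off_a, size_t off_b)
{
	return (off_a / 8) == (off_b / 8);
}
```

Checking `offsetof` values with a helper like this is one way to audit a struct for the problem; it says nothing about what a particular compiler version actually emits.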

17 Jan, 2012 · 9 commits

Btrfs: allow for canceling restriper · a7e99c69

Committed by Ilya Dryomov on Jan 16, 2012

Implement an ioctl for canceling restriper.  Currently we wait until
relocation of the current block group is finished, in future this can be
done by triggering a commit.  Balance item is deleted and no memory
about the interrupted balance is kept.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: allow for pausing restriper · 837d5b6e

Committed by Ilya Dryomov on Jan 16, 2012

Implement an ioctl for pausing restriper.  This pauses the relocation,
but balance is still considered to be "in progress": balance item is
not deleted, other volume operations cannot be started, etc.  If paused
in the middle of profile changing operation we will continue making
allocations with the target profile.

Add a hook to close_ctree() to pause restriper and free its data
structures on unmount.  (It's safe to unmount when restriper is in
"paused" state, we will resume with the same parameters on the next
mount)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: add skip_balance mount option · 9555c6c1

Committed by Ilya Dryomov on Jan 16, 2012

Since restriper kthread starts involuntarily on mount and can suck cpu
and memory bandwidth, add a mount option to forcefully skip it.  The
restriper in that case hangs around in paused state and can be resumed
from userspace when it's convenient.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: save balance parameters to disk · 0940ebf6

Committed by Ilya Dryomov on Jan 16, 2012

Introduce a new btree objectid for storing balance item.  The reason is
to be able to resume restriper after a crash with the same parameters.
Balance item has a very high objectid and goes into tree of tree roots.

The key for the new item is as follows:

	[ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ]

Older kernels simply ignore it so it's safe to mount with an older
kernel and then go back to the newer one.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
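
The key layout given in this commit message can be sketched as a plain (objectid, type, offset) triple. The numeric values of the two constants and the names `key_sketch`, `balance_key`, and `key_cmp` are assumptions made for illustration; the authoritative definitions live in the btrfs headers.

```c
#include <stdint.h>

/* Values assumed for illustration; consult ctree.h for the real ones. */
#define BTRFS_BALANCE_OBJECTID  ((uint64_t)-4)  /* "very high" objectid */
#define BTRFS_BALANCE_ITEM_KEY  248

/* Simplified CPU-order key: (objectid, type, offset). */
struct key_sketch {
	uint64_t objectid;
	uint8_t  type;
	uint64_t offset;
};

/* Builds [ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ]. */
static struct key_sketch balance_key(void)
{
	struct key_sketch k = {
		.objectid = BTRFS_BALANCE_OBJECTID,
		.type = BTRFS_BALANCE_ITEM_KEY,
		.offset = 0,
	};
	return k;
}

/* Three-way compare: objectid first, then type, then offset. */
static int key_cmp(const struct key_sketch *a, const struct key_sketch *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Because the objectid is so large, the balance item sorts after ordinary entries in the tree of tree roots under this ordering.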

Btrfs: do not reduce profile in do_chunk_alloc() · 70922617

Committed by Ilya Dryomov on Jan 16, 2012

Every caller of do_chunk_alloc() feeds it the reduced allocation
profile, so stop trying to reduce it one more time.  Instead check the
validity of the passed profile.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: add basic restriper infrastructure · c9e9f97b

Committed by Ilya Dryomov on Jan 16, 2012

Add basic restriper infrastructure: extended balancing ioctl and all
related ioctl data structures, add data structure for tracking
restriper's state to fs_info, etc.  The semantics of the old balancing
ioctl are fully preserved.

Explicitly disallow any volume operations when balance is in progress.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit · a46d11a8

Committed by Ilya Dryomov on Jan 16, 2012

Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
avail_{data,metadata,system}_alloc_bits fields, which gather info about
available allocation profiles in the FS. When chunk is created or read
from disk, its profile is OR'ed with the corresponding avail_alloc_bits
field. Since SINGLE is denoted by 0 in the on-disk format, currently
there is no way to tell when such chunks become available. Restriper
needs that information, so add a separate bit for SINGLE profile.

This bit is going to be in-memory only, it should never be written out
to disk, so it's not a disk format change. However to avoid remappings
in future, reserve corresponding on-disk bit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
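
A minimal sketch of why a dedicated SINGLE bit is needed: OR'ing a profile of 0 into a bitmask leaves no trace. The bit positions below are assumptions for illustration, and `PROFILE_MASK_SKETCH` and `record_chunk_profile` are hypothetical names, not the kernel's.

```c
#include <stdint.h>

/* Bit positions assumed for illustration only. */
#define BTRFS_BLOCK_GROUP_RAID0       (1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1       (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP         (1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10      (1ULL << 6)
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE  (1ULL << 48)  /* in-memory only */

#define PROFILE_MASK_SKETCH \
	(BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | \
	 BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID10)

/* Hypothetical helper: fold a chunk's profile into an avail_*_alloc_bits
 * style mask. A chunk with no profile bits set is SINGLE, which would be
 * invisible after a plain OR, so it is recorded with the extra bit. */
static uint64_t record_chunk_profile(uint64_t avail_bits, uint64_t chunk_flags)
{
	uint64_t profile = chunk_flags & PROFILE_MASK_SKETCH;

	if (profile == 0)
		profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
	return avail_bits | profile;
}
```

Since the extra bit is meant to stay in memory, any code writing flags back to disk would have to strip it first; that step is not shown here.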

Btrfs: introduce masks for chunk type and profile · 52ba6929

Committed by Ilya Dryomov on Jan 16, 2012

Chunk's type and profile are encoded in u64 flags field.  Introduce
masks to easily access them.  Also fix the type of BTRFS_BLOCK_GROUP_*
constants, it should be ULL.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
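
How such masks split a u64 flags field can be sketched as follows. The bit positions, the `_SKETCH` mask names, and the two accessor functions are assumptions for illustration rather than the definitions this commit introduces.

```c
#include <stdint.h>

/* Bit positions assumed for illustration. The ULL suffix matters: with a
 * plain int constant, combining or shifting into a 64-bit flags field can
 * silently truncate or sign-extend. */
#define BG_DATA_SKETCH      (1ULL << 0)
#define BG_SYSTEM_SKETCH    (1ULL << 1)
#define BG_METADATA_SKETCH  (1ULL << 2)
#define BG_RAID0_SKETCH     (1ULL << 3)
#define BG_RAID1_SKETCH     (1ULL << 4)
#define BG_DUP_SKETCH       (1ULL << 5)
#define BG_RAID10_SKETCH    (1ULL << 6)

#define TYPE_MASK_SKETCH \
	(BG_DATA_SKETCH | BG_SYSTEM_SKETCH | BG_METADATA_SKETCH)
#define PROFILE_MASK_SKETCH \
	(BG_RAID0_SKETCH | BG_RAID1_SKETCH | BG_DUP_SKETCH | BG_RAID10_SKETCH)

/* What the chunk holds: data, system and/or metadata. */
static uint64_t chunk_type(uint64_t flags)
{
	return flags & TYPE_MASK_SKETCH;
}

/* How the chunk is laid out across devices; 0 means SINGLE. */
static uint64_t chunk_profile(uint64_t flags)
{
	return flags & PROFILE_MASK_SKETCH;
}
```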

Btrfs: get rid of *_alloc_profile fields · 6fef8df1

Committed by Ilya Dryomov on Jan 16, 2012

{data,metadata,system}_alloc_profile fields have been unused for a long
time now.  Get rid of them.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

09 Jan, 2012 · 1 commit

btrfs: let ->s_fs_info point to fs_info, not root... · 815745cf

Committed by Al Viro on Nov 17, 2011

the latter can be obtained from the former (by looking at ->tree_root)
just as cheaply as we currently are doing the other way round.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22 Dec, 2011 · 3 commits

Btrfs: mark delayed refs as for cow · 66d7e7f0

Committed by Arne Jansen on Sep 12, 2011

Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
from every call site. The for_cow parameter will later on be used to
determine if a ref will change anything with respect to qgroups.

Delayed refs coming from relocation are always counted as for_cow, as they
don't change subvol quota.

Also pass in the fs_info for later use.

btrfs_find_all_roots() will use this as an optimization, as changes that are
for_cow will not change anything with respect to which root points to a
certain leaf. Thus, we don't need to add the current sequence number to
those delayed refs.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: added helper btrfs_next_item() · c7d22a3c

Committed by Jan Schmidt on Nov 22, 2011

btrfs_next_item() makes the btrfs path point to the next item, crossing leaf
boundaries if needed.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
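
The behaviour described here, stepping to the next item and crossing leaf boundaries when needed, can be modelled in userspace with an array of leaves. Everything below (`leaf_sketch`, `path_sketch`, `next_item_sketch`, and the return convention of 0 for success and 1 for "no more items") is a hypothetical model, not the kernel helper itself.

```c
#include <stddef.h>

/* Toy model: a leaf is an array of items, a tree is an array of leaves. */
struct leaf_sketch {
	const int *items;
	size_t nritems;
};

/* Toy path: which leaf we are in and which slot within that leaf. */
struct path_sketch {
	const struct leaf_sketch *leaves;
	size_t nleaves;
	size_t leaf;
	size_t slot;
};

/* Advance the path by one item. Moves on to the following leaf (skipping
 * empty ones) when the current leaf is exhausted. Returns 0 if the path
 * now points at a valid item, 1 if there are no more items. */
static int next_item_sketch(struct path_sketch *p)
{
	p->slot++;
	while (p->leaf < p->nleaves &&
	       p->slot >= p->leaves[p->leaf].nritems) {
		p->leaf++;
		p->slot = 0;
	}
	return p->leaf < p->nleaves ? 0 : 1;
}
```

A caller would typically loop on this until it returns nonzero, without having to special-case the end of each leaf.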

Btrfs: integrate integrity check module into btrfs · 21adbd5c

Committed by Stefan Behrens on Nov 09, 2011

This is the last part of the patch series. It modifies the btrfs
code to use the integrity check module if configured to do so
with the define BTRFS_FS_CHECK_INTEGRITY. If this define is not set,
the only effective change is that code is added that handles the
mount option to activate the integrity check. If the mount option is
set and the define BTRFS_FS_CHECK_INTEGRITY is not set, that code
complains in the log and the mount fails with EINVAL.

Add the mount option to activate the usage of the integrity check
code.
Add invocation of btrfs integrity check code init and cleanup
function on mount and umount, respectively.
Add hook to call btrfs integrity check code version of
submit_bh/submit_bio.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

16 Dec, 2011 · 1 commit

Btrfs: deal with enospc from dirtying inodes properly · 22c44fe6

Committed by Josef Bacik on Nov 30, 2011

Now that we're properly keeping track of delayed inode space we've been getting
a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is
because a bunch of people call mark_inode_dirty, which is void so we can't
return ENOSPC. This needs to be fixed in a few areas:

1) file_update_time - this updates the mtime and such when writing to a file,
which will call mark_inode_dirty. So copy file_update_time into btrfs so we can
call btrfs_dirty_inode directly and return an error if we get one appropriately.

2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't
setting ->setattr for symlinks, even though we should have been. This catches
one of the cases where we were getting errors in mark_inode_dirty.

3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
instead of mark_inode_dirty. This lets us return errors properly for truncate
and chown/anything related to setattr.

4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
print an error if we have one. The only remaining user we can't control for
this is touch_atime(), but we don't really want to keep people from walking
down the tree if we don't have space to save the atime update, so just complain
but don't worry about it.

With this patch xfstests 83 complains a handful of times instead of hundreds of
times. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

01 Dec, 2011 · 1 commit

Btrfs: fix deadlock on metadata reservation when evicting a inode · aa38a711

Committed by Miao Xie on Nov 18, 2011

When I ran the xfstests, I found the test tasks were blocked on meta-data
reservation.

By debugging, I found the reason of this bug:
   start transaction
        |
        v
   reserve meta-data space
        |
        v
   flush delay allocation -> iput inode -> evict inode
        ^                                       |
        |                                       v
   wait for delay allocation flush <- reserve meta-data space

And besides that, the flush on evicting inode will block the thread, which
is reclaiming the memory, and make oom happen easily.

Fix this bug by skipping the flush step when evicting inode.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

20 Nov, 2011 · 1 commit

Btrfs: wait on caching if we're loading the free space cache · 291c7d2f

Committed by Josef Bacik on Nov 14, 2011

We've been hitting panics when running xfstest 13 in a loop for long periods of
time. And actually this problem has always existed so we've been hitting these
things randomly for a while. Basically what happens is we get a thread coming
into the allocator and reading the space cache off of disk and adding the
entries to the free space cache as we go. Then we get another thread that comes
in and tries to allocate from that block group. Since block_group->cached !=
BTRFS_CACHE_NO it goes ahead and tries to do the allocation. We do this because
if we're doing the old slow way of caching we don't want to hold people up and
wait for everything to finish. The problem with this is we could end up
discarding the space cache at some arbitrary point in the future, which means we
could very well end up allocating space that is either bad, or when the real
caching happens it could end up thinking the space isn't in use when it really
is and cause all sorts of other problems.

The solution is to add a new flag to indicate we are loading the free space
cache from disk, and always try to cache the block group if cache->cached !=
BTRFS_CACHE_FINISHED. That way if we are loading the space cache anybody else
who tries to allocate from the block group will have to wait until it's finished
to make sure it completes successfully. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

15 Nov, 2011 · 1 commit

Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush · f1ebcc74

Committed by Liu Bo on Nov 14, 2011

The btrfs snapshotting code requires that once a root has been
snapshotted, we don't change it during a commit.

But there are two cases that lead to tree corruption:

1) multi-thread snapshots can commit several snapshots in a transaction,
   and this may change the src root when processing the following pending
   snapshots, which leads to corruption of the former snapshots;

2) the free inode cache was changing the roots when it wrote the cache,
   which leads to corruption.

This fixes things by making sure we force COW the block after we create a
snapshot while committing a transaction, then any changes to the roots
will result in COW, and we get all the fs roots and snapshot roots to be
consistent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

06 Nov, 2011 · 5 commits

Btrfs: fix delayed insertion reservation · c06a0e12

Committed by Josef Bacik on Nov 04, 2011

We all keep getting those stupid warnings from use_block_rsv when running
stress.sh, and it's because the delayed insertion stuff is being stupid. It's
not the delayed insertion stuff's fault, it's all just stupid. When marking an
inode dirty for oh say updating the time on it, we just do a
btrfs_join_transaction, which doesn't reserve any space. This is stupid because
we're going to have to have space reserved to make this change, but we do it
because it's fast because chances are we're going to call it over and over again
and it doesn't matter. Well thanks to the delayed insertion stuff this is
mostly the case, so we do actually need to make this reservation. So if
trans->bytes_reserved is 0 then try to do a normal reservation. If not return
ENOSPC which will make the btrfs_dirty_inode start a proper transaction which
will let it do the whole ENOSPC dance and reserve enough space for the delayed
insertion to steal the reservation from the transaction.

The other stupid thing we do is not reserve space for the inode when writing to
the thing. Usually this is ok since we have to update the time so we'd have
already done all this work before we get to the endio stuff, so it doesn't
matter. But this is stupid because we could write the data after the
transaction commits where we changed the mtime of the inode so we have to cow
all the way down to the inode anyway. This used to be masked by the delalloc
reservation stuff, but because we delay the update it doesn't get masked in this
case. So again the delayed insertion stuff bites us in the ass. So if our
trans->block_rsv is delalloc, just steal the reservation from the delalloc
reserve. Hopefully this won't bite us in the ass, but I've said that before.

With this patch stress.sh no longer spits out those stupid warnings (famous last
words). Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: make a delayed_block_rsv for the delayed item insertion · 6d668dda

Committed by Josef Bacik on Nov 03, 2011

I've been hitting warnings in use_block_rsv when running the delayed insertion
stuff. It's because we will readjust global block rsv based on what is in use,
which means we could end up discarding reservations that are for the delayed
insertion stuff. So instead create a separate block rsv for the delayed
insertion stuff. This will also make it easier to debug problems with the
delayed insertion reservations since we will know that only the delayed
insertion code touches this block_rsv. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: add a log of past tree roots · af31f5e5

Committed by Chris Mason on Nov 03, 2011

This takes some of the free space in the btrfs super block
to record information about most of the roots in the last four
commits.

It also adds a -o recovery to use the root history log when
we're not able to read the tree of tree roots, the extent
tree root, the device tree root or the csum root.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
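
One way to picture a log of roots from the last four commits is a small ring buffer in the super block. The sketch below is a simplified model under that assumption: it tracks a single root per commit, and the names `root_backup_sketch`, `super_sketch`, `record_commit`, and `find_older_backup` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BACKUP_ROOTS_SKETCH 4   /* "the last four commits" */

/* One history entry; the real log records several roots per commit. */
struct root_backup_sketch {
	uint64_t generation;        /* 0 means the slot is unused */
	uint64_t tree_root;         /* block address of the root */
};

struct super_sketch {
	struct root_backup_sketch backups[NUM_BACKUP_ROOTS_SKETCH];
	unsigned int next;          /* slot the next commit overwrites */
};

/* Record a commit, overwriting the oldest slot once the ring is full. */
static void record_commit(struct super_sketch *sb, uint64_t generation,
			  uint64_t tree_root)
{
	sb->backups[sb->next].generation = generation;
	sb->backups[sb->next].tree_root = tree_root;
	sb->next = (sb->next + 1) % NUM_BACKUP_ROOTS_SKETCH;
}

/* Recovery-style lookup: the newest recorded root strictly older than a
 * generation whose roots could not be read. Returns NULL if none. */
static const struct root_backup_sketch *
find_older_backup(const struct super_sketch *sb, uint64_t bad_generation)
{
	const struct root_backup_sketch *best = NULL;
	size_t i;

	for (i = 0; i < NUM_BACKUP_ROOTS_SKETCH; i++) {
		const struct root_backup_sketch *b = &sb->backups[i];

		if (b->generation == 0 || b->generation >= bad_generation)
			continue;
		if (best == NULL || b->generation > best->generation)
			best = b;
	}
	return best;
}
```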

btrfs: separate superblock items out of fs_info · 6c41761f

Committed by David Sterba on Apr 13, 2011

fs_info has now ~9kb, more than fits into one page. This will cause
mount failure when memory is too fragmented. Top space consumers are
super block structures super_copy and super_for_commit, ~2.8kb each.
Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

Add a wrapper for freeing fs_info and all of its dynamically allocated
members.
Signed-off-by: David Sterba <dsterba@suse.cz>

Btrfs: fix extent pinning bugs in the tree log · e688b725

Committed by Chris Mason on Oct 31, 2011

The tree log had two important bugs that could cause corruptions after a
crash.  Sometimes we were allowing tree log blocks to be reused after
the tree log was committed but before the transaction commit was done.

This allowed a future metadata write to overwrite the tree log data.  It
is fixed by adding a new variant of freeing reserved extents that always
pins them.  Credit goes to Stefan Behrens and Arne Jansen for many many
hours spent tracking this bug down.

During tree log replay, we do a pass through the tree log and pin all
the extents we find.  This makes sure the replay code won't go in and
use any of those blocks for new allocations during replay.  The problem
is the free space cache isn't honoring these pinned extents.  So the
allocator can end up handing them out, leading to all kinds of problems
during replay.

The fix here is to force any free space cache to load while we pin the
extents, and then to make sure we remove the pinned extents from the
free space rbtree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>

20 Oct, 2011 · 12 commits

Btrfs: seperate out btrfs_block_rsv_check out into 2 different functions · 36ba022a

Committed by Josef Bacik on Oct 18, 2011

Currently btrfs_block_rsv_check does 2 things, it will either refill a block
reserve like in the truncate or refill case, or it will check to see if there is
enough space in the global reserve and possibly refill it. However because of
overcommit we could be well overcommitting ourselves just to try and refill the
global reserve, when really we should just be committing the transaction. So
break this out into btrfs_block_rsv_refill and btrfs_block_rsv_check. Refill
will try to reserve more metadata if it can and btrfs_block_rsv_check will not,
it will only tell you if the factor of the total space is still reserved.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
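
The split described in this commit message can be sketched as two small functions: one that only inspects the reserve and one that tries to top it up. The struct, the function names, the "factor out of 10" convention, and the `free_space` pool are assumptions made for this sketch, not the kernel's signatures.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified block reserve: how much it wants and how much it holds. */
struct block_rsv_sketch {
	uint64_t size;
	uint64_t reserved;
};

/* Check only: report whether at least min_factor/10 of the target size is
 * still reserved. Never reserves anything itself. */
static int rsv_check_sketch(const struct block_rsv_sketch *rsv, int min_factor)
{
	uint64_t needed = rsv->size * (uint64_t)min_factor / 10;

	return rsv->reserved >= needed ? 0 : -ENOSPC;
}

/* Refill: try to bring the reserve up to min_reserved by taking space from
 * a free pool (a stand-in for a real metadata reservation). */
static int rsv_refill_sketch(struct block_rsv_sketch *rsv,
			     uint64_t min_reserved, uint64_t *free_space)
{
	uint64_t missing;

	if (rsv->reserved >= min_reserved)
		return 0;
	missing = min_reserved - rsv->reserved;
	if (*free_space < missing)
		return -ENOSPC;
	*free_space -= missing;
	rsv->reserved += missing;
	return 0;
}
```

With the two operations separated, a caller that sees the check fail can choose to commit the transaction instead of reserving more space.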

Btrfs: inline checksums into the disk free space cache · 5b0e95bf

Committed by Josef Bacik on Oct 06, 2011

Yeah yeah I know this is how we used to do it and then I changed it, but damnit
I'm changing it back. The fact is that writing out checksums will modify
metadata, which could cause us to dirty a block group we've already written out,
so we have to truncate it and all of its checksums and re-write it which will
write new checksums which could dirty a block group that has already been
written and you see where I'm going with this? This can cause unmount or really
anything that depends on a transaction to commit to take its sweet damned time
to happen. So go back to the way it was, only this time we're specifically
setting NODATACOW because we can't go through the COW pathway anyway and we're
doing our own built-in cow'ing by truncating the free space cache. The other
new thing is once we truncate the old cache and preallocate the new space, we
don't need to do that song and dance at all for the rest of the transaction, we
can just overwrite the existing space with the new cache if the block group
changes for whatever reason, and the NODATACOW will let us do this fine. So
keep track of which transaction we last cleared our cache in and if we cleared
it in this transaction just say we're all setup and carry on. This survives
xfstests and stress.sh.

The inode cache will continue to use the normal csum infrastructure since it
only gets written once and there will be no more modifications to the fs tree in
a transaction commit.
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: allow us to overcommit our enospc reservations · 2bf64758

Committed by Josef Bacik on Sep 26, 2011

One of the things that kills us is the fact that our ENOSPC reservations are
horribly over the top in most normal cases.  There isn't too much that can be
done about this because when we are completely full we really need them to work
like this so we don't under reserve.  However if there is plenty of unallocated
chunks on the disk we can use that to gauge how much we can overcommit.  So this
patch adds chunk free space accounting so we always know how much unallocated
space we have.  Then if we fail to make a reservation within our allocated
space, check to see if we can overcommit.  In the normal flushing case (like
with delalloc metadata reservations) we'll take the free space and divide it by
2 if our metadata profile is setup for DUP or any of those, and then divide it
by 8 to make sure we don't overcommit too much.  Then if we're in a non-flushing
case (we really need this reservation now!) we only limit ourselves to half of
the free space.  This makes this fio test

[torrent]
filename=torrent-test
rw=randwrite
size=4g
ioengine=sync
directory=/mnt/btrfs-test

go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
file system.  This doesn't seem to break my other enospc tests, but could really
use some more testing as this is a super scary change.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
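
The overcommit rule described in this commit message can be written down as a short function. This is a sketch of the arithmetic as the message states it; the function name, parameter names, and the final comparison against `total + avail` are assumptions rather than a copy of the kernel code.

```c
#include <stdint.h>

/* Decide whether a reservation of `bytes` may overcommit.
 *
 *   used             - bytes already used or reserved in allocated chunks
 *   total            - bytes in allocated chunks
 *   free_chunk_space - unallocated space on the devices
 *   mirrored_profile - nonzero for DUP and similar profiles, where every
 *                      byte is stored twice
 *   can_flush        - nonzero in the normal flushing case
 */
static int can_overcommit_sketch(uint64_t used, uint64_t total, uint64_t bytes,
				 uint64_t free_chunk_space,
				 int mirrored_profile, int can_flush)
{
	uint64_t avail = free_chunk_space;

	if (mirrored_profile)
		avail /= 2;     /* two copies need twice the raw space */

	if (can_flush)
		avail /= 8;     /* be conservative, flushing can free space */
	else
		avail /= 2;     /* reservation is needed now, allow up to half */

	return used + bytes < total + avail;
}
```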

Btrfs: use the inode's mapping mask for allocating pages · 3b16a4e3

Committed by Josef Bacik on Sep 21, 2011

Johannes pointed out we were allocating only kernel pages for doing writes,
which is kind of a big deal if you are on 32bit and have more than a gig of ram.
So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we
don't re-enter.  Thanks,
Reported-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: stop passing a trans handle all around the reservation code · 4a92b1b8

Committed by Josef Bacik on Aug 30, 2011

The only thing that we need to have a trans handle for is in
reserve_metadata_bytes and that's to know how much flushing we can do.  So
instead of passing it around, just check current->journal_info for a
trans_handle so we know if we can commit a transaction to try and free up space
or not.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_check · 482e6dc5

Committed by Josef Bacik on Aug 19, 2011

If you run xfstest 224 you will get lots of messages about not being able to
delete inodes and that they will be cleaned up next mount. This is because
btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to
flush, so if there was not enough space, it simply failed. But in truncate and
evict case we could easily flush space to try and get enough space to do our
work, so make btrfs_block_rsv_check take a flush argument to pass down to
reserve_metadata_bytes. Now xfstests 224 runs fine without all those
complaints. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: reduce the amount of space needed for truncates · 07127184

Committed by Josef Bacik on Aug 19, 2011

With btrfs_truncate_inode_items we always return if we have to go to another
leaf, which makes us do our reservation again. This means we will only ever
modify one leaf at a time, so we only need 1 item's worth of slack space. Also,
since we are deleting we will not be creating nodes as we go down, if anything
we'll be free'ing them as we merge them together, so make a different
calculation for truncate which will only have the worst case usage of COW'ing
the entire path down to the leaf. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: kill btrfs_truncate_reserve_metadata · 5e962c78

Committed by Josef Bacik on Aug 08, 2011

Since we've optimized the truncate path, we no longer require this function.
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: kill unused parts of block_rsv · dabdb640

Committed by Josef Bacik on Aug 08, 2011

The priority and refill_used flags are not used anymore, and neither is the
usage counter, so just remove them from btrfs_block_rsv.
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: kill the durable block rsv stuff · 37be25bc

Committed by Josef Bacik on Aug 05, 2011

This is confusing code and isn't used by anything anymore, so delete it.
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: kill the orphan space calculation for snapshots · dba68306

Committed by Josef Bacik on Aug 04, 2011

This patch kills off the calculation for the amount of space needed for the
orphan operations during a snapshot. The thing is we only do snapshots on
commit, so any space that is in the block_rsv->freed[] isn't going to be in the
new snapshot anyway, so there isn't any reason to require that space to be
reserved for the snapshot to occur. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: use bytes_may_use for all ENOSPC reservations · fb25e914

Committed by Josef Bacik on Jul 26, 2011

We have been using bytes_reserved for metadata reservations, which is wrong
since we use that to keep track of outstanding reservations from the allocator.
This resulted in us doing a lot of silly things to make sure we don't allocate a
bunch of metadata chunks since we never had a real view of how much space was
actually in use by metadata.

This passes Arne's enospc test and xfstests as well as my own enospc tests.
Hopefully this will get us moving in the right direction. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

02 Oct, 2011 · 2 commits

btrfs: initial readahead code and prototypes · 7414a03f

Committed by Arne Jansen on May 23, 2011

This is the implementation for the generic read ahead framework.

To trigger a readahead, btrfs_reada_add must be called. It will start
a read ahead for the given range [start, end) on tree root. The returned
handle can either be used to wait on the readahead to finish
(btrfs_reada_wait), or to send it to the background (btrfs_reada_detach).

The read ahead works as follows:
On btrfs_reada_add, the root of the tree is inserted into a radix_tree.
reada_start_machine will then search for extents to prefetch and trigger
some reads. When a read finishes for a node, all contained node/leaf
pointers that lie in the given range will also be enqueued. The reads will
be triggered in sequential order, thus giving a big win over a naive
enumeration. It will also make use of multi-device layouts. Each disk
will have its own read pointer and all disks will be utilized in parallel.
Also, no two disks will read both sides of a mirror simultaneously, as this
would waste seeking capacity. Instead both disks will read different parts
of the filesystem.
Any number of readaheads can be started in parallel. The read order will be
determined globally, i.e. 2 parallel readaheads will normally finish faster
than the 2 started one after another.

Changes v2:
 - protect root->node by transaction instead of node_lock
 - fix missed branches:
    The readahead had a too simple check to determine if a branch from
    a node should be checked or not. It now also records the upper bound
    of each node to see if the requested RA range lies within.
 - use KERN_CONT to debug output, to avoid line breaks
 - defer reada_start_machine to worker to avoid deadlock

Changes v3:
 - protect root->node by rcu

Changes v5:
 - changed EIO-semantics of reada_tree_block_flagged
 - remove spin_lock from reada_control and make elems an atomic_t
 - remove unused read_total from reada_control
 - kill reada_key_cmp, use btrfs_comp_cpu_keys instead
 - use kref-style release functions where possible
 - return struct reada_control * instead of void * from btrfs_reada_add
Signed-off-by: Arne Jansen <sensille@gmx.net>

btrfs: state information for readahead · 90519d66

Committed by Arne Jansen on May 23, 2011

Add state information for readahead to btrfs_fs_info and btrfs_device

Changes v2:
 - don't wait in radix_trees
 - add own set of workers for readahead
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Arne Jansen <sensille@gmx.net>

17 Aug, 2011 · 1 commit

Btrfs: use plain page_address() in header fields setget functions · c97c2916

Committed by Li Zefan on Aug 03, 2011

We've stopped using highmem for extent buffers.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>