Commits · c126dea771be1b3c370c0ffc4a09e6a82d492a49 · openeuler / raspberrypi-kernel

17 Jan, 2012 21 commits

Btrfs: add balance progress reporting · 19a39dce

Committed by Ilya Dryomov on Jan 16, 2012

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: allow for resuming restriper after it was paused · de322263

Committed by Ilya Dryomov on Jan 16, 2012

Recognize BTRFS_BALANCE_RESUME flag passed from userspace.  We use the
same heuristics used when recovering balance after a crash to try to
start where we left off last time.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: allow for canceling restriper · a7e99c69

Committed by Ilya Dryomov on Jan 16, 2012

Implement an ioctl for canceling restriper.  Currently we wait until
relocation of the current block group is finished, in future this can be
done by triggering a commit.  Balance item is deleted and no memory
about the interrupted balance is kept.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: allow for pausing restriper · 837d5b6e

Committed by Ilya Dryomov on Jan 16, 2012

Implement an ioctl for pausing restriper.  This pauses the relocation,
but balance is still considered to be "in progress": balance item is
not deleted, other volume operations cannot be started, etc.  If paused
in the middle of profile changing operation we will continue making
allocations with the target profile.

Add a hook to close_ctree() to pause restriper and free its data
structures on unmount.  (It's safe to unmount when restriper is in
"paused" state, we will resume with the same parameters on the next
mount)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: add skip_balance mount option · 9555c6c1

Committed by Ilya Dryomov on Jan 16, 2012

Since the restriper kthread starts involuntarily on mount and can consume CPU
and memory bandwidth, add a mount option to forcefully skip it.  The
restriper in that case hangs around in paused state and can be resumed
from userspace when it's convenient.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: recover balance on mount · 59641015

Committed by Ilya Dryomov on Jan 16, 2012

On mount, if balance item is found, resume balance in a separate
kernel thread.

Try to be smart to continue roughly where previous balance (or convert)
was interrupted.  For chunk types that were being converted to some
profile we turn on soft convert, in case of a simple balance we turn on
usage filter and relocate only less-than-90%-full chunks of that type.
These are just heuristics but they help quite a bit, and can be improved
in future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
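
The resume heuristic described in this message can be sketched roughly as follows. This is only an illustration: the struct, the flag names and their bit positions (`ex_balance_args`, `EX_ARGS_*`) are hypothetical and are not taken from the kernel sources; only the 90% threshold and the "soft convert" idea come from the commit message.

```c
#include <stdint.h>

/* Hypothetical per-chunk-type balance arguments (not the kernel struct). */
#define EX_ARGS_USAGE   (1ULL << 1)  /* usage filter enabled (assumed bit)   */
#define EX_ARGS_CONVERT (1ULL << 8)  /* profile conversion requested         */
#define EX_ARGS_SOFT    (1ULL << 9)  /* skip chunks already in target profile */

struct ex_balance_args {
    uint64_t flags;
    unsigned usage;   /* percent threshold used by the usage filter */
};

/* Adjust saved arguments so a resumed balance skips most finished work. */
static void ex_adjust_for_resume(struct ex_balance_args *args)
{
    if (args->flags & EX_ARGS_CONVERT) {
        /* A convert was interrupted: do not redo converted chunks. */
        args->flags |= EX_ARGS_SOFT;
    } else {
        /* A plain balance was interrupted: only relocate chunks that
         * are less than 90% full. */
        args->flags |= EX_ARGS_USAGE;
        args->usage = 90;
    }
}
```

Applied once per chunk type (data, metadata, system), this reproduces the "roughly where we left off" behaviour without storing an exact position.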

Btrfs: save balance parameters to disk · 0940ebf6

Committed by Ilya Dryomov on Jan 16, 2012

Introduce a new btree objectid for storing balance item.  The reason is
to be able to resume restriper after a crash with the same parameters.
Balance item has a very high objectid and goes into tree of tree roots.

The key for the new item is as follows:

	[ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ]

Older kernels simply ignore it so it's safe to mount with an older
kernel and then go back to the newer one.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
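
To illustrate why a "very high objectid" places the balance item at the end of the tree of tree roots, here is a small sketch of the (objectid, type, offset) key ordering. The numeric values of `EX_BALANCE_OBJECTID` and `EX_BALANCE_ITEM_KEY` are placeholders chosen for the example, and the `ex_` names are hypothetical; the commit message does not quote the real constants.

```c
#include <stdint.h>

/* Placeholder values for illustration only. */
#define EX_BALANCE_OBJECTID  UINT64_MAX   /* stands in for "a very high objectid" */
#define EX_BALANCE_ITEM_KEY  200          /* hypothetical item type */

struct ex_key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Keys are ordered by objectid first, then type, then offset. */
static int ex_key_cmp(const struct ex_key *a, const struct ex_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Build the key [ objectid ; item type ; 0 ] used to look up the item. */
static struct ex_key ex_balance_key(void)
{
    struct ex_key key = { EX_BALANCE_OBJECTID, EX_BALANCE_ITEM_KEY, 0 };
    return key;
}
```

Because the offset is fixed at 0, there is at most one such item, and a single lookup on mount is enough to tell whether a balance was in progress.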

Btrfs: soft profile changing mode (aka soft convert) · cfa4c961

Committed by Ilya Dryomov on Jan 16, 2012

When converting from one profile to another with soft mode on, the
restriper won't touch chunks that already have the profile we are
converting to.  This is useful if e.g. half of the FS was converted
earlier.

The soft mode switch is (like every other filter) per-type.  This means
that we can convert for example meta chunks the "hard" way while
converting data chunks selectively with soft switch.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
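
A minimal sketch of the soft-mode decision, assuming the profile bits occupy bits 3 to 6 of the chunk flags (an assumption of this example). `ex_convert_selects` and `EX_PROFILE_MASK` are hypothetical names, not kernel identifiers.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed layout: four profile bits (RAID0, RAID1, DUP, RAID10) at bits 3..6. */
#define EX_PROFILE_MASK (0xFULL << 3)

/* Decide whether a chunk should be relocated during a convert.
 * With soft mode on, chunks already in the target profile are skipped. */
static bool ex_convert_selects(uint64_t chunk_flags, uint64_t target_profile,
                               bool soft)
{
    uint64_t profile = chunk_flags & EX_PROFILE_MASK;

    if (soft && profile == target_profile)
        return false;
    return true;
}
```

Since the switch is per chunk type, a caller would evaluate this with the `soft` value taken from the data, metadata or system argument set that matches the chunk.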

Btrfs: implement online profile changing · e4d8ec0f

Committed by Ilya Dryomov on Jan 16, 2012

Profile changing is done by launching a balance with
BTRFS_BALANCE_CONVERT bits set and target fields of respective
btrfs_balance_args structs initialized.  Profile reducing code in this
case will pick restriper's target profile if it's available instead of
doing a blind reduce.  If target profile is not yet available it goes
back to a plain reduce.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: do not reduce profile in do_chunk_alloc() · 70922617

Committed by Ilya Dryomov on Jan 16, 2012

Every caller of do_chunk_alloc() feeds it the reduced allocation
profile, so stop trying to reduce it one more time.  Instead check the
validity of the passed profile.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: virtual address space subset filter · ea67176a

Committed by Ilya Dryomov on Jan 16, 2012

Select chunks which have at least one byte located inside a given
[vstart, vend) virtual address space range.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
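
The "at least one byte inside [vstart, vend)" condition is a half-open interval overlap test. A sketch, with the hypothetical helper name `ex_vrange_filter_selects`:

```c
#include <stdint.h>
#include <stdbool.h>

/* A chunk covers [chunk_start, chunk_start + chunk_len) in the virtual
 * address space.  It is selected when that range overlaps [vstart, vend). */
static bool ex_vrange_filter_selects(uint64_t chunk_start, uint64_t chunk_len,
                                     uint64_t vstart, uint64_t vend)
{
    uint64_t chunk_end = chunk_start + chunk_len;   /* exclusive */

    return chunk_start < vend && vstart < chunk_end;
}
```

Both comparisons are strict because both ranges exclude their end address: a chunk that ends exactly at `vstart`, or starts exactly at `vend`, shares no byte with the range.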

Btrfs: devid subset filter · 94e60d5a

Committed by Ilya Dryomov on Jan 16, 2012

Select chunks which have at least one byte of at least one stripe
located on a device with devid X in a given [pstart,pend) physical
address range.

This filter only works when devid filter is turned on.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
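
One way to picture the physical range check, under the simplifying assumption that each stripe occupies a single contiguous extent `[physical, physical + length)` on its device. The `ex_stripe` struct and the helper name are hypothetical, not the kernel's data structures.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stripe description (assumption of this sketch). */
struct ex_stripe {
    uint64_t devid;
    uint64_t physical;   /* start offset on the device */
    uint64_t length;     /* bytes occupied on the device */
};

/* Select the chunk if any stripe on `devid` overlaps [pstart, pend). */
static bool ex_drange_filter_selects(const struct ex_stripe *stripes, size_t n,
                                     uint64_t devid,
                                     uint64_t pstart, uint64_t pend)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t end = stripes[i].physical + stripes[i].length;

        if (stripes[i].devid != devid)
            continue;   /* the range only applies to the chosen device */
        if (stripes[i].physical < pend && pstart < end)
            return true;
    }
    return false;
}
```

The device comparison inside the loop is why this filter is only meaningful together with the devid filter: a physical range has no meaning without naming the device it refers to.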

Btrfs: devid filter · 409d404b

Committed by Ilya Dryomov on Jan 16, 2012

Relocate chunks which have at least one stripe located on a device with
devid X.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
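
As a sketch, the devid filter reduces to a scan over the chunk's stripes. The `ex_stripe` struct and `ex_devid_filter_selects` are hypothetical names introduced for this example.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stripe description (assumption of this sketch). */
struct ex_stripe {
    uint64_t devid;
    uint64_t physical;
};

/* Select the chunk if at least one of its stripes lives on `devid`. */
static bool ex_devid_filter_selects(const struct ex_stripe *stripes, size_t n,
                                    uint64_t devid)
{
    for (size_t i = 0; i < n; i++) {
        if (stripes[i].devid == devid)
            return true;
    }
    return false;
}
```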

Btrfs: usage filter · 5ce5b3c0

Committed by Ilya Dryomov on Jan 16, 2012

Select chunks that are less than X percent full.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
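
A sketch of the "less than X percent full" test. The function name is hypothetical, and the rounding behaviour (threshold rounded down) is a choice made for this example rather than something the commit message specifies.

```c
#include <stdint.h>
#include <stdbool.h>

/* Select a chunk when its used bytes are below `percent` of its size.
 * The threshold floor(size * percent / 100) is computed in two parts so
 * that size * percent cannot overflow for very large chunks. */
static bool ex_usage_filter_selects(uint64_t used, uint64_t size,
                                    unsigned percent)
{
    uint64_t threshold = size / 100 * percent + (size % 100) * percent / 100;

    return used < threshold;
}
```

With `percent` set to 0 nothing is selected, and with 100 every chunk that is not completely full is selected.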

Btrfs: profiles filter · ed25e9b2

Committed by Ilya Dryomov on Jan 16, 2012

Select chunks based on a given profile mask.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: add basic infrastructure for selective balancing · f43ffb60

Committed by Ilya Dryomov on Jan 16, 2012

This allows having a separate set of filters for each chunk type
(data,meta,sys).  The code however is generic and the switch on chunk
type is only done once.

This commit also adds a type filter: it allows balancing, for example,
meta and system chunks w/o touching data ones.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: add basic restriper infrastructure · c9e9f97b

Committed by Ilya Dryomov on Jan 16, 2012

Add basic restriper infrastructure: extended balancing ioctl and all
related ioctl data structures, add data structure for tracking
restriper's state to fs_info, etc.  The semantics of the old balancing
ioctl are fully preserved.

Explicitly disallow any volume operations when balance is in progress.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: make avail_*_alloc_bits fields dynamic · 10ea00f5

Committed by Ilya Dryomov on Jan 16, 2012

Currently when new chunks are created respective avail_alloc_bits field
is updated to reflect profiles of all chunks present in the system.
However when chunks are removed profile bits are never cleared.

This patch clears profile bit of respective avail_alloc_bits field when
the last chunk with that profile is removed.  Restriper needs this to
properly operate when "downgrading".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit · a46d11a8

Committed by Ilya Dryomov on Jan 16, 2012

Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
avail_{data,metadata,system}_alloc_bits fields, which gather info about
available allocation profiles in the FS. When chunk is created or read
from disk, its profile is OR'ed with the corresponding avail_alloc_bits
field. Since SINGLE is denoted by 0 in the on-disk format, currently
there is no way to tell when such chunks become available. Restriper
needs that information, so add a separate bit for SINGLE profile.

This bit is going to be in-memory only, it should never be written out
to disk, so it's not a disk format change. However to avoid remappings
in future, reserve corresponding on-disk bit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
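
The problem with SINGLE being encoded as 0 is that OR-ing 0 into a mask leaves no trace. A sketch of the in-memory workaround, where the bit position of `EX_AVAIL_BIT_SINGLE` and the profile bit layout are assumptions of this example and all `ex_`/`EX_` names are hypothetical:

```c
#include <stdint.h>

/* Assumed layout: four profile bits (RAID0, RAID1, DUP, RAID10) at bits 3..6. */
#define EX_PROFILE_MASK      (0xFULL << 3)
/* In-memory-only marker for the SINGLE profile; position is an assumption. */
#define EX_AVAIL_BIT_SINGLE  (1ULL << 48)

/* Fold one chunk's on-disk flags into an in-memory availability mask.
 * A chunk with no profile bits set is SINGLE, which would otherwise be
 * invisible after the OR. */
static uint64_t ex_record_profile(uint64_t avail_bits, uint64_t chunk_flags)
{
    uint64_t profile = chunk_flags & EX_PROFILE_MASK;

    if (profile == 0)
        profile = EX_AVAIL_BIT_SINGLE;
    return avail_bits | profile;
}
```

Any code that writes flags back to disk would have to mask the extra bit out again, which is what keeps this from being a disk format change.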

Btrfs: introduce masks for chunk type and profile · 52ba6929

Committed by Ilya Dryomov on Jan 16, 2012

Chunk's type and profile are encoded in u64 flags field.  Introduce
masks to easily access them.  Also fix the type of BTRFS_BLOCK_GROUP_*
constants, it should be ULL.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
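
A sketch of what such masks look like. The bit values below are written for illustration under the `EX_` prefix and should be treated as assumptions rather than a copy of the kernel header; the `ULL` suffix matters because the flags field is 64 bits wide.

```c
#include <stdint.h>

/* Chunk type bits (assumed values for this sketch). */
#define EX_GROUP_DATA      (1ULL << 0)
#define EX_GROUP_SYSTEM    (1ULL << 1)
#define EX_GROUP_METADATA  (1ULL << 2)
/* Chunk profile bits (assumed values for this sketch). */
#define EX_GROUP_RAID0     (1ULL << 3)
#define EX_GROUP_RAID1     (1ULL << 4)
#define EX_GROUP_DUP       (1ULL << 5)
#define EX_GROUP_RAID10    (1ULL << 6)

#define EX_TYPE_MASK \
    (EX_GROUP_DATA | EX_GROUP_SYSTEM | EX_GROUP_METADATA)
#define EX_PROFILE_MASK \
    (EX_GROUP_RAID0 | EX_GROUP_RAID1 | EX_GROUP_DUP | EX_GROUP_RAID10)

/* Extract the type part of a chunk's flags. */
static inline uint64_t ex_chunk_type(uint64_t flags)
{
    return flags & EX_TYPE_MASK;
}

/* Extract the profile part of a chunk's flags (0 means SINGLE). */
static inline uint64_t ex_chunk_profile(uint64_t flags)
{
    return flags & EX_PROFILE_MASK;
}
```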

Btrfs: get rid of *_alloc_profile fields · 6fef8df1

Committed by Ilya Dryomov on Jan 16, 2012

{data,metadata,system}_alloc_profile fields have been unused for a long
time now.  Get rid of them.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

11 Jan, 2012 11 commits

Btrfs: fix possible deadlock when opening a seed device · b367e47f

Committed by Li Zefan on Dec 07, 2011

The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
but when we mount a filesystem which has backing seed devices, we have
this lock chain:

    open_ctree()
        lock(chunk_mutex);
        read_chunk_tree();
            read_one_dev();
                open_seed_devices();
                    lock(uuid_mutex);

and then we hit a lockdep splat.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

Btrfs: update global block_rsv when creating a new block group · c7c144db

Committed by Li Zefan on Dec 07, 2011

A bug was triggered while using seed device:

    # mkfs.btrfs /dev/loop1
    # btrfstune -S 1 /dev/loop1
    # mount /dev/loop1 /mnt
    # btrfs dev add /dev/loop2 /mnt

btrfs: block rsv returned -28
------------[ cut here ]------------
WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]()
...
Call Trace:
...
[<f7b7c31c>] btrfs_cow_block+0x101/0x147 [btrfs]
[<f7b7eaa6>] btrfs_search_slot+0x1b8/0x55f [btrfs]
[<f7b7f844>] btrfs_insert_empty_items+0x42/0x7f [btrfs]
[<f7b7f8c1>] btrfs_insert_item+0x40/0x7e [btrfs]
[<f7b8ac02>] btrfs_make_block_group+0x243/0x2aa [btrfs]
[<f7bb3f53>] __btrfs_alloc_chunk+0x672/0x70e [btrfs]
[<f7bb41ff>] init_first_rw_device+0x77/0x13c [btrfs]
[<f7bb5a62>] btrfs_init_new_device+0x664/0x9fd [btrfs]
[<f7bbb65a>] btrfs_ioctl+0x694/0xdbe [btrfs]
[<c04f55f7>] do_vfs_ioctl+0x496/0x4cc
[<c04f5660>] sys_ioctl+0x33/0x4f
[<c07b9edf>] sysenter_do_call+0x12/0x38
---[ end trace 906adac595facc7d ]---

Since the seed device is read-only, there's no usable space in the filesystem.
Afterwards we add a sprout device to it, and the kernel creates a METADATA
block group and a SYSTEM block group, which provide free space we can reserve,
but we still get a reservation failure because the global block_rsv hasn't
been updated accordingly.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

Btrfs: rewrite btrfs_trim_block_group() · 7fe1e641

Committed by Li Zefan on Dec 29, 2011

There are various bugs in block group trimming:

- It may trim from offset smaller than user-specified offset.
- It may trim beyond user-specified range.
- It may leak free space for extents smaller than specified minlen.
- It may truncate the last trimmed extent thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (even
none will be trimmed). Even for those trimmed, not all the free space
in the bitmaps will be trimmed.

I rewrite btrfs_trim_block_group() and break it into two functions.
One is to trim extents only, and the other is to trim bitmaps only.

Before patching:

	# fstrim -v /mnt/
	/mnt/: 1496465408 bytes were trimmed

After patching:

	# fstrim -v /mnt/
	/mnt/: 2193768448 bytes were trimmed

And this matches the total free space:

	# btrfs fi df /mnt
	Data: total=3.58GB, used=1.79GB
	System, DUP: total=8.00MB, used=4.00KB
	System: total=4.00MB, used=0.00
	Metadata, DUP: total=205.12MB, used=97.14MB
	Metadata: total=8.00MB, used=0.00
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
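
Several of the bugs listed above come down to not clamping a free extent to the requested range before discarding it. A sketch of that clamping, using hypothetical names (`ex_trim_len` and its parameters); it illustrates the invariant and is not the rewritten kernel function.

```c
#include <stdint.h>

static inline uint64_t ex_max(uint64_t a, uint64_t b) { return a > b ? a : b; }
static inline uint64_t ex_min(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Return how many bytes of the free extent [ext_start, ext_start + ext_len)
 * may be trimmed for a request covering [req_start, req_end).
 * Returns 0 when nothing overlaps or the overlap is shorter than minlen. */
static uint64_t ex_trim_len(uint64_t ext_start, uint64_t ext_len,
                            uint64_t req_start, uint64_t req_end,
                            uint64_t minlen)
{
    uint64_t start = ex_max(ext_start, req_start);          /* never before the request */
    uint64_t end = ex_min(ext_start + ext_len, req_end);    /* never past the request */

    if (end <= start)
        return 0;
    if (end - start < minlen)
        return 0;
    return end - start;
}
```

In this sketch an extent that is skipped is simply left untouched, so it stays in the free space structures instead of being dropped.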

Btrfs: simplify calculation of stripe length for discard operation · ec9ef7a1

Committed by Li Zefan on Dec 01, 2011

For btrfs raid, while discarding a range of space, we'll need to know
the start offset and length to discard for each device, and it's done
in btrfs_map_block().

However the calculation is a bit complex for raid0 and raid10, so I
reimplement it based on a fact that:

        dev1          dev2           dev3    (raid0)
        -----------------------------------
        s0 s3 s6      s1 s4 s7       s2 s5

Each device has (total_stripes / nr_dev) stripes, or one more.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
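
The observation in the diagram can be written as a small function. This sketch assumes the stripes are dealt round-robin starting at the first device, as in the diagram; `ex_stripes_on_device` is a hypothetical name, and a real implementation would also have to account for a range that starts on a later device.

```c
#include <stdint.h>

/* Number of stripes that land on device `dev_index` (0-based) when
 * `total_stripes` stripes are distributed round-robin over `nr_dev`
 * devices, starting at device 0.  nr_dev must be non-zero. */
static uint64_t ex_stripes_on_device(uint64_t total_stripes, unsigned nr_dev,
                                     unsigned dev_index)
{
    uint64_t base = total_stripes / nr_dev;
    uint64_t extra = total_stripes % nr_dev;   /* first `extra` devices get one more */

    return base + (dev_index < extra ? 1 : 0);
}
```

For the eight stripes s0..s7 over three devices shown above, this gives 3, 3 and 2 stripes respectively.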

Btrfs: don't pre-allocate btrfs bio · de11cc12

Committed by Li Zefan on Dec 01, 2011

We pre-allocate a btrfs bio with fixed size, and then may re-allocate
memory if we find stripes are bigger than the fixed size. But this
pre-allocation is not necessary.

Also we don't have to calculate the stripe number twice.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

Btrfs: don't pass a trans handle unnecessarily in volumes.c · 125ccb0a

Committed by Li Zefan on Dec 08, 2011

Some functions never use the transaction handle passed to them.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

Btrfs: reserve metadata space in btrfs_ioctl_setflags() · 4da6f1a3

Committed by Li Zefan on Dec 29, 2011

Check and reserve space for btrfs_update_inode().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags() · f062abf0

Committed by Li Zefan on Dec 29, 2011

We can recover from errors and return -errno to user space.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

Btrfs: check the return value of io_ctl_init() · 706efc66

Committed by Li Zefan on Jan 09, 2012

It can return -ENOMEM.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

Btrfs: avoid possible NULL deref in io_ctl_drop_pages() · a1ee5a45

Committed by Li Zefan on Jan 09, 2012

If we run into some failure path in io_ctl_prepare_pages(),
io_ctl->pages[] array may have some NULL pointers.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

Btrfs: add pinned extents to on-disk free space cache correctly · db804f23

Committed by Li Zefan on Jan 10, 2012

I got this while running xfstests:

[24256.836098] block group 317849600 has an wrong amount of free space
[24256.836100] btrfs: failed to load free space cache for block group 317849600

We should clamp the extent returned by find_first_extent_bit(),
so the start of the extent won't be smaller than the start of the
block group.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
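
A sketch of the clamping step, assuming a search that may return an extent beginning before the block group. The helper name `ex_clamped_len` and the exclusive-end convention are choices made for this example.

```c
#include <stdint.h>

/* Given a pinned extent [ext_start, ext_end) returned by a search and the
 * start of the block group being written out, return the number of bytes
 * of the extent that actually belong at or after the block group start. */
static uint64_t ex_clamped_len(uint64_t ext_start, uint64_t ext_end,
                               uint64_t group_start)
{
    uint64_t start = ext_start < group_start ? group_start : ext_start;

    return ext_end > start ? ext_end - start : 0;
}
```

Without the clamp, bytes that precede the block group would be counted as its free space, which is consistent with the "wrong amount of free space" message quoted above.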

08 Jan, 2012 2 commits

Btrfs: revamp clustered allocation logic · 1bb91902

Committed by Alexandre Oliva on Oct 14, 2011

Parameterize clusters on minimum total size, minimum chunk size and
minimum contiguous size for at least one chunk, without limits on
cluster, window or gap sizes. Don't tolerate any fragmentation for
SSD_SPREAD; accept it for metadata, but try to keep data dense.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: don't set up allocation result twice · fc7c1077

Committed by Alexandre Oliva on Nov 28, 2011

We store the allocation start and length twice in ins, once right
after the other, but with intervening calls that may prevent the
duplicate from being optimized out by the compiler.  Remove one of the
assignments.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

07 Jan, 2012 4 commits

Btrfs: test free space only for unclustered allocation · a5f6f719

Committed by Alexandre Oliva on Dec 12, 2011

Since the clustered allocation may be taking extents from a different
block group, there's no point in spin-locking and testing the current
block group free space before attempting to allocate space from a
cluster, even more so when we might refrain from even trying the
cluster in the current block group because, after the cluster was set
up, not enough free space remained. Furthermore, cluster creation
attempts fail fast when the block group doesn't have enough free
space, so the test was completely superfluous.

I've moved the free space test past the cluster allocation attempt,
where it is more useful, and arranged for a cluster in the current
block group to be released before trying an unclustered allocation,
when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in
the cluster stands a chance of being combined with additional free
space in the block group so as to succeed in the allocation attempt.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: use bigger metadata chunks on bigger filesystems · 1100373f

Committed by Chris Mason on Jan 06, 2012

The 256MB chunk is a little small on a huge FS.  This scales up the
chunk size.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: lower the bar for chunk allocation · cf1d72c9

Committed by Chris Mason on Jan 06, 2012

The chunk allocation code has tried to keep a pretty tight lid on creating new
metadata chunks.  This is partially because in the past the reservation
code didn't give us an accurate idea of how much space was being used.

The new code is much more accurate, so we're able to get rid of some of these
checks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: run chunk allocations while we do delayed refs · 203bf287

Committed by Chris Mason on Jan 06, 2012

Btrfs tries to batch extent allocation tree changes to improve performance
and reduce metadata thrashing. But it doesn't allocate new metadata chunks
while it is doing allocations for the extent allocation tree.

This commit changes the delayed reference code to do chunk allocations if we're
getting low on room. It prevents crashes and improves performance.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

05 Jan, 2012 2 commits

Btrfs: make sure we're not using obsolete code in btrfs_get_extent · 6bf7e080

Committed by Jan Schmidt on Dec 01, 2011

There's code in btrfs_get_extent that should never be used. This patch turns
a WARN_ON(1) into a BUG(), hoping we can remove the transaction code from
btrfs_get_extent soon.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: new backref walking code · 4692cf58

Committed by Jan Schmidt on Dec 02, 2011

The old backref iteration code could only safely be used on commit roots.
Besides this limitation, it had bugs in finding the roots for these
references. This commit replaces large parts of it by btrfs_find_all_roots()
which a) really finds all roots and the correct roots, b) works correctly
under heavy file system load, c) considers delayed refs.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>