Commits · 5f524444c351e145a5f7e28253594688a421bfe8 · openeuler / Kernel

20 Oct, 2011 1 commit

Btrfs: allow us to overcommit our enospc reservations · 2bf64758

Authored by Josef Bacik on Sep 26, 2011

One of the things that kills us is the fact that our ENOSPC reservations are
horribly over the top in most normal cases.  There isn't too much that can be
done about this because when we are completely full we really need them to work
like this so we don't under-reserve.  However, if there are plenty of unallocated
chunks on the disk we can use that to gauge how much we can overcommit.  So this
patch adds chunk free space accounting so we always know how much unallocated
space we have.  Then if we fail to make a reservation within our allocated
space, we check to see if we can overcommit.  In the normal flushing case (like
with delalloc metadata reservations) we'll take the free space and divide it by
2 if our metadata profile is set up for DUP or any of those, and then divide it
by 8 to make sure we don't overcommit too much.  Then if we're in a non-flushing
case (we really need this reservation now!) we only limit ourselves to half of
the free space.  This makes this fio test

[torrent]
filename=torrent-test
rw=randwrite
size=4g
ioengine=sync
directory=/mnt/btrfs-test

go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
file system.  This doesn't seem to break my other enospc tests, but could really
use some more testing as this is a super scary change.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
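
As a rough illustration of the overcommit rule described in this commit message, the sketch below computes how far a reservation might reach into unallocated chunk space. The names `overcommit_limit` and `may_overcommit` and their parameters are hypothetical and not taken from the patch. Only the divisors come from the text (2 for DUP-style profiles, then 8 when flushing is possible or 2 when it is not). Applying the profile halving in both cases and the exact comparison in `may_overcommit` are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's code: how many bytes of
 * unallocated chunk space a reservation may overcommit into.
 */
uint64_t overcommit_limit(uint64_t unallocated, bool two_copy_profile,
                          bool can_flush)
{
    uint64_t avail = unallocated;

    /* DUP-style profiles store two copies, so only half is usable. */
    if (two_copy_profile)
        avail /= 2;

    /*
     * Flushing callers can wait for space, so they stay conservative
     * (1/8). Non-flushing callers need the space now and may use 1/2.
     */
    if (can_flush)
        avail /= 8;
    else
        avail /= 2;

    return avail;
}

/* Assumed check: does the request fit in allocated space plus the limit? */
bool may_overcommit(uint64_t used, uint64_t requested, uint64_t allocated,
                    uint64_t limit)
{
    return used + requested < allocated + limit;
}
```

With 1600 units unallocated and a DUP metadata profile, a flushing caller would be allowed to overcommit by 100 units under these assumptions, while a non-flushing caller on a single-copy profile would be allowed 800.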

17 Aug, 2011 3 commits

Btrfs: fix uninitialized sync_pending · 0e588859

Authored by Miao Xie on Aug 05, 2011

sync_pending is used before it is initialized; fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix a bug of balance on full multi-disk partitions · 38c01b96

Authored by liubo on Aug 02, 2011

When balancing, we first try to shrink devices to free up some space.
But when working on a full multi-disk partition with RAID protection
we may hit a bug: while shrinking, total_bytes may be less than
bytes_used, and btrfs may allocate a dev extent that reaches beyond the
device's bounds.

We are then unable to read or write the data stored at the end of the
device, and get the following:

device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5
Btrfs detected SSD devices, enabling SSD mode
btrfs: relocating block group 476315648 flags 9
btrfs: found 4 extents
attempt to access beyond end of device
sdb5: rw=145, want=546176, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546304, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546432, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546560, limit=546147
attempt to access beyond end of device
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: detect wether a device supports discard · d5e2003c

Authored by Josef Bacik on Aug 04, 2011

We have a problem where if a user specifies discard but the device doesn't
actually support it, we will return EOPNOTSUPP from btrfs_discard_extent. This
is a problem because this gets called (in a fashion) from the tree log recovery
code, which has a nice little BUG_ON(ret) after it, which causes us to fail the
tree log replay. So instead detect whether our devices support discard when
we're adding them and then don't issue discards if we know that the device
doesn't support it. And just for good measure set ret = 0 in
btrfs_issue_discard just in case we still get EOPNOTSUPP so we don't screw
anybody up like this again. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
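
The two safeguards described in this message (remember at add time whether a device supports discard, and swallow a stray EOPNOTSUPP) can be sketched in userspace C as follows. `struct demo_device`, `demo_send_discard` and `demo_issue_discard` are hypothetical stand-ins, not the kernel's `struct btrfs_device` or `btrfs_issue_discard`. The `hw_accepts_discard` field exists only so the demo can simulate a device that rejects the request.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; not the kernel structure. */
struct demo_device {
    bool can_discard;        /* recorded once, when the device is added */
    bool hw_accepts_discard; /* simulated hardware behaviour */
    int discards_sent;       /* how many requests reached the "hardware" */
};

/* Simulated low-level call: unsupported requests fail with -EOPNOTSUPP. */
static int demo_send_discard(struct demo_device *dev)
{
    dev->discards_sent++;
    return dev->hw_accepts_discard ? 0 : -EOPNOTSUPP;
}

int demo_issue_discard(struct demo_device *dev)
{
    int ret;

    /* Known not to support discard: skip it, and do not report an error. */
    if (!dev->can_discard)
        return 0;

    ret = demo_send_discard(dev);

    /* For good measure, never let -EOPNOTSUPP reach a BUG_ON(ret) caller. */
    if (ret == -EOPNOTSUPP)
        ret = 0;
    return ret;
}
```

Either safeguard alone would keep a caller that treats any nonzero return as fatal from tripping; the sketch keeps both, as the message describes.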

06 Aug, 2011 1 commit

Btrfs: force unplugs when switching from high to regular priority bios · 2ab1ba68

Authored by Chris Mason on Aug 04, 2011

Btrfs does bio submissions from a worker thread, and each device
has a list of high priority bios and regular priority bios.

Synchronous writes go to the high priority list while async writes
go to the regular list.  This commit brings back an explicit unplug
any time we switch from high to regular priority, which makes it
easier for the block layer to give us low latencies.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

28 Jul, 2011 1 commit

Btrfs: make a lockdep class for each root · 85d4e461

Authored by Chris Mason on Jul 26, 2011

This patch was originally from Tejun Heo. lockdep complains about the btrfs
locking because we sometimes take btree locks from two different trees at the
same time. The current classes are based only on level in the btree, which
isn't enough information for lockdep to figure out if the lock is safe.

This patch makes a class for each type of tree, and lumps all the FS trees that
actually have files and directories into the same class.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

26 Jul, 2011 1 commit

btrfs: Don't BUG_ON alloc_path errors in find_next_chunk · 92b8e897

Authored by Mark Fasheh on Jul 12, 2011

I also removed the BUG_ON from error return of find_next_chunk in
init_first_rw_device(). It turns out that the only caller of
init_first_rw_device() also BUGS on any nonzero return so no actual behavior
change has occurred here.

do_chunk_alloc() also needed an update since it calls btrfs_alloc_chunk()
which can now return -ENOMEM. Instead of setting space_info->full on any
error from btrfs_alloc_chunk() I catch and return every error value _except_
-ENOSPC. Thanks goes to Tsutomu Itoh for pointing that issue out.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
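
The error-handling distinction made for do_chunk_alloc() can be shown with a small sketch: only -ENOSPC means the space is really exhausted, so only that error should mark the space info as full, while other errors such as -ENOMEM are handed back untouched. `struct demo_space_info` and `demo_handle_alloc_result` are hypothetical names. What the real function returns in the -ENOSPC case is not stated in the message, so this sketch simply reports it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the space accounting structure. */
struct demo_space_info {
    bool full;
};

/*
 * alloc_ret is the result of a (simulated) chunk allocation:
 * 0 on success or a negative errno value on failure.
 */
int demo_handle_alloc_result(struct demo_space_info *info, int alloc_ret)
{
    /* Transient failures say nothing about free space: just return them. */
    if (alloc_ret < 0 && alloc_ret != -ENOSPC)
        return alloc_ret;

    /* Only a real out-of-space condition marks the space info as full. */
    if (alloc_ret == -ENOSPC)
        info->full = true;

    return alloc_ret;
}
```

The point of the split is that a temporary memory shortage should not leave a block group type permanently flagged as full.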

15 Jul, 2011 1 commit

btrfs: Don't BUG_ON alloc_path errors in btrfs_balance() · 17e9f796

Authored by Mark Fasheh on Jul 12, 2011

Dealing with this seems trivial - the only caller of btrfs_balance() is
btrfs_ioctl() which passes the error code directly back to userspace. There
also isn't much state to unwind (if I'm wrong about this point, we can
always safely move the allocation to the top of btrfs_balance() anyway).
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

07 Jul, 2011 1 commit

Btrfs: don't panic if we get an error while balancing V2 · 508794eb

Authored by Josef Bacik on Jul 02, 2011

A user reported an error where if we try to balance an fs after a device has
been removed it will blow up. This is because we get an EIO back and this is
where BUG_ON(ret) bites us in the ass. To fix we just exit. Thanks,
Reported-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

11 Jun, 2011 1 commit

Btrfs - use %pU to print fsid · 22b63a29

Authored by Ilya Dryomov on Feb 09, 2011

Get rid of FIXME comment.  Uuids from dmesg are now the same as uuids
given by btrfs-progs.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

04 Jun, 2011 1 commit

btrfs: false BUG_ON when degraded · 5f3f302a

Authored by Arne Jansen on May 30, 2011

In degraded mode the struct btrfs_device of a missing dev doesn't have
device->name set. A kstrdup of NULL correctly returns NULL. Don't
BUG in this case.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

24 May, 2011 7 commits

Btrfs: using rcu lock in the reader side of devices list · 1f78160c

Authored by Xiao Guangrong on Apr 20, 2011

fs_devices->devices is only updated on the remove and add device paths, so we
can use RCU to protect it on the reader side.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: drop unnecessary device lock · 46224705

Authored by Xiao Guangrong on Apr 20, 2011

Drop device_list_mutex for the reader side on the clone_fs_devices and
btrfs_rm_device paths, since fs_info->volume_mutex already ensures the device
list is not updated.

btrfs_close_extra_devices is on the initialization path, where devices cannot be
added or removed, so we can safely drop the mutex there too, as the other
initialization functions do (add_missing_dev, __find_device,
__btrfs_open_devices ...).
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix the race between remove dev and alloc chunk · 0c1daee0

Authored by Xiao Guangrong on Apr 20, 2011

The remove device path updates device->dev_alloc_list without holding the
chunk lock.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix the race between reading and updating devices · c9513edb

Authored by Xiao Guangrong on Apr 20, 2011

On the btrfs_congested_fn and __unplug_io_fn paths, we should hold
device_list_mutex so that the remove/add device paths cannot update
fs_devices->devices underneath us.

On the __btrfs_close_devices and btrfs_prepare_sprout paths, the devices in
fs_devices->devices are updated, so we should hold the mutex to keep the
reader side from reaching them.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix bh leak on __btrfs_open_devices path · 4f6c9328

Authored by Xiao Guangrong on Apr 20, 2011

'bh' is not released when no error is detected.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: return error code to caller when btrfs_del_item fails · 65a246c5

Authored by Tsutomu Itoh on May 19, 2011

Return the error code to the caller instead of calling BUG_ON when
btrfs_del_item returns an error.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: return error code to caller when btrfs_previous_item fails · b0b802d7

Authored by Tsutomu Itoh on May 19, 2011

Return the error code to the caller instead of calling BUG_ON when
btrfs_previous_item returns an error.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

13 May, 2011 3 commits

btrfs: quasi-round-robin for chunk allocation · 73c5de00

Authored by Arne Jansen on Apr 12, 2011

In a multi device setup, the chunk allocator currently always allocates
chunks on the devices in the same order. This leads to a very uneven
distribution, especially with RAID1 or RAID10 and an uneven number of
devices.
This patch always sorts the devices before allocating, and allocates the
stripes on the devices with the most available space, as long as there
is enough space available. In a low space situation, it first tries to
maximize striping.
The patch also simplifies the allocator and reduces the checks for
corner cases.
The simplification is done by several means. First, it defines the
properties of each RAID type upfront. These properties are used afterwards
instead of differentiating cases in several places.
Second, the old allocator defined a minimum stripe size for each block
group type, tried to find a large enough chunk, and if this fails just
allocates a smaller one. This is now done in one step. The largest possible
chunk (up to max_chunk_size) is searched and allocated.
Because we now have only one pass, the allocation of the map (struct
map_lookup) is moved down to the point where the number of stripes is
already known. This way we avoid reallocation of the map.
We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.

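
To make the "sort, then take the devices with the most space" idea concrete, here is a minimal userspace sketch. `struct demo_dev` and `demo_alloc_chunk` are hypothetical names. The sketch ignores STRIPE_SIZE alignment, max_chunk_size and the per-RAID-type property table the message mentions, and the tie-break by device id is an assumption made only to keep the demo deterministic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical device record for the demo. */
struct demo_dev {
    int id;
    uint64_t avail; /* bytes still unallocated on this device */
};

/* Most available space first; equal devices are ordered by id. */
static int by_avail_desc(const void *a, const void *b)
{
    const struct demo_dev *x = a;
    const struct demo_dev *y = b;

    if (x->avail != y->avail)
        return x->avail < y->avail ? 1 : -1;
    return x->id - y->id;
}

/*
 * Place one stripe of stripe_size bytes on each of the num_stripes
 * devices that currently have the most space. Returns 0 on success,
 * -1 if there are not enough devices with room.
 */
int demo_alloc_chunk(struct demo_dev *devs, size_t ndevs,
                     size_t num_stripes, uint64_t stripe_size)
{
    if (num_stripes == 0 || num_stripes > ndevs)
        return -1;

    /* Re-sorting before every allocation is what spreads the load. */
    qsort(devs, ndevs, sizeof(*devs), by_avail_desc);

    if (devs[num_stripes - 1].avail < stripe_size)
        return -1;

    for (size_t i = 0; i < num_stripes; i++)
        devs[i].avail -= stripe_size;
    return 0;
}
```

With three equally sized devices and two stripes per chunk (a RAID1-like layout), three consecutive allocations in this sketch leave all three devices with the same free space, instead of always filling the first two.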

btrfs: heed alloc_start · a9c9bf68

Authored by Arne Jansen on Apr 12, 2011

Currently alloc_start is disregarded if the requested
chunk size is bigger than (device size - alloc_start),
but smaller than the device size.
The only situation where I see this could have made sense
was when a chunk equal to the size of the device has been
requested. This was possible as the allocator failed to
take alloc_start into account when calculating the requested
chunk size. As this gets fixed by this patch, the workaround
is not necessary anymore.

btrfs: move btrfs_cmp_device_free_bytes to super.c · bcd53741

Authored by Arne Jansen on Apr 12, 2011

This function won't be used here anymore, so move it to super.c, where it is
used for df-calculation.

12 May, 2011 1 commit

btrfs: scrub · a2de733c

Authored by Arne Jansen on Mar 08, 2011

This adds an initial implementation for scrub. It works in a quite
straightforward way. Userspace issues an ioctl for each device in the
fs. For each device, it enumerates the allocated device chunks. For
each chunk, the contained extents are enumerated and the data checksums
fetched. The extents are read sequentially and the checksums verified.
If an error occurs (checksum or EIO), a good copy is searched for. If
one is found, the bad copy will be rewritten.
All enumerations happen from the commit roots. During a transaction
commit, the scrubs get paused and afterwards continue from the new
roots.

This commit is based on the series originally posted to linux-btrfs
with some improvements that resulted from comments from David Sterba,
Ilya Dryomov and Jan Schmidt.
Signed-off-by: Arne Jansen <sensille@gmx.net>

06 May, 2011 1 commit

btrfs: remove all unused functions · f2a97a9d

Authored by David Sterba on May 05, 2011

Remove static and global declarations and/or definitions. Reduces size
of btrfs.ko by ~3.4kB.

  text    data     bss     dec     hex filename
402081    7464     200  409745   64091 btrfs.ko.base
398620    7144     200  405964   631cc btrfs.ko.remove-all
Signed-off-by: David Sterba <dsterba@suse.cz>

02 May, 2011 3 commits

btrfs: drop unused parameter from btrfs_release_path · b3b4aa74

Authored by David Sterba on Apr 21, 2011

The tree root parameter has not been used since commit
5f39d397 ("Btrfs: Create extent_buffer
interface for large blocksizes").
Signed-off-by: David Sterba <dsterba@suse.cz>

btrfs: drop gfp parameter from alloc_extent_map · 172ddd60

Authored by David Sterba on Apr 21, 2011

Pass GFP_NOFS directly to kmem_cache_alloc.
Signed-off-by: David Sterba <dsterba@suse.cz>

btrfs: drop unused parameter from extent_map_tree_init · a8067e02

Authored by David Sterba on Apr 21, 2011

the GFP flags are not stored anywhere and all allocations are done via
alloc_extent_map(GFP_NOFS).
Signed-off-by: David Sterba <dsterba@suse.cz>

20 Apr, 2011 1 commit

Btrfs: do some plugging in the submit_bio threads · 211588ad

Authored by Chris Mason on Apr 19, 2011

The Btrfs submit bio threads have a small number of
threads responsible for pushing down bios we've collected
for a large number of devices.

Since we do all the bios for a single device at once,
we want to make sure we unplug and send down the bios
for each device as we're done processing them.

The new plugging API removed the btrfs code to
unplug while processing bios, this adds it back with
the new API.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

28 Mar, 2011 3 commits

Btrfs: fix __btrfs_map_block on 32 bit machines · d9d04879

Authored by Chris Mason on Mar 27, 2011

Recent changes for discard support didn't compile,
this fixes them not to try and % 64 bit numbers.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
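
For background on why `%` on a 64-bit value is a problem here: on 32-bit builds the kernel does not link the compiler's 64-bit division helpers, so such code fails to build, and the usual idiom is the kernel's do_div() macro, which divides a 64-bit value in place by a 32-bit divisor and returns the remainder. The sketch below imitates that calling convention in userspace. `demo_do_div` and `demo_stripe_index` are hypothetical names, and whether this particular fix used do_div() is an assumption; the message does not say.

```c
#include <stdint.h>

/*
 * Userspace imitation of the do_div() calling convention: *n is
 * replaced by the quotient and the remainder is returned. (Plain
 * '%' and '/' are fine here because userspace links the helpers.)
 */
static uint32_t demo_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Which stripe of a striped layout does a byte offset fall into? */
uint32_t demo_stripe_index(uint64_t offset, uint32_t stripe_len,
                           uint32_t num_stripes)
{
    uint64_t stripe_nr = offset;

    demo_do_div(&stripe_nr, stripe_len);         /* stripe_nr = offset / len */
    return demo_do_div(&stripe_nr, num_stripes); /* stripe_nr % num_stripes */
}
```

For example, with a 64 KiB stripe length and four stripes, an offset five stripe lengths into the chunk lands on stripe index 1.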

Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP · fce3bb9a

Authored by Li Dongyang on Mar 24, 2011

btrfs_map_block() will only return a single stripe length, but we want the
full extent to be mapped to each disk when we are trimming the extent,
so we add length to btrfs_bio_stripe and fill it if we are mapping for REQ_DISCARD.
Signed-off-by: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: add initial tracepoint support for btrfs · 1abe9b8a

Authored by liubo on Mar 24, 2011

Tracepoints can provide insight into why btrfs hits bugs and be greatly
helpful for debugging, e.g
              dd-7822  [000]  2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
              dd-7822  [000]  2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
 btrfs-transacti-7804  [001]  2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
 btrfs-transacti-7804  [001]  2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
 btrfs-transacti-7804  [001]  2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
   flush-btrfs-2-7821  [001]  2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
   flush-btrfs-2-7821  [001]  2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
   flush-btrfs-2-7821  [001]  2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
   flush-btrfs-2-7821  [000]  2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
 btrfs-endio-wri-7800  [001]  2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
 btrfs-endio-wri-7800  [001]  2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)

Here is what I have added:

1) ordered_extent:
        btrfs_ordered_extent_add
        btrfs_ordered_extent_remove
        btrfs_ordered_extent_start
        btrfs_ordered_extent_put

These provide critical information to understand how ordered_extents are
updated.

2) extent_map:
        btrfs_get_extent

extent_map is used in both read and write cases, and it is useful for tracking
how btrfs specific IO is running.

3) writepage:
        __extent_writepage
        btrfs_writepage_end_io_hook

Pages are critical resources and produce a lot of corner cases during writeback,
so it is valuable to know how a page is written to disk.

4) inode:
        btrfs_inode_new
        btrfs_inode_request
        btrfs_inode_evict

These can show where and when an inode is created, and when an inode is evicted.

5) sync:
        btrfs_sync_file
        btrfs_sync_fs

These show sync arguments.

6) transaction:
        btrfs_transaction_commit

In a transaction-based filesystem, it is useful to know the generation and
who does the commit.

7) back reference and cow:
        btrfs_delayed_tree_ref
        btrfs_delayed_data_ref
        btrfs_delayed_ref_head
        btrfs_cow_block

Btrfs natively supports back references; these tracepoints are helpful for
understanding btrfs's COW mechanism.

8) chunk:
        btrfs_chunk_alloc
        btrfs_chunk_free

A chunk is a link between physical offset and logical offset, and stands for space
information in btrfs; these are helpful for tracing space usage.

9) reserved_extent:
        btrfs_reserved_extent_alloc
        btrfs_reserved_extent_free

These can show how btrfs uses its space.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

10 Mar, 2011 1 commit

block: remove per-queue plugging · 7eaceacc

Authored by Jens Axboe on Mar 10, 2011

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So let's kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

17 Feb, 2011 2 commits

Btrfs: set FMODE_EXCL in btrfs_device->mode · fb01aa85

Authored by Ilya Dryomov on Feb 15, 2011

This fixes a bug introduced in d4d77629, where the device added online
(and therefore initialized via btrfs_init_new_device()) would be left
with the positive bdev->bd_holders after unmount.  Since d4d77629 we no
longer OR FMODE_EXCL explicitly on blkdev_put(), set it in
btrfs_device->mode.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: make btrfs_rm_device() fail gracefully · 9b3517e9

Authored by Ilya Dryomov on Feb 15, 2011

If the shrinking done as part of online device removal fails, add that
device back to the allocation list and increment the rw_devices counter.
This fixes two bugs:

1) we could have a perfectly good device out of alloc list for no good
reason;

2) in the btrfs consisting of two devices, failure in btrfs_rm_device()
could lead to a situation where it was impossible to remove any of the
devices because of the "unable to remove the only writeable device"
error.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

15 Feb, 2011 1 commit

Btrfs - Fix memory leak in btrfs_init_new_device() · 67100f25

Authored by Ilya Dryomov on Feb 06, 2011

Memory allocated by calling kstrdup() should be freed.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

01 Feb, 2011 1 commit

btrfs: fix return value check of btrfs_start_transaction() · 98d5dc13

Authored by Tsutomu Itoh on Jan 20, 2011

Add the missing error check for btrfs_start_transaction(), and correct the
erroneous error checks in several places.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

17 Jan, 2011 5 commits

btrfs: Require CAP_SYS_ADMIN for filesystem rebalance · 6f88a440

Authored by Ben Hutchings on Dec 29, 2010

Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire
filesystem and may run uninterruptibly for a long time.  This does not
seem to be something that an unprivileged user should be able to do.
Reported-by: Aron Xu <happyaron.xu@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: mount failure return value fix · 20b45077

Authored by Dave Young on Jan 08, 2011

I happened to pass a swap partition as the root partition on the command line,
and the kernel panicked with "Cannot open root device".
That message is misleading: the real problem is a filesystem type mismatch,
not a missing device.

Eventually I found that the btrfs mount failed with -EIO; it should be -EINVAL.
The logic in init/do_mounts.c:
        for (p = fs_names; *p; p += strlen(p)+1) {
                int err = do_mount_root(name, p, flags, root_mount_data);
                switch (err) {
                        case 0:
                                goto out;
                        case -EACCES:
                                flags |= MS_RDONLY;
                                goto retry;
                        case -EINVAL:
                                continue;
                }
                print "Cannot open root device"
                panic
        }
So filesystem types tried after btrfs have no chance to mount.

Fix the return value to be -EINVAL.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: fix wrong free space information of btrfs · 6d07bcec

Authored by Miao Xie on Jan 05, 2011

When we store data with a RAID profile in btrfs on two or more disks of
different sizes, the df command shows some free space in the filesystem even
though the user cannot actually write any data; that is, df reports the wrong
free space information for btrfs.

 # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
 # btrfs-show
 Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
 	Total devices 2 FS bytes used 28.00KB
 	devid    1 size 5.01GB used 2.03GB path /dev/sda9
 	devid    2 size 10.00GB used 2.01GB path /dev/sda10
 # btrfs device scan /dev/sda9 /dev/sda10
 # mount /dev/sda9 /mnt
 # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999
   (fill the filesystem)
 # sync
 # df -TH
 Filesystem	Type	Size	Used	Avail	Use%	Mounted on
 /dev/sda9	btrfs	17G	8.6G	5.4G	62%	/mnt
 # btrfs-show
 Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
 	Total devices 2 FS bytes used 3.99GB
 	devid    1 size 5.01GB used 5.01GB path /dev/sda9
 	devid    2 size 10.00GB used 4.99GB path /dev/sda10

This is because btrfs cannot allocate chunks when one of the paired disks has
no space left. The free space on the other disks can then never be used and
should be subtracted from the total space, but btrfs doesn't subtract it.
This is confusing for the user.

This patch fixes it by calculating the free space that can actually be used to
allocate chunks.

Implementation:
1. get the free space of all the devices, and align it by stripe length.
2. sort the devices by free space.
3. check the free space of each device,
   3.1. if it is not zero, check the number of devices that have
        more free space than this device:
        if that number is beyond the min stripe number, the free
        space can be used, and is added to the total free space.
        if that number is below the min stripe number, we cannot
        use the free space, and the check ends.
   3.2. if the free space is zero, check the next device, goto 3.1

This implementation works much like a fake chunk allocation.

After applying this patch, df can show correct space information:
 # df -TH
 Filesystem	Type	Size	Used	Avail	Use%	Mounted on
 /dev/sda9	btrfs	17G	8.6G	0	100%	/mnt
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
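
The "fake chunk allocation" idea in the implementation steps of this message can be sketched as below. This is a simplified userspace interpretation, not the patch's code: `demo_usable_free` is a hypothetical name, stripe-length alignment (step 1) is left out, and the result counts raw bytes across all stripes, which is an assumption about how the total is accumulated.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sort helper: largest free space first. */
static int cmp_free_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x < y) - (x > y);
}

/*
 * Estimate how much of the per-device free space could really be
 * turned into chunks. free_bytes[] is modified in place.
 */
uint64_t demo_usable_free(uint64_t *free_bytes, size_t ndevs,
                          size_t num_stripes, size_t min_stripes)
{
    uint64_t total = 0;

    if (num_stripes == 0 || min_stripes == 0)
        return 0;

    qsort(free_bytes, ndevs, sizeof(*free_bytes), cmp_free_desc);

    /* Devices with no free space sort last and cannot host a stripe. */
    while (ndevs > 0 && free_bytes[ndevs - 1] == 0)
        ndevs--;

    /* Pretend to allocate, starting from the smallest non-empty device. */
    while (ndevs >= min_stripes) {
        size_t stripes = num_stripes < ndevs ? num_stripes : ndevs;
        size_t last = ndevs - 1;
        uint64_t alloc = free_bytes[last];

        if (alloc > 0) {
            total += alloc * stripes;
            for (size_t j = last + 1 - stripes; j <= last; j++)
                free_bytes[j] -= alloc;
        }
        ndevs--;
    }
    return total;
}
```

For the two-disk RAID1 case in the message, where one disk is full and the other still has space, the sketch reports 0 usable bytes, matching the corrected df output.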

btrfs: make the chunk allocator utilize the devices better · b2117a39

Authored by Miao Xie on Jan 05, 2011

With this patch, we change how we handle the case where we cannot get enough
free extents of the default size.

Implementation:
1. Look up a suitable free extent on each device and keep the search result.
   If no suitable free extent is found, keep the largest free extent.
2. If we get enough suitable free extents of the default size, chunk allocation
   succeeds.
3. If we cannot get enough free extents, but the number of extents of the
   default size is >= min_stripes, we just change the mapping information
   (reduce the number of stripes in the extent map), and chunk allocation
   succeeds.
4. If the number of extents of the default size is < min_stripes, sort the
   devices by the size of their largest free extent, in descending order.
5. Use the size of the largest free extent on the (num_stripes - 1)th device as
   the stripe size to allocate the device space.

This way, the chunk allocator can allocate chunks as large as possible when
the devices are short on space, and makes full use of the devices.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
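
The decision steps 2 to 5 of this message can be read as a small planning function. The sketch below is one hedged interpretation: `struct demo_plan` and `demo_plan_chunk` are hypothetical, the input is assumed to be the largest free extent already found on each device (step 1), and indexing the sorted array at `num_stripes - 1` is an assumed reading of "(num_stripes - 1)th device".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result: how many stripes to use and how big each is. */
struct demo_plan {
    size_t num_stripes;   /* 0 means the allocation fails */
    uint64_t stripe_size;
};

static int cmp_extent_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x < y) - (x > y);
}

/* max_extent[i] is the largest free extent on device i (sorted in place). */
struct demo_plan demo_plan_chunk(uint64_t *max_extent, size_t ndevs,
                                 size_t num_stripes, size_t min_stripes,
                                 uint64_t default_size)
{
    struct demo_plan plan = { 0, 0 };
    size_t fit = 0;

    if (num_stripes == 0 || num_stripes > ndevs)
        return plan;

    for (size_t i = 0; i < ndevs; i++)
        if (max_extent[i] >= default_size)
            fit++;

    if (fit >= num_stripes) {            /* step 2: everything fits */
        plan.num_stripes = num_stripes;
        plan.stripe_size = default_size;
    } else if (fit > 0 && fit >= min_stripes) { /* step 3: fewer stripes */
        plan.num_stripes = fit;
        plan.stripe_size = default_size;
    } else {                             /* steps 4 and 5: smaller stripes */
        qsort(max_extent, ndevs, sizeof(*max_extent), cmp_extent_desc);
        if (max_extent[num_stripes - 1] > 0) {
            plan.num_stripes = num_stripes;
            plan.stripe_size = max_extent[num_stripes - 1];
        }
    }
    return plan;
}
```

The fallback in the last branch is what lets the allocator keep producing smaller chunks when no device can offer a default-sized extent.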

btrfs: restructure find_free_dev_extent() · 7bfc837d

Authored by Miao Xie on Jan 05, 2011

- make it return the start position and length of the largest free space when it
  cannot find a suitable free space.
- make it more readable
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
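
The new contract described for find_free_dev_extent() (report a suitable hole if there is one, otherwise report the largest hole seen) can be illustrated with a linear scan over allocated extents. This is a userspace sketch under stated assumptions, not the kernel function: `struct demo_extent` and `demo_find_free_extent` are hypothetical, and the allocated extents are assumed to be sorted by start and non-overlapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocated region on a device. */
struct demo_extent {
    uint64_t start;
    uint64_t len;
};

/*
 * Scan the holes between allocated extents in [0, dev_size).
 * Returns true and reports the first hole of at least `want` bytes.
 * Otherwise returns false and reports the largest hole that was seen.
 */
bool demo_find_free_extent(const struct demo_extent *used, size_t n,
                           uint64_t dev_size, uint64_t want,
                           uint64_t *start, uint64_t *len)
{
    uint64_t cursor = 0;      /* end of the previous allocated extent */
    uint64_t best_start = 0;
    uint64_t best_len = 0;

    for (size_t i = 0; i <= n; i++) {
        /* The hole ends at the next extent, or at the end of the device. */
        uint64_t hole_end = (i < n) ? used[i].start : dev_size;
        uint64_t hole_len = hole_end > cursor ? hole_end - cursor : 0;

        if (hole_len >= want) {
            *start = cursor;
            *len = hole_len;
            return true;
        }
        if (hole_len > best_len) {
            best_start = cursor;
            best_len = hole_len;
        }
        if (i < n)
            cursor = used[i].start + used[i].len;
    }

    *start = best_start;
    *len = best_len;
    return false;
}
```

Returning the largest hole on failure is what lets a caller retry with a smaller stripe size without scanning the device a second time.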
