Commits · e3176ca2769e420f64eba4b093bbddea6d7a89c3 · openeuler / Kernel

27 Mar, 2012 1 commit

Btrfs: allow metadata blocks larger than the page size · 727011e0

Authored by Chris Mason on Aug 06, 2010

A few years ago the btrfs code to support blocks larger than
the page size was disabled to fix a few corner cases in the
page cache handling.  This fixes the code to properly support
large metadata blocks again.

Since current kernels will crash early and often with larger
metadata blocks, this adds an incompat bit so that older kernels
can't mount it.

This also does away with different blocksizes for nodes and leaves.
You get a single block size for all tree blocks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

727011e0
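
The incompat-bit idea described above can be sketched as a simple mask test. This is a minimal illustration, not the kernel's code: the `SKETCH_*` bit values and the function name are hypothetical, and the only point is that a kernel refuses the mount when the disk carries a feature bit it does not know.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
#define SKETCH_INCOMPAT_MIXED_GROUPS (1ULL << 0)
#define SKETCH_INCOMPAT_BIG_METADATA (1ULL << 1)

/*
 * A kernel may mount the filesystem only if it understands every
 * incompat bit recorded on disk.
 */
static bool sketch_can_mount(uint64_t disk_incompat, uint64_t supported)
{
    return (disk_incompat & ~supported) == 0;
}
```

An older kernel whose `supported` mask lacks the big-metadata bit would get `false` here and fail the mount instead of crashing later.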

23 Feb, 2012 1 commit

Btrfs: make sure we update latest_bdev · a6b0d5c8

Authored by Chris Mason on Feb 20, 2012

When we are setting up the mount, we close all the
devices that were not actually part of the metadata we found.

But, we don't make sure that one of those devices wasn't
fs_devices->latest_bdev, which means we can do a use after free
on the one we closed.

This updates latest_bdev as it goes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a6b0d5c8

17 Feb, 2012 1 commit

Btrfs: check return value of lookup_extent_mapping() correctly · 285190d9

Authored by Tsutomu Itoh on Feb 16, 2012

This patch corrects error checking of lookup_extent_mapping().
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

285190d9

15 Feb, 2012 1 commit

btrfs: silence warning in raid array setup · 8a334426

Authored by David Sterba on Oct 07, 2011

Raid array setup code creates an extent buffer in an usual way. When the
PAGE_CACHE_SIZE is > super block size, the extent pages are not marked
up-to-date, which triggers a WARN_ON in the following
write_extent_buffer call. Add an explicit up-to-date call to silence the
warning.
Signed-off-by: David Sterba <dsterba@suse.cz>

8a334426

17 Jan, 2012 18 commits

Btrfs: use larger system chunks · 96bdc7dc

Authored by Chris Mason on Jan 16, 2012

System chunks by default are very small.  This makes them slightly
larger and also fixes the conditional checks to make sure we don't
allocate a billion of them at once.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

96bdc7dc

Btrfs: add balance progress reporting · 19a39dce

Authored by Ilya Dryomov on Jan 16, 2012

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

19a39dce

Btrfs: allow for canceling restriper · a7e99c69

Authored by Ilya Dryomov on Jan 16, 2012

Implement an ioctl for canceling restriper.  Currently we wait until
relocation of the current block group is finished; in the future this can
be done by triggering a commit.  The balance item is deleted and no memory
of the interrupted balance is kept.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a7e99c69

Btrfs: allow for pausing restriper · 837d5b6e

Authored by Ilya Dryomov on Jan 16, 2012

Implement an ioctl for pausing restriper.  This pauses the relocation,
but balance is still considered to be "in progress": balance item is
not deleted, other volume operations cannot be started, etc.  If paused
in the middle of a profile changing operation we will continue making
allocations with the target profile.

Add a hook to close_ctree() to pause restriper and free its data
structures on unmount.  (It's safe to unmount when restriper is in
"paused" state; we will resume with the same parameters on the next
mount.)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

837d5b6e

Btrfs: add skip_balance mount option · 9555c6c1

Authored by Ilya Dryomov on Jan 16, 2012

Since the restriper kthread starts involuntarily on mount and can suck CPU
and memory bandwidth, add a mount option to forcefully skip it.  The
restriper in that case hangs around in paused state and can be resumed
from userspace when it's convenient.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

9555c6c1

Btrfs: recover balance on mount · 59641015

Authored by Ilya Dryomov on Jan 16, 2012

On mount, if a balance item is found, resume balance in a separate
kernel thread.

Try to be smart and continue roughly where the previous balance (or convert)
was interrupted.  For chunk types that were being converted to some
profile we turn on soft convert; in case of a simple balance we turn on
the usage filter and relocate only less-than-90%-full chunks of that type.
These are just heuristics but they help quite a bit, and can be improved
in the future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

59641015

Btrfs: save balance parameters to disk · 0940ebf6

Authored by Ilya Dryomov on Jan 16, 2012

Introduce a new btree objectid for storing the balance item.  The reason is
to be able to resume restriper after a crash with the same parameters.
The balance item has a very high objectid and goes into the tree of tree roots.

The key for the new item is as follows:

	[ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ]

Older kernels simply ignore it so it's safe to mount with an older
kernel and then go back to the newer one.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0940ebf6

Btrfs: soft profile changing mode (aka soft convert) · cfa4c961

Authored by Ilya Dryomov on Jan 16, 2012

When converting from one profile to another, if soft mode is on,
restriper won't touch chunks that already have the profile we are
converting to.  This is useful if e.g. half of the FS was converted
earlier.

The soft mode switch is (like every other filter) per-type.  This means
that we can convert for example meta chunks the "hard" way while
converting data chunks selectively with soft switch.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

cfa4c961
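
The soft-convert rule in this message reduces to one predicate per chunk. The sketch below is an illustration under assumed names (`sketch_should_convert` and its parameters are hypothetical, not taken from the patch): with soft mode on, a chunk that already has the target profile is skipped; with soft mode off, every selected chunk is rewritten.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a chunk should be relocated during a profile
 * conversion. In soft mode, chunks already using the target profile
 * are left alone.
 */
static bool sketch_should_convert(uint64_t chunk_profile,
                                  uint64_t target_profile, bool soft)
{
    if (soft && chunk_profile == target_profile)
        return false;
    return true;
}
```

Because the switch is per-type, a caller would evaluate this once with the metadata settings and once with the data settings.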

Btrfs: implement online profile changing · e4d8ec0f

Authored by Ilya Dryomov on Jan 16, 2012

Profile changing is done by launching a balance with
BTRFS_BALANCE_CONVERT bits set and target fields of respective
btrfs_balance_args structs initialized.  Profile reducing code in this
case will pick restriper's target profile if it's available instead of
doing a blind reduce.  If target profile is not yet available it goes
back to a plain reduce.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

e4d8ec0f

Btrfs: virtual address space subset filter · ea67176a

Authored by Ilya Dryomov on Jan 16, 2012

Select chunks which have at least one byte located inside a given
[vstart, vend) virtual address space range.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ea67176a
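
The "at least one byte inside [vstart, vend)" condition is a half-open interval overlap test. A minimal sketch, assuming a chunk is described by a start address and a non-zero length (the function and parameter names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the chunk [chunk_start, chunk_start + chunk_len)
 * shares at least one byte with the half-open range [vstart, vend).
 * Assumes chunk_len > 0 and that chunk_start + chunk_len does not overflow.
 */
static bool sketch_chunk_in_vrange(uint64_t chunk_start, uint64_t chunk_len,
                                   uint64_t vstart, uint64_t vend)
{
    uint64_t chunk_end = chunk_start + chunk_len; /* exclusive end */

    return chunk_start < vend && vstart < chunk_end;
}
```

A chunk that ends exactly at `vstart`, or starts exactly at `vend`, is not selected, which matches the half-open notation used in the message.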

Btrfs: devid subset filter · 94e60d5a

Authored by Ilya Dryomov on Jan 16, 2012

Select chunks which have at least one byte of at least one stripe
located on a device with devid X in a given [pstart,pend) physical
address range.

This filter only works when devid filter is turned on.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

94e60d5a

Btrfs: devid filter · 409d404b

Authored by Ilya Dryomov on Jan 16, 2012

Relocate chunks which have at least one stripe located on a device with
devid X.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

409d404b
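
As a rough illustration of the devid filter, the check amounts to scanning a chunk's stripes for a matching device id. The `sketch_stripe` struct below is a hypothetical, stripped-down stand-in for the on-disk stripe record, kept only to make the example self-contained:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stripe record: only the owning device id. */
struct sketch_stripe {
    uint64_t devid;
};

/* True when at least one of the chunk's stripes lives on device `devid`. */
static bool sketch_chunk_has_devid(const struct sketch_stripe *stripes,
                                   size_t num_stripes, uint64_t devid)
{
    for (size_t i = 0; i < num_stripes; i++) {
        if (stripes[i].devid == devid)
            return true;
    }
    return false;
}
```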

Btrfs: usage filter · 5ce5b3c0

Authored by Ilya Dryomov on Jan 16, 2012

Select chunks that are less than X percent full.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

5ce5b3c0
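
The usage filter can be read as an integer comparison that avoids floating point. This is a sketch under assumed names, not the patch's implementation; it compares `used / length` against `percent / 100` by cross-multiplying, and assumes `used * 100` does not overflow 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the chunk is strictly less than `percent` percent full.
 * An empty (zero-length) chunk is never selected.
 * Assumes used * 100 and percent * length fit in 64 bits.
 */
static bool sketch_usage_below(uint64_t used, uint64_t length,
                               unsigned int percent)
{
    if (length == 0)
        return false;
    return used * 100 < (uint64_t)percent * length;
}
```

With `percent` set to 90, this matches the "less-than-90%-full" selection that the balance-recovery commit earlier in this list mentions.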

Btrfs: profiles filter · ed25e9b2

Authored by Ilya Dryomov on Jan 16, 2012

Select chunks based on a given profile mask.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ed25e9b2

Btrfs: add basic infrastructure for selective balancing · f43ffb60

Authored by Ilya Dryomov on Jan 16, 2012

This allows a separate set of filters for each chunk type
(data,meta,sys).  The code however is generic and switch on chunk type
is only done once.

This commit also adds a type filter: it allows balancing, for example,
meta and system chunks w/o touching data ones.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

f43ffb60

Btrfs: add basic restriper infrastructure · c9e9f97b

Authored by Ilya Dryomov on Jan 16, 2012

Add basic restriper infrastructure: extended balancing ioctl and all
related ioctl data structures, add data structure for tracking
restriper's state to fs_info, etc.  The semantics of the old balancing
ioctl are fully preserved.

Explicitly disallow any volume operations when balance is in progress.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

c9e9f97b

Btrfs: introduce masks for chunk type and profile · 52ba6929

Authored by Ilya Dryomov on Jan 16, 2012

Chunk's type and profile are encoded in u64 flags field.  Introduce
masks to easily access them.  Also fix the type of BTRFS_BLOCK_GROUP_*
constants; it should be ULL.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

52ba6929
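
To make the type/profile split concrete, here is a small sketch of masks over a 64-bit flags field. The `SKETCH_BG_*` names and bit positions are assumptions chosen for illustration and should not be read as the kernel's definitions; the `ULL` suffix is what keeps the constants 64 bits wide, as the message requires.

```c
#include <stdint.h>

/* Illustrative bit layout (assumed): low bits are types, next are profiles. */
#define SKETCH_BG_DATA     (1ULL << 0)
#define SKETCH_BG_SYSTEM   (1ULL << 1)
#define SKETCH_BG_METADATA (1ULL << 2)
#define SKETCH_BG_RAID0    (1ULL << 3)
#define SKETCH_BG_RAID1    (1ULL << 4)
#define SKETCH_BG_DUP      (1ULL << 5)
#define SKETCH_BG_RAID10   (1ULL << 6)

#define SKETCH_BG_TYPE_MASK \
    (SKETCH_BG_DATA | SKETCH_BG_SYSTEM | SKETCH_BG_METADATA)
#define SKETCH_BG_PROFILE_MASK \
    (SKETCH_BG_RAID0 | SKETCH_BG_RAID1 | SKETCH_BG_DUP | SKETCH_BG_RAID10)

/* Extract only the chunk type bits from the flags field. */
static uint64_t sketch_chunk_type(uint64_t flags)
{
    return flags & SKETCH_BG_TYPE_MASK;
}

/* Extract only the profile bits; 0 means no redundancy profile is set. */
static uint64_t sketch_chunk_profile(uint64_t flags)
{
    return flags & SKETCH_BG_PROFILE_MASK;
}
```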

Btrfs: get rid of *_alloc_profile fields · 6fef8df1

Authored by Ilya Dryomov on Jan 16, 2012

{data,metadata,system}_alloc_profile fields have been unused for a long
time now.  Get rid of them.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

6fef8df1

11 Jan, 2012 4 commits

Btrfs: fix possible deadlock when opening a seed device · b367e47f

Authored by Li Zefan on Dec 07, 2011

The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
but when we mount a filesystem which has backing seed devices, we have
this lock chain:

    open_ctree()
        lock(chunk_mutex);
        read_chunk_tree();
            read_one_dev();
                open_seed_devices();
                    lock(uuid_mutex);

and then we hit a lockdep splat.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

b367e47f

Btrfs: simplfy calculation of stripe length for discard operation · ec9ef7a1

Authored by Li Zefan on Dec 01, 2011

For btrfs raid, while discarding a range of space, we'll need to know
the start offset and length to discard for each device, and it's done
in btrfs_map_block().

However the calculation is a bit complex for raid0 and raid10, so I
reimplement it based on the fact that:

        dev1          dev2           dev3    (raid0)
        -----------------------------------
        s0 s3 s6      s1 s4 s7       s2 s5

Each device has (total_stripes / nr_dev) stripes, or plus one.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

ec9ef7a1
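
The "(total_stripes / nr_dev) stripes, or plus one" observation can be worked through with integer division and remainder. The sketch below assumes the simplest case from the diagram, where the range starts at stripe s0 on the first device; the real code also has to handle a range that starts on another device, which is not modeled here. All names are hypothetical.

```c
#include <stdint.h>

/*
 * Number of stripes that land on device `dev_index` (0-based) when
 * `total_stripes` consecutive stripes are laid out round-robin over
 * `nr_dev` devices, starting on device 0. Requires nr_dev > 0.
 */
static uint64_t sketch_stripes_on_dev(uint64_t total_stripes,
                                      uint32_t nr_dev, uint32_t dev_index)
{
    uint64_t base = total_stripes / nr_dev;   /* every device gets this many */
    uint64_t extra = total_stripes % nr_dev;  /* first `extra` devices get +1 */

    return base + (dev_index < extra ? 1 : 0);
}
```

For the eight stripes s0..s7 over three devices in the diagram, this gives 3, 3 and 2 stripes for dev1, dev2 and dev3.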

Btrfs: don't pre-allocate btrfs bio · de11cc12

Authored by Li Zefan on Dec 01, 2011

We pre-allocate a btrfs bio with fixed size, and then may re-allocate
memory if we find stripes are bigger than the fixed size. But this
pre-allocation is not necessary.

Also we don't have to calculate the stripe number twice.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

de11cc12

Btrfs: don't pass a trans handle unnecessarily in volumes.c · 125ccb0a

Authored by Li Zefan on Dec 08, 2011

Some functions never use the transaction handle passed to them.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

125ccb0a

09 Jan, 2012 1 commit

btrfs: fix a deadlock in btrfs_scan_one_device() · 10f6327b

Authored by Al Viro on Nov 17, 2011

pathname resolution under a global mutex, taken on some paths in ->mount()
is a Bad Idea(tm) - think what happens if said pathname resolution triggers
automount of some btrfs instance and walks into attempt to grab the same
mutex. Deadlock - we are waiting for daemon to finish walking the path,
daemon is waiting for us to release the mutex...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

10f6327b

07 Jan, 2012 1 commit

Btrfs: use bigger metadata chunks on bigger filesystems · 1100373f

Authored by Chris Mason on Jan 06, 2012

The 256MB chunk is a little small on a huge FS.  This scales up the
chunk size.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1100373f

22 Dec, 2011 1 commit

Btrfs: integrate integrity check module into btrfs · 21adbd5c

Authored by Stefan Behrens on Nov 09, 2011

This is the last part of the patch series. It modifies the btrfs
code to use the integrity check module if configured to do so
with the define BTRFS_FS_CHECK_INTEGRITY. If this define is not set,
the only effective change is that code is added that handles the
mount option to activate the integrity check. If the mount option is
set and the define BTRFS_FS_CHECK_INTEGRITY is not set, that code
complains in the log and the mount fails with EINVAL.

Add the mount option to activate the usage of the integrity check
code.
Add invocation of btrfs integrity check code init and cleanup
function on mount and umount, respectively.
Add hook to call btrfs integrity check code version of
submit_bh/submit_bio.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

21adbd5c

16 Dec, 2011 1 commit

Btrfs: unplug every once and a while · d85c8a6f

Authored by Chris Mason on Dec 15, 2011

The btrfs io submission threads can build up massive plug lists.  This
keeps things more reasonable so we don't hand over huge dumps of IO at
once.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d85c8a6f

10 Dec, 2011 1 commit

Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror · 5dbc8fca

Authored by Chris Mason on Dec 09, 2011

btrfs_end_bio checks the number of errors on a bio against the max
number of errors allowed before sending any EIOs up to the higher
levels.

If we got enough copies of the bio done for a given raid level, it is
supposed to clear the bio error flag and return success.

We have pointers to the original bio sent down by the higher layers and
pointers to any cloned bios we made for raid purposes.  If the original
bio happens to be the one that got an io error, but not the last one to
finish, it might not have the BIO_UPTODATE bit set.

Then, when the last bio does finish, we'll call bio_end_io on the
original bio.  It won't have the uptodate bit set and we'll end up
sending EIO to the higher layers.

We already had a check for this, it just was conditional on getting the
IO error on the very last bio.  Make the check unconditional so we eat
the EIOs properly.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5dbc8fca
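
The behavior this message asks for can be modeled as a counter that is evaluated only when the last mirror bio completes, independent of which bio carried the error. The sketch below is a userspace model with hypothetical names (`sketch_multi_bio`, `sketch_end_bio`); it does not reproduce the kernel's bio flags or callbacks.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one logical write fanned out to mirrors. */
struct sketch_multi_bio {
    int pending;     /* mirror bios still in flight */
    int errors;      /* mirror bios that failed so far */
    int max_errors;  /* failures the raid level can tolerate */
};

enum sketch_result { SKETCH_PENDING, SKETCH_OK, SKETCH_EIO };

/*
 * Called once per completed mirror bio. The verdict is decided only at
 * the last completion and depends solely on the error count, so an
 * early failure on the original bio is forgiven when enough copies
 * succeeded.
 */
static enum sketch_result sketch_end_bio(struct sketch_multi_bio *m,
                                         bool failed)
{
    if (failed)
        m->errors++;
    if (--m->pending > 0)
        return SKETCH_PENDING;
    return m->errors > m->max_errors ? SKETCH_EIO : SKETCH_OK;
}
```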

08 Dec, 2011 1 commit

Btrfs: check if the to-be-added device is writable · a5d16333

Authored by Li Zefan on Dec 07, 2011

If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding
a readonly device to a btrfs filesystem, and btrfs will write to
that device, emitting kernel errors:

[ 3109.833692] lost page write due to I/O error on loop2
[ 3109.833720] lost page write due to I/O error on loop2
...
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a5d16333

11 Nov, 2011 1 commit

Btrfs: fix nocow when deleting the item · 924cd8fb

Authored by Miao Xie on Nov 10, 2011

btrfs_previous_item() just searches the b+ tree and does not COW the nodes
or leaves; if we modify its result, the metadata will be broken. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

924cd8fb

06 Nov, 2011 1 commit

btrfs: separate superblock items out of fs_info · 6c41761f

Authored by David Sterba on Apr 13, 2011

fs_info is now ~9kb, more than fits into one page. This will cause
mount failure when memory is too fragmented. Top space consumers are
super block structures super_copy and super_for_commit, ~2.8kb each.
Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

Add a wrapper for freeing fs_info and all of its dynamically allocated
members.
Signed-off-by: David Sterba <dsterba@suse.cz>

6c41761f

21 Oct, 2011 1 commit

Btrfs: close all bdevs on mount failure · 20bcd649

Authored by Ilya Dryomov on Oct 20, 2011

Fix a bug introduced by 20b45077.  We have to return EINVAL on mount
failure, but doing that too early in the sequence leaves all of the
devices opened exclusively.  This also fixes an issue where under some
scenarios only a second mount -o degraded <devices> command would
succeed.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

20bcd649

20 Oct, 2011 1 commit

Btrfs: allow us to overcommit our enospc reservations · 2bf64758

Authored by Josef Bacik on Sep 26, 2011

One of the things that kills us is the fact that our ENOSPC reservations are
horribly over the top in most normal cases.  There isn't too much that can be
done about this because when we are completely full we really need them to work
like this so we don't under reserve.  However if there is plenty of unallocated
chunks on the disk we can use that to gauge how much we can overcommit.  So this
patch adds chunk free space accounting so we always know how much unallocated
space we have.  Then if we fail to make a reservation within our allocated
space, check to see if we can overcommit.  In the normal flushing case (like
with delalloc metadata reservations) we'll take the free space and divide it by
2 if our metadata profile is setup for DUP or any of those, and then divide it
by 8 to make sure we don't overcommit too much.  Then if we're in a non-flushing
case (we really need this reservation now!) we only limit ourselves to half of
the free space.  This makes this fio test

[torrent]
filename=torrent-test
rw=randwrite
size=4g
ioengine=sync
directory=/mnt/btrfs-test

go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
file system.  This doesn't seem to break my other enospc tests, but could really
use some more testing as this is a super scary change.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

2bf64758
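
The overcommit arithmetic in this message can be written out as a short function. This is one reading of the prose, with hypothetical names: halve the unallocated space when the metadata profile stores two copies (DUP and similar), then divide by 8 in the flushing case or by 2 in the non-flushing case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Upper bound on how far a reservation may overcommit, derived from
 * the unallocated space on disk.
 *   two_copies: metadata profile writes every block twice (e.g. DUP)
 *   flushing:   the normal case where the caller is able to flush
 */
static uint64_t sketch_overcommit_limit(uint64_t unallocated_bytes,
                                        bool two_copies, bool flushing)
{
    uint64_t avail = unallocated_bytes;

    if (two_copies)
        avail /= 2;  /* each byte of metadata costs two bytes on disk */
    if (flushing)
        avail /= 8;  /* stay conservative when flushing can reclaim space */
    else
        avail /= 2;  /* urgent reservation: allow up to half */
    return avail;
}
```

For 1600 bytes of unallocated space with a DUP profile, the flushing limit comes out to 100 bytes and the non-flushing limit to 400 bytes.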

02 Oct, 2011 1 commit

btrfs: state information for readahead · 90519d66

Authored by Arne Jansen on May 23, 2011

Add state information for readahead to btrfs_fs_info and btrfs_device.

Changes v2:
 - don't wait in radix_trees
 - add own set of workers for readahead
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Arne Jansen <sensille@gmx.net>

90519d66

29 Sep, 2011 2 commits

btrfs: Put mirror_num in bi_bdev · 2774b2ca

Authored by Jan Schmidt on Jun 16, 2011

The error correction code wants to make sure that only the bad mirror is
rewritten. Thus, we need to know which mirror is the bad one. I did not
find a more appropriate field than bi_bdev. But I think using this is fine,
because it is modified by the block layer, anyway, and should not be read
after the bio returned.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

2774b2ca

btrfs: btrfs_multi_bio replaced with btrfs_bio · a1d3c478

Authored by Jan Schmidt on Aug 04, 2011

btrfs_bio is a bio abstraction able to split and not complete after the last
bio has returned (like the old btrfs_multi_bio). Additionally, btrfs_bio
tracks the mirror_num used to read data which can be used for error
correction purposes.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

a1d3c478

17 Aug, 2011 1 commit

Btrfs: fix uninitialized sync_pending · 0e588859

Authored by Miao Xie on Aug 05, 2011

sync_pending is used before it is initialized; fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0e588859

openeuler / Kernel synced successfully about 1 year ago