Commits · cfa4c961cc69ffb7bda450972320a25cbd413e19 · bug2833 / cloud-kernel

17 Jan, 2012 14 commits

Btrfs: soft profile changing mode (aka soft convert) · cfa4c961

Committed by Ilya Dryomov on Jan 16, 2012

When converting from one profile to another with soft mode on, the
restriper won't touch chunks that already have the target profile.
This is useful if, e.g., half of the FS was converted earlier.

The soft mode switch is (like every other filter) per-type.  This means
that we can, for example, convert meta chunks the "hard" way while
converting data chunks selectively with the soft switch.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

cfa4c961
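
The soft switch described above can be sketched as a single predicate. This is a minimal illustration, not the kernel code: `struct ex_convert_args` and `ex_should_convert()` are hypothetical names, and a profile is assumed to be a single bit in a 64-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-type conversion arguments (not the kernel's struct). */
struct ex_convert_args {
	uint64_t target; /* profile bit we are converting to */
	bool soft;       /* soft mode: leave already-converted chunks alone */
};

/* Return true if a chunk with this profile should be relocated. */
static bool ex_should_convert(uint64_t chunk_profile,
			      const struct ex_convert_args *args)
{
	assert(args);
	if (args->soft && chunk_profile == args->target)
		return false; /* already has the target profile */
	return true;
}
```

Because the arguments are per-type, one such struct per chunk type (data, meta, sys) would let a caller mix "hard" and soft conversion in a single run.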

Btrfs: implement online profile changing · e4d8ec0f

Committed by Ilya Dryomov on Jan 16, 2012

Profile changing is done by launching a balance with the
BTRFS_BALANCE_CONVERT bits set and the target fields of the respective
btrfs_balance_args structs initialized.  In this case the profile
reducing code picks the restriper's target profile if it is available
instead of doing a blind reduce.  If the target profile is not yet
available, it falls back to a plain reduce.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

e4d8ec0f
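
A rough sketch of the "target if available, else plain reduce" decision follows. The function names are hypothetical, and `ex_plain_reduce()` is only a stand-in that keeps the highest set bit; the real reduction logic is not described in the message and probably depends on more than the bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the plain reduce: keep only the highest set bit. */
static uint64_t ex_plain_reduce(uint64_t avail)
{
	while (avail & (avail - 1))
		avail &= avail - 1; /* clear the lowest set bit */
	return avail;
}

/* Prefer the conversion target when chunks with that profile already exist. */
static uint64_t ex_choose_profile(uint64_t avail, bool converting,
				  uint64_t target)
{
	assert((target & (target - 1)) == 0); /* at most one target bit */
	if (converting && (avail & target))
		return target;
	return ex_plain_reduce(avail);
}
```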

Btrfs: do not reduce profile in do_chunk_alloc() · 70922617

Committed by Ilya Dryomov on Jan 16, 2012

Every caller of do_chunk_alloc() feeds it the reduced allocation
profile, so stop trying to reduce it one more time.  Instead check the
validity of the passed profile.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

70922617

Btrfs: virtual address space subset filter · ea67176a

Committed by Ilya Dryomov on Jan 16, 2012

Select chunks which have at least one byte located inside a given
[vstart, vend) virtual address space range.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ea67176a
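
"At least one byte inside [vstart, vend)" is a half-open interval overlap test. A minimal sketch, assuming a chunk occupies `[chunk_start, chunk_start + chunk_len)`; the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the chunk shares at least one byte with [vstart, vend). */
static bool ex_vrange_selects(uint64_t chunk_start, uint64_t chunk_len,
			      uint64_t vstart, uint64_t vend)
{
	assert(chunk_len > 0);
	return chunk_start < vend && vstart < chunk_start + chunk_len;
}
```

Both comparisons are strict because the end of each range is exclusive: a chunk that ends exactly at `vstart` is not selected.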

Btrfs: devid subset filter · 94e60d5a

Committed by Ilya Dryomov on Jan 16, 2012

Select chunks which have at least one byte of at least one stripe
located on a device with devid X in a given [pstart,pend) physical
address range.

This filter only works when the devid filter is turned on.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

94e60d5a
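
One way to picture this filter is a walk over a chunk's stripes. The sketch below is illustrative only: `struct ex_stripe` is a hypothetical type, and each stripe's physical length is simply taken as given rather than derived from the chunk's profile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one stripe of a chunk. */
struct ex_stripe {
	uint64_t devid;    /* device the stripe lives on */
	uint64_t physical; /* start offset on that device */
	uint64_t length;   /* bytes occupied on that device */
};

/* True if any stripe on `devid` has a byte inside [pstart, pend). */
static bool ex_drange_selects(const struct ex_stripe *stripes, size_t n,
			      uint64_t devid, uint64_t pstart, uint64_t pend)
{
	assert(stripes || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct ex_stripe *s = &stripes[i];

		if (s->devid != devid)
			continue; /* the devid filter comes first */
		if (s->physical < pend && pstart < s->physical + s->length)
			return true;
	}
	return false;
}
```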

Btrfs: devid filter · 409d404b

Committed by Ilya Dryomov on Jan 16, 2012

Relocate chunks which have at least one stripe located on a device with
devid X.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

409d404b

Btrfs: usage filter · 5ce5b3c0

Committed by Ilya Dryomov on Jan 16, 2012

Select chunks that are less than X percent full.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

5ce5b3c0
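
The "less than X percent full" test can be written without floating point by cross-multiplying. This is a sketch under the assumption that chunk lengths stay far below 2^64 / 100 bytes, which the second assertion makes explicit; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if used/length < percent/100, i.e. the chunk is under the threshold. */
static bool ex_usage_selects(uint64_t used, uint64_t length,
			     unsigned int percent)
{
	assert(length > 0 && used <= length && percent <= 100);
	assert(length <= UINT64_MAX / 100); /* products below cannot overflow */
	return used * 100 < (uint64_t)percent * length;
}
```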

Btrfs: profiles filter · ed25e9b2

Committed by Ilya Dryomov on Jan 16, 2012

Select chunks based on a given profile mask.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ed25e9b2

Btrfs: add basic infrastructure for selective balancing · f43ffb60

Committed by Ilya Dryomov on Jan 16, 2012

This allows having a separate set of filters for each chunk type
(data, meta, sys).  The code however is generic, and the switch on chunk
type is only done once.

This commit also adds a type filter: it allows balancing, for example,
meta and system chunks w/o touching data ones.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

f43ffb60

Btrfs: add basic restriper infrastructure · c9e9f97b

Committed by Ilya Dryomov on Jan 16, 2012

Add basic restriper infrastructure: extended balancing ioctl and all
related ioctl data structures, add data structure for tracking
restriper's state to fs_info, etc.  The semantics of the old balancing
ioctl are fully preserved.

Explicitly disallow any volume operations when balance is in progress.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

c9e9f97b

Btrfs: make avail_*_alloc_bits fields dynamic · 10ea00f5

Committed by Ilya Dryomov on Jan 16, 2012

Currently, when new chunks are created, the respective avail_alloc_bits
field is updated to reflect the profiles of all chunks present in the
system.  However, when chunks are removed, profile bits are never cleared.

This patch clears the profile bit of the respective avail_alloc_bits field
when the last chunk with that profile is removed.  Restriper needs this to
operate properly when "downgrading".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

10ea00f5
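
A small sketch of what "dynamic" means here: the bit is set when the first chunk with a profile appears and cleared when the last one goes away. The per-bit counter is this sketch's own device for detecting the last chunk; the message does not say how the patch detects it, and `struct ex_avail` is a hypothetical type.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NR_PROFILE_BITS 64

/* Hypothetical tracker: a bit mask plus a chunk count per profile bit. */
struct ex_avail {
	uint64_t bits;
	unsigned int count[EX_NR_PROFILE_BITS];
};

static void ex_chunk_added(struct ex_avail *a, unsigned int bit)
{
	assert(a && bit < EX_NR_PROFILE_BITS);
	a->count[bit]++;
	a->bits |= 1ULL << bit;
}

static void ex_chunk_removed(struct ex_avail *a, unsigned int bit)
{
	assert(a && bit < EX_NR_PROFILE_BITS && a->count[bit] > 0);
	if (--a->count[bit] == 0)
		a->bits &= ~(1ULL << bit); /* last chunk gone: clear the bit */
}
```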

Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit · a46d11a8

Committed by Ilya Dryomov on Jan 16, 2012

Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
avail_{data,metadata,system}_alloc_bits fields, which gather info about
available allocation profiles in the FS. When a chunk is created or read
from disk, its profile is OR'ed with the corresponding avail_alloc_bits
field. Since SINGLE is denoted by 0 in the on-disk format, currently
there is no way to tell when such chunks become available. Restriper
needs that information, so add a separate bit for SINGLE profile.

This bit is going to be in-memory only, it should never be written out
to disk, so it's not a disk format change. However to avoid remappings
in future, reserve corresponding on-disk bit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a46d11a8
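
The problem is easy to see in a few lines: OR-ing 0 into a mask leaves no trace. The sketch below maps the on-disk value 0 to a dedicated in-memory bit before recording it. The `EX_` names and bit positions are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PROFILE_RAID1    (1ULL << 4)  /* illustrative position */
#define EX_AVAIL_BIT_SINGLE (1ULL << 48) /* illustrative in-memory-only bit */

/* Map an on-disk profile (0 == SINGLE) to its in-memory "available" form. */
static uint64_t ex_to_avail_bits(uint64_t disk_profile)
{
	return disk_profile ? disk_profile : EX_AVAIL_BIT_SINGLE;
}

/* Record that a chunk with the given on-disk profile exists. */
static void ex_note_chunk(uint64_t *avail_bits, uint64_t disk_profile)
{
	assert(avail_bits);
	*avail_bits |= ex_to_avail_bits(disk_profile);
}
```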

Btrfs: introduce masks for chunk type and profile · 52ba6929

Committed by Ilya Dryomov on Jan 16, 2012

Chunk's type and profile are encoded in u64 flags field.  Introduce
masks to easily access them.  Also fix the type of BTRFS_BLOCK_GROUP_*
constants; it should be ULL.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

52ba6929
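
A minimal sketch of how such masks split a flags word. The `EX_` names and bit positions are illustrative assumptions, not copied from the kernel headers. The `ULL` suffix matters because the flags field is 64 bits wide, and a shift of a plain `int` constant cannot reach the upper bits.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout (assumed): low bits = type, next bits = profile. */
#define EX_BG_DATA     (1ULL << 0)
#define EX_BG_SYSTEM   (1ULL << 1)
#define EX_BG_METADATA (1ULL << 2)
#define EX_BG_RAID0    (1ULL << 3)
#define EX_BG_RAID1    (1ULL << 4)
#define EX_BG_DUP      (1ULL << 5)
#define EX_BG_RAID10   (1ULL << 6)

#define EX_BG_TYPE_MASK    (EX_BG_DATA | EX_BG_SYSTEM | EX_BG_METADATA)
#define EX_BG_PROFILE_MASK (EX_BG_RAID0 | EX_BG_RAID1 | EX_BG_DUP | EX_BG_RAID10)

static_assert((EX_BG_TYPE_MASK & EX_BG_PROFILE_MASK) == 0,
	      "type and profile bits must not overlap");

/* Return only the type bits of a chunk's flags. */
static inline uint64_t ex_chunk_type(uint64_t flags)
{
	return flags & EX_BG_TYPE_MASK;
}

/* Return only the profile bits; 0 stands for the SINGLE profile. */
static inline uint64_t ex_chunk_profile(uint64_t flags)
{
	return flags & EX_BG_PROFILE_MASK;
}
```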

Btrfs: get rid of *_alloc_profile fields · 6fef8df1

Committed by Ilya Dryomov on Jan 16, 2012

{data,metadata,system}_alloc_profile fields have been unused for a long
time now.  Get rid of them.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

6fef8df1

23 Dec, 2011 2 commits

Btrfs: call d_instantiate after all ops are setup · 08c422c2

Committed by Al Viro on Dec 23, 2011

This closes races where btrfs is calling d_instantiate too soon during
inode creation.  All of the callers of btrfs_add_nondir are updated to
instantiate after the inode is fully set up in memory.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

08c422c2

Btrfs: fix worker lock misuse in find_worker · 8d532b2a

Committed by Chris Mason on Dec 23, 2011

Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.

This fixes both errors.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

8d532b2a

16 Dec, 2011 9 commits

Btrfs: unplug every once and a while · d85c8a6f

Committed by Chris Mason on Dec 15, 2011

The btrfs io submission threads can build up massive plug lists.  This
keeps things more reasonable so we don't hand over huge dumps of IO at
once.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

d85c8a6f

Merge branch 'for-chris' of... · 567a45e9

Committed by Chris Mason on Dec 15, 2011

Merge branch 'for-chris' of http://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into integration

Conflicts:
	fs/btrfs/inode.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>

567a45e9

Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code · e755d9ab

Committed by Chris Mason on Dec 15, 2011

btrfs_update_inode is sometimes called with a null reservation.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

e755d9ab

Btrfs: only set cache_generation if we setup the block group · e65cbb94

Committed by Josef Bacik on Dec 13, 2011

A user reported a problem booting into a new kernel with the old format inodes.
He was panicking in cow_file_range while writing out the inode cache. This is
because if the block group is not cached we'll just skip writing out the cache;
however, if it gets dirtied again in the same transaction and it finished caching
we'd go ahead and write it out, but since we set cache_generation to the transid
we think we've already truncated it and will just carry on, running into
cow_file_range and blowing up. We need to make sure we only set
cache_generation if we've done the truncate. The user tested this patch and
verified that the panic no longer occurred. Thanks,

Reported-and-Tested-by: Klaus Bitto <klaus.bitto@gmail.com>
Signed-off-by: Josef Bacik <josef@redhat.com>

e65cbb94

Btrfs: don't panic if orphan item already exists · ee4d89f0

Committed by Josef Bacik on Dec 13, 2011

I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in
a loop. This is because we will add an orphan item, do the truncate, the
truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're
left with an orphan item still in the fs. Then we come back later to do another
truncate and it blows up because we already have an orphan item. This is ok, so
just fix the BUG_ON() to only BUG() if ret is not EEXIST. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

ee4d89f0
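
The change in error handling amounts to treating "already exists" as success. A userspace sketch, with a hypothetical helper that returns the error where the kernel code would BUG():

```c
#include <assert.h>
#include <errno.h>

/*
 * Classify the result of inserting an orphan item.
 * Returns 0 when the caller may carry on, otherwise the fatal error
 * (the point at which the kernel code would BUG()).
 */
static int ex_check_orphan_insert(int ret)
{
	if (ret && ret != -EEXIST)
		return ret;
	return 0; /* success, or a leftover orphan item from a failed truncate */
}
```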

Btrfs: fix leaked space in truncate · 7041ee97

Committed by Josef Bacik on Dec 09, 2011

We were occasionally leaking space when running xfstest 269. This is because if
we failed to start the transaction in the truncate loop we'd just goto out, but
we need to break so that the inode is removed from the orphan list and the space
is properly freed. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

7041ee97

Btrfs: fix how we do delalloc reservations and how we free reservations on error · 660d3f6c

Committed by Josef Bacik on Dec 09, 2011

Running xfstests 269 with some tracing my scripts kept spitting out errors about
releasing bytes that we didn't actually have reserved. This took me down a huge
rabbit hole and it turns out the way we deal with reserved_extents is wrong,
we need to only be setting it if the reservation succeeds, otherwise the free()
method will come in and unreserve space that isn't actually reserved yet, which
can lead to other warnings and such. The math was all working out right in the
end, but it caused all sorts of other issues in addition to making my scripts
yell and scream and generally make it impossible for me to track down the
original issue I was looking for. The other problem is with our error handling
in the reservation code. There are two cases that we need to deal with:

1) We raced with free. In this case free won't free anything because csum_bytes
is modified before we drop the lock in our reservation path, so free rightly
doesn't release any space because the reservation code may be depending on that
reservation. However if we fail, we need the reservation side to do the free at
that point since that space is no longer in use. So as it stands the code was
doing this fine and it worked out, except in case #2

2) We don't race with free. Nobody comes in and changes anything, and our
reservation fails. In this case we didn't reserve anything anyway and we just
need to clean up csum_bytes but not free anything. So we keep track of
csum_bytes before we drop the lock and if it hasn't changed we know we can just
decrement csum_bytes and carry on.

Because of the case where we can race with free()'s since we have to drop our
spin_lock to do the reservation, I'm going to serialize all reservations with
the i_mutex. We already get this for free in the heavy use paths, truncate and
file write all hold the i_mutex, just needed to add it to page_mkwrite and
various ioctl/balance things. With this patch my space leak scripts no longer
scream bloody murder. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

660d3f6c

Btrfs: deal with enospc from dirtying inodes properly · 22c44fe6

Committed by Josef Bacik on Nov 30, 2011

Now that we're properly keeping track of delayed inode space we've been getting
a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is
because a bunch of people call mark_inode_dirty, which is void so we can't
return ENOSPC. This needs to be fixed in a few areas:

1) file_update_time - this updates the mtime and such when writing to a file,
which will call mark_inode_dirty. So copy file_update_time into btrfs so we can
call btrfs_dirty_inode directly and return an error if we get one appropriately.

2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't
setting ->setattr for symlinks, even though we should have been. This catches
one of the cases where we were getting errors in mark_inode_dirty.

3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
instead of mark_inode_dirty. This lets us return errors properly for truncate
and chown/anything related to setattr.

4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
print an error if we have one. The only remaining user we can't control for
this is touch_atime(), but we don't really want to keep people from walking
down the tree if we don't have space to save the atime update, so just complain
but don't worry about it.

With this patch xfstests 83 complains a handful of times instead of hundreds of
times. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

22c44fe6

Btrfs: fix num_workers_starting bug and other bugs in async thread · 0dc3b84a

Committed by Josef Bacik on Nov 18, 2011

Al pointed out we have some random problems with the way we account for
num_workers_starting in the async thread stuff.  First of all we need to make
sure to decrement num_workers_starting if we fail to start the worker, so make
__btrfs_start_workers do this.  Also fix __btrfs_start_workers so that it
doesn't call btrfs_stop_workers(); there is no point in stopping everybody if we
failed to create a worker.  Also check_pending_worker_creates needs to call
__btrfs_start_work in its work function since it already increments
num_workers_starting.

People only start one worker at a time, so get rid of the num_workers argument
everywhere, and make btrfs_queue_worker a void since it will always succeed.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

0dc3b84a
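
The accounting rule in the first paragraph can be sketched as follows: whoever increments the "starting" counter must see it decremented whether or not the worker was created. `struct ex_pool` and `ex_finish_worker_start()` are hypothetical names, and locking is left out for brevity.

```c
#include <assert.h>

/* Hypothetical worker pool counters (no locking in this sketch). */
struct ex_pool {
	int num_workers;          /* workers running */
	int num_workers_starting; /* starts requested but not yet finished */
};

/*
 * Finish one start attempt. The caller is assumed to have incremented
 * num_workers_starting before trying to create the worker.
 */
static int ex_finish_worker_start(struct ex_pool *p, int create_err)
{
	assert(p && p->num_workers_starting > 0);
	p->num_workers_starting--; /* the attempt is over either way */
	if (create_err)
		return create_err; /* existing workers are left running */
	p->num_workers++;
	return 0;
}
```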

15 Dec, 2011 7 commits

BTRFS: Establish i_ops before calling d_instantiate · ad19db71

Committed by Casey Schaufler on Dec 15, 2011

The Smack LSM hook for security_d_instantiate checks
the inode's i_op->getxattr value to determine if the
containing filesystem supports extended attributes.
The BTRFS filesystem sets the inode's i_op value only
after it has instantiated the inode. This results in
Smack incorrectly giving new BTRFS inodes attributes
from the filesystem defaults on the assumption that
values can't be stored on the filesystem. This patch
moves the assignment of inode operation vectors ahead
of the calls to d_instantiate, letting Smack know that
the filesystem supports extended attributes. There
should be no impact on the performance or behavior of
BTRFS.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ad19db71

Btrfs: add a cond_resched() into the worker loop · 8f3b65a3

Committed by Chris Mason on Dec 15, 2011

If we have a constant stream of end_io completions or crc work,
we can hit softlockup messages from the async helper threads.  This
adds a cond_resched() into the loop to avoid them.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

8f3b65a3

Btrfs: fix ctime update of on-disk inode · 306424cc

Committed by Li Zefan on Dec 14, 2011

To reproduce the bug:

    # touch /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800
    # chattr +i /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:43.198105295 +0800
    # umount /mnt
    # mount /dev/loop1 /mnt
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800

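After the remount the ctime is back at its old value, so the change made
by chattr never reached the on-disk inode.  We should update the ctime of
the in-memory inode before calling btrfs_update_inode().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>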

306424cc

btrfs: keep orphans for subvolume deletion · f8e9e0b0

Committed by Arne Jansen on Dec 14, 2011

Since we have the free space caches, btrfs_orphan_cleanup also runs for
the tree_root. Unfortunately this also cleans up the orphans used to mark
subvol deletions in progress.

Currently, if a subvol deletion gets interrupted twice by umount/mount, the
deletion will not be continued and the space is permanently lost, though it
would be possible to write a tool to recover those lost subvol deletions.
This patch checks if the orphan belongs to a subvol (dead root) and, if so,
skips deleting it.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f8e9e0b0

Btrfs: fix inaccurate available space on raid0 profile · 39fb26c3

Committed by Miao Xie on Dec 14, 2011

When we use raid0 as the data profile, the df command may show a very
inaccurate value for the available space, which may be much less than the
real one. This can confuse users. Fix it by changing the calculation of
the available space so that it more closely resembles a fake chunk
allocation.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

39fb26c3

Btrfs: fix wrong disk space information of the files · 3642320e

Committed by Miao Xie on Dec 14, 2011

Btrfsck reports errors after the 83rd case of xfstests is run. The error
number is 400, which means the used disk space of the file is wrong.

The reason for this bug is as follows:
File truncation may fail when the file system does not have enough space,
leaving some file extents whose offsets are beyond the end of the file.
When we want to expand such files, we drop those file extents and put in
dummy file extents, and then we should update the i-node. But btrfs
forgets to do it.

This patch adds the forgotten i-node update.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3642320e

Btrfs: fix wrong i_size when truncating a file to a larger size · f4a2f4c5

Committed by Miao Xie on Dec 14, 2011

Btrfsck reports error 100 after the 83rd case of xfstests is run, which
means the i_size of the file is wrong.

The reason for this bug is as follows:
Btrfs increased the i_size of the file at the beginning, but it failed to
expand the file and then failed to restore the i_size to the old size
because there was not enough space in the file system, so we found a wrong
i_size.

This patch fixes this bug by updating the i_size only after the file
expansion has succeeded and we have enough space to update the i-node.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f4a2f4c5

10 Dec, 2011 1 commit

Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror · 5dbc8fca

Committed by Chris Mason on Dec 09, 2011

btrfs_end_bio checks the number of errors on a bio against the max
number of errors allowed before sending any EIOs up to the higher
levels.

If we got enough copies of the bio done for a given raid level, it is
supposed to clear the bio error flag and return success.

We have pointers to the original bio sent down by the higher layers and
pointers to any cloned bios we made for raid purposes.  If the original
bio happens to be the one that got an io error, but not the last one to
finish, it might not have the BIO_UPTODATE bit set.

Then, when the last bio does finish, we'll call bio_end_io on the
original bio.  It won't have the uptodate bit set and we'll end up
sending EIO to the higher layers.

We already had a check for this; it was just conditional on getting the
IO error on the very last bio.  Make the check unconditional so we eat
the EIOs properly.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

5dbc8fca
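
The intended behaviour is that the final result depends only on how many copies failed, not on which copy failed or in what order they finished. A simplified userspace sketch, with a hypothetical `struct ex_multi_bio` standing in for the bookkeeping the message describes:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bookkeeping for one write fanned out to several mirrors. */
struct ex_multi_bio {
	int pending;    /* copies still in flight */
	int errors;     /* copies that failed */
	int max_errors; /* failures the RAID level tolerates */
};

/*
 * Called once per completed copy. Returns 1 while copies remain,
 * 0 for overall success, or -EIO when too many copies failed.
 */
static int ex_end_copy(struct ex_multi_bio *m, int err)
{
	assert(m && m->pending > 0);
	if (err)
		m->errors++;
	if (--m->pending > 0)
		return 1;
	/* Unconditional check: the order in which copies failed is irrelevant. */
	return m->errors > m->max_errors ? -EIO : 0;
}
```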

08 Dec, 2011 4 commits

Btrfs: drop spin lock when memory alloc fails · 1cf4ffdb

Committed by Liu Bo on Dec 07, 2011

Drop the spin lock in convert_extent_bit() when memory allocation fails;
otherwise, it will deadlock.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1cf4ffdb

Btrfs: check if the to-be-added device is writable · a5d16333

Committed by Li Zefan on Dec 07, 2011

If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding
a readonly device to a btrfs filesystem, and btrfs will write to
that device, emitting kernel errors:

[ 3109.833692] lost page write due to I/O error on loop2
[ 3109.833720] lost page write due to I/O error on loop2
...

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a5d16333

Btrfs: try cluster but don't advance in search list · 274bd4fb

Committed by Alexandre Oliva on Dec 07, 2011

When we find an existing cluster, we switch to its block group as the
current block group, possibly skipping multiple blocks in the process.
Furthermore, under heavy contention, multiple threads may fail to
allocate from a cluster and then release just-created clusters just to
proceed to create new ones in a different block group.

This patch tries to allocate from an existing cluster regardless of its
block group, and doesn't switch to that group, instead proceeding to
try to allocate a cluster from the group it was iterating before the
attempt.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

274bd4fb

Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE · 062c05c4

Committed by Alexandre Oliva on Dec 07, 2011

If we reach LOOP_NO_EMPTY_SIZE, we won't even try to use a cluster that
others might have set up.  Odds are that there won't be one, but if
someone else succeeded in setting it up, we might as well use it, even
if we don't try to set up a cluster again.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

062c05c4

01 Dec, 2011 3 commits

Btrfs: fix meta data raid-repair merge problem · f4a8e656

Committed by Jan Schmidt on Dec 01, 2011

Commit 4a54c8c1 introduced raid-repair, killing the individual
readpage_io_failed_hook entries from inode.c and disk-io.c. Commit
4bb31e92 introduced new readahead code, adding a readpage_io_failed_hook to
disk-io.c.

The raid-repair commit had logic to disable raid-repair, if
readpage_io_failed_hook is set. Thus, the readahead commit effectively
disabled raid-repair for meta data.

This commit changes the logic to always attempt raid-repair when needed and
call the readpage_io_failed_hook in case raid-repair fails. This is much
more straightforward and should have been like that from the beginning.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f4a8e656

Btrfs: skip allocation attempt from empty cluster · be064d11

Committed by Alexandre Oliva on Nov 30, 2011

If we don't have a cluster, don't bother trying to allocate from it,
jumping right away to the attempt to allocate a new cluster.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

be064d11

Btrfs: skip block groups without enough space for a cluster · 425d8315

Committed by Alexandre Oliva on Nov 30, 2011

We test whether a block group has enough free space to hold the
requested block, but when we're doing clustered allocation, we can
save some cycles by testing whether it has enough room for the cluster
upfront, otherwise we end up attempting to set up a cluster and
failing. Only in the NO_EMPTY_SIZE loop do we attempt an unclustered
allocation, and by then we'll have zeroed the cluster size, so this
patch won't stop us from using the block group as a last resort.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

425d8315
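
The up-front test can be sketched as one comparison. The message does not spell out which terms the kernel sums, so this version assumes "request size plus cluster size"; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a block group is worth trying. With cluster_size == 0 (the
 * last-resort, unclustered pass) this falls back to the plain
 * "enough room for the request" test.
 */
static bool ex_group_worth_trying(uint64_t free_space, uint64_t num_bytes,
				  uint64_t cluster_size)
{
	assert(num_bytes <= UINT64_MAX - cluster_size); /* sum cannot overflow */
	return free_space >= num_bytes + cluster_size;
}
```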
