- 21 Feb 2013, 3 commits

Committed by Miao Xie

->dirty_metadata_bytes is accessed very frequently, so use a percpu counter instead of the plain u64 variable to reduce contention on the lock. This patch also fixes the problem that we accessed it without lock protection in __btrfs_btree_balance_dirty(), which could cause us to skip the dirty page flush.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
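
A minimal sketch of the percpu counter pattern this change adopts; the counter and threshold names are illustrative stand-ins, not the actual btrfs fields, and note that percpu_counter_init() gained an extra GFP argument in later kernels:

```c
#include <linux/percpu_counter.h>

static struct percpu_counter dirty_metadata_bytes;	/* illustrative stand-in */

static int counters_init(void)
{
	/* kernels since 3.18 take an extra GFP argument here */
	return percpu_counter_init(&dirty_metadata_bytes, 0);
}

static void account_dirty(long bytes)
{
	/* fast path: updates a per-cpu delta, no shared lock taken */
	percpu_counter_add(&dirty_metadata_bytes, bytes);
}

static bool over_thresh(s64 thresh)
{
	/* folds the per-cpu deltas only when the fast compare can't decide */
	return percpu_counter_compare(&dirty_metadata_bytes, thresh) > 0;
}
```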

Committed by Miao Xie

fs_info->alloc_start is a 64-bit variable that can be accessed by multiple tasks, but it is not strictly protected: it can change while we are reading it. On a 32-bit machine we can read a wrong value, because the access takes two instructions. (In fact the same problem is possible on 64-bit machines, because the compiler may split the 64-bit operation into two 32-bit operations.) For example, assume ->alloc_start is 0x0000 0000 0001 0000 at the beginning, and then a remount sets ->alloc_start to 0x0000 0100 0000 0000:

  Task0                     Task1
  load high 32 bits
                            set high 32 bits
                            set low 32 bits
  load low 32 bits

Task0 will read 0. This patch fixes the problem by using two locks to protect the variable:
  fs_info->chunk_mutex
  sb->s_umount
On the read side, we just need to take one of these two locks; on the write side, we must take both.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
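
A hedged sketch of the two-lock rule: the writer takes both locks, a reader takes either one, so no reader can ever observe a half-done update. The names are stand-ins (sb->s_umount is really an rw_semaphore, simplified to a mutex here):

```c
#include <linux/mutex.h>

static DEFINE_MUTEX(chunk_mutex);	/* stands in for fs_info->chunk_mutex */
static DEFINE_MUTEX(umount_lock);	/* stands in for sb->s_umount */
static u64 alloc_start;

/* writer (remount path): must hold BOTH locks */
static void set_alloc_start(u64 val)
{
	mutex_lock(&umount_lock);
	mutex_lock(&chunk_mutex);
	alloc_start = val;
	mutex_unlock(&chunk_mutex);
	mutex_unlock(&umount_lock);
}

/* reader: EITHER lock is enough to exclude the writer */
static u64 get_alloc_start(void)
{
	u64 val;

	mutex_lock(&chunk_mutex);
	val = alloc_start;
	mutex_unlock(&chunk_mutex);
	return val;
}
```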

Committed by Miao Xie

Though ->max_inline is a 64-bit variable that may be accessed by multiple tasks, it is just a suggestive number, so we needn't add anything to protect fs_info->max_inline; just add a comment to explain why we don't use a lock to protect it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

- 20 Feb 2013, 26 commits

Committed by Filipe Brandenburger

The header file will then be installed under /usr/include/linux so that userspace applications can refer to Btrfs ioctls by name and use the same structs used internally in the kernel.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Kusanagi Kouichi

CAP_DAC_READ_SEARCH overrides the read and search permission checks on files and directories. It seems a good fit for BTRFS_IOC_INO_PATHS.

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Josef Bacik

This reverts commit 2794ed01. It wasn't supposed to get used in btrfs_mknod; it was supposed to be in btrfs_create, which was done in commit 9185aa58.

Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

btrfs_run_ordered_operations() needn't traverse the ordered operation list repeatedly, because the transaction committer will invoke it again when there is no other writer in the transaction, which ensures that no one can add new objects to the ordered operation list.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

btrfs_start_delalloc_inodes() needn't traverse and flush the delalloc inodes repeatedly: data that users write after we start the delalloc flush can be regarded as having arrived after the flush was done, and we can flush it next time.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

We forgot to check the return value of btrfs_run_ordered_operations() when flushing all the pending stuff; fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

We forgot to check the return value of btrfs_start_delalloc_inodes(); fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

The current raid attr array code is hard to understand, and it is easy to introduce problems when modifying the array. So I changed it and made it more readable.

Cc: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
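
The rework turns the table into a designated-initializer array indexed by raid type, so every attribute is labeled at its definition. A sketch of the shape; the field names approximate btrfs_raid_attr but are illustrative:

```c
enum raid_index { RAID10, RAID1, DUP, RAID0, SINGLE };

struct raid_attr {
	int sub_stripes;	/* stripes per mirror */
	int dev_stripes;	/* stripes written per device */
	int devs_max;		/* 0 == unlimited */
	int devs_min;
	int devs_increment;	/* devices must grow in these steps */
	int ncopies;		/* redundancy */
};

static const struct raid_attr raid_array[] = {
	[RAID10] = { .sub_stripes = 2, .dev_stripes = 1, .devs_max = 0,
		     .devs_min = 4, .devs_increment = 2, .ncopies = 2 },
	[RAID1]  = { .sub_stripes = 1, .dev_stripes = 1, .devs_max = 2,
		     .devs_min = 2, .devs_increment = 2, .ncopies = 2 },
	[DUP]    = { .sub_stripes = 1, .dev_stripes = 2, .devs_max = 1,
		     .devs_min = 1, .devs_increment = 1, .ncopies = 2 },
	[RAID0]  = { .sub_stripes = 1, .dev_stripes = 1, .devs_max = 0,
		     .devs_min = 2, .devs_increment = 1, .ncopies = 1 },
	[SINGLE] = { .sub_stripes = 1, .dev_stripes = 1, .devs_max = 1,
		     .devs_min = 1, .devs_increment = 1, .ncopies = 1 },
};
```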

Committed by Liu Bo

This saves us an rbtree search, which may become expensive on a large filesystem.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Liu Bo

This does not change the logic of the code, but saves us a read_lock.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Liu Bo

The tree log code's API has gone through similar changes, and it proved that we can benefit from using a token, so do the same thing here. The function_graph tracer's timing shows that this takes nearly half the time it did before (39.788us -> 22.391us).

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
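
For reference, a rough sketch of the token call pattern in the 3.x-era BTRFS_SETGET token accessors: the token caches the last mapped extent-buffer page, so consecutive get/set calls on the same leaf skip the mapping lookup. The accessor names shown are illustrative of that family, and leaf/inode_item stand for the usual fill_inode_item locals:

```c
struct btrfs_map_token token;

btrfs_init_map_token(&token);

/*
 * Consecutive token accessors on the same leaf reuse the page
 * mapping cached inside the token instead of re-deriving it for
 * every field, which is where the roughly 2x speedup comes from.
 */
btrfs_set_token_inode_size(leaf, inode_item, i_size, &token);
btrfs_set_token_inode_mode(leaf, inode_item, mode, &token);
btrfs_set_token_inode_nlink(leaf, inode_item, nlink, &token);
```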

Committed by Liu Bo

The argument 'trans' is not used any more.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Liu Bo

The argument 'trans' is not used any more.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Liu Bo

The arguments 'trans' and 'root' are not used any more.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Liu Bo

Commit d53ba474 (Btrfs: use commit root when loading free space cache) removed the deadlock check, so the related comments can be removed as well.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Josef Bacik

If we start running low on metadata space we will try to allocate a chunk, which could then try to allocate another chunk in order to add the device entry. The thing is, we allocate a chunk before we try really hard to make the allocation, so we should be able to find space for the device entry anyway. Add a flag to the trans handle so we know we're currently allocating a chunk, and can just bail out if we try to allocate another one. Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
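
A hedged sketch of the guard; the handle layout and function names here are illustrative, only the flag idea follows the commit:

```c
#include <linux/errno.h>
#include <linux/types.h>

struct trans_handle {
	bool allocating_chunk;	/* true while a chunk allocation runs */
};

static int alloc_chunk(struct trans_handle *trans);	/* the real work */

static int do_chunk_alloc(struct trans_handle *trans)
{
	int ret;

	/*
	 * If this allocation was itself triggered from inside a chunk
	 * allocation (e.g. to insert the device entry), bail instead of
	 * recursing: the outer allocation already reserved space.
	 */
	if (trans->allocating_chunk)
		return -ENOSPC;

	trans->allocating_chunk = true;
	ret = alloc_chunk(trans);
	trans->allocating_chunk = false;

	return ret;
}
```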

Committed by Josef Bacik

Since we don't actually copy the extent information from the source tree in the fast case, we don't need to wait for the ordered io to be completed in order to fsync; we just need to wait for the io itself to be completed. So when we're logging the file, just attach all of the ordered extents to the log, and then when the log syncs, wait for IO_DONE on the ordered extents and then write the super. Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

This patch fixes the following problems:
- improper return value
- unnecessary read-only check

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

Use the page_offset wrapper to get the byte offset into the filesystem object for a page.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

We may try to flush some dirty pages when there is not enough space to reserve. But it is possible that this operation fails; in order to get enough space to reserve successfully, we then sync all the delalloc files. This operation is safe: we needn't worry about the filesystem going from r/w to r/o, because the filesystem guarantees that all dirty pages have been written to disk by the time it becomes read-only, so the sync operation does nothing if the filesystem is already read-only. Though it may waste lots of time, as a corner case we needn't care.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

Locking and unlocking of the delayed ref mutex happen in different functions, and the names of the lock functions are not uniform, so the readability is not good. This patch optimizes the lock logic and makes it more readable.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

We're running into 50-100 orphans being left over with xfstests 83, because of ENOSPC when trying to start the transaction for the inode update. But in fact it makes no sense to update the inode for the new size while we're deleting the stupid thing. This patch fixes this problem.

Reported-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

The delayed item commit code in several functions is similar, so clean it up.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

Since we do not want to delay the async transaction commit, we should use common work, not delayed work.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

Committed by Miao Xie

We clear the transaction object and the trans handle when they are about to be freed, which is unnecessary; clean it up.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

Committed by Miao Xie

Delayed reference allocation is in the fast path of the IO, so use slabs to improve the speed of the allocation. Besides that, slabs can check for leaked objects when the module is removed.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
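
A minimal sketch of the slab-cache pattern being adopted, with an illustrative stand-in struct rather than the real btrfs delayed-ref types:

```c
#include <linux/slab.h>

struct delayed_ref_head {	/* illustrative stand-in */
	u64 bytenr;
};

static struct kmem_cache *delayed_ref_head_cache;

static int __init delayed_ref_init(void)
{
	delayed_ref_head_cache = kmem_cache_create("delayed_ref_head",
			sizeof(struct delayed_ref_head), 0, 0, NULL);
	return delayed_ref_head_cache ? 0 : -ENOMEM;
}

static void __exit delayed_ref_exit(void)
{
	/* warns about objects still alive: the leak check mentioned above */
	kmem_cache_destroy(delayed_ref_head_cache);
}

/* fast-path allocation now comes from the dedicated slab */
static struct delayed_ref_head *ref_head_alloc(void)
{
	return kmem_cache_alloc(delayed_ref_head_cache, GFP_NOFS);
}
```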

- 16 Feb 2013, 1 commit

Committed by David Sterba

btrfs_scan_one_device() calls set_blocksize(), which can race with a concurrent process making dirty page cache pages. It can end up dropping dirty page cache pages on the floor, which isn't very nice when someone is just running btrfs dev scan to find filesystems on the box. Now that udev is registering btrfs devices as it discovers them, we can actually end up racing with our own mkfs program too. When this happens, we drop some of the important blocks written by mkfs.

This commit changes scan_one_device() to read the super out of the page cache instead of trying to use bread(). This way we don't have to care about the blocksize of the device. It also drops the invalidate_bdev() call; it wasn't very polite to invalidate during the scan either. mkfs is putting the super into the page cache, so there's no reason to invalidate at this point.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
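
A hedged sketch of reading the super through the block device's page cache; the helper name is illustrative, but read_cache_page_gfp() is the stock page-cache API of that era:

```c
/* read the page containing the super without touching the blocksize */
static struct page *read_super_page(struct block_device *bdev, u64 bytenr)
{
	pgoff_t index = bytenr >> PAGE_CACHE_SHIFT;

	/*
	 * read_cache_page_gfp() pulls the page through bdev's own page
	 * cache, so a concurrent mkfs writing the same page is observed
	 * instead of clobbered, and no set_blocksize() is needed.
	 */
	return read_cache_page_gfp(bdev->bd_inode->i_mapping, index,
				   GFP_NOFS);
}
```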

- 15 Feb 2013, 1 commit

Committed by Arne Jansen

When replaying a log tree with qgroups enabled, tree_mod_log_rewind() does a sanity check of the number of items against the maximum possible number. It calculates that number with the nodesize of fs_root, but unfortunately fs_root is not yet set at this stage. So instead use the nodesize from tree_root, which is already initialized.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

- 07 Feb 2013, 1 commit

Committed by Chris Mason

Dave Sterba triggered a lockdep complaint about lock ordering between the sb_internal lock and the cleaner semaphore. btrfs_lookup_dentry() checks for orphans if we're looking up the inode for a subvolume, and subvolume creation was triggering the lookup with a transaction running. This commit moves the d_instantiate to after the transaction closes.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>

- 06 Feb 2013, 8 commits

Committed by Jan Schmidt

When btrfs_qgroup_reserve() returned a failure, we were missing a counter operation for BTRFS_I(inode)->outstanding_extents++, leading to warning messages about outstanding extents and space_info->bytes_may_use != 0. Additionally, the error handling code didn't take into account that we dropped the inode lock, which might require more cleanup. Luckily, all the cleanup code we need is already there and can be shared with reserve_metadata_bytes, which is exactly what this patch does.

Reported-by: Lev Vainblat <lev@zadarastorage.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Committed by Josef Bacik

We specifically do not update the disk i_size if there are ordered extents outstanding for any area between the current disk_i_size and our ordered extent, so that we do not expose stale data. The problem is that our check only tests whether the ordered extent starts at or after the current disk_i_size, which doesn't account for an ordered extent that starts before the current disk_i_size and ends past it. Fix this by checking whether the extent ends past the disk_i_size. Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
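
In effect the fix widens the overlap test from "the ordered extent starts past disk_i_size" to "it ends past disk_i_size". A sketch of the two conditions with illustrative parameter names:

```c
/* does this ordered extent hold the i_size update back? */
static bool holds_back_i_size(u64 file_offset, u64 len, u64 disk_i_size)
{
	/* old, too narrow: only caught extents STARTING past disk_i_size */
	/*	return file_offset >= disk_i_size; */

	/* fixed: an extent that merely ENDS past disk_i_size counts too */
	return file_offset + len > disk_i_size;
}
```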

Committed by Josef Bacik

If there is an ordered extent before the one we are currently completing that is past the current disk_i_size, we put our i_size update into that ordered extent so that we do not expose stale data. The problem is that if the disk i_size is updated past the previous ordered extent, we won't apply the pending i_size update. So check the pending i_size update, and if it is above the current disk i_size, go ahead and try to update. Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Liu Bo

While running a snapshot test script created by Mitch and David, the race between autodefrag and snapshot deletion can lead to corruption of the dead_root list, so that we crash in btrfs_clean_old_snapshots(). Besides autodefrag, scrub also does the same thing, ie. reads the root first and then gets the inode. Here is the story (taking autodefrag as the example):

(1) When we delete a snapshot or subvolume, it will set its root's refs to zero and do an iput() on its own inode, and if this inode happens to be the only active in-memory one in the root's inode rbtree, it will add itself to the global dead_roots list for later cleanup.

(2) After (1), the autodefrag thread may read another inode for defrag, and this inode may be in the deleted snapshot/subvolume, but all of this happens without checking whether the root is still valid (refs > 0). So the end result is adding the deleted snapshot/subvolume's root to the global dead_roots list AGAIN.

Fortunately, we already have an srcu lock to avoid the race, ie. subvol_srcu. So all we need to do is take the lock to protect 'read root and get inode', since we synchronize to wait for the rcu grace period before adding anything to the global dead_roots list.

Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

When we fail to start a transaction, we need to release the reserved free space and qgroup space; fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Miao Xie

If the checks at the beginning of btrfs_file_aio_write() fail, we needn't decrease ->sync_writers, because we have not increased it. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Committed by Josef Bacik

You can run into a problem where, if somebody is fsyncing and writing out the existing extents, you will have removed the extent map from the em tree, but it's still valid for the current fsync, so we go ahead and write it. The problem is we unconditionally try to merge it back into the em tree, but if we've removed it from the em tree, that will cause use-after-free problems. Fix this to only merge if we are still a part of the tree. Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
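
A hedged sketch of the guard: only re-merge the extent map if it is still linked into the em tree. The names (em->in_tree, try_merge_map) mirror the extent_map code of that era, but treat the exact identifiers as assumptions:

```c
write_lock(&em_tree->lock);
/*
 * If the extent map was removed from the tree while we were writing
 * it out for the fsync, merging it back would stitch a freed node
 * into the rbtree: a use-after-free. Only merge while it is still a
 * member of the tree.
 */
if (em->in_tree)
	try_merge_map(em_tree, em);
write_unlock(&em_tree->lock);
```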