Commits · 2b8195bb5717729e4e94ab4ad73a543feaafb0a2 · openeuler / raspberrypi-kernel

21 Feb, 2013 31 commits

Btrfs: fix the race between bio and btrfs_stop_workers · 2b8195bb

Committed by Miao Xie on Feb 07, 2013

open_ctree() needs to read the metadata to initialize the global
information of btrfs. But it may fail after it has submitted some bios, and
then it jumps to the error path. Unfortunately, it doesn't check whether
any bios are still in flight and just stops all the worker threads. As a
result, when the submitted bios end, they cannot find any worker thread to
handle the subsequent work, and an oops happens.

kernel BUG at fs/btrfs/async-thread.c:605!

Fix this problem by invoking invalidate_inode_pages2() before we stop the
worker threads. This function waits until the bios end because it needs to
lock the pages that are going to be invalidated, and a page that is under
disk read IO is already locked. invalidate_inode_pages2() therefore has to
wait until the end-bio handler unlocks it.
Reported-and-Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

2b8195bb

btrfs: add "no file data" flag to btrfs send ioctl · cb95e7bf

Committed by Mark Fasheh on Feb 04, 2013

This patch adds the flag, BTRFS_SEND_FLAG_NO_FILE_DATA to the btrfs send
ioctl code. When this flag is set, the btrfs send code will never write file
data into the stream (thus also avoiding expensive reads of that data in the
first place). BTRFS_SEND_C_UPDATE_EXTENT commands will be sent (instead of
BTRFS_SEND_C_WRITE) with an offset, length pair indicating the extent in
question.

This patch does not affect the operation of BTRFS_SEND_C_CLONE commands -
they will continue to be sent when a search finds an appropriate extent to
clone from.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

cb95e7bf

Btrfs: extend the checksum item as much as possible · 2f697dc6

Committed by Liu Bo on Feb 04, 2013

For writes, we also reserve some space for COW blocks while updating
the checksum tree, and we calculate the number of blocks by checking
whether the number of outstanding bytes that are going to need csums
requires one more block for csums.

When we add these checksums to the checksum tree, we use the ordered
sums list.
Every ordered sum contains csums for each sector, and we first try
to look up an existing csum item:
a) if we don't yet have a proper csum item, we need to insert one;
b) if we find one but the csum item is not big enough, we need
to extend it.

The point is that we unlock the whole path and then insert or extend,
so others can get in and update the tree.

Each insert or extend needs to update the tree with COW on, and we may
need to insert/extend many times.

That means what we've reserved for updating the checksum tree is NOT
enough.

The case is even more serious with several write threads running at the
same time: they can quickly eat our reserved space and start eating the
global reserve pool instead.

I haven't yet come up with a way to calculate the worst case for updating
csums, but extending the checksum item as much as possible helped in my
tests.

The idea is that it reduces the number of times we insert/extend, which
saves precious reserved space.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

2f697dc6

btrfs: remove cache only arguments from defrag path · de78b51a

Committed by Eric Sandeen on Jan 31, 2013

The entry point at the defrag ioctl always sets "cache only" to 0;
the codepaths haven't run for a long time as far as I can
tell.  Chris says they're dead code, so remove them.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

de78b51a

Btrfs: if we aren't committing just end the transaction if we error out · e4a2bcac

Committed by Josef Bacik on Feb 06, 2013

I hit a deadlock where transaction commit was waiting on num_writers to be
0. This happened because somebody came into btrfs_commit_transaction and
noticed we had aborted and it went to cleanup_transaction. This shouldn't
happen because cleanup_transaction is really to fixup a bad commit, it
doesn't do the normal trans handle cleanup things. So if we have an error
just do the normal btrfs_end_transaction dance and return. Once we are in
the actual commit path we can use cleanup_transaction and be good to go.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

e4a2bcac

Btrfs: handle errors in compression submission path · 3e04e7f1

Committed by Josef Bacik on Feb 06, 2013

I noticed we would deadlock if we aborted a transaction while doing
compressed io. This is because we don't unlock our pages if something goes
horribly wrong. To fix this we need to make sure that we call
extent_clear_unlock_delalloc in order to unlock all the pages. If we have
to cow in the async submission thread we need to make sure to unlock our
locked_page as the cow error path will not unlock the locked page as it
depends on the caller to unlock that page. With this patch we no longer
deadlock on the page lock when we have an aborted transaction. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

3e04e7f1

Btrfs: rework the overcommit logic to be based on the total size · 70afa399

Committed by Josef Bacik on Feb 06, 2013

People have been complaining about random ENOSPC errors that will clear up
after a umount or just a given amount of time. Chris was able to reproduce
this with stress.sh and lots of processes and so was I. Basically the
overcommit stuff would really let us get out of hand, in my tests I saw up
to 30 gigs of outstanding reservations with only 2 gigs total of metadata
space. This usually worked out fine but with so much outstanding
reservation the flushing stuff short circuits to make sure we don't hang
forever flushing when we really need ENOSPC. Plus we allocate chunks in
order to alleviate the pressure, but this doesn't actually help us since we
only use the non-allocated area in our over commit logic.

So instead of basing overcommit on the amount of non-allocated space,
base it on how much total space we have, and then limit it to the
non-allocated space in case we are short on space to spill over into.
This allows us to have the same performance as well as no longer giving
random ENOSPC. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

70afa399
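
The policy described in 70afa399 can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions: the function names and the `shift` parameter (which fraction of the total size may be overcommitted) are hypothetical and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: how far reservations may exceed the total size.
 * The allowance is a fraction of the total size (total >> shift; the
 * fraction is an assumption), capped by the space that is still
 * unallocated and could be spilled over into.
 */
uint64_t overcommit_allowance(uint64_t total, uint64_t unallocated,
			      unsigned int shift)
{
	uint64_t to_add;

	assert(shift < 64);
	to_add = total >> shift;
	return to_add < unallocated ? to_add : unallocated;
}

/* Hypothetical check: may `bytes` more be reserved on top of `used`? */
bool can_overcommit(uint64_t used, uint64_t bytes, uint64_t total,
		    uint64_t unallocated, unsigned int shift)
{
	return used + bytes <
	       total + overcommit_allowance(total, unallocated, shift);
}
```

With 2 GiB of metadata space and plenty of unallocated space, the allowance here is bounded by the total size rather than by the unallocated area, which is the change the commit message describes.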

Btrfs: account for orphan inodes properly during cleanup · 925396ec

Committed by Josef Bacik on Feb 01, 2013

Dave sent me a panic where we were doing the orphan cleanup and panic'ed
trying to release our reservation from the orphan block rsv. The reason for
this is because our orphan block rsv had been free'd out from underneath us
because the transaction commit found that there were no orphan inodes
according to its count and decided to free it. This is incorrect so make
sure we inc the orphan inodes count so the accounting is all done properly.
This would also cause the warning in the orphan commit code normally if you
had any orphans to cleanup as they would only decrement the orphan count so
you'd get a negative orphan count which could cause problems during runtime.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

925396ec

Btrfs: unreserve space if our ordered extent fails to work · 0bec9ef5

Committed by Josef Bacik on Jan 31, 2013

When a transaction aborts or there's an EIO on an ordered extent or any
error really we will not free up the space we reserved for this ordered
extent. This results in warnings from the block group cache cleanup in the
case of a transaction abort, or leaking space in the case of EIO on an
ordered extent. Fix this up by free'ing the reserved space if we have an
error at all trying to complete an ordered extent. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

0bec9ef5

Btrfs: fix how we discard outstanding ordered extents on abort · 779880ef

Committed by Josef Bacik on Jan 31, 2013

When we abort we've been just free'ing up all the ordered extents and
hoping for the best. This results in lots of warnings from various places,
warnings from btrfs_destroy_inode() because its ENOSPC accounting isn't
fixed. It will also screw up lots of pages that have been set private but
never get cleared because the ordered extents are never allowed to be
submitted. This patch fixes those warnings. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

779880ef

Btrfs: fix freeing delayed ref head while still holding its mutex · eb12db69

Committed by Josef Bacik on Jan 30, 2013

I hit this error when reproducing a bug that would end in a transaction
abort. We take the delayed ref head's mutex to keep anybody from processing
it while we're destroying it, but we fail to drop the mutex before we carry
on and free the damned thing. Fix this by doing the remove logic for the
head ourselves and unlock the mutex, that way we can avoid use after free's
or hung tasks waiting on that mutex to come back so they know the delayed
ref completed. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

eb12db69

btrfs: ensure we don't overrun devices_info[] in __btrfs_alloc_chunk · 063d006f

Committed by Eric Sandeen on Jan 31, 2013

WARN_ON isn't enough, we need to stop the loop if for any reason
we would overrun the devices_info array.

I tried to track down the connection between the length of
the alloc_devices list and the rw_devices counter but
it wasn't immediately obvious, so be defensive about it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

063d006f
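
The defensive pattern described in 063d006f, stopping the loop rather than only warning, might look like the sketch below. The types, names, and the "skip devices with no space" rule are hypothetical stand-ins, not the actual __btrfs_alloc_chunk() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one devices_info[] slot. */
struct dev_info {
	unsigned long long max_avail;
};

/*
 * Copy usable candidates into out[], which has room for `cap` entries.
 * When the array is full the loop stops, so out[] can never be overrun
 * even if the candidate list turns out longer than expected.
 */
size_t collect_devices(const unsigned long long *avail, size_t n_candidates,
		       struct dev_info *out, size_t cap)
{
	size_t ndevs = 0;

	assert(out != NULL || cap == 0);
	for (size_t i = 0; i < n_candidates; i++) {
		if (ndevs == cap)	/* a warning alone would not stop us */
			break;
		if (avail[i] == 0)	/* skip devices with no free space */
			continue;
		out[ndevs++].max_avail = avail[i];
	}
	return ndevs;
}
```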

btrfs: remove unnecessary DEFINE_WAIT() declarations · 1971e917

Committed by Eric Sandeen on Jan 31, 2013

No point in DEFINE_WAIT(wait) if it's not used!
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

1971e917

btrfs: remove unused "item" in btrfs_insert_delayed_item() · d4c0a7da

Committed by Eric Sandeen on Jan 31, 2013

"item" was set but never used in this function.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

d4c0a7da

btrfs: fix varargs in __btrfs_std_error · 37252a66

Committed by Eric Sandeen on Jan 31, 2013

__btrfs_std_error didn't always properly call va_end,
and might call va_start even if fmt was NULL.

Move all the varargs handling into the block where we
have fmt.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

37252a66
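
The varargs pattern described in 37252a66 can be illustrated with a small userspace function. `format_error()` and its output format are hypothetical, not the real __btrfs_std_error(); the point is only that va_start() and va_end() are paired inside the block that runs when fmt is non-NULL.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical error formatter. All varargs handling lives in the
 * `if (fmt)` block, so va_start() is never reached with a NULL format
 * and every va_start() has a matching va_end().
 */
int format_error(char *buf, size_t len, int err, const char *fmt, ...)
{
	assert(buf != NULL && len > 0);

	if (fmt) {
		char msg[128];
		va_list args;

		va_start(args, fmt);
		vsnprintf(msg, sizeof(msg), fmt, args);
		va_end(args);

		return snprintf(buf, len, "error %d: %s", err, msg);
	}

	return snprintf(buf, len, "error %d", err);
}
```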

btrfs: add missing break in btrfs_print_leaf() · 0e636027

Committed by Eric Sandeen on Jan 31, 2013

I don't think that BTRFS_DEV_EXTENT_KEY is supposed
to fall through to BTRFS_DEV_STATS_KEY ...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

0e636027

btrfs: annotate intentional switch case fallthroughs · 1c697d4a

Committed by Eric Sandeen on Jan 31, 2013

This keeps static checkers happy.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

1c697d4a

btrfs: handle null fs_info in btrfs_panic() · aa43a17c

Committed by Eric Sandeen on Jan 31, 2013

At least backref_tree_panic() can apparently pass
in a null fs_info, so handle that in __btrfs_panic
to get the message out on the console.

The btrfs_panic macro also uses fs_info, but that's
largely pointless; it's testing to see if
BTRFS_MOUNT_PANIC_ON_FATAL_ERROR is not set.
But if it *were* set, __btrfs_panic() would have,
well, panicked and we wouldn't be here, testing it!
So just BUG() at this point.

And since we only use fs_info once now, just use it
directly.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

aa43a17c

btrfs: remove unused fs_info from btrfs_decode_error() · 5a016047

Committed by Eric Sandeen on Jan 31, 2013

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

5a016047

btrfs: list_entry can't return NULL · d1d3cd27

Committed by Eric Sandeen on Jan 31, 2013

No need to test the result, we can't get a
null pointer from list_entry()
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

d1d3cd27
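
Why the result of list_entry() needs no NULL test can be seen from a userspace re-creation of the macro: it only subtracts a member offset from a pointer. In the sketch below, `struct item`, `list_is_empty()`, and `first_item()` are hypothetical; the macro mirrors the kernel's container_of-based definition in simplified form.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Simplified userspace re-creation: pure pointer arithmetic. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical element type, for illustration only. */
struct item {
	int id;
	struct list_head node;
};

int list_is_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Emptiness is decided before list_entry(), not by testing its result. */
struct item *first_item(struct list_head *head)
{
	assert(head != NULL);
	if (list_is_empty(head))
		return NULL;
	return list_entry(head->next, struct item, node);
}
```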

btrfs: remove unused fd in btrfs_ioctl_send() · b4c6f7b7

Committed by Eric Sandeen on Jan 31, 2013

All we do is set it to NULL and test it :)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

b4c6f7b7

Btrfs: do not overcommit if we don't have enough space for global rsv · 96f1bb57

Committed by Josef Bacik on Jan 30, 2013

Because of how little we allocate chunks now we can get really tight on
metadata space before we will allocate a new chunk. This resulted in being
unable to add device extents when allocating a new metadata chunk as we did
not have enough space. This is because we were allowed to overcommit too
much metadata without actually making sure we had enough space to make
allocations. The idea behind overcommit is that we are allowed to say "sure
you can have that reservation" when most of the free space is occupied by
reservations, not actual allocations. But in this case where a majority of
the total space is in use by actual allocations we can screw ourselves by
not being able to make real allocations when it matters. So make sure we
have enough real space for our global reserve, and if not then don't allow
overcommitting. Thanks,
Reported-and-tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

96f1bb57

Btrfs: remove extent mapping if we fail to add chunk · 0f5d42b2

Committed by Josef Bacik on Jan 31, 2013

I got a double free error when unmounting a file system that failed to add a
chunk during its operation.  This is because we will kfree the mapping that
we created but leave the extent_map in the em_tree for chunks.  So to fix
this just remove the extent_map when we error out so we don't run into this
problem.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

0f5d42b2

Btrfs: fix chunk allocation error handling · 04487488

Committed by Josef Bacik on Jan 29, 2013

If we error out allocating a dev extent we will have already created the
block group and such which will cause problems since the allocator may have
tried to allocate out of the block group that no longer exists. This will
cause BUG_ON()'s in the bio submission path. This also makes a failure to
allocate a dev extent a non-abort error, we will just clean up the dev
extents we did allocate and exit. Now if we fail to delete the dev extents
we will abort since we can't have half of the dev extents hanging around,
but this will make us much less likely to abort. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

04487488

Btrfs: use bit operation for ->fs_state · 87533c47

Committed by Miao Xie on Jan 29, 2013

There is no lock protecting fs_info->fs_state, which can cause
problems: for example, the value may be overwritten by another task
when several tasks modify it at the same time:
	Task0 - CPU0		Task1 - CPU1
	mov %fs_state rax
	or $0x1 rax
				mov %fs_state rax
				or $0x2 rax
	mov rax %fs_state
				mov rax %fs_state
The expected value is 3, but the result is actually 2.

Though this problem doesn't happen now (because there is currently
only one flag), the code is error-prone: if we add other flags,
the above problem is certain to happen.

Use bit operations on it to fix the above problem.
This makes the code more robust and makes it easy to
add new flags.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

87533c47
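
The lost update described in 87533c47 and the atomic alternative can be sketched in userspace C. This uses C11 `atomic_fetch_or()` as a stand-in for the kernel's atomic bit operations; the flag names and helper functions are hypothetical, and the race is replayed deterministically in one thread rather than with real tasks.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers: the commit only says one flag exists so far. */
enum { FS_STATE_FLAG_A = 0, FS_STATE_FLAG_B = 1 };

/* Atomic read-modify-write: load, OR and store happen as one step. */
void fs_state_set_bit(atomic_ulong *state, unsigned int bit)
{
	assert(bit < sizeof(unsigned long) * CHAR_BIT);
	atomic_fetch_or(state, 1UL << bit);
}

bool fs_state_test_bit(atomic_ulong *state, unsigned int bit)
{
	assert(bit < sizeof(unsigned long) * CHAR_BIT);
	return (atomic_load(state) >> bit) & 1UL;
}

/* Replays the interleaving from the commit message with plain variables. */
unsigned long lost_update_replay(void)
{
	unsigned long fs_state = 0;
	unsigned long task0, task1;

	task0 = fs_state;	/* Task0: mov %fs_state rax */
	task0 |= 0x1;		/* Task0: or $0x1 rax */
	task1 = fs_state;	/* Task1: mov %fs_state rax */
	task1 |= 0x2;		/* Task1: or $0x2 rax */
	fs_state = task0;	/* Task0: mov rax %fs_state */
	fs_state = task1;	/* Task1: overwrites Task0's bit */

	return fs_state;
}
```

The replay ends with only Task1's bit set, while two calls to `fs_state_set_bit()` leave both bits set regardless of ordering.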

Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits · de98ced9

Committed by Miao Xie on Jan 29, 2013

There is no lock protecting
  fs_info->avail_{data, metadata, system}_alloc_bits,
which may cause problems such as wrong profile
information, so add a seqlock to protect them.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

de98ced9
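
For readers unfamiliar with seqlocks, the idea in de98ced9 can be approximated in userspace with C11 atomics. This is a sketch, not the kernel's seqlock_t: the struct and function names are hypothetical, and it assumes writers are already serialized against each other (the kernel primitive includes a spinlock for that).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three avail_*_alloc_bits fields. */
struct alloc_bits {
	atomic_uint seq;	/* odd while a write is in progress */
	_Atomic uint64_t data;
	_Atomic uint64_t metadata;
	_Atomic uint64_t system;
};

struct alloc_bits_snapshot {
	uint64_t data, metadata, system;
};

void alloc_bits_init(struct alloc_bits *b)
{
	assert(b != NULL);
	atomic_init(&b->seq, 0);
	atomic_init(&b->data, 0);
	atomic_init(&b->metadata, 0);
	atomic_init(&b->system, 0);
}

/* Writer side. Assumes writers are serialized by some other lock. */
void alloc_bits_write(struct alloc_bits *b, struct alloc_bits_snapshot v)
{
	atomic_fetch_add(&b->seq, 1);	/* sequence becomes odd */
	atomic_store(&b->data, v.data);
	atomic_store(&b->metadata, v.metadata);
	atomic_store(&b->system, v.system);
	atomic_fetch_add(&b->seq, 1);	/* even again: write complete */
}

/* Reader side: retry until the read fits between two equal, even counts. */
struct alloc_bits_snapshot alloc_bits_read(struct alloc_bits *b)
{
	struct alloc_bits_snapshot v;
	unsigned int start;

	do {
		start = atomic_load(&b->seq);
		v.data = atomic_load(&b->data);
		v.metadata = atomic_load(&b->metadata);
		v.system = atomic_load(&b->system);
	} while ((start & 1u) || atomic_load(&b->seq) != start);

	return v;
}
```

Readers never block the writer; they simply retry if a write overlapped their read, which suits data that is read far more often than it is changed.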

Btrfs: use the inode own lock to protect its delalloc_bytes · df0af1a5

Committed by Miao Xie on Jan 29, 2013

We need not use a global lock to protect the delalloc_bytes of the
inode; just use the inode's own lock. This reduces lock contention,
and ->delalloc_lock now only protects the delalloc inode
list.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

df0af1a5

Btrfs: use percpu counter for fs_info->delalloc_bytes · 963d678b

Committed by Miao Xie on Jan 29, 2013

fs_info->delalloc_bytes is accessed very frequently, so use a percpu
counter instead of the u64 variable for it to reduce lock
contention.

This patch also fixes the problem of accessing the variable
without lock protection. At worst, we would not flush the
delalloc inodes and would just return an ENOSPC error while there
was still some free space in the fs.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

963d678b
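
A rough userspace model of why a percpu counter reduces contention: each CPU updates its own small delta and only occasionally folds it into the shared total. The names `pc_counter`, `NSLOTS`, and `BATCH` are hypothetical, the slot index stands in for the CPU number, and the code is single-threaded, so the locking the kernel does when folding is omitted.

```c
#include <assert.h>

#define NSLOTS 4	/* hypothetical number of CPUs */
#define BATCH 32	/* hypothetical fold threshold */

struct pc_counter {
	long long global;	/* shared total, touched only when folding */
	int local[NSLOTS];	/* per-slot deltas, cheap to update */
};

/* Add to the caller's slot; fold into the total once the delta is large. */
void pc_add(struct pc_counter *c, unsigned int slot, int amount)
{
	int v;

	assert(slot < NSLOTS);
	v = c->local[slot] + amount;
	if (v >= BATCH || v <= -BATCH) {
		c->global += v;	/* the kernel would take a lock here */
		v = 0;
	}
	c->local[slot] = v;
}

/* Fast but approximate: ignores deltas that have not been folded yet. */
long long pc_read_approx(const struct pc_counter *c)
{
	return c->global;
}

/* Exact but slower: walks every slot. */
long long pc_sum(const struct pc_counter *c)
{
	long long sum = c->global;

	for (unsigned int i = 0; i < NSLOTS; i++)
		sum += c->local[i];
	return sum;
}
```

The trade-off is that the fast read can lag behind the true value by up to the unfolded deltas, so callers that need precision use the summing variant.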

Btrfs: use percpu counter for dirty metadata count · e2d84521

Committed by Miao Xie on Jan 29, 2013

->dirty_metadata_bytes is accessed very frequently, so use a percpu
counter instead of the u64 variable to reduce lock
contention.

This patch also fixes the problem of accessing it without
lock protection in __btrfs_btree_balance_dirty(), which may
cause us to skip the dirty pages flush.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

e2d84521

Btrfs: protect fs_info->alloc_start · c018daec

Committed by Miao Xie on Jan 29, 2013

fs_info->alloc_start is a 64-bit variable that can be accessed by
multiple tasks, but it is not strictly protected, so it can change
while we are accessing it. On a 32-bit machine we can get a wrong
value because we access it with two instructions. (In fact, the
same problem can also happen on a 64-bit machine, because the
compiler may split the 64-bit operation into two 32-bit
operations.)

For example:
Assume ->alloc_start is 0x0000 0000 0001 0000 at the beginning,
then we remount and set ->alloc_start to 0x0000 0100 0000 0000.
	Task0 			Task1
				load high 32 bits
	set high 32 bits
	set low 32 bits
				load low 32 bits

Task1 will get 0.

This patch fixes the problem by using two locks to protect it:
	fs_info->chunk_mutex
	sb->s_umount
On the read side, we only need to take one of these two locks; on
the write side, we must take both of them.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

c018daec
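
The torn read described in c018daec can be replayed deterministically in userspace by modelling the 64-bit value as two 32-bit halves. The struct and function names are hypothetical; the values and the interleaving follow the example in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value as a 32-bit machine sees it: two separate halves. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t join(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Replays the interleaving from the commit message in a single thread. */
uint64_t torn_read_replay(void)
{
	struct split_u64 alloc_start = { .lo = 0x00010000u, .hi = 0 };
	uint32_t hi, lo;

	/* Starting value: 0x0000 0000 0001 0000 */
	assert(join(alloc_start.hi, alloc_start.lo) == 0x0000000000010000ULL);

	hi = alloc_start.hi;		/* Task1: load high 32 bits (old) */
	alloc_start.hi = 0x00000100u;	/* Task0: set high 32 bits */
	alloc_start.lo = 0x00000000u;	/* Task0: set low 32 bits */
	lo = alloc_start.lo;		/* Task1: load low 32 bits (new) */

	return join(hi, lo);		/* neither the old nor the new value */
}
```

Task1 combines the old high half with the new low half, so it observes a value that was never stored, which is why both the read and the write need to happen under a lock.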

Btrfs: add a comment for fs_info->max_inline · 8c6a3ee6

Committed by Miao Xie on Jan 29, 2013

Though ->max_inline is a 64-bit variable and may be accessed by
multiple tasks, it is just a suggestive number, so we needn't add
anything to protect fs_info->max_inline. Just add a comment to
explain why we don't use a lock to protect it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

8c6a3ee6

20 Feb, 2013 9 commits

Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.h · 55e301fd

Committed by Filipe Brandenburger on Jan 29, 2013

The header file will then be installed under /usr/include/linux so that
userspace applications can refer to Btrfs ioctls by name and use the same
structs used internally in the kernel.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

55e301fd

Btrfs: Check CAP_DAC_READ_SEARCH for BTRFS_IOC_INO_PATHS · 82b22ac8

Committed by Kusanagi Kouichi on Jan 28, 2013

CAP_DAC_READ_SEARCH overrides read and search permission checks on
files and directories. It seems a good fit for BTRFS_IOC_INO_PATHS.
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

82b22ac8

Revert "Btrfs: fix permissions of empty files not affected by umask" · fe5fafbe

Committed by Josef Bacik on Jan 24, 2013

This reverts commit 2794ed01.

Wasn't supposed to get used in btrfs_mknod, it was supposed to be in
btrfs_create, which was done in commit
9185aa58.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

fe5fafbe

Btrfs: don't traverse the ordered operation list repeatedly · 5b947f1b

Committed by Miao Xie on Jan 22, 2013

btrfs_run_ordered_operations() needn't traverse the ordered operation list
repeatedly, because the transaction committer will invoke it again when
there is no other writer in this transaction, at which point no one can
add new objects to the ordered operation list.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

5b947f1b

Btrfs: traverse and flush the delalloc inodes once · 63607cc8

Committed by Miao Xie on Jan 22, 2013

btrfs_start_delalloc_inodes() needn't traverse and flush the delalloc inodes
repeatedly. We can treat the data that users write after the delalloc inode
flush starts as data written after the flush is done, and flush it next
time.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

63607cc8

Btrfs: check the return value of btrfs_run_ordered_operations() · eebc6084

Committed by Miao Xie on Jan 22, 2013

We forgot to check the return value of btrfs_run_ordered_operations() when
flushing all the pending work; fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

eebc6084

Btrfs: check the return value of btrfs_start_delalloc_inodes() · 3edb2a68

Committed by Miao Xie on Jan 22, 2013

We forgot to check the return value of btrfs_start_delalloc_inodes(); fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

3edb2a68

Btrfs: make raid attr array more readable · e6ec716f

Committed by Miao Xie on Jan 17, 2013

The current code of the raid attr array is hard to understand, and it is
easy to introduce problems when modifying the array. So I changed it to
make it more readable.

Cc: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

e6ec716f

Btrfs: record first logical byte in memory · a1897fdd

Committed by Liu Bo on Dec 27, 2012

This saves us an rbtree search, which may become expensive in a large
filesystem.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

a1897fdd