Commits · 9be3395bcd4ad4af76476ac38152b4cafa6b6159 · openeuler / raspberrypi-kernel

18 May, 2013 1 commit

Btrfs: use a btrfs bioset instead of abusing bio internals · 9be3395b

Committed by Chris Mason on May 17, 2013

Btrfs has been pointer tagging bi_private and using bi_bdev
to store the stripe index and mirror number of failed IOs.

As bios bubble back up through the call chain, we use these
to decide if and how to retry our IOs.  They are also used
to count IO failures on a per device basis.

A recently added bio tracepoint led to crashes because
we were abusing bi_bdev.

This commit adds a btrfs bioset, and creates explicit fields
for the mirror number and stripe index.  The plan is to
extend this structure for all of the fields currently in
struct btrfs_bio, which will mean one less kmalloc in
our IO path.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Reported-by: Tejun Heo <tj@kernel.org>
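
The idea in the commit above, explicit per-IO fields stored next to an embedded bio instead of values smuggled into bi_private and bi_bdev, can be sketched in userspace C. This is a minimal illustration only: struct bio here is a stand-in, and struct io_bio, to_io_bio(), io_bio_alloc() and io_bio_free() are hypothetical names, not the kernel's bioset API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct bio; only what this sketch needs. */
struct bio {
    void *bi_private;   /* completion context, left untagged */
};

/* Hypothetical wrapper: explicit fields live next to the embedded bio. */
struct io_bio {
    unsigned int mirror_num;
    unsigned int stripe_index;
    struct bio bio;     /* embedded, so one allocation covers both */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a bio handed back by a lower layer to its wrapper. */
struct io_bio *to_io_bio(struct bio *bio)
{
    assert(bio != NULL);
    return container_of(bio, struct io_bio, bio);
}

/* Allocate wrapper and bio together; callers only ever see the bio. */
struct bio *io_bio_alloc(unsigned int mirror_num, unsigned int stripe_index)
{
    struct io_bio *iob = calloc(1, sizeof(*iob));

    if (!iob)
        return NULL;
    iob->mirror_num = mirror_num;
    iob->stripe_index = stripe_index;
    return &iob->bio;
}

/* Free the whole wrapper given only the embedded bio. */
void io_bio_free(struct bio *bio)
{
    free(to_io_bio(bio));
}
```

Because the mirror number and stripe index have their own fields, nothing in the generic bio has to be overloaded, so code such as a tracepoint can read the bio's own members safely.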

07 May, 2013 39 commits

Btrfs: allow superblock mismatch from older mkfs · 667e7d94

Committed by Chris Mason on May 07, 2013

We've added new checks to make sure the super block crc is correct
during mount.  A fresh filesystem from an older mkfs won't have the
crc set.  This adds a warning when it finds a newly created filesystem
but doesn't fail the mount.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

btrfs: enhance superblock checks · 1104a885

Committed by David Sterba on Mar 06, 2013

The superblock checksum is not verified upon mount. <awkward silence>

Add that check and also reorder existing checks to a more logical
order.

Current mkfs.btrfs does not calculate the correct checksum of
super_block and thus a freshly created filesystem will fail to mount when
this patch is applied.

First transaction commit calculates correct superblock checksum and
saves it to disk.

Reproducer:
$ mkfs.btrfs /dev/sda
$ mount /dev/sda /mnt
$ btrfs scrub start /mnt
$ sleep 5
$ btrfs scrub status /mnt
... super:2 ...
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

btrfs: fix misleading variable name for flags · b6919a58

Committed by David Sterba on Apr 29, 2013

The variable was named 'data' in btrfs_reserve_extent and that's the
only function that actually uses it to let btrfs_get_alloc_profile know
what profile we want. Then it's passed down as u64 flags.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

btrfs: use unsigned long type for extent state bits · 41074888

Committed by David Sterba on Apr 29, 2013

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: improve the loop of scrub_stripe · 625f1c8d

Committed by Liu Bo on Apr 27, 2013

1) Right now scrub_stripe() is looping in some unnecessary cases:
* when the found extent item's objectid is already out of the dev extent's range
  but we haven't finished scanning all the range within the dev extent
* when all the items have been processed but we haven't finished scanning all the
  range within the dev extent

In both cases, we can just finish the loop to save costs.

2) Besides, when the found extent item's length is larger than the stripe
len(64k), we don't have to release the path and search again as it'll get at the
same key used in the last loop, we can instead increase the logical cursor in
place till all space of the extent is scanned.

3) And we use 0 as the key's offset to search btree, then get to previous item
to find a smaller item, and again have to move to the next one to get the right
item.  Setting offset=-1 and previous_item() is the correct way.

4) As we won't find any checksum at offset unless this 'offset' is in a data
extent, we can just find checksum when we're really going to scrub an extent.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

btrfs: read entire device info under lock · 55793c0d

Committed by David Sterba on Apr 26, 2013

There's a theoretical possibility of reading stale (or even more
theoretically, freed) data from DEV_INFO ioctl when the device would
disappear between an early mutex unlock and data being copied from the
device structure.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

btrfs: remove unused gfp mask parameter from release_extent_buffer callchain · f7a52a40

Committed by David Sterba on Apr 26, 2013

It's unused since 0b32f4bb.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

btrfs: handle errors returned from get_tree_block_key · 34c2b290

Committed by David Sterba on Apr 26, 2013

Signed-off-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

btrfs: make static code static & remove dead code · 48a3b636

Committed by Eric Sandeen on Apr 25, 2013

Big patch, but all it does is add statics to functions which
are in fact static, then remove the associated dead-code fallout.

removed functions:

btrfs_iref_to_path()
__btrfs_lookup_delayed_deletion_item()
__btrfs_search_delayed_insertion_item()
__btrfs_search_delayed_deletion_item()
find_eb_for_page()
btrfs_find_block_group()
range_straddles_pages()
extent_range_uptodate()
btrfs_file_extent_length()
btrfs_scrub_cancel_devid()
btrfs_start_transaction_lflush()

btrfs_print_tree() is left because it is used for debugging.
btrfs_start_transaction_lflush() and btrfs_reada_detach() are
left for symmetry.

ulist.c functions are left, another patch will take care of those.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: deal with errors in write_dev_supers · 634554dc

Committed by Josef Bacik on Apr 29, 2013

If you try to mount -o loop a restored file system it will panic if the file
ends up being smaller than the original disk. This is because we go to try and
get a block for a super that may be past the EOF which makes __getblk return
NULL for a buffer head when we aren't expecting it to. Fix this by dealing with
this case and just jacking up the errors count. With this patch we no longer
panic when mounting a restored file system loopback. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: remove almost all of the BUG()'s from tree-log.c · 3650860b

Committed by Josef Bacik on Apr 25, 2013

There were a whole bunch and I was doing it for other things. I haven't tested
these error paths but at the very least this is better than panicking. I've only
left 2 BUG_ON()'s since they are logic errors and I want to replace them with an
ASSERT framework that we can compile out for production users. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: deal with free space cache errors while replaying log · b50c6e25

Committed by Josef Bacik on Apr 25, 2013

So everybody who got hit by my fsync bug will still continue to hit this
BUG_ON() in the free space cache, which is pretty heavy handed. So I took a
file system that had this bug and fixed up all the BUG_ON()'s and leaks that
popped up when I tried to mount a broken file system like this. With this patch
we just fail to mount instead of panicking. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: automatic rescan after "quota enable" command · 3d7b5a28

Committed by Jan Schmidt on Apr 25, 2013

When qgroup tracking is enabled, we do an automatic cycle of the new rescan
mechanism.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: rescan for qgroups · 2f232036

Committed by Jan Schmidt on Apr 25, 2013

If qgroup tracking is out of sync, a rescan operation can be started. It
iterates the complete extent tree and recalculates all qgroup tracking data.
This is an expensive operation and should not be used unless required.

A filesystem under rescan can still be umounted. The rescan continues on the
next mount. Status information is provided with a separate ioctl while a
rescan operation is in progress.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: split btrfs_qgroup_account_ref into four functions · 46b665ce

Committed by Jan Schmidt on Apr 25, 2013

The function is separated into a preparation part and the three accounting
steps mentioned in the qgroups documentation. The goal is to make steps two
and three usable by the rescan functionality. A side effect is that the
function is restructured into readable subunits.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: allocate new chunks if the space is not enough for global rsv · 3c76cd84

Committed by Miao Xie on Apr 25, 2013

When running xfstests case 208, the fs returned an ENOSPC
error even though there was lots of free space on the disk.

By bisecting, we found it was introduced by commit 96f1bb57.
That commit makes the space check for the global reservation in
can_overcommit() inconsistent with should_alloc_chunk().
can_overcommit() requires that the free space is 2 times the size
of the global reservation, or we can't overcommit. Instead,
we need to reclaim some reserved space, and if we still don't have
enough free space, we need to allocate a new chunk. Unfortunately,
should_alloc_chunk() only requires that the free space is 1 time
the size of the global reservation. That is, we would not try to
allocate a new chunk if the free space size falls between
these two requirements, and would just return the ENOSPC error. Fix it.

Cc: Jim Schutt <jaschut@sandia.gov>
Cc: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
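
The mismatch described in the commit above can be illustrated with a heavily simplified model. The function names below are hypothetical and the real can_overcommit() and should_alloc_chunk() take many more inputs; the "after" variant assumes the fix aligns both checks on the 2x threshold, which the message implies but does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Free space the overcommit check wants to keep: 2x the global reservation. */
uint64_t global_rsv_need(uint64_t global_rsv_size)
{
    assert(global_rsv_size <= UINT64_MAX / 2);  /* avoid overflow in 2x */
    return 2 * global_rsv_size;
}

/* Overcommit is allowed only with at least 2x the reservation free. */
bool can_overcommit_sketch(uint64_t free_space, uint64_t global_rsv_size)
{
    return free_space >= global_rsv_need(global_rsv_size);
}

/* Before the fix (simplified): a chunk is wanted only below 1x. */
bool should_alloc_chunk_before(uint64_t free_space, uint64_t global_rsv_size)
{
    return free_space < global_rsv_size;
}

/* After the fix (assumed): use the same 2x threshold as the overcommit check. */
bool should_alloc_chunk_after(uint64_t free_space, uint64_t global_rsv_size)
{
    return free_space < global_rsv_need(global_rsv_size);
}
```

With a reservation of 100 units and 150 units free, the "before" model neither overcommits nor allocates a chunk, which is the window where ENOSPC was returned; the "after" model asks for a new chunk in that window.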

Btrfs: separate sequence numbers for delayed ref tracking and tree mod log · fc36ed7e

Committed by Jan Schmidt on Apr 24, 2013

Sequence numbers for delayed refs have been introduced in the first version
of the qgroup patch set. To solve the problem of find_all_roots on a busy
file system, the tree mod log was introduced. The sequence numbers for that
were simply shared between those two users.

However, at one point in qgroup's quota accounting, there's a statement
accessing the previous sequence number, that's still just doing (seq - 1)
just as it would have to in the very first version.

To satisfy that requirement, this patch makes the sequence number counter 64
bit and splits it into a major part (used for qgroup sequence number
counting) and a minor part (incremented for each tree modification in the
log). This enables us to go exactly one major step backwards, as required
for qgroups, while still incrementing the sequence counter for tree mod log
insertions to keep track of their order. Keeping them in a single variable
means there's no need to change all the code dealing with comparisons of two
sequence numbers.

The sequence number is reset to 0 on commit (not new in this patch), which
ensures we won't overflow the two 32 bit counters.

Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
from the tree mod log code may happen.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
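
The major/minor split described in the commit above can be sketched as follows. This is an illustration under assumptions, not the kernel code: the helper names are hypothetical, the major part is taken to be the upper 32 bits and the minor part the lower 32 bits, and seq_prev_major() assumes the major part is at least 1.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_MINOR_BITS 32
#define SEQ_MINOR_MASK ((UINT64_C(1) << SEQ_MINOR_BITS) - 1)

/* Major part: advanced for qgroup sequence counting. */
uint64_t seq_major(uint64_t seq)
{
    return seq >> SEQ_MINOR_BITS;
}

/* Minor part: advanced for each tree modification in the log. */
uint64_t seq_minor(uint64_t seq)
{
    return seq & SEQ_MINOR_MASK;
}

/* One tree mod log insertion: bump only the minor part. */
uint64_t seq_inc_minor(uint64_t seq)
{
    return seq + 1;
}

/* One qgroup step: clear the minor part and bump the major part. */
uint64_t seq_inc_major(uint64_t seq)
{
    return (seq & ~SEQ_MINOR_MASK) + (UINT64_C(1) << SEQ_MINOR_BITS);
}

/* Go exactly one major step back, whatever the minor part currently is. */
uint64_t seq_prev_major(uint64_t seq)
{
    assert(seq_major(seq) > 0);
    return (seq_major(seq) - 1) << SEQ_MINOR_BITS;
}
```

Since both parts share one 64-bit value, ordinary integer comparison still orders sequence numbers correctly, which matches the message's point that existing comparison code does not need to change.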

btrfs: move leak debug code to functions · 6d49ba1b

Committed by Eric Sandeen on Apr 22, 2013

Clean up the leak debugging in extent_io.c by moving
the debug code into functions.  This also removes the
list_heads used for debugging from the extent_buffer
and extent_state structures when debug is not enabled.

Since we need a global debug config to do that last
part, implement CONFIG_BTRFS_DEBUG to accommodate.

Thanks to Dave Sterba for the Kconfig bit.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: return free space in cow error path · ace68bac

Committed by Liu Bo on Apr 22, 2013

Replace some BUG_ONs with proper handling and take allocated space back to
free space cache for later use.

We don't have to worry about extent maps since they'd be freed in releasepage
path.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: set UUID in root_item for created trees · 6463fe58

Committed by Stefan Behrens on Apr 19, 2013

It is a rare exception that a new tree is created, like the qgroups
tree. So far these new trees have an all-zero UUID in their root
items. All trees that mkfs.btrfs has created get a UUID during the
first mount when btrfs_read_root_item() rewrites the root_item to
the v2 structure style. These UUIDs are never used so far, but
anyway, since it is better to have it uniform for all trees, this
commit adds some lines that generate and write a UUID for newly
created trees.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: delete unused parameter to btrfs_read_root_item() · 5fbf83c1

Committed by Stefan Behrens on Apr 19, 2013

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix error handling in btrfs_ioctl_send() · ecc7ada7

Committed by Tsutomu Itoh on Apr 19, 2013

fget() returns NULL on error, so we should check for NULL.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: remove unused variable in __process_changed_new_xattr() · ba1eeaac

Committed by Tsutomu Itoh on Apr 18, 2013

Variable 'p' is not used any more, so remove it.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: various abort cleanups · 54067ae9

Committed by Josef Bacik on Apr 25, 2013

I have a broken file system that when it aborts leaves all sorts of accounting
things wrong and gives you lots of WARN_ON()'s other than the abort. This is
because we're not cleaning up various parts of the file system when we abort.
The first chunks are specific to mount failures, we weren't cleaning up the
block group cached inodes and we weren't cleaning up any transactions that had
been aborted, which leaves a bunch of things lying around.

The second half of this is related to the cleanup parts. First we don't need
to release space for the dirty pages from the trans_block_rsv, that's all
handled by the trans handles so this is just plain wrong. The other thing is we
need to pin down extents that were set ->must_insert_reserved for delayed refs.
This isn't so much for the pinning but more for the cleaning up the
cache->reserved counter since we are no longer going to use those reserved
bytes. With this patch I no longer see a bunch of WARN_ON()'s when I try to
mount this broken file system, just the initial one from the abort. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: cleanup destroy_marked_extents · fd8b2b61

Committed by Josef Bacik on Apr 24, 2013

We can just look up the extent_buffers for the range and free stuff that way.
This makes the cleanup a bit cleaner and we can make sure to evict the
extent_buffers pretty quickly by marking them as stale. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: check return value of commit when recovering log · abefa55a

Committed by Josef Bacik on Apr 24, 2013

We need to check the return value of the commit in case something goes wrong,
otherwise we could end up going down the line and doing more stuff (like orphan
cleanup) before we notice we should have errored out. We need to do this before
we free up the log_tree_root since the caller will handle all of that. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: don't panic if we're trying to drop too many refs · 32b02538

Committed by Josef Bacik on Apr 24, 2013

This is just obnoxious.  Just print a message, abort the transaction, and return
an error.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: cleanup fs roots if we fail to mount · 171f6537

Committed by Josef Bacik on Apr 24, 2013

We can run the tree logging recovery or the orphan cleanup on mount, so we'll
end up looking up a random fs tree in the meantime. So we need to clean this up
so we don't leave extent buffers hanging around on the cache. With this patch
we no longer leak extent buffers on failure to mount. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix extent logging with O_DIRECT into prealloc · eb384b55

Committed by Josef Bacik on Apr 24, 2013

This is the same as the fix from commit

Btrfs: fix bad extent logging

but for O_DIRECT.  I missed this when I fixed the problem originally, we were
still using the em for the orig_start and orig_block_len, which would be the
merged extent.  We need to use the actual extent from the on disk file extent
item, which we have to lookup to make sure it's ok to nocow anyway so just pass
in some pointers to hold this info.  Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix all callers of read_tree_block · 416bc658

Committed by Josef Bacik on Apr 23, 2013

We kept leaking extent buffers when mounting a broken file system and it turns
out it's because not everybody uses read_tree_block properly. You need to check
and make sure the extent_buffer is uptodate before you use it. This patch fixes
everybody who calls read_tree_block directly to make sure they check that it is
uptodate and free it and return an error if it is not. With this we no longer
leak EB's when things go horribly wrong. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: only exclude supers in the range of our block group · 51bf5f0b

Committed by Josef Bacik on Apr 23, 2013

If we fail to load block groups halfway through we can leave extent_state's on
the excluded tree. This is because we just lookup the supers and add them to
the excluded tree regardless of which block group we are looking at currently.
This is a problem because we remove the excluded extents for the range of the
block group only, so if we don't ever load a block group for one of the excluded
extents we won't ever free it. This fixes the problem by only adding excluded
extents if it falls in the block group range we care about. With this patch
we're no longer leaking space when we fail to read all of the block groups.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: add tree block level sanity check · 1c24c3ce

Committed by Josef Bacik on Apr 23, 2013

With a user's corrupted fs I was getting weird behavior and panics and it turns
out it was because one of his tree blocks had a bogus header level. So add this
to the sanity checks in the endio handler for tree blocks. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: don't try and free ebs twice in log replay · 5ec8dca7

Committed by Josef Bacik on Apr 23, 2013

This work is done by btrfs_free_path() anyway so there's no need for this
duplicate work.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: don't BUG_ON() in btrfs_num_copies · fb7669b5

Committed by Josef Bacik on Apr 23, 2013

A user sent me a btrfs-image that was panicking because of some corruption. This
is because we pass in a bogus value to btrfs_num_copies, and it panics. Instead
just return 1. We only call btrfs_num_copies to see if there are other copies
to try and read for things, so if we just return 1 it will make the callers exit
out with an appropriate error value. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: don't call readahead hook until we have read the entire eb · 79fb65a1

Committed by Josef Bacik on Apr 20, 2013

Martin Steigerwald reported a BUG_ON() where we were given a bogus bytenr to
map. Turns out he is using > PAGESIZE leafsizes. The readahead stuff is called
every time we do a completion, but we may not have finished reading in all the
pages, so the bytenr we read off the node could be completely bogus. Fix this
by only calling the readahead hook once all pages have been read in. Thanks,
Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: deal with bad mappings in btrfs_map_block · 9bb91873

Committed by Josef Bacik on Apr 19, 2013

Martin Steigerwald reported a BUG_ON() in btrfs_map_block where we didn't find
a chunk for a particular block we were trying to map. This happened because the
block was bogus. We shouldn't be BUG_ON()'ing in this case, just print a
message and return an error. This came from reada_add_block and it appears to
deal with an error fine so we should be good there. Thanks,
Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: use REQ_META for all metadata IO · d4c7ca86

Committed by Josef Bacik on Apr 19, 2013

We need to tag metadata io with REQ_META to avoid priority inversion when using
io throttling cgroups.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix possible infinite loop in slow caching · 0a3896d0

Committed by Josef Bacik on Apr 19, 2013

So I noticed there is an infinite loop in the slow caching code. We return 1
when we hit the end of the tree, so we could end up caching the last block group
the slow way and suddenly we're looping forever because we just keep
re-searching and trying again. Fix this by only doing btrfs_next_leaf() if we
don't need_resched(). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix lockdep warning · 62dbd717

Committed by Josef Bacik on Apr 17, 2013

The locking order for stuff is

__sb_start_write
ordered_mutex

but with sync() we don't do __sb_start_write for some strange reason, which
means that our iput in wait_ordered_extents could start a transaction which does
the __sb_start_write while we're holding the ordered_mutex.  Fix this by using
delayed iput in sync.  Thanks,
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>