Commits · c995ab3cda3f4178c1f1a47926bea5f8372880cb · openanolis / cloud-kernel

02 Nov, 2017 5 commits

btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents · c995ab3c

Committed by Zygo Blaxell on Sep 22, 2017

The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
offset (encoded as a single logical address) to a list of extent refs.
LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
(extent ref -> extent bytenr and offset, or logical address).  These are
useful capabilities for programs that manipulate extents and extent
references from userspace (e.g. dedup and defrag utilities).

When the extents are uncompressed (and not encrypted and not other),
check_extent_in_eb performs filtering of the extent refs to remove any
extent refs which do not contain the same extent offset as the 'logical'
parameter's extent offset.  This prevents LOGICAL_INO from returning
references to more than a single block.

To find the set of extent references to an uncompressed extent from [a, b),
userspace has to run a loop like this pseudocode:

	for (i = a; i < b; ++i)
		extent_ref_set += LOGICAL_INO(i);

At each iteration of the loop (up to 32768 iterations for a 128M extent),
data we are interested in is collected in the kernel, then deleted by
the filter in check_extent_in_eb.
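The iteration count quoted above follows from the block size. A minimal sketch of the arithmetic, assuming one LOGICAL_INO call per block and a 4 KiB block size (the message does not state the block size); `toy_lookup_calls` is a hypothetical helper, not a kernel or btrfs-progs function:

```c
#include <stdint.h>

/*
 * Number of per-block lookups needed to cover one extent.
 * Hypothetical helper for illustration only.
 */
static uint64_t toy_lookup_calls(uint64_t extent_bytes, uint64_t block_bytes)
{
	/* Round up so a partial trailing block still gets one call. */
	return (extent_bytes + block_bytes - 1) / block_bytes;
}
```

With a 128 MiB extent and 4 KiB blocks, `toy_lookup_calls(128ULL << 20, 4096)` evaluates to 32768, which matches the figure in the message.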

When the extents are compressed (or encrypted or other), the 'logical'
parameter must be an extent bytenr (the 'a' parameter in the loop).
No filtering by extent offset is done (or possible?) so the result is
the complete set of extent refs for the entire extent.  This removes
the need for the loop, since we get all the extent refs in one call.

Add an 'ignore_offset' argument to iterate_inodes_from_logical,
[...several levels of function call graph...], and check_extent_in_eb, so
that we can disable the extent offset filtering for uncompressed extents.
This flag can be set by an improved version of the LOGICAL_INO ioctl to
get either behavior as desired.
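The effect of the flag can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: `struct toy_extent_ref`, `toy_ref_is_reported` and `toy_count_reported` are hypothetical names, and the range test is a simplified stand-in for the filtering described above, not the kernel's check_extent_in_eb.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent ref: which part of the extent it covers. */
struct toy_extent_ref {
	uint64_t inode;        /* inode that holds the reference */
	uint64_t data_offset;  /* first byte of the extent covered by this ref */
	uint64_t data_len;     /* number of extent bytes covered by this ref */
};

/* Decide whether a ref is reported for a given offset inside the extent. */
static bool toy_ref_is_reported(const struct toy_extent_ref *ref,
				uint64_t extent_item_pos, bool ignore_offset)
{
	if (ignore_offset)
		return true;	/* filtering disabled: report every ref */
	return extent_item_pos >= ref->data_offset &&
	       extent_item_pos < ref->data_offset + ref->data_len;
}

/* Count the refs that survive the filter. */
static size_t toy_count_reported(const struct toy_extent_ref *refs, size_t n,
				 uint64_t extent_item_pos, bool ignore_offset)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_ref_is_reported(&refs[i], extent_item_pos, ignore_offset))
			count++;
	return count;
}
```

In this model, two refs that cover different halves of an extent are each found only when the queried offset falls inside their own half, whereas a single call with `ignore_offset` set reports both.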

There is no functional change in this patch.  The new flag is always
false.
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor coding style fixes ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: send: remove unused code · eb7b9d6a

Committed by Nikolay Borisov on Oct 16, 2017

This code was first introduced in 31db9f7c ("Btrfs: introduce
BTRFS_IOC_SEND for btrfs send/receive") and it was not functional, then
it got slightly refactored in e938c8ad ("Btrfs: code cleanups for
send/receive"), alas it was still dead. So let's remove it for good!
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove BUG_ON in btrfs_rm_dev_replace_free_srcdev() · 6dd38f81

Committed by Anand Jain on Oct 17, 2017

That was only an extra check to tackle a few bugs around this area; now
it's safe to remove it.  Replace it with an ASSERT.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: allow setting zlib compression level via :9 · fa4d885a

Committed by Adam Borowski on Sep 15, 2017

This is bikeshedding, but it seems people are drastically more likely to
understand "zlib:9" as a compression level rather than an algorithm
version, compared to "zlib9".

Based on feedback on the mailing list, ":9" will be the only accepted
syntax. The level must be a single digit. An unrecognized format will
fall back to the default, for forward compatibility, in a similar way to
how the compression algorithm specifier was relaxed in commit
a7164fa4 ("btrfs: prepare for extensions in compression
options").
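The accepted syntax can be sketched as a small parser. This is an illustration under assumptions, not the code added by the patch: `toy_zlib_level` is a hypothetical name, and returning 0 to mean "use the default level" is a convention chosen for the sketch.

```c
#include <string.h>

/*
 * Parse a compression option string of the form "zlib:N".
 * Returns N for a single digit 1..9, or 0 (meaning "default level") for
 * anything else, so that unrecognized forms are tolerated rather than
 * rejected. Hypothetical helper for illustration only.
 */
static unsigned int toy_zlib_level(const char *opt)
{
	if (strncmp(opt, "zlib", 4) != 0)
		return 0;

	/* Exactly ':' followed by one digit and the end of the string. */
	if (opt[4] == ':' && opt[5] >= '1' && opt[5] <= '9' && opt[6] == '\0')
		return (unsigned int)(opt[5] - '0');

	return 0;
}
```

Under these assumptions "zlib:9" yields 9, while "zlib", "zlib9" and "zlib:12" all fall back to the default.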
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: David Sterba <dsterba@suse.com>
[ tighten the accepted format ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: allow to set compression level for zlib · f51d2b59

Committed by David Sterba on Sep 15, 2017

Preliminary support for setting the compression level for zlib; the
following works:

$ mount -o compress=zlib                 # default
$ mount -o compress=zlib0                # same
$ mount -o compress=zlib9                # level 9, slower sync, less data
$ mount -o compress=zlib1                # level 1, faster sync, more data
$ mount -o remount,compress=zlib3        # level set by remount

The compress-force option works the same as compress.  The level is
visible in the same format in /proc/mounts. Setting the level via a file
property does not work yet.

Required patch: "btrfs: prepare for extensions in compression options"
Signed-off-by: David Sterba <dsterba@suse.com>

30 Oct, 2017 35 commits

btrfs: Replace opencoded sizes with their symbolic constants · d4417e22

Committed by Nikolay Borisov on Oct 16, 2017

Currently btrfs' code uses a mix of open-coded sizes and defines from sizes.h.
Let's unify the code base to always use the symbolic constants. No functional
changes.
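A minimal sketch of the difference, written for userspace. The `SZ_*` macros below are local copies of values from the kernel's include/linux/sizes.h; the two `toy_*` functions are hypothetical and exist only to contrast the styles.

```c
/* Userspace copies of a few constants from include/linux/sizes.h. */
#define SZ_4K   0x00001000
#define SZ_64K  0x00010000
#define SZ_1M   0x00100000
#define SZ_128M 0x08000000

/* Open-coded: the reader has to work out that this is 64 KiB. */
static unsigned long toy_limit_opencoded(void)
{
	return 64 * 1024;
}

/* Symbolic: the size is stated by name, with the same value. */
static unsigned long toy_limit_symbolic(void)
{
	return SZ_64K;
}
```

Both functions return the same number, which is why a conversion of this kind carries no functional change.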
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Use bd_dev to generate index when dev_state_hashtable add items. · 859a58a2

Committed by Gu JinXiang on Oct 11, 2017

Fix missing change from commit f8f84b2d
("btrfs: index check-integrity state hash by a dev_t").

Function btrfsic_dev_state_hashtable_lookup uses dev_t to generate hashval
when looking up a btrfsic_dev_state in the hash table. So when we add a
btrfsic_dev_state into the hash table, it should also use dev_t.
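The invariant behind this fix is that insertion and lookup must derive the bucket index from the same key. A toy hash table sketches it; every name here is hypothetical, `uint32_t` stands in for dev_t, and the masking hash is an assumption chosen for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 16u

struct toy_dev_state {
	uint32_t dev;			/* stand-in for dev_t */
	struct toy_dev_state *next;	/* chain within one bucket */
};

struct toy_hashtable {
	struct toy_dev_state *buckets[TOY_HASH_SIZE];
};

/* Single hash function shared by add and lookup. */
static unsigned int toy_hash(uint32_t dev)
{
	return dev & (TOY_HASH_SIZE - 1);
}

static void toy_add(struct toy_hashtable *h, struct toy_dev_state *ds)
{
	unsigned int idx = toy_hash(ds->dev);	/* same key as lookup */

	ds->next = h->buckets[idx];
	h->buckets[idx] = ds;
}

static struct toy_dev_state *toy_lookup(const struct toy_hashtable *h,
					uint32_t dev)
{
	struct toy_dev_state *ds;

	for (ds = h->buckets[toy_hash(dev)]; ds; ds = ds->next)
		if (ds->dev == dev)
			return ds;
	return NULL;
}
```

If `toy_add` hashed some other value than `toy_lookup` does, entries would usually land in a bucket that the lookup never searches.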

Reproducer of this bug:
Use MOUNT_OPTIONS="-o check_int" when running xfstests; the device cannot be
mounted successfully, so xfstests cannot run.
Signed-off-by: Gu JinXiang <gujx@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix false EIO for missing device · 102ed2c5

Committed by Anand Jain on Oct 14, 2017

When one of the devices is missing, bbio_error() takes care of setting
the error status. If it is the only IO pending in that stripe, it
fails to check the status of the other IO at %bbio_error before setting
the error %bi_status for the %orig_bio. Fix this by checking whether
%bbio->error has exceeded %bbio->max_errors.
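The check described above amounts to comparing an error count with a tolerance. A minimal sketch, assuming a plain integer counter; `toy_bio_status` is a hypothetical name, and the kernel works with atomic counters and block-layer status codes rather than errno values.

```c
#include <errno.h>

/*
 * Decide the status of the original request once all stripes have
 * finished: fail only if more stripes failed than the profile tolerates.
 * Hypothetical helper for illustration only.
 */
static int toy_bio_status(int errors, int max_errors)
{
	return errors > max_errors ? -EIO : 0;
}
```

For a two-copy mirror that tolerates one failed copy (an assumption for this example), one missing device gives `toy_bio_status(1, 1) == 0`, so the write is not reported as failed.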

With the reproducer below, an fdatasync error is seen intermittently.

 mount -o degraded /dev/sdc /btrfs
 dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync

 dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error

 The problem is intermittent because the following conditions have to
 be met, which depends on timing:
 In btrfs_map_bio()
  - in RAID1, the missing device has to be at %dev_nr = 1
 In bbio_error()
  - before bbio_error() is called, the bio of the not-missing
    device at %dev_nr = 0 must be completed so that the condition
    below is true
     if (atomic_dec_and_test(&bbio->stripes_pending)) {
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use need_full_stripe() in __btrfs_map_block() · de483734

Committed by Anand Jain on Oct 12, 2017

A cleanup patch: use need_full_stripe() to replace the open-coded check.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: cleanup extent locking sequence · 79f015f2

Committed by Goldwyn Rodrigues on Oct 16, 2017

Code cleanup for better understanding:
Rename the variable needs_unlock to extent_locked to show state as
opposed to action. Change the type to int, to reduce code in the
critical path.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use BLK_STS defines where needed · 2dbe0c77

Committed by Anand Jain on Oct 14, 2017

In a few places we could use BLK_STS_OK and BLK_STS_NOTSUPP.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ dropped first hunk btrfs_endio_direct_read ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add assertions for releasing trans handle reservations · bf2681cb

Committed by Josef Bacik on Sep 29, 2017

These are useful for debugging problems where we mess with
trans->block_rsv to make sure we're not screwing something up.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove type argument from comp_tree_refs · 3b60d436

Committed by Josef Bacik on Sep 29, 2017

We can get this from the ref we've passed in.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove delayed_ref_node from ref_head · d278850e

Committed by Josef Bacik on Sep 29, 2017

This is just excessive information in the ref_head, and makes the code
complicated.  It is a relic from when we had the heads and the refs in
the same tree, which is no longer the case.  With this removal I've
cleaned up a bunch of the cruft around this old assumption as well.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: move all ref head cleanup to the helper function · c1103f7a

Committed by Josef Bacik on Sep 29, 2017

We do a couple of different cleanup operations on the ref head.  We adjust
counters, we'll free any reserved space if we didn't end up using the
ref, and we clear the pending csum bytes.  Move all these disparate
things into cleanup_ref_head and clean up the logic in
__btrfs_run_delayed_refs so that it handles the !ref case much more
cleanly, as well as making run_one_delayed_ref() only deal with real refs
and not the ref head.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: move ref_mod modification into the if (ref) logic · 1ce7a5ec

Committed by Josef Bacik on Sep 29, 2017

We only use this logic if our ref isn't a ref_head, so move it up into
the if (ref) case since we know that this is a normal ref and not a
delayed ref head.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: breakout empty head cleanup to a helper · 194ab0bc

Committed by Josef Bacik on Sep 29, 2017

Move this code out to a helper function to further simplify
__btrfs_run_delayed_refs.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: move extent_op cleanup to a helper · b00e6250

Committed by Josef Bacik on Sep 29, 2017

Move the extent_op cleanup for an empty head ref to a helper function to
help simplify __btrfs_run_delayed_refs.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add a helper to return a head ref · 2eadaa22

Committed by Josef Bacik on Sep 29, 2017

Simplify the error handling in __btrfs_run_delayed_refs by breaking out
the code used to return a head back to the delayed_refs tree for
processing into a helper function.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: only check delayed ref usage in should_end_transaction · 7c777430

Committed by Josef Bacik on Sep 29, 2017

We were only doing btrfs_check_space_for_delayed_refs() if the metadata
space was full, i.e. we couldn't allocate chunks.  This assumes we'll be
able to allocate chunks during transaction commit, but since nothing
does a LIMIT flush during the transaction commit this won't actually
happen unless we happen to run shy of actual space.  We already take
into account a full fs in btrfs_check_space_for_delayed_refs() so just
kill this extra check to make sure we're ending the transaction when we
need to.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: add a extent ref verify tool · fd708b81

Committed by Josef Bacik on Sep 29, 2017

We were having corruption issues that were tied back to problems with
the extent tree.  In order to track them down I built this tool to try
and find the culprit, which was pretty successful.  If you compile with
this tool enabled, it will verify live every ref update that the fs makes
and make sure it is consistent and valid.  I've run this through
xfstests and haven't gotten any false positives.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update error messages, add fixup from Dan Carpenter to handle errors
  of read_tree_block ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: pass root to various extent ref mod functions · 84f7d8e6

Committed by Josef Bacik on Sep 29, 2017

We need the actual root for the ref verifier tool to work, so change
these functions to pass the root around instead.  This will be used in
a subsequent patch.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add ref-verify mount option · fb592373

Committed by Josef Bacik on Sep 29, 2017

This adds the infrastructure for turning ref verify on and off for a
mount, to be used by a later patch.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance btrfs_print_mod_info to print if ref-verify is compiled in ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: get rid of sector_t and use u64 offset in submit_extent_page · 6273b7f8

Committed by David Sterba on Oct 04, 2017

The use of sector_t in the callchain of submit_extent_page is not
necessary.  Switch to u64, rename the variable, and use byte units
instead of 512b, i.e. dropping the >> 9 shifts and avoiding the
con(tro)versions of sector_t.
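The shifts being dropped convert between bytes and 512-byte units. A short sketch of that conversion, with hypothetical helper names and a locally defined shift constant:

```c
#include <stdint.h>

#define TOY_SECTOR_SHIFT 9	/* 2^9 = 512-byte units */

/* Byte offset to 512-byte sector number (discards the low 9 bits). */
static uint64_t toy_bytes_to_sectors(uint64_t bytes)
{
	return bytes >> TOY_SECTOR_SHIFT;
}

/* 512-byte sector number back to a byte offset. */
static uint64_t toy_sectors_to_bytes(uint64_t sectors)
{
	return sectors << TOY_SECTOR_SHIFT;
}
```

Keeping offsets in bytes as a 64-bit value avoids both helpers and the rounding that the first one introduces for offsets that are not a multiple of 512.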
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: rename page offset parameter in submit_extent_page · 6c5a4e2c

Committed by David Sterba on Oct 04, 2017

We're going to remove sector_t and will use 'offset', so this patch
frees the name.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: scrub: get rid of sector_t · 6aa21263

Committed by David Sterba on Oct 04, 2017

The use of sector_t is not necessary, it's just for a warning.  Switch to
u64, rename the variable, and use byte units instead of 512b, i.e.
dropping the >> 9 shifts.  The messages are adjusted as well.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix send ioctl on 32bit with 64bit kernel · 2351f431

Committed by Josef Bacik on Sep 27, 2017

We pass in a pointer in our send arg struct, this means the struct size
doesn't match with 32bit user space and 64bit kernel space.  Fix this by
adding a compat mode and doing the appropriate conversion.
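The size mismatch and the conversion can be sketched with a reduced pair of structures. This is an illustration under assumptions: the `toy_*` types carry only three fields and are not the real ioctl argument layout, and `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <stdint.h>

/* Native view: the pointer member is as wide as the kernel's pointers. */
struct toy_send_args {
	int64_t   send_fd;
	uint64_t *clone_sources;
	uint64_t  flags;
};

/* 32-bit user space view: the pointer travels as a 32-bit integer. */
struct toy_send_args_32 {
	int64_t  send_fd;
	uint32_t clone_sources;
	uint64_t flags;
} __attribute__((packed));

/* Convert the 32-bit layout into the native one, widening the pointer. */
static void toy_from_compat(struct toy_send_args *dst,
			    const struct toy_send_args_32 *src)
{
	dst->send_fd = src->send_fd;
	dst->clone_sources = (uint64_t *)(uintptr_t)src->clone_sources;
	dst->flags = src->flags;
}
```

The packed 32-bit structure occupies 20 bytes, while the native one is larger wherever pointers are 8 bytes wide, so the two cannot be copied over each other directly.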
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ move structure to the beginning, next to receive 32bit compat ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix use of error or warning for missing device · 2b902dfc

Committed by Anand Jain on Oct 09, 2017

When a device is missing and the -o degraded option is not given, it's an
error, so report it as an error instead of a warning.  When the -o degraded
option is provided, log the missing device as a warning.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ switch error to bool ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: declare btrfs_report_missing_device() static · 5a2b8e60

Committed by Anand Jain on Oct 09, 2017

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix EIO misuse to report missing degraded option · 45dbdbc9

Committed by Anand Jain on Oct 09, 2017

EIO is only for the IO failure to the device, avoid it. Use ENOENT as
that's the closest error code describing what happened.
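A minimal sketch of the resulting behaviour, assuming a helper that only decides the return code; `toy_check_missing_device` is a hypothetical name and the real code also logs a message.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Return code for a mount attempt with respect to a missing device.
 * Hypothetical helper for illustration only.
 */
static int toy_check_missing_device(bool device_missing, bool degraded)
{
	if (!device_missing)
		return 0;
	if (degraded)
		return 0;	/* allowed: the caller only warns */
	return -ENOENT;		/* no IO failed, so -EIO would mislead */
}
```

The distinction matters to callers that treat -EIO as a sign of failing hardware.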
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add_missing_dev() should return the actual error · adfb69af

Committed by Anand Jain on Oct 11, 2017

add_missing_dev() can return device pointer so that IS_ERR/PTR_ERR can
be used to check for the actual error that occurred in the function.
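The error-pointer convention can be modelled in userspace. The sketch below assumes the usual scheme of encoding small negative errno values in the top of the address range; all `toy_*` names are hypothetical re-implementations, not the kernel's ERR_PTR, IS_ERR or PTR_ERR, and `toy_add_missing_dev` is a stand-in for the real function.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer lies in the range reserved for error codes. */
static bool toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long toy_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct toy_device {
	int missing;
};

/* Returns a device, or an error pointer that carries the reason. */
static struct toy_device *toy_add_missing_dev(bool simulate_oom)
{
	struct toy_device *dev;

	if (simulate_oom)
		return toy_err_ptr(-ENOMEM);
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return toy_err_ptr(-ENOMEM);
	dev->missing = 1;
	return dev;
}
```

A caller then tests the result with `toy_is_err()` and propagates `toy_ptr_err()` instead of guessing an error code from a NULL return.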
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
[ minor error message adjustment ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Clean up unused variables in free-space-tree.c · 9e882d6d

Committed by Christos Gkekas on Oct 12, 2017

Remove variables 'start' and 'end', which are set but never used.
Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: tree-checker: use %zu format string for size_t · 709a95c3

Committed by Arnd Bergmann on Oct 13, 2017

We now get a harmless compile-time warning on 32-bit architectures:

fs/btrfs/tree-checker.c: In function 'check_extent_data_item':
fs/btrfs/tree-checker.c:189:70: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'unsigned int' [-Werror=format=]

This changes the format string to use %zu instead of %lu for size_t.
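A small userspace sketch of the portable form; `toy_format_size` is a hypothetical helper and the message text is made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t value. %zu matches size_t on both 32-bit and 64-bit
 * targets, whereas %lu only matches where size_t is unsigned long.
 */
static int toy_format_size(char *buf, size_t buflen, size_t value)
{
	return snprintf(buf, buflen, "item size %zu", value);
}
```

Because the conversion specifier follows the type rather than the architecture, the same source compiles without a format warning on either word size.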

Fixes: c1f6520bf360 ("btrfs: tree-checker: Enhance output for check_extent_data_item")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: remove nr_async_submits and async_submit_draining · 736cd52e

Committed by Liu Bo on Sep 07, 2017

Now that we have the combo of flushing twice, which can make sure IO
have started since the second flush will wait for page lock which
won't be unlocked unless setting page writeback and queuing ordered
extents, we don't need %async_submit_draining, %async_delalloc_pages
and %nr_async_submits to tell whether the IO has actually started.

Moreover, all the flushers in use are followed by functions that wait
for ordered extents to complete, so %nr_async_submits, which tracks
whether bio's async submit has made progress, doesn't really make
sense.

However, %async_delalloc_pages is still required by shrink_delalloc()
as that function doesn't flush twice in the normal case (just issues a
writeback with WB_REASON_FS_FREE_SPACE).
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: do not make defrag wait on async_delalloc_pages · 80e03a2c

Committed by Liu Bo on Sep 07, 2017

By setting compression for a defrag task, the task will start IO at
the end of defrag.

After the combo of filemap_flush(), we've already made sure that
dirty pages have made progress via async compress thread because the
second filemap_flush() will wait for page lock, which won't be
unlocked until those pages have been marked as writeback and ordered
extents have been queued.

And this is for per-inode defrag, it's not helpful to wait on a global
%async_delalloc_pages and %nr_async_submits from fs_info.

Although waiting on %nr_async_submits means that all bios are
submitted down to per-device schedule IO lists, it doesn't wait for
their completions, thus users still need to do fsync/sync to make sure
the data is on disk.  While with this change, it makes sure that pages
are marked with writeback bits and will be submitted asynchronously
shortly, therefore, the behavior of defrag option '-c' remains unchanged.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: remove nr_async_bios · f851689b

Committed by Liu Bo on Sep 07, 2017

This was intended to congest higher layers to not send bios, but as

1) the congested bit has been taken by writeback

Async bios come from buffered writes and DIO writes.

For DIO writes, we want to submit them ASAP, while for buffered writes,
writeback uses balance_dirty_pages() to throttle how many dirty pages we
can have.

2) and no one is waiting for %nr_async_bios down to zero,

Historically, it was introduced along with changes which let
checksumming workload spread across different cpus.  And at that time,
pdflush was used instead of per-bdi flushing, perhaps pdflush did not
have the necessary information for writeback to do throttling.

We can safely remove them now.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
[ additional explanation from mails, removed unused variable 'limit' ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: tree-checker: Enhance output for check_extent_data_item · 8806d718

Committed by Qu Wenruo on Oct 09, 2017

Output the invalid member name and its bad value, along with its
expected value range or alignment.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: tree-checker: Enhance output for check_csum_item · d508c5f0

Committed by Qu Wenruo on Oct 09, 2017

Output the bad value and expected good value (or its alignment).
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
[ unindent long strings ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: tree-checker: Enhance output for btrfs_check_leaf · 478d01b3

Committed by Qu Wenruo on Oct 09, 2017

Enhance the output to print:
1) the reason
2) the bad value, if the reason is not sufficient
3) the good value (range)
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
[ wording, unindented long strings ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: tree-checker: Enhance btrfs_check_node output · bba4f298

Committed by Qu Wenruo on Oct 09, 2017

Use inline function to replace macro since we don't need
stringification.
(Macro still exists until all callers get updated)

And add more info about the error, and replace EIO with EUCLEAN.

For nr_items error, report if it's too large or too small, and output
the valid value range.

For node block pointer, added a new alignment checker.

For key order, also output the next key to make the problem more
obvious.
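The alignment check for a node block pointer can be sketched as follows. This is an illustration under assumptions: `toy_check_block_ptr` is a hypothetical name, the sector size is assumed to be a power of two, and the fallback definition of EUCLEAN uses the generic Linux value for platforms whose errno.h lacks it.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* generic Linux value; fallback for other platforms */
#endif

/*
 * Verify that a block pointer is aligned to the sector size.
 * Returns 0 if aligned, -EUCLEAN if the on-disk value looks corrupted.
 * Hypothetical helper for illustration only.
 */
static int toy_check_block_ptr(uint64_t bytenr, uint32_t sectorsize)
{
	/* With a power-of-two sectorsize, the low bits must all be zero. */
	if (bytenr & ((uint64_t)sectorsize - 1))
		return -EUCLEAN;
	return 0;
}
```

Returning -EUCLEAN rather than -EIO tells the caller that the metadata itself is inconsistent, as opposed to the device having failed to deliver it.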
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
[ wording adjustments, unindented long strings ]
Signed-off-by: David Sterba <dsterba@suse.com>
