Commits · 47f08b96993831f4c51ed7cb07a86a97d4138d3f · openanolis / cloud-kernel

16 Aug, 2017 7 commits

btrfs: Enhance message when a device is missing during mount · c5502451

Authored by Qu Wenruo on Mar 09, 2017

For a missing device, btrfs will just refuse to mount with an almost
meaningless kernel message like:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed

This patch will print a new message about the missing device:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

c5502451

btrfs: Cleanup num_tolerated_disk_barrier_failures · bc3cce23

Authored by Qu Wenruo on Mar 09, 2017

As we now use a per-chunk degradable check, the global
num_tolerated_disk_barrier_failures is of no use.

We can now remove it.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

bc3cce23

btrfs: Introduce a function to check if all chunks a OK for degraded rw mount · 21634a19

Authored by Qu Wenruo on Mar 09, 2017

Introduce a new function, btrfs_check_rw_degradable(), to check if all
chunks in btrfs are OK for degraded rw mount.

It provides the new basis for accurate btrfs mount/remount and even
runtime degraded mount checks, replacing the old one-size-fits-all method.

Btrfs currently uses num_tolerated_disk_barrier_failures to do a global
check for tolerated missing devices.

Although the one-size-fits-all solution is quite safe, it's too strict
if data and metadata have different duplication levels.

For example, if one uses Single data and RAID1 metadata on 2 disks, it
means any missing device will make the fs unable to be mounted
degraded.

But in fact, sometimes all single chunks may be on the remaining
device, and in that case we should allow a rw degraded mount.

Such a case can be easily reproduced using the following script:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb -o degraded,rw

If using btrfs-debug-tree to check /dev/sdb, one should find that the
data chunk is only on sdb, so in fact it should allow degraded mount.

This patchset introduces a new per-chunk degradable check for
btrfs, allowing the above case to succeed, and it's quite small anyway.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ copied text from cover letter with more details about the problem being
  solved ]
Signed-off-by: David Sterba <dsterba@suse.com>

21634a19
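
The per-chunk idea described in 21634a19 can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct chunk_model, devid_is_missing() and check_rw_degradable() are hypothetical names invented here, and the real btrfs_check_rw_degradable() works on kernel chunk maps rather than on plain arrays.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a chunk; not the kernel's structures. */
struct chunk_model {
	int num_stripes;            /* number of device stripes in this chunk */
	const int *stripe_devid;    /* device id backing each stripe */
	int max_tolerated_failures; /* e.g. 0 for single, 1 for RAID1 */
};

/* Return true if devid appears in the list of missing device ids. */
static bool devid_is_missing(int devid, const int *missing, size_t nr_missing)
{
	for (size_t i = 0; i < nr_missing; i++)
		if (missing[i] == devid)
			return true;
	return false;
}

/*
 * Per-chunk check: a degraded rw mount is allowed only if every chunk,
 * taken on its own, tolerates the devices it has lost.
 */
static bool check_rw_degradable(const struct chunk_model *chunks,
				size_t nr_chunks,
				const int *missing, size_t nr_missing)
{
	for (size_t i = 0; i < nr_chunks; i++) {
		int lost = 0;

		for (int s = 0; s < chunks[i].num_stripes; s++)
			if (devid_is_missing(chunks[i].stripe_devid[s],
					     missing, nr_missing))
				lost++;
		if (lost > chunks[i].max_tolerated_failures)
			return false;
	}
	return true;
}
```

With a single data chunk on device 1 and a RAID1 metadata chunk on devices 1 and 2, this model allows the mount when device 2 is missing and refuses it when device 1 is missing, which matches the scenario in the commit message.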

btrfs: Prevent possible ERR_PTR() dereference · 69f03f13

Authored by Nikolay Borisov on Jul 11, 2017

In btrfs_full_stripe_len/btrfs_is_parity_mirror we have similar code which
gets the chunk map for a particular range via get_chunk_map. However,
get_chunk_map can return an ERR_PTR value and while the 2 callers do catch
this with a WARN_ON they then proceed to indiscriminately dereference the
extent map. This of course leads to a crash. Fix the offenders by making the
dereference conditional on IS_ERR.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

69f03f13
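
The pattern fixed in 69f03f13 can be sketched in userspace as follows. The ERR_PTR(), PTR_ERR() and IS_ERR() helpers below are imitations of the kernel ones, while struct map_model, lookup_map() and the fallback value of 0 are assumptions made for illustration and are not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a chunk map. */
struct map_model {
	unsigned long stripe_len;
};

/* Hypothetical lookup that fails with -EINVAL when no map is supplied. */
static struct map_model *lookup_map(struct map_model *m)
{
	return m ? m : ERR_PTR(-EINVAL);
}

/* Dereference only when the lookup succeeded; otherwise return 0. */
static unsigned long full_stripe_len(struct map_model *m)
{
	struct map_model *map = lookup_map(m);
	unsigned long len = 0;

	if (!IS_ERR(map))
		len = map->stripe_len;
	return len;
}
```

The point of the sketch is that a warning alone does not stop execution, so the dereference itself has to sit behind the IS_ERR() test.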

btrfs: Be explicit about usage of min() · f148ef4d

Authored by Nikolay Borisov on Jun 27, 2017

__btrfs_alloc_chunk contains code which boils down to:

    ndevs = min(ndevs, devs_max)

It's conditional upon devs_max not being 0. However, it cannot really be 0
since it's always set to either BTRFS_MAX_DEVS_SYS_CHUNK or
BTRFS_MAX_DEVS(fs_info->chunk_root). So eliminate the condition check and use
min explicitly. This has no functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f148ef4d
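
The equivalence claimed in f148ef4d can be checked with a short sketch. The exact shape of the old conditional is an assumption here (the commit message only says it was guarded by devs_max being non-zero), and both function names are hypothetical.

```c
/* Assumed shape of the old code: clamp only when devs_max is non-zero. */
static int ndevs_old(int ndevs, int devs_max)
{
	if (devs_max && ndevs > devs_max)
		ndevs = devs_max;
	return ndevs;
}

/* New form: an unconditional min(). */
static int ndevs_new(int ndevs, int devs_max)
{
	return ndevs < devs_max ? ndevs : devs_max;
}
```

The two forms agree for every non-zero devs_max and differ only when devs_max is 0, which is why the argument that devs_max is never 0 is what makes the change free of functional impact.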

btrfs: Use explicit round_down call rather than open-coding it · e5600fd6

Authored by Nikolay Borisov on Jun 27, 2017

No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e5600fd6

btrfs: convert while loop to list_for_each_entry · ebcc9301

Authored by Nikolay Borisov on Jun 27, 2017

No functional changes, just make the loop a bit more readable.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ebcc9301

24 Jul, 2017 1 commit

btrfs: round down size diff when shrinking/growing device · 0e4324a4

Authored by Nikolay Borisov on Jul 21, 2017

Further testing showed that the fix introduced in 7dfb8be1 ("btrfs:
Round down values which are written for total_bytes_size") was
insufficient and it could still lead to discrepancies between the
total_bytes in the super block and the device total bytes. So this patch
also ensures that the difference between old/new sizes when
shrinking/growing is rounded down. This ensures that we won't be
subtracting/adding non-sectorsize multiples to the superblock/device
total sizes.

Fixes: 7dfb8be1 ("btrfs: Round down values which are written for total_bytes_size")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

0e4324a4
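
A minimal sketch of the rounding described in 0e4324a4, assuming a power-of-two sectorsize. apply_size_change() is a hypothetical helper written for this illustration; it is not the kernel function that the patch modifies.

```c
#include <stdint.h>

/*
 * Apply a device resize to a running total, rounding the size difference
 * down to a multiple of sectorsize (assumed to be a power of two) so that
 * an aligned total stays aligned.
 */
static uint64_t apply_size_change(uint64_t total, uint64_t old_size,
				  uint64_t new_size, uint64_t sectorsize)
{
	uint64_t mask = ~(sectorsize - 1);

	if (new_size >= old_size)
		return total + ((new_size - old_size) & mask);
	return total - ((old_size - new_size) & mask);
}
```

Growing by 5000 bytes with a 4096-byte sectorsize adds only 4096 to the total in this model, so the total remains a multiple of the sector size.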

22 Jun, 2017 1 commit

btrfs: preallocate device flush bio · e0ae9994

Authored by David Sterba on Jun 06, 2017

For devices that support flushing, we allocate a bio, submit, wait for
it and then free it. The bio allocation does not fail so ENOMEM is not a
problem but we still may unnecessarily stress the allocation subsystem.

Instead, we can allocate the bio at the same time we allocate the device
and reuse it each time we need to flush the barriers. The bio is reset
before each use. Reference counting is simplified to just device
allocation (get) and freeing (put).

The bio used to be submitted through the integrity checker which will
find out that bio has no data attached and call submit_bio.

Status of the bio in flight needs to be tracked separately in case the
device caches get switched off between write and wait.
Signed-off-by: David Sterba <dsterba@suse.com>

e0ae9994

20 Jun, 2017 7 commits

btrfs: Round down values which are written for total_bytes_size · 7dfb8be1

Authored by Nikolay Borisov on Jun 16, 2017

We got an internal report about a file system not wanting to mount
following 99e3ecfc ("Btrfs: add more validation checks for
superblock").

BTRFS error (device sdb1): super_total_bytes 1000203816960 mismatch with
fs_devices total_rw_bytes 1000203820544

Subtracting the numbers we get a difference of less than 4KiB. Upon
closer inspection it became apparent that mkfs actually rounds down the
size of the device to a multiple of sector size. However, the same
cannot be said for various functions which modify the total size and are
called from btrfs_balance as well as when adding a new device. So this
patch ensures that values being saved into on-disk data structures are
always rounded down to a multiple of sectorsize.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

7dfb8be1
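
The arithmetic behind the error message quoted in 7dfb8be1 can be reproduced with a short sketch. It assumes a 4096-byte sectorsize, which the commit message does not state explicitly, and round_down_u64() is a hypothetical userspace helper that mirrors the idea of the kernel's round_down() macro.

```c
#include <stdint.h>

/*
 * Round x down to a multiple of align. The mask trick is only valid
 * when align is a power of two.
 */
static uint64_t round_down_u64(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}
```

Under that assumption, rounding the reported total_rw_bytes value 1000203820544 down to a 4096-byte boundary gives 1000203816960, which is exactly the reported super_total_bytes. The two values differ by 3584 bytes.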

btrfs: obsolete and remove mount option alloc_start · 0d0c71b3

Authored by David Sterba on Jun 15, 2017

The mount option alloc_start was used in the past for debugging and
stressing the chunk allocator. Not meant to be used by users, so we're
not breaking anybody's setup.

There was some added complexity handling changes of the value and when
it was not the same as the default. Such code has likely been untested and I
think it's better to remove it.

This patch kills all use of alloc_start, and by doing that also fixes
a bug when alloc_start is set, potentially called from statfs:

in btrfs_calc_avail_data_space, traversing the list in RCU, the RCU
protection is temporarily dropped so btrfs_account_dev_extents_size can
be called and then RCU is locked again! Doing that inside
list_for_each_entry_rcu is just asking for trouble, but unlikely to be
observed in practice.
Signed-off-by: David Sterba <dsterba@suse.com>

0d0c71b3

btrfs: use GFP_KERNEL in btrfs_init_dev_replace_tgtdev · 6165572c

Authored by David Sterba on Jun 15, 2017

The function is called from ioctl context and we don't hold any locks
that take part in writeback. Right now it's only fs_info::volume_mutex.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

6165572c

btrfs: sink gfp parameter to btrfs_bio_clone · 8b6c1d56

Authored by David Sterba on Jun 02, 2017

All callers pass GFP_NOFS.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

8b6c1d56

btrfs: btrfs_bio_clone never fails, skip error handling · 3aa8e074

Authored by David Sterba on Jun 02, 2017

Update direct callers of btrfs_bio_clone that do error handling, which we
can now remove.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

3aa8e074

btrfs: cleanup root usage by btrfs_get_alloc_profile · 1b86826d

Authored by Jeff Mahoney on May 17, 2017

There are two places where we don't already know what kind of alloc
profile we need before calling btrfs_get_alloc_profile, but we need
access to a root everywhere we call it.

This patch adds helpers for btrfs_{data,metadata,system}_alloc_profile()
and relegates btrfs_system_alloc_profile to a static for use in those
two cases.  The next patch will eliminate one of those.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

1b86826d

btrfs: Convert fs_info->free_chunk_space to atomic64_t · a5ed45f8

Authored by Nikolay Borisov on May 11, 2017

The ->free_chunk_space variable is used to track the unallocated space
and access to it is protected by a spinlock, which is not used for
anything else.  Make the code a bit more self-explanatory by switching the
variable to an atomic64_t type and kill the spinlock.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[ not a performance critical code, use of atomic type is ok ]
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

a5ed45f8
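
The conversion in a5ed45f8 can be pictured with C11 atomics standing in for the kernel's atomic64_t. This is a userspace sketch only: struct fs_model and the free_space_*() helpers are hypothetical names, and the kernel uses its own atomic64_add()/atomic64_sub()/atomic64_read() family rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-filesystem state. */
struct fs_model {
	_Atomic uint64_t free_chunk_space; /* no dedicated lock needed */
};

/* Account newly unallocated space. */
static void free_space_add(struct fs_model *fs, uint64_t bytes)
{
	atomic_fetch_add(&fs->free_chunk_space, bytes);
}

/* Account space that has just been allocated. */
static void free_space_sub(struct fs_model *fs, uint64_t bytes)
{
	atomic_fetch_sub(&fs->free_chunk_space, bytes);
}

/* Read the current value without taking any lock. */
static uint64_t free_space_read(struct fs_model *fs)
{
	return atomic_load(&fs->free_chunk_space);
}
```

Because each update is a single read-modify-write on one variable, a lock that guards nothing else adds no protection that the atomic type does not already provide.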

09 Jun, 2017 1 commit

block: switch bios to blk_status_t · 4e4cbee9

Authored by Christoph Hellwig on Jun 03, 2017

Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep around at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

4e4cbee9

05 May, 2017 1 commit

btrfs: fix the gfp_mask for the reada_zones radix tree · 9bcaaea7

Authored by Chris Mason on May 04, 2017

Commits cc8385b5 and 7ef70b4d added preallocation for the
reada radix trees and also switched them over to GFP_KERNEL for the
default gfp mask.

Since we're doing radix tree insertions under spinlocks, we need
to make sure the mask doesn't allow sleeping.  This fix keeps
the radix preallocation but switches back to the original gfp_mask.
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

9bcaaea7

18 Apr, 2017 15 commits

btrfs: use q which is already obtained from bdev_get_queue · e884f4f0

Authored by Anand Jain on Apr 04, 2017

We have already assigned q from bdev_get_queue() so use it.
And rearrange the code for better readability.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e884f4f0

Btrfs: switch to div64_u64 if with a u64 divisor · 42c61ab6

Authored by Liu Bo on Apr 03, 2017

This fixes code pieces where we use div_u64 when passing a u64 divisor.

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

42c61ab6
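
Why the helper choice in 42c61ab6 matters can be shown with two userspace models that copy only the parameter widths of the kernel helpers: div_u64() takes a 32-bit divisor, while div64_u64() takes a 64-bit one. The function names with the _model suffix are hypothetical, and the bodies are plain C division rather than the kernel implementations.

```c
#include <stdint.h>

/* Same parameter widths as div_u64(): 64-bit dividend, 32-bit divisor. */
static uint64_t div_u64_model(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Same parameter widths as div64_u64(): both operands are 64-bit. */
static uint64_t div64_u64_model(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}
```

If a 64-bit divisor such as 2^32 + 2 is passed to the 32-bit variant, it is truncated to 2, so dividing 2^40 yields 2^39 instead of the correct 255.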

btrfs: drop redundant parameters from btrfs_map_sblock · 825ad4c9

Authored by David Sterba on Mar 28, 2017

All callers pass 0 for mirror_num and 1 for need_raid_map.
Signed-off-by: David Sterba <dsterba@suse.com>

825ad4c9

btrfs: track exclusive filesystem operation in flags · 171938e5

Authored by David Sterba on Mar 28, 2017

There are several operations, usually started from ioctls, that cannot
run concurrently. The status is tracked in
mutually_exclusive_operation_running as an atomic_t. We can easily track
the status as one of the per-filesystem flag bits with the same
synchronization guarantees.

The conversion replaces:

* atomic_xchg(..., 1)    ->   test_and_set_bit(FLAG, ...)
* atomic_set(..., 0)     ->   clear_bit(FLAG, ...)
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

171938e5
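
The replacement listed in 171938e5 can be modelled in userspace with C11 atomics, where atomic_fetch_or() plays the role of test_and_set_bit() and atomic_fetch_and() the role of clear_bit(). The bit number FS_EXCL_OP_BIT, struct fs_flags_model and both helper names are assumptions made for this sketch and do not come from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FS_EXCL_OP_BIT 0 /* hypothetical bit number in the flags word */

/* Hypothetical stand-in for the per-filesystem flags. */
struct fs_flags_model {
	atomic_ulong flags;
};

/*
 * Atomically set the bit and report whether it was already set,
 * i.e. whether another exclusive operation is running.
 */
static bool excl_op_test_and_set(struct fs_flags_model *fs)
{
	unsigned long mask = 1UL << FS_EXCL_OP_BIT;

	return (atomic_fetch_or(&fs->flags, mask) & mask) != 0;
}

/* Clear the bit when the exclusive operation finishes. */
static void excl_op_clear(struct fs_flags_model *fs)
{
	atomic_fetch_and(&fs->flags, ~(1UL << FS_EXCL_OP_BIT));
}
```

The first caller sees false and proceeds, any concurrent caller sees true and backs off, and clearing the bit re-opens the gate, which is the behaviour the atomic_xchg()/atomic_set() pair provided before.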

btrfs: preallocate radix tree node for readahead · cc8385b5

Authored by David Sterba on Mar 02, 2017

We can preallocate the node so insertion does not have to do that under
the lock. The GFP flags for the per-device radix tree are initialized to
 GFP_NOFS & ~__GFP_DIRECT_RECLAIM
but we can use GFP_KERNEL, same as an allocation above anyway, but also
because readahead is optional and not on any critical writeout path.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

cc8385b5

Btrfs: convert BUG_ON to WARN_ON · 539b50d2

Authored by Liu Bo on Mar 14, 2017

These two BUG_ON()s would never be true, ensured by callers' logic.
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

539b50d2

Btrfs: helper for ops that requires full stripe · 2b19a1fe

Authored by Liu Bo on Mar 14, 2017

This adds a helper to show directly whether ops require full stripe.
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2b19a1fe

Btrfs: do not add extra mirror when dev_replace target dev is not available · 6fad823f

Authored by Liu Bo on Mar 14, 2017

With this, we can avoid allocating memory for dev replace copies if the
target dev is not available.
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

6fad823f

Btrfs: handle operations for device replace separately · 73c0f228

Authored by Liu Bo on Mar 14, 2017

Since this part is mostly independent, this moves it to a separate
function.
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

73c0f228

Btrfs: introduce a function to get extra mirror from replace · 5ab56090

Authored by Liu Bo on Mar 14, 2017

As the part of getting extra mirror in __btrfs_map_block is
independent, this puts it into a separate function.
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

5ab56090

Btrfs: separate DISCARD from __btrfs_map_block · 0b3d4cd3

Authored by Liu Bo on Mar 14, 2017

Since DISCARD is not as important as an operation like write, we don't
copy it to target device during replace, and it makes __btrfs_map_block
less complex.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

0b3d4cd3

Btrfs: create a helper for getting chunk map · 592d92ee

Authored by Liu Bo on Mar 14, 2017

We have similar code here and there, this merges them into a helper.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

592d92ee

btrfs: convert extent_map.refs from atomic_t to refcount_t · 490b54d6

Authored by Elena Reshetova on Mar 03, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

490b54d6
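
The overflow protection mentioned in 490b54d6 can be illustrated with a simplified saturating counter. This is a model of the idea only and not the kernel's refcount_t implementation: struct refcount_model and both helpers are hypothetical, and underflow from zero is deliberately left unhandled to keep the sketch short.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference counter. */
struct refcount_model {
	atomic_uint refs;
};

/* Increment unless the counter is saturated; a saturated counter stays pinned. */
static void refcount_model_inc(struct refcount_model *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX)
			return; /* leak the object rather than wrap to zero */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Decrement; return true when the count drops to zero and the object may be freed. */
static bool refcount_model_dec_and_test(struct refcount_model *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX)
			return false; /* a saturated object is never freed */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

A plain atomic counter would wrap from its maximum back to zero and allow a premature free, whereas this model trades that use-after-free for a memory leak.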

btrfs: convert btrfs_bio.refs from atomic_t to refcount_t · 140475ae

Authored by Elena Reshetova on Mar 03, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

140475ae

btrfs: fix a bogus warning when converting only data or metadata · 14506127

Authored by Adam Borowski on Mar 07, 2017

If your filesystem has, e.g., data:raid0 metadata:raid1, and you run "btrfs
balance -dconvert=raid1", the meta.target field will be uninitialized.
That's otherwise ok, as it's unused except for this warning.

Thus, let's use the existing set of raid levels for the comparison.

As a side effect, non-convert balances will now nag about data>metadata.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

14506127

12 Apr, 2017 1 commit

Btrfs: fix potential use-after-free for cloned bio · a967efb3

Authored by Liu Bo on Apr 10, 2017

KASAN reports that there is a use-after-free case of bio in btrfs_map_bio.

If we need to submit IOs to several disks at a time, the original bio
would get cloned and mapped to the destination disk, but we really should
use the original bio instead of a cloned bio to do the sanity check
because cloned bios are likely to be freed by their endio.
Reported-by: Diego <diegocg@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

a967efb3

28 Feb, 2017 2 commits

btrfs: handle allocation error in update_dev_stat_item · fa252992

Authored by David Sterba on Feb 15, 2017

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

fa252992

btrfs: constify device path passed to relevant helpers · da353f6b

Authored by David Sterba on Feb 14, 2017

Signed-off-by: David Sterba <dsterba@suse.com>

da353f6b

17 Feb, 2017 2 commits

btrfs: remove unused parameter from init_first_rw_device · e4a4dce7

Authored by David Sterba on Feb 10, 2017

The 'device' used to be added in that function, but now it's done by the
caller.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e4a4dce7

btrfs: remove unused parameter from __btrfs_alloc_chunk · 72b468c8

Authored by David Sterba on Feb 10, 2017

We grab fs_info from other parameters.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

72b468c8

02 Feb, 2017 1 commit

block: Get rid of blk_get_backing_dev_info() · efa7c9f9

Authored by Jan Kara on Feb 02, 2017

blk_get_backing_dev_info() is now a simple dereference. Remove that
function and simplify some code around that.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

efa7c9f9

06 Dec, 2016 1 commit

btrfs: opencode chunk locking, remove helpers · 34441361

Authored by David Sterba on Oct 04, 2016

The helpers are trivial and we don't use them consistently.
Signed-off-by: David Sterba <dsterba@suse.com>

34441361

openanolis / cloud-kernel synced successfully more than 1 year ago
