Commits · dccdb07bc996e9c8de80d06813163ca08288bf73 · openanolis / cloud-kernel

29 May, 2018 4 commits

btrfs: kill btrfs_fs_info::volume_mutex · dccdb07b

Authored by David Sterba on Mar 21, 2018

Mutual exclusion of device add/rm and balance was done by the volume
mutex up to version 3.7. The commit 5ac00add ("Btrfs: disallow
mutually exclusive admin operations from user mode") added a bit that
essentially tracked the same information.

The status bit has an advantage over a mutex in that it can be set without
restrictions of function context, so it started to be used in the
mount-time resuming of balance or device replace.

But we don't really need to track the same information in two ways.

1) After the previous cleanups, the main ioctl handlers for
   add/del/resize copy the EXCL_OP bit next to the volume mutex, here
   it's clearly safe.

2) Resuming balance during mount or after rw remount will set only the
   EXCL_OP bit and the volume_mutex is held in the kernel thread that
   calls btrfs_balance.

3) Resuming device replace during mount or after rw remount is done
   after balance and is excluded by the EXCL_OP bit. It does not take
   the volume_mutex at all and completely relies on the EXCL_OP bit.

4) The resuming of balance and dev-replace cannot happen at the same time
   as the ioctls cannot be started in parallel. Nevertheless, a crafted
   image could trigger that and a warning is printed.

5) Balance is normally excluded by EXCL_OP and also uses its own mutex to
   protect against concurrent access to its status data. There's some
   trickery to maintain the right lock nesting in case we need to
   reexamine the status in btrfs_ioctl_balance. The volume_mutex is
   removed and the unlock/lock sequence is left in place as we might
   expect other waiters to proceed.

6) Similar to 5, the unlock/lock sequence is kept in
   btrfs_cancel_balance to allow waiters to continue.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
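
The idea of one status bit that excludes all administrative operations can be sketched as a test-and-set flag. This is a user-space model only: `ExclusiveOpFlag` and `run_exclusive` are hypothetical names, and a Python lock stands in for the atomic bit operation the kernel would use on its flags word.

```python
import threading


class ExclusiveOpFlag:
    """User-space stand-in for a single 'exclusive operation running' bit."""

    def __init__(self):
        self._guard = threading.Lock()  # makes test-and-set atomic in this model
        self._running = False

    def try_start(self):
        """Atomically set the flag; return False if it was already set."""
        with self._guard:
            if self._running:
                return False
            self._running = True
            return True

    def finish(self):
        """Clear the flag so another exclusive operation may start."""
        with self._guard:
            self._running = False


def run_exclusive(flag, operation):
    """Run operation() only if no other exclusive operation is in progress."""
    if not flag.try_start():
        return "busy"
    try:
        return operation()
    finally:
        flag.finish()
```

Because the flag is plain state rather than a lock that is held, it can be set in one context (for example at mount time) and cleared later from another, which is the property the commit message relies on.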

btrfs: Remove btrfs_wait_and_free_delalloc_work · 40012f96

Authored by Nikolay Borisov on Apr 19, 2018

This function is called from only 1 place and is effectively a wrapper
over wait_completion/kfree. It doesn't really bring any value having
those two calls in a separate function. Just open code it and remove it.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Factor out the main deletion process from btrfs_ioctl_snap_destroy() · f60a2364

Authored by Misono Tomohiro on Apr 18, 2018

Factor out the second half of btrfs_ioctl_snap_destroy() as
btrfs_delete_subvolume(), which performs some subvolume specific checks
before deletion:

1. send is not in progress
2. the subvolume is not the default subvolume
3. the subvolume does not contain other subvolumes

and actual deletion process. btrfs_delete_subvolume() requires
inode_lock for both @dir and inode of @dentry. The remaining part of
btrfs_ioctl_snap_destroy() is mainly permission checks.

Note that call of d_delete() is not included in btrfs_delete_subvolume()
as this function will also be used by btrfs_rmdir() to delete an empty
subvolume and in that case d_delete() is called in VFS layer.

As a result, btrfs_unlink_subvol() and may_destroy_subvol()
become static functions. No functional changes.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor comment updates ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Move may_destroy_subvol() from ioctl.c to inode.c · ec42f167

Authored by Misono Tomohiro on Apr 18, 2018

This is preparatory work to refactor btrfs_ioctl_snap_destroy()
and to allow rmdir(2) to delete an empty subvolume.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor update of the function comment ]
Signed-off-by: David Sterba <dsterba@suse.com>

28 May, 2018 1 commit

btrfs: rename btrfs_get_block_group_info and make it static · c065f5b1

Authored by Su Yue on Apr 02, 2018

The function btrfs_get_block_group_info() was introduced by the
commit 5af3e8cc ("Btrfs: make filesystem read-only when submitting
barrier fails") which used it in disk-io.c.

However, the function is only called in ioctl.c now.
Its parameter type btrfs_ioctl_space_info* is only for ioctl.

So, make it static and rename it back to its original name,
get_block_group_info.

No functional change.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

17 May, 2018 1 commit

btrfs: Split btrfs_del_delalloc_inode into 2 functions · 2b877331

Authored by Nikolay Borisov on Apr 27, 2018

This is in preparation for fixing delalloc inode leakage on transaction
abort. Also export the new function.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

18 Apr, 2018 2 commits

btrfs: qgroup: Use independent and accurate per inode qgroup rsv · ff6bc37e

Authored by Qu Wenruo on Dec 21, 2017

Unlike reservation calculation used in inode rsv for metadata, qgroup
doesn't really need to care about things like csum size or extent usage
for the whole tree COW.

Qgroups care more about the net change in extent usage.
That is to say, if we're going to insert one file extent, it will mostly
find its place in a COWed tree block, leaving no change in extent usage.
Or it may cause a leaf split, resulting in one new net extent and
increasing the qgroup number by nodesize.
Or, in an even rarer case, it may increase the tree level, increasing the
qgroup number by 2 * nodesize.

So here instead of using the complicated calculation for extent
allocator, which cares more about accuracy and no error, qgroup doesn't
need that over-estimated reservation.

This patch will maintain 2 new members in btrfs_block_rsv structure for
qgroup, using much smaller calculation for qgroup rsv, reducing false
EDQUOT.
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
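
The three cases described above (no change, leaf split, tree level increase) can be written down as a tiny calculation. This is only an illustrative sketch; `qgroup_net_reservation` and its parameters are hypothetical names, not kernel functions.

```python
def qgroup_net_reservation(nodesize, leaf_split=False, level_increase=False):
    """Net extent usage change, in bytes, for inserting one file extent.

    Mirrors the cases in the commit message:
      - item fits in an existing COWed tree block: no net change
      - leaf split: one new tree block (nodesize)
      - tree level increase: two new tree blocks (2 * nodesize)
    """
    if level_increase:
        return 2 * nodesize
    if leaf_split:
        return nodesize
    return 0
```

With a 16 KiB nodesize the worst case is therefore 32 KiB, which is far smaller than an estimate that assumes a whole-path COW.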

btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT · a514d638

Authored by Qu Wenruo on Dec 22, 2017

Unlike the previous method, which tried to commit the transaction inside
qgroup_reserve(), this time we try to commit the transaction using
fs_info->transaction_kthread, which avoids nested transactions and any
need to worry about locking context.

Since it's an asynchronous function call and we won't wait for the
transaction commit, unlike the previous method, we must call it before we
hit the qgroup limit.

So this patch uses the ratio and size of the qgroup meta_pertrans
reservation as an indicator to check whether we should trigger a
transaction commit. (meta_prealloc won't be cleaned in transaction commit,
it's useless anyway)
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
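
A ratio-and-size trigger of the kind described above might look like the following sketch. The function name, its parameters, and the default values for `ratio` and `size_cap` are illustrative assumptions, not the values or logic taken from the patch.

```python
def should_commit_early(pertrans_reserved, limit, used,
                        ratio=32, size_cap=32 * 1024 * 1024):
    """Decide whether to ask for an asynchronous transaction commit.

    pertrans_reserved: bytes currently held as per-transaction meta rsv
    limit, used:       qgroup limit and current usage, in bytes
    ratio, size_cap:   hypothetical tuning knobs (assumed values)
    """
    remaining = max(limit - used, 0)
    # Trigger once the per-trans reservation exceeds a fraction of the
    # remaining space, capped at an absolute size.
    threshold = min(remaining // ratio, size_cap)
    return pertrans_reserved > threshold
```

Because the commit is asynchronous, the threshold has to sit well below the limit itself; the ratio and the cap are the two ways of expressing that margin.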

12 Apr, 2018 1 commit

btrfs: replace GPL boilerplate by SPDX -- headers · 9888c340

Authored by David Sterba on Apr 03, 2018

Remove GPL boilerplate text (long, short, one-line) and keep the rest,
i.e. personal, company or original source copyright statements. Add the
SPDX header.

Unify the include protection macros to match the file names.
Signed-off-by: David Sterba <dsterba@suse.com>

31 Mar, 2018 12 commits

btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space · 8287475a

Authored by Qu Wenruo on Dec 12, 2017

For the quota disabled->enabled case, it's possible that at reservation
time quota was not enabled so no bytes were really reserved, while at
release time, quota was enabled so we will try to release some bytes we
didn't really own.

Such a situation can cause metadata reservation underflow for both types,
although it is less likely for the per-trans type since quota enable will
commit the transaction.

To address this, record qgroup meta reserved bytes into
root::qgroup_meta_rsv_pertrans and ::prealloc.
So at release time we won't free any bytes we didn't reserve.

For DATA, it's already handled by io_tree, so nothing needs to be done
there.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
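
The underflow and its fix can be modelled in a few lines: remember how much was really reserved and never release more than that. `RootMetaRsv` is a hypothetical user-space stand-in for the per-root counters named in the message, not kernel code.

```python
class RootMetaRsv:
    """Tracks bytes that were actually reserved for one reservation type."""

    def __init__(self):
        self.recorded = 0

    def reserve(self, num_bytes, quota_enabled):
        """Record a reservation only if quota was enabled at reserve time."""
        if not quota_enabled:
            return 0  # nothing was really reserved
        self.recorded += num_bytes
        return num_bytes

    def release(self, num_bytes):
        """Release at most what was recorded, so the counter cannot underflow."""
        freed = min(num_bytes, self.recorded)
        self.recorded -= freed
        return freed
```

A release that arrives after quota was enabled, for bytes reserved while it was disabled, then frees nothing instead of driving the counter negative.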

btrfs: qgroup: Use separate meta reservation type for delalloc · 43b18595

Authored by Qu Wenruo on Dec 12, 2017

Before this patch, btrfs qgroup is mixing per-transaction meta rsv with
preallocated meta rsv, making it quite easy to underflow qgroup meta
reservation.

Since we have the new qgroup meta rsv types, apply it to delalloc
reservation.

Now for delalloc, most of its reserved space will use META_PREALLOC qgroup
rsv type.

And for callers reducing outstanding extent like btrfs_finish_ordered_io(),
they will convert corresponding META_PREALLOC reservation to
META_PERTRANS.

This is mainly due to the fact that current qgroup numbers will only be
updated in btrfs_commit_transaction(), that's to say if we don't keep
such placeholder reservation, we can exceed qgroup limitation.

And for callers freeing outstanding extent in error handler, we will
just free META_PREALLOC bytes.

This behavior requires callers of btrfs_qgroup_release_meta() or
btrfs_qgroup_convert_meta() to be aware of which type they are using.
So in this patch, btrfs_delalloc_release_metadata() and its callers get
an extra parameter to inform qgroup to do the correct meta convert/release.

The good news is, even if we use the wrong type (convert or free), it won't
cause an obvious bug, as the prealloc type is always in good shape, and the
type only affects whether per-trans meta is increased or not.

So in the worst case, the metadata limit can sometimes be exceeded
(no convert at all) or the metadata limit is reached too soon
(no free at all).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
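
The convert-versus-free distinction can be sketched with two counters. This is a simplified model under stated assumptions: `QgroupMetaRsv` and its method names are hypothetical, and real accounting involves more state than two integers.

```python
class QgroupMetaRsv:
    """Two-bucket model of qgroup metadata reservation."""

    def __init__(self):
        self.prealloc = 0  # reserved ahead of time, e.g. for delalloc
        self.pertrans = 0  # placeholder kept until the transaction commits

    def reserve_prealloc(self, num_bytes):
        self.prealloc += num_bytes

    def convert(self, num_bytes):
        """Success path: move bytes from prealloc to per-trans."""
        moved = min(num_bytes, self.prealloc)
        self.prealloc -= moved
        self.pertrans += moved
        return moved

    def free_prealloc(self, num_bytes):
        """Error path: drop prealloc bytes without touching per-trans."""
        freed = min(num_bytes, self.prealloc)
        self.prealloc -= freed
        return freed
```

In this model, choosing the wrong call never corrupts `prealloc`; it only changes whether `pertrans` grows, which matches the worst cases listed in the message.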

btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup · e1211d0e

Authored by Qu Wenruo on Dec 12, 2017

Since qgroup has separate metadata reservation types now, we can
completely get rid of the old root->qgroup_meta_rsv, which mostly acts
as current META_PERTRANS reservation type.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: ctree.h: Fix wrong comment position about csum size · 4408ea7c

Authored by Misono, Tomohiro on Mar 20, 2018

Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: defer adding raid type kobject until after chunk relocation · 75cb379d

Authored by Jeff Mahoney on Mar 20, 2018

Any time the first block group of a new type is created, we add a new
kobject to sysfs to hold the attributes for that type.  Kobject-internal
allocations always use GFP_KERNEL, making them prone to fs-reclaim races.
While it appears as if this can occur any time a block group is created,
the only times the first block group of a new type can be created in
memory is at mount and when we create the first new block group during
raid conversion.

This patch adds a new list to track pending kobject additions and then
handles them after we do chunk relocation.  Between relocating the
target chunk (or forcing allocation of a new chunk in the case of data)
and removing the old chunk, we're in a safe place for fs-reclaim to
occur.  We're holding the volume mutex, which is already held across
page faults, and the delete_unused_bgs_mutex, which will only stall
the cleaner thread.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove dead create_space_info calls · dc2d3005

Authored by Jeff Mahoney on Mar 20, 2018

Since commit 2be12ef7 (btrfs: Separate space_info create/update), we've
separated out the creation and updating of the space info structures.
That commit was a straightforward refactoring of the two parts of
update_space_info, but we can go a step further.  Since commits
c59021f8 (Btrfs: fix OOPS of empty filesystem after balance) and
b742bb82 (Btrfs: Link block groups of different raid types), we know
that the space_info structures will be created at mount and there will
only ever be, at most, three of them.

This patch cleans out the create_space_info calls after __find_space_info
returns NULL since __find_space_info *can't* return NULL.

The initial cause for reviewing this was the kobject_add calls from
create_space_info occurring in sites where fs-reclaim wasn't allowed.  Now
we are certain they occur only early in the mount process and are safe.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Drop fs_info parameter from btrfs_finish_extent_commit · 5ead2dd0

Authored by Nikolay Borisov on Mar 15, 2018

It's provided by the transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop fs_info parameter from btrfs_run_delayed_refs · c79a70b1

Authored by Nikolay Borisov on Mar 15, 2018

It's provided by the transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove btrfs_fs_info::open_ioctl_trans · 92e2f7e3

Authored by Nikolay Borisov on Feb 05, 2018

Since userspace transactions have been removed we no longer have a use
for this field, so delete it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove btrfs_file_private::trans · 859e682d

Authored by Nikolay Borisov on Feb 05, 2018

Now that the userspace transaction ioctls have been removed, this member
is no longer used, so just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove userspace transaction ioctls · 7a5a07a8

Authored by Nikolay Borisov on Feb 05, 2018

Commit 3558d4f8 ("btrfs: Deprecate userspace transaction ioctls")
marked the beginning of the end of userspace transactions. This commit
finishes the job! There are no known users and ceph does not use the
ioctl anymore.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Sage Weil <sage@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add define for oldest generation · 7c829b72

Authored by Anand Jain on Mar 07, 2018

Some functions can filter metadata by the generation. Add a define that
will annotate such arguments.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>

26 Mar, 2018 11 commits

btrfs: move btrfs_listxattr prototype to xattr.h · 738c93d4

Authored by David Sterba on Feb 27, 2018

There's a proper header for xattr handlers.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: unify types for metadata_ratio and data_chunk_allocations · d612ac59

Authored by Anand Jain on Feb 26, 2018

We have btrfs_fs_info::data_chunk_allocations and
btrfs_fs_info::metadata_ratio declared as unsigned, which would be
unsigned int, and kernel style prefers unsigned int over bare unsigned.
So this patch changes them to u32.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add more __cold annotations · e67c718b

Authored by David Sterba on Feb 19, 2018

The __cold functions are placed in a special section, as they're
expected to be called rarely. This could help i-cache prefetches or help
the compiler decide which branches are more/less likely to be taken
without any other annotations needed.

Though we can't add more __exit annotations, it's still possible to add
__cold (that's also added with __exit). That way the following function
categories are tagged:

- printf wrappers, error messages
- exit helpers
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove custom crc32c init code · 9678c543

Authored by Nikolay Borisov on Jan 08, 2018

The custom crc32 init code was introduced in
14a958e6 ("Btrfs: fix btrfs boot when compiled as built-in") to
enable using btrfs as a built-in. However, later as pointed out by
60efa5eb ("Btrfs: use late_initcall instead of module_init") this
wasn't enough and finally btrfs was switched to late_initcall which
comes after the generic crc32c implementation is initialised. The
latter commit superseded the former. Now that we don't have to
maintain our own code let's just remove it and switch to using the
generic implementation.

Despite touching a lot of files the patch is really simple. Here is the gist of
the changes:

1. Select LIBCRC32C rather than the low-level modules.
2. s/btrfs_crc32c/crc32c/g
3. replace hash.h with linux/crc32c.h
4. Move the btrfs namehash funcs to ctree.h and change the tree accordingly.

I've tested this with btrfs being both a module and a built-in and xfstests
doesn't complain.

This does seem to fix the longstanding problem of not automatically selecting
the crc32c module when btrfs is used. Possibly there is a workaround in
dracut.

The modinfo confirms that now all the module dependencies are there:

before:
depends:        zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate

after:
depends:        libcrc32c,zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add more info to changelog from mails ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Refactor __get_raid_index() to btrfs_bg_flags_to_raid_index() · 3e72ee88

Authored by Qu Wenruo on Jan 30, 2018

Function __get_raid_index() is used to convert block group flags into
raid index, which can be used to get various info directly from
btrfs_raid_array[].

Refactor this function a little:

1) Rename to btrfs_bg_flags_to_raid_index()
   A double underscore prefix is normally for internal functions, while the
   function is used by both extent-tree and volumes.

   Although the name is a little longer, it should explain its usage
   quite well.

2) Move it to volumes.h and make it static inline
   Just several if-else branches, really no need to define it as a normal
   function.

   This also makes later code re-use between kernel and btrfs-progs
   easier.

3) Remove function get_block_group_index()
   Really no need to do such a simple thing as an exported function.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
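
The "just several if-else branches" conversion can be sketched as follows. This is a Python model rather than the kernel's inline C function; the flag bit positions and the index ordering reflect my understanding of the btrfs headers of that period and should be treated as assumptions.

```python
from enum import IntEnum

# Block group profile bits (assumed to match the on-disk flag layout).
BTRFS_BLOCK_GROUP_RAID0 = 1 << 3
BTRFS_BLOCK_GROUP_RAID1 = 1 << 4
BTRFS_BLOCK_GROUP_DUP = 1 << 5
BTRFS_BLOCK_GROUP_RAID10 = 1 << 6
BTRFS_BLOCK_GROUP_RAID5 = 1 << 7
BTRFS_BLOCK_GROUP_RAID6 = 1 << 8


class RaidIndex(IntEnum):
    """Index into a per-profile table such as btrfs_raid_array[] (assumed order)."""
    RAID10 = 0
    RAID1 = 1
    DUP = 2
    RAID0 = 3
    SINGLE = 4
    RAID5 = 5
    RAID6 = 6


def bg_flags_to_raid_index(flags):
    """Map block group flags to a raid index; no profile bit means SINGLE."""
    if flags & BTRFS_BLOCK_GROUP_RAID10:
        return RaidIndex.RAID10
    if flags & BTRFS_BLOCK_GROUP_RAID1:
        return RaidIndex.RAID1
    if flags & BTRFS_BLOCK_GROUP_DUP:
        return RaidIndex.DUP
    if flags & BTRFS_BLOCK_GROUP_RAID0:
        return RaidIndex.RAID0
    if flags & BTRFS_BLOCK_GROUP_RAID5:
        return RaidIndex.RAID5
    if flags & BTRFS_BLOCK_GROUP_RAID6:
        return RaidIndex.RAID6
    return RaidIndex.SINGLE
```

A function this small has no state and no side effects, which is why defining it as a static inline helper in a shared header is cheap.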

btrfs: Move qgroup rescan on quota enable to btrfs_quota_enable · 5d23515b

Authored by Nikolay Borisov on Jan 31, 2018

Currently btrfs_run_qgroups is doing a bit too much. Not only is it
responsible for synchronizing in-memory state of qgroups to disk but
it also contains code to trigger the initial qgroup rescan when
quota is enabled initially. This condition is detected by checking that
BTRFS_FS_QUOTA_ENABLED is not set and BTRFS_FS_QUOTA_ENABLING is set.
Nothing really requires the code to be structured (and scattered)
the way it is so let's streamline things. First move the quota rescan
code into btrfs_quota_enable, where its invocation is closer to the
use. This also makes the FS_QUOTA_ENABLING flag redundant so let's
remove it as well.

This has been tested with a full xfstests run with qgroups enabled on
the scratch device of every test and no regressions were observed.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: manage commit mount option as %u · d3740608

Authored by Anand Jain on Feb 13, 2018

The commit mount option is unsigned, so manage it as %u for token
verification, instead of %d.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: manage thread_pool mount option as %u · f7b885be

Authored by Anand Jain on Feb 13, 2018

The mount option thread_pool is always unsigned. Manage it that way all
around.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Don't pass fs_info arg to btrfs_start_dirty_block_groups · 21217054

Authored by Nikolay Borisov on Feb 07, 2018

It can be referenced from the passed transaction so no point in passing
it as a function argument. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Remove fs_info argument from btrfs_create_pending_block_groups · 6c686b35

Authored by Nikolay Borisov on Feb 07, 2018

It can be referenced from the passed transaction so no point in
passing it as a function argument. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Make btrfs_trans_release_metadata private to transaction.c · 0e34693f

Authored by Nikolay Borisov on Feb 07, 2018

This function is only ever used in __btrfs_end_transaction and
btrfs_commit_transaction so there is no need to export it via header.
Let's move it closer to where it's used, make it static and remove it
from the header. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

01 Mar, 2018 2 commits

Btrfs: fix log replay failure after unlink and link combination · 1f250e92

Authored by Filipe Manana on Feb 28, 2018

If we have a file with 2 (or more) hard links in the same directory,
remove one of the hard links, create a new file (or link an existing file)
in the same directory with the name of the removed hard link, and then
finally fsync the new file, we end up with a log that fails to replay,
causing a mount failure.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ mkdir /mnt/testdir
  $ touch /mnt/testdir/foo
  $ ln /mnt/testdir/foo /mnt/testdir/bar

  $ sync

  $ unlink /mnt/testdir/bar
  $ touch /mnt/testdir/bar
  $ xfs_io -c "fsync" /mnt/testdir/bar

  <power failure>

  $ mount /dev/sdb /mnt
  mount: mount(2) failed: /mnt: No such file or directory

When replaying the log, for that example, we also see the following in
dmesg/syslog:

  [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257
  [71813.674204] ------------[ cut here ]------------
  [71813.675694] BTRFS: Transaction aborted (error -2)
  [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs]
  [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
  [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G        W        4.15.0-rc9-btrfs-next-56+ #1
  [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
  [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
  [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
  [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001
  [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff
  [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001
  [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8
  [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101
  [71813.679669] FS:  00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
  [71813.679669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0
  [71813.679669] Call Trace:
  [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
  [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
  [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
  [71813.679669]  ? __lock_is_held+0x39/0x71
  [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
  [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
  [71813.679669]  ? rcu_read_unlock+0x3a/0x57
  [71813.679669]  ? __lock_is_held+0x39/0x71
  [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
  [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
  [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
  [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
  [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
  [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
  [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
  [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
  [71813.679669]  ? mount_fs+0x64/0x10b
  [71813.679669]  mount_fs+0x64/0x10b
  [71813.679669]  vfs_kern_mount+0x68/0xce
  [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
  [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
  [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
  [71813.679669]  ? mount_fs+0x64/0x10b
  [71813.679669]  mount_fs+0x64/0x10b
  [71813.679669]  vfs_kern_mount+0x68/0xce
  [71813.679669]  do_mount+0x6e5/0x973
  [71813.679669]  ? memdup_user+0x3e/0x5c
  [71813.679669]  SyS_mount+0x72/0x98
  [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
  [71813.679669] RIP: 0033:0x7f7cedf150ba
  [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
  [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
  [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
  [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry
  [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree)
  [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30
  [71814.128078] BTRFS error (device dm-0): open_ctree failed

This happens because the log has inode reference items for both inode 258
(the first file we created) and inode 259 (the second file created), and
when processing the reference item for inode 258, we replace the
corresponding item in the subvolume tree (which has two names, "foo" and
"bar") with the one in the log (which only has one name, "foo") without
removing the corresponding dir index keys from the parent directory.
Later, when processing the inode reference item for inode 259, which has
a name of "bar" associated to it, we notice that dir index entries exist
for that name and for a different inode, so we attempt to unlink that
name, which fails because the inode reference item for inode 258 no longer
has the name "bar" associated to it, making a call to btrfs_unlink_inode()
fail with a -ENOENT error.

Fix this by unlinking all the names in an inode reference item from a
subvolume tree that are not present in the inode reference item found in
the log tree, before overwriting it with the item from the log tree.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
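
The core of the fix, finding the names that exist in the subvolume tree's inode reference item but not in the log's, is a set difference. The sketch below models that one step with plain lists of names; `names_to_unlink` is a hypothetical helper, not the kernel function.

```python
def names_to_unlink(subvol_ref_names, log_ref_names):
    """Names to unlink before the subvolume ref item is overwritten.

    Returns the names present in the subvolume tree's inode reference
    item that are absent from the log tree's item, in original order.
    """
    log_names = set(log_ref_names)
    return [name for name in subvol_ref_names if name not in log_names]
```

For the reproducer in the message, inode 258 has the names "foo" and "bar" in the subvolume tree and only "foo" in the log, so "bar" is unlinked first and the later replay of inode 259's "bar" no longer collides with a stale dir index entry.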

btrfs: use kvzalloc to allocate btrfs_fs_info · a8fd1f71

Authored by Jeff Mahoney on Feb 15, 2018

The srcu_struct in btrfs_fs_info scales in size with NR_CPUS.  On
kernels built with NR_CPUS=8192, this can result in kmalloc failures
that prevent mounting.

There is work in progress to try to resolve this for every user of
srcu_struct but using kvzalloc will work around the failures until
that is complete.

As an example with NR_CPUS=512 on x86_64: the overall size of
subvol_srcu is 3460 bytes, fs_info is 6496.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

22 Jan, 2018 5 commits

Btrfs: move extent map specific code to extent_map.c · c04e61b5

Authored by Liu Bo on Jan 05, 2018

These helpers are extent map specific, so move them to extent_map.c.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: add helper for em merge logic · 7b4df058

Authored by Liu Bo on Jan 05, 2018

This is preparatory work for the following extent map selftest, which
runs tests against em merge logic.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: remove unused wait in btrfs_stripe_hash · 203e02d9

Authored by Liu Bo on Dec 22, 2017

In fact nobody is waiting on @wait's waitqueue, so it can be safely
removed.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Cleanup existing name_len checks · bae15d95

Authored by Qu Wenruo on Nov 08, 2017

Since tree-checker has verified the leaf when reading from disk, we don't
need the existing verify_dir_item() or btrfs_is_name_len_valid() checks.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: add __init macro to btrfs init functions · f5c29bd9

Authored by Liu Bo on Nov 02, 2017

Adding the __init macro gives the kernel a hint that this function is only
used during the initialization phase and its memory resources can be freed
up afterwards.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

28 Nov, 2017 1 commit

Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6

Authored by Linus Torvalds on Nov 27, 2017

This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
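
The substitution step of the script can be modelled in Python for a single string. This sketch mirrors only the `sed` part, a plain `MS_<name>` to `SB_<name>` replacement for the listed symbols; the file selection done with `git grep` is not modelled, and `rename_superblock_flags` is a hypothetical helper name.

```python
# The symbol list is copied from the script in the commit message.
SYMS = (
    "RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK "
    "DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT "
    "POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT "
    "I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN "
    "ACTIVE NOUSER"
).split()


def rename_superblock_flags(source):
    """Apply the equivalent of 's/MS_<sym>/SB_<sym>/g' for every listed symbol."""
    for sym in SYMS:
        source = source.replace("MS_" + sym, "SB_" + sym)
    return source
```

Like the original `sed` expressions, this is a substring replacement without word boundaries, so identifiers outside the list (for example `MS_SYNC`) are left alone only because they do not contain any listed name.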

openanolis / cloud-kernel: last synced successfully almost 2 years ago
