Commits · f5b3a4173ff624b766c56936bb315e1517603891 · openanolis / cloud-kernel

06 Aug, 2018 40 commits

Committed by Al Viro on Jul 29, 2018

Don't open-code iget_failed(), don't bother with btrfs_free_path(NULL),
move handling of positive return values of btrfs_lookup_inode() from
btrfs_read_locked_inode() to btrfs_iget() and kill now obviously
pointless ASSERT() in there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f5b3a417

btrfs: lift make_bad_inode into btrfs_iget · 9bc2ceff

Committed by Al Viro on Jul 29, 2018

We don't need to check is_bad_inode() after the call of
btrfs_read_locked_inode() - it's exactly the same as checking the return
value for being non-zero.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

9bc2ceff

btrfs: simplify IS_ERR/PTR_ERR checks · 8d9e220c

Committed by Al Viro on Jul 29, 2018

IS_ERR(p) && PTR_ERR(p) == n is a weird way to spell p == ERR_PTR(n).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>

8d9e220c
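
The equivalence stated in the "simplify IS_ERR/PTR_ERR checks" entry above can be tried out in userspace. The following is only a sketch: the helpers are simplified imitations of the kernel's error-pointer helpers (assumed here, without the kernel's annotations), and `matches_long_form()`, `matches_short_form()` and `demo_err_ptr()` are hypothetical names. The two spellings agree whenever `n` is a valid negative errno in the range [-MAX_ERRNO, -1].

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace imitation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The long spelling that the commit replaces. */
static inline bool matches_long_form(const void *p, long n)
{
	return IS_ERR(p) && PTR_ERR(p) == n;
}

/* The short spelling that the commit introduces. */
static inline bool matches_short_form(const void *p, long n)
{
	return p == ERR_PTR(n);
}

/* Check that both spellings agree for every valid errno and for a real object. */
static inline void demo_err_ptr(void)
{
	int object = 0;

	for (long n = -MAX_ERRNO; n <= -1; n++)
		assert(matches_long_form(ERR_PTR(n), -ENOENT) ==
		       matches_short_form(ERR_PTR(n), -ENOENT));
	assert(!matches_long_form(&object, -ENOENT));
	assert(!matches_short_form(&object, -ENOENT));
}
```

Calling `demo_err_ptr()` from a test program runs the comparison. For `n` outside the errno range (for example `n == 0` with a NULL pointer) the two forms differ, so the rewrite is only safe where `n` is a real error code.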

btrfs: btrfs_iget never returns an is_bad_inode inode · 2e19f1f9

Committed by Al Viro on Jul 29, 2018

Just get rid of pointless checks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2e19f1f9

btrfs: replace: Reset on-disk dev stats value after replace · 1e7e1f9e

Committed by Misono Tomohiro on Jul 31, 2018

The on-disk dev stats value is updated in btrfs_run_dev_stats(),
which is called during transaction commit, if device->dev_stats_ccnt
is not zero.

Since the current replace operation does not touch dev_stats_ccnt, the
on-disk dev stats value is not updated. Therefore "btrfs device stats"
may return the old device's values after umount/mount
(Example: see "btrfs ins dump-t -t DEV $DEV" after btrfs/100 finishes).

Fix this by incrementing dev_stats_ccnt in
btrfs_dev_replace_finishing() when the replace succeeds, which
updates the values.
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

1e7e1f9e
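
The mechanism described in the dev stats entry above, a commit-time flush gated by a change counter, can be modelled in a few lines. This is a toy sketch, not the btrfs code: `struct dev_model` and all function names are hypothetical, and the model assumes the replacement device starts with zeroed in-memory counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-device stats state. */
struct dev_model {
	int ccnt;		/* change counter, playing the role of dev_stats_ccnt */
	unsigned long in_mem;	/* in-memory stats value */
	unsigned long on_disk;	/* value last written at commit time */
};

/* Commit-time flush: writes only when the change counter is non-zero. */
static inline bool flush_stats_at_commit(struct dev_model *d)
{
	if (d->ccnt == 0)
		return false;
	d->on_disk = d->in_mem;
	d->ccnt = 0;
	return true;
}

/* End of a replace: the new device takes over with clean counters (assumption). */
static inline void finish_replace_model(struct dev_model *d, bool bump_ccnt)
{
	d->in_mem = 0;
	if (bump_ccnt)
		d->ccnt++;	/* the fix: mark the stats as changed */
}

static inline void demo_dev_stats(void)
{
	struct dev_model d = { .ccnt = 0, .in_mem = 7, .on_disk = 7 };

	/* Without the bump, the commit skips the flush and disk stays stale. */
	finish_replace_model(&d, false);
	assert(!flush_stats_at_commit(&d));
	assert(d.on_disk == 7);

	/* With the bump, the next commit writes the fresh value. */
	finish_replace_model(&d, true);
	assert(flush_stats_at_commit(&d));
	assert(d.on_disk == 0);
}
```

In the model, the stale on-disk value survives exactly as long as nothing increments the counter, which mirrors the symptom the commit message describes.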

btrfs: extent-tree: Remove unused __btrfs_free_block_rsv · 85c39548

Committed by Misono Tomohiro on Jul 26, 2018

There is no user of this function anymore.

Its removal was missed in commit a575ceeb
("Btrfs: get rid of unused orphan infrastructure").
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

85c39548

btrfs: backref: Use ERR_CAST to return error code · afc6961f

Committed by Misono Tomohiro on Jul 26, 2018

Use ERR_CAST() instead of void * to make the meaning clear.
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

afc6961f
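
As an illustration of the ERR_CAST entry above, here is a small userspace sketch of propagating an error pointer across two unrelated pointer types. The helpers imitate the kernel's in simplified form (an assumption), and `struct root_model`, `struct inode_model` and the lookup functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* Same value, but the name states that an error is being forwarded. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

struct root_model { int id; };
struct inode_model { int ino; };

/* Hypothetical lookup that either fails with -ENOENT or returns an object. */
static inline struct root_model *lookup_root_model(bool fail)
{
	static struct root_model root = { .id = 5 };

	return fail ? ERR_PTR(-ENOENT) : &root;
}

/* Returns a different pointer type, so the error must be re-typed. */
static inline struct inode_model *lookup_inode_model(bool fail)
{
	static struct inode_model inode = { .ino = 256 };
	struct root_model *root = lookup_root_model(fail);

	if (IS_ERR(root))
		return ERR_CAST(root);	/* instead of: return (void *)root; */
	return &inode;
}

static inline void demo_err_cast(void)
{
	assert(PTR_ERR(lookup_inode_model(true)) == -ENOENT);
	assert(!IS_ERR(lookup_inode_model(false)));
}
```

Both spellings compile to the same thing; the benefit is that a reader sees an error being forwarded rather than an unexplained cast.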

btrfs: Remove redundant btrfs_release_path from btrfs_unlink_subvol · 5b7d687a

Committed by Lu Fengqi on Aug 01, 2018

Although it is safe to call this on already released paths with no locks
held or extent buffers, removing the redundant btrfs_release_path is
reasonable.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

5b7d687a

btrfs: Remove root parameter from btrfs_unlink_subvol · 401b3b19

Committed by Lu Fengqi on Aug 01, 2018

All callers pass the root tree of dir, so we can push that down to the
function itself.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

401b3b19

btrfs: Remove fs_info from btrfs_add_root_ref · 6025c19f

Committed by Lu Fengqi on Aug 01, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

6025c19f
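
Several entries in this list drop an fs_info parameter because it "can be referenced from the passed transaction handle". The refactoring pattern can be sketched as follows, using hypothetical stripped-down structures and function names rather than the real btrfs definitions.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct fs_info_model { int id; };
struct trans_handle_model { struct fs_info_model *fs_info; };

/* Before: the caller passes fs_info although trans already points to it. */
static inline int add_root_ref_old(struct trans_handle_model *trans,
				   struct fs_info_model *fs_info)
{
	(void)trans;
	return fs_info->id;
}

/* After: the function derives fs_info from the handle itself. */
static inline int add_root_ref_new(struct trans_handle_model *trans)
{
	struct fs_info_model *fs_info = trans->fs_info;

	return fs_info->id;
}

static inline void demo_fs_info_param(void)
{
	struct fs_info_model fs = { .id = 42 };
	struct trans_handle_model trans = { .fs_info = &fs };

	/* Both variants operate on the same filesystem object. */
	assert(add_root_ref_old(&trans, &fs) == add_root_ref_new(&trans));
}
```

Removing the redundant argument shortens call sites and removes the possibility of passing an fs_info that does not match the handle.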

btrfs: Remove fs_info from btrfs_del_root_ref · 3ee1c553

Committed by Lu Fengqi on Aug 01, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

3ee1c553

btrfs: Remove fs_info from btrfs_del_root · ab9ce7d4

Committed by Lu Fengqi on Aug 01, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ab9ce7d4

btrfs: Remove fs_info from btrfs_delete_delayed_dir_index · 9add2945

Committed by Lu Fengqi on Aug 01, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

9add2945

btrfs: Remove fs_info from btrfs_insert_delayed_dir_index · 4465c8b4

Committed by Lu Fengqi on Aug 01, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

4465c8b4

btrfs: extent-tree: remove unused member walk_control::for_reloc · b5851021

Committed by David Sterba on Jul 24, 2018

Leftover after fix e339a6b0 ("Btrfs: __btrfs_mod_ref should always
use no_quota"), that removed it from the function calls but not the
structure.
Signed-off-by: David Sterba <dsterba@suse.com>

b5851021

Btrfs: fix send failure when root has deleted files still open · 46b2f459

Committed by Filipe Manana on Jul 24, 2018

The more common use case of send involves creating a RO snapshot and then
using it for a send operation. In this case it's not possible to have inodes
in the snapshot that have a link count of zero (inode with an orphan item)
since during snapshot creation we do the orphan cleanup. However, other
less common use cases for send can end up seeing inodes with a link count
of zero and in this case the send operation fails with an ENOENT error
because any attempt to generate a path for the inode, with the purpose
of creating it or updating it at the receiver, fails since there are no
inode reference items. One use case is to use a regular subvolume for
a send operation after turning it to RO mode or turning a RW snapshot
into RO mode and then using it for a send operation. In both cases, if a
file gets all its hard links deleted while there is an open file
descriptor before turning the subvolume/snapshot into RO mode, the send
operation will encounter an inode with a link count of zero and then
fail with errno ENOENT.

Example using a full send with a subvolume:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ btrfs subvolume create /mnt/sv1
  $ touch /mnt/sv1/foo
  $ touch /mnt/sv1/bar

  # keep an open file descriptor on file bar
  $ exec 73</mnt/sv1/bar
  $ unlink /mnt/sv1/bar

  # Turn the subvolume to RO mode and use it for a full send, while
  # holding the open file descriptor.
  $ btrfs property set /mnt/sv1 ro true

  $ btrfs send -f /tmp/full.send /mnt/sv1
  At subvol /mnt/sv1
  ERROR: send ioctl failed with -2: No such file or directory

Example using an incremental send with snapshots:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ btrfs subvolume create /mnt/sv1
  $ touch /mnt/sv1/foo
  $ touch /mnt/sv1/bar

  $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap1

  $ echo "hello world" >> /mnt/sv1/bar

  $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap2

  # Turn the second snapshot to RW mode and delete file foo while
  # holding an open file descriptor on it.
  $ btrfs property set /mnt/snap2 ro false
  $ exec 73</mnt/snap2/foo
  $ unlink /mnt/snap2/foo

  # Set the second snapshot back to RO mode and do an incremental send.
  $ btrfs property set /mnt/snap2 ro true

  $ btrfs send -f /tmp/inc.send -p /mnt/snap1 /mnt/snap2
  At subvol /mnt/snap2
  ERROR: send ioctl failed with -2: No such file or directory

So fix this by ignoring inodes with a link count of zero if we are either
doing a full send or if they do not exist in the parent snapshot (they
are new in the send snapshot), and unlink all paths found in the parent
snapshot when doing an incremental send (and ignoring all other inode
items, such as xattrs and extents).

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 4.4+
Reported-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

46b2f459

Btrfs: fix mount failure after fsync due to hard link recreation · 0d836392

Committed by Filipe Manana on Jul 20, 2018

If we end up logging an inode reference item which has the same name
but a different index from the one we have persisted, we end up failing when
replaying the log with an errno value of -EEXIST. The error comes from
btrfs_add_link(), which is called from add_inode_ref(), when we are
replaying an inode reference item.

Example scenario where this happens:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ touch /mnt/foo
  $ ln /mnt/foo /mnt/bar

  $ sync

  # Rename the first hard link (foo) to a new name and rename the second
  # hard link (bar) to the old name of the first hard link (foo).
  $ mv /mnt/foo /mnt/qwerty
  $ mv /mnt/bar /mnt/foo

  # Create a new file, in the same parent directory, with the old name of
  # the second hard link (bar) and fsync this new file.
  # We do this instead of calling fsync on foo/qwerty because if we did
  # that the fsync would result in a full transaction commit, not triggering
  # the problem.
  $ touch /mnt/bar
  $ xfs_io -c "fsync" /mnt/bar

  <power fail>

  $ mount /dev/sdb /mnt
  mount: mount /dev/sdb on /mnt failed: File exists

So fix this by checking if a conflicting inode reference exists (same
name, same parent but different index), removing it (and the associated
dir index entries from the parent inode) if it exists, before attempting
to add the new reference.

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

0d836392

btrfs: don't leak ret from do_chunk_alloc · 4559b0a7

Committed by Josef Bacik on Jul 19, 2018

If we're trying to make a data reservation and we have to allocate a
data chunk we could leak ret == 1, as do_chunk_alloc() will return 1 if
it allocated a chunk.  Since the end of the function is the success path
just return 0.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

4559b0a7
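
The bug class described in the do_chunk_alloc entry above, a helper's positive "I did work" return value escaping through a caller's success path, can be sketched like this. All names are hypothetical; `chunk_alloc_model()` only imitates the described contract (1 when a chunk was allocated, negative errno on failure).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: 1 if a chunk was allocated, negative errno on error. */
static inline int chunk_alloc_model(bool fail)
{
	return fail ? -ENOSPC : 1;
}

/* Leaky caller: returns whatever the helper returned, including 1. */
static inline int reserve_data_leaky(bool need_chunk, bool fail)
{
	int ret = 0;

	if (need_chunk) {
		ret = chunk_alloc_model(fail);
		if (ret < 0)
			return ret;
	}
	return ret;	/* may be 1, which callers can misread as failure */
}

/* Fixed caller: the end of the function is the success path, so return 0. */
static inline int reserve_data_fixed(bool need_chunk, bool fail)
{
	if (need_chunk) {
		int ret = chunk_alloc_model(fail);

		if (ret < 0)
			return ret;
	}
	return 0;
}

static inline void demo_ret_leak(void)
{
	assert(reserve_data_leaky(true, false) == 1);	/* the leak */
	assert(reserve_data_fixed(true, false) == 0);
	assert(reserve_data_fixed(true, true) == -ENOSPC);
	assert(reserve_data_fixed(false, false) == 0);
}
```

A caller that tests the result with `if (ret)` would treat the leaked 1 as an error, which is why normalising the success path to 0 matters.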

btrfs: merge free_fs_root helpers · 84db5ccf

Committed by David Sterba on Jul 20, 2018

The exported helper just calls the static one. There's no obvious reason
to have them separate, e.g. for performance reasons where the static one
could be better optimized in the same unit. There's a slight decrease in
code size and stack consumption.
Signed-off-by: David Sterba <dsterba@suse.com>

84db5ccf

btrfs: constify strings passed to assertion helper · 2ffad70e

Committed by David Sterba on Jul 20, 2018

Signed-off-by: David Sterba <dsterba@suse.com>

2ffad70e

btrfs: dev-replace: remove unused members of btrfs_dev_replace · e9539cff

Committed by David Sterba on Jul 20, 2018

Lock owner and nesting level have been unused since day 1, probably
copy&pasted from the extent_buffer locking scheme without much thinking.
The locking of device replace is simpler and does not need any lock
nesting.
Signed-off-by: David Sterba <dsterba@suse.com>

e9539cff

btrfs: remove unused member btrfs_root::name · e17385ca

Committed by David Sterba on Jul 20, 2018

Added in 58176a96 ("Btrfs: Add per-root block accounting and sysfs
entries") in 2007, the roots had names exported in sysfs. The code
was commented out in 4df27c4d ("Btrfs: change how subvolumes
are organized") and cleaned by 182608c8 ("btrfs: remove old
unused commented out code").
Signed-off-by: David Sterba <dsterba@suse.com>

e17385ca

btrfs: allow defrag on a file opened read-only that has rw permissions · 616d374e

Committed by Adam Borowski on Jul 18, 2018

Requiring a read-write descriptor conflicts both ways with exec,
returning ETXTBSY whenever you try to defrag a program that's currently
being run, or causing intermittent exec failures on a live system being
defragged.

As defrag doesn't change the file's contents in any way, there's no
reason to consider it a rw operation.  Thus, let's check only whether
the file could have been opened rw.  Such access control is still needed
as currently defrag can use extra disk space, and might trigger bugs.

We return EINVAL when the request is invalid; here it's ok but merely
the user has insufficient privileges.  Thus, the EPERM return value
reflects the error better -- as discussed in the identical case for
dedupe.

According to codesearch.debian.net, no userspace program distinguishes
these values beyond strerror().
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: David Sterba <dsterba@suse.com>
[ fold the EPERM patch from Adam ]
Signed-off-by: David Sterba <dsterba@suse.com>

616d374e

Btrfs: fix btrfs_write_inode vs delayed iput deadlock · 3c427693

Committed by Josef Bacik on Jul 20, 2018

We recently ran into the following deadlock involving
btrfs_write_inode():

[  +0.005066]  __schedule+0x38e/0x8c0
[  +0.007144]  schedule+0x36/0x80
[  +0.006447]  bit_wait+0x11/0x60
[  +0.006446]  __wait_on_bit+0xbe/0x110
[  +0.007487]  ? bit_wait_io+0x60/0x60
[  +0.007319]  __inode_wait_for_writeback+0x96/0xc0
[  +0.009568]  ? autoremove_wake_function+0x40/0x40
[  +0.009565]  inode_wait_for_writeback+0x21/0x30
[  +0.009224]  evict+0xb0/0x190
[  +0.006099]  iput+0x1a8/0x210
[  +0.006103]  btrfs_run_delayed_iputs+0x73/0xc0
[  +0.009047]  btrfs_commit_transaction+0x799/0x8c0
[  +0.009567]  btrfs_write_inode+0x81/0xb0
[  +0.008008]  __writeback_single_inode+0x267/0x320
[  +0.009569]  writeback_sb_inodes+0x25b/0x4e0
[  +0.008702]  wb_writeback+0x102/0x2d0
[  +0.007487]  wb_workfn+0xa4/0x310
[  +0.006794]  ? wb_workfn+0xa4/0x310
[  +0.007143]  process_one_work+0x150/0x410
[  +0.008179]  worker_thread+0x6d/0x520
[  +0.007490]  kthread+0x12c/0x160
[  +0.006620]  ? put_pwq_unlocked+0x80/0x80
[  +0.008185]  ? kthread_park+0xa0/0xa0
[  +0.007484]  ? do_syscall_64+0x53/0x150
[  +0.007837]  ret_from_fork+0x29/0x40

Writeback calls:

btrfs_write_inode
  btrfs_commit_transaction
    btrfs_run_delayed_iputs

If iput() is called on that same inode, evict() will wait for writeback
forever.

btrfs_write_inode() was originally added way back in 4730a4bc
("btrfs_dirty_inode") to support O_SYNC writes. However, ->write_inode()
hasn't been used for O_SYNC since 148f948b ("vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode"), so
btrfs_write_inode() is actually unnecessary (and leads to a bunch of
unnecessary commits). Get rid of it, which also gets rid of the
deadlock.

CC: stable@vger.kernel.org # 3.2+
Signed-off-by: Josef Bacik <jbacik@fb.com>
[Omar: new commit message]
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

3c427693

btrfs: Remove fs_info from btrfs_finish_chunk_alloc · 97aff912

Committed by Nikolay Borisov on Jul 20, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

97aff912

btrfs: Remove fs_info form btrfs_free_chunk · f4208794

Committed by Nikolay Borisov on Jul 20, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f4208794

btrfs: Remove fs_info from btrfs_destroy_dev_replace_tgtdev · 4f5ad7bd

Committed by Nikolay Borisov on Jul 20, 2018

This function is always passed a well-formed tgtdevice so the fs_info
can be referenced from there.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

4f5ad7bd

btrfs: Remove fs_info from btrfs_assign_next_active_device · d6507cf1

Committed by Nikolay Borisov on Jul 20, 2018

It can be referenced from the passed 'device' argument which is always
a well-formed device.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

d6507cf1

btrfs: remove fs_info argument from update_dev_stat_item · 5495f195

Committed by Nikolay Borisov on Jul 20, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

5495f195

btrfs: Remove fs_info from btrfs_rm_dev_replace_remove_srcdev · 68a9db5f

Committed by Nikolay Borisov on Jul 20, 2018

It can be referenced from the passed srcdev argument.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

68a9db5f

btrfs: Remove fs_info argument from btrfs_add_dev_item · 8e87e856

Committed by Nikolay Borisov on Jul 20, 2018

It can be referenced from the passed transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

8e87e856

btrfs: extent-tree: Remove dead alignment check · 5e23a6fe

Committed by Qu Wenruo on Jul 23, 2018

In find_free_extent() under checks: label, we have the following code:

		search_start = ALIGN(offset, fs_info->stripesize);
		/* move on to the next group */
		if (search_start + num_bytes >
		    block_group->key.objectid + block_group->key.offset) {
			btrfs_add_free_space(block_group, offset, num_bytes);
			goto loop;
		}
		if (offset < search_start)
			btrfs_add_free_space(block_group, offset,
					     search_start - offset);
		BUG_ON(offset > search_start);

However ALIGN() is rounding up, thus @search_start >= @offset and that
BUG_ON() will never be triggered.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

5e23a6fe
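
The argument in the "Remove dead alignment check" entry above rests on ALIGN() rounding up. A userspace sketch of that property follows; `align_up()` is a hypothetical simplified equivalent for power-of-two alignments, and the 4096-byte stripe size is only an assumed example value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of a, where a must be a power of two.
 * Simplified userspace equivalent of the kernel's ALIGN() (assumption).
 */
static inline uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Check the property the commit relies on: the result is never below x. */
static inline void demo_align(void)
{
	const uint64_t stripesize = 4096;	/* assumed example value */

	for (uint64_t offset = 0; offset <= 3 * stripesize; offset++) {
		uint64_t search_start = align_up(offset, stripesize);

		assert(search_start >= offset);		/* BUG_ON could never fire */
		assert(search_start % stripesize == 0);
		assert(search_start - offset < stripesize);
	}
}
```

The property holds as long as `x + a - 1` does not wrap around, which is the case for any offset not within one alignment unit of the maximum 64-bit value.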

Btrfs: remove unused key assignment when doing a full send · ca5d2ba1

Committed by Filipe Manana on Jul 23, 2018

At send.c:full_send_tree() we were setting the 'key' variable in the loop
while never using it later. We were also using two btrfs_key variables
to store the initial key for search and the key found in every iteration
of the loop. So remove this useless key assignment and use the same
btrfs_key variable to store the initial search key and the key found in
each iteration. This was introduced in the initial send commit but was
never used (commit 31db9f7c ("Btrfs: introduce BTRFS_IOC_SEND for
btrfs send/receive")).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ca5d2ba1

btrfs: drop extent_io_ops::set_range_writeback callback · 5cdc84bf