提交 · 14e46e04958df740c6c6a94849f176159a333f13 · openeuler / Kernel

22 1月, 2016 1 次提交

btrfs: synchronize incompat feature bits with sysfs files · 14e46e04

由 David Sterba 提交于 1月 21, 2016

The files under /sys/fs/UUID/features get out of sync with the actual
incompat bits set for the filesystem if they change after mount (eg. the
LZO compression).

Synchronize the feature bits with the sysfs files representing them
right after we set/clear them.
Signed-off-by: NDavid Sterba <dsterba@suse.com>

14e46e04

07 1月, 2016 3 次提交

btrfs: statfs: report zero available if metadata are exhausted · ca8a51b3

由 David Sterba 提交于 10月 10, 2015

There is one ENOSPC case that's very confusing. There's Available
greater than zero but no file operation succeds (besides removing
files). This happens when the metadata are exhausted and there's no
possibility to allocate another chunk.

In this scenario it's normal that there's still some space in the data
chunk and the calculation in df reflects that in the Avail value.

To at least give some clue about the ENOSPC situation, let statfs report
zero value in Avail, even if there's still data space available.

Current:
  /dev/sdb1             4.0G  3.3G  719M  83% /mnt/test

New:
  /dev/sdb1             4.0G  3.3G     0 100% /mnt/test

We calculate the remaining metadata space minus global reserve. If this
is (supposedly) smaller than zero, there's no space. But this does not
hold in practice, the exhausted state happens where's still some
positive delta. So we apply some guesswork and compare the delta to a 4M
threshold. (Practically observed delta was 2M.)

We probably cannot calculate the exact threshold value because this
depends on the internal reservations requested by various operations, so
some operations that consume a few metadata will succeed even if the
Avail is zero. But this is better than the other way around.
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ca8a51b3

btrfs: constify static arrays · 4d4ab6d6

由 David Sterba 提交于 11月 19, 2015

There are a few statically initialized arrays that can be made const.
The remaining (like file_system_type, sysfs attributes or prop handlers)
do not allow that due to type mismatch when passed to the APIs or
because the structures are modified through other members.
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4d4ab6d6

Btrfs: use linux/sizes.h to represent constants · ee22184b

由 Byongho Lee 提交于 12月 15, 2015

We use many constants to represent size and offset value.  And to make
code readable we use '256 * 1024 * 1024' instead of '268435456' to
represent '256MB'.  However we can make far more readable with 'SZ_256MB'
which is defined in the 'linux/sizes.h'.

So this patch replaces 'xxx * 1024 * 1024' kind of expression with
single 'SZ_xxxMB' if 'xxx' is a power of 2 then 'xxx * SZ_1M' if 'xxx' is
not a power of 2. And I haven't touched to '4096' & '8192' because it's
more intuitive than 'SZ_4KB' & 'SZ_8KB'.
Signed-off-by: NByongho Lee <bhlee.kernel@gmail.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ee22184b

18 12月, 2015 2 次提交

Btrfs: add free space tree mount option · 70f6d82e

由 Omar Sandoval 提交于 9月 29, 2015

Now we can finally hook up everything so we can actually use free space
tree. The free space tree is enabled by passing the space_cache=v2 mount
option. On the first mount with the this option set, the free space tree
will be created and the FREE_SPACE_TREE read-only compat bit will be
set. Any time the filesystem is mounted from then on, we must use the
free space tree. The clear_cache option will also clear the free space
tree.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

70f6d82e

Btrfs: add free space tree sanity tests · 7c55ee0c

由 Omar Sandoval 提交于 9月 29, 2015

This tests the operations on the free space tree trying to excercise all
of the main cases for both formats. Between this and xfstests, the free
space tree should have pretty good coverage.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

7c55ee0c

22 10月, 2015 1 次提交

Btrfs: add fragment=* debug mount option · d0bd4560

由 Josef Bacik 提交于 9月 23, 2015

In tracking down these weird bitmap problems it was helpful to artificially
create an extremely fragmented file system.  These mount options let us either
fragment data or metadata or both.  With these options I could reproduce all
sorts of weird latencies and hangs that occur under extreme fragmentation and
get them fixed.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

d0bd4560

29 9月, 2015 1 次提交

Btrfs: __btrfs_std_error() logic should be consistent w/out CONFIG_PRINTK defined · 57d816a1

由 Anand Jain 提交于 8月 14, 2015

error handling logic behaves differently with or without
CONFIG_PRINTK defined, since there are two copies of the same
function which a bit of different logic

One, when CONFIG_PRINTK is defined, code is

__btrfs_std_error(..)
{
::
       save_error_info(fs_info);
       if (sb->s_flags & MS_BORN)
               btrfs_handle_error(fs_info);
}

and two when CONFIG_PRINTK is not defined, the code is

__btrfs_std_error(..)
{
::
       if (sb->s_flags & MS_BORN) {
               save_error_info(fs_info);
               btrfs_handle_error(fs_info);
        }
}

I doubt if this was intentional ? and appear to have caused since
we maintain two copies of the same function and they got diverged
with commits.

Now to decide which logic is correct reviewed changes as below,

 533574c6
Commit added two copies of this function

 cf79ffb5
Commit made change to only one copy of the function and to the
copy when CONFIG_PRINTK is defined.

To fix this, instead of maintaining two copies of same function
approach, maintain single function, and just put the extra
portion of the code under CONFIG_PRINTK define.

This patch just does that. And keeps code of with CONFIG_PRINTK
defined.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

57d816a1

10 9月, 2015 1 次提交

Btrfs: remove unnecessary locking of cleaner_mutex to avoid deadlock · 85e0a0f2

由 Filipe Manana 提交于 9月 10, 2015

After commmit e44163e1 ("btrfs: explictly delete unused block groups
in close_ctree and ro-remount"), added in the 4.3 merge window, we have
calls to btrfs_delete_unused_bgs() while holding the cleaner_mutex.
This can cause a deadlock with a concurrent block group relocation (when
a filesystem balance or shrink operation is in progress for example)
because btrfs_delete_unused_bgs() locks delete_unused_bgs_mutex and the
relocation path locks first delete_unused_bgs_mutex and then it locks
cleaner_mutex, resulting in a classic ABBA deadlock:

CPU 0 CPU 1

lock fs_info->cleaner_mutex

__btrfs_balance() || btrfs_shrink_device()
lock fs_info->delete_unused_bgs_mutex
btrfs_relocate_chunk()
btrfs_relocate_block_group()
lock fs_info->cleaner_mutex
btrfs_delete_unused_bgs()
lock fs_info->delete_unused_bgs_mutex

Fix this by not taking the cleaner_mutex before calling
btrfs_delete_unused_bgs() because it's no longer needed after
commit 67c5e7d4 ("Btrfs: fix race between balance and unused block
group deletion"). The mutex fs_info->delete_unused_bgs_mutex, the
spinlock fs_info->unused_bgs_lock and a block group's spinlock are
enough to get correct serialization between tasks running relocation
and unused block group deletion (as well as between multiple tasks
concurrently calling btrfs_delete_unused_bgs()).

This issue was discussed (in the mailing list) during the review of
the patch titled "btrfs: explictly delete unused block groups in
close_ctree and ro-remount" and it was agreed that acquiring the
cleaner mutex had to be dropped after the patch titled
"Btrfs: fix race between balance and unused block group deletion"
got merged (both patches were submitted at about the same time, but
one landed in kernel 4.2 and the other in the 4.3 merge window).
Signed-off-by: NFilipe Manana <fdmanana@suse.com>

85e0a0f2

09 8月, 2015 1 次提交

Btrfs: add support for blkio controllers · da2f0f74

由 Chris Mason 提交于 7月 02, 2015

This attaches accounting information to bios as we submit them so the
new blkio controllers can throttle on btrfs filesystems.

Not much is required, we're just associating bios with blkcgs during clone,
calling wbc_init_bio()/wbc_account_io() during writepages submission,
and attaching the bios to the current context during direct IO.

Finally if we are splitting bios during btrfs_map_bio, this attaches
accounting information to the split.

The end result is able to throttle nicely on single disk filesystems.  A
little more work is required for multi-device filesystems.
Signed-off-by: NChris Mason <clm@fb.com>

da2f0f74

06 8月, 2015 1 次提交

char: make misc_deregister a void function · f368ed60

由 Greg Kroah-Hartman 提交于 7月 30, 2015

With well over 200+ users of this api, there are a mere 12 users that
actually checked the return value of this function.  And all of them
really didn't do anything with that information as the system or module
was shutting down no matter what.

So stop pretending like it matters, and just return void from
misc_deregister().  If something goes wrong in the call, you will get a
WARNING splat in the syslog so you know how to fix up your driver.
Other than that, there's nothing that can go wrong.

Cc: Alasdair Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: NJoel Becker <jlbec@evilplan.org>
Acked-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: NAlessandro Zummo <a.zummo@towertech.it>
Acked-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

f368ed60

29 7月, 2015 2 次提交

btrfs: add missing discards when unpinning extents with -o discard · e33e17ee

由 Jeff Mahoney 提交于 6月 15, 2015

When we clear the dirty bits in btrfs_delete_unused_bgs for extents
in the empty block group, it results in btrfs_finish_extent_commit being
unable to discard the freed extents.

The block group removal patch added an alternate path to forget extents
other than btrfs_finish_extent_commit.  As a result, any extents that
would be freed when the block group is removed aren't discarded.  In my
test run, with a large copy of mixed sized files followed by removal, it
left nearly 2/3 of extents undiscarded.

To clean up the block groups, we add the removed block group onto a list
that will be discarded after transaction commit.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NFilipe Manana <fdmanana@suse.com>
Tested-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

e33e17ee

btrfs: explictly delete unused block groups in close_ctree and ro-remount · e44163e1

由 Jeff Mahoney 提交于 6月 15, 2015

The cleaner thread may already be sleeping by the time we enter
close_ctree.  If that's the case, we'll skip removing any unused
block groups queued for removal, even during a normal umount.
They'll be cleaned up automatically at next mount, but users
expect a umount to be a clean synchronization point, especially
when used on thin-provisioned storage with -odiscard.  We also
explicitly remove unused block groups in the ro-remount path
for the same reason.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NFilipe Manana <fdmanana@suse.com>
Tested-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

e44163e1

03 6月, 2015 8 次提交

Btrfs: show subvol= and subvolid= in /proc/mounts · c8d3fe02

由 Omar Sandoval 提交于 5月 18, 2015

Now that we're guaranteed to have a meaningful root dentry, we can just
export seq_dentry() and use it in btrfs_show_options(). The subvolume ID
is easy to get and can also be useful, so put that in there, too.
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NOmar Sandoval <osandov@osandov.com>
Signed-off-by: NChris Mason <clm@fb.com>

c8d3fe02

Btrfs: unify subvol= and subvolid= mounting · 05dbe683

由 Omar Sandoval 提交于 5月 18, 2015

Currently, mounting a subvolume with subvolid= takes a different code
path than mounting with subvol=. This isn't really a big deal except for
the fact that mounts done with subvolid= or the default subvolume don't
have a dentry that's connected to the dentry tree like in the subvol=
case. To unify the code paths, when given subvolid= or using the default
subvolume ID, translate it into a subvolume name by walking
ROOT_BACKREFs in the root tree and INODE_REFs in the filesystem trees.
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NOmar Sandoval <osandov@osandov.com>
Signed-off-by: NChris Mason <clm@fb.com>

05dbe683

Btrfs: fail on mismatched subvol and subvolid mount options · bb289b7b

由 Omar Sandoval 提交于 5月 18, 2015

There's nothing to stop a user from passing both subvol= and subvolid=
to mount, but if they don't refer to the same subvolume, someone is
going to be surprised at some point. Error out on this case, but allow
users to pass in both if they do match (which they could, for example,
get out of /proc/mounts).
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NOmar Sandoval <osandov@osandov.com>
Signed-off-by: NChris Mason <clm@fb.com>

bb289b7b

Btrfs: clean up error handling in mount_subvol() · fa330659

由 Omar Sandoval 提交于 5月 18, 2015

In preparation for new functionality in mount_subvol(), give it
ownership of subvol_name and tidy up the error paths.
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NOmar Sandoval <osandov@osandov.com>
Signed-off-by: NChris Mason <clm@fb.com>

fa330659

Btrfs: remove all subvol options before mounting top-level · e6e4dbe8

由 Omar Sandoval 提交于 5月 18, 2015

Currently, setup_root_args() substitutes 's/subvol=[^,]*/subvolid=0/'.
But, this means that if the user passes both a subvol and subvolid for
some reason, we won't actually mount the top-level when we recursively
mount. For example, consider:

mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt
btrfs subvol create /mnt/subvol1 # subvolid=257
btrfs subvol create /mnt/subvol2 # subvolid=258
umount /mnt
mount -osubvol=/subvol1,subvolid=258 /dev/sdb /mnt

In the final mount, subvol=/subvol1,subvolid=258 becomes
subvolid=0,subvolid=258, and the last option takes precedence, so we
mount subvol2 and try to look up subvol1 inside of it, which fails.

So, instead, do a thorough scan through the argument list and remove any
subvol= and subvolid= options, then append subvolid=0 to the end. This
implicitly makes subvol= take precedence over subvolid=, but we're about
to add a stricter check for that. This also makes setup_root_args() more
generic, which we'll need soon.
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NOmar Sandoval <osandov@osandov.com>
Signed-off-by: NChris Mason <clm@fb.com>

e6e4dbe8

Btrfs: lock superblock before remounting for rw subvol · 773cd04e

由 Omar Sandoval 提交于 5月 18, 2015

Since commit 0723a047 ("btrfs: allow mounting btrfs subvolumes with
different ro/rw options"), when mounting a subvolume read/write when
another subvolume has previously been mounted read-only, we first do a
remount. However, this should be done with the superblock locked, as per
sync_filesystem():

	/*
	 * We need to be protected against the filesystem going from
	 * r/o to r/w or vice versa.
	 */
	WARN_ON(!rwsem_is_locked(&sb->s_umount));

This WARN_ON can easily be hit with:

mkfs.btrfs -f /dev/vdb
mount /dev/vdb /mnt
btrfs subvol create /mnt/vol1
btrfs subvol create /mnt/vol2
umount /mnt
mount -oro,subvol=/vol1 /dev/vdb /mnt
mount -orw,subvol=/vol2 /dev/vdb /mnt2

Fixes: 0723a047 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options")
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NOmar Sandoval <osandov@osandov.com>
Signed-off-by: NChris Mason <clm@fb.com>

773cd04e

btrfs: add 'cold' compiler annotations to all error handling functions · c0d19e2b

由 David Sterba 提交于 4月 24, 2015

The annotated functios will be placed into .text.unlikely section. The
annotation also hints compiler to move the code out of the hot paths,
and may implicitly mark if-statement leading to that block as unlikely.

This is a heuristic, the impact on the generated code is not
significant.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

c0d19e2b

btrfs: report exact callsite where transaction abort occurs · 1a9a8a71

由 David Sterba 提交于 4月 24, 2015

WARN is called from a single location and all bugreports say that's in
super.c __btrfs_abort_transaction. This is slightly confusing as we'd
rather want to know the exact callsite. Whereas this information is
printed in the syslog below the stacktrace, this requires further look
and we usually see only the headline from WARNING.

Moving the WARN into the macro has to inline some code and increases
code by a few kilobytes:

  text    data     bss     dec     hex filename
835481   20305   14120  869906   d4612 btrfs.ko.before
842883   20305   14120  877308   d62fc btrfs.ko.after

The delta is +7k (130+ calls), measured on 3.19 x86_64, distro config.
The increase is not small and could lead to worse icache use. The code
is on error/exit paths that can be recognized by compiler as cold and
moved out of the way so the impact is speculated to be low, if
measurable at all.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

1a9a8a71

16 4月, 2015 1 次提交

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

由 David Howells 提交于 3月 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2b0143b5

27 3月, 2015 2 次提交

btrfs: cleanup orphans while looking up default subvolume · 727b9784

由 Jeff Mahoney 提交于 3月 20, 2015

Orphans in the fs tree are cleaned up via open_ctree and subvolume
orphans are cleaned via btrfs_lookup_dentry -- except when a default
subvolume is in use.  The name for the default subvolume uses a manual
lookup that doesn't trigger orphan cleanup and needs to trigger it
manually as well. This doesn't apply to the remount case since the
subvolumes are cleaned up by walking the root radix tree.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

727b9784

btrfs: explicitly set control file's private_data · d8620958

由 Tom Van Braeckel 提交于 3月 24, 2015

The private_data member of the Btrfs control device file
(/dev/btrfs-control) is used to hold the current transaction and needs
to be initialized to NULL to signify that no transaction is in progress.

We explicitly set the control file's private_data to NULL to be
independent of whatever value the misc subsystem initializes it to.

Backstory:
----------

The misc subsystem (which is used by /dev/btrfs-control) initializes
a file's private_data to point to the misc device when a driver has
registered a custom open file operation and initializes it to NULL
when a custom open file operation has *not* been provided.

This subtle quirk is confusing, to the point where kernel code registers
*empty* file open operations to have private_data point to the misc
device structure.

And it leads to bugs, where the addition or removal of a custom open
file operation surprisingly changes the initial contents of a file's
private_data structure.

To simplify things in the misc subsystem, a patch [1] has been proposed
to *always* set private_data to point to the misc device instead of
only doing this when a custom open file operation has been registered.

But before we can fix this in the misc subsystem itself, we need to
modify the (few) drivers that rely on this very subtle behavior.

[1] https://lkml.org/lkml/2014/12/4/939Signed-off-by: NMartin Kepplinger <martink@posteo.de>
Signed-off-by: NTom Van Braeckel <tomvanbraeckel@gmail.com>
Signed-off-by: NChris Mason <clm@fb.com>

d8620958

04 3月, 2015 1 次提交

btrfs: cleanup 64bit/32bit divs, compile time constants · f8c269d7

由 David Sterba 提交于 1月 16, 2015

Switch to div_u64 if the divisor is a numeric constant or sum of
sizeof()s. We can remove a few instances of do_div that has the hidden
semtantics of changing the 1st argument.

Small power-of-two divisors are converted to bitshifts, large values are
kept intact for clarity.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

f8c269d7

21 2月, 2015 1 次提交

btrfs: cleanup 64bit/32bit divs, compile time constants · 16068ec1

由 David Sterba 提交于 1月 16, 2015

Switch to div_u64 if the divisor is a numeric constant or sum of
sizeof()s. We can remove a few instances of do_div that has the hidden
semtantics of changing the 1st argument.

Small power-of-two divisors are converted to bitshifts, large values are
kept intact for clarity.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

16068ec1

22 1月, 2015 1 次提交
- D
  btrfs: remove a no-op unfreeze superbock callback · 730a78c7
  由 David Sterba 提交于 1月 20, 2015
```
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>
```
  730a78c7
21 1月, 2015 1 次提交

btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. · a53f4f8e

由 Qu Wenruo 提交于 1月 19, 2015

Commit 6b5fe46d (btrfs: do commit in sync_fs if there are pending
changes) will call btrfs_start_transaction() in sync_fs(), to handle
some operations needed to be done in next transaction.

However this can cause deadlock if the filesystem is frozen, with the
following sys_r+w output:
[  143.255932] Call Trace:
[  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
[  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
[  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0
[btrfs]
[  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20
[btrfs]
[  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
[  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
[  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
[  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
[  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
[  143.256111] Call Trace:
[  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
[  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
[  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
[  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
[  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
[  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
[  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17

The reason is like the following:
(Holding s_umount)
VFS sync_fs staff:
|- btrfs_sync_fs()
   |- btrfs_start_transaction()
      |- sb_start_intwrite()
      (Waiting thaw_fs to unfreeze)
					VFS thaw_fs staff:
					thaw_fs()
					(Waiting sync_fs to release
					 s_umount)

So deadlock happens.
This can be easily triggered by fstest/generic/068 with inode_cache
mount option.

The fix is to check if the fs is frozen, if the fs is frozen, just
return and waiting for the next transaction.

Cc: David Sterba <dsterba@suse.cz>
Reported-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
[enhanced comment, changed to SB_FREEZE_WRITE]
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

a53f4f8e

20 1月, 2015 1 次提交

btrfs: sync ioctl, handle errors after transaction start · 98bd5c54

由 David Sterba 提交于 1月 19, 2015

The version merged to 3.19 did not handle errors from start_trancaction
and could pass an invalid pointer to commit_transaction.

Fixes: 6b5fe46d ("btrfs: do commit in sync_fs if there are pending changes")
Reported-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

98bd5c54

03 12月, 2014 1 次提交

Btrfs: make btrfs_abort_transaction consider existence of new block groups · c92f6be3

由 Filipe Manana 提交于 11月 26, 2014

If the transaction handle doesn't have used blocks but has created new block
groups make sure we turn the fs into readonly mode too. This is because the
new block groups didn't get all their metadata persisted into the chunk and
device trees, and therefore if a subsequent transaction starts, allocates
space from the new block groups, writes data or metadata into that space,
commits successfully and then after we unmount and mount the filesystem
again, the same space can be allocated again for a new block group,
resulting in file data or metadata corruption.

Example where we don't abort the transaction when we fail to finish the
chunk allocation (add items to the chunk and device trees) and later a
future transaction where the block group is removed fails because it can't
find the chunk item in the chunk tree:

[25230.404300] WARNING: CPU: 0 PID: 7721 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x50/0xfc [btrfs]()
[25230.404301] BTRFS: Transaction aborted (error -28)
[25230.404302] Modules linked in: btrfs dm_flakey nls_utf8 fuse xor raid6_pq ntfs vfat msdos fat xfs crc32c_generic libcrc32c ext3 jbd ext2 dm_mod nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse i2c_piix4 i2ccore parport_pc parport processor button pcspkr serio_raw thermal_sys evdev microcode ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common virtio_scsi floppy e1000 ata_piix libata virtio_pci virtio_ring scsi_mod virtio [last unloaded: btrfs]
[25230.404325] CPU: 0 PID: 7721 Comm: xfs_io Not tainted 3.17.0-rc5-btrfs-next-1+ #1
[25230.404326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[25230.404328]  0000000000000000 ffff88004581bb08 ffffffff813e7a13 ffff88004581bb50
[25230.404330]  ffff88004581bb40 ffffffff810423aa ffffffffa049386a 00000000ffffffe4
[25230.404332]  ffffffffa05214c0 000000000000240c ffff88010fc8f800 ffff88004581bba8
[25230.404334] Call Trace:
[25230.404338]  [<ffffffff813e7a13>] dump_stack+0x45/0x56
[25230.404342]  [<ffffffff810423aa>] warn_slowpath_common+0x7f/0x98
[25230.404351]  [<ffffffffa049386a>] ? __btrfs_abort_transaction+0x50/0xfc [btrfs]
[25230.404353]  [<ffffffff8104240b>] warn_slowpath_fmt+0x48/0x50
[25230.404362]  [<ffffffffa049386a>] __btrfs_abort_transaction+0x50/0xfc [btrfs]
[25230.404374]  [<ffffffffa04a8c43>] btrfs_create_pending_block_groups+0x10c/0x135 [btrfs]
[25230.404387]  [<ffffffffa04b77fd>] __btrfs_end_transaction+0x7e/0x2de [btrfs]
[25230.404398]  [<ffffffffa04b7a6d>] btrfs_end_transaction+0x10/0x12 [btrfs]
[25230.404408]  [<ffffffffa04a3d64>] btrfs_check_data_free_space+0x111/0x1f0 [btrfs]
[25230.404421]  [<ffffffffa04c53bd>] __btrfs_buffered_write+0x160/0x48d [btrfs]
[25230.404425]  [<ffffffff811a9268>] ? cap_inode_need_killpriv+0x2d/0x37
[25230.404429]  [<ffffffff810f6501>] ? get_page+0x1a/0x2b
[25230.404441]  [<ffffffffa04c7c95>] btrfs_file_write_iter+0x321/0x42f [btrfs]
[25230.404443]  [<ffffffff8110f5d9>] ? handle_mm_fault+0x7f3/0x846
[25230.404446]  [<ffffffff813e98c5>] ? mutex_unlock+0x16/0x18
[25230.404449]  [<ffffffff81138d68>] new_sync_write+0x7c/0xa0
[25230.404450]  [<ffffffff81139401>] vfs_write+0xb0/0x112
[25230.404452]  [<ffffffff81139c9d>] SyS_pwrite64+0x66/0x84
[25230.404454]  [<ffffffff813ebf52>] system_call_fastpath+0x16/0x1b
[25230.404455] ---[ end trace 5aa5684fdf47ab38 ]---
[25230.404458] BTRFS warning (device sdc): btrfs_create_pending_block_groups:9228: Aborting unused transaction(No space left).
[25288.084814] BTRFS: error (device sdc) in btrfs_free_chunk:2509: errno=-2 No such entry (Failed lookup while freeing chunk.)
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

c92f6be3

21 11月, 2014 2 次提交

btrfs: fix wrong accounting of raid1 data profile in statfs · 0d95c1be

由 David Sterba 提交于 11月 14, 2014

The sizes that are obtained from space infos are in raw units and have
to be adjusted according to the raid factor. This was missing for
f_bavail and df reported doubled size for raid1.
Reported-by: NMartin Steigerwald <Martin@lichtvoll.de>
Fixes: ba7b6e62 ("btrfs: adjust statfs calculations according to raid profiles")
CC: stable@vger.kernel.org
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

0d95c1be

Btrfs: don't take the chunk_mutex/dev_list mutex in statfs V2 · 7e33fd99

由 Josef Bacik 提交于 11月 03, 2014

Our gluster boxes get several thousand statfs() calls per second, which begins
to suck hardcore with all of the lock contention on the chunk mutex and dev list
mutex.  We don't really need to hold these things, if we have transient
weirdness with statfs() because of the chunk allocator we don't care, so remove
this locking.

We still need the dev_list lock if you mount with -o alloc_start however, which
is a good argument for nuking that thing from orbit, but that's a patch for
another day.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

7e33fd99

12 11月, 2014 2 次提交

btrfs: switch inode_cache option handling to pending changes · 7e1876ac

由 David Sterba 提交于 2月 05, 2014

The pending mount option(s) now share namespace and bits with the normal
options, and the existing one for (inode_cache) is unset unconditionally
at each transaction commit.

Introduce a separate namespace for pending changes and enhance the
descriptions of the intended change to use separate bits for each
action.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

7e1876ac

btrfs: do commit in sync_fs if there are pending changes · 6b5fe46d

由 David Sterba 提交于 3月 28, 2014

If a pending change is requested, it's not processed unless there is a
transaction commit about to happen, not even after sync or SYNC_FS
ioctl. For example a remount that toggles the inode_cache option will
not take effect after sync on a quiescent filesystem.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

6b5fe46d

28 10月, 2014 1 次提交

Btrfs: properly clean up btrfs_end_io_wq_cache · 5ed5f588

由 Josef Bacik 提交于 10月 15, 2014

In one of Dave's cleanup commits he forgot to call btrfs_end_io_wq_exit on
unload, which makes us unable to unload and then re-load the btrfs module.  This
fixes the problem.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

5ed5f588

08 10月, 2014 1 次提交

btrfs: Fix compile error when CONFIG_SECURITY is not set. · a43bb39b

由 Qu Wenruo 提交于 10月 08, 2014

Fix the following compile error when CONFIG_SECURITY is not set:

error: 'struct security_mnt_opts' has no member named 'num_mnt_opts'
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

a43bb39b

06 10月, 2014 1 次提交

btrfs: Make btrfs handle security mount options internally to avoid losing security label. · f667aef6

由 Qu Wenruo 提交于 9月 23, 2014

[BUG]
Originally when mount btrfs with "-o subvol=" mount option, btrfs will
lose all security lable.
And if the btrfs fs is mounted somewhere else, due to the lost of
security lable, SELinux will refuse to mount since the same super block
is being mounted using different security lable.

[REPRODUCER]
With SELinux enabled:
 #mkfs -t btrfs /dev/sda5
 #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
 #btrfs subvolume create /mnt/btrfs/subvol
 #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5
  /mnt/test

kernel message:
SELinux: mount invalid.  Same superblock, different security settings
for (dev sda5, type btrfs)

[REASON]
This happens because btrfs will call vfs_kern_mount() and then
mount_subtree() to handle subvolume name lookup.
First mount will cut off all the security lables and when it comes to
the second vfs_kern_mount(), it has no security label now.

[FIX]
This patch will makes btrfs behavior much more like nfs,
which has the type flag FS_BINARY_MOUNTDATA,
making btrfs handles the security label internally.
So security label will be set in the real mount time and won't lose
label when use with "subvol=" mount option.
Reported-by: NEryu Guan <guaneryu@gmail.com>
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

f667aef6

02 10月, 2014 3 次提交

btrfs: use slab for end_io_wq structures · 97eb6b69

由 David Sterba 提交于 7月 30, 2014

The structure is frequently reused.  Rename it according to the slab
name.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

97eb6b69

btrfs: fix error labels in init_btrfs_fs · af13b492

由 David Sterba 提交于 7月 30, 2014

btrfs_interface_init rarely fails but we could leak the prelim_ref slab.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

af13b492

Btrfs: set default max_inline to 8KiB instead of 8MiB · 95ac567a

由 Filipe David Borba Manana 提交于 8月 08, 2013

8MiB is way too large and likely set by mistake. This is not
a significant issue as in practice the max amount of data
added to an inline extent is also limited by the page cache
and btree leaf sizes.
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

95ac567a

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功