Commits · b8dae3138876080d4dd98cc438ff759338d632ef · openeuler / Kernel

01 Mar, 2013 · 8 commits

btrfs: use only inline_pages from extent buffer · b8dae313

Committed by David Sterba on Feb 28, 2013

The nodesize is capped at 64k and there are enough pages preallocated in
extent_buffer::inline_pages. The fallback to kmalloc never happened
because even on the smallest page size considered (4k) inline_pages
covered the needs.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
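
The arithmetic behind that claim can be sketched as follows. The macro and function names are illustrative assumptions, not the kernel's; only the 64k cap and the 4k page size come from the commit message.

```c
#include <stddef.h>

/* Limits quoted in the commit message; names are hypothetical. */
#define MAX_NODESIZE   (64u * 1024u)   /* nodesize cap */
#define MIN_PAGE_SIZE  (4u * 1024u)    /* smallest page size considered */
#define INLINE_PAGES   (MAX_NODESIZE / MIN_PAGE_SIZE)

/* Number of pages needed to back one extent buffer of `nodesize` bytes. */
static size_t pages_needed(size_t nodesize, size_t page_size)
{
    return (nodesize + page_size - 1) / page_size;
}
```

Under these assumptions the worst case (64k nodes on 4k pages) needs exactly as many pages as the inline array holds, so a heap fallback is never reached.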

Btrfs: fix wrong reserved space when deleting a snapshot/subvolume · c58aaad2

Committed by Miao Xie on Feb 28, 2013

When deleting a snapshot/subvolume, we need to remove the root ref/backref
and dir entries and update the dir inode, so we must reserve free space
for those operations.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix wrong reserved space in qgroup during snap/subv creation · d5c12070

Committed by Miao Xie on Feb 28, 2013

There are two problems in the space reservation for snapshot/subvolume
creation:
- we don't reserve space for the root item insertion
- the space reserved in the qgroup differs from the free space
  reservation: we need to reserve free space for 7 items, but the
  qgroup reservation only needs space for 3 items.

So we implement new metadata reservation functions for
snapshot/subvolume creation.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: remove unnecessary dget_parent/dput when creating the pending snapshot · e9662f70

Committed by Miao Xie on Feb 28, 2013

Since we have grabbed the parent inode at the beginning of the
snapshot creation, and both sync and async snapshot creation
release it after the pending snapshots are actually created,
it is safe to access the parent inode directly during the snapshot
creation. We needn't use dget_parent/dput to fix the parent dentry
and get the dir inode.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

btrfs: remove a printk from scan_one_device · 2d8946c5

Committed by David Sterba on Feb 27, 2013

Dave pointed out that he saw messages from btrfs although there was no
such filesystem on his computers. The automatic device scan is called on
every new blockdevice if the usual distro udev rule set is used. The
printk introduced in 6f60cbd3 was a leftover from copying
portions of code from btrfs_get_bdev_and_sb, which is used under
different conditions where the warning makes sense.
Reported-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix NULL pointer after aborting a transaction · f094ac32

Committed by Liu Bo on Feb 27, 2013

While doing cleanup work on an aborted transaction, we set
the global running transaction pointer to NULL _before_ waiting for all
other transaction handles to finish, so others would hit a NULL pointer
crash when referencing the global running transaction pointer.

This patch first sets a hint to keep new transaction handles from joining,
then waits for the existing handles to abort or finish so that we can safely
set the above global pointer to NULL.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix memory leak of log roots · 3321719e

Committed by Liu Bo on Feb 27, 2013

When we abort a transaction while fsyncing, we skip the part of the
transaction commit that frees log roots, which leads to a memory leak.

This adds a 'free log roots' step when putting the super block, at which
point no users hold references on log roots, so it's safe and clean.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: copy everything if we've created an inline extent · bdc20e67

Committed by Josef Bacik on Feb 28, 2013

I noticed while looking into a tree logging bug that we aren't logging inline
extents properly.  Since this requires copying and it shouldn't happen too often
just force us to copy everything for the inode into the tree log when we have an
inline extent.  With this patch we have valid data after a crash when we write
an inline extent.  Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

27 Feb, 2013 · 5 commits

btrfs: cleanup for open-coded alignment · fda2832f

Committed by Qu Wenruo on Feb 26, 2013

Though most of the btrfs code uses the ALIGN macro for page alignment,
some places still use open-coded alignment like the
following:
------
        u64 mask = ((u64)root->stripesize - 1);
        u64 ret = (val + mask) & ~mask;
------
Or an even more hidden one:
------
        num_bytes = (end - start + blocksize) & ~(blocksize - 1);
------

Such open-coded alignment is sometimes not easy to understand for a
newbie like me.

This commit changes the open-coded alignment to the ALIGN macro for
better readability.

There is also a previous patch from David Sterba with similar changes,
but it targets the 3.2 kernel and does not seem to have been merged.
http://www.spinics.net/lists/linux-btrfs/msg12747.html

Cc: David Sterba <dave@jikos.cz>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
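
To make the equivalence concrete, here is a minimal userspace sketch. `ALIGN_UP` is a simplified stand-in for the kernel's ALIGN macro (an assumption, not its actual definition), and the two helper functions reproduce the open-coded forms quoted in the message.

```c
#include <stdint.h>

/* Simplified stand-in for ALIGN(); `a` must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))

/* First open-coded form quoted in the commit message. */
static uint64_t open_coded_align(uint64_t val, uint32_t stripesize)
{
    uint64_t mask = (uint64_t)stripesize - 1;
    return (val + mask) & ~mask;
}

/* "Hidden" form: length of the inclusive range [start, end], rounded up
 * to a multiple of blocksize. */
static uint64_t hidden_align(uint64_t start, uint64_t end, uint64_t blocksize)
{
    return (end - start + blocksize) & ~(blocksize - 1);
}
```

The hidden form is harder to spot because `end - start + blocksize` equals `len + (blocksize - 1)` for the inclusive length `len = end - start + 1`, which is exactly what the macro computes.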

Btrfs: do not change inode flags in rename · 8c4ce81e

Committed by Liu Bo on Feb 25, 2013

Previously we forcibly changed a file's NOCOW and COMPRESS flags to match
the parent directory's on rename, but this turned out to be a bad idea
because it confuses end users about a file's NOCOW status. E.g., if someone
sets a file to NOCOW via 'chattr' and then renames it within the current
directory, which lacks the NOCOW attribute, the file silently loses the
NOCOW flag.

This disables 'change flags in rename', so from now on we only
inherit flags from the parent directory at creation; elsewhere,
'chattr' can be used to set the NOCOW or COMPRESS flags.
Reported-by: Marios Titas <redneb8888@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: use reserved space for creating a snapshot · 2382c5cc

Committed by Liu Bo on Feb 22, 2013

While inserting the dir index and updating the inode for a snapshot, we
add delayed items which consume trans->block_rsv. If we don't have
any space reserved in this trans handle, we either just return or
reserve space again.

But before creating pending snapshots while committing the transaction,
we've done a release on this trans handle, so we don't have space reserved
in it at this stage.

What we're using is the block_rsv of the pending snapshots, which has already
reserved enough space for both inserting the dir index and updating the
inode, so we need to set the trans handle to indicate that we have space
now.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

clear chunk_alloc flag on retryable failure · a81cb9a2

Committed by Alexandre Oliva on Feb 21, 2013

I've experienced filesystem freezes with permanent spikes in the active
process count for quite a while, particularly on filesystems whose
available raw space has already been fully allocated to chunks.

While looking into this, I found a pretty obvious error in
do_chunk_alloc: it sets space_info->chunk_alloc, but if
btrfs_alloc_chunk returns an error other than ENOSPC, it returns leaving
that flag set, which causes any other threads waiting for
space_info->chunk_alloc to become zero to spin indefinitely.

I haven't double-checked that this patch fixes the failure I've observed
fully (it's not exactly trivial to trigger), but it surely is a bug and
the fix is trivial, so... Please put it in :-)

What I saw in that function also happens to explain why in some cases I
see filesystems allocate a huge number of chunks that remain unused
(leading to the scenario above, of not having more chunks to allocate).
It happens for data and metadata, but not necessarily both. I'm
guessing some thread sets the force_alloc flag on the corresponding
space_info, and then several threads trying to get disk space end up
attempting to allocate a new chunk concurrently. All of them will see
the force_alloc flag and bump their local copy of force up to the level
they see first, and they won't clear it even if another thread succeeds
in allocating a chunk, thus clearing the force flag. Then each thread
that observed the force flag will, on its turn, force the allocation of
a new chunk. And any threads that come in while it does that will see
the force flag still set and pick it up, and so on. This sounds like a
problem to me, but... what should the correct behavior be? Clear
force_flag once we copy it to a local force? Reset force to the
incoming value on every loop? Set the flag to our incoming force if we
have it at first, clear our local flag, and move it from the space_info
when we determined that we are the thread that's going to perform the
allocation?

btrfs: clear chunk_alloc flag on retryable failure

From: Alexandre Oliva <oliva@gnu.org>

If btrfs_alloc_chunk fails with e.g. ENOMEM, we exit do_chunk_alloc
without clearing chunk_alloc in space_info. As a result, any further
calls to do_chunk_alloc on that filesystem will start busy-waiting for
chunk_alloc to be cleared, but it never will be. This patch adjusts
do_chunk_alloc so that it clears this flag in case of an error.
Signed-off-by: Alexandre Oliva <oliva@gnu.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
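
The shape of the fix can be sketched in plain C. This is a single-threaded illustration under stated assumptions: `space_info_sketch`, `do_chunk_alloc_sketch` and the stub allocators are hypothetical, and the locking and force_alloc handling of the real function are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the space_info state. */
struct space_info_sketch {
    bool chunk_alloc;   /* set while one caller is allocating a chunk */
    bool full;          /* set once no raw space is left */
};

/* Caller-supplied allocator: returns 0 or a negative errno. */
typedef int (*alloc_fn)(void);

/* Stub allocators used to exercise the sketch. */
static int alloc_ok(void)     { return 0; }
static int alloc_enomem(void) { return -ENOMEM; }
static int alloc_enospc(void) { return -ENOSPC; }

static int do_chunk_alloc_sketch(struct space_info_sketch *si, alloc_fn alloc)
{
    int ret;

    si->chunk_alloc = true;
    ret = alloc();
    if (ret == -ENOSPC)
        si->full = true;
    /* The point of the fix: the flag is cleared on every exit path,
     * including errors other than ENOSPC. */
    si->chunk_alloc = false;
    return ret;
}
```

With the flag cleared unconditionally, a waiter polling `chunk_alloc` can always make progress after a failed attempt.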

Btrfs: fix backref walking race with tree deletions · ca60ebfa

Committed by Jan Schmidt on Feb 21, 2013

When a subvolume is removed, we remove the root item from the root tree,
while the tree blocks and backrefs remain for a while. When backref walking
comes across one of those orphan tree blocks, it can find a backref for a
no longer existing root. This is all fine; we only have to tolerate
__resolve_indirect_ref returning an error and continue with the good refs
found.
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
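
A minimal sketch of the "tolerate and continue" pattern described above. All names here are hypothetical; the real resolver works on btree paths rather than a flag.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ref record: root_exists == 0 models a deleted subvolume. */
struct ref_sketch {
    int root_exists;
};

/* Stand-in for the resolve step; fails when the root is gone. */
static int resolve_ref_sketch(const struct ref_sketch *r)
{
    return r->root_exists ? 0 : -ENOENT;
}

/* Returns the number of refs resolved. A failing ref is skipped
 * instead of aborting the whole walk. */
static size_t resolve_all_sketch(const struct ref_sketch *refs, size_t n)
{
    size_t ok = 0;

    for (size_t i = 0; i < n; i++) {
        if (resolve_ref_sketch(&refs[i]) < 0)
            continue;   /* orphan backref: tolerate and move on */
        ok++;
    }
    return ok;
}
```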

26 Feb, 2013 · 1 commit

Btrfs: make sure NODATACOW also gets NODATASUM set · f2bdf9a8

Committed by Josef Bacik on Feb 21, 2013

A user reported hitting the BUG_ON() in btrfs_finished_ordered_io() where we had
csums on a NOCOW extent. This can happen if we have NODATACOW set but not
NODATASUM set, which can happen in two cases, either we mount with -o nodatacow
and then write into preallocated space, or chattr +C a directory and move a file
into that directory. Liu has fixed the move case in a different place, but this
fixes the mount -o nodatacow case. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

21 Feb, 2013 · 26 commits

Btrfs: fix remount vs autodefrag · dc81cdc5

Committed by Miao Xie on Feb 20, 2013

If we remount the fs to turn off auto defragment or to make the fs R/O,
we should stop the auto defragment.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix wrong outstanding_extents when doing DIO write · 172a5049

Committed by Miao Xie on Feb 21, 2013

When running xfstests case 083 on a filesystem mounted with
"compress-force=lzo", the following WARNINGs were triggered:
  WARNING: at fs/btrfs/inode.c:7908
  WARNING: at fs/btrfs/inode.c:7909
  WARNING: at fs/btrfs/inode.c:7911
  WARNING: at fs/btrfs/extent-tree.c:4510
  WARNING: at fs/btrfs/extent-tree.c:4511

This problem was introduced by the patch "Btrfs: fix deadlock due
to unsubmitted". That patch contains two bugs which caused
the above problem.

The 1st one is an off-by-one bug: if the DIO write returns 0, it is
also a short write and we need to release the reserved space for it, but
that patch didn't do so. Fix it by changing "ret > 0" to
"ret >= 0".

The 2nd one is that ->outstanding_extents was increased twice when
a short write happened. As we know, ->outstanding_extents is
a counter that keeps track of the number of extent items we may
use due to delalloc. When we reserve the free space for a
delalloc write, we assume that the write will introduce just
one extent item, so we increase ->outstanding_extents by 1 at
that time. We then increase it every time we split the
write, which is done at the beginning of btrfs_get_blocks_direct().
So when a short write happens, we needn't increase
->outstanding_extents again, but that patch did.

To fix the 2nd problem, I rewrote the logic for the
->outstanding_extents operation. We don't increase it at the
beginning of btrfs_get_blocks_direct(); instead, we only
increase it when the split actually happens.
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
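
The off-by-one in the first bug can be illustrated with a small sketch. The function names are hypothetical; `ret` models the byte count returned by the DIO write, and the error path (negative `ret`) is assumed to be handled elsewhere and is not modeled.

```c
#include <stddef.h>

/* Bytes of reserved space to give back after a short write,
 * using the old "ret > 0" test. */
static size_t short_write_release_old(long ret, size_t count)
{
    if (ret > 0 && (size_t)ret < count)
        return count - (size_t)ret;
    return 0;
}

/* Same computation with the corrected "ret >= 0" test. */
static size_t short_write_release_new(long ret, size_t count)
{
    if (ret >= 0 && (size_t)ret < count)
        return count - (size_t)ret;
    return 0;
}
```

A write that returns 0 is the extreme short write: nothing was written, so the whole reservation has to be released, which only the second variant does.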

Btrfs: snapshot-aware defrag · 38c227d8

Committed by Liu Bo on Jan 29, 2013

This comes from one of btrfs's project ideas:
As we defragment files, we break any sharing from other snapshots.
The balancing code will preserve the sharing, and defrag needs to grow this
as well.

Now we're able to fill the gap with this patch, which makes full use of
the backref walking code.

Here is the basic idea:
o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
o  at endio, after we finish updating fs tree, we use backref walking to find
   all parents of the ranges and re-link them with the new COWed file layout by
   adding corresponding backrefs.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix max chunk size on raid5/6 · 86db2578

Committed by Chris Mason on Feb 20, 2013

We try to limit the size of a chunk to 10GB, which keeps the unit of
work reasonable during balance and resize operations.  The limit checks
were taking into account the number of copies of the data we had but
what they really should be doing is comparing against the logical
size of the chunk we're creating.

This moves the code around a little to use the count of data stripes
from raid5/6.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
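
One way to picture the corrected check is the sketch below. It is an illustration only: the names are hypothetical, the 10GB limit is the figure from the message, and the parity counts (one stripe for RAID5, two for RAID6) are general RAID facts rather than something stated in the commit.

```c
#include <stdint.h>

#define MAX_CHUNK_SIZE_SKETCH (10ULL << 30)   /* 10GB limit from the message */

/* Stripes that carry data: RAID5 spends 1 stripe on parity, RAID6 spends 2. */
static unsigned data_stripes_sketch(unsigned num_stripes, unsigned parity)
{
    return num_stripes - parity;
}

/* Shrink the per-device stripe so that the logical chunk size
 * (stripe_size * data_stripes) stays within the limit. */
static uint64_t clamp_stripe_size(uint64_t stripe_size, unsigned data_stripes)
{
    if (stripe_size * data_stripes > MAX_CHUNK_SIZE_SKETCH)
        stripe_size = MAX_CHUNK_SIZE_SKETCH / data_stripes;
    return stripe_size;
}
```

For example, a 5-device RAID5 chunk has 4 data stripes, so a 4GB stripe would give a 16GB logical chunk and is clamped to 2.5GB per device.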

btrfs: limit fallocate extent reservation to 256MB · 24542bf7

Committed by Zach Brown on Nov 16, 2012

Very large fallocate requests are cpu bound and result in extents with a
repeating pattern of ever decreasing size:

$ time fallocate -l 1T file
real	0m13.039s

( an excerpt of the extents from btrfs-debug-tree: )
  prealloc data disk byte 1536292564992 nr 397312
  prealloc data disk byte 1536292962304 nr 196608
  prealloc data disk byte 1536293158912 nr 98304
  prealloc data disk byte 1536293257216 nr 49152
  prealloc data disk byte 1536293306368 nr 24576
  prealloc data disk byte 1536293330944 nr 12288
  prealloc data disk byte 1536293343232 nr 8192
  prealloc data disk byte 1536293351424 nr 4096
  prealloc data disk byte 1536293355520 nr 4096
  prealloc data disk byte 1536293359616 nr 4096

The excessive cpu use comes from __btrfs_prealloc_file_range() trying to
allocate the entire remaining size after each extent is allocated.
btrfs_reserve_extent() repeatedly cuts this requested size in half until
it gets down to the size that the allocators can return.  We limit the
problem for now by capping each reservation at 256 meg.

The small extents come from a masking bug when decreasing the requested
reservation size.  The high 32bits are cleared and the remaining low
bits might happen to reserve a small size.   Fix this by using
round_down() which properly casts the mask.

After these fixes huge fallocate requests are fast and result in nice
large extents:

$ time fallocate -l 1T file
real	0m0.082s

  prealloc data disk byte 1112425889792 nr 268435456
  prealloc data disk byte 1112694325248 nr 268435456
  prealloc data disk byte 1112962760704 nr 268435456
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
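
The masking bug and the cap can be reproduced in userspace with the sketch below. The function names are hypothetical, and `mask_fixed` only imitates the effect of round_down() by widening the mask before inverting it; it assumes a platform where `uint32_t` is `unsigned int`.

```c
#include <stdint.h>

#define MAX_PREALLOC_SKETCH (256ULL * 1024 * 1024)   /* 256MB cap from the message */

/* Buggy: ~(sectorsize - 1) is computed in 32 bits, so zero-extending it
 * to 64 bits clears the high 32 bits of num_bytes. */
static uint64_t mask_buggy(uint64_t num_bytes, uint32_t sectorsize)
{
    return num_bytes & ~(sectorsize - 1);
}

/* Fixed: widen to 64 bits first, then invert. */
static uint64_t mask_fixed(uint64_t num_bytes, uint32_t sectorsize)
{
    return num_bytes & ~((uint64_t)sectorsize - 1);
}

/* Cap each reservation request at 256MB. */
static uint64_t cap_reservation(uint64_t remaining)
{
    return remaining < MAX_PREALLOC_SKETCH ? remaining : MAX_PREALLOC_SKETCH;
}
```

For a 1T request, the cap yields 268435456 bytes per reservation, which matches the `nr 268435456` extents in the output above.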

btrfs: Init io_lock after cloning btrfs device struct · 1cba0cdf

Committed by Thomas Gleixner on Feb 20, 2013

__btrfs_close_devices() clones btrfs device structs with
memcpy(). Some of the fields in the clone are reinitialized, but
io_lock is not. In mainline this goes unnoticed, but on RT it
leaves the plist pointing to the original, about-to-be-freed lock
struct.

Initialize io_lock after cloning, so no references to the original
struct are left.
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
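
Why a memcpy'd lock can be dangerous is easiest to see with a self-referential list head. The sketch below uses hypothetical types; the embedded list merely stands in for the plist that the RT lock carries.

```c
#include <string.h>

/* Minimal self-referential list head. */
struct node_sketch {
    struct node_sketch *next, *prev;
};

/* Hypothetical lock with an embedded waiter list. */
struct lock_sketch {
    struct node_sketch wait_list;
};

struct device_sketch {
    int devid;
    struct lock_sketch io_lock;
};

/* An empty list points at itself. */
static void lock_init_sketch(struct lock_sketch *l)
{
    l->wait_list.next = l->wait_list.prev = &l->wait_list;
}

/* Clone by memcpy, then re-initialize the embedded lock so that
 * nothing in the copy points back into `src`. */
static void clone_device_sketch(struct device_sketch *dst,
                                const struct device_sketch *src)
{
    memcpy(dst, src, sizeof(*dst));
    lock_init_sketch(&dst->io_lock);
}
```

After a bare memcpy the copy's list pointers still reference the original struct; once the original is freed, those pointers dangle.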

Merge branch 'raid56-experimental' into for-linus-3.9 · e942f883

Committed by Chris Mason on Feb 20, 2013

Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Conflicts:
	fs/btrfs/ctree.h
	fs/btrfs/extent-tree.c
	fs/btrfs/inode.c
	fs/btrfs/volumes.c


Merge branch 'master' of... · b2c6b3e0

Committed by Chris Mason on Feb 20, 2013

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Conflicts:
	fs/btrfs/disk-io.c


Btrfs: fix missing release of qgroup reservation in commit_transaction() · 272d26d0

Committed by Miao Xie on Feb 20, 2013

We forgot to free the qgroup reservation in commit_transaction(); fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Cc: Arne Jansen <sensille@gmx.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix missing check before disabling quota · 683cebda

Committed by Wang Shilong on Feb 20, 2013

The original code forgot to check first whether quota had already been
disabled; in that case it returned 'EINVAL' to the user, which is
unfriendly and confusing.
So just return directly if quota has already been disabled.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Cc: Arne Jansen <sensille@gmx.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix cleaner thread not working with inode cache option · fa6ac876

Committed by Liu Bo on Feb 20, 2013

Right now the inode cache inode is treated the same as the space cache
inode, i.e. it is kept in memory until putting the super block.

But this leads to an awkward situation.

If we're going to delete a snapshot/subvolume, btrfs will not
actually delete it and return free space, but will add it to the dead
roots list once the last inode on this snap/subvol is destroyed.
Then we'll fetch the deleted roots and clean them up via the cleaner thread.

So here is the problem: if we enable the inode cache option, each
snap/subvol has a cache inode which is used to store inode allocation
information. And this cache inode will be kept in memory, as said
above. So with inode cache, a snap/subvol can only be added to the
dead roots list during the freeing roots stage of umount, so that we can
ONLY get space back after another remount (we clean up dead roots on mount).

But the real thing is that we'll no longer use the snap/subvol once we mark
it deleted, so we can safely iput its cache inode when we delete the
snap/subvol.

Another thing is that we need to change the rules for dropping inodes: we
don't keep a snap/subvol's cache inode in memory till the end, so that we can
add the snap/subvol to the dead roots list in time.
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix uncompleted transaction · d4edf39b

Committed by Miao Xie on Feb 20, 2013

In some cases, we need to commit the current transaction, but don't want
to start a new one if there is no running transaction, so we introduced
the function btrfs_attach_transaction(), which can catch the current
transaction, and returns -ENOENT if there is no running transaction.

But no running transaction doesn't mean the current transaction has completed,
because we remove the running transaction before it completes. In some
cases, it doesn't matter. But in some special cases, such as freezing the fs,
we expect the transaction to be fully on disk, and otherwise bugs follow. For
example, we may freeze the fs and dump the data on the disk; if the transaction
hasn't completed, we would dump inconsistent data. So we need to fix the above
problem for those cases.

We fix this problem by introducing a function:
	btrfs_attach_transaction_barrier()
If we want all transactions to be fully on disk, even those that are no
longer running, we can use this function.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix the deadlock between the transaction start/attach and commit · 178260b2

Committed by Miao Xie on Feb 20, 2013

Now btrfs_commit_transaction() does this:

ret = btrfs_run_ordered_operations(root, 0)

which asynchronously flushes all inodes on the ordered operations list. This
introduced a deadlock in which the transaction-start task, the
transaction-commit task and the flush workers wait for each other.
(See the following URL for details:
 http://marc.info/?l=linux-btrfs&m=136070705732646&w=2)

As we know, if ->in_commit is set, someone is committing the
current transaction. We should not try to join it unless we are JOIN
or JOIN_NOLOCK; waiting is the best choice. This avoids
the above problem and has another benefit: once we set ->in_commit,
no new transaction handle can block the transaction that is being
committed.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix the qgroup reserved space is released prematurely · 4b824906

Committed by Miao Xie on Feb 20, 2013

In start_transaction(), we will try to join the transaction again after
the current transaction is committed, so we should not release the
reserved space of the qgroup. Fix it.

Cc: Arne Jansen <sensille@gmx.net>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

btrfs: define BTRFS_MAGIC as a u64 value · cdb4c574

Committed by Zach Brown on Feb 20, 2013

super.magic is an le64 but it's treated as an unterminated string when
compared against BTRFS_MAGIC which is defined as a string.  Instead
define BTRFS_MAGIC as a normal hex value and use endian helpers to
compare it to the super's magic.

I tested this by mounting an fs made before the change and made sure
that it didn't introduce sparse errors.  This matches a similar cleanup
that is pending in btrfs-progs.  David Sterba pointed out that we should
fix the kernel side as well :).
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
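
The comparison described above can be sketched in portable C. The constant below is the string "_BHRfS_M" read as a little-endian 64-bit integer; that value and all names are assumptions made for illustration (they do not appear in the commit message), and `le64_decode` merely stands in for the kernel's endian helpers.

```c
#include <stdint.h>

/* "_BHRfS_M" interpreted as a little-endian u64 (assumed value). */
#define BTRFS_MAGIC_SKETCH 0x4D5F53665248425FULL

/* Portable little-endian decode of 8 on-disk bytes. */
static uint64_t le64_decode(const unsigned char b[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Compare the on-disk magic as an integer instead of as a string. */
static int magic_matches(const unsigned char on_disk[8])
{
    return le64_decode(on_disk) == BTRFS_MAGIC_SKETCH;
}
```

Comparing integers avoids treating the 8 magic bytes as a string that has no terminating NUL.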

Btrfs: set/change the label of a mounted file system · a8bfd4ab

Committed by jeff.liu on Jan 05, 2013

With this new ioctl(2) BTRFS_IOC_SET_FSLABEL, we can set/change the label of a mounted file system.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Goffredo Baroncelli <kreijack@inwind.it>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: Add a new ioctl to get the label of a mounted file system · 867ab667

Committed by jeff.liu on Jan 05, 2013

Add a new ioctl(2) BTRFS_IOC_GET_FSLABEL, so that we can get the label of a mounted filesystem.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Goffredo Baroncelli <kreijack@inwind.it>
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: place ordered operations on a per transaction list · 569e0f35

Committed by Josef Bacik on Feb 13, 2013

Miao made the ordered operations stuff run async, which introduced a
deadlock where we could get somebody (sync) racing in and committing the
transaction while a commit was already happening. The new committer would
try and flush ordered operations which would hang waiting for the commit to
finish because it is done asynchronously and no longer inherits the caller's
trans handle. To fix this we need to make the ordered operations list a per
transaction list. We can get new inodes added to the ordered operation list
by truncating them and then having another process writing to them, so this
makes it so that anybody trying to add an ordered operation _must_ start a
transaction in order to add itself to the list, which will keep new inodes
from getting added to the ordered operations list after we start committing.
This should fix the deadlock and also keeps us from doing a lot more work
than we need to during commit. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: relax the block group size limit for bitmaps · dde5740f

Committed by Josef Bacik on Feb 12, 2013

Dave pointed out that xfstests 273 will tell you that it failed to load the
space cache for a block group when it remounts. This is because we run out
of space writing out the block group cache. This is ok and is working as it
should, but let's try to be a bit nicer. This happens because the block
group was 100mb, but bitmap entries cover 128mb, so we were only getting
extent entries for this block group, which ended up being too many to fit in
the free space cache. So relax the bitmap size requirements to block groups
that are at least half the size a bitmap will cover or larger, that way we
can still keep the amount of space used in the free space cache low enough
to be able to write it out. With this patch I no longer fail to write out
the free space cache. Thanks,
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
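
The relaxed rule can be written down as a simple threshold comparison. This is a sketch under assumptions: the 128MB coverage figure is taken from the message, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB_SKETCH            (1024ULL * 1024ULL)
#define BITMAP_COVERS_SKETCH  (128ULL * MIB_SKETCH)   /* figure from the message */

/* Old rule: bitmaps only for block groups at least as large as
 * the range one bitmap covers. */
static bool bitmaps_allowed_old(uint64_t block_group_size)
{
    return block_group_size >= BITMAP_COVERS_SKETCH;
}

/* Relaxed rule: half of that range is enough. */
static bool bitmaps_allowed_new(uint64_t block_group_size)
{
    return block_group_size >= BITMAP_COVERS_SKETCH / 2;
}
```

With the relaxed rule, the 100MB block group from the report qualifies for bitmap entries, while very small block groups still do not.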

Btrfs: allow for selecting only completely empty chunks · 3e39cea6

Committed by Ilya Dryomov on Feb 12, 2013

Enhance balance usage filter by making it possible to balance out only
completely empty chunks.  Today, usage filter properly acts on values
from 1 to 99 inclusive, usage=100 selects all chunks, and usage=0
selects no chunks.  This commit changes the usage=0 case: the new
meaning is to restripe only completely empty chunks and nothing else.
Suggested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
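
The new semantics can be summarized in a small predicate. This sketch follows the behavior as described in the message, not the kernel's actual filter code; the function name and the percentage arithmetic are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a chunk of `chunk_size` bytes with `chunk_used` bytes
 * in use is selected by the usage filter value `usage` (a percentage). */
static bool usage_filter_selects(uint64_t chunk_size, uint64_t chunk_used,
                                 unsigned usage)
{
    if (usage == 0)
        return chunk_used == 0;   /* new meaning: only empty chunks */
    if (usage >= 100)
        return true;              /* select all chunks */
    return chunk_used < chunk_size * usage / 100;
}
```

This makes `usage=0` a cheap way to reclaim chunks that hold no data at all, without relocating any partially filled chunk.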

Btrfs: eliminate a use-after-free in btrfs_balance() · bf023ecf

Committed by Ilya Dryomov on Feb 12, 2013

Commit 5af3e8cc introduced a use-after-free at volumes.c:3139: bctl is freed
above in __cancel_balance() in all cases except for balance pause.  Fix this
by moving the offending check a couple statements above, the meaning of the
check is preserved.
Reported-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: remove unused extent io tree ops V2 · c8f2f24b

Committed by Josef Bacik on Feb 11, 2013

Nobody uses these io tree ops anymore so just remove them and clean up the code
a bit.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

btrfs: add cancellation points to defrag · 210549eb

Committed by David Sterba on Feb 09, 2013

The defrag operation can take a very long time, so we want a way to
cancel it. The code checks for a pending signal at safe points in the
defrag loops and returns EAGAIN. This means a user can press ^C after
running 'btrfs fi defrag'; it works for both defrag modes, files and root.

Returning from the command was instant in my light tests, but may take
longer depending on the aging factor of the filesystem.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
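
A cancellation point of this kind can be sketched in userspace as a flag check at the top of each loop iteration. Everything here is hypothetical: `cancel_requested` stands in for the kernel's pending-signal check and could, for instance, be set from a SIGINT handler.

```c
#include <errno.h>
#include <signal.h>

/* Hypothetical cancel flag, e.g. set from a signal handler. */
static volatile sig_atomic_t cancel_requested;

/* Processes up to `n` work items and counts them in `*done`.
 * Returns 0 on completion or -EAGAIN if cancelled at a safe point. */
static int defrag_loop_sketch(int n, int *done)
{
    *done = 0;
    for (int i = 0; i < n; i++) {
        if (cancel_requested)
            return -EAGAIN;   /* safe point: no item is half-processed */
        (*done)++;
    }
    return 0;
}
```

Checking only between items keeps each unit of work atomic from the caller's point of view, at the cost of a delay bounded by the time one item takes.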

btrfs: put some enospc messages under enospc_debug · b069e0c3

Committed by David Sterba on Feb 08, 2013

The warning in use_block_rsv is not useful for users and may fill
the logs unnecessarily.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: implement unlocked dio write · 38851cc1

Committed by Miao Xie on Feb 08, 2013

This idea is from ext4. With this patch, we can run dio writes in parallel
and improve performance. But because we cannot update isize without
i_mutex, unlocked dio writes can only be done before EOF.

We needn't worry about the race between dio write and truncate, because
truncate needs to wait until all dio writes end.

And we also needn't worry about the race between dio write and punch hole,
because we have the extent lock to protect our operation.

I ran fio to test the performance of this feature.

== Hardware ==
CPU: Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
Mem: 2GB
SSD: Intel X25-M 120GB (Test Partition: 60GB)

== config file ==
[global]
ioengine=psync
direct=1
bs=4k
size=32G
runtime=60
directory=/mnt/btrfs/
filename=testfile
group_reporting
thread

[file1]
numjobs=1 # 2 4
rw=randwrite

== result (KBps) ==
write	1	2	4
lock	24936	24738	24726
nolock	24962	30866	32101

== result (iops) ==
write	1	2	4
lock	6234	6184	6181
nolock	6240	7716	8025
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
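
The "before EOF" restriction mentioned in the message boils down to a range check. The helper below is a hypothetical sketch of that condition, not the kernel code: a write may skip the inode mutex only if it cannot change the file size.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a write of `count` bytes at `offset` ends at or before the
 * current file size, so isize does not need to be updated. */
static bool can_write_unlocked(uint64_t offset, uint64_t count,
                               uint64_t i_size)
{
    return offset + count <= i_size;
}
```

Writes that extend the file still take the locked path, which is consistent with the benchmark above using a preallocated test file size and random writes within it.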

Btrfs: serialize unlocked dio reads with truncate · 2e60a51e

Committed by Miao Xie on Feb 08, 2013

Currently, we can do unlocked dio reads, but the following race
is possible:

dio_read_task			truncate_task
				->btrfs_setattr()
->btrfs_direct_IO
    ->__blockdev_direct_IO
      ->btrfs_get_block
				  ->btrfs_truncate()
				 #alloc truncated blocks
				 #to other inode
      ->submit_io()
     #INFORMATION LEAK

In order to avoid this problem, we must serialize unlocked dio reads with
truncate. There are two approaches:
- use extent lock to protect the extent that we truncate
- use inode_dio_wait() to make sure the truncating task will wait for
  the read DIO.

If we use the 1st one, we will hit the endless truncation problem caused by
nonlocked read DIO once we implement nonlocked write DIO, because
we still need to invoke inode_dio_wait() to avoid the race between write
DIO and truncation. At that point, we would have to introduce

  btrfs_inode_{block, resume}_nolock_dio()

again. That is, we would have to implement this patch again, so I chose the
2nd way to fix the problem.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

openeuler / Kernel · synced successfully over a year ago