Commits · 925396ecf251432d6d0f703a6cfd0cb9e651d936 · openeuler / Kernel

21 Feb, 2013 4 commits

Btrfs: account for orphan inodes properly during cleanup · 925396ec

Committed by Josef Bacik on Feb 01, 2013

Dave sent me a panic where we were doing the orphan cleanup and panic'ed
trying to release our reservation from the orphan block rsv. The reason for
this is because our orphan block rsv had been free'd out from underneath us
because the transaction commit found that there were no orphan inodes
according to its count and decided to free it. This is incorrect so make
sure we inc the orphan inodes count so the accounting is all done properly.
This would also cause the warning in the orphan commit code normally if you
had any orphans to cleanup as they would only decrement the orphan count so
you'd get a negative orphan count which could cause problems during runtime.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: unreserve space if our ordered extent fails to work · 0bec9ef5

Committed by Josef Bacik on Jan 31, 2013

When a transaction aborts or there's an EIO on an ordered extent or any
error really we will not free up the space we reserved for this ordered
extent. This results in warnings from the block group cache cleanup in the
case of a transaction abort, or leaking space in the case of EIO on an
ordered extent. Fix this up by free'ing the reserved space if we have an
error at all trying to complete an ordered extent. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: use the inode own lock to protect its delalloc_bytes · df0af1a5

Committed by Miao Xie on Jan 29, 2013

We need not use a global lock to protect the delalloc_bytes of the
inode, just use its own lock. In this way, we can reduce the lock
contention and ->delalloc_lock will just protect delalloc inode
list.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: use percpu counter for fs_info->delalloc_bytes · 963d678b

Committed by Miao Xie on Jan 29, 2013

fs_info->delalloc_bytes is accessed very frequently, so use a percpu
counter instead of the u64 variable for it to reduce lock
contention.

This patch also fixes the problem that we accessed the variable
without lock protection. At worst, we would not flush the
delalloc inodes, and would just return an ENOSPC error while we still
had some free space in the fs.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
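
The idea behind a per-CPU counter can be sketched in userspace C. This is only an illustration, not the kernel's percpu_counter code: the names sharded_counter, NR_SLOTS, BATCH, sc_add, sc_read and sc_sum are hypothetical, the sketch is single-threaded, and the step that folds a local delta into the shared total is where real code would take a lock.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS 4   /* hypothetical: stands in for the number of CPUs */
#define BATCH    32  /* hypothetical: local delta allowed before folding */

struct sharded_counter {
    int64_t global;           /* shared total, touched rarely */
    int32_t local[NR_SLOTS];  /* per-slot deltas, touched on every update */
};

/* Add amount on behalf of one slot; fold into the shared total only
 * when the local delta has grown to BATCH or more in either direction. */
static void sc_add(struct sharded_counter *c, int slot, int32_t amount)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    int32_t v = c->local[slot] + amount;

    if (v >= BATCH || v <= -BATCH) {
        c->global += v;       /* real code would hold a lock here */
        c->local[slot] = 0;
    } else {
        c->local[slot] = v;
    }
}

/* Cheap read: may lag behind by the unfolded local deltas. */
static int64_t sc_read(const struct sharded_counter *c)
{
    return c->global;
}

/* Exact read: walks every slot, so it costs more. */
static int64_t sc_sum(const struct sharded_counter *c)
{
    int64_t total = c->global;

    for (int i = 0; i < NR_SLOTS; i++)
        total += c->local[i];
    return total;
}
```

Most updates only touch the slot-local delta, which is why contention on the shared total drops; the price is that the cheap read is approximate.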

20 Feb, 2013 7 commits

Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.h · 55e301fd

Committed by Filipe Brandenburger on Jan 29, 2013

The header file will then be installed under /usr/include/linux so that
userspace applications can refer to Btrfs ioctls by name and use the same
structs used internally in the kernel.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Revert "Btrfs: fix permissions of empty files not affected by umask" · fe5fafbe

Committed by Josef Bacik on Jan 24, 2013

This reverts commit 2794ed01.

Wasn't supposed to get used in btrfs_mknod, it was supposed to be in
btrfs_create, which was done in commit
9185aa58.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: traverse and flush the delalloc inodes once · 63607cc8

Committed by Miao Xie on Jan 22, 2013

btrfs_start_delalloc_inodes() needn't traverse and flush the delalloc inodes
repeatedly. Data that users write after the delalloc inode flush has started
can be treated as if it were written after the flush finished, so we can
flush it next time.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: use token to avoid times mapping extent buffer · 51fab693

Committed by Liu Bo on Dec 27, 2012

The API in the tree log code has undergone some changes, and they show that
we can benefit from using a token, so do the same thing here.

The function_graph tracer's timer shows that it now costs nearly half the time
it did before (39.788us -> 22.391us).
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: wait on ordered extents at the last possible moment · 2ab28f32

Committed by Josef Bacik on Oct 12, 2012

Since we don't actually copy the extent information from the source tree in
the fast case we don't need to wait for ordered io to be completed in order
to fsync, we just need to wait for the io to be completed. So when we're
logging our file just attach all of the ordered extents to the log, and then
when the log syncs just wait for IO_DONE on the ordered extents and then
write the super. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: use wrapper page_offset · 4eee4fa4

Committed by Miao Xie on Dec 21, 2012

Use wrapper page_offset to get byte-offset into filesystem object for page.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix lots of orphan inodes when the space is not enough · 0e8c36a9

Committed by Miao Xie on Dec 19, 2012

We're running into having 50-100 orphans left over with xfstests 83
because of ENOSPC when trying to start the transaction for the inode update.
But in fact, it makes no sense in updating the inode for the new size while
we're deleting the stupid thing. This patch fixes this problem.
Reported-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

25 Jan, 2013 1 commit

Btrfs: fix repeated delalloc work allocation · 1eafa6c7

Committed by Miao Xie on Jan 22, 2013

btrfs_start_delalloc_inodes() locks the delalloc_inodes list, fetches the
first inode, unlocks the list, triggers btrfs_alloc_delalloc_work/
btrfs_queue_worker for this inode, and then it locks the list, checks the
head of the list again. But because we don't delete the first inode that it
deals with before, it will fetch the same inode. As a result, this function
allocates a huge amount of btrfs_delalloc_work structures, and OOM happens.

Fix this problem by splicing the delalloc list.
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
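
The splice approach described above can be illustrated with a small userspace list. This is a sketch under assumptions, not the actual patch: inode_stub and start_delalloc are hypothetical, the list helpers are minimal re-implementations modeled on the kernel's list API, locking is only indicated in comments, and moving each inode back onto the shared list after queueing is an assumption about how the processed entries are kept.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Move every entry of src to the tail of dst and leave src empty. */
static void list_splice_tail_init(struct list_head *src, struct list_head *dst)
{
    if (list_empty(src))
        return;
    src->next->prev = dst->prev;
    dst->prev->next = src->next;
    src->prev->next = dst;
    dst->prev = src->prev;
    list_init(src);
}

/* Hypothetical stand-in for an inode with pending delalloc. */
struct inode_stub {
    struct list_head delalloc;
    int queued;               /* how many work items were created for it */
};

/* Create one work item per inode; returns the number of work items. */
static int start_delalloc(struct list_head *delalloc_inodes)
{
    struct list_head splice;
    int works = 0;

    list_init(&splice);
    /* lock: take the whole shared list in one step */
    list_splice_tail_init(delalloc_inodes, &splice);
    /* unlock */

    while (!list_empty(&splice)) {
        struct list_head *pos = splice.next;
        struct inode_stub *inode = (struct inode_stub *)
            ((char *)pos - offsetof(struct inode_stub, delalloc));

        /* Removing the entry from the private list guarantees that the
         * next iteration sees a different inode. */
        list_del_init(pos);
        list_add_tail(pos, delalloc_inodes);
        inode->queued++;
        works++;
    }
    return works;
}
```

Because the loop drains a private list, the number of work items is bounded by the number of inodes that were on the shared list when the splice happened.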

15 Jan, 2013 4 commits

btrfs: update timestamps on truncate() · 3972f260

Committed by Eric Sandeen on Jan 12, 2013

truncate() vs. ftruncate() differ in the VFS; truncate()
doesn't set (ATTR_CTIME | ATTR_MTIME), and it's up to the
fs to do the timestamp updates if the size changes.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>

btrfs: fix btrfs_cont_expand() freeing IS_ERR em · f2767956

Committed by Zach Brown on Jan 08, 2013

btrfs_cont_expand() tries to free an IS_ERR em as it gets an error from
btrfs_get_extent() and breaks out of its loop.

An instance of -EEXIST was reported in the wild:

  https://bugzilla.redhat.com/show_bug.cgi?id=874407

I have no idea if that -EEXIST is surprising, or not.  Regardless, this
error handling should be cleaned up to handle other reasonable errors
(ENOMEM, EIO; whatever).

This seemed to be the only buggy freeing of the relatively rare IS_ERR
em so I opted to fix the caller rather than teach free_extent_map() to
use IS_ERR_OR_NULL().
Signed-off-by: Zach Brown <zab@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
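
The hazard described above, handing an error pointer to a free routine, can be reproduced in a few lines of userspace C. This is a sketch only: ERR_PTR, PTR_ERR and IS_ERR are re-implemented here in the spirit of the kernel helpers, while extent_map_stub, get_extent_stub, free_extent_map_stub and cont_expand_stub are hypothetical stand-ins. Clearing the pointer in the caller is one plausible shape of the fix, not necessarily the exact change in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, and decode it again. */
static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct extent_map_stub { int refs; };   /* hypothetical */

/* Hypothetical lookup: returns a valid object or an error pointer. */
static struct extent_map_stub *get_extent_stub(int fail)
{
    struct extent_map_stub *em;

    if (fail)
        return ERR_PTR(-EEXIST);
    em = malloc(sizeof(*em));
    if (!em)
        return ERR_PTR(-ENOMEM);
    em->refs = 1;
    return em;
}

/* Tolerates NULL, but would dereference an error pointer. */
static void free_extent_map_stub(struct extent_map_stub *em)
{
    if (!em)
        return;
    if (--em->refs == 0)
        free(em);
}

/* The caller checks IS_ERR() and clears the pointer before cleanup. */
static int cont_expand_stub(int fail)
{
    struct extent_map_stub *em = get_extent_stub(fail);
    int err = 0;

    if (IS_ERR(em)) {
        err = (int)PTR_ERR(em);
        em = NULL;            /* cleanup below must not see the error pointer */
    }
    free_extent_map_stub(em);
    return err;
}
```

Fixing the single caller keeps the free routine simple, which matches the reasoning given in the commit message.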

Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents · f9e4fb53

Committed by Liu Bo on Jan 07, 2013

xfstests case 285 complains.

This is because btrfs did not try to find unwritten delalloc
bytes (only dirty pages, not yet written back) behind prealloc
extents, so it ended up finding nothing when seeking with SEEK_DATA.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: add orphan before truncating pagecache · f3fe820c

Committed by Josef Bacik on Jan 07, 2013

Running xfstests 83 in a loop would sometimes fail the fsck. This happens
because if we invalidate a page that already has an ordered extent setup for
it we will complete the ordered extent ourselves, assuming that the truncate
will clean everything up. The problem with this is there is plenty of time
for the truncate to fail after we've done this work. So to fix this we need
to add the orphan item first to make sure the cleanup gets done properly,
and then we can truncate the pagecache and all that stuff and be safe. This
fixes the btrfsck failures I was seeing while running 83 in a loop. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

18 Dec, 2012 2 commits

Btrfs: fix a bug of per-file nocow · 213490b3

Committed by Liu Bo on Sep 11, 2012

Users report a bug, the reproducer is:
$ mkfs.btrfs /dev/loop0
$ mount /dev/loop0 /mnt/btrfs/
$ mkdir /mnt/btrfs/dir
$ chattr +C /mnt/btrfs/dir/
$ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=10;
$ lsattr /mnt/btrfs/dir/foo
---------------C- /mnt/btrfs/dir/foo
$ filefrag /mnt/btrfs/dir/foo
/mnt/btrfs/dir/foo: 1 extent found    ---> an extent
$ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=1 seek=5 conv=notrunc,nocreat; sync
$ filefrag /mnt/btrfs/dir/foo
/mnt/btrfs/dir/foo: 3 extents found   ---> with nocow, btrfs breaks the extent into three parts

The newly created file should not only inherit the NODATACOW flag, but also
honor NODATASUM flag, because we must do COW on a file extent with checksum.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix hash overflow handling · 9c52057c

Committed by Chris Mason on Dec 17, 2012

The handling for directory crc hash overflows was fairly obscure,
split_leaf returns EOVERFLOW when we try to extend the item and that is
supposed to bubble up to userland.  For a while it did so, but along the
way we added better handling of errors and forced the FS readonly if we
hit IO errors during the directory insertion.

Along the way, we started testing only for EEXIST and the EOVERFLOW case
was dropped.  The end result is that we may force the FS readonly if we
catch a directory hash bucket overflow.

This fixes a few problem spots.  First I add tests for EOVERFLOW in the
places where we can safely just return the error up the chain.

btrfs_rename is harder though, because it tries to insert the new
directory item only after it has already unlinked anything the rename
was going to overwrite.  Rather than adding very complex logic, I added
a helper to test for the hash overflow case early while it is still safe
to bail out.

Snapshot and subvolume creation had a similar problem, so they are using
the new helper now too.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Reported-by: Pascal Junod <pascal@junod.info>

17 Dec, 2012 12 commits

Btrfs: fix permissions of empty files not affected by umask · 9185aa58

Committed by Filipe Brandenburger on Nov 30, 2012

When a new file is created with btrfs_create(), the inode will initially be
created with permissions 0666 and later on in btrfs_init_acl() it will be
adapted to mask out the umask bits. The problem is that this change won't make
it into the btrfs_inode unless there's another change to the inode (e.g. writing
content changing the size or touching the file changing the mtime.)

This fix adds a call to btrfs_update_inode() to btrfs_create() to make sure that
the change will not get lost if the in-memory inode is flushed before other
changes are made to the file.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: do not call file_update_time in aio_write · 6c760c07

Committed by Josef Bacik on Nov 09, 2012

This starts a transaction and dirties the inode every time we call it, which
is super expensive if you have a write heavy workload. We will be updating
the inode when the IO completes and we reserve the space for the inode
update when we reserve space for the write, so there is no chance of loss of
information or enospc issues. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: log changed inodes based on the extent map tree · 70c8a91c

Committed by Josef Bacik on Oct 11, 2012

We don't really need to copy extents from the source tree since we have all
of the information already available to us in the extent_map tree. So
instead just write the extents straight to the log tree and don't bother to
copy the extent items from the source tree.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: do not mark ems as prealloc if we are writing to them · b11e234d

Committed by Josef Bacik on Dec 03, 2012

We are going to use EM's to log extents in the future, so we need to not
mark them as prealloc if they aren't actually prealloc extents. Instead
mark them with FILLING so we know to amend mod_start/mod_len and that way
we don't confuse the extent logging code. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: keep track of the extents original block length · b4939680

Committed by Josef Bacik on Dec 03, 2012

If we've written to a prealloc extent we need to know the original block len
for the extent. We can't figure this out currently since ->block_len is
just set to the extent length. So introduce ->orig_block_len so that we
know how many bytes were in the original extent for proper extent logging
that future patches will need. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: inline csums if we're fsyncing · b812ce28

Committed by Josef Bacik on Nov 16, 2012

The tree logging stuff needs the csums to be on the ordered extents in order
to log them properly, so mark that we're sync and inline the csum creation
so we don't have to wait on the csumming to be done when logging extents
that are still in flight.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: only log the inode item if we can get away with it · e9976151

Committed by Josef Bacik on Oct 11, 2012

Currently we copy all the file information into the log, inode item, the
refs, xattrs etc. Except most of this doesn't change from fsync to fsync,
just the inode item changes. So set a flag if an xattr changes or a link is
added, and otherwise only log the inode item. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix wrong return value of btrfs_truncate_page() · ac6a2b36

Committed by Miao Xie on Dec 05, 2012

The ret variable may be set to 0 if we read the page successfully, but the page
might be released before we lock it again. In this case, if we fail to allocate a
new page, we will return 0, which is wrong. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: don't auto defrag a file when doing directIO · 543eabd5

Committed by Miao Xie on Dec 05, 2012

If we run direct IO, we should not run auto defrag, because it may
introduce buffered IO vs. direct IO problems and slow down direct IO.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: refactor error handling to drop inode in btrfs_create() · 43baa579

Committed by Filipe Brandenburger on Nov 30, 2012

Refactor it by checking whether the inode has been created and needs to be
dropped (drop_inode_on_err) and also if the err variable is set. That way the
variable doesn't need to be set on each and every error handling block.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix permissions of empty files not affected by umask · 2794ed01

Committed by Filipe Brandenburger on Nov 30, 2012

Btrfs: add fiemap's flag check · 05dadc09

Committed by Tsutomu Itoh on Nov 29, 2012

When an unsupported flag is specified, an error must be returned
to the caller.
So add a validity check of the fiemap flags.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
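
A flag validity check of this kind usually reduces to masking the request against the supported set. The sketch below is illustrative only: SKETCH_FLAG_SYNC, SKETCH_FLAG_XATTR and check_flags are hypothetical names, and returning -EINVAL is an assumption made for portability rather than the error code the patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for the fiemap request flags. */
#define SKETCH_FLAG_SYNC  0x1u
#define SKETCH_FLAG_XATTR 0x2u

/* Return 0 if every requested flag is supported. Otherwise return a
 * negative errno and, if the caller asked for it, report the offending
 * bits through *unsupported. */
static int check_flags(uint32_t requested, uint32_t supported,
                       uint32_t *unsupported)
{
    uint32_t bad = requested & ~supported;

    if (bad) {
        if (unsupported)
            *unsupported = bad;
        return -EINVAL;       /* assumption: error code chosen for the sketch */
    }
    return 0;
}
```

Reporting the rejected bits back lets the caller see which part of the request was refused instead of only learning that it failed.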

13 Dec, 2012 6 commits

Btrfs: handle errors from btrfs_map_bio() everywhere · 61891923

Committed by Stefan Behrens on Nov 05, 2012

With the addition of the device replace procedure, it is possible
for btrfs_map_bio(READ) to report an error. This happens when the
specific mirror is requested which is located on the target disk,
and the copy operation has not yet copied this block. Hence the
block cannot be read and this error state is indicated by
returning EIO.
Some background information follows now. A new mirror is added
while the device replace procedure is running.
btrfs_get_num_copies() returns one more, and
btrfs_map_bio(GET_READ_MIRROR) adds one more mirror if a disk
location is involved that was already handled by the device
replace copy operation. The assigned mirror num is the highest
mirror number, e.g. the value 3 in case of RAID1.
If btrfs_map_bio() is invoked with mirror_num == 0 (i.e., select
any mirror), the copy on the target drive is never selected
because that disk shall be able to perform the write requests as
quickly as possible. The parallel execution of read requests would
only slow down the disk copy procedure. Second case is that
btrfs_map_bio() is called with mirror_num > 0. This is done from
the repair code only. In this case, the highest mirror num is
assigned to the target disk, since it is used last. And when this
mirror is not available because the copy procedure has not yet
handled this area, an error is returned. Everywhere in the code
the handling of such errors is added now.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: pass fs_info to btrfs_map_block() instead of mapping_tree · 3ec706c8

Committed by Stefan Behrens on Nov 05, 2012

This is required for the device replace procedure in a later step.
Two calling functions also had to be changed to have the fs_info
pointer: repair_io_failure() and scrub_setup_recheck_block().
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: cleanup for btrfs_btree_balance_dirty · b53d3f5d

Committed by Liu Bo on Nov 14, 2012

- 'nr' is no longer used.
- btrfs_btree_balance_dirty() and __btrfs_btree_balance_dirty() can share
  a bunch of code.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

fs/btrfs: drop if around WARN_ON · 6c1500f2

Committed by Julia Lawall on Nov 03, 2012

Just use WARN_ON rather than an if containing only WARN_ON(1).

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
@@
- if (e) WARN_ON(1);
+ WARN_ON(e);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

fs/btrfs: use WARN · 31b1a2bd

Committed by Julia Lawall on Nov 03, 2012

Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix joining the same transaction handler more than 2 times · b7d5b0a8

Committed by Miao Xie on Nov 01, 2012

If we flush inodes with pending delalloc in a transaction, we may join
the same transaction handler more than 2 times.

The reason is:
  Task						use_count of trans handle
  commit_transaction				1
    |-> btrfs_start_delalloc_inodes		1
	  |-> run_delalloc_nocow		1
		|-> join_transaction		2
		|-> cow_file_range		2
			|-> join_transaction	3

In fact, cow_file_range needn't join the transaction again because the caller
has already joined it, so fix the problem by dropping that extra join.
Reported-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
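
The use_count column in the call chain above can be replayed with a toy model. Everything here is hypothetical (trans_handle_stub, join, end and the *_old / *_new variants); the sketch only mirrors the counting shown in the commit message and makes no claim about how the kernel functions are actually structured.

```c
#include <assert.h>

/* Hypothetical transaction handle that records its deepest nesting. */
struct trans_handle_stub {
    int use_count;
    int max_use_count;
};

static void join(struct trans_handle_stub *t)
{
    t->use_count++;
    if (t->use_count > t->max_use_count)
        t->max_use_count = t->use_count;
}

static void end(struct trans_handle_stub *t)
{
    t->use_count--;
}

/* Before the fix: joins again even though the caller already joined. */
static void cow_file_range_old(struct trans_handle_stub *t)
{
    join(t);
    end(t);
}

/* After the fix: relies on the handle the caller already holds. */
static void cow_file_range_new(struct trans_handle_stub *t)
{
    (void)t;
}

static void run_delalloc_nocow(struct trans_handle_stub *t,
                               void (*cow)(struct trans_handle_stub *))
{
    join(t);
    cow(t);
    end(t);
}

/* Returns the deepest use_count reached while committing. */
static int commit_transaction(void (*cow)(struct trans_handle_stub *))
{
    struct trans_handle_stub t = { 1, 1 };  /* commit itself holds one use */

    run_delalloc_nocow(&t, cow);
    return t.max_use_count;
}
```

With the old variant the model reaches a depth of 3, matching the table; with the new one it stays at 2.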

12 Dec, 2012 2 commits

Btrfs: make delalloc inodes be flushed by multi-task · 8ccf6f19

Committed by Miao Xie on Oct 25, 2012

This patch introduces a new worker pool named "flush_workers". If we
want to force all the inodes with pending delalloc to disk, we can
queue those inodes on the work queue of the worker pool, so that
they are flushed by multiple tasks.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: improve the noflush reservation · 08e007d2

Committed by Miao Xie on Oct 16, 2012

In some places (such as when evicting an inode), we cannot flush the reserved
space of delalloc, but flushing the delayed directory index and delayed inode
is OK. However, we didn't try to flush those things and just gave up when there
was not enough space to be reserved. This patch fixes this problem.

We define 3 types of flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
If we are in a transaction, we must not flush anything, or a deadlock
would happen, so use NO_FLUSH. If flushing the reserved space of delalloc
would cause a deadlock, use FLUSH_LIMIT. In the other cases, FLUSH_ALL is used,
and we flush everything.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
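
The three flush modes described above amount to a small decision table. The sketch below encodes that table directly; the enum names follow the commit message, while pick_flush, may_flush_delayed_items and may_flush_delalloc are hypothetical helpers invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum reserve_flush { NO_FLUSH, FLUSH_LIMIT, FLUSH_ALL };

/* Pick a flush mode from the two conditions named in the commit message. */
static enum reserve_flush pick_flush(bool in_transaction,
                                     bool delalloc_flush_may_deadlock)
{
    if (in_transaction)
        return NO_FLUSH;      /* flushing anything could deadlock */
    if (delalloc_flush_may_deadlock)
        return FLUSH_LIMIT;   /* delayed items only */
    return FLUSH_ALL;
}

/* Delayed directory index and delayed inode items may be flushed in
 * both the limited and the full mode. */
static bool may_flush_delayed_items(enum reserve_flush mode)
{
    return mode != NO_FLUSH;
}

/* Reserved delalloc space is flushed only in the full mode. */
static bool may_flush_delalloc(enum reserve_flush mode)
{
    return mode == FLUSH_ALL;
}
```

The inode eviction case from the first paragraph would map to FLUSH_LIMIT in this model: delayed items can still be flushed to free space, while delalloc is left alone.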

26 Oct, 2012 1 commit

Btrfs: Use btrfs_update_inode_fallback when creating a snapshot · be6aef60

Committed by Josef Bacik on Oct 22, 2012

On a really full file system I was getting ENOSPC back from
btrfs_update_inode when trying to update the parent inode when creating a
snapshot. Just use the fallback method so we can update the inode and not
have to worry about having a delayed ref. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

09 Oct, 2012 1 commit

Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called · f0bd95ea

Committed by Tsutomu Itoh on Oct 01, 2012

We should check the value of extent_map before calling
trace_btrfs_get_extent() because extent_map may be
NULL.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

openeuler / Kernel synced successfully about 1 year ago