Commits · e99761514999f64aff1985460967f93d9e8417f4 · openeuler / Kernel

17 Dec, 2012 · 1 commit

Btrfs: only log the inode item if we can get away with it · e9976151

Committed by Josef Bacik on Oct 11, 2012

Currently we copy all the file information into the log, inode item, the
refs, xattrs etc. Except most of this doesn't change from fsync to fsync,
just the inode item changes. So set a flag if an xattr changes or a link is
added, and otherwise only log the inode item. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
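
The flag-based decision described in this commit can be sketched as follows. This is only an illustration under stated assumptions: `toy_inode`, `copy_everything` and the `toy_*` helpers are hypothetical names, not the kernel's actual structures or flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What a single fsync has to copy into the log. */
enum toy_log_scope { TOY_LOG_INODE_ITEM_ONLY, TOY_LOG_EVERYTHING };

/* Hypothetical in-memory inode; only the flag matters for this sketch. */
struct toy_inode {
    bool copy_everything;
};

/* An xattr change means refs/xattrs in the log may be stale. */
static void toy_xattr_changed(struct toy_inode *inode)
{
    assert(inode != NULL);
    inode->copy_everything = true;
}

/* Adding a link has the same effect. */
static void toy_link_added(struct toy_inode *inode)
{
    assert(inode != NULL);
    inode->copy_everything = true;
}

/* Decide how much to log, then clear the flag for the next fsync. */
static enum toy_log_scope toy_fsync_scope(struct toy_inode *inode)
{
    enum toy_log_scope scope;

    assert(inode != NULL);
    scope = inode->copy_everything ? TOY_LOG_EVERYTHING
                                   : TOY_LOG_INODE_ITEM_ONLY;
    inode->copy_everything = false;
    return scope;
}
```

Calling `toy_fsync_scope()` twice in a row without an intervening xattr or link change yields the cheap inode-item-only path on the second call, which is the case the commit message is optimizing for.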

13 Dec, 2012 · 4 commits

Btrfs: fix missing log when BTRFS_INODE_NEEDS_FULL_SYNC is set · 5269b67e

Committed by Miao Xie on Nov 01, 2012

If BTRFS_INODE_NEEDS_FULL_SYNC is set, we should log all the extents.
Currently we do not take the flag into account and set a wrong max key,
so the file extent metadata is skipped when logging. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix unprotected extent map operation when logging file extents · bbe14267

Committed by Miao Xie on Nov 01, 2012

We forgot to protect the modified_extents list; fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix wrong file extent length · 315a9850

Committed by Miao Xie on Nov 01, 2012

There are two types of file extent: inline extents and regular extents.
When we logged file extents, we didn't take inline extents into account. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: do not log extents when we only log new names · 183f37fa

Committed by Liu Bo on Nov 01, 2012

When we log new names, we need to log just enough to recreate the inode
during log replay, and there is no need to log extents along with it.

This actually fixes a bug revealed by xfstests 241, where it shows
that we're logging some extents that have not updated metadata,
so we don't get proper EXTENT_DATA items to be copied to log tree.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

13 Oct, 2012 · 1 commit

btrfs: Fix compilation with user namespace support enabled · e9069f47

Committed by Eric W. Biederman on Oct 12, 2012

When compiling with user namespace support btrfs fails like:

fs/btrfs/tree-log.c: In function ‘fill_inode_item’:
fs/btrfs/tree-log.c:2955:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_uid’
fs/btrfs/ctree.h:2026:1: note: expected ‘u32’ but argument is of type ‘kuid_t’
fs/btrfs/tree-log.c:2956:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_gid’
fs/btrfs/ctree.h:2027:1: note: expected ‘u32’ but argument is of type ‘kgid_t’

Fix this by using i_uid_read and i_gid_read in

Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

09 Oct, 2012 · 9 commits

btrfs: init ref_index to zero in add_inode_ref · f46dbe3d

Committed by Chris Mason on Oct 09, 2012

Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: make filesystem read-only when submitting barrier fails · 5af3e8cc

Committed by Stefan Behrens on Aug 01, 2012

So far the return code of barrier_all_devices() is ignored, which
means that errors are ignored. The result can be a corrupt
filesystem which is not consistent.
This commit adds code to evaluate the return code of
barrier_all_devices(). The normal btrfs_error() mechanism is used to
switch the filesystem into read-only mode when errors are detected.

In order to decide whether barrier_all_devices() should return
error or success, the number of disks that are allowed to fail the
barrier submission is calculated. This calculation accounts for the
worst RAID level of metadata, system and data. If single, dup or
RAID0 is in use, a single disk error is already considered to be
fatal. Otherwise a single disk error is tolerated.

The calculation of the number of disks that are tolerated to fail
the barrier operation is performed when the filesystem gets mounted,
when a balance operation is started and finished, and when devices
are added or removed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
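
The tolerance calculation described above can be sketched roughly as below. All names are hypothetical, and `TOY_RAID1`/`TOY_RAID10` merely stand in for the "otherwise" case of the message; the sketch takes the worst (smallest) tolerance across data, metadata and system, and turns a failure count into an error code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical allocation profiles for this sketch. */
enum toy_profile { TOY_SINGLE, TOY_DUP, TOY_RAID0, TOY_RAID1, TOY_RAID10 };

/* How many failed disks one profile can survive, per the commit text. */
static int toy_profile_tolerance(enum toy_profile p)
{
    switch (p) {
    case TOY_SINGLE:
    case TOY_DUP:
    case TOY_RAID0:
        return 0;   /* a single disk error is already fatal */
    default:
        return 1;   /* a single disk error is tolerated */
    }
}

/* The worst profile among data, metadata and system decides. */
static int toy_tolerated_failures(enum toy_profile data,
                                  enum toy_profile metadata,
                                  enum toy_profile system)
{
    int t = toy_profile_tolerance(data);
    int m = toy_profile_tolerance(metadata);
    int s = toy_profile_tolerance(system);

    if (m < t)
        t = m;
    if (s < t)
        t = s;
    return t;
}

/* Error only when more disks failed the barrier than we tolerate. */
static int toy_barrier_result(int failed_disks, int tolerated)
{
    assert(failed_disks >= 0 && tolerated >= 0);
    return failed_disks > tolerated ? -EIO : 0;
}
```

In this model the tolerance would be recomputed whenever the set of profiles or devices changes, which corresponds to the mount, balance and device add/remove points listed in the message.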

Btrfs: don't bother committing delayed inode updates when fsyncing · 94edf4ae

Committed by Josef Bacik on Sep 25, 2012

We can just copy the in memory inode into the tree log directly, no sense in
updating the fs tree so we can copy it into the tree log tree. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: be smarter about dropping things from the tree log · 18ec90d6

Committed by Josef Bacik on Sep 28, 2012

When we truncate existing items in the tree log we've been searching for
each individual item and removing them. This is unnecessary churn and
searching, just keep track of the slot we are on and how many items we need
to delete and delete them all at once. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
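
A rough model of the batching idea, assuming a hypothetical flat `toy_leaf` of keys rather than a real btree leaf: remember the slot where a run of matching items starts, count the run, and remove it with one shift instead of one search-and-delete per item.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical leaf: a small sorted array of object ids. */
struct toy_leaf {
    unsigned long long keys[16];
    size_t nritems;
};

/* Remove `nr` items starting at `slot` with a single shift. */
static void toy_del_items(struct toy_leaf *leaf, size_t slot, size_t nr)
{
    assert(slot + nr <= leaf->nritems);
    memmove(&leaf->keys[slot], &leaf->keys[slot + nr],
            (leaf->nritems - slot - nr) * sizeof(leaf->keys[0]));
    leaf->nritems -= nr;
}

/*
 * Drop every item with the given objectid.  Returns the number of
 * delete calls issued, i.e. one per contiguous run, not one per item.
 */
static size_t toy_drop_objectid(struct toy_leaf *leaf,
                                unsigned long long objectid)
{
    size_t i = 0, batches = 0;

    while (i < leaf->nritems) {
        size_t nr = 0;

        if (leaf->keys[i] != objectid) {
            i++;
            continue;
        }
        /* Count how many consecutive items belong to this run. */
        while (i + nr < leaf->nritems && leaf->keys[i + nr] == objectid)
            nr++;
        toy_del_items(leaf, i, nr);
        batches++;
    }
    return batches;
}
```

Because the keys are sorted, all items for one objectid form a single run, so the whole drop costs one delete call in this model.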

Btrfs: don't lookup csums for prealloc extents · 6f1fed77

Committed by Josef Bacik on Sep 26, 2012

The tree logging stuff was looking up csums to copy over for prealloc
extents which is just work we don't need to be doing.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: cache extent state when writing out dirty metadata pages · e6138876

Committed by Josef Bacik on Sep 27, 2012

Every time we write out dirty pages we search for an offset in the tree,
convert the bits in the state, and then when we wait we search for the
offset again and clear the bits. So for every dirty range in the io tree we
are doing 4 rb searches, which is suboptimal. With this patch we are only
doing 2 searches for every cycle (modulo weird things happening). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

btrfs: extended inode refs · f186373f

Committed by Mark Fasheh on Aug 08, 2012

This patch adds basic support for extended inode refs. This includes support
for link and unlink of the refs, which basically gets us support for rename
as well.

Inode creation does not need changing - extended refs are only added after
the ref array is full.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>

btrfs: improved readablity for add_inode_ref · 5a1d7843

Committed by Jan Schmidt on Aug 17, 2012

Moved part of the code into a sub function and replaced most of the gotos
by ifs, hoping that it will be easier to read now.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>

Btrfs: handle not finding the extent exactly when logging changed extents · 0aa4a17d

Committed by Josef Bacik on Sep 19, 2012

I started hitting warnings when running xfstest 68 in a loop because there
were EM's that were not lined up properly with the physical extents. This
is ok, if we do something like punch a hole or write to a preallocated space
or something like that we can have an EM that doesn't cover the entire
physical extent. So fix the tree logging stuff to cope with this case so we
don't just commit the transaction. With this patch I no longer see the
warnings from the tree logging code. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

04 Oct, 2012 · 1 commit

Btrfs: do not hold the write_lock on the extent tree while logging · ff44c6e3

Committed by Josef Bacik on Sep 14, 2012

Dave Sterba pointed out a sleeping while atomic bug while doing fsync. This
is because I'm an idiot and didn't realize that rwlock's were spin locks, so
we've been holding this thing while doing allocations and such which is not
good. This patch fixes this by dropping the write lock before we do
anything heavy and re-acquire it when it is done. We also need to take a
ref on the em's in case their corresponding pages are evicted and mark them
as being logged so that releasepage does not remove them and doesn't remove
them from our local list. Thanks,
Reported-by: Dave Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

02 Oct, 2012 · 9 commits

Btrfs: fix unprotected ->log_batch · 2ecb7923

Committed by Miao Xie on Sep 06, 2012

We forgot to protect ->log_batch when syncing a file; this patch fixes
the problem by using atomic operations. Since ->log_batch is only used to
check whether there are parallel sync operations, it is unnecessary to
reset it to 0 after the sync operation of the current log tree completes.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
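
As an illustration of the approach, here is a user-space sketch using C11 atomics in place of the kernel's atomic type. The `toy_root` structure and helper names are assumptions made for this example. The counter only ever grows; a syncer compares a snapshot with the current value to see whether other syncs arrived in the meantime, so no reset is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical root with an atomically updated batch counter. */
struct toy_root {
    atomic_int log_batch;
};

static void toy_root_init(struct toy_root *root)
{
    assert(root != NULL);
    atomic_init(&root->log_batch, 0);
}

/* Called by every task that syncs a file through this root. */
static void toy_note_sync(struct toy_root *root)
{
    atomic_fetch_add(&root->log_batch, 1);
}

/* Remember the counter before waiting for more writers. */
static int toy_snapshot_batch(struct toy_root *root)
{
    return atomic_load(&root->log_batch);
}

/* True if some other sync bumped the counter after the snapshot. */
static bool toy_more_syncs_since(struct toy_root *root, int snapshot)
{
    return atomic_load(&root->log_batch) != snapshot;
}
```

Since the check is "did the value change since my snapshot", the absolute value of the counter carries no meaning, which is why leaving it non-zero after a log commit is harmless in this model.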

Btrfs: add hole punching · 2aaa6655

Committed by Josef Bacik on Aug 29, 2012

This patch adds hole punching via fallocate.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: remove unused hint byte argument for btrfs_drop_extents · 2671485d

Committed by Josef Bacik on Aug 29, 2012

I audited all users of btrfs_drop_extents and found that nobody actually uses
the hint_byte argument. I'm sure it was used for something at some point but
it's not used now, and the way the pinning works the disk bytenr would never be
immediately useful anyway so lets just remove it. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: check if an inode has no checksum when logging it · d2794405

Committed by Liu Bo on Aug 29, 2012

This is based on Josef's "Btrfs: turbo charge fsync".

If an inode is a BTRFS_INODE_NODATASUM one, we don't need to look for csum
items any more.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Btrfs: fix a bug in checking whether a inode is already in log · 46d8bc34

Committed by Liu Bo on Aug 29, 2012

This is based on Josef's "Btrfs: turbo charge fsync".

Currently btrfs checks whether an inode is in the log by comparing
the root's last_log_commit to the inode's last_sub_trans[1].

But the problem is that root->last_log_commit is shared among
inodes.

Say we have N inodes to be logged: after the first inode is logged,
the root's last_log_commit is updated and the remaining N-1 inodes will
be skipped.

This fixes the bug by keeping a local copy of root's last_log_commit
inside each inode and this local copy will be maintained itself.

[1]: we regard each log transaction as a subset of btrfs's transaction,
i.e. sub_trans
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
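
The difference between the shared and the per-inode check can be modeled as below. This is a simplified sketch: the struct layout and `toy_*` helpers are hypothetical, and only the two fields named in the message are represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode carrying its own copy of last_log_commit. */
struct toy_inode {
    int last_sub_trans;   /* log sub-transaction that last modified it */
    int last_log_commit;  /* local copy, updated only when it is logged */
};

/* Old check: compares against the value shared by the whole root. */
static bool toy_in_log_shared(const struct toy_inode *inode,
                              int root_last_log_commit)
{
    return inode->last_sub_trans <= root_last_log_commit;
}

/* Fixed check: compares against the inode's own copy. */
static bool toy_in_log_local(const struct toy_inode *inode)
{
    return inode->last_sub_trans <= inode->last_log_commit;
}

/* Log one inode and commit the log for sub-transaction `log_transid`. */
static void toy_log_and_commit(struct toy_inode *inode,
                               int *root_last_log_commit, int log_transid)
{
    assert(inode != NULL && root_last_log_commit != NULL);
    inode->last_log_commit = log_transid;
    *root_last_log_commit = log_transid;
}
```

With two inodes dirtied in the same sub-transaction, logging the first makes the shared check report the second as already logged, while the local check still reports it as needing a log.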

Btrfs: improve fsync by filtering extents that we want · 4e2f84e6

Committed by Liu Bo on Aug 27, 2012

This is based on Josef's "Btrfs: turbo charge fsync".

Josef's patch above performs very well in the random sync write test,
because we won't have too many extents to merge.

However, it does not perform well on this test:
dd if=/dev/zero of=foobar bs=4k count=12500 oflag=sync

The reason is that when we do sequential sync writes, we need to merge the
current extent with the previous one, so we accumulate ever larger
extents to log:

A(4k) --> AA(8k) --> AAA(12k) --> AAAA(16k) ...

So we'll have to flush more and more checksum into log tree, which is the
bottleneck according to my tests.

But we can avoid this by telling fsync the real extents that are needed
to be logged.

With this, I did the above dd sync write test (size=50m),

         w/o (orig)   w/ (josef's)   w/ (this)
SATA      104KB/s       109KB/s       121KB/s
ramdisk   1.5MB/s       1.5MB/s       10.7MB/s (613%)
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Btrfs: cleanup extents after we finish logging inode · 06d3d22b

Committed by Liu Bo on Aug 27, 2012

This is based on Josef's "Btrfs: turbo charge fsync".

We should cleanup those extents after we've finished logging inode,
otherwise we may do redundant work on them.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Btrfs: only warn if we hit an error when doing the tree logging · 0fa83cdb

Committed by Josef Bacik on Aug 24, 2012

I hit this a couple times while working on my fsync patch (all my bugs, not
normal operation), but with my new stuff we could have new errors from cases
I have not encountered, so instead of BUG()'ing we should be WARN()'ing so
that we are notified there is a problem but the user doesn't lose their
data. We can easily commit the transaction in the case that the tree
logging fails and still be fine, so let's try and be as nice to the user as
possible. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: turbo charge fsync · 5dc562c5

Committed by Josef Bacik on Aug 17, 2012

At least for the vm workload.  Currently on fsync we will

1) Truncate all items in the log tree for the given inode if they exist

and

2) Copy all items for a given inode into the log

The problem with this is that for things like VMs you can have lots of
extents from the fragmented writing behavior, and worse yet you may have
only modified a few extents, not the entire thing.  This patch fixes this
problem by tracking which transid modified our extent, and then when we do
the tree logging we find all of the extents we've modified in our current
transaction, sort them and commit them.  We also only truncate up to the
xattrs of the inode and copy that stuff in normally, and then just drop any
extents in the range we have that exist in the log already.  Here are some
numbers of a 50 meg fio job that does random writes and fsync()s after every
write

		Original	Patched
SATA drive	82KB/s		140KB/s
Fusion drive	431KB/s		2532KB/s

So around 2-6 times faster depending on your hardware.  There are a few
corner cases, for example if you truncate at all we have to do it the old
way since there is no way to be sure what is in the log is ok.  This
probably could be done smarter, but if you write-fsync-truncate-write-fsync
you deserve what you get.  All this work is in RAM of course so if your
inode gets evicted from cache and you read it in and fsync it we'll do it
the slow way if we are still in the same transaction that we last modified
the inode in.

The biggest cool part of this is that it requires no changes to the recovery
code, so if you fsync with this patch and crash and load an old kernel, it
will run the recovery and be a-ok.  I have tested this pretty thoroughly
with an fsync tester and everything comes back fine, as well as xfstests.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
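
The core selection step ("find all of the extents we've modified in our current transaction, sort them") might look like this in miniature. The `toy_extent` type and its `generation` field are assumptions for the sketch, which filters on the current transaction id and sorts by file offset with `qsort`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical extent record tagged with the transid that modified it. */
struct toy_extent {
    unsigned long long start;
    unsigned long long len;
    unsigned long long generation;
};

/* Order extents by their starting file offset. */
static int toy_cmp_start(const void *a, const void *b)
{
    const struct toy_extent *x = a;
    const struct toy_extent *y = b;

    return (x->start > y->start) - (x->start < y->start);
}

/*
 * Copy the extents modified in transaction `transid` into `out`
 * (which must have room for `n` entries), sorted by offset.
 * Returns how many extents need to be logged.
 */
static size_t toy_collect_modified(const struct toy_extent *all, size_t n,
                                   unsigned long long transid,
                                   struct toy_extent *out)
{
    size_t i, count = 0;

    assert(all != NULL && out != NULL);
    for (i = 0; i < n; i++)
        if (all[i].generation == transid)
            out[count++] = all[i];
    qsort(out, count, sizeof(*out), toy_cmp_start);
    return count;
}
```

For a fragmented file where only a few extents changed since the last commit, the amount logged then scales with the number of modified extents rather than with the total extent count.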

24 Jul, 2012 · 1 commit

Btrfs: return error of btrfs_update_inode() to caller · b9959295

Committed by Tsutomu Itoh on Jun 25, 2012

We didn't check error of btrfs_update_inode(), but that error looks
easy to bubble back up.
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

03 Jul, 2012 · 1 commit

Btrfs: run delayed directory updates during log replay · b6305567

Committed by Chris Mason on Jul 02, 2012

While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.

This commit forces the delayed updates to run so the
replay code can find any modifications done.  It stops
us from crashing because the directory deletion replay
expects items to be removed immediately from the tree.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org

30 May, 2012 · 3 commits

Btrfs: fix return code in drop_objectid_items · 5bdbeb21

Committed by Josef Bacik on May 29, 2012

So dpkg fsync()'s the file and the directory containing the file whenever it
writes to a file which is really slow in btrfs. This is partly because
fsync()'ing a directory _always_ committed the transaction instead of just
going to the tree log. This is because drop_objectid_items() would return 1
since it does a btrfs_search_slot() which returns 1. In tree-log jargon
this means that we have to commit the transaction to be safe. So just check
if ret is greater than 0 and set it to 0 if it is. With this patch we now
use the tree-log instead of committing the entire transaction, which is
twice as fast on my box. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
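
The return-code convention at issue can be shown with a tiny stand-in. The helper names are hypothetical; the only behavior taken from the message is that a search may return 1 for "no exact match", that a positive value reaching the caller means "commit the transaction", and that the fix clamps positive values to 0.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in for the end of a drop routine whose last tree search
 * returned `search_ret`: negative for an error, 0 for an exact
 * match, 1 when the key was simply not found.
 */
static int toy_drop_items_ret(int search_ret)
{
    int ret = search_ret;

    assert(search_ret <= 1);
    /* "Not found" is not a failure here; do not leak it upward. */
    if (ret > 0)
        ret = 0;
    return ret;
}

/* In this model a positive value tells fsync to do a full commit. */
static bool toy_must_commit_transaction(int ret)
{
    return ret > 0;
}
```

Without the clamp, every directory fsync in this model would take the full-commit path, which matches the slowdown described in the message.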

Btrfs: check to see if the inode is in the log before fsyncing · 22ee6985

Committed by Josef Bacik on May 29, 2012

We have this check down in the actual logging code, but this is after we
start a transaction and all that good stuff. So move the helper
inode_in_log() out so we can call it in fsync() and avoid starting a
transaction altogether and just exit if we've already fsync()'ed this file
recently. You would notice this issue if you fsync()'ed a file over and
over again until the transaction committed. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: return value of btrfs_read_buffer is checked correctly · 018642a1

Committed by Tsutomu Itoh on May 29, 2012

btrfs_read_buffer() can return an error.
Therefore, add code that checks the return value of
btrfs_read_buffer().
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

06 May, 2012 · 1 commit

Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919

Committed by Chris Mason on May 06, 2012

verify_parent_transid needs to lock the extent range to make
sure no IO is underway, and so it can safely clear the
uptodate bits if our checks fail.

But, a few callers are using it with spinlocks held.  Most
of the time, the generation numbers are going to match, and
we don't want to switch to a blocking lock just for the error
case.  This adds an atomic flag to verify_parent_transid,
and changes it to return EAGAIN if it needs to block to
properly verify things.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
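
A simplified model of the control flow described above, with hypothetical names and return values chosen for the sketch (0 for a match, `-EAGAIN` when blocking would be required, 1 for a confirmed mismatch). The fast path never blocks; only the mismatch case needs the lock, and atomic callers are told to retry from a context that may sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Sketch of a generation check that may be called under a spinlock.
 * `atomic` is true when the caller is not allowed to sleep.
 */
static int toy_verify_parent_transid(unsigned long long buf_generation,
                                     unsigned long long parent_transid,
                                     bool atomic)
{
    /* Common case: generations match, nothing to lock. */
    if (parent_transid == 0 || buf_generation == parent_transid)
        return 0;

    /* Verifying properly would need a blocking lock; punt to caller. */
    if (atomic)
        return -EAGAIN;

    /*
     * Blocking path: a real implementation would lock the extent
     * range, re-check, and clear the uptodate bits here.
     */
    return 1;
}
```

A caller holding a spinlock would treat `-EAGAIN` as "drop the lock and call again with `atomic` set to false".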

22 Mar, 2012 · 2 commits

btrfs: replace many BUG_ONs with proper error handling · 79787eaa

Committed by Jeff Mahoney on Mar 12, 2012

btrfs currently handles most errors with BUG_ON. This patch is a
work-in-progress but aims to handle most errors other than internal logic
errors and ENOMEM more gracefully.

This iteration prevents most crashes but can run into lockups with
the page lock on occasion when the timing "works out."
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

btrfs: return void in functions without error conditions · 143bede5

Committed by Jeff Mahoney on Mar 01, 2012

Signed-off-by: Jeff Mahoney <jeffm@suse.com>

27 Jan, 2012 · 1 commit

btrfs: Fix busyloops in transaction waiting code · 6dd70ce4

Committed by Jan Kara on Jan 26, 2012

wait_log_commit() and wait_for_writer() were using slightly different
conditions for deciding whether they should call schedule() and whether they
should continue in the wait loop. Thus it could happen that we busylooped when
the first condition was not true while the second one was. That is burning CPU
cycles needlessly and is deadly on UP machines...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

22 Dec, 2011 · 1 commit

Btrfs: mark delayed refs as for cow · 66d7e7f0

Committed by Arne Jansen on Sep 12, 2011

Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
from every call site. The for_cow parameter will later on be used to
determine if a ref will change anything with respect to qgroups.

Delayed refs coming from relocation are always counted as for_cow, as they
don't change subvol quota.

Also pass in the fs_info for later use.

btrfs_find_all_roots() will use this as an optimization, as changes that are
for_cow will not change anything with respect to which root points to a
certain leaf. Thus, we don't need to add the current sequence number to
those delayed refs.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

06 Nov, 2011 · 3 commits

btrfs: separate superblock items out of fs_info · 6c41761f

Committed by David Sterba on Apr 13, 2011

fs_info is now ~9kb, more than fits into one page. This will cause
mount failure when memory is too fragmented. The top space consumers are
the super block structures super_copy and super_for_commit, ~2.8kb each.
Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

Add a wrapper for freeing fs_info and all of its dynamically allocated
members.
Signed-off-by: David Sterba <dsterba@suse.cz>

Btrfs: fix extent pinning bugs in the tree log · e688b725

Committed by Chris Mason on Oct 31, 2011

The tree log had two important bugs that could cause corruptions after a
crash.  Sometimes we were allowing tree log blocks to be reused after
the tree log was committed but before the transaction commit was done.

This allowed a future metadata write to overwrite the tree log data.  It
is fixed by adding a new variant of freeing reserved extents that always
pins them.  Credit goes to Stefan Behrens and Arne Jansen for many many
hours spent tracking this bug down.

During tree log replay, we do a pass through the tree log and pin all
the extents we find.  This makes sure the replay code won't go in and
use any of those blocks for new allocations during replay.  The problem
is the free space cache isn't honoring these pinned extents.  So the
allocator can end up handing them out, leading to all kinds of problems
during replay.

The fix here is to force any free space cache to load while we pin the
extents, and then to make sure we remove the pinned extents from the
free space rbtree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>

Btrfs: don't wait as long for more batches during SSD log commit · cd354ad6

Committed by Chris Mason on Oct 20, 2011

When we're doing log commits, we try to wait for more writers to come in
and make the commit bigger.  This helps improve performance on rotating
disks, but on SSDs it adds latencies.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

02 Nov, 2011 · 1 commit

filesystems: add set_nlink() · bfe86848

Committed by Miklos Szeredi on Oct 28, 2011

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

17 Aug, 2011 · 1 commit

Btrfs: fix an oops of log replay · 34f3e4f2

Committed by liubo on Aug 06, 2011

When btrfs recovers from a crash, it may hit the oops below:

------------[ cut here ]------------
kernel BUG at fs/btrfs/inode.c:4580!
[...]
RIP: 0010:[<ffffffffa03df251>]  [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs]
[...]
Call Trace:
 [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs]
 [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs]
 [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs]
 [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs]
 [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs]
 [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs]
 [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs]
 [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs]
[...]

This happens because, while replaying an inode ref item, we forget to
check for old conflicting DIR_ITEM and DIR_INDEX items in the fs/file tree,
so we run into conflicts that lead to BUG_ON().
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Tested-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

openeuler / Kernel · synced successfully about 1 year ago
