提交 · da83fc6e0f379bf80d68d34cca38788a046a71f4 · openeuler / Kernel

20 7月, 2014 1 次提交

Btrfs: fix abnormal long waiting in fsync · 98ce2ded

由 Liu Bo 提交于 7月 17, 2014

xfstests generic/127 detected this problem.

With commit 7fc34a62, now fsync will only flush
data within the passed range.  This is the cause of the above problem,
-- btrfs's fsync has a stage called 'sync log' which will wait for all the
ordered extents it've recorded to finish.

In xfstests/generic/127, with mixed operations such as truncate, fallocate,
punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will
mmap, and then msync.  And I find that msync will wait for quite a long time
(about 20s in my case), thanks to ftrace, it turns out that the previous
fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the
range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants,
there can be some ordered extents created but not getting corresponding pages
flushed, then they're left in memory until we fsync which runs into the
stage 'sync log', and fsync will just wait for the system writeback thread
to flush those pages and get ordered extents finished, so the latency is
inevitable.

This adds a flush similar to btrfs_start_ordered_extent() in
btrfs_wait_logged_extents() to fix that.
Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <clm@fb.com>

98ce2ded

10 6月, 2014 1 次提交

btrfs: remove stale newlines from log messages · 351fd353

由 David Sterba 提交于 5月 15, 2014

I've noticed an extra line after "use no compression", but search
revealed much more in messages of more critical levels and rare errors.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

351fd353

11 3月, 2014 6 次提交

Btrfs: split the global ordered extents mutex · 31f3d255

由 Miao Xie 提交于 3月 06, 2014

When we create a snapshot, we just need wait the ordered extents in
the source fs/file root, but because we use the global mutex to protect
this ordered extents list of the source fs/file root to avoid accessing
a empty list, if someone got the mutex to access the ordered extents list
of the other fs/file root, we had to wait.

This patch splits the above global mutex, now every fs/file root has
its own mutex to protect its own list.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

31f3d255

Btrfs: wake up the tasks that wait for the io earlier · af7a6509

由 Miao Xie 提交于 3月 06, 2014

The tasks that wait for the IO_DONE flag just care about the io of the dirty
pages, so it is better to wake up them immediately after all the pages are
written, not the whole process of the io completes.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

af7a6509

Btrfs: fix early enospc due to the race of the two ordered extent wait · 8b9d83cd

由 Miao Xie 提交于 3月 06, 2014

btrfs_wait_ordered_roots() moves all the list entries to a new list,
and then deals with them one by one. But if the other task invokes this
function at that time, it would get a empty list. It makes the enospc
error happens more early. Fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

8b9d83cd

btrfs: Cleanup the "_struct" suffix in btrfs_workequeue · d458b054

由 Qu Wenruo 提交于 2月 28, 2014

Since the "_struct" suffix is mainly used for distinguish the differnt
btrfs_work between the original and the newly created one,
there is no need using the suffix since all btrfs_workers are changed
into btrfs_workqueue.

Also this patch fixed some codes whose code style is changed due to the
too long "_struct" suffix.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

d458b054

btrfs: Replace fs_info->flush_workers with btrfs_workqueue. · a44903ab

由 Qu Wenruo 提交于 2月 28, 2014

Replace the fs_info->submit_workers with the newly created
btrfs_workqueue.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

a44903ab

Btrfs: don't mix the ordered extents of all files together during logging the inodes · 827463c4

由 Miao Xie 提交于 1月 14, 2014

There was a problem in the old code:
If we failed to log the csum, we would free all the ordered extents in the log list
including those ordered extents that were logged successfully, it would make the
log committer not to wait for the completion of the ordered extents.

This patch doesn't insert the ordered extents that is about to be logged into
a global list, instead, we insert them into a local list. If we log the ordered
extents successfully, we splice them with the global list, or we will throw them
away, then do full sync. It can also reduce the lock contention and the traverse
time of list.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

827463c4

29 1月, 2014 2 次提交

Btrfs: convert printk to btrfs_ and fix BTRFS prefix · efe120a0

由 Frank Holton 提交于 12月 20, 2013

Convert all applicable cases of printk and pr_* to the btrfs_* macros.

Fix all uses of the BTRFS prefix.
Signed-off-by: NFrank Holton <fholton@gmail.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

efe120a0

Btrfs: avoid unnecessary ordered extent cache resets · 1b8e7e45

由 Filipe David Borba Manana 提交于 11月 22, 2013

After an ordered extent completes, don't blindly reset the
inode's ordered tree last accessed ordered extent pointer.

While running the xfstests I noticed that about 29% of the
time the ordered extent to which tree->last pointed was not
the same as our just completed ordered extent. After that I
ran the following sysbench test (after a prepare phase) and
noticed that about 68% of the time tree->last pointed to
a different ordered extent too.

sysbench --test=fileio --file-num=32 --file-total-size=4G \
    --file-test-mode=rndwr --num-threads=512 \
    --file-block-size=32768 --max-time=60 --max-requests=0 run

Therefore reset tree->last on ordered extent removal only if
it pointed to the ordered extent we're removing from the tree.

Results from 4 runs of the following test before and after
applying this patch:

$ sysbench --test=fileio --file-num=32 --file-total-size=4G \
  --file-test-mode=seqwr --num-threads=512 \
  --file-block-size=32768 --max-time=60 --file-io-mode=sync prepare
$ sysbench --test=fileio --file-num=32 --file-total-size=4G \
  --file-test-mode=seqwr --num-threads=512 \
  --file-block-size=32768 --max-time=60 --file-io-mode=sync run

Before this path:

run 1 - 64.049Mb/sec
run 2 - 63.455Mb/sec
run 3 - 64.656Mb/sec
run 4 - 63.833Mb/sec

After this patch:

run 1 - 66.149Mb/sec
run 2 - 68.459Mb/sec
run 3 - 66.338Mb/sec
run 4 - 66.176Mb/sec

With random writes (--file-test-mode=rndwr) I had huge fluctuations
on the results (+- 35% easily).
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

1b8e7e45

21 11月, 2013 2 次提交

Btrfs: fix list delete warning when removing ordered root from the list · 931aa877

由 Miao Xie 提交于 11月 14, 2013

Commit b0244199 "Btrfs: don't wait for
the completion of all the ordered extents" introduced a bug that broke
the ordered root list:
 WARNING: CPU: 1 PID: 7119 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()

It is because we forgot to return the roots in the splice list to the
ordered list of the fs. Fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

931aa877

Btrfs: don't wait for ordered data outside desired range · b52abf1e

由 Filipe David Borba Manana 提交于 11月 06, 2013

In btrfs_wait_ordered_range(), if we found an extent to the left
of the start of our desired wait range and the last byte of that
extent is 1 less than the desired range's start, we would would
wait for the IO completion of that extent unnecessarily.
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

b52abf1e

12 11月, 2013 4 次提交

Btrfs: don't wait for the completion of all the ordered extents · b0244199

由 Miao Xie 提交于 11月 04, 2013

It is very likely that there are lots of ordered extents in the filesytem,
if we wait for the completion of all of them when we want to reclaim some
space for the metadata space reservation, we would be blocked for a long
time. The performance would drop down suddenly for a long time.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

b0244199

Btrfs: take ordered root lock when removing ordered operations inode · 93858769

由 Josef Bacik 提交于 10月 28, 2013

A user reported a list corruption warning from btrfs_remove_ordered_extent, it
is because we aren't taking the ordered_root_lock when we remove the inode from
the ordered operations list.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

93858769

Btrfs: return an error from btrfs_wait_ordered_range · 0ef8b726

由 Josef Bacik 提交于 10月 25, 2013

I noticed that if the free space cache has an error writing out it's data it
won't actually error out, it will just carry on. This is because it doesn't
check the return value of btrfs_wait_ordered_range, which didn't actually return
anything. So fix this in order to keep us from making free space cache look
valid when it really isnt. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

0ef8b726

Btrfs: btrfs_add_ordered_operation: Fix last modified transaction comparison. · 5ede859b

由 chandan 提交于 10月 14, 2013

Comparison of an inode's last modified transaction with the last committed
transaction is incorrect. Fix it.
Signed-off-by: Nchandan <chandan@linux.vnet.ibm.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

5ede859b

21 9月, 2013 1 次提交

Btrfs: kill delay_iput arg to the wait_ordered functions · f0de181c

由 Josef Bacik 提交于 9月 17, 2013

This is a left over of how we used to wait for ordered extents, which was to
grab the inode and then run filemap flush on it. However if we have an ordered
extent then we already are holding a ref on the inode, and we just use
btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
the inode to start work on the ordered extent. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

f0de181c

01 9月, 2013 3 次提交

Btrfs: allow partial ordered extent completion · 77cef2ec

由 Josef Bacik 提交于 8月 29, 2013

We currently have this problem where you can truncate pages that have not yet
been written for an ordered extent. We do this because the truncate will be
coming behind to clean us up anyway so what's the harm right? Well if truncate
fails for whatever reason we leave an orphan item around for the file to be
cleaned up later. But if the user goes and truncates up the file and tries to
read from the area that had been discarded previously they will get a csum error
because we never actually wrote that data out.

This patch fixes this by allowing us to either discard the ordered extent
completely, by which I mean we just free up the space we had allocated and not
add the file extent, or adjust the length of the file extent we write. We do
this by setting the length we truncated down to in the ordered extent, and then
we set the file extent length and ram bytes to this length. The total disk
space stays unchanged since we may be compressed and we can't just chop off the
disk space, but at least this way the file extent only points to the valid data.
Then when the file extent is free'd the extent and csums will be freed normally.

This patch is needed for the next series which will give us more graceful
recovery of failed truncates. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

77cef2ec

Btrfs: Remove superfluous casts from u64 to unsigned long long · c1c9ff7c

由 Geert Uytterhoeven 提交于 8月 20, 2013

u64 is "unsigned long long" on all architectures now, so there's no need to
cast it when formatting it using the "ll" length modifier.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

c1c9ff7c

Btrfs: fix heavy delalloc related deadlock · 9ffba8cd

由 Josef Bacik 提交于 8月 14, 2013

I added a patch where we started taking the ordered operations mutex when we
waited on ordered extents.  We need this because we splice the list and process
it, so if a flusher came in during this scenario it would think the list was
empty and we'd usually get an early ENOSPC.  The problem with this is that this
lock is used in transaction committing.  So we end up with something like this

Transaction commit
	-> wait on writers

Delalloc flusher
	-> run_ordered_operations (holds mutex)
		->wait for filemap-flush to do its thing

flush task
	-> cow_file_range
		->wait on btrfs_join_transaction because we're commiting

some other task
	-> commit_transaction because we notice trans->transaction->flush is set
		-> run_ordered_operations (hang on mutex)

We need to disentangle the ordered operations flushing from the delalloc
flushing, since they are separate things.  This solves the deadlock issue I was
seeing.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

9ffba8cd

02 7月, 2013 1 次提交

Btrfs: remove btrfs_sector_sum structure · f51a4a18

由 Miao Xie 提交于 6月 19, 2013

Using the structure btrfs_sector_sum to keep the checksum value is
unnecessary, because the extents that btrfs_sector_sum points to are
continuous, we can find out the expected checksums by btrfs_ordered_sum's
bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After
removing bytenr, there is only one member in the structure, so it makes
no sense to keep the structure, just remove it, and use a u32 array to
store the checksum value.

By this change, we don't use the while loop to get the checksums one by
one. Now, we can get several checksum value at one time, it improved the
performance by ~74% on my SSD (31MB/s -> 54MB/s).

test command:
 # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

f51a4a18

14 6月, 2013 1 次提交

Btrfs: introduce per-subvolume ordered extent list · 199c2a9c

由 Miao Xie 提交于 5月 15, 2013

The reason we introduce per-subvolume ordered extent list is the same
as the per-subvolume delalloc inode list.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

199c2a9c

07 5月, 2013 1 次提交

Btrfs: improve the performance of the csums lookup · e4100d98

由 Miao Xie 提交于 4月 05, 2013

It is very likely that there are several blocks in bio, it is very
inefficient if we get their csums one by one. This patch improves
this problem by getting the csums in batch.

According to the result of the following test, the execute time of
__btrfs_lookup_bio_sums() is down by ~28%(300us -> 217us).

 # dd if=<mnt>/file of=/dev/null bs=1M count=1024
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

e4100d98

28 3月, 2013 1 次提交

Btrfs: hold the ordered operations mutex when waiting on ordered extents · db1d607d

由 Josef Bacik 提交于 3月 26, 2013

We need to hold the ordered_operations mutex while waiting on ordered extents
since we splice and run the ordered extents list. We need to make sure anybody
else who wants to wait on ordered extents does actually wait for them to be
completed. This will keep us from bailing out of flushing in case somebody is
already waiting on ordered extents to complete. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

db1d607d

21 2月, 2013 1 次提交

Btrfs: place ordered operations on a per transaction list · 569e0f35

由 Josef Bacik 提交于 2月 13, 2013

Miao made the ordered operations stuff run async, which introduced a
deadlock where we could get somebody (sync) racing in and committing the
transaction while a commit was already happening. The new committer would
try and flush ordered operations which would hang waiting for the commit to
finish because it is done asynchronously and no longer inherits the callers
trans handle. To fix this we need to make the ordered operations list a per
transaction list. We can get new inodes added to the ordered operation list
by truncating them and then having another process writing to them, so this
makes it so that anybody trying to add an ordered operation _must_ start a
transaction in order to add itself to the list, which will keep new inodes
from getting added to the ordered operations list after we start committing.
This should fix the deadlock and also keeps us from doing a lot more work
than we need to during commit. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

569e0f35

20 2月, 2013 2 次提交

Btrfs: don't traverse the ordered operation list repeatedly · 5b947f1b

由 Miao Xie 提交于 1月 22, 2013

btrfs_run_ordered_operations() needn't traverse the ordered operation list
repeatedly, it is because the transaction commiter will invoke it again when
there is no other writer in this transaction, it can ensure that no one can
add new objects into the ordered operation list.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

5b947f1b

Btrfs: wait on ordered extents at the last possible moment · 2ab28f32

由 Josef Bacik 提交于 10月 12, 2012

Since we don't actually copy the extent information from the source tree in
the fast case we don't need to wait for ordered io to be completed in order
to fsync, we just need to wait for the io to be completed. So when we're
logging our file just attach all of the ordered extents to the log, and then
when the log syncs just wait for IO_DONE on the ordered extents and then
write the super. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

2ab28f32

06 2月, 2013 2 次提交

Btrfs: fix possible stale data exposure · 59fe4f41

由 Josef Bacik 提交于 1月 30, 2013

We specifically do not update the disk i_size if there are ordered extents
outstanding for any area between the current disk_i_size and our ordered
extent so that we do not expose stale data. The problem is the check we
have only checks if the ordered extent starts at or after the current
disk_i_size, which doesn't take into account an ordered extent that starts
before the current disk_i_size and ends past the disk_i_size. Fix this by
checking if the extent ends past the disk_i_size. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

59fe4f41

Btrfs: fix missing i_size update · 5d1f4020

由 Josef Bacik 提交于 1月 30, 2013

If we have an ordered extent before the ordered extent we are currently
completing that is after the current disk_i_size we will put our i_size
update into that ordered extent so that we do not expose stale data. The
problem is that if our disk i_size is updated past the previous ordered
extent we won't update the i_size with the pending i_size update. So check
the pending i_size update and if its above the current disk i_size we need
to go ahead and try to update. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

5d1f4020

13 12月, 2012 1 次提交

Btrfs: cleanup for btrfs_wait_order_range · 4fde183d

由 Liu Bo 提交于 11月 01, 2012

Variable 'found' is no more used.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

4fde183d

12 12月, 2012 2 次提交

Btrfs: make ordered extent be flushed by multi-task · 9afab882

由 Miao Xie 提交于 10月 25, 2012

Though the process of the ordered extents is a bit different with the delalloc inode
flush, but we can see it as a subset of the delalloc inode flush, so we also handle
them by flush workers.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

9afab882

Btrfs: make ordered operations be handled by multi-task · 25287e0a

由 Miao Xie 提交于 10月 25, 2012

The process of the ordered operations is similar to the delalloc inode flush, so
we handle them by flush workers.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

25287e0a

04 10月, 2012 1 次提交
- L
  Btrfs: kill obsolete arguments in btrfs_wait_ordered_extents · 6bbe3a9c
  由 Liu Bo 提交于 9月 14, 2012
```
nocow_only is now an obsolete argument.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
```
  6bbe3a9c
02 10月, 2012 2 次提交

Btrfs: use a slab for ordered extents allocation · 6352b91d

由 Miao Xie 提交于 9月 06, 2012

The ordered extent allocation is in the fast path of the IO, so use a slab
to improve the speed of the allocation.

 "Size of the struct is 280, so this will fall into the size-512 bucket,
  giving 8 objects per page, while own slab will pack 14 objects into a page.

  Another benefit I see is to check for leaked objects when the module is
  removed (and the cache destroy takes place)."
						-- David Sterba
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>

6352b91d

Btrfs: fix file extent discount problem in the, snapshot · b9a8cc5b

由 Miao Xie 提交于 9月 06, 2012

If a snapshot is created while we are writing some data into the file,
the i_size of the corresponding file in the snapshot will be wrong, it will
be beyond the end of the last file extent. And btrfsck will report:
  root 256 inode 257 errors 100

Steps to reproduce:
 # mkfs.btrfs <partition>
 # mount <partition> <mnt>
 # cd <mnt>
 # dd if=/dev/zero of=tmpfile bs=4M count=1024 &
 # for ((i=0; i<4; i++))
 > do
 > btrfs sub snap . $i
 > done

This because the algorithm of disk_i_size update is wrong. Though there are
some ordered extents behind the current one which we use to update disk_i_size,
it doesn't mean those extents will be dealt with in the same transaction. So
We shouldn't use the offset of those extents to update disk_i_size. Or we will
get the wrong i_size in the snapshot.

We fix this problem by recording the max real i_size. If we find there is a
ordered extent which is in front of the current one and doesn't complete, we
will record the end of the current one into that ordered extent. Surely, if
the current extent holds the end of other extent(it must be greater than
the current one because it is behind the current one), we will record the
number that the current extent holds. In this way, we can exclude the ordered
extents that may not be dealth with in the same transaction, and be easy to
know the real disk_i_size.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>

b9a8cc5b

04 8月, 2012 1 次提交

btrfs: nuke pdflush from comments · b2570314

由 Artem Bityutskiy 提交于 7月 25, 2012

The pdflush thread is long gone, so this patch removes references to pdflush
from btrfs comments.

Cc: Chris Mason <chris.mason@fusionio.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b2570314

15 6月, 2012 1 次提交

Btrfs: call filemap_fdatawrite twice for compression · 7ddf5a42

由 Josef Bacik 提交于 6月 08, 2012

I removed this in an earlier commit and I was wrong. Because compression
can return from filemap_fdatawrite() without having actually set any of it's
pages as writeback() it can make filemap_fdatawait() do essentially nothing,
and then we won't find any ordered extents because they may not have been
created yet. So not only does this make fsync() completely useless, but it
will also screw up if you truncate on a non-page aligned offset since we
zero out the end and then wait on ordered extents and then call drop caches.
We can drop the cache before the io completes and then we try to unpin the
extent we just wrote we won't find it and everything goes sideways. So fix
this by putting it back and put a giant comment there to keep me from trying
to remove it in the future. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

7ddf5a42

30 5月, 2012 3 次提交

Btrfs: finish ordered extents in their own thread · 5fd02043

由 Josef Bacik 提交于 5月 02, 2012

We noticed that the ordered extent completion doesn't really rely on having
a page and that it could be done independantly of ending the writeback on a
page. This patch makes us not do the threaded endio stuff for normal
buffered writes and direct writes so we can end page writeback as soon as
possible (in irq context) and only start threads to do the ordered work when
it is actually done. Compression needs to be reworked some to take
advantage of this as well, but atm it has to do a find_get_page in its endio
handler so it must be done in its own thread. This makes direct writes
quite a bit faster. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

5fd02043

Btrfs: do not check delalloc when updating disk_i_size · 4e899152

由 Josef Bacik 提交于 5月 02, 2012

We are checking delalloc to see if it is ok to update the i_size.  There are
2 cases it stops us from updating

1) If there is delalloc between our current disk_i_size and this ordered
extent

2) If there is delalloc between our current ordered extent and the next
ordered extent

These tests are racy however since we can set delalloc for these ranges at
any time.  Also for the first case if we notice there is delalloc between
disk_i_size and our ordered extent we will not update disk_i_size and assume
that when that delalloc bit gets written out it will update everything
properly.  However if we crash before that we will have file extents outside
of our i_size, which is not good, so this test is dangerous as well as racy.
Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

4e899152

Btrfs: remove useless waiting and extra filemap work · 551ebb2d

由 Josef Bacik 提交于 4月 23, 2012

In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice
because compression does strange things and then waiting. Then we look up
ordered extents and if we find any we will always schedule_timeout(); once
and then loop back around and do it all again. We will even check to see if
there is delalloc pages on this range and loop again. So this patch gets
rid of the multipe fdata_write() calls and just does
filemap_write_and_wait(). In the case of compression we will still find the
ordered extents and start those individually if we need to so that is ok,
but in the normal buffered case we avoid all this weird overhead.

Then in the case of the schedule_timeout(1), we don't need it. All callers
either 1) don't care, they just want to make sure what they just wrote maeks
it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing
in which case it will lock and check for ordered extents _anyway_ so get
back to them as quickly as possible. The delaloc check is simply not
needed, this only catches the case where we write to the file again since
doing the filemap_write_and_wait() and if the caller truly cares about that
it will take care of everything itself. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

551ebb2d

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功