提交 · 33b258086441dd07e00133c79fcd8cbc6a76d737 · openeuler / Kernel

12 11月, 2009 9 次提交

Btrfs: allow more metadata chunk preallocation · 33b25808

由 Chris Mason 提交于 15年前

On an FS where all of the space has not been allocated into chunks yet,
the enospc can return enospc just because the existing metadata chunks
are full.

We get around this by allowing more metadata chunks to be allocated up
to a certain limit, and finding the right limit is a little fuzzy.  The
problem is the reservations for delalloc would preallocate way too much
of the FS as metadata.  We need to start saying no and just force some
IO to happen.

But we also need to let a reasonable amount of the FS become metadata.
This bumps the hard limit up, later releases will have a better system.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

33b25808

Btrfs: fallback on uncompressed io if compressed io fails · f5a84ee3

由 Josef Bacik 提交于 15年前

Currently compressed IO does not deal with not having its entire extent able to
be allocated. So if we have enough free space to allocate for the extent, but
its not contiguous, it will fail spectacularly. This patch fixes this by
falling back on uncompressed IO which lets us spread the delalloc extent across
multiple extents. I tested this by making us randomly think the reservation had
failed to make it fallback on the uncompressed io way and it seemed to work
fine. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f5a84ee3

Btrfs: find ideal block group for caching · ccf0e725

由 Josef Bacik 提交于 15年前

This patch changes a few things. Hopefully the comments are helpfull, but
I'll try and be as verbose here.

Problem:

My fedora box was taking 1 minute and 21 seconds to boot with btrfs as root.
Part of this problem was we pick the first block group we can find and start
caching it, even if it may not have enough free space. The other problem is
we only search for cached block groups the first time around, which we won't
find any cached block groups because this is a newly mounted fs, so we end up
caching several block groups during bootup, which with alot of fragmentation
takes around 30-45 seconds to complete, which bogs down the system. So

Solution:

1) Don't cache block groups willy-nilly at first. Instead try and figure out
which block group has the most free, and therefore will take the least amount
of time to cache.

2) Don't be so picky about cached block groups. The other problem is once
we've filled up a cluster, if the block group isn't finished caching the next
time we try and do the allocation we'll completely ignore the cluster and
start searching from the beginning of the space, which makes us cache more
block groups, which slows us down even more. So instead of skipping block
groups that are not finished caching when we have a hint, only skip the block
group if it hasn't started caching yet.

There is one other tweak in here. Before if we allocated a chunk and still
couldn't find new space, we'd end up switching the space info to force another
chunk allocation. This could make us end up with way too many chunks, so keep
track of this particular case.

With this patch and my previous cluster fixes my fedora box now boots in 43
seconds, and according to the bootchart is not held up by our block group
caching at all.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ccf0e725

Btrfs: avoid null deref in unpin_extent_cache() · 4eb3991c

由 Dan Carpenter 提交于 15年前

I re-orderred the checks to avoid dereferencing "em" if it was null.

Found by smatch static checker.
Signed-off-by: NDan Carpenter <error27@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

4eb3991c

Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root · df66916e

由 Li Dongyang 提交于 15年前

We don't need to call btrfs_release_path because btrfs_free_path will do
that for us.
Signed-off-by: NLi Dongyang <Jerry87905@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

df66916e

Btrfs: fix some metadata enospc issues · 5df6a9f6

由 Josef Bacik 提交于 15年前

We weren't reserving metadata space for rename, rmdir and unlink, which could
cause problems.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

5df6a9f6

Btrfs: fix how we set max_size for free space clusters · 01dea1ef

由 Josef Bacik 提交于 15年前

This patch fixes a problem where max_size can be set to 0 even though we
filled the cluster properly. We set max_size to 0 if we restart the cluster
window, but if the new start entry is big enough to be our new cluster then we
could return with a max_size set to 0, which will mean the next time we try to
allocate from this cluster it will fail. So set max_extent to the entry's
size. Tested this on my box and now we actually allocate from the cluster
after we fill it. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

01dea1ef

Btrfs: cleanup transaction starting and fix journal_info usage · 249ac1e5

由 Josef Bacik 提交于 15年前

We use journal_info to tell if we're in a nested transaction to make sure we
don't commit the transaction within a nested transaction. We use another
method to see if there are any outstanding ioctl trans handles, so if we're
starting one do not set current->journal_info, since it will screw with other
filesystems. This patch also cleans up the starting stuff so there aren't any
magic numbers.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

249ac1e5

Btrfs: fix data allocation hint start · 6346c939

由 Josef Bacik 提交于 15年前

Sometimes our start allocation hint when we cow a file can be either
EXTENT_HOLE or some other such place holder, which is not optimal. So if we
find that our em->block_start is one of these special values, check to see
where the first block of the inode is stored, and use that as a hint. If that
block is also a special value, just fallback on a hint of 0 and let the
allocator figure out a good place to put the data.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

6346c939

14 10月, 2009 9 次提交

Btrfs: always pin metadata in discard mode · 444528b3

由 Chris Mason 提交于 15年前

We have an optimization in btrfs to allow blocks to be
immediately freed if they were allocated in this transaction and never
written.  Otherwise they are pinned and freed when the transaction
commits.

This isn't optimal for discard mode because immediately freeing
them means immediately discarding them.  It is better to give the
block to the pinning code and letting the (slow) discard happen later.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

444528b3

Btrfs: enable discard support · 06348574

由 Christoph Hellwig 提交于 15年前

The discard support code in btrfs currently is guarded by ifdefs for
BIO_RW_DISCARD, which is never defines as it's the name of an enum
memeber.  Just remove the useless ifdefs to actually enable the code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

06348574

Btrfs: add -o discard option · e244a0ae

由 Christoph Hellwig 提交于 15年前

Enable discard by default is not a good idea given the the trim speed
of SSD prototypes we've seen, and the carecteristics for many high-end
arrays.  Turn of discards by default and require the -o discard option
to enable them on.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e244a0ae

Btrfs: properly wait log writers during log sync · 86df7eb9

由 Yan, Zheng 提交于 15年前

A recently fsync optimization make btrfs_sync_log skip calling
wait_for_writer in the single log writer case. This is incorrect
since the writer count can also be increased by btrfs_pin_log.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

86df7eb9

Btrfs: fix possible ENOSPC problems with truncate · 5d5e103a

由 Josef Bacik 提交于 15年前

There's a problem where we don't do any space reservation for truncates, which
can cause you to OOPs because you will be allowed to go off in the weeds a bit
since we don't account for the delalloc bytes that are created as a result of
the truncate.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

5d5e103a

Btrfs: fix btrfs acl #ifdef checks · 0eda294d

由 Chris Mason 提交于 15年前

The btrfs acl code was #ifdefing for a define
that didn't exist.  This correctly matches it
to the values used by the Kconfig file.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0eda294d

Btrfs: streamline tree-log btree block writeout · 690587d1

由 Chris Mason 提交于 15年前

Syncing the tree log is a 3 phase operation.

1) write and wait for all the tree log blocks for a given root.

2) write and wait for all the tree log blocks for the
tree of tree log roots.

3) write and wait for the super blocks (barriers here)

This isn't as efficient as it could be because there is
no requirement to wait for the blocks from step one to hit the disk
before we start writing the blocks from step two.  This commit
changes the sequence so that we don't start waiting until
all the tree blocks from both steps one and two have been sent
to disk.

We do this by breaking up btrfs_write_wait_marked_extents into
two functions, which is trivial because it was already broken
up into two parts.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

690587d1

Btrfs: avoid tree log commit when there are no changes · 257c62e1

由 Chris Mason 提交于 15年前

rpm has a habit of running fdatasync when the file hasn't
changed.  We already detect if a file hasn't been changed
in the current transaction but it might have been sent to
the tree-log in this transaction and not changed since
the last call to fsync.

In this case, we want to avoid a tree log sync, which includes
a number of synchronous writes and barriers.  This commit
extends the existing tracking of the last transaction to change
a file to also track the last sub-transaction.

The end result is that rpm -ivh and -Uvh are roughly twice as fast,
and on par with ext3.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

257c62e1

Btrfs: only write one super copy during fsync · 4722607d

由 Chris Mason 提交于 15年前

During a tree-log commit for fsync, we've been writing at least
two copies of the super block and forcing them to disk.

The other filesystems write only one, and this change brings us on
par with them.  A full transaction commit will write all the super
copies, so we still have redundant info written on a regular
basis.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

4722607d

09 10月, 2009 10 次提交

Btrfs: fix file clone ioctl for bookend extents · ac6889cb

由 Chris Mason 提交于 15年前

The file clone ioctl was incorrectly taking the offset into the
extent on disk into account when calculating the length of the
cloned extent.

The length never changes based on the offset into the physical extent.

Test case:

fallocate -l 1g image
mke2fs image
bcp image image2
e2fsck -f image2

(errors on image2)

The math bug ends up wrapping the length of the extent, and things
go wrong from there.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ac6889cb

Btrfs: fix uninit compiler warning in cow_file_range_nocow · e9061e21

由 Chris Mason 提交于 15年前

The extent_type variable was exposed uninit via a goto.  It should be
impossible to trigger because it is protected by a check on another
variable, but this makes sure.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e9061e21

A
Btrfs: constify dentry_operations · 82d339d9
由 Alexey Dobriyan 提交于 15年前
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
82d339d9

Btrfs: optimize back reference update during btrfs_drop_snapshot · 94fcca9f

由 Yan, Zheng 提交于 15年前

This patch reading level 0 tree blocks that already use full backrefs.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

94fcca9f

Btrfs: remove negative dentry when deleting subvolumne · efefb143

由 Yan, Zheng 提交于 15年前

The use of btrfs_dentry_delete is removing dentries from the
dcache when deleting subvolumne. btrfs_dentry_delete ignores
negative dentries. This is incorrect since if we don't remove
the negative dentry, its parent dentry can't be removed.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

efefb143

Btrfs: optimize fsync for the single writer case · ff782e0a

由 Josef Bacik 提交于 15年前

This patch optimizes the tree logging stuff so it doesn't always wait 1 jiffie
for new people to join the logging transaction if there is only ever 1 writer.
This helps a little bit with latency where we have something like RPM where it
will fdatasync every file it writes, and so waiting the 1 jiffie for every
fdatasync really starts to add up.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ff782e0a

Btrfs: async delalloc flushing under space pressure · e3ccfa98

由 Josef Bacik 提交于 15年前

This patch moves the delalloc flushing that occurs when we are under space
pressure off to a async thread pool. This helps since we only free up
metadata space when we actually insert the extent item, which means it takes
quite a while for space to be free'ed up if we wait on all ordered extents.
However, if space is freed up due to inline extents being inserted, we can
wake people who are waiting up early, and they can finish their work.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e3ccfa98

Btrfs: release delalloc reservations on extent item insertion · 32c00aff

由 Josef Bacik 提交于 15年前

This patch fixes an issue with the delalloc metadata space reservation
code. The problem is we used to free the reservation as soon as we
allocated the delalloc region. The problem with this is if we are not
inserting an inline extent, we don't actually insert the extent item until
after the ordered extent is written out. This patch does 3 things,

1) It moves the reservation clearing stuff into the ordered code, so when
we remove the ordered extent we remove the reservation.
2) It adds a EXTENT_DO_ACCOUNTING flag that gets passed when we clear
delalloc bits in the cases where we want to clear the metadata reservation
when we clear the delalloc extent, in the case that we do an inline extent
or we invalidate the page.
3) It adds another waitqueue to the space info so that when we start a fs
wide delalloc flush, anybody else who also hits that area will simply wait
for the flush to finish and then try to make their allocation.

This has been tested thoroughly to make sure we did not regress on
performance.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

32c00aff

Btrfs: delay clearing EXTENT_DELALLOC for compressed extents · a3429ab7

由 Chris Mason 提交于 15年前

When compression is on, the cow_file_range code is farmed off to
worker threads.  This allows us to do significant CPU work in parallel
on SMP machines.

But it is a delicate balance around when we clear flags and how.  In
the past we cleared the delalloc flag immediately, which was safe
because the pages stayed locked.

But this is causing problems with the newest ENOSPC code, and with the
recent extent state cleanups we can now clear the delalloc bit at the
same time the uncompressed code does.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

a3429ab7

Btrfs: cleanup extent_clear_unlock_delalloc flags · a791e35e

由 Chris Mason 提交于 15年前

extent_clear_unlock_delalloc has a growing set of ugly parameters
that is very difficult to read and maintain.

This switches to a flag field and well named flag defines.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

a791e35e

06 10月, 2009 1 次提交

Btrfs: fix possible softlockup in the allocator · 1cdda9b8

由 Josef Bacik 提交于 15年前

Like the cluster allocating stuff, we can lockup the box with the normal
allocation path.  This happens when we

1) Start to cache a block group that is severely fragmented, but has a decent
amount of free space.
2) Start to commit a transaction
3) Have the commit try and empty out some of the delalloc inodes with extents
that are relatively large.

The inodes will not be able to make the allocations because they will ask for
allocations larger than a contiguous area in the free space cache.  So we will
wait for more progress to be made on the block group, but since we're in a
commit the caching kthread won't make any more progress and it already has
enough free space that wait_block_group_cache_progress will just return.  So,
if we wait and fail to make the allocation the next time around, just loop and
go to the next block group.  This keeps us from getting stuck in a softlockup.
Thanks,
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1cdda9b8

05 10月, 2009 1 次提交

Btrfs: fix deadlock on async thread startup · 61d92c32

由 Chris Mason 提交于 15年前

The btrfs async worker threads are used for a wide variety of things,
including processing bio end_io functions.  This means that when
the endio threads aren't running, the rest of the FS isn't
able to do the final processing required to clear PageWriteback.

The endio threads also try to exit as they become idle and
start more as the work piles up.  The problem is that starting more
threads means kthreadd may need to allocate ram, and that allocation
may wait until the global number of writeback pages on the system is
below a certain limit.

The result of that throttling is that end IO threads wait on
kthreadd, who is waiting on IO to end, which will never happen.

This commit fixes the deadlock by handing off thread startup to a
dedicated thread.  It also fixes a bug where the on-demand thread
creation was creating far too many threads because it didn't take into
account threads being started by other procs.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

61d92c32

02 10月, 2009 2 次提交

Btrfs: fix data space leak fix · fbf19087

由 Josef Bacik 提交于 15年前

There is a problem where page_mkwrite can be called on a dirtied page that
already has a delalloc range associated with it.  The fix is to clear any
delalloc bits for the range we are dirtying so the space accounting gets
handled properly.  This is the same thing we do in the normal write case, so we
are consistent across the board.  With this patch we no longer leak reserved
space.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

fbf19087

Btrfs: take i_mutex before generic_write_checks · ab93dbec

由 Chris Mason 提交于 15年前

btrfs_file_write was incorrectly calling generic_write_checks without
taking i_mutex.  This lead to problems with racing around i_size when
doing O_APPEND writes.

The fix here is to move i_mutex higher.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ab93dbec

01 10月, 2009 1 次提交

Btrfs: fix arguments to btrfs_wait_on_page_writeback_range · 35d62a94

由 Christoph Hellwig 提交于 15年前

wait_on_page_writeback_range/btrfs_wait_on_page_writeback_range takes
a pagecache offset, not a byte offset into the file.  Shift the arguments
around to wait for the correct range
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

35d62a94

30 9月, 2009 5 次提交

Btrfs: fix deadlock with free space handling and user transactions · dd7e0b7b

由 Sage Weil 提交于 15年前

If an ioctl-initiated transaction is open, we can't force a commit during
the free space checks in order to free up pinned extents or else we
deadlock.  Just ENOSPC instead.

A more satisfying solution that reserves space for the entire user
transaction up front is forthcoming...
Signed-off-by: NSage Weil <sage@newdream.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

dd7e0b7b

Btrfs: fix error cases for ioctl transactions · 1ab86aed

由 Sage Weil 提交于 15年前

Fix leak of vfsmount write reference and open_ioctl_trans reference on
ENOMEM.  Clean up the error paths while we're at it.
Signed-off-by: NSage Weil <sage@newdream.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1ab86aed

Btrfs: Use CONFIG_BTRFS_POSIX_ACL to enable ACL code · 3baf0bed

由 Chris Ball 提交于 15年前

We've already defined CONFIG_BTRFS_POSIX_ACL in Kconfig, but we're
currently not using it and are testing CONFIG_FS_POSIX_ACL instead.
CONFIG_FS_POSIX_ACL states "Never use this symbol for ifdefs".
Signed-off-by: NChris Ball <cjb@laptop.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

3baf0bed

Btrfs: introduce missing kfree · fd2696f3

由 Julia Lawall 提交于 15年前

Error handling code following a kzalloc should free the allocated data.

The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,f1,l;
position p1,p2;
expression *ptr != NULL;
@@

x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
<... when != x
     when != if (...) { <+...x...+> }
(
x->f1 = E
|
 (x->f1 == NULL || ...)
|
 f(...,x->f1,...)
)
...>
(
 return \(0\|<+...x...+>\|ptr\);
|
 return@p2 ...;
)

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

fd2696f3

Btrfs: Fix setting umask when POSIX ACLs are not enabled · 49cf6f45

由 Chris Ball 提交于 15年前

We currently set sb->s_flags |= MS_POSIXACL unconditionally, which is
incorrect -- it tells the VFS that it shouldn't set umask because we
will, yet we don't set it ourselves if we aren't using POSIX ACLs, so
the umask ends up ignored.
Signed-off-by: NChris Ball <cjb@laptop.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

49cf6f45

29 9月, 2009 1 次提交

Btrfs: proper -ENOSPC handling · 9ed74f2d

由 Josef Bacik 提交于 15年前

At the start of a transaction we do a btrfs_reserve_metadata_space() and
specify how many items we plan on modifying. Then once we've done our
modifications and such, just call btrfs_unreserve_metadata_space() for
the same number of items we reserved.

For keeping track of metadata needed for data I've had to add an extent_io op
for when we merge extents. This lets us track space properly when we are doing
sequential writes, so we don't end up reserving way more metadata space than
what we need.

The only place where the metadata space accounting is not done is in the
relocation code. This is because Yan is going to be reworking that code in the
near future, so running btrfs-vol -b could still possibly result in a ENOSPC
related panic. This patch also turns off the metadata_ratio stuff in order to
allow users to more efficiently use their disk space.

This patch makes it so we track how much metadata we need for an inode's
delayed allocation extents by tracking how many extents are currently
waiting for allocation. It introduces two new callbacks for the
extent_io tree's, merge_extent_hook and split_extent_hook. These help
us keep track of when we merge delalloc extents together and split them
up. Reservations are handled prior to any actually dirty'ing occurs,
and then we unreserve after we dirty.

btrfs_unreserve_metadata_for_delalloc() will make the appropriate
unreservations as needed based on the number of reservations we
currently have and the number of extents we currently have. Doing the
reservation outside of doing any of the actual dirty'ing lets us do
things like filemap_flush() the inode to try and force delalloc to
happen, or as a last resort actually start allocation on all delalloc
inodes in the fs. This has survived dbench, fs_mark and an fsx torture
test.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9ed74f2d

24 9月, 2009 1 次提交

Btrfs: hash the btree inode during fill_super · c65ddb52

由 Yan Zheng 提交于 15年前

The snapshot deletion  patches dropped this line, but the inode
needs to be hashed.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c65ddb52

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功