Commits · ceab36edd3d3ad3ffd01d41d6d1e05ac1ff8357e · openeuler / Kernel

08 Aug, 2009 · 3 commits

Btrfs: fix balancing oops when invalidate_inode_pages2 returns EBUSY · ceab36ed

Committed by Yan Zheng on Aug 07, 2009

invalidate_inode_pages2_range may occasionally return -EBUSY,
which results in an oops. This patch fixes the issue by moving
invalidate_inode_pages2_range into a loop and calling it
repeatedly until the return value is not -EBUSY.

The EBUSY return is temporary, and can happen when the btrfs release page
function is unable to release a page because the EXTENT_LOCK
bit is set.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

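The retry described in the EBUSY fix above can be sketched in userspace C. This is not the patch's code: `fake_invalidate_range()` is a hypothetical stand-in for invalidate_inode_pages2_range() that reports -EBUSY a fixed number of times (`busy_rounds`, also an assumption) before succeeding. The point is only the loop shape: retry on the transient -EBUSY, return anything else.

```c
#include <errno.h>

/* Hypothetical stand-in for invalidate_inode_pages2_range(): it reports
 * -EBUSY while a page is still locked and succeeds once the lock clears. */
static int busy_rounds = 3;   /* number of calls that return -EBUSY (assumption) */
static int calls;             /* number of attempts made so far */

static int fake_invalidate_range(void)
{
    calls++;
    if (busy_rounds > 0) {
        busy_rounds--;
        return -EBUSY;        /* transient: EXTENT_LOCK still held */
    }
    return 0;
}

/* Retry while the error is the transient -EBUSY; hand any other result
 * (success or a real error) back to the caller unchanged. */
static int invalidate_with_retry(void)
{
    int ret;

    do {
        ret = fake_invalidate_range();
    } while (ret == -EBUSY);
    return ret;
}
```

With three simulated busy rounds, invalidate_with_retry() makes four attempts and returns 0.
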
Btrfs: correct error-handling zlib error handling · 60f2e8f8

Committed by Julia Lawall on Aug 07, 2009

find_zlib_workspace returns an ERR_PTR value in an error case instead of NULL.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@match exists@
expression x, E;
statement S1, S2;
@@

x = find_zlib_workspace(...)
... when != x = E
(
*  if (x == NULL || ...) S1 else S2
|
*  if (x == NULL && ...) S1 else S2
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

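Why the zlib fix above matters: a function that returns an ERR_PTR value never returns NULL on failure, so a `== NULL` test silently passes the error pointer through. The sketch below re-implements the kernel's ERR_PTR/IS_ERR/PTR_ERR idea in userspace C (the real helpers live in include/linux/err.h). `find_workspace()` is a hypothetical stand-in for find_zlib_workspace.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-implementation of the kernel idiom: the top MAX_ERRNO
 * values of the address space encode negative errno codes. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that, like find_zlib_workspace, reports
 * failure with an error pointer rather than NULL. */
static void *find_workspace(int fail)
{
    static int workspace;
    return fail ? ERR_PTR(-ENOMEM) : &workspace;
}

static long use_workspace(int fail)
{
    void *ws = find_workspace(fail);

    /* A 'ws == NULL' test would never fire here, because
     * ERR_PTR(-ENOMEM) is a non-NULL pointer; IS_ERR() is the right test. */
    if (IS_ERR(ws))
        return PTR_ERR(ws);
    return 0;
}
```
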
Btrfs: remove superfluous NULL pointer check in btrfs_rename() · 4baf8c92

Committed by Bartlomiej Zolnierkiewicz on Aug 07, 2009

This takes care of the following entry from Dan's list:

fs/btrfs/inode.c +4788 btrfs_rename(36) warning: variable derefenced before check 'old_inode'
Reported-by: Dan Carpenter <error27@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Julia Lawall <julia@diku.dk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

01 Aug, 2009 · 1 commit

Btrfs: make sure the async caching thread advances the key · 013f1b12

Committed by Chris Mason on Jul 31, 2009

The async caching thread can end up looping forever if a given
search puts it at the last key in a leaf.  It will end up calling
btrfs_next_leaf and then checking if it needs to politely drop
the read semaphore.

Most of the time this looping isn't noticed because it is able to
make progress the next time around.  But, during log replay,
we wait on the async caching thread to finish, and the async thread
is waiting on the commit, and no progress is really made.

The fix used here is to copy the key out of the next leaf,
that way our search lands there properly.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

31 Jul, 2009 · 1 commit

Btrfs: fix btrfs_remove_from_free_space corner case · 6606bb97

Committed by Josef Bacik on Jul 31, 2009

Yan Zheng hit a problem where we tried to remove some free space but failed
because we couldn't find the free space entry. This is because the free space
was held within a bitmap that had a starting offset well before the actual
offset of the free space, and there were free space extents that were in the
same range as that offset, so tree_search_offset returned NULL because we
couldn't find a free space extent that had that offset. This is fixed by
making sure that if we fail to find the entry, we search again with
bitmap_only set to 1 and do an offset_to_bitmap so we can get the appropriate
bitmap. A similar problem happens in btrfs_alloc_from_bitmap for the
clustering code, but that is not as bad since we will just go and redo our
cluster allocation.

Also this adds some debugging checks to make sure that the free space we are
trying to remove from the bitmap is in fact there. This can probably go away
after a while, but since this code is only used by the tree-logging stuff it
would be nice to run with it for a while to make sure there are no problems.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

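The free-space corner case above hinges on finding the bitmap that covers an offset, which may start well before that offset. Below is a minimal sketch of that rounding, written from the commit's description rather than copied from btrfs. It assumes 4 KiB pages and one bit per sector, and `PAGE_SIZE_BYTES` is an illustrative name.

```c
#include <stdint.h>

/* Assumed parameters: one PAGE_SIZE (4 KiB) bitmap, one bit per sector. */
#define PAGE_SIZE_BYTES 4096ULL
#define BITS_PER_BITMAP (PAGE_SIZE_BYTES * 8)

/* Round 'offset' down to the start of the bitmap that covers it.
 * Bitmaps are laid out back to back from the block group start, so the
 * covering bitmap can begin far before 'offset'; searching for an entry
 * keyed exactly at 'offset' would miss it. */
static uint64_t offset_to_bitmap(uint64_t bg_start, uint64_t sectorsize,
                                 uint64_t offset)
{
    uint64_t bytes_per_bitmap = BITS_PER_BITMAP * sectorsize;
    uint64_t rel = offset - bg_start;

    return bg_start + (rel / bytes_per_bitmap) * bytes_per_bitmap;
}
```

With 4 KiB sectors each bitmap spans 128 MiB, so an offset 200 MiB into a block group maps to the bitmap starting 128 MiB in. A second, bitmap-only search at that start is what finds the space.
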
30 Jul, 2009 · 2 commits

Btrfs: be more polite in the async caching threads · f36f3042

Committed by Chris Mason on Jul 30, 2009

The semaphore used by the async caching threads can prevent a
transaction commit, which can make the FS appear to stall.  This
releases the semaphore more often when a transaction commit is
in progress.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: preserve commit_root for async caching · 276e680d

Committed by Yan Zheng on Jul 30, 2009

The async block group caching code uses the commit_root pointer
to get a stable version of the extent allocation tree for scanning.
This copy of the tree root isn't going to change and it significantly
reduces the complexity of the scanning code.

During a commit, we have a loop where we update the extent allocation
tree root.  We need to loop because updating the root pointer in
the tree of tree roots may allocate blocks which may change the
extent allocation tree.

Right now the commit_root pointer is changed inside this loop.  It
is more correct to change the commit_root pointer only after all the
looping is done.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

28 Jul, 2009 · 3 commits

Btrfs: Fix async caching interaction with unmount · f25784b3

Committed by Yan Zheng on Jul 28, 2009

- don't stop the caching thread until btrfs_commit_super returns.

- if caching is interrupted by umount, set last to (u64)-1.
  Otherwise the un-scanned range of the block group will be treated
  as free space.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: change how we unpin extents · 68b38550

Committed by Josef Bacik on Jul 27, 2009

We are racy with async block caching and unpinning extents. This patch makes
things much less complicated by only unpinning the extent if the block group is
cached. We check the block_group->cached var under the block_group->lock spin
lock. If it is set to BTRFS_CACHE_FINISHED then we update the pinned counters,
and unpin the extent and add the free space back. If it is not set to this, we
start the caching of the block group so the next time we unpin extents we can
unpin the extent. This keeps us from racing with the async caching threads,
lets us kill the fs wide async thread counter, and keeps us from having to set
DELALLOC bits for every extent we hit if there are caching kthreads going.

One thing that needed to be changed was btrfs_free_super_mirror_extents. Now
instead of just looking for LOCKED extents, we also look for DIRTY extents,
since we could have left some extents pinned in the previous transaction that
will never get freed now that we are unmounting, which would cause us to leak
memory. So btrfs_free_super_mirror_extents has been changed to
btrfs_free_pinned_extents, and it will clear the extents locked for the super
mirror, and any remaining pinned extents that may be present. Thank you,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Correct redundant test in add_inode_ref · 631c07c8

Committed by Julia Lawall on Jul 27, 2009

dir has already been tested.  It seems that this test should be on the
recently returned value inode.

A simplified version of the semantic match that finds this problem is as
follows: (http://www.emn.fr/x-info/coccinelle/)
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

25 Jul, 2009 · 4 commits

Btrfs: find smallest available device extent during chunk allocation · 9779b72f

Committed by Chris Mason on Jul 24, 2009

Allocating a new block group is easy when the disk has plenty of space.
But things get difficult as the disk fills up, especially if
the FS has been run through btrfs-vol -b.  The balance operation
is likely to make the total bytes available on the device greater
than the largest extent we'll actually be able to allocate.

But the device extent allocation code incorrectly assumes that a device
with 5G free will be able to allocate a 5G extent.  It isn't normally a
problem because device extents don't get freed unless btrfs-vol -b
is run.

This fixes the device extent allocator to remember the largest free
extent it can find, and then uses that value as a fallback.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

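The fallback described in the chunk-allocation commit above, remembering the largest free extent seen while scanning, can be sketched as follows. `struct hole` and `pick_dev_extent()` are hypothetical names for illustration, not btrfs's find_free_dev_extent.

```c
#include <stddef.h>
#include <stdint.h>

/* A free range (hole) on a device; the hole list is a hypothetical input. */
struct hole { uint64_t start, len; };

/* Return 0 and set *start to the first hole that fits 'want'. If none
 * fits, return -1 and report the largest hole seen via *max_avail, so the
 * caller can retry with that size instead of assuming that the device's
 * total free space is one contiguous extent. */
static int pick_dev_extent(const struct hole *holes, size_t n, uint64_t want,
                           uint64_t *start, uint64_t *max_avail)
{
    uint64_t largest = 0;

    for (size_t i = 0; i < n; i++) {
        if (holes[i].len >= want) {
            *start = holes[i].start;
            return 0;
        }
        if (holes[i].len > largest)
            largest = holes[i].len;
    }
    *max_avail = largest;
    return -1;
}
```

For example, a device with holes of 2G, 2G and 1G has 5G free, yet a 5G request fails and reports 2G as the largest extent that can actually be allocated.
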
Btrfs: clear all space_info->full after removing a block group · 283bb197

Committed by Chris Mason on Jul 24, 2009

Btrfs allocates individual extents from block groups, and each
block group has a specific type.  It may hold metadata, data
mirrored or striped etc.

When we balance space (btrfs-vol -b) or remove a drive (btrfs-vol -r)
we free block groups.  Once a block group is freed, the space it was
using on the device may be available for use by new block groups.

btrfs_remove_block_group was clearing the flag that said
'our devices are full, don't even try to allocate new block groups',
but it was only clearing that flag for a specific type of block group.

This commit clears the full flag for all of the types of block groups,
making it much more likely that we'll be able to balance space when
the drive is close to full.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: make flushoncommit mount option correctly wait on ordered_extents · ebecd3d9

Committed by Sage Weil on Jul 24, 2009

The commit_transaction call to wait_ordered_extents when snap_pending
passes nocow_only=1 to process only NOCOW or PREALLOC extents.  This isn't
correct for the 'flushoncommit' mode, as it skips extents we just started
IO on in start_delalloc_inodes.

So, in the flushoncommit case, wait on all ordered extents.  Otherwise,
only pass the nocow_only flag to wait_ordered_extents if snap_pending.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Avoid delayed reference update looping · d717aa1d

Committed by Yan Zheng on Jul 24, 2009

btrfs_split_leaf and btrfs_del_items can end up in a loop
where one is constantly splitting a given leaf and the other
is constantly merging it back with the adjacent nodes.

There is a better fix for this, but in the interest of something
small, this patch just changes btrfs_del_items back to balancing less
often.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

24 Jul, 2009 · 5 commits

Btrfs: Fix ordering of key field checks in btrfs_previous_item · 0a4eefbb

Committed by Yan Zheng on Jul 24, 2009

Check objectid of item before checking the item type, otherwise we may return
zero for a key that is actually too low.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

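To see why the btrfs_previous_item fix above checks the objectid before the type, here is a simplified, array-based sketch. `struct key` and `previous_item()` are illustrative only; the real function walks a btrfs path and returns 0/1 rather than a slot index. Because keys sort by (objectid, type, offset), an item with a matching type but a too-low objectid must end the search instead of being returned.

```c
#include <stdint.h>

struct key { uint64_t objectid; uint8_t type; uint64_t offset; };

/* Walk backwards from 'slot' looking for an item of 'type' whose objectid
 * is at least 'min_objectid'. Returns the matching slot, or -1. */
static int previous_item(const struct key *keys, int slot,
                         uint64_t min_objectid, uint8_t type)
{
    for (int i = slot; i >= 0; i--) {
        /* Check objectid first: once it drops below the minimum, every
         * earlier key is lower still, whatever its type. */
        if (keys[i].objectid < min_objectid)
            return -1;
        if (keys[i].type == type)
            return i;
    }
    return -1;
}
```

Swapping the two tests would make a search for type 168 at objectid 257 wrongly succeed on an older item belonging to objectid 100.
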
Btrfs: find_free_dev_extent doesn't handle holes at the start of the device · 1fcbac58

Committed by Yan Zheng on Jul 24, 2009

find_free_dev_extent does not properly handle the case where
the device is not completely free, and there is a free extent
at the beginning of the device.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Remove code duplication in comp_keys · 20736aba

Committed by Diego Calleja on Jul 24, 2009

comp_keys is duplicating what is done in btrfs_comp_cpu_keys, so just
call it.
Signed-off-by: Diego Calleja <diegocg@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

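The comp_keys cleanup above replaces a hand-written comparison with a call to btrfs_comp_cpu_keys. Below is a sketch of that shape in userspace C: a lexicographic compare on (objectid, type, offset) returning -1, 0 or 1, and a wrapper that converts the on-disk key first and then delegates. The struct names are illustrative. The conversion is a plain copy here on the stated assumption of a little-endian host; the kernel helper byte-swaps with le64_to_cpu.

```c
#include <stdint.h>

struct cpu_key  { uint64_t objectid; uint8_t type; uint64_t offset; };
struct disk_key { uint64_t objectid; uint8_t type; uint64_t offset; };

/* Lexicographic compare on (objectid, type, offset): -1, 0 or 1. */
static int comp_cpu_keys(const struct cpu_key *a, const struct cpu_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Assumes a little-endian host, so no byte swapping is needed here. */
static struct cpu_key disk_key_to_cpu(const struct disk_key *d)
{
    struct cpu_key k = { d->objectid, d->type, d->offset };
    return k;
}

/* No duplicated comparison logic: convert, then delegate. */
static int comp_keys(const struct disk_key *disk, const struct cpu_key *k2)
{
    struct cpu_key k1 = disk_key_to_cpu(disk);
    return comp_cpu_keys(&k1, k2);
}
```
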
Btrfs: async block group caching · 817d52f8

Committed by Josef Bacik on Jul 13, 2009

This patch moves the caching of the block group off to a kthread in order to
allow people to allocate sooner.  Instead of blocking behind the caching
mutex, we kick off the caching kthread and then attempt to make an
allocation.  If we cannot, we wait on the block group's caching waitqueue;
the caching kthread wakes the waiting threads up every time it finds 2 meg
worth of space, and again when it's finished caching.  This is how I tested
the speedup from this:

mkfs the disk
mount the disk
fill the disk up with fs_mark
unmount the disk
mount the disk
time touch /mnt/foo

Without my changes this took 11 seconds on my box, with these changes it now
takes 1 second.

Another change that's been put in place is that we lock the super mirrors in the
pinned extent map in order to keep us from adding that stuff as free space when
caching the block group.  This doesn't really change anything else as far as the
pinned extent map is concerned, since for actual pinned extents we use
EXTENT_DIRTY, but it does mean that when we unmount we have to go in and unlock
those extents to keep from leaking memory.

I've also added a check where when we are reading block groups from disk, if the
amount of space used == the size of the block group, we go ahead and mark the
block group as cached.  This drastically reduces the amount of time it takes to
cache the block groups.  Using the same test as above, except doing a dd to a
file and then unmounting, it used to take 33 seconds to umount, now it takes 3
seconds.

This version uses the commit_root in the caching kthread, and then keeps track
of how many async caching threads are running at any given time so if one of the
async threads is still running as we cross transactions we can wait until it's
finished before handling the pinned extents.  Thank you,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: use hybrid extents+bitmap rb tree for free space · 96303081

Committed by Josef Bacik on Jul 13, 2009

Currently btrfs has a problem where it can use a ridiculous amount of RAM simply
tracking free space. As free space gets fragmented, we end up with thousands of
entries on an rb-tree per block group, which usually spans 1 gig of area. Since
we currently don't ever flush free space cache back to disk this gets to be a
bit unwieldy on large fs's with lots of fragmentation.

This patch solves this problem by using PAGE_SIZE bitmaps for parts of the free
space cache. Initially we calculate a threshold of extent entries we can
handle, which is however many extent entries we can cram into 16k of ram. The
maximum amount of RAM that should ever be used to track 1 gigabyte of diskspace
will be 32k of RAM, which scales much better than we did before.

Once we pass the extent threshold, we start adding bitmaps and using those
instead for tracking the free space. This patch also makes it so that any free
space that's less than 4 * sectorsize we go ahead and put into a bitmap. This is
nice since we try and allocate out of the front of a block group, so if the
front of a block group is heavily fragmented and then has a huge chunk of free
space at the end, we go ahead and add the fragmented areas to bitmaps and use a
normal extent entry to track the big chunk at the back of the block group.

I've also taken the opportunity to revamp how we search for free space.
Previously we indexed free space via an offset indexed rb tree and a bytes
indexed rb tree. I've dropped the bytes indexed rb tree and use only the offset
indexed rb tree. This cuts the number of tree operations we were doing
previously down by half, and gives us a little bit of a better allocation
pattern since we will always start from a specific offset and search forward
from there, instead of searching for the size we need and try and get it as
close as possible to the offset we want.

I've given this a healthy amount of testing pre-new format stuff, as well as
post-new format stuff. I've booted up my Fedora box which is installed on btrfs
with this patch and ran with it for a few days without issues. I've not seen
any performance regressions in any of my tests.

Since the last patch Yan Zheng fixed a problem where we could have overlapping
entries, so updating their offset inline would cause problems. Thanks,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

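The 32k-per-gigabyte ceiling in the hybrid free-space commit above can be checked with simple arithmetic. The sketch below assumes 4 KiB pages and 4 KiB sectors with one bit per sector, so one PAGE_SIZE bitmap describes 4096 × 8 × 4096 bytes = 128 MiB. Describing a whole 1 GiB block group with bitmaps alone then takes 8 pages, i.e. 32 KiB. The helper names are illustrative.

```c
#include <stdint.h>

/* Assumptions: 4 KiB pages, one bit per sector. */
#define PAGE_BYTES 4096ULL

/* Bytes of disk that one PAGE_SIZE bitmap can describe. */
static uint64_t bytes_per_bitmap(uint64_t sectorsize)
{
    return PAGE_BYTES * 8 * sectorsize;
}

/* RAM needed to track a block group of 'bg_size' bytes purely with
 * bitmaps, rounding up to whole bitmaps. */
static uint64_t bitmap_ram_for(uint64_t bg_size, uint64_t sectorsize)
{
    uint64_t per = bytes_per_bitmap(sectorsize);
    uint64_t nbitmaps = (bg_size + per - 1) / per;

    return nbitmaps * PAGE_BYTES;
}
```

Unlike an extent rb-tree, whose size grows with the number of fragments, this cost is fixed by the block group size.
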
23 Jul, 2009 · 5 commits

Btrfs: Fix crash on read failures at mount · 83121942

Committed by David Woodhouse on Jul 22, 2009

If the tree roots hit read errors during mount, btrfs is not properly
erroring out.  We need to check the uptodate bits after
reading in the tree root node.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: remove of redundant btrfs_header_level · c271b492

Committed by Daniel Cadete on Jul 22, 2009

This removes the repeated calls to btrfs_header_level. One call to
btrfs_header_level(c) is enough.

Signed-off-by: Daniel Cadete <danielncadete10@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: adjust NULL test · 33c17ad5

Committed by Julia Lawall on Jul 22, 2009

Move the call to BUG_ON to before the dereference of the tested value.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Remove broken sanity check from btrfs_rmap_block() · 3acada49

Committed by David Woodhouse on Jul 22, 2009

It was never actually doing anything anyway (see the loop condition),
and it would be difficult to make it work for RAID[56].

Even if it was actually working, it's checking for the wrong thing
anyway. Instead of checking whether we list a block which _doesn't_ land
at the relevant physical location, it should be checking that we _have_
listed all the logical blocks which refer to the required physical
location on all devices.

This function is only called from remove_sb_from_cache() to ensure that
we reserve the logical blocks which would reside at the same physical
location as the superblock copies. So listing more blocks than we need
is actually OK.

With RAID[56] we're going to throw away an entire stripe for each block
we have to ignore, so we _are_ going to list blocks other than the
ones which actually contain the superblock.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: convert nested spin_lock_irqsave to spin_lock · 29c5e8ce

Committed by Julia Lawall on Jul 22, 2009

If spin_lock_irqsave is called twice in a row with the same second
argument, the interrupt state at the point of the second call overwrites
the value saved by the first call. Indeed, the second call does not need
to save the interrupt state, so it is changed to a simple spin_lock.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

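The nested spin_lock_irqsave problem described above can be demonstrated without a kernel by simulating the interrupt-enable flag. In this sketch, `irq_save()` and `irq_restore()` are hypothetical userspace mimics of the save/restore half of spin_lock_irqsave and spin_unlock_irqrestore; no real locking takes place.

```c
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag (userspace only). */
static bool irqs_enabled = true;

/* Mimics the save half of spin_lock_irqsave(): record state, then disable. */
static void irq_save(unsigned long *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

/* Mimics the restore half of spin_unlock_irqrestore(). */
static void irq_restore(unsigned long flags)
{
    irqs_enabled = flags;
}

/* Buggy nesting: the inner save reuses 'flags' and records "disabled",
 * overwriting the outer "enabled" value. */
static bool nested_same_flags(void)
{
    unsigned long flags;

    irqs_enabled = true;
    irq_save(&flags);      /* outer lock: flags = enabled */
    irq_save(&flags);      /* inner lock: flags = disabled (overwrites) */
    irq_restore(flags);    /* inner unlock */
    irq_restore(flags);    /* outer unlock: interrupts stay disabled */
    return irqs_enabled;
}

/* Fixed nesting: the inner lock is a plain spin_lock, which leaves the
 * interrupt state alone because the outer lock already disabled it. */
static bool nested_plain_inner(void)
{
    unsigned long flags;

    irqs_enabled = true;
    irq_save(&flags);      /* outer spin_lock_irqsave */
    /* inner spin_lock/spin_unlock: no save or restore */
    irq_restore(flags);    /* outer spin_unlock_irqrestore */
    return irqs_enabled;
}
```
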
22 Jul, 2009 · 5 commits

Btrfs: make sure all dirty blocks are written at commit time · 4a8c9a62

Committed by Yan Zheng on Jul 22, 2009

Writing dirty block groups may allocate new blocks, and so may add new delayed
back refs. btrfs_run_delayed_refs may make some block groups dirty.

commit_cowonly_roots does not handle the recursion properly, and some dirty
blocks can be left unwritten at commit time. This patch moves
btrfs_run_delayed_refs into the loop that writes dirty block groups, and makes
the code not break out of the loop until there are no dirty block groups or
delayed back refs.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

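The commit-time fix above is a fixed-point loop: writing block groups creates delayed refs, and running delayed refs dirties block groups, so the work must repeat until both queues are empty. Here is a minimal model with counters. `struct commit_state` and the "one ref per group, one group per two refs" rates are assumptions chosen only so the loop converges; they do not reflect real btrfs behaviour.

```c
/* Hypothetical counters standing in for dirty block groups and queued
 * delayed back refs. */
struct commit_state { int dirty_bgs; int delayed_refs; int rounds; };

/* Writing dirty block groups may allocate blocks and so queue delayed
 * refs (modelled here as one ref per group written). */
static void write_dirty_block_groups(struct commit_state *s)
{
    s->delayed_refs += s->dirty_bgs;
    s->dirty_bgs = 0;
}

/* Running delayed refs may dirty block groups again (modelled here as
 * one group per two refs). */
static void run_delayed_refs(struct commit_state *s)
{
    s->dirty_bgs += s->delayed_refs / 2;
    s->delayed_refs = 0;
}

/* Keep looping until neither dirty block groups nor delayed refs remain;
 * a single pass would leave work behind. */
static void commit_cowonly(struct commit_state *s)
{
    while (s->dirty_bgs || s->delayed_refs) {
        run_delayed_refs(s);
        write_dirty_block_groups(s);
        s->rounds++;
    }
}
```

Starting from four dirty block groups, a single pass would leave four delayed refs queued; the loop needs four rounds to drain both queues.
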
Btrfs: fix locking issue in btrfs_find_next_key · 33c66f43

Committed by Yan Zheng on Jul 22, 2009

When walking up the tree, btrfs_find_next_key assumes the upper level tree
block is properly locked. This isn't always true, even if path->keep_locks is 1.
This is because btrfs_find_next_key may advance path->slots[] several times
instead of only once.

When 'path->slots[level] >= btrfs_header_nritems(path->nodes[level])' is found,
we can't guarantee the original value of 'path->slots[level]' is
'btrfs_header_nritems(path->nodes[level]) - 1'. If it's not, the tree block at
'level + 1' isn't locked.

This patch fixes the issue by explicitly checking the locking state,
re-searching the tree if it's not locked.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix double increment of path->slots[0] in btrfs_next_leaf · e457afec

Committed by Yan Zheng on Jul 22, 2009

If 1 is returned by btrfs_search_slot, the path already points to the
first item with 'key > searching key'. So increasing path->slots[0] by
one is superfluous in that case.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

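The double-increment fix above follows from the search convention: on a miss the slot already sits at the first greater key. This sketch models that convention with a binary search over a sorted array. `search_slot()` and `next_slot_after()` are illustrative names modelled on btrfs_search_slot's 0/1 return, not its real signature.

```c
#include <stdint.h>

/* Returns 0 if 'key' is found (slot points at it), or 1 if it is not
 * (slot points at the first key greater than 'key'). */
static int search_slot(const uint64_t *keys, int n, uint64_t key, int *slot)
{
    int lo = 0, hi = n;

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    *slot = lo;
    return (lo < n && keys[lo] == key) ? 0 : 1;
}

/* Slot of the first key strictly after 'key'. Only an exact match needs
 * the extra increment; incrementing after a miss would skip an item. */
static int next_slot_after(const uint64_t *keys, int n, uint64_t key)
{
    int slot;

    if (search_slot(keys, n, key, &slot) == 0)
        slot++;
    return slot;
}
```
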
Btrfs: properly update space information after shrinking device. · bf1fb512

Committed by Yan Zheng on Jul 22, 2009

Change 'goto done' to 'break' for the case where all device extents have
been freed, so that the code that updates space information will be executed.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix definition of struct btrfs_extent_inline_ref · 1bec1aed

Committed by Yan Zheng on Jul 22, 2009

Use __le64 instead of u64 in the on-disk structure definition.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

03 Jul, 2009 · 7 commits

Btrfs: fix error message formatting · 68f5a38c

Committed by Hu Tao on Jul 02, 2009

Make an error msg look nicer by inserting a space between number and word.
Signed-off-by: Hu Tao <hu.taoo@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix use after free in btrfs_start_workers fail path · 9b627e9b

Committed by Jiri Slaby on Jul 02, 2009

worker memory is already freed on one fail path in btrfs_start_workers,
but is still dereferenced. Switch the dereference and kfree.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: honor nodatacow/sum mount options for new files · 94272164

Committed by Chris Mason on Jul 02, 2009

The btrfs attr patches unconditionally inherited the inode flags field
without honoring nodatacow and nodatasum. This fix makes sure
we properly record the nodatacow/sum mount options in new inodes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: update backrefs while dropping snapshot · 2c47e605

Committed by Yan Zheng on Jun 27, 2009

The new backref format has restrictions on the type of backref item. If a tree
block isn't referenced by its owner tree, full backrefs must be used for the
pointers in it. When a tree block loses its owner tree's reference, backrefs
for the pointers in it should be updated to full backrefs. Current
btrfs_drop_snapshot misses the code that updates backrefs, so it's unsafe for
general use.

This patch adds backrefs update code to btrfs_drop_snapshot. It isn't a
problem in the restricted form btrfs_drop_snapshot is used today, but for
general snapshot deletion this update is required.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: account for space we may use in fallocate · a970b0a1

Committed by Josef Bacik on Jun 27, 2009

Using Eric Sandeen's xfstest for fallocate, you can easily trigger an ENOSPC
panic on btrfs. This is because we do not account for data we may use when
doing the fallocate. This patch fixes the problem by properly reserving space,
and then just freeing it when we are done. The reservation stuff was made with
delalloc in mind, so it's a little crude for this case, but it keeps the box
from panicking.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix the file clone ioctl for preallocated extents · c8a894d7

Committed by Chris Mason on Jun 27, 2009

Btrfs: don't log the inode in file_write while growing the file · f597bb19

Committed by Chris Mason on Jun 27, 2009

16 Jun, 2009 · 1 commit

Btrfs: always update root items for fs trees at commit time · 978d910d

Committed by Yan Zheng on Jun 15, 2009

commit_fs_roots skips updating root items for fs trees that aren't modified.
This is unsafe now that relocation code modifies root item's last_snapshot
field without modifying corresponding fs tree.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

11 Jun, 2009 · 3 commits

Btrfs: fix extent_buffer leak during tree log replay · b263c2c8

Committed by Chris Mason on Jun 11, 2009

During tree log replay, we read in the tree log roots,
process them and then free them.  A recent change
takes an extra reference on the root node of the tree
when the root is read in, and stores that reference
in root->commit_root.

This reference was not being freed, leaving us with
one buffer pinned in ram for each subvol with
a tree log root after a crash.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix oops when btrfs_inherit_iflags called with a NULL dir · 0b4dcea5

Committed by Chris Mason on Jun 11, 2009

This happens during subvol creation.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix -o nodatasum printk spelling · 067c28ad

Committed by Chris Mason on Jun 11, 2009

It was printing nodatacsum, which was not the correct option name.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
