Commits · 17de39ac17bf99b8bf0d819d13668d5048836efc · openanolis / cloud-kernel

05 May, 2012 · 2 commits

Btrfs: fix page leak when allocing extent buffers · 17de39ac

Committed by Josef Bacik on May 04, 2012

If we happen to allocate an extent buffer, then allocate a page and notice
that the page is already attached to an extent buffer, we only unlock it and
free the eb we allocated. Any pages already attached to that eb are properly
freed, but we don't call page_cache_release() on the page where we noticed
the other extent buffer. That can leak pages and, I hope, is the cause of
the weird issues we've been seeing in this area. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: Add proper locking around add_root_to_dirty_list · e5846fc6

Committed by Chris Mason on May 03, 2012

add_root_to_dirty_list happens only once, at the very beginning of the
transaction, but it is still racy.
Signed-off-by: Chris Mason <chris.mason@oracle.com>


28 April, 2012 · 7 commits

Btrfs: reduce lock contention during extent insertion · dc7fdde3

Committed by Chris Mason on April 27, 2012

We're spending huge amounts of time on lock contention during
end_io processing because we unconditionally assume we are overwriting
an existing extent in the file for each IO.

This checks whether we are outside i_size and, if so, uses a less
expensive read-only search of the btree to look for existing extents.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

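A minimal sketch of the decision described above, assuming that "outside i_size" means the IO starts at or past the current i_size. The enum and `choose_search_mode()` are hypothetical names, not btrfs identifiers; the commit applies the real check during end_io processing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the two ways of looking for existing extents. */
enum extent_search_mode {
	SEARCH_FOR_OVERWRITE, /* assumes an existing extent is being overwritten */
	SEARCH_READONLY       /* cheaper read-only search of the btree */
};

/* Assumes "outside i_size" means the IO starts at or past i_size. */
static enum extent_search_mode choose_search_mode(uint64_t file_offset,
						  uint64_t i_size)
{
	if (file_offset >= i_size)
		return SEARCH_READONLY;
	return SEARCH_FOR_OVERWRITE;
}
```

Appending writes, the common case in the contention the commit describes, take the read-only branch.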

Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir · fede766f

Committed by Chris Mason on April 27, 2012

Btrfs has an optimization where it will preallocate dentries during
readdir to fill in enough information to open the inode without an extra
lookup.

But we're calling d_alloc(), which does GFP_KERNEL allocations, and
that leads to deadlocks because our readdir code holds tree locks.

For now, disable this optimization. We'll fix the gfp mask in the next
merge window.
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: Fix space checking during fs resize · 7654b724

Committed by Daniel J Blueman on April 27, 2012

Fix out-of-space checking, addressing a warning and a potential resource
leak when shrinking the filesystem while blocks are being allocated.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: fix block_rsv and space_info lock ordering · 1f699d38

Committed by Stefan Behrens on April 27, 2012

may_commit_transaction() calls
        spin_lock(&space_info->lock);
        spin_lock(&delayed_rsv->lock);
and update_global_block_rsv() calls
        spin_lock(&block_rsv->lock);
        spin_lock(&sinfo->lock);

Lockdep complains about this at runtime.
Everywhere except in update_global_block_rsv(), the space_info lock is
the outer lock; therefore, the locking order in update_global_block_rsv()
is changed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

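The fix above is a lock-ordering rule: the space_info lock is always the outer lock. The sketch below illustrates that rule in user space, assuming pthread mutexes as stand-ins for the kernel spinlocks; the structs and functions are hypothetical. Because both paths take `sinfo.lock` before `rsv.lock`, two threads running them concurrently cannot form the ABBA cycle that lockdep reported.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: pthread mutexes play the role of spinlocks. */
struct space_info { pthread_mutex_t lock; long bytes_reserved; };
struct block_rsv  { pthread_mutex_t lock; long size; };

static struct space_info sinfo = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct block_rsv rsv = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Modelled on may_commit_transaction(): space_info first, then the rsv. */
static void commit_path(void)
{
	pthread_mutex_lock(&sinfo.lock);	/* outer */
	pthread_mutex_lock(&rsv.lock);		/* inner */
	sinfo.bytes_reserved++;
	pthread_mutex_unlock(&rsv.lock);
	pthread_mutex_unlock(&sinfo.lock);
}

/* Modelled on update_global_block_rsv() after the fix: same order.
 * Taking rsv.lock first here is what created the inverted order. */
static void update_global_rsv_path(void)
{
	pthread_mutex_lock(&sinfo.lock);	/* outer */
	pthread_mutex_lock(&rsv.lock);		/* inner */
	rsv.size++;
	pthread_mutex_unlock(&rsv.lock);
	pthread_mutex_unlock(&sinfo.lock);
}

static void *worker(void *arg)
{
	int use_commit_path = *(int *)arg;

	for (int i = 0; i < 100000; i++) {
		if (use_commit_path)
			commit_path();
		else
			update_global_rsv_path();
	}
	return NULL;
}

/* Runs both paths concurrently; returns 0 once both threads finish. */
static int run_both_paths(void)
{
	static int modes[2] = { 1, 0 };
	pthread_t t[2];

	if (pthread_create(&t[0], NULL, worker, &modes[0]) != 0)
		return -1;
	if (pthread_create(&t[1], NULL, worker, &modes[1]) != 0) {
		pthread_join(t[0], NULL);
		return -1;
	}
	pthread_join(t[0], NULL);
	pthread_join(t[1], NULL);
	return 0;
}
```

Swapping the two lock calls in either function reintroduces the inverted ordering that can deadlock.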

Btrfs: Prevent root_list corruption · 1daf3540

Committed by Daniel J Blueman on April 27, 2012

I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add
correct locking to address this.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: fix repair code for RAID10 · 3e74317a

Committed by Jan Schmidt on April 27, 2012

btrfs_map_block sets mirror_num so that the repair code can eventually tell
which device gave us the read error. For RAID10, mirror_num must be 1 or 2.
Before this fix, mirror_num was incorrectly derived from the stripe index.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

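To make the RAID10 mirror_num rule concrete, here is a small sketch. It assumes `sub_stripes == 2` and that each mirror group starts at a stripe index that is a multiple of `sub_stripes`; the function names are hypothetical. The "buggy" variant illustrates the problem the commit describes (mirror_num tied to the absolute stripe index), not necessarily the exact pre-fix code.

```c
#include <assert.h>

/* Illustrates the problem: mirror_num follows the absolute stripe index,
 * so stripes in later mirror groups get 3, 4, ... */
static int raid10_mirror_num_buggy(int stripe_index)
{
	return stripe_index + 1;
}

/* mirror_num as the 1-based position inside the mirror group, which is
 * always 1 or 2 when sub_stripes == 2. */
static int raid10_mirror_num(int stripe_index, int sub_stripes)
{
	int group_start;

	assert(sub_stripes > 0 && stripe_index >= 0);
	group_start = stripe_index - stripe_index % sub_stripes;
	return stripe_index - group_start + 1;
}
```

For stripe index 3 with two sub-stripes, the buggy variant yields 4 while the fixed one yields 2.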

Btrfs: do not start delalloc inodes during sync · 996d282c

Committed by Josef Bacik on April 23, 2012

btrfs_start_delalloc_inodes just walks the list of delalloc inodes and
starts writing them out, but it doesn't splice the list or anything, so as
long as somebody is doing work on the box you could end up in this section
_forever_. So just remove it. It's not needed anyway, since sync will start
writeback on all inodes; all we need to do is wait for ordered extents and
then commit the transaction. In my horrible torture test, sync goes from
taking 4 minutes to about 1.5 minutes. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

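The problem described above is an unbounded walk over a list that other tasks keep refilling. The commit itself simply removes the call; the sketch below instead shows the splice it mentions the old code lacked, as one common way to bound such a walk. It is a user-space illustration under that assumption; the list type, `list_splice_all()`, and the refill hook are hypothetical (the kernel has its own `list_splice_init()`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list standing in for the delalloc list. */
struct node { struct node *next; };
struct list { struct node *head, *tail; };

static void list_add_tail(struct list *l, struct node *n)
{
	assert(n != NULL);
	n->next = NULL;
	if (l->tail)
		l->tail->next = n;
	else
		l->head = n;
	l->tail = n;
}

/* Move all entries of src to dst and leave src empty. */
static void list_splice_all(struct list *src, struct list *dst)
{
	*dst = *src;
	src->head = src->tail = NULL;
}

/* Handle only the entries queued when we started. Entries added meanwhile
 * (simulated by refill) stay on the shared list for a later pass, so the
 * loop ends even if the shared list never drains. */
static int process_snapshot(struct list *shared,
			    void (*refill)(struct list *shared))
{
	struct list work;
	int processed = 0;

	list_splice_all(shared, &work);
	for (struct node *n = work.head; n; n = n->next) {
		processed++;		/* stand-in for starting writeback */
		refill(shared);		/* other tasks keep dirtying inodes */
	}
	return processed;
}

/* Demo hook: each call queues one more node from a fixed pool. */
static struct node pool[64];
static int pool_used;

static void refill_one(struct list *shared)
{
	if (pool_used < 64)
		list_add_tail(shared, &pool[pool_used++]);
}
```

Walking `shared` until it is empty instead would keep going for as long as new nodes arrive, which is the "_forever_" case from the commit message.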

19 April, 2012 · 19 commits

Btrfs: fix that check_int_data mount option was ignored · 25cd999e

Committed by Stefan Behrens on March 30, 2012

The bitfield member mount_opt was one bit too small to hold the mount
option that enables including data extents in the integrity checker.
The same issue happened when the BTRFS_MOUNT_PANIC_ON_FATAL_ERROR option
was added (git rebase silently merges the patches so that the increase of
the bitfield size is lost), so the bit limit was removed entirely.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

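This sketch shows why an option can be silently ignored when the bitfield is one bit too narrow. It uses hypothetical option names and a 3-bit field for brevity (the real field width is not stated above). Storing a value in an unsigned bit-field keeps only its low bits, so the extra flag is dropped without any error; the wide field corresponds to removing the bit limit.

```c
#include <assert.h>

/* Hypothetical option bits: bits 0..2 fit a 3-bit field, bit 3 does not. */
#define OPT_A          (1u << 0)
#define OPT_CHECK_DATA (1u << 3)

/* One bit too small for OPT_CHECK_DATA. */
struct opts_narrow { unsigned int mount_opt : 3; };

/* No bit limit, as after the commit. */
struct opts_wide { unsigned long mount_opt; };

static int narrow_has(unsigned int flags, unsigned int opt)
{
	struct opts_narrow o = { 0 };

	o.mount_opt = flags;	/* keeps only the low 3 bits */
	return (o.mount_opt & opt) != 0;
}

static int wide_has(unsigned int flags, unsigned int opt)
{
	struct opts_wide o = { 0 };

	o.mount_opt = flags;
	return (o.mount_opt & opt) != 0;
}
```

With `OPT_A | OPT_CHECK_DATA`, the narrow field still reports `OPT_A` but loses `OPT_CHECK_DATA`, while the wide field keeps both.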

Btrfs: don't count CRC or header errors twice while scrubbing · 5c84fc3c

Committed by Stefan Behrens on March 30, 2012

Each CRC or header error was counted twice; this is now fixed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>


Btrfs: fix btrfs_ioctl_dev_info() crash on missing device · 99ba55ad

Committed by Stefan Behrens on March 19, 2012

When a filesystem is mounted with the degraded option, some of the
devices may be missing. btrfs_ioctl_dev_info() crashes in this case
because the device name is a NULL pointer. This ioctl was only used
for scrub.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>


btrfs: don't return EINTR · b9688bb8

Committed by Arne Jansen on April 18, 2012

It is basically a good thing if we are interruptible while waiting for
free space, but the current implementation is so general that system
calls which are not documented as interruptible can be interrupted. For
example, git can't handle an interrupted unlink(), which leads to corrupt
repos under space pressure.
Instead, we raise the bar so that only SIGKILL can interrupt the wait.
Thanks to David Sterba for suggesting this.
Signed-off-by: Arne Jansen <sensille@gmx.net>


Btrfs: double unlock bug in error handling · 253beebd

Committed by Dan Carpenter on April 18, 2012

The caller expects this function to return with the lock held and
releases it immediately on error, so the error path in this function
must not unlock it as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>


Btrfs: always store the mirror we read the eb from · 5cf1ab56

Committed by Josef Bacik on April 16, 2012

A user reported a panic where we were trying to fix a bad mirror but the
mirror number we were passing was 0, which is invalid. This happens because
we don't do the transid verification until after the read, so as far as the
read code is concerned the read succeeded. Instead, always store the mirror
we read from, so that if there is a failure after the read we know which
mirror to try next and which mirror needs to be fixed once we find a good
copy of the block. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


fs/btrfs/volumes.c: add missing free_fs_devices · 48d28232

Committed by Julia Lawall on April 14, 2012

Free fs_devices as done in the error-handling code just below.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>


btrfs: fix early abort in 'remount' · 8a3db184

Committed by Sergei Trofimovich on April 16, 2012

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>


Btrfs: fix max chunk size check in chunk allocator · 37db63a4

Committed by Ilya Dryomov on April 13, 2012

Fix a bug where, if stripe_size has to be adjusted so that the length of
the resulting chunk is less than or equal to max_chunk_size, DUP chunks
end up only half as big as they could be.

Cc: Arne Jansen <sensille@gmx.net>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

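The arithmetic behind this fix can be sketched as follows. The sketch assumes that the usable chunk length is `stripe_size * num_stripes / ncopies` (so a DUP chunk with two stripes and two copies is only as long as one stripe) and that the fix scales `max_chunk_size` by the number of copies before clamping. The helper names are hypothetical, and the exact kernel code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Logical length of a chunk under the stated assumption. */
static uint64_t chunk_len(uint64_t stripe_size, int num_stripes, int ncopies)
{
	return stripe_size * num_stripes / ncopies;
}

/* Buggy clamp: compares raw on-disk bytes with max_chunk_size, which
 * halves DUP chunks. */
static uint64_t clamp_buggy(uint64_t stripe_size, int num_stripes,
			    uint64_t max_chunk_size)
{
	if (stripe_size * num_stripes > max_chunk_size)
		stripe_size = max_chunk_size / num_stripes;
	return stripe_size;
}

/* Fixed clamp: the limit applies to the logical chunk length, so scale
 * it by the number of copies before dividing across the stripes. */
static uint64_t clamp_fixed(uint64_t stripe_size, int num_stripes,
			    int ncopies, uint64_t max_chunk_size)
{
	if (stripe_size * num_stripes > max_chunk_size * ncopies)
		stripe_size = max_chunk_size * ncopies / num_stripes;
	return stripe_size;
}
```

With a 1 GiB limit and DUP (two stripes, two copies), the buggy clamp produces a 512 MiB chunk and the fixed clamp a 1 GiB chunk. For a single-stripe, single-copy profile, both give the same result.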

Btrfs: add missing read locks in backref.c · b916a59a

Committed by Jan Schmidt on April 13, 2012

iref_to_path and iterate_irefs both increment the eb's refcount so they can
use it after releasing the path. Both depend on the data in the extent
buffer staying consistent and need a read lock to protect it.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>


Btrfs: don't call free_extent_buffer twice in iterate_irefs · aefc1eb1

Committed by Jan Schmidt on April 13, 2012

Avoid calling free_extent_buffer more than once when the iterator function
returns non-zero. The only code that uses this is scrub repair for corrupted
nodatasum blocks.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>


Btrfs: Make free_ipath() deal gracefully with NULL pointers · 4735fb28

Committed by Jesper Juhl on April 12, 2012

Make free_ipath() behave like most other freeing functions in the
kernel and gracefully do nothing when passed a NULL pointer.

Besides making the behaviour consistent with functions such as
kfree(), vfree(), btrfs_free_path() etc., it also fixes a real NULL
deref issue in fs/btrfs/ioctl.c::btrfs_ioctl_ino_to_path(). In that
function we have this code:

...
        ipath = init_ipath(size, root, path);
        if (IS_ERR(ipath)) {
                ret = PTR_ERR(ipath);
                ipath = NULL;
                goto out;
        }
...
out:
        btrfs_free_path(path);
        free_ipath(ipath);
...

If we ever take the true branch of that 'if' statement we'll end up
passing a NULL pointer to free_ipath() which will subsequently
dereference it and we'll go "Boom" :-(
This patch will avoid that.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>

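A user-space sketch of the NULL-tolerant convention this commit adopts for free_ipath(). The struct here has only the one member needed for the example, `free()` stands in for `kfree()`, and the `ipath_frees` counter is a hypothetical addition that exists only so the behaviour can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct inode_fs_paths. */
struct inode_fs_paths {
	void *fspath;	/* buffer owned by the ipath */
};

static int ipath_frees;	/* counts real frees, for the example only */

/* Like kfree()/free(), treat NULL as "nothing to do", so error paths can
 * call this unconditionally after setting ipath = NULL. */
static void free_ipath(struct inode_fs_paths *ipath)
{
	if (!ipath)
		return;
	free(ipath->fspath);
	free(ipath);
	ipath_frees++;
}
```

With this convention, the `out:` label in btrfs_ioctl_ino_to_path() can call `free_ipath(ipath)` regardless of whether init_ipath() failed.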

Btrfs: avoid possible use-after-free in clear_extent_bit() · cdc6a395

Committed by Li Zefan on March 12, 2012

clear_extent_bit()
{
    next_node = rb_next(&state->rb_node);
    ...
    clear_state_bit(state);  <-- this may free next_node
    if (next_node) {
        state = rb_entry(next_node);
        ...
    }
}

clear_state_bit() calls merge_state(), which may free the node following
the extent_state passed in, so clear_extent_bit() may end up referencing
freed memory.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

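A safe pattern for this bug is to compute the next node only after any merging has happened. The sketch below assumes a sorted singly linked list in place of the extent_state rbtree, with a merge_state() that only merges forward. All names are illustrative, not the btrfs implementation. Saving `s->next` before calling clear_state_bit() would reproduce the bug, because merge_state() may free that node.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sorted list of [start, end] ranges carrying a bit mask. */
struct state {
	unsigned long start, end;
	unsigned int bits;
	struct state *next;
};

/* Merge s with its successor if they are adjacent and carry the same
 * bits. The successor is freed, which is why a pointer to it saved before
 * this call is unsafe. */
static void merge_state(struct state *s)
{
	struct state *n = s->next;

	if (n && n->start == s->end + 1 && n->bits == s->bits) {
		s->end = n->end;
		s->next = n->next;
		free(n);
	}
}

/* Clear bits on one state and return the state to visit next. s->next is
 * read only after merge_state() has run, so it never points at a freed
 * node. */
static struct state *clear_state_bit(struct state *s, unsigned int bits)
{
	s->bits &= ~bits;
	merge_state(s);
	return s->next;
}

/* Clear bits on every state; returns how many states were visited. */
static int clear_extent_bits(struct state *head, unsigned int bits)
{
	int visited = 0;

	for (struct state *s = head; s; s = clear_state_bit(s, bits))
		visited++;
	return visited;
}
```

In a list of `[0,9]:0x3`, `[10,19]:0x1`, `[20,29]:0x3`, clearing `0x2` merges the first two ranges and frees the middle node. The loop then continues safely with `[20,29]`.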

Btrfs: return void from clear_state_bit · 8e52acf7

Committed by Li Zefan on March 12, 2012

Currently it returns the set of bits that were cleared, but this return
value is not used at all.

Moreover, it doesn't seem useful: we may clear the bits of several
extent_states, but only the bits cleared from the last one are
returned.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>


btrfs: add missing unlocks to transaction abort paths · 871383be

Committed by David Sterba on April 02, 2012

The abort paths, and the missing unlocks, were added in commit 49b25e05
("btrfs: enhance transaction abort infrastructure").
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>


Btrfs: do not mount when we have a sectorsize unequal to PAGE_SIZE · 8d082fb7

Committed by Liu Bo on April 03, 2012

Our code is not ready to cope with a sectorsize that is not equal to
PAGE_SIZE; it leads to hangs while writing.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

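A user-space approximation of the mount-time check described above. `check_sectorsize()` is a hypothetical name, and `sysconf(_SC_PAGESIZE)` stands in for the kernel's compile-time PAGE_SIZE.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical mount-time check: accept only sectorsize == page size,
 * since the commit says other sizes are not supported yet. */
static int check_sectorsize(uint32_t sectorsize)
{
	long page_size = sysconf(_SC_PAGESIZE);	/* kernel: PAGE_SIZE */

	if (page_size <= 0)
		return -EINVAL;
	if (sectorsize != (uint32_t)page_size)
		return -EINVAL;
	return 0;
}
```

Failing early with an error is preferable to the hang described in the commit, which only shows up once something is written.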

btrfs: don't add both copies of DUP to reada extent tree · 207a232c

Committed by Arne Jansen on February 25, 2012

Normally, when there are 2 copies of a block, we add both to the
reada extent tree and prefetch only the one that is easier to reach.
This way we can better utilize multiple devices.
In the case of DUP this makes no sense, as both copies reside on the
same device.
Signed-off-by: Arne Jansen <sensille@gmx.net>

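The idea above is that a second copy is only worth registering for readahead when it lives on a different device. This sketch keys on the device id of each copy, which is an assumption; the actual patch may test the DUP profile instead. `struct block_copy` and `reada_select_copies()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one copy of a block. */
struct block_copy {
	int devid;			/* device holding this copy */
	unsigned long long physical;	/* offset on that device */
};

/* Keep at most one copy per device: copies on different devices give the
 * prefetcher a choice, while a second copy on the same device (DUP)
 * adds nothing. Returns the number of entries written to out. */
static size_t reada_select_copies(const struct block_copy *copies, size_t n,
				  const struct block_copy **out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		int same_dev = 0;

		for (size_t j = 0; j < kept; j++)
			if (out[j]->devid == copies[i].devid)
				same_dev = 1;
		if (!same_dev)
			out[kept++] = &copies[i];
	}
	return kept;
}
```

A DUP pair on device 1 yields one entry, while a mirrored pair on devices 1 and 2 keeps both.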

btrfs: fix race in reada · 8c9c2bf7

Committed by Arne Jansen on February 25, 2012

When inserting into the radix tree returns EEXIST, get the existing
entry without giving up the spinlock in between.
The race existed for both the zone trees and the extent tree.
Signed-off-by: Arne Jansen <sensille@gmx.net>

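The pattern this fix describes is to handle -EEXIST by looking up the existing entry while still holding the lock. Below is a user-space sketch that assumes a small array indexed by key in place of the radix tree and a pthread mutex in place of the spinlock; all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 64

/* Hypothetical entry; refs counts users of the entry. */
struct entry { unsigned long key; int refs; };

static struct entry *table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller holds table_lock. Returns -EEXIST if the key is taken. */
static int table_insert_locked(struct entry *e)
{
	if (table[e->key])
		return -EEXIST;
	table[e->key] = e;
	return 0;
}

/* Insert new_entry, or take a reference on the entry already there. On
 * -EEXIST the lookup happens before table_lock is released; unlocking in
 * between would let another thread remove (and free) the existing entry. */
static struct entry *get_or_insert(struct entry *new_entry)
{
	struct entry *e = new_entry;

	assert(new_entry->key < TABLE_SIZE);
	pthread_mutex_lock(&table_lock);
	if (table_insert_locked(new_entry) == -EEXIST)
		e = table[new_entry->key];
	e->refs++;
	pthread_mutex_unlock(&table_lock);
	return e;
}
```

A second caller with the same key gets the first caller's entry, and the reference is taken before the lock is dropped.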

Btrfs: avoid setting ->d_op twice · 848cce0d

Committed by Li Zefan on February 21, 2012

Follow these steps and you'll trigger a warning at the
beginning of d_set_d_op():

  # mkfs.btrfs /dev/loop3
  # mount /dev/loop3 /mnt
  # btrfs sub create /mnt/sub
  # btrfs sub snap /mnt /mnt/snap
  # touch /mnt/snap/sub
  touch: cannot touch `tmp': Permission denied

__d_alloc() sets d_op to sb->s_d_op (btrfs_dentry_operations), and
then simple_lookup() resets it to simple_dentry_operations, which
triggers the warning.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>


13 April, 2012 · 7 commits

Btrfs: use commit root when loading free space cache · d53ba474

Committed by Josef Bacik on April 12, 2012

A user reported that booting his box with a btrfs root on 3.4 was much
slower than on 3.3 because I removed the ideal caching code. It turns out
that, for deadlock reasons, we don't load the free space cache if we're in
a commit. But since we're only reading the cache and it hasn't changed yet,
it is safe to read the inode and free space item from the commit root. So
do that, and remove all of the deadlock checks so we don't unnecessarily
skip loading the free space cache. The user reported that this fixed the
slowness. Thanks,
Tested-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: fix use-after-free in __btrfs_end_transaction · 4edc2ca3

Committed by Dave Jones on April 12, 2012

Commit 49b25e05 introduced a use-after-free bug
that caused spurious -EIO errors to be returned.

Do the check before we free the transaction.

Cc: David Sterba <dsterba@suse.cz>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: check return value of bio_alloc() properly · e627ee7b

Committed by Tsutomu Itoh on April 12, 2012

bio_alloc() can return NULL, so its return value must be checked.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: remove lock assert from get_restripe_target() · c6664b42

Committed by Ilya Dryomov on April 12, 2012

This fixes a regression introduced by fc67c450. spin_is_locked() always
returns 0 on UP kernels, which caused the assert in get_restripe_target()
to fire on every call from btrfs_reduce_alloc_profile() on UP systems.
Remove it completely for now; it's not clear whether it will be needed
in the future.
Reported-by: Bobby Powers <bobbypowers@gmail.com>
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Tested-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: fix eof while discarding extents · b89203f7

Committed by Liu Bo on April 12, 2012

We miscalculate the length of the extents we're discarding, which leads
to an attempt to access beyond the end of the device.
Reported-by: Daniel Blueman <daniel@quora.org>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: fix uninit variable in repair_eb_io_failure · d95603b2

Committed by Chris Mason on April 12, 2012

We'd have to be passing bogus extent buffers for this uninitialized
variable to actually be used, but set it to zero just in case.
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Revert "Btrfs: increase the global block reserve estimates" · 8e62c2de

Committed by Chris Mason on April 12, 2012

This reverts commit 5500cdbe.

We've had a number of complaints of early ENOSPC that bisect down
to this patch. We'll have to fix the reservations differently.

CC: stable@kernel.org
Signed-off-by: Chris Mason <chris.mason@oracle.com>


30 March, 2012 · 1 commit

Btrfs: update the checks for mixed block groups with big metadata blocks · bc3f116f

Committed by Chris Mason on March 29, 2012

Dave Sterba had put in patches to look for mixed data/metadata groups
with metadata bigger than 4KB. But these ended up in the wrong place
and weren't testing the feature flag correctly.

This updates the tests to make sure our sizes match.
Signed-off-by: Chris Mason <chris.mason@oracle.com>


29 March, 2012 · 4 commits

Btrfs: update to the right index of defragment · e1f041e1

Committed by Liu Bo on March 29, 2012

When we use autodefrag, we forget to update the index that records the
last page we've dirtied, so we set dirty flags on the same set of pages
again and again.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: do not bother to defrag an extent if it is a big real extent · 66c26892

Committed by Liu Bo on March 29, 2012

$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag
$ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3072              10 eof
/mnt/btrfs/foobar: 1 extent found

Now we have a big real extent [0, 40960), but autodefrag will still defrag it.

$ sync
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3082              10 eof
/mnt/btrfs/foobar: 1 extent found

So if we already find a big real extent, there is nothing to gain from
defragmenting it; just skip it.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: add a check to decide if we should defrag the range · 17ce6ef8

Committed by Liu Bo on March 29, 2012

If our file's layout is as follows:
| hole | data1 | hole | data2 |

we do not need to defrag this file, because this file has holes and
cannot be merged into one extent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>


Btrfs: fix recursive defragment with autodefrag option · 4cb13e5d

Committed by Liu Bo on March 29, 2012

$ mkfs.btrfs disk
$ mount disk /mnt -o autodefrag
$ dd if=/dev/zero of=/mnt/foobar bs=4k count=10 2>/dev/null && sync
$ for i in `seq 9 -2 0`; do dd if=/dev/zero of=/mnt/foobar bs=4k count=1 \
  seek=$i conv=notrunc 2> /dev/null; done && sync

Then we end up defragmenting "foobar" again and again.
The same happens with "-o autodefrag,compress".

Reasons:
When the cleaner kthread fetches inodes from the defrag tree and
defragments them, it dirties and submits pages. This leads to another
data COW, which inserts the inode being processed into the defrag tree
again.

This patch sets a rule for the COW code: only insert an inode when we're
really going to defragment it.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

