提交 · 87826df0ec36fc28884b4ddbb3f3af41c4c2008f · gsplhtlxg / clone-Linux

15 2月, 2012 7 次提交

btrfs: delalloc for page dirtied out-of-band in fixup worker · 87826df0

由 Jeff Mahoney 提交于 2月 15, 2012

 We encountered an issue that was easily observable on s/390 systems but
 could really happen anywhere. The timing just seemed to hit reliably
 on s/390 with limited memory.

 The gist is that when an unexpected set_page_dirty() happened, we'd
 run into the BUG() in btrfs_writepage_fixup_worker since it wasn't
 properly set up for delalloc.

 This patch does the following:
 - Performs the missing delalloc in the fixup worker
 - Allow the start hook to return -EBUSY which informs __extent_writepage
   that it should mark the page skipped and not to redirty it. This is
   required since the fixup worker can fail with -ENOSPC and the page
   will have already been redirtied. That causes an Oops in
   drop_outstanding_extents later. Retrying the fixup worker could
   lead to an infinite loop. Deferring the page redirty also saves us
   some cycles since the page would be stuck in a resubmit-redirty loop
   until the fixup worker completes. It's not harmful, just wasteful.
 - If the fixup worker fails, we mark the page and mapping as errored,
   and end the writeback, similar to what we would do had the page
   actually been submitted to writeback.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>

87826df0

Btrfs: fix memory leak in load_free_space_cache() · a7e221e9

由 Tsutomu Itoh 提交于 2月 14, 2012

load_free_space_cache() has forgotten to free path.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>

a7e221e9

btrfs: don't check DUP chunks twice · 859acaf1

由 Arne Jansen 提交于 2月 09, 2012

Because scrub enumerates the dev extent tree to find the chunks to scrub,
it currently finds each DUP chunk twice and also scrubs it twice. This
patch makes sure that scrub_chunk only checks that part of the chunk the
dev extent has been found for. This only changes the behaviour for DUP
chunks.
Reported-and-tested-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NArne Jansen <sensille@gmx.net>

859acaf1

Btrfs: fix trim 0 bytes after a device delete · 2cac13e4

由 Liu Bo 提交于 2月 09, 2012

A user reported a bug of btrfs's trim, that is we will trim 0 bytes
after a device delete.

The reproducer:

$ mkfs.btrfs disk1
$ mkfs.btrfs disk2
$ mount disk1 /mnt
$ fstrim -v /mnt
$ btrfs device add disk2 /mnt
$ btrfs device del disk1 /mnt
$ fstrim -v /mnt

This is because after we delete the device, the block group may start from
a non-zero place, which will confuse trim to discard nothing.
Reported-by: NLutz Euler <lutz.euler@freenet.de>
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>

2cac13e4

Btrfs: return the internal error unchanged if btrfs_get_extent_fiemap() call... · 6af021d8

由 Jeff Liu 提交于 2月 09, 2012

Btrfs: return the internal error unchanged if btrfs_get_extent_fiemap() call failed for SEEK_DATA/SEEK_HOLE inquiry

Given that ENXIO only means "offset beyond EOF" for either SEEK_DATA or SEEK_HOLE inquiry
in a desired file range, so we should return the internal error unchanged if btrfs_get_extent_fiemap()
call failed, rather than ENXIO.

Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: NJie Liu <jeff.liu@oracle.com>

6af021d8

Btrfs: avoid positive number with ERR_PTR · 8f24b496

由 Jan Schmidt 提交于 2月 08, 2012

inode_ref_info() returns 1 when the element wasn't found and < 0 on error,
just like btrfs_search_slot(). In iref_to_path() it's an error when the
inode ref can't be found, thus we return ERR_PTR(ret) in that case. In order
to avoid ERR_PTR(1), we now set ret to -ENOENT in that case.
Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>

8f24b496

btrfs: Sector Size check during Mount · 941b2ddf

由 Keith Mannthey 提交于 11月 29, 2011

Gracefully fail when trying to mount a BTRFS file system that has a
sectorsize smaller than PAGE_SIZE.

On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel
then boot into a 64K PAGE_SIZE kernel.  Presently open_ctree fails in an
endless loop and hangs the machine in this situation.

My debugging has show this Sector size < Page size to be a non trivial
situation and a graceful exit from the situation would be nice for the
time being.
Signed-off-by: NKeith Mannthey <kmannth@us.ibm.com>

941b2ddf

01 2月, 2012 1 次提交

Btrfs: don't reserve data with extents locked in btrfs_fallocate · d98456fc

由 Chris Mason 提交于 1月 31, 2012

btrfs_fallocate tries to allocate space only if ranges in the file don't
already exist.  But the enospc checks it does are not allowed with
extents locked.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d98456fc

27 1月, 2012 11 次提交

Btrfs: fix reservations in btrfs_page_mkwrite · 9998eb70

由 Chris Mason 提交于 1月 25, 2012

Josef fixed btrfs_page_mkwrite to properly release reserved
extents if there was an error.  But if we fail to get a reservation
and we fail to dirty the inode (for ENOSPC reasons), we'll end up
trying to release a reservation we never had.

This makes sure we only release if we were able to reserve.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9998eb70

Btrfs: advance window_start if we're using a bitmap · 9b230628

由 Josef Bacik 提交于 1月 26, 2012

If we span a long area in a bitmap we could end up taking a lot of time
searching to the next free area if we're searching from the original
window_start, so advance window_start in order to make sure we don't do any
superficial searching.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9b230628

btrfs: mask out gfp flags in releasepage · 0c4e538b

由 David Sterba 提交于 1月 26, 2012

btree_releasepage is a callback and can be passed unknown gfp flags and then
they may end up in kmem_cache_alloc called from alloc_extent_state, slab
allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.

This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state

3399                 if ((mask & GFP_NOFS) == GFP_NOFS)
3400                         mask = GFP_NOFS;

will not work and passes unfiltered flags further resulting in crash at
mm/slab.c:2963

 [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
 [<000000000024c810>] kmem_cache_alloc+0x204/0x294
 [<00000000001fd3c2>] mempool_alloc+0x52/0x170
 [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
 [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
 [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
 [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
 [<0000000000210d84>] shrink_page_list+0x6a0/0x724
 [<0000000000211394>] shrink_inactive_list+0x230/0x578
 [<0000000000211bb8>] shrink_list+0x6c/0x120
 [<0000000000211e4e>] shrink_zone+0x1e2/0x228
 [<0000000000211f24>] shrink_zones+0x90/0x254
 [<0000000000213410>] do_try_to_free_pages+0xac/0x420
 [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
 [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
 [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0c4e538b

Btrfs: fix enospc error caused by wrong checks of the chunk · 9e622d6b

由 Miao Xie 提交于 1月 26, 2012

When we did sysbench test for inline files, enospc error happened easily though
there was lots of free disk space which could be allocated for new chunks.

Reproduce steps:
 # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
 # mount <test partition> /mnt
 # ulimit -n 102400
 # cd /mnt
 # sysbench --num-threads=1 --test=fileio --file-num=81920 \
 > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
 > --file-test-mode=seqwr prepare
 # sysbench --num-threads=1 --test=fileio --file-num=81920 \
 > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
 > --file-test-mode=seqwr run
 <soon later, BUG_ON() was triggered by enospc error>

The reason of this bug is:
Now, we can reserve space which is larger than the free space in the chunks if
we have enough free disk space which can be used for new chunks. By this way,
the space allocator should allocate a new chunk by force if there is no free
space in the free space cache. But there are two wrong checks which break this
operation.

One is
	if (ret == -ENOSPC && num_bytes > min_alloc_size)
in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk
even we fail to allocate free space by minimum allocable size.

The other is
	if (space_info->force_alloc)
		force = space_info->force_alloc;
in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone
sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen.

Fix these two wrong checks. Especially the second one, we fix it by changing
the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make
CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has
higher priority. And if the value which is passed in by the caller is greater
than ->force_alloc, use the passed value.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9e622d6b

Btrfs: do not defrag a file partially · 7ec31b54

由 Liu Bo 提交于 1月 26, 2012

xfstests 218 complains that btrfs defrags a file partially:
 After: 1
 Write backwards sync, but contiguous - should defrag to 1 extent
 Before: 10
-After: 1
+After: 2

To fix this, we need to set max_to_defrag count properly.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

7ec31b54

Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c · 0b485143

由 Stefan Behrens 提交于 1月 26, 2012

There have been 4 warnings on 32-bit build, they are herewith fixed.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0b485143

Btrfs: use cluster->window_start when allocating from a cluster bitmap · 0b4a9d24

由 Josef Bacik 提交于 1月 26, 2012

We specifically set window_start in the cluster struct to indicate where the
cluster starts in a bitmap, but we've been using min_start to indicate where
we're searching from. This is usually the start of the blockgroup, so
essentially means we're constantly searching from the start of any bitmap we
find, which completely negates all the trouble we go to in order to setup a
cluster. So start using window_start to make sure we actually use the area we
found. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0b4a9d24

Btrfs: Check for NULL page in extent_range_uptodate · 8bedd51b

由 Mitch Harder 提交于 1月 26, 2012

A user has encountered a NULL pointer kernel oops in btrfs when
encountering media errors.  The problem has been identified
as an unhandled NULL pointer returned from find_get_page().
This modification simply checks for a NULL page, and returns
with an error if found (the extent_range_uptodate() function
returns 1 on errors).

After testing this patch, the user reported that the error with
the NULL pointer oops was solved.  However, there is still a
remaining problem with a thread becoming stuck in
wait_on_page_locked(page) in the read_extent_buffer_pages(...)
function in extent_io.c

       for (i = start_i; i < num_pages; i++) {
               page = extent_buffer_page(eb, i);
               wait_on_page_locked(page);
               if (!PageUptodate(page))
                       ret = -EIO;
       }

This patch leaves the issue with the locked page yet to be resolved.
Signed-off-by: NMitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

8bedd51b

btrfs: Fix busyloops in transaction waiting code · 6dd70ce4

由 Jan Kara 提交于 1月 26, 2012

wait_log_commit() and wait_for_writer() were using slightly different
conditions for deciding whether they should call schedule() and whether they
should continue in the wait loop. Thus it could happen that we busylooped when
the first condition was not true while the second one was. That is burning CPU
cycles needlessly and is deadly on UP machines...
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

6dd70ce4

Btrfs: make sure a bitmap has enough bytes · 357b9784

由 Josef Bacik 提交于 1月 26, 2012

We have only been checking for min_bytes available in bitmap entries, but we
won't successfully setup a bitmap cluster unless it has at least bytes in the
bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so
if there are a bunch of bitmap entries with less than 2mb's in them, we'll
search all them anyway, which is suboptimal. Fix this check. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

357b9784

Btrfs: fix uninit warning in backref.c · b1375d64

由 Jan Schmidt 提交于 1月 26, 2012

Added initialization with the declaration of ret. It isn't set later on the
switch-default branch (which should never be taken).
Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b1375d64

17 1月, 2012 21 次提交

Btrfs: use larger system chunks · 96bdc7dc

由 Chris Mason 提交于 1月 16, 2012

system chunks by default are very small.  This makes them slightly
larger and also fixes the conditional checks to make sure we don't
allocate a billion of them at once.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

96bdc7dc

Btrfs: add a delalloc mutex to inodes for delalloc reservations · f248679e

由 Josef Bacik 提交于 1月 13, 2012

I was using i_mutex for this, but we're getting bogus lockdep warnings by doing
that and theres no real way to get rid of those, so just stop using i_mutex to
protect delalloc metadata reservations and use a delalloc mutex instead. This
shouldn't be contended often at all, only if you are writing and mmap writing to
the file at the same time. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

f248679e

Btrfs: space leak tracepoints · 8c2a3ca2

由 Josef Bacik 提交于 1月 10, 2012

This in addition to a script in my btrfs-tracing tree will help track down space
leaks when we're getting space left over in block groups on umount. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

8c2a3ca2

Btrfs: protect orphan block rsv with spin_lock · 90290e19

由 Josef Bacik 提交于 12月 02, 2011

We've been seeing warnings coming out of the orphan commit stuff forever from
ceph. Turns out it's because we're racing with checking if the orphan block
reserve is set, because we clear it outside of the spin_lock. So leave the
normal fastpath checks where they are, but take the spin_lock and _recheck_ to
make sure we haven't had an orphan block rsv added in the meantime. Then clear
the root's orphan block rsv and release the lock. With this patch a user said
the warnings went away and they usually showed up pretty soon after he started
ceph. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

90290e19

Btrfs: add allocator tracepoints · 3f7de037

由 Josef Bacik 提交于 11月 10, 2011

I used these tracepoints when figuring out what the cluster stuff was doing, so
add them to mainline in case we need to profile this stuff again. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

3f7de037

Btrfs: don't call btrfs_throttle in file write · 45a8090e

由 Josef Bacik 提交于 1月 12, 2012

Btrfs_throttle will make us wait if there is a currently committing transaction
until we can open new transactions, which is ridiculous since we don't actually
start any transactions within the file write path anyway, so all this does is
introduce big latencies if we have a sync/fsync heavy workload going on while
somebody else is trying to do work. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

45a8090e

Btrfs: release space on error in page_mkwrite · ec39e180

由 Josef Bacik 提交于 1月 12, 2012

If updating the inode gave us an ENOSPC we were just returning in page_mkwrite,
which is a problem since we make our reservation right before trying to update
the inode, so fix the out label so that we actually free our reservation.
Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ec39e180

Btrfs: fix btrfsck error 400 when truncating a compressed · f70a9a6b

由 Miao Xie 提交于 1月 12, 2012

Reproduce steps:
 # mkfs.btrfs /dev/sdb5
 # mount /dev/sdb5 -o compress=lzo /mnt
 # dd if=/dev/zero of=/mnt/tmpfile bs=128K count=1
 # sync
 # truncate -s 64K /mnt/tmpfile
 root 5 inode 257 errors 400

This is because of the wrong if condition, which is used to check if we should
subtract the bytes of the dropped range from i_blocks/i_bytes of i-node or not.
When we truncate a compressed extent, btrfs substracts the bytes of the whole
extent, it's wrong. We should substract the real size that we truncate, no
matter it is a compressed extent or not. Fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f70a9a6b

Btrfs: do not use btrfs_end_transaction_throttle everywhere · 7ad85bb7

由 Josef Bacik 提交于 1月 12, 2012

A user reported a problem where things like open with O_CREAT would take up to
30 seconds when he had nfs activity on the same mount. This is because all of
our quick metadata operations, like create, symlink etc all do
btrfs_end_transaction_throttle, which if the transaction is blocked will wait
for the commit to complete before it returns. This adds a ridiculous amount of
latency and isn't really needed. The normal btrfs_end_transaction will mark the
transaction as blocked and wake the transaction kthread up if it thinks the
transaction needs to end (this being in the running out of global reserve space
scenario), and this is all that is really needed since we've already done
everything we're going to do, we just need to return. This should help people
with the latency they were seeing when using synchronous heavy workloads.
Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

7ad85bb7

I
Btrfs: add balance progress reporting · 19a39dce
由 Ilya Dryomov 提交于 1月 16, 2012
```
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
19a39dce

Btrfs: allow for resuming restriper after it was paused · de322263