提交 · 31bada7c4e5e80b9cc031e89247175bed58ed9e9 · openeuler / raspberrypi-kernel

08 7月, 2016 11 次提交

Btrfs: fix release reserved extents trace points · 31bada7c

由 Josef Bacik 提交于 3月 25, 2016

We were doing trace_btrfs_release_reserved_extent() in pin_down_extent which
isn't quite right because we will go through and free that extent later when we
unpin, so it messes up apps that are accounting for the reservation space. We
were also unconditionally doing it in __btrfs_free_reserved_extent(), when we
only actually free the reservation instead of pinning the extent. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

31bada7c

Btrfs: add tracepoints for flush events · f376df2b

由 Josef Bacik 提交于 3月 25, 2016

We want to track when we're triggering flushing from our reservation code and
what flushing is being done when we start flushing.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f376df2b

Btrfs: fix delalloc reservation amount tracepoint · f485c9ee

由 Josef Bacik 提交于 3月 25, 2016

We can sometimes drop the reservation we had for our inode, so we need to remove
that amount from to_reserve so that our tracepoint reports a valid amount of
space.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f485c9ee

Btrfs: trace pinned extents · c51e7bb1

由 Josef Bacik 提交于 3月 25, 2016

Pinned extents are an important metric to keep track of for enospc.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c51e7bb1

Btrfs: introduce ticketed enospc infrastructure · 957780eb

由 Josef Bacik 提交于 5月 17, 2016

Our enospc flushing sucks. It is born from a time where we were early
enospc'ing constantly because multiple threads would race in for the same
reservation and randomly starve other ones out. So I came up with this solution
to block any other reservations from happening while one guy tried to flush
stuff to satisfy his reservation. This gives us pretty good correctness, but
completely crap latency.

The solution I've come up with is ticketed reservations. Basically we try to
make our reservation, and if we can't we put a ticket on a list in order and
kick off an async flusher thread. This async flusher thread does the same old
flushing we always did, just asynchronously. As space is freed and added back
to the space_info it checks and sees if we have any tickets that need
satisfying, and adds space to the tickets and wakes up anything we've satisfied.

Once the flusher thread stops making progress it wakes up all the current
tickets and tells them to take a hike.

There is a priority list for things that can't flush, since the async flusher
could do anything we need to avoid deadlocks. These guys get priority for
having their reservation made, and will still do manual flushing themselves in
case the async flusher isn't running.

This patch gives us significantly better latencies. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

957780eb

Btrfs: add tracepoint for adding block groups · c83f8eff

由 Josef Bacik 提交于 3月 25, 2016

I'm writing a tool to visualize the enospc system inside btrfs, I need this
tracepoint in order to keep track of the block groups in the system.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c83f8eff

Btrfs: warn_on for unaccounted spaces · d555b6c3

由 Josef Bacik 提交于 3月 25, 2016

These were hidden behind enospc_debug, which isn't helpful as they indicate
actual bugs, unlike the rest of the enospc_debug stuff which is really debug
information.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

d555b6c3

Btrfs: change delayed reservation fallback behavior · c48f49d6

由 Josef Bacik 提交于 3月 25, 2016

We reserve space for the inode update when we first reserve space for writing to
a file. However there are lots of ways that we can use this reservation and not
have it for subsequent ordered extents. Previously we'd fall through and try to
reserve metadata bytes for this, then we'd just steal the full reservation from
the delalloc_block_rsv, and if that didn't have enough space we'd steal the full
reservation from the global reserve. The problem with this is we can easily
just return ENOSPC and fallback to updating the inode item directly. In the
worst case (assuming 4k nodesize) we'd steal 64kib from the global reserve if we
fall all the way through, however if we just fallback and update the inode
directly we'd only steal 4k * BTRFS_PATH_MAX in the worst case which is 32kib.

We would have also just added the extent item for the inode so we likely will
have already cow'ed down most of the way to the leaf containing the inode item,
so we are more often than not only need one or two nodesize's worth of
reservations. Given the reservation for the extent itself is also a worst case
we will likely already have space to cover the inode update.

This change will make us behave better in the theoretical worst case, and much
better in the case that we don't have our reservation and cannot reserve more
metadata. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c48f49d6

Btrfs: always reserve metadata for delalloc extents · 48c3d480

由 Josef Bacik 提交于 3月 25, 2016

There are a few races in the metadata reservation stuff. First we add the bytes
to the block_rsv well after we've set the bit on the inode saying that we have
space for it and after we've reserved the bytes. So use the normal
btrfs_block_rsv_add helper for this case. Secondly we can flush delalloc
extents when we try to reserve space for our write, which means that we could
have used up the space for the inode and we wouldn't know because we only check
before the reservation. So instead make sure we are always reserving space for
the inode update, and then if we don't need it release those bytes afterward.
Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

48c3d480

Btrfs: fix callers of btrfs_block_rsv_migrate · 25d609f8

由 Josef Bacik 提交于 3月 25, 2016

So btrfs_block_rsv_migrate just unconditionally calls block_rsv_migrate_bytes.
Not only this but it unconditionally changes the size of the block_rsv. This
isn't a bug strictly speaking, but it makes truncate block rsv's look funny
because every time we migrate bytes over its size grows, even though we only
want it to be a specific size. So collapse this into one function that takes an
update_size argument and make truncate and evict not update the size for
consistency sake. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

25d609f8

Btrfs: add bytes_readonly to the spaceinfo at once · e40edf2d

由 Josef Bacik 提交于 3月 25, 2016

For some reason we're adding bytes_readonly to the space info after we update
the space info with the block group info. This creates a tiny race where we
could over-reserve space because we haven't yet taken out the bytes_readonly
bit. Since we already know this information at the time we call
update_space_info, just pass it along so it can be updated all at once. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e40edf2d

25 6月, 2016 1 次提交

Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes · 02dbfc99

由 Omar Sandoval 提交于 5月 20, 2016

Commit fe742fd4 ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.

This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.

Tested with xfstests and by doing a parallel kernel build:

	while make tinyconfig && make -j4 && git clean dqfx; do
		:
	done

along with a bunch of parallel finds in another shell:

	while true; do
		for ((i=0; i<4; i++)); do
			find . >/dev/null &
		done
		wait
	done
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

02dbfc99

24 6月, 2016 4 次提交

Btrfs: Force stripesize to the value of sectorsize · b7f67055

由 Chandan Rajendra 提交于 6月 23, 2016

Btrfs code currently assumes stripesize to be same as
sectorsize. However Btrfs-progs (until commit
df05c7ed455f519e6e15e46196392e4757257305) has been setting
btrfs_super_block->stripesize to a value of 4096.

This commit makes sure that the value of btrfs_super_block->stripesize
is a power of 2. Later, it unconditionally sets btrfs_root->stripesize
to sectorsize.
Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

b7f67055

btrfs: fix disk_i_size update bug when fallocate() fails · c0d2f610

由 Wang Xiaoguang 提交于 6月 22, 2016

When doing truncate operation, btrfs_setsize() will first call
truncate_setsize() to set new inode->i_size, but if later
btrfs_truncate() fails, btrfs_setsize() will call
"i_size_write(inode, BTRFS_I(inode)->disk_i_size)" to reset the
inmemory inode size, now bug occurs. It's because for truncate
case btrfs_ordered_update_i_size() directly uses inode->i_size
to update BTRFS_I(inode)->disk_i_size, indeed we should use the
"offset" argument to update disk_i_size. Here is the call graph:
==>btrfs_truncate()
====>btrfs_truncate_inode_items()
======>btrfs_ordered_update_i_size(inode, last_size, NULL);
Here btrfs_ordered_update_i_size()'s offset argument is last_size.

And below test case can reveal this bug:

dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100
dev=$(losetup --show -f fs.img)
mkdir -p /mnt/mntpoint
mkfs.btrfs  -f $dev
mount $dev /mnt/mntpoint
cd /mnt/mntpoint

echo "workdir is: /mnt/mntpoint"
blocksize=$((128 * 1024))
dd if=/dev/zero of=testfile bs=$blocksize count=1
sync
count=$((17*1024*1024*1024/blocksize))
echo "file size is:" $((count*blocksize))
for ((i = 1; i <= $count; i++)); do
	i=$((i + 1))
	dst_offset=$((blocksize * i))
	xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\
		testfile > /dev/null
done
sync

truncate --size 0 testfile
ls -l testfile
du -sh testfile
exit

In this case, truncate operation will fail for enospc reason and
"du -sh testfile" returns value greater than 0, but testfile's
size is 0, we need to reflect correct inode->i_size.
Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

c0d2f610

Btrfs: fix error handling in map_private_extent_buffer · 415b35a5

由 Liu Bo 提交于 6月 17, 2016

map_private_extent_buffer() can return -EINVAL in two different cases,
1. when the requested contents span two pages if nodesize is larger
   than pagesize,
2. when it detects something insane.

The 2nd one used to be only a WARN_ON(1), and we decided to return a error
to callers, but we didn't fix up all its callers, which will be
addressed by this patch.

Without this, btrfs may end up with 'general protection', ie.
reading invalid memory.
Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

415b35a5

Btrfs: fix error return code in btrfs_init_test_fs() · 04e1b65a

由 Wei Yongjun 提交于 6月 17, 2016

Fix to return a negative error code from the kern_mount() error handling
case instead of 0(ret is set to 0 by register_filesystem), as done
elsewhere in this function.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

04e1b65a

23 6月, 2016 3 次提交

Btrfs: don't do nocow check unless we have to · c6887cd1

由 Josef Bacik 提交于 3月 25, 2016

Before we write into prealloc/nocow space we have to make sure that there are no
references to the extents we are writing into, which means checking the extent
tree and csum tree in the case of nocow.  So we don't want to do the nocow dance
unless we can't reserve data space, since it's a serious drag on performance.
With the following sequence

fallocate -l10737418240 /mnt/btrfs-test/file
cp --reflink /mnt/btrfs-test/file /mnt/btrfs-test/link
fio --name=randwrite --rw=randwrite --bs=4k --filename=/mnt/btrfs-test/file \
	--end_fsync=1

we get the worst case scenario where we have to fall back on to doing the check
anyway.

Without this patch
lat (usec): min=5, max=111598, avg=27.65, stdev=124.51
write: io=10240MB, bw=126876KB/s, iops=31718, runt= 82646msec

With this patch
lat (usec): min=3, max=91210, avg=14.09, stdev=110.62
write: io=10240MB, bw=212753KB/s, iops=53188, runt= 49286msec

We get twice the throughput, half of the runtime, and half of the average
latency.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
[ PAGE_CACHE_ removal related fixups ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

c6887cd1

btrfs: fix deadlock in delayed_ref_async_start · 0f873eca

由 Chris Mason 提交于 4月 27, 2016

"Btrfs: track transid for delayed ref flushing" was deadlocking on
btrfs_attach_transaction because its not safe to call from the async
delayed ref start code.  This commit brings back btrfs_join_transaction
instead and checks for a blocked commit.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

0f873eca

Btrfs: track transid for delayed ref flushing · 31b9655f

由 Josef Bacik 提交于 4月 11, 2016

Using the offwakecputime bpf script I noticed most of our time was spent waiting
on the delayed ref throttling. This is what is supposed to happen, but
sometimes the transaction can commit and then we're waiting for throttling that
doesn't matter anymore. So change this stuff to be a little smarter by tracking
the transid we were in when we initiated the throttling. If the transaction we
get is different then we can just bail out. This resulted in a 50% speedup in
my fs_mark test, and reduced the amount of time spent throttling by 60 seconds
over the entire run (which is about 30 minutes). Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

31b9655f

18 6月, 2016 8 次提交

Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize · dd5c9311

由 Chandan Rajendra 提交于 6月 16, 2016

Older btrfs-progs/mkfs.btrfs sets 4096 as the stripesize. Hence
restricting stripesize to be equal to sectorsize would cause super block
validation to return an error on architectures where PAGE_SIZE is not
equal to 4096.

Hence as a workaround, this commit allows stripesize to be set to 4096
bytes.
Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

dd5c9311

btrfs: remove build fixup for qgroup_account_snapshot · 89c5a544

由 David Sterba 提交于 6月 16, 2016

Introduced in 2c1984f2 ("btrfs: build fixup for
qgroup_account_snapshot") as temporary bisectability build fixup.
Signed-off-by: NDavid Sterba <dsterba@suse.com>

89c5a544

D
btrfs: use new error message helper in qgroup_account_snapshot · f7af3934
由 David Sterba 提交于 6月 17, 2016
```
We've renamed btrfs_std_error, this one is left from last merge.
Signed-off-by: NDavid Sterba <dsterba@suse.com>
```
f7af3934

btrfs: avoid blocking open_ctree from cleaner_kthread · 90c711ab

由 Zygo Blaxell 提交于 6月 12, 2016

This fixes a problem introduced in commit 2f3165ec
"btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes".

open_ctree eventually calls btrfs_replay_log which in turn calls
btrfs_commit_super which tries to lock the cleaner_mutex, causing a
recursive mutex deadlock during mount.

Instead of playing whack-a-mole trying to keep up with all the
functions that may want to lock cleaner_mutex, put all the cleaner_mutex
lockers back where they were, and attack the problem more directly:
keep cleaner_kthread asleep until the filesystem is mounted.

When filesystems are mounted read-only and later remounted read-write,
open_ctree did not set fs_info->open and neither does anything else.
Set this flag in btrfs_remount so that neither btrfs_delete_unused_bgs
nor cleaner_kthread get confused by the common case of "/" filesystem
read-only mount followed by read-write remount.
Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

90c711ab

Btrfs: don't BUG_ON() in btrfs_orphan_add · 3b6571c1

由 Josef Bacik 提交于 5月 27, 2016

This is just a screwup for developers, so change it to an ASSERT() so developers
notice when things go wrong and deal with the error appropriately if ASSERT()
isn't enabled.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NMark Fasheh <mfasheh@suse.de>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

3b6571c1

btrfs: account for non-CoW'd blocks in btrfs_abort_transaction · 64c12921

由 Jeff Mahoney 提交于 6月 08, 2016

The test for !trans->blocks_used in btrfs_abort_transaction is
insufficient to determine whether it's safe to drop the transaction
handle on the floor.  btrfs_cow_block, informed by should_cow_block,
can return blocks that have already been CoW'd in the current
transaction.  trans->blocks_used is only incremented for new block
allocations. If an operation overlaps the blocks in the current
transaction entirely and must abort the transaction, we'll happily
let it clean up the trans handle even though it may have modified
the blocks and will commit an incomplete operation.

In the long-term, I'd like to do closer tracking of when the fs
is actually modified so we can still recover as gracefully as possible,
but that approach will need some discussion.  In the short term,
since this is the only code using trans->blocks_used, let's just
switch it to a bool indicating whether any blocks were used and set
it when should_cow_block returns false.

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

64c12921

Btrfs: check if extent buffer is aligned to sectorsize · c871b0f2

由 Liu Bo 提交于 6月 06, 2016

Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
via alloc_extent_buffer().  An unaligned eb can have more pages than it
should have, which ends up extent buffer's leak or some corrupted content
in extent buffer.

This adds a warning to let us quickly know what was happening.

Now that alloc_extent_buffer() no more returns NULL, this changes its
caller and callers of its caller to match with the new error
handling.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c871b0f2

btrfs: Use correct format specifier · 16ff4b45

由 Heinrich Schuchardt 提交于 6月 11, 2016

Component mirror_num of struct btrfsic_block is defined
as unsigned int. Use %u as format specifier.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

16ff4b45

06 6月, 2016 9 次提交

Btrfs: self-tests: Fix extent buffer bitmap test fail on BE system · 34b3e6c9

由 Feifei Xu 提交于 6月 01, 2016

In __test_eb_bitmaps(), we write random data to a bitmap. Then copy
the bitmap to another bitmap that resides inside an extent buffer.
Later we verify the values of corresponding bits in the bitmap and the
bitmap inside the extent buffer. However, extent_buffer_test_bit()
reads in byte granularity while test_bit() reads in unsigned long
granularity. Hence we end up comparing wrong bits on big-endian
systems such as ppc64. This commit fixes the issue by reading the
bitmap in byte granularity.
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NFeifei Xu <xufeifei@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

34b3e6c9

Btrfs: self-tests: Fix test_bitmaps fail on 64k sectorsize · 36b3dc05

由 Feifei Xu 提交于 6月 01, 2016

With 64K sectorsize, 1G sized block group cannot span across bitmaps.
To execute test_bitmaps() function, this commit allocates
"BITS_PER_BITMAP * sectorsize + PAGE_SIZE" sized block group.
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NFeifei Xu <xufeifei@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

36b3dc05

Btrfs: self-tests: Use macros instead of constants and add missing newline · ef9f2db3

由 Feifei Xu 提交于 6月 01, 2016

This commit replaces numerical constants with appropriate
preprocessor macros.
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NFeifei Xu <xufeifei@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ef9f2db3

Btrfs: self-tests: Support testing all possible sectorsizes and nodesizes · d94f43b4

由 Feifei Xu 提交于 6月 01, 2016

To test all possible sectorsizes, this commit adds a sectorsize
array. This commit executes the tests for all possible sectorsizes and
nodesizes.
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NFeifei Xu <xufeifei@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

d94f43b4

Btrfs: self-tests: Execute page straddling test only when nodesize < PAGE_SIZE · ed9e4afd

由 Feifei Xu 提交于 6月 01, 2016

On ppc64, PAGE_SIZE is 64k which is same as BTRFS_MAX_METADATA_BLOCKSIZE.
In such a scenario, we will never be able to have an extent buffer
containing more than one page. Hence in such cases this commit does not
execute the page straddling tests.
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NFeifei Xu <xufeifei@linux.vnet.ibm.com>
Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ed9e4afd

btrfs: advertise which crc32c implementation is being used at module load · 5f9e1059

由 Jeff Mahoney 提交于 9月 16, 2015

Since several architectures support hardware-accelerated crc32c
calculation, it would be nice to confirm that btrfs is actually using it.

We can see an elevated use count for the module, but it doesn't actually
show who the users are.  This patch simply prints the name of the driver
after successfully initializing the shash.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
[ added a helper and used in module load-time message ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

5f9e1059

Btrfs: add validadtion checks for chunk loading · e06cd3dd

由 Liu Bo 提交于 6月 03, 2016

To prevent fuzzed filesystem images from panic the whole system,
we need various validation checks to refuse to mount such an image
if btrfs finds any invalid value during loading chunks, including
both sys_array and regular chunks.

Note that these checks may not be sufficient to cover all corner cases,
feel free to add more checks.
Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
Reported-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e06cd3dd

Btrfs: add more validation checks for superblock · 99e3ecfc

由 Liu Bo 提交于 6月 03, 2016

This adds validation checks for super_total_bytes, super_bytes_used and
super_stripesize, super_num_devices.
Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
Reported-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

99e3ecfc

Btrfs: clear uptodate flags of pages in sys_array eb · d865177a

由 Liu Bo 提交于 6月 03, 2016

We set uptodate flag to pages in the temporary sys_array eb,
but do not clear the flag after free eb.  As the special
btree inode may still hold a reference on those pages, the
uptodate flag can remain alive in them.

If btrfs_super_chunk_root has been intentionally changed to the
offset of this sys_array eb, reading chunk_root will read content
of sys_array and it will skip our beautiful checks in
btree_readpage_end_io_hook() because of
"pages of eb are uptodate => eb is uptodate"

This adds the 'clear uptodate' part to force it to read from disk.
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

d865177a

04 6月, 2016 1 次提交

Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent · 8dff9c85

由 Chris Mason 提交于 9月 19, 2015

When dealing with inline extents, btrfs_get_extent will incorrectly try
to insert a duplicate extent_map.  The dup hits -EEXIST from
add_extent_map, but then we try to merge with the existing one and end
up trying to insert a zero length extent_map.

This actually works most of the time, except when there are extent maps
past the end of the inline extent.  rocksdb will trigger this sometimes
because it preallocates an extent and then truncates down.

Josef made a script to trigger with xfs_io:

	#!/bin/bash

	xfs_io -f -c "pwrite 0 1000" inline
	xfs_io -c "falloc -k 4k 1M" inline
	xfs_io -c "pread 0 1000" -c "fadvise -d 0 1000" -c "pread 0 1000" inline
	xfs_io -c "fadvise -d 0 1000" inline
	cat inline

You'll get EIOs trying to read inline after this because add_extent_map
is returning EEXIST
Signed-off-by: NChris Mason <clm@fb.com>

8dff9c85

03 6月, 2016 3 次提交

Btrfs: self-tests: Support non-4k page size · b9ef22de

由 Feifei Xu 提交于 6月 01, 2016

self-tests code assumes 4k as the sectorsize and nodesize. This commit
fix hardcoded 4K. Enables the self-tests code to be executed on non-4k
page sized systems (e.g. ppc64).
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NFeifei Xu <xufeifei@linux.vnet.ibm.com>
Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

b9ef22de

Btrfs: Fix integer overflow when calculating bytes_per_bitmap · 0ef6447a

由 Feifei Xu 提交于 6月 01, 2016

On ppc64, bytes_per_bitmap will be (65536*8*65536). Hence append UL to
fix integer overflow.
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NFeifei Xu <xufeifei@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

0ef6447a

Btrfs: test_check_exists: Fix infinite loop when searching for free space entries · 5473e0c4

由 Feifei Xu 提交于 6月 01, 2016

On a ppc64 machine using 64K as the block size, assume that the RB
tree at btrfs_free_space_ctl->free_space_offset contains following
two entries:

1. A bitmap entry having an offset value of 0 and having the bits
   corresponding to the address range [128M+512K, 128M+768K] set.
2. An extent entry corresponding to the address range
   [128M-256K, 128M-128K]

In such a scenario, test_check_exists() invoked for checking the
existence of address range [128M+768K, 256M] can lead to an
infinite loop as explained below:

- Checking for the extent entry fails.
- Checking for a bitmap entry results in the free space info in
  range [128M+512K, 128M+768K] beng returned.
- rb_prev(info) returns NULL because the bitmap entry starting from
  offset 0 comes first in the RB tree.
- current_node = bitmap node.
- while (current_node)
	tmp = rb_next(bitmap_node);/*tmp is extent based free space entry*/
	Since extent based free space entry's last address is smaller
	than the address being searched for (i.e. 128M+768K) we
	incorrectly again obtain the extent node as the "next right node"
	of the RB tree and thus end up looping infinitely.

This patch fixes the issue by checking the "tmp" variable which point
to the most recently searched free space node.
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NFeifei Xu <xufeifei@linux.vnet.ibm.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

5473e0c4