Commits · b7831b20f32019b741eb8fe3435c2516e13e0c4a · openanolis / cloud-kernel

18 Sep, 2014 · 40 commits

Btrfs: show real function name in btrfs workqueue tracepoint · b7831b20

Committed by Liu Bo on Aug 15, 2014

Use %pf instead of %p, the same as the kernel workqueue tracepoints.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: shrink further sizeof(struct extent_buffer) · 2a39e598

Committed by Filipe Manana on Aug 14, 2014

The map_start and map_len fields aren't used anywhere, so just remove
them. On an x86_64 system, this reduced sizeof(struct extent_buffer)
from 296 bytes to 280 bytes, and therefore 14 extent_buffer structs can
now fit into a page instead of 13.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
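
The page-fit arithmetic in the message above can be checked with a short sketch. It assumes a 4096-byte page and ignores any allocator metadata or alignment padding; `objs_per_page` is a hypothetical helper for illustration, not a kernel function.

```c
#include <stddef.h>

/*
 * Hypothetical helper: how many objects of obj_size bytes fit in one page.
 * Allocator metadata and alignment are ignored (an assumption of this sketch).
 */
static size_t objs_per_page(size_t page_size, size_t obj_size)
{
	return page_size / obj_size;
}
```

With a 4096-byte page, 296-byte objects give 13 per page and 280-byte objects give 14, matching the figures quoted in the commit message.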

Btrfs: send, lower mem requirements for processing xattrs · 4395e0c4

Committed by Filipe Manana on Aug 20, 2014

Maximum xattr size can be up to nearly the leaf size. For an fs with a
leaf size larger than the page size, using kmalloc requires allocating
multiple pages that are contiguous, which might not be possible if
there's heavy memory fragmentation. Therefore fall back to vmalloc if
we fail to allocate with kmalloc. Also start with a smaller buffer size,
since xattr values are typically smaller than a page.
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: remove stale define after removing ordered operations · f87c4318

Committed by David Sterba on Aug 20, 2014

Last user removed in commit "btrfs: disable strict file flushes for
renames and truncates" (8d875f95).
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: improve free space cache management and space allocation · 20005523

Committed by Filipe Manana on Aug 29, 2014

While under random IO, a block group's free space cache eventually reaches
a state where it has a mix of extent entries and bitmap entries representing
free space regions.

As free space regions are later returned to the cache, some of them are merged
with existing extent entries when they are contiguous with them. Others are
not merged: despite the existence of adjacent free space regions in the cache,
merging doesn't happen because those regions are represented in bitmap
entries. Even when new free space regions are merged with existing extent
entries (enlarging the free space range they represent), the enlarged region
may end up contiguous with some other region represented in a bitmap entry.

Both clustered and non-clustered space allocation work by iterating over our
extent and bitmap entries and skipping any that represent a region smaller
than the allocation request (giving preference to extent entries over
bitmap entries). When a contiguous free space region is represented
by 2 (or more) entries (a mix of extent and bitmap entries), we end up not
satisfying an allocation request whose size is larger than the size of any of
the entries but no larger than the sum of their sizes. This makes the caller
assume we're under an ENOSPC condition, or forces it to allocate multiple
smaller space regions (as we do for file data writes), which adds extra overhead
and more chances of causing fragmentation because the smaller regions end up
spread apart from each other (more likely under concurrency).

For example, if we have the following in the cache:

* extent entry representing free space range: [128Mb - 256Kb, 128Mb[

* bitmap entry covering the range [128Mb, 256Mb[, but only with the bits
representing the range [128Mb, 128Mb + 768Kb[ set - that is, only that
space in this 128Mb area is marked as free

An allocation request for 1Mb, starting at an offset not greater than 128Mb - 256Kb,
would fail before, despite the existence of such a contiguous free space area in the
cache. The caller could only allocate up to 768Kb of space at once and later another
256Kb (or vice-versa). In between each smaller allocation request, another task
working on a different file/inode might come in and take that space, preventing the
former task from getting a contiguous 1Mb region of free space.

Therefore this change implements the ability to move free space from bitmap
entries into existing and new free space regions represented with extent
entries. This is done when a space region is added to the cache.

A test that also explains the issue in detail was added to the sanity tests.

Some performance test results with compilebench on a 4-core machine, with
32Gb of RAM and using an HDD, follow.

Test: compilebench -D /mnt -i 30 -r 1000 --makej

Before this change:

intial create total runs 30 avg 69.02 MB/s (user 0.28s sys 0.57s)
compile total runs 30 avg 314.96 MB/s (user 0.12s sys 0.25s)
read compiled tree total runs 3 avg 27.14 MB/s (user 1.52s sys 0.90s)
delete compiled tree total runs 30 avg 3.14 seconds (user 0.15s sys 0.66s)

After this change:

intial create total runs 30 avg 68.37 MB/s (user 0.29s sys 0.55s)
compile total runs 30 avg 382.83 MB/s (user 0.12s sys 0.24s)
read compiled tree total runs 3 avg 27.82 MB/s (user 1.45s sys 0.97s)
delete compiled tree total runs 30 avg 3.18 seconds (user 0.17s sys 0.65s)
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
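
The idea of moving free space from a bitmap entry into an adjacent extent entry can be illustrated with a deliberately simplified model. This is a sketch, not the kernel implementation: the structures, the 4 KiB-per-bit granularity, and the names `mark_free` and `steal_after_extent` are all assumptions made for illustration, and only the case where the bitmap space follows the extent is handled.

```c
#include <stdbool.h>
#include <stdint.h>

#define UNIT_BYTES   4096u    /* bytes covered by one bitmap bit (assumed) */
#define BITMAP_BITS  32768u   /* 32768 bits * 4 KiB = 128 MiB per bitmap */

struct extent_entry {
	uint64_t offset;   /* start of the free range, in bytes */
	uint64_t bytes;    /* length of the free range, in bytes */
};

struct bitmap_entry {
	uint64_t offset;                 /* start of the area the bitmap covers */
	uint8_t  bits[BITMAP_BITS / 8];  /* one bit per unit; a set bit means free */
};

static bool bit_is_free(const struct bitmap_entry *b, uint64_t i)
{
	return (b->bits[i / 8] >> (i % 8)) & 1u;
}

/* Mark the unit-aligned range [start, start + len) as free in the bitmap. */
static void mark_free(struct bitmap_entry *b, uint64_t start, uint64_t len)
{
	for (uint64_t i = (start - b->offset) / UNIT_BYTES;
	     i < (start + len - b->offset) / UNIT_BYTES; i++)
		b->bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

/*
 * Move the run of free bits that begins exactly where the extent ends
 * out of the bitmap and into the extent. Returns the number of bytes moved.
 */
static uint64_t steal_after_extent(struct extent_entry *e, struct bitmap_entry *b)
{
	uint64_t end = e->offset + e->bytes;
	uint64_t moved = 0;

	if (end < b->offset || (end - b->offset) % UNIT_BYTES)
		return 0;
	for (uint64_t i = (end - b->offset) / UNIT_BYTES;
	     i < BITMAP_BITS && bit_is_free(b, i); i++) {
		b->bits[i / 8] &= (uint8_t)~(1u << (i % 8));
		moved += UNIT_BYTES;
	}
	e->bytes += moved;
	return moved;
}
```

Applied to the example in the message (a 256Kb extent ending at 128Mb, and a bitmap starting at 128Mb with 768Kb of bits set), the extent grows to a single 1Mb region, which a 1Mb allocation request can then use directly.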

btrfs: rename total_bytes to avoid confusion · 3c1dbdf5

Committed by Anand Jain on Aug 20, 2014

We are assigning number_devices to total_bytes,
which is confusing at first glance.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: fix typo in the log message · de4c296f

Committed by Anand Jain on Aug 13, 2014

There is no matching open parenthesis for the closing parenthesis.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: rw_devices shouldn't be incremented for seed fs in btrfs_rm_dev_replace_srcdev() · b2efedca

Committed by Anand Jain on Aug 13, 2014

Seed fs devices don't participate as rw_devices, so don't increment
rw_devices when the device being handled belongs to a seed fs.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: fix memory leak when there is no more seed device · 8bef8401

Committed by Anand Jain on Aug 13, 2014

When we replace all the seed devices in the system, there is
no point in keeping a btrfs_fs_devices without
any device.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: update sprout seed pointer when seed fs is relinquished · 94d5f0c2

Committed by Anand Jain on Aug 13, 2014

We are not updating the sprout fs seed pointer when all seed devices
are replaced. This patch checks whether all seed devices have been
replaced and then updates the sprout pointer accordingly.

The same reproducer as in the previous patch applies here.
Note that btrfs_close_devices checks whether a seed fs is
present and reports the error without this patch.

int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
{
::
                seed_devices = fs_devices->seed;
::
        while (seed_devices) {
                fs_devices = seed_devices;
                seed_devices = fs_devices->seed;
                __btrfs_close_devices(fs_devices);
                free_fs_devices(fs_devices);
        }
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: fix rw_devices mismatch after seed replace · 63dd86fa

Committed by Anand Jain on Aug 13, 2014

reproducer:
    mount /dev/sdb /btrfs
    btrfs dev add /dev/sdc /btrfs
    btrfs rep start -B /dev/sdb /dev/sdd /btrfs
    umount /btrfs

WARNING: CPU: 0 PID: 3882 at fs/btrfs/volumes.c:892 __btrfs_close_devices+0x1c8/0x200 [btrfs]()

which is

        WARN_ON(fs_devices->rw_devices);

The problem here is that we did not add one to rw_devices when
we replaced the seed device with a writable device.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: replace seed device followed by unmount causes kernel WARNING · 25e8e911

Committed by Anand Jain on Aug 20, 2014

reproducer:
mount /dev/sdb /btrfs
btrfs dev add /dev/sdc /btrfs
btrfs rep start -B /dev/sdb /dev/sdd /btrfs
umount /btrfs

WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
::

__btrfs_close_devices()
::
        WARN_ON(fs_devices->open_devices);

After the seed device has been replaced, the new target device
is no longer a seed device. So we need to update the device
counts in the fs_devices pointed to by the fs_info.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: preparatory to make btrfs_rm_dev_replace_srcdev() seed aware · d51908ce

Committed by Anand Jain on Aug 13, 2014

There is no logical change in this patch. It is just a preparatory patch,
so that the following changes are easier to reason about.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: Drop stray check of fixup_workers creation · 56094eec

Committed by Andrey Utkin on Aug 09, 2014

The issue was introduced in a79b7d4b, which added the allocation of
extent_workers, so this stray check is surely not
meant to be a check of something else.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=82021
Reported-by: Maks Naumov <maksqwe1@ukr.net>
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: make btrfs_search_forward return with nodes unlocked · f98de9b9

Committed by Filipe Manana on Aug 04, 2014

None of the users of btrfs_search_forward() need to have the path
nodes (level >= 1) read locked; only the leaf needs to be locked
while the caller processes it. Therefore make it return a path
with all nodes unlocked, except for the leaf.

This change is motivated by the observation that during a file
fsync we repeatedly call btrfs_search_forward() and process the
returned leaf while upper nodes of the returned path (level >= 1)
are read locked, which unnecessarily blocks other tasks that want
to write to the same fs/subvol btree.
Therefore, instead of modifying the fsync code to unlock all nodes
with level >= 1 immediately after calling btrfs_search_forward(),
change btrfs_search_forward() to do it, so that it benefits all
callers.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: sysfs label interface should check for read only FS · 79aec2b8

Committed by Anand Jain on Jul 30, 2014

Not sure how this escaped many eyes so far.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: code optimize: BTRFS_ATTR_RW could set the mode · 20ee0825

Committed by Anand Jain on Jul 30, 2014

BTRFS_ATTR_RW could set the mode and be in line with BTRFS_ATTR.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: code optimize: BTRFS_ATTR could handle the mode · 98b3d389

Committed by Anand Jain on Jul 30, 2014

All users of BTRFS_ATTR want the mode to be set to 0444, so just do
it in the define. Also fix a few spacing alignments.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: use BTRFS_ATTR instead of btrfs_no_store() · 3f4b57e0

Committed by Anand Jain on Jul 30, 2014

We have the BTRFS_ATTR define to create a sysfs RO file, so use that.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: avoid unnecessary switch of path locks to blocking mode · 160f4089

Committed by Filipe Manana on Jul 28, 2014

If we need to cow a node, increase the write lock level and retry the
tree search, there's no point in changing the node locks in our path
to blocking mode, as we only waste time and unnecessarily wake up other
tasks waiting on the spinning locks (just to block them again shortly
after) because we release our path before repeating the tree search.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: unlock nodes earlier when inserting items in a btree · 24cdc847

Committed by Filipe Manana on Jul 28, 2014

In ctree.c:setup_items_for_insert(), we can unlock all nodes in our
path before we process the leaf (shift items and data, adjust data
offsets, etc). This allows for better btree concurrency, as we're
often holding a write lock on at least the node at level 1.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: use IS_ALIGNED() for assertion in btrfs_lookup_csums_range() for simplicity · d1b00a47

Committed by Satoru Takeuchi on Jul 25, 2014

btrfs_lookup_csums_range() uses ALIGN() to check if "start"
and "end + 1" are aligned to "root->sectorsize". It's better to
replace these with IS_ALIGNED() for simplicity.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
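
The two styles of alignment check are equivalent for power-of-two alignments, which a small sketch can show. The `SKETCH_` macros below are simplified stand-ins written for this example, not the kernel's own definitions, and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros; `a` must be a power of two. */
#define SKETCH_ALIGN(x, a)       (((x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))
#define SKETCH_IS_ALIGNED(x, a)  (((x) & ((uint64_t)(a) - 1)) == 0)

/* Old style: round up and compare against the original value. */
static bool aligned_old_style(uint64_t x, uint64_t a)
{
	return SKETCH_ALIGN(x, a) == x;
}

/* New style: test the low bits directly. */
static bool aligned_new_style(uint64_t x, uint64_t a)
{
	return SKETCH_IS_ALIGNED(x, a);
}
```

Both functions agree for every input when the alignment is a power of two; the second form simply states the intent more directly.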

Btrfs: cleanup for btrfs workqueue tracepoints · 1a76e4ba

Committed by Liu Bo on Aug 12, 2014

Tracepoint trace_btrfs_normal_work_done has never had a user, so just remove it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: add work_struct information for workqueue tracepoint · b38a6258

Committed by Liu Bo on Aug 12, 2014

Kernel workqueue's tracepoints print the address of work_struct, while btrfs
workqueue's tracepoints print the address of btrfs_work.

We need a connection between these two; for example, when debugging, we usually
grep for an address in the trace output. So it'd be better to also print
work_struct in btrfs workqueue's tracepoints.

Please note that we can only add this to those tracepoints whose work is still
available in memory, because we need to reference the work.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: add trace for qgroup accounting · d3982100

Committed by Mark Fasheh on Jul 17, 2014

We want this to debug qgroup changes on live systems.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: cleanup unused latest_devid and latest_trans in fs_devices · 443f24fe

Committed by Miao Xie on Jul 24, 2014

The member variables - latest_devid and latest_trans - of the fs_devices structure
are set, but nothing uses them, so remove them.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: update the comment of total_bytes and disk_total_bytes of btrfs_device · 6ba40b61

Committed by Miao Xie on Jul 24, 2014

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: Fix the problem that the dirty flag of dev stats is cleared · addc3fa7

Committed by Miao Xie on Jul 24, 2014

An I/O error might happen while writing out the device stats, in which
case the device stats information and the dirty flag are updated at that
time. The current code didn't consider this case and just cleared the
dirty flag, so we would forget to write out the new device stats
information. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: make the device lock and its protected data in the same cacheline · d5ee37bc

Committed by Miao Xie on Jul 24, 2014

The lock in the btrfs_device structure was far away from the data it protects, so
the CPU had to load a cache line twice when we accessed them. Move them together.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix wrong generation check of super block on a seed device · 5f546063

Committed by Miao Xie on Jul 24, 2014

The super block generation of the seed devices is not the same as that of
the filesystem which sprouted from them, because we don't update the super
block on the seed devices when we change the new filesystem. So we
should not use the generation of the new filesystem to check the super
block generation on the seed devices. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix wrong fsid check of scrub · 17a9be2f

Committed by Miao Xie on Jul 24, 2014

All the metadata in the seed devices has the same fsid as the fsid
of the seed filesystem which is on the seed device, so we should check
them by the current filesystem. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: wake up transaction thread from SYNC_FS ioctl · 2fad4e83

Committed by David Sterba on Jul 23, 2014

The transaction thread may want to do more work, namely it pokes the
cleaner kthread that will start processing uncleaned subvols.

This can be triggered by the user via the 'btrfs fi sync' command; otherwise
there was a delay of up to 30 seconds before the cleaner started to clean
old snapshots.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix wrong max inline data size limit · c01a5c07

Committed by Wang Shilong on Jul 17, 2014

Inline data is stored starting at the offset of @disk_bytenr in
struct btrfs_file_extent_item, so subtracting the total
size of struct btrfs_file_extent_item is wrong. Fix it.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
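
The difference between subtracting the whole item size and subtracting only the header that precedes the inline data can be sketched with `offsetof`. The struct below is an assumed transcription of the on-disk layout made for this example (field names and order are not confirmed by the commit message), and the helper functions are hypothetical. The `packed` attribute is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration only; packed so there is no padding. */
struct sketch_file_extent_item {
	uint64_t generation;
	uint64_t ram_bytes;
	uint8_t  compression;
	uint8_t  encryption;
	uint16_t other_encoding;
	uint8_t  type;
	/* Inline data starts here, overlaying the fields below. */
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	uint64_t offset;
	uint64_t num_bytes;
} __attribute__((packed));

/* Bytes of header that precede inline data. */
static size_t inline_header_size(void)
{
	return offsetof(struct sketch_file_extent_item, disk_bytenr);
}

/* Old computation: subtract the whole struct, which is too pessimistic. */
static size_t max_inline_old(size_t item_space)
{
	return item_space - sizeof(struct sketch_file_extent_item);
}

/* New computation: subtract only the header before the inline data. */
static size_t max_inline_new(size_t item_space)
{
	return item_space - inline_header_size();
}
```

Under this assumed layout, the old computation undercounts the usable inline space by the size of the four trailing 64-bit fields.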

Btrfs: fix off-by-one in cow_file_range_inline() · 354877be

Committed by Wang Shilong on Jul 17, 2014

Btrfs could still inline file data if its size is the same as the
page size, so don't skip the max value here.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fall into nocompression codes quickly if possible · 7816030e

Committed by Wang Shilong on Jul 17, 2014

If the NOCOMPRESS flag is set, which means a bad compression ratio,
we can avoid calling cow_file_range_async() earlier in this case.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix wrong skipping compression for an inode · f79707b0

Committed by Wang Shilong on Jul 17, 2014

If a file's compression ratio is bad, we set the NOCOMPRESS
flag for it, and compression is skipped for that inode next time.

However, if we remount the fs with COMPRESS_FORCE, it should still try
to compress pages for that inode. This patch fixes the wrong
check for this problem.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix sparse warning · d447d0da

Committed by Fabian Frederick on Jul 15, 2014

Fix the following sparse warning:
fs/btrfs/send.c:518:51: warning: incorrect type in argument 2 (different address spaces)
fs/btrfs/send.c:518:51:    expected char const [noderef] <asn:1>*<noident>
fs/btrfs/send.c:518:51:    got char *

We can safely use (const char __user *) with set_fs(KERNEL_DS)

__force added to avoid sparse-all warning:
fs/btrfs/send.c:518:40: warning: cast adds address space to expression (<asn:1>)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Zach Brown <zab@zabbo.net>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: use BUG_ON · 14586651

Committed by HIMANGI SARAOGI on Jul 09, 2014

Use BUG_ON(x) rather than if(x) BUG();

The semantic patch that fixes this problem is as follows:

// <smpl>
@@ identifier x; @@
-if (x) BUG();
+BUG_ON(x);
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs compression: merge inflate and deflate z_streams · 78809913

Committed by Sergey Senozhatsky on Jul 07, 2014

`struct workspace' used for zlib compression contains two zlib
z_stream-s: `def_strm' used in zlib_compress_pages(), and `inf_strm'
used in zlib_decompress/zlib_decompress_biovec(). None of these
functions use `inf_strm' and `def_strm' simultaneously, meaning that
for every compress/decompress operation we need only one z_stream
(out of two available).

`inf_strm' and `def_strm' are different in size of ->workspace. For
inflate stream we vmalloc() zlib_inflate_workspacesize() bytes, for
deflate stream - zlib_deflate_workspacesize() bytes. On my system zlib
returns the following workspace sizes, correspondingly: 42312 and 268104
(+ guard pages).

Keep only one `z_stream' in `struct workspace' and use it for both
compression and decompression. Hence, instead of vmalloc() of two
z_stream->workspace-s, allocate only one of size:
	max(zlib_deflate_workspacesize(), zlib_inflate_workspacesize())
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
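
The sizing rule in the message above can be sketched as plain arithmetic. The two functions are hypothetical helpers that take the workspace sizes as parameters rather than calling zlib; the figures used afterwards are the ones the author reports for their system.

```c
#include <stddef.h>

/* Size of a single workspace large enough for either operation. */
static size_t shared_workspace_size(size_t deflate_size, size_t inflate_size)
{
	return deflate_size > inflate_size ? deflate_size : inflate_size;
}

/* Bytes no longer allocated compared with keeping two separate workspaces. */
static size_t bytes_saved(size_t deflate_size, size_t inflate_size)
{
	return deflate_size + inflate_size -
	       shared_workspace_size(deflate_size, inflate_size);
}
```

With the reported sizes (42312 bytes for inflate and 268104 bytes for deflate), one shared workspace of 268104 bytes replaces two allocations totalling 310416 bytes, saving 42312 bytes per workspace before counting guard pages.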

Btrfs: set error return value in btrfs_get_blocks_direct · 555e1286

Committed by Filipe Manana on Jul 07, 2014

We were returning 0 (success) because we weren't extracting the
error code from em (PTR_ERR(em)). Fix it.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
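
The class of bug described above, detecting an error pointer but never copying its code into the return value, can be reproduced in a userspace sketch. Everything here is an assumption for illustration: `err_ptr`, `ptr_err` and `is_err` are simplified re-implementations of the kernel's error-pointer convention, and `fake_get_extent` and the two `get_blocks_*` functions are hypothetical stand-ins, not the btrfs code.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long error) { return (void *)(intptr_t)error; }

/* Recover the errno value from an error pointer. */
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

/* True if the pointer lies in the top SKETCH_MAX_ERRNO addresses. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical lookup that either succeeds or fails with -EIO. */
static void *fake_get_extent(int fail)
{
	static int dummy;
	return fail ? err_ptr(-EIO) : (void *)&dummy;
}

/* Buggy shape: the error is detected, but ret is still 0 on exit. */
static int get_blocks_buggy(int fail)
{
	int ret = 0;
	void *em = fake_get_extent(fail);

	if (is_err(em))
		goto out;
	/* ... use em ... */
out:
	return ret;
}

/* Fixed shape: extract the error code before bailing out. */
static int get_blocks_fixed(int fail)
{
	int ret = 0;
	void *em = fake_get_extent(fail);

	if (is_err(em)) {
		ret = (int)ptr_err(em);
		goto out;
	}
	/* ... use em ... */
out:
	return ret;
}
```

In the buggy shape a failed lookup is reported to the caller as success, which is the behavior the commit message describes.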

openanolis / cloud-kernel · synced successfully more than 1 year ago
