Commits · b24baf6917a376420d535548e1f88744028bcf24 · openanolis / cloud-kernel

26 Jul, 2012 · 8 commits

Btrfs: uninit variable fixes in send/receive · b24baf69

Committed by Chris Mason on Jul 25, 2012

Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive · 31db9f7c

Committed by Alexander Block on Jul 25, 2012

This patch introduces the BTRFS_IOC_SEND ioctl that is
required for send. It allows btrfs-progs to implement
full and incremental sends. Patches for btrfs-progs will
follow.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>

Btrfs: add btrfs_compare_trees function · 7069830a

Committed by Alexander Block on Jun 05, 2012

This function is used to find the differences between
two trees. The tree compare skips whole subtrees if it
detects shared tree blocks and is thus pretty fast.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
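The following sketch shows why detecting shared blocks makes the compare fast. It uses a hypothetical binary tree in which two trees that share a block literally point to the same node. This is not btrfs_compare_trees() itself, which walks real btree nodes. Here, pointer equality stands in for detecting a shared tree block, and `struct node` and `compare_trees()` are illustrative names.

```c
#include <stddef.h>

/* Hypothetical toy tree: leaves carry a key, interior nodes have two
 * children. Two trees that share a subtree point at the very same node. */
struct node {
    int key;              /* meaningful only for leaves */
    struct node *left;
    struct node *right;
};

/* Count leaves reachable from n. */
static int count_leaves(const struct node *n)
{
    if (n == NULL)
        return 0;
    if (n->left == NULL && n->right == NULL)
        return 1;
    return count_leaves(n->left) + count_leaves(n->right);
}

/* Count differing leaf positions between two trees. If both sides reference
 * the same block, the whole subtree is identical and is skipped without
 * being visited. *visited counts the nodes actually examined. */
static int compare_trees(const struct node *a, const struct node *b, int *visited)
{
    if (a == b)                      /* shared block: skip entire subtree */
        return 0;
    (*visited)++;
    if (a == NULL)
        return count_leaves(b);
    if (b == NULL)
        return count_leaves(a);
    int a_leaf = (a->left == NULL && a->right == NULL);
    int b_leaf = (b->left == NULL && b->right == NULL);
    if (a_leaf && b_leaf)
        return a->key != b->key;
    if (a_leaf || b_leaf)            /* shape changed: count both sides */
        return count_leaves(a) + count_leaves(b);
    return compare_trees(a->left, b->left, visited) +
           compare_trees(a->right, b->right, visited);
}
```

In a snapshot, most of the tree is shared, so the walk only descends into the few paths that were copied on write.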

Btrfs: introduce subvol uuids and times · 8ea05e3a

Committed by Alexander Block on Jul 25, 2012

This patch introduces uuids for subvolumes. Each
subvolume has its own uuid. In case it was snapshotted,
it also contains parent_uuid. In case it was received,
it also contains received_uuid.

It also introduces subvolume ctime/otime/stime/rtime. The
first two are comparable to the times found in inodes. otime
is the origin/creation time and ctime is the change time.
stime/rtime are only valid on received subvolumes.
stime is the time of the subvolume when it was
sent. rtime is the time of the subvolume when it was
received.

In addition to the times, we have a transid for each
time. They are updated at the same place as the times.

btrfs receive uses stransid and rtransid to find out
if a received subvolume changed in the meantime.

If an older kernel mounts a filesystem with the
extended fields, all fields become invalid. The next
mount with a new kernel will detect this and reset the
fields.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>

Btrfs: make iref_to_path non static · 91cb916c

Committed by Alexander Block on Jun 03, 2012

Make iref_to_path non-static (needed in send) and rename
it to btrfs_iref_to_path.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

Btrfs: add a barrier before a waitqueue_active check · cd1cfc49

Committed by Chris Mason on Jul 25, 2012

We were missing wakeups on the delayed ref waitqueue due
to races on waitqueue_active.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: call the ordered free operation without any locks held · e9fbcb42

Committed by Chris Mason on Jul 25, 2012

Each ordered operation has a free callback, and this was called with the
worker spinlock held.  Josef made the free callback also call iput,
which we can't do with the spinlock held.

This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list.  We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Cc: stable@kernel.org
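The lock-dropping pattern described above can be sketched in user space as follows. In this illustration a pthread mutex stands in for the worker spinlock. `struct work`, `drain_list()`, `push_work()` and `count_and_free()` are hypothetical names, not btrfs code. Each item is unlinked while the lock is held, and its callback then runs unlocked.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical work item with a free callback that may sleep. */
struct work {
    struct work *next;
    void (*free_fn)(struct work *w);
};

/* Hypothetical worker: a list protected by a lock. */
struct worker {
    pthread_mutex_t lock;
    struct work *head;
};

static void push_work(struct worker *wk, struct work *w)
{
    pthread_mutex_lock(&wk->lock);
    w->next = wk->head;
    wk->head = w;
    pthread_mutex_unlock(&wk->lock);
}

/* Unlink each item under the lock, drop the lock around the callback,
 * then retake it before looking at the rest of the list. */
static void drain_list(struct worker *wk)
{
    pthread_mutex_lock(&wk->lock);
    while (wk->head != NULL) {
        struct work *w = wk->head;
        wk->head = w->next;              /* unlink while locked */
        pthread_mutex_unlock(&wk->lock);
        w->free_fn(w);                   /* callback runs without the lock */
        pthread_mutex_lock(&wk->lock);
    }
    pthread_mutex_unlock(&wk->lock);
}

/* Example callback: counts and frees the item. In the kernel case the
 * callback may also call iput, which is why it must not run locked. */
static int freed_count;
static void count_and_free(struct work *w)
{
    freed_count++;
    free(w);
}
```

Unlinking the item before unlocking means no other thread can see or free it while the callback runs.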

Btrfs: Check INCOMPAT flags on remount and add helper function · 2b0ce2c2

Committed by Mitch Harder on Jul 24, 2012

In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.

Also, use the new helper function when defragmenting with
explicit lzo compression and when setting the default subvolume.
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

25 Jul, 2012 · 2 commits

Btrfs: add helper for tree enumeration · e6793769

Committed by Arne Jansen on Sep 13, 2011

Often no exact match is wanted, only the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.
Signed-off-by: Arne Jansen <sensille@gmx.net>
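The following is a minimal sketch of such a helper over a sorted array of keys. It assumes the caller wants either an exact match or the nearest item on one side. The names `find_item()` and `lower_bound()` are hypothetical, and the btrfs helper operates on btree paths rather than arrays.

```c
#include <stddef.h>

/* Index of the first key >= key, or n if there is none. */
static size_t lower_bound(const unsigned long long *keys, size_t n,
                          unsigned long long key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Find key, or if absent the next higher (find_higher != 0) or next lower
 * item. Returns 0 and sets *idx on success, 1 if no such item exists. */
static int find_item(const unsigned long long *keys, size_t n,
                     unsigned long long key, int find_higher, size_t *idx)
{
    size_t i = lower_bound(keys, n, key);
    if (i < n && keys[i] == key) {   /* exact match */
        *idx = i;
        return 0;
    }
    if (find_higher) {
        if (i == n)
            return 1;                /* nothing above key */
        *idx = i;
        return 0;
    }
    if (i == 0)
        return 1;                    /* nothing below key */
    *idx = i - 1;
    return 0;
}
```

With this helper, callers handle both corner cases (running off either end of the tree) through a single return code instead of duplicating the boundary logic.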

btrfs: allow cross-subvolume file clone · 362a20c5

Committed by David Sterba on Aug 01, 2011

Lift the EXDEV condition and allow different root trees for files being
cloned, then pass the source inode's root when searching for extents.
Cloning is not allowed to cross vfsmounts, i.e. when two subvolumes from
one filesystem are mounted separately.
Signed-off-by: David Sterba <dsterba@suse.cz>

24 Jul, 2012 · 30 commits

Btrfs: improve multi-thread buffer read · 67c9684f

Committed by Liu Bo on Jul 20, 2012

While testing with my buffer read fio jobs[1], I find that btrfs does not
perform well enough.

Here is a scenario in fio jobs:

We have 4 threads, "t1 t2 t3 t4", starting to buffer read the same file,
and all of them will race on add_to_page_cache_lru(), and if one thread
successfully puts its page into the page cache, it takes the responsibility
to read the page's data.

And what's more, reading a page needs a period of time to finish, in which
other threads can slide in and process the remaining pages:

     t1          t2          t3          t4
   add Page1
   read Page1  add Page2
     |         read Page2  add Page3
     |            |        read Page3  add Page4
     |            |           |        read Page4
-----|------------|-----------|-----------|--------
     v            v           v           v
    bio          bio         bio         bio

Now we have four bios, each of which holds only one page since we need to
maintain consecutive pages in bio.  Thus, we can end up with far more bios
than we need.

Here we're going to
a) delay the real read-page section and
b) try to put more pages into page cache.

With that said, we can make each bio hold more pages and reduce the number
of bios we need.

Here are some numbers taken from fio results:
         w/o patch                 w patch
       -------------  --------  ---------------
READ:    745MB/s        +25%       934MB/s

[1]:
[global]
group_reporting
thread
numjobs=4
bs=32k
rw=read
ioengine=sync
directory=/mnt/btrfs/

[READ]
filename=foobar
size=2000M
invalidate=1
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
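The effect on the bio count can be modelled with a small sketch. It assumes that one bio can hold any run of consecutive page indices. `count_bios()` is a hypothetical helper, not part of btrfs. It only shows why submitting pages one at a time needs four bios where a batched submission of the same pages needs one.

```c
#include <stddef.h>

/* Count bios needed for one submission batch, assuming a bio can hold any
 * run of consecutive page indices. pages[] is in submission order. */
static size_t count_bios(const unsigned long *pages, size_t n)
{
    size_t bios = 0;
    for (size_t i = 0; i < n; i++) {
        /* start a new bio unless this page directly follows the previous one */
        if (i == 0 || pages[i] != pages[i - 1] + 1)
            bios++;
    }
    return bios;
}
```

For example, Page1..Page4 submitted as one batch form a single run. In the race shown in the diagram, each thread submits its page as a separate single-page batch, which results in four bios.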

Btrfs: make btrfs's allocation smoothly with preallocation · df57dbe6

Committed by Liu Bo on Jul 23, 2012

For backref walking, we've introduced delayed ref sequence numbers.  However,
this changes our preallocation behavior.

The story is that when we preallocate an extent and then mark it written
piece by piece, the ideal case is that we don't need to COW the
extent, which is why we use 'preallocate'.

But we may not make use of preallocation, since when we check for cross refs on
the extent, we may find two ref entries which have the same content except for
the sequence value; we recognize them as cross refs and do COW to allocate
another extent.

So we end up with several pieces of space instead of a whole extent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
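The commit message does not spell out the fix, so the following only illustrates the idea. It uses a hypothetical, simplified ref entry in which two entries that differ only in `seq` count as one reference, so they no longer look like a cross ref. `struct ref_entry`, `same_ref()` and `count_distinct_refs()` are illustrative names.

```c
#include <stdint.h>

/* Hypothetical, simplified delayed-ref entry. */
struct ref_entry {
    uint64_t root;
    uint64_t objectid;
    uint64_t offset;
    uint64_t seq;   /* bookkeeping for backref walking only */
};

/* Two entries describe the same reference if everything but seq matches. */
static int same_ref(const struct ref_entry *a, const struct ref_entry *b)
{
    return a->root == b->root && a->objectid == b->objectid &&
           a->offset == b->offset;
}

/* Count distinct references, ignoring seq. More than one distinct
 * reference would mean the extent really is shared. */
static int count_distinct_refs(const struct ref_entry *refs, int n)
{
    int distinct = 0;
    for (int i = 0; i < n; i++) {
        int dup = 0;
        for (int j = 0; j < i; j++) {
            if (same_ref(&refs[i], &refs[j])) {
                dup = 1;
                break;
            }
        }
        if (!dup)
            distinct++;
    }
    return distinct;
}
```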

Btrfs: lock the transition from dirty to writeback for an eb · 51561ffe

Committed by Josef Bacik on Jul 20, 2012

There is a small window where an eb can have no IO bits set on it, which
could potentially result in extent_buffer_under_io() returning false when we
want it to return true, which could result in not fun things happening. So
in order to protect this case we need to hold the refs_lock when we make
this transition to make sure we get reliable results out of
extent_buffer_under_io(). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix potential race in extent buffer freeing · 594831c4

Committed by Josef Bacik on Jul 20, 2012

This sounds sort of impossible but it is the only thing I can think of and
at the very least it is theoretically possible so here it goes.

If we are in try_release_extent_buffer we will check that the ref count on
the extent buffer is 1 and not under IO, and then go down and clear the tree
ref. If between this check and clearing the tree ref somebody else comes in
and grabs a ref on the eb and then marks it dirty before
try_release_extent_buffer() does its tree ref clear, we can end up with a
dirty eb that will be freed while it is still dirty, which will result in a
panic. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: don't return true in releasepage unless we actually freed the eb · e64860aa

Committed by Josef Bacik on Jul 20, 2012

I noticed while looking at an extent_buffer race that we will
unconditionally return 1 if we get down to release_extent_buffer after
clearing the tree ref.  However we can easily race in here and get a ref on
the eb and not actually free the eb.  So make release_extent_buffer return 1
if it freed the eb and 0 if not, so we can be a little kinder to the vm.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
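The following single-threaded sketch shows the new return convention using a hypothetical refcounted `struct buf`: the release function reports 1 only when it actually freed the object. In btrfs the refcount is manipulated under `refs_lock`, and that locking is omitted here. `release_buf()` is an illustrative name, not the kernel function.

```c
#include <stdlib.h>

/* Hypothetical reference-counted buffer. */
struct buf {
    int refs;
};

/* Drop one reference. Returns 1 only if this call actually freed the
 * buffer, and 0 if someone else still holds a reference. */
static int release_buf(struct buf *b)
{
    if (--b->refs > 0)
        return 0;       /* another user still holds it: not freed */
    free(b);
    return 1;
}
```

A caller such as releasepage can then pass the result straight through to the VM instead of claiming success unconditionally.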

Btrfs: suppress printk() if all device I/O stats are zero · a98cdb85

Committed by Stefan Behrens on Jul 17, 2012

Code is added to suppress the I/O stats printing at mount time if all
statistic values are zero.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

Btrfs: remove unwanted printk() for btrfs device I/O stats · 5021976d

Committed by Stefan Behrens on Jul 17, 2012

People complained about the annoying kernel log message
"btrfs: no dev_stats entry found ... (OK on first mount after mkfs)"
every time a filesystem is mounted for the first time after running
mkfs. Since the distribution of btrfs-progs is not synchronized
with the kernel version, mkfs as it is now will also be used in the
future. So this message is not useful for finding errors; it is just
annoying. This commit removes the printk().
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

Btrfs: rewrite BTRFS_SETGET_FUNCS · 18077bb4

Committed by Li Zefan on Jul 09, 2012

The BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and
btrfs_foo() functions, which read and write specific fields in the
extent buffer.

The total number of set/get functions is ~200, but in fact we only
need 8 functions: 2 for u8 fields, 2 for u16, 2 for u32 and 2 for u64.

This results in a reduction of ~37K bytes (the dec column drops from 642366
to 605342, i.e. by 37024 bytes):

   text    data     bss     dec     hex filename
 629661   12489     216  642366   9cd3e fs/btrfs/btrfs.o.orig
 592637   12489     216  605342   93c9e fs/btrfs/btrfs.o
Signed-off-by: Li Zefan <lizefan@huawei.com>
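The idea can be sketched as a few shared, width-specific helpers plus a macro that turns each field accessor into a thin wrapper. Everything below is hypothetical: the `disk_item` layout, the `item_*` names, and the use of a plain byte buffer instead of an extent buffer. The sketch assumes little-endian on-disk fields, as btrfs uses, and shows only the u16 and u64 pairs. The u8 and u32 pairs would follow the same pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared helpers: read/write an n-byte little-endian integer at buf+off. */
static uint64_t get_le(const uint8_t *buf, size_t off, int bytes)
{
    uint64_t v = 0;
    for (int i = bytes - 1; i >= 0; i--)
        v = (v << 8) | buf[off + i];
    return v;
}

static void set_le(uint8_t *buf, size_t off, int bytes, uint64_t v)
{
    for (int i = 0; i < bytes; i++) {
        buf[off + i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* One get/set pair per integer width. */
static uint16_t get_16(const uint8_t *b, size_t off) { return (uint16_t)get_le(b, off, 2); }
static void set_16(uint8_t *b, size_t off, uint16_t v) { set_le(b, off, 2, v); }
static uint64_t get_64(const uint8_t *b, size_t off) { return get_le(b, off, 8); }
static void set_64(uint8_t *b, size_t off, uint64_t v) { set_le(b, off, 8, v); }

/* Hypothetical on-disk item; only its field offsets are used. */
struct disk_item {
    uint64_t size;
    uint16_t mode;
} __attribute__((packed));

/* Per-field accessors become thin wrappers around the shared helpers. */
#define SETGET_FUNCS(name, type, member, bits)                          \
    static type item_##name(const uint8_t *b)                           \
    { return (type)get_##bits(b, offsetof(struct disk_item, member)); } \
    static void item_set_##name(uint8_t *b, type v)                     \
    { set_##bits(b, offsetof(struct disk_item, member), v); }

SETGET_FUNCS(size, uint64_t, size, 64)
SETGET_FUNCS(mode, uint16_t, mode, 16)
```

Because each generated accessor now only computes an offset and calls a shared helper, most of the per-field code disappears. The text-size table above reflects this effect.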

Btrfs: zero unused bytes in inode item · 293f7e07

Committed by Li Zefan on Jul 10, 2012

The otime field is not zeroed, so users will see a random otime in an old
filesystem with a new kernel which has otime support in the future.

The reserved bytes are also not zeroed, and we'll have a compatibility
issue if we make use of those bytes.
Signed-off-by: Li Zefan <lizefan@huawei.com>

Btrfs: kill free_space pointer from inode structure · b4d7c3c9

Committed by Li Zefan on Jul 09, 2012

Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
which means every inode has the same BTRFS_I(inode)->free_space pointer.

This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64-bit).
Signed-off-by: Li Zefan <lizefan@huawei.com>

btrfs read error corrected message floods the console during recovery · d5b025d5

Committed by Anand Jain on Jul 02, 2012

Changing printk_in_rcu to printk_ratelimited_in_rcu will suffice.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
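To illustrate what rate limiting buys here, the following is a minimal interval/burst limiter in the spirit of the kernel's ratelimit state. The struct and `ratelimit_ok()` are hypothetical, and time is passed in explicitly so the behaviour is easy to follow. This is not the kernel implementation or its default parameters.

```c
/* Hypothetical limiter: allow at most `burst` messages per `interval`
 * time units and count the rest as suppressed. */
struct ratelimit {
    long interval;
    int burst;
    long begin;     /* start of the current window */
    int printed;    /* messages allowed in the current window */
    int missed;     /* messages suppressed */
};

/* Returns 1 if the caller may print at time `now`, 0 if suppressed. */
static int ratelimit_ok(struct ratelimit *rs, long now)
{
    if (now - rs->begin >= rs->interval) {  /* start a new window */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return 1;
    }
    rs->missed++;
    return 0;
}
```

During a recovery that corrects thousands of blocks, only `burst` messages per window reach the console.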

Btrfs: fix buffer leak in btrfs_next_old_leaf · e6466e35

Committed by Jan Schmidt on Jul 04, 2012

When calling btrfs_next_old_leaf, we were leaking an extent buffer in the
rare case of using the deadlock avoidance code needed for the tree mod log.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: do not count in readonly bytes · f6175efa

Committed by Liu Bo on Jul 06, 2012

If a block group is ro, do not count its entries when we dump space info.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
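The following minimal sketch excludes read-only block groups from a space summary. The `block_group` struct and `usable_free_bytes()` are hypothetical simplifications. They are not the fields or functions that dump_space_info() actually uses.

```c
#include <stdint.h>

/* Hypothetical, simplified block group. */
struct block_group {
    uint64_t size;
    uint64_t used;
    int ro;         /* read-only, e.g. while being relocated */
};

/* Sum free bytes, skipping read-only groups, whose space cannot be
 * allocated from and would otherwise inflate the reported figure. */
static uint64_t usable_free_bytes(const struct block_group *bgs, int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++) {
        if (bgs[i].ro)
            continue;
        total += bgs[i].size - bgs[i].used;
    }
    return total;
}
```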

Btrfs: add ro notification to dump_space_info · 799ffc3c

Committed by Liu Bo on Jul 06, 2012

Block groups have a ro attribute; make dump_space_info show it.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix a bug of writting free space cache during balance · cf7c1ef6

Committed by Liu Bo on Jul 06, 2012

Here is the whole story:
1)
A free space cache consists of two parts:
o  free space cache inode, which is special because it's stored in root tree.
o  free space info, which is stored as the above inode's file data.

But we only build up another new inode and do not flush its free space info
onto disk when we _clear and setup_ free space cache, and this ends up with
the block group cache's cache_state remaining DC_SETUP instead of DC_WRITTEN.

And holding DC_SETUP means that we will not truncate this free space cache inode,
which means the disk offset of its file extent will remain _unchanged_ at least
until the next transaction finishes committing itself.

2)
We can set a block group readonly when we relocate the block group.

However, if the readonly block group covers the disk offset where our free
space cache inode is going to write, it will force the free space cache inode
into cow_file_range() and it'll end up hitting a BUG_ON.

3)
Due to the above analysis, we fix this bug by adding the missing dirty flag.

4)
However, it's not over: there is still another case, nospace_cache.

With nospace_cache, we do not want to set the dirty flag; instead we just
truncate the free space cache inode and bail out, setting the cache state to
DC_WRITTEN.

We can benefit from this since it saves us another 'pre-allocation' step,
which usually costs a lot.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: do not abort transaction in prealloc case · 06789384

Committed by Liu Bo on Jul 06, 2012

During disk balance, we prealloc new file extents for file data relocation,
but we may fail in the 'no available space' case, and this flips btrfs
into readonly mode.

It is not necessary to bail out and abort the transaction since we do have
several ways to rescue ourselves from the ENOSPC case.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: kill root from btrfs_is_free_space_inode · 83eea1f1

Committed by Liu Bo on Jul 10, 2012

Since root can be fetched via the BTRFS_I macro directly, we can drop an
argument from btrfs_is_free_space_inode().
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix btrfs_is_free_space_inode to recognize btree inode · 51a8cf9d

Committed by Liu Bo on Jul 10, 2012

For the btree inode, its root is also the 'tree root', so the btree inode can
be mistaken for a free space inode.

We should add one more check for the btree inode.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
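Together, these two commits amount to a check along the lines of the following sketch. The structs are hypothetical simplifications, and the btree inode number 1 is used for illustration (it matches BTRFS_BTREE_INODE_OBJECTID). The root is reached through the inode, so there is no separate root argument, and an inode in the tree root counts as a free space inode unless it is the btree inode itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTREE_INODE_OBJECTID 1ULL  /* btree inode number, assumed here */

/* Hypothetical, simplified root and inode. */
struct root {
    int is_tree_root;
};

struct inode_info {
    const struct root *root;  /* root is reachable from the inode itself */
    uint64_t ino;
};

/* No root parameter needed; the btree inode is explicitly excluded. */
static bool is_free_space_inode(const struct inode_info *inode)
{
    return inode->root->is_tree_root && inode->ino != BTREE_INODE_OBJECTID;
}
```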

Btrfs: avoid I/O repair BUG() from btree_read_extent_buffer_pages() · c0901581

Committed by Stefan Behrens on Jul 10, 2012

From btree_read_extent_buffer_pages(), currently repair_io_failure()
can be called with mirror_num being zero when submit_one_bio() has returned
an error before. This used to cause a BUG_ON(!mirror_num) in
repair_io_failure(), and indeed this is not a case that needs the I/O
repair code to rewrite disk blocks.
This commit prevents calling repair_io_failure() in this case and thus
avoids the BUG_ON() and malfunction.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: rework shrink_delalloc · f4c738c2

Committed by Josef Bacik on Jul 02, 2012

So shrink_delalloc has grown all sorts of cruft over the years thanks to
many reworkings of how we track enospc. What happens now as we fill up the
disk is we will loop for freaking ever hoping to reclaim an arbitrary amount
of metadata space; this was from when everybody flushed at the same time.
Now we only have people flushing one at a time. So instead of trying to
reclaim a huge amount of space, just try to flush a decent chunk of space,
and stop looping as soon as we have enough free space to satisfy our
reservation. This makes xfstests 224 go much faster. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
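The reworked policy flushes a bounded chunk and stops once the reservation fits. It can be sketched as a loop over a hypothetical `struct space`. The names and the simple byte-accounting model are assumptions for illustration, not the real shrink_delalloc().

```c
#include <stdint.h>

/* Hypothetical space accounting. */
struct space {
    uint64_t free;      /* currently free metadata bytes */
    uint64_t delalloc;  /* bytes that flushing could reclaim */
};

/* Flush in bounded chunks and stop as soon as `want` fits, instead of
 * chasing a large fixed target. Returns the number of flush rounds, or -1
 * if everything was flushed and the reservation still does not fit. */
static int flush_until_fits(struct space *s, uint64_t want, uint64_t chunk)
{
    int rounds = 0;
    while (s->free < want) {
        if (s->delalloc == 0)
            return -1;                  /* nothing left to flush */
        uint64_t n = s->delalloc < chunk ? s->delalloc : chunk;
        s->delalloc -= n;               /* model writing back n bytes */
        s->free += n;
        rounds++;
    }
    return rounds;
}
```

The early exit is the key change. A caller needing 350 bytes with 100 free and 100-byte chunks stops after three rounds rather than flushing all outstanding delalloc.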

Btrfs: do not set subvolume flags in readonly mode · b9ca0664

Committed by Liu Bo on Jun 29, 2012

$ mkfs.btrfs /dev/sdb7
$ btrfstune -S1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs
mount: block device /dev/sdb7 is write-protected, mounting read-only
$ btrfs dev add /dev/sdb8 /mnt/btrfs/

Now we get a btrfs in which the mnt flags have readonly set but the sb flags
do not.  So ioctls that only check the sb flags for MS_RDONLY are
going to be a problem.
Setting subvolume flags is such an ioctl; we should use mnt_want_write_file()
to check the RO flags.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

Btrfs: use mnt_want_write_file instead of mnt_want_write · e54bfa31

Committed by Liu Bo on Jun 29, 2012

mnt_want_write_file is faster when the file has already been opened for write.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

Btrfs: remove redundant r/o check for superblock · 768e9dfe

Committed by Liu Bo on Jun 29, 2012

mnt_want_write() and mnt_want_write_file() will check sb->s_flags with
MS_RDONLY, so we don't need to do it ourselves.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

Btrfs: check write access to mount earlier while creating snapshots · a874a63e

Committed by Liu Bo on Jun 29, 2012

Move the check of write access to the mount into upper functions so that we
can use mnt_want_write_file instead, which is faster than mnt_want_write.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

Btrfs: fix typo in cow_file_range_async and async_cow_submit · 287082b0

Committed by Liu Bo on Jun 28, 2012

It should be 10 * 1024 * 1024.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

Btrfs: change how we indicate we're adding csums · 0e721106

Committed by Josef Bacik on Jun 26, 2012

There is weird logic I had to put in place to make sure that when we were
adding csums we'd use the delalloc block rsv instead of the global
block rsv. Part of this meant that we had to free up our transaction
reservation before we ran the delayed refs since csum deletion happens
during the delayed ref work. The problem with this is that when we release
a reservation we will add it to the global reserve if it is not full in
order to keep us going along longer before we have to force a transaction
commit. By releasing our reservation before we run delayed refs we don't
get the opportunity to drain down the global reserve for the work we did, so
we won't refill it as often. This isn't a problem per se; it just results
in us possibly committing transactions more and more often, and in rare
cases could cause those WARN_ON()s to pop in use_block_rsv because we ran
out of space in our block rsv.

This also helps us by holding onto space while the delayed refs run so we
don't end up with as many people trying to do things at the same time, which
again will help us not force commits or hit the use_block_rsv warnings.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: return error of btrfs_update_inode() to caller · b9959295

Committed by Tsutomu Itoh on Jun 25, 2012

We didn't check the error returned by btrfs_update_inode(), but that error
looks easy to bubble back up.
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix error handling in __add_reloc_root() · 23291a04

Committed by Dan Carpenter on Jun 25, 2012

We dereferenced "node" in the error message after freeing it.  Also,
btrfs_panic() can return, so we should return an error code instead of
continuing.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
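The corrected ordering can be sketched with a hypothetical node type and insert function. The node is used in the error message first, freed afterwards, and the error is returned to the caller. `struct reloc_node`, `insert_node()` and `add_node()` are illustrative names, not the btrfs functions.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type. */
struct reloc_node {
    unsigned long long bytenr;
};

/* Hypothetical insert that fails if `existing` already uses this bytenr. */
static int insert_node(const struct reloc_node *node, unsigned long long existing)
{
    return node->bytenr == existing ? -EEXIST : 0;
}

static int add_node(unsigned long long bytenr, unsigned long long existing,
                    struct reloc_node **out)
{
    struct reloc_node *node = malloc(sizeof(*node));
    if (node == NULL)
        return -ENOMEM;
    node->bytenr = bytenr;

    int ret = insert_node(node, existing);
    if (ret) {
        /* Use node in the message while it is still valid... */
        fprintf(stderr, "duplicate reloc node at bytenr %llu\n", node->bytenr);
        free(node);     /* ...and only then free it. */
        return ret;     /* report the error instead of continuing */
    }
    *out = node;
    return 0;
}
```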

Btrfs: do not ignore errors from btrfs_cleanup_fs_roots() when mounting · 44c44af2

Committed by Ilya Dryomov on Jun 22, 2012

There used to be a BUG_ON(ret) there before the EH patch (79787eaa) went in.
Bail out with EINVAL.

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: do not return EINVAL instead of ENOMEM from open_ctree() · fed425c7

Committed by Ilya Dryomov on Jun 22, 2012

When bailing from open_ctree(), err is returned, not ret.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

openanolis / cloud-kernel · synced successfully over 1 year ago
