Commits · ae1e206b806ccc490dadff59af8a7a2477b32884 · openeuler / Kernel

29 Aug, 2012 19 commits

Btrfs: allow delayed refs to be merged · ae1e206b

Committed by Josef Bacik on Aug 07, 2012

Daniel Blueman reported a bug with fio+balance on a ramdisk setup.
Basically what happens is the balance relocates a tree block which will drop
the implicit refs for all of its children and adds a full backref. Once the
block is relocated we have to add the implicit refs back, so when we cow the
block again we add the implicit refs for its children back. The problem
comes when the original drop ref doesn't get run before we add the implicit
refs back. The delayed ref stuff will specifically prefer ADD operations
over DROP to keep us from freeing up an extent that will have references to
it, so we try to add the implicit ref before it is actually removed and we
panic. This worked fine before because the add would have just canceled the
drop out and we would have been fine. But the backref walking work needs to
be able to freeze the delayed ref stuff in time so we have this ever
increasing sequence number that gets attached to all new delayed ref updates
which makes us not merge refs and we run into this issue.

So to fix this we need to merge delayed refs. So every time we run a
clustered ref we need to try and merge all of its delayed refs. The backref
walking stuff locks the delayed ref head before processing, so if we have it
locked we are safe to merge any refs inside of the sequence number. If
there is no sequence number we can merge all refs. Doing this not only
fixes our bug but keeps the delayed ref code from adding and removing
useless refs and batching together multiple refs into one search instead of
one search per delayed ref, which will really help our commit times. I ran
this with Daniel's test and 276 and I haven't seen any problems. Thanks,
Reported-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

ae1e206b
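The merging rule described in this commit can be sketched as a small model. This is not the kernel code: `struct delayed_ref`, `merge_refs()` and `hold_seq` are hypothetical names, every ref is assumed to describe the same backref of the same extent, and "inside of the sequence number" is assumed to mean "queued before the oldest sequence number a backref walker still holds".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action codes: an ADD counts +1 per ref_mod, a DROP -1. */
enum ref_action { REF_ADD = 1, REF_DROP = -1 };

/* Simplified delayed ref: all refs are assumed to target one backref. */
struct delayed_ref {
	uint64_t seq;    /* sequence number attached when the ref was queued */
	int action;      /* REF_ADD or REF_DROP */
	int ref_mod;     /* how many references this update adds or drops */
};

/*
 * Fold every mergeable ref into one net modification stored in *net.
 * hold_seq == 0 models "nobody holds a sequence number", so all refs merge.
 * Otherwise only refs queued before hold_seq (seq < hold_seq) are merged.
 * Returns the number of refs that had to stay unmerged.
 */
static size_t merge_refs(const struct delayed_ref *refs, size_t n,
			 uint64_t hold_seq, long *net)
{
	size_t left = 0;

	*net = 0;
	for (size_t i = 0; i < n; i++) {
		if (hold_seq && refs[i].seq >= hold_seq) {
			left++;         /* too new: a walker may still need it */
			continue;
		}
		*net += (long)refs[i].action * refs[i].ref_mod;
	}
	return left;
}
```

In this model a DROP followed by an ADD for the same backref nets to zero when nothing is held, which is the cancellation the commit message says used to happen before sequence numbers were introduced.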

Btrfs: fix enospc problems when deleting a subvol · 5a24e84c

Committed by Josef Bacik on Aug 08, 2012

Subvol delete is a special kind of awful where we use the global reserve to
cover the ENOSPC requirements. The problem is once we're done removing
everything we do a btrfs_update_inode(), which by default will try to do the
delayed update stuff which will use its own reserve. There will be no
space in this reserve and we'll return ENOSPC. So instead use
btrfs_update_inode_fallback() which will just fall back to updating the inode
item in the case of enospc. This is fine because the global reserve covers
the space requirements for this. With this patch I can now delete a subvol
on a problem image Dave Sterba sent me. Thanks,
Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

5a24e84c

Btrfs: fix wrong mtime and ctime when creating snapshots · c0f62ded

Committed by Miao Xie on Aug 08, 2012

When we created a new snapshot, the mtime and ctime of its parent directory
were not updated. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

c0f62ded

Btrfs: fix race in run_clustered_refs · 22cd2e7d

Committed by Arne Jansen on Aug 09, 2012

With commit

commit d1270cd9
Author: Arne Jansen <sensille@gmx.net>
Date:   Tue Sep 13 15:16:43 2011 +0200

     Btrfs: put back delayed refs that are too new

I added a window where the delayed_ref's head->ref_mod code can diverge
from the sum of the remaining refs, because we release the head->mutex
in the middle. This leads to btrfs_lookup_extent_info returning wrong
numbers. This patch fixes this by adjusting the head's ref_mod with each
delayed ref we run.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

22cd2e7d

Btrfs: don't run __tree_mod_log_free_eb on leaves · b12a3b1e

Committed by Chris Mason on Aug 07, 2012

When we split a leaf, we may end up inserting a new root on top of that
leaf.  The reflog code was incorrectly assuming the old root was always
a node.  This makes sure we skip over leaves.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

b12a3b1e

Btrfs: increase the size of the free space cache · 6fc823b1

Committed by Josef Bacik on Aug 06, 2012

Arne was complaining about the space cache having mismatching generation
numbers when debugging a deadlock. This is because we can run out of space
in our preallocated range for our space cache if you have a pretty
fragmented amount of space in your pinned space. So just increase the
amount of space we preallocate for space cache so we can be sure to have
enough space. This will only really affect data ranges since they're the only
chunks that end up larger than 256MB. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

6fc823b1

Btrfs: barrier before waitqueue_active · 66657b31

Committed by Josef Bacik on Aug 01, 2012

We need a barrier before calling waitqueue_active, otherwise we will miss
wakeups.  So in places that do atomic_dec(); then atomic_read() use
atomic_dec_return(), which implies a memory barrier (see memory-barriers.txt),
and then add an explicit memory barrier everywhere else that needs one.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

66657b31
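As a userspace analogy only, the "decrement and read in one ordered step" idea can be sketched with C11 atomics. This is not the kernel API: `struct pending` and `put_and_should_wake()` are hypothetical names, and the `has_waiter` flag merely stands in for a waitqueue_active() style check.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter of outstanding work plus a "someone is waiting" flag. */
struct pending {
	atomic_int count;
	atomic_bool has_waiter;
};

/*
 * Drop one unit of work and report whether a waiter should be woken.
 * atomic_fetch_sub() defaults to memory_order_seq_cst, so the decrement,
 * the value we act on, and the later flag load are ordered. A separate
 * decrement followed by an unordered read would not give that guarantee.
 */
static bool put_and_should_wake(struct pending *p)
{
	int now = atomic_fetch_sub(&p->count, 1) - 1; /* value after decrement */

	return now == 0 && atomic_load(&p->has_waiter);
}
```

The design point is that the caller acts on the value returned by the atomic operation itself rather than re-reading the counter afterwards.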

Btrfs: fix deadlock in wait_for_more_refs · 1fa11e26

Committed by Arne Jansen on Aug 06, 2012

Commit a168650c introduced a waiting mechanism to prevent busy waiting in
btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
where a tree_mod_seq is held while waiting for the io to complete, while
the end_io calls btrfs_run_delayed_refs.
This whole mechanism is unnecessary. If not enough runnable refs are
available to satisfy count, just return as count is more like a guideline
than a strict requirement.
In case we have to run all refs, commit transaction makes sure that no
other threads are working in the transaction anymore, so we just assert
here that no refs are blocked.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

1fa11e26

btrfs: fix second lock in btrfs_delete_delayed_items() · 62095265

Committed by Fengguang Wu on Aug 04, 2012

Fix a real bug caught by coccinelle.

fs/btrfs/delayed-inode.c:1013:1-11: second lock on line 1013
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>

62095265

Btrfs: don't allocate a seperate csums array for direct reads · c329861d

Committed by Josef Bacik on Aug 03, 2012

We've been allocating a big array for csums instead of storing them in the
io_tree like we do for buffered reads because previously we were locking the
entire range, so we didn't have an extent state for each sector of the
range. But now that we do the range locking as we map the buffers we can
limit the mapping length to sectorsize and use the private part of the
io_tree for our csums. This allows us to avoid an extra memory allocation
for direct reads which could incur latency. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

c329861d

Btrfs: do not strdup non existent strings · 99f5944b

Committed by Josef Bacik on Aug 02, 2012

When we close devices we add back empty devices for some reason that escapes
me.  In the case of a missing dev we don't allocate an rcu_string for its
name, so check to see if the device has a name and if it doesn't don't
bother strdup()'ing it.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

99f5944b

Btrfs: do not use missing devices when showing devname · aa9ddcd4

Committed by Josef Bacik on Aug 02, 2012

If you do the following

mkfs.btrfs /dev/sdb /dev/sdc
rmmod btrfs
dd if=/dev/zero of=/dev/sdb bs=1M count=1
mount -o degraded /dev/sdc /mnt/btrfs-test

the box will panic trying to deref the name for the missing dev since it is
the lower numbered devid.  So fix show_devname to not use missing devices.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

aa9ddcd4

Btrfs: fix that error value is changed by mistake · 3627bf45

Committed by Stefan Behrens on Aug 01, 2012

In iterate_inodes_from_logical() the error result from
extent_from_logical() is patched by mistake. Typically ENOENT is
patched to EINVAL because (-ENOENT & BTRFS_EXTENT_FLAG_TREE_BLOCK)
evaluates to true.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

3627bf45
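The masking problem can be reproduced in a few lines. The sketch below is illustrative only: `lookup_flags()`, `classify_buggy()` and `classify_fixed()` are hypothetical names, the flag is assumed to be bit 1 as in the kernel headers, and ENOENT is assumed to be 2 as on Linux.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to match BTRFS_EXTENT_FLAG_TREE_BLOCK (bit 1). */
#define EXTENT_FLAG_TREE_BLOCK (1ULL << 1)

/* Hypothetical lookup: returns flags (>= 0) on success or -errno. */
static int64_t lookup_flags(int found)
{
	return found ? 0 : -ENOENT;
}

/* Buggy shape: the flag bit is tested before the error check. */
static int classify_buggy(int64_t ret)
{
	if (ret & EXTENT_FLAG_TREE_BLOCK)   /* -2 has bit 1 set, so this hits */
		return -EINVAL;
	return ret < 0 ? (int)ret : 0;
}

/* Fixed shape: a negative value is an error and is passed through as-is. */
static int classify_fixed(int64_t ret)
{
	if (ret < 0)
		return (int)ret;
	if (ret & EXTENT_FLAG_TREE_BLOCK)
		return -EINVAL;
	return 0;
}
```

Because a negative errno has most of its high bits set in two's complement, any flag test on an unchecked return value is likely to match, so the sign check has to come first.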

Btrfs: lock extents as we map them in DIO · eb838e73

Committed by Josef Bacik on Jul 31, 2012

A deadlock in xfstests 113 was uncovered by commit

d187663e

This is because we would not return EIOCBQUEUED for short AIO reads, instead
we'd wait for the DIO to complete and then return the amount of data we
transferred, which would allow our stuff to unlock the remaining amount. But
with this change this no longer happens, so if we have a short AIO read (for
example if we try to read past EOF), we could leave the section from EOF to
the end of where we tried to read locked. Fixing this is tricky since there
is no clear way to know exactly how much data DIO truly submitted for IO, so
to make this less hard on ourselves and less cumbersome we need to lock the
extents as we try to map them, and then we unlock any areas we didn't
actually map. This makes us completely safe from deadlocks and reliance on
a particular behavior of the DIO code. This also lays the groundwork for
allowing us to use the normal csum storage method for reads which means we
can remove an allocation. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

eb838e73

Btrfs: fix some endian bugs handling the root times · dadd1105

Committed by Dan Carpenter on Jul 30, 2012

"trans->transid" is cpu endian but we want to store the data as little
endian.  "item->ctime.nsec" is only 32 bits, not 64.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

dadd1105
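What "store as little endian" means can be shown with portable helpers. This is only a sketch: `put_le64()`, `put_le32()` and `get_le64()` are hypothetical names, not the kernel's cpu_to_le64() family, and they write bytes explicitly so the result does not depend on host byte order.

```c
#include <stdint.h>

/* Store a 64-bit value least significant byte first on any host. */
static void put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Same for a 32-bit field; using the 64-bit helper here would overrun it. */
static void put_le32(uint8_t out[4], uint32_t v)
{
	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Read a little-endian 64-bit value back into host order. */
static uint64_t get_le64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```

The two halves of the commit map onto the two pitfalls: converting to the on-disk byte order before storing, and using the conversion that matches the width of the field.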

Btrfs: unlock on error in btrfs_delalloc_reserve_metadata() · 55e591ff

Committed by Dan Carpenter on Jul 30, 2012

We should release this mutex before returning the error code.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

55e591ff

Btrfs: checking for NULL instead of IS_ERR · 57a5a882

Committed by Dan Carpenter on Jul 30, 2012

add_qgroup_rb() never returns NULL, only error pointers.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

57a5a882
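For readers unfamiliar with error pointers, here is a userspace sketch modeled on the kernel's ERR_PTR()/IS_ERR() convention. The lowercase helpers and `add_group()` are hypothetical stand-ins written for this example; only the idea that the top 4095 addresses encode errno values is taken from the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the last MAX_ERRNO addresses. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct qgroup { unsigned long long id; };
static struct qgroup the_group;

/* Hypothetical allocator that never returns NULL, only error pointers. */
static void *add_group(int fail)
{
	if (fail)
		return err_ptr(-ENOMEM);
	the_group.id = 1;
	return &the_group;
}
```

A caller that only tests `!ptr` on such a function treats the failure as success, because the encoded error is a non-NULL value.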

Btrfs: fix some error codes in btrfs_qgroup_inherit() · 5986802c

Committed by Dan Carpenter on Jul 30, 2012

These paths return zero when they should return a negative error
code.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

5986802c

Btrfs: fix a misplaced address operator in a condition · aa2ffd06

Committed by Stefan Behrens on Jul 26, 2012

This should obviously not be "if (&flag)" but "if (flag)".
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

aa2ffd06

26 Jul, 2012 10 commits

Btrfs: uninit variable fixes in send/receive · b24baf69

Committed by Chris Mason on Jul 25, 2012

Signed-off-by: Chris Mason <chris.mason@fusionio.com>

b24baf69

Merge branch 'send-v2' of git://github.com/ablock84/linux-btrfs into for-linus · 113c1cb5

Committed by Chris Mason on Jul 25, 2012

This is the kernel portion of btrfs send/receive

Conflicts:
	fs/btrfs/Makefile
	fs/btrfs/backref.h
	fs/btrfs/ctree.c
	fs/btrfs/ioctl.c
	fs/btrfs/ioctl.h
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

113c1cb5

Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive · 31db9f7c

Committed by Alexander Block on Jul 25, 2012

This patch introduces the BTRFS_IOC_SEND ioctl that is
required for send. It allows btrfs-progs to implement
full and incremental sends. Patches for btrfs-progs will
follow.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>

31db9f7c

Btrfs: add btrfs_compare_trees function · 7069830a

Committed by Alexander Block on Jun 05, 2012

This function is used to find the differences between
two trees. The tree compare skips whole subtrees if it
detects shared tree blocks and thus is pretty fast.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>

7069830a

Btrfs: introduce subvol uuids and times · 8ea05e3a

Committed by Alexander Block on Jul 25, 2012

This patch introduces uuids for subvolumes. Each
subvolume has its own uuid. In case it was snapshotted,
it also contains parent_uuid. In case it was received,
it also contains received_uuid.

It also introduces subvolume ctime/otime/stime/rtime. The
first two are comparable to the times found in inodes. otime
is the origin/creation time and ctime is the change time.
stime/rtime are only valid on received subvolumes.
stime is the time of the subvolume when it was
sent. rtime is the time of the subvolume when it was
received.

In addition to the times, we have a transid for each
time. They are updated at the same place as the times.

btrfs receive uses stransid and rtransid to find out
if a received subvolume changed in the meantime.

If an older kernel mounts a filesystem with the
extended fields, all fields become invalid. The next
mount with a new kernel will detect this and reset the
fields.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>

8ea05e3a

Btrfs: make iref_to_path non static · 91cb916c

Committed by Alexander Block on Jun 03, 2012

Make iref_to_path non static (needed in send) and rename
it to btrfs_iref_to_path
Signed-off-by: Alexander Block <ablock84@googlemail.com>

91cb916c

Btrfs: add a barrier before a waitqueue_active check · cd1cfc49

Committed by Chris Mason on Jul 25, 2012

We were missing wakeups on the delayed ref waitqueue due
to races on waitqueue_active.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

cd1cfc49

Btrfs: call the ordered free operation without any locks held · e9fbcb42

Committed by Chris Mason on Jul 25, 2012

Each ordered operation has a free callback, and this was called with the
worker spinlock held.  Josef made the free callback also call iput,
which we can't do with the spinlock.

This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list.  We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org

e9fbcb42

Btrfs: Check INCOMPAT flags on remount and add helper function · 2b0ce2c2

Committed by Mitch Harder on Jul 24, 2012

In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.

Also, implement the new helper function when defragmenting with
explicit lzo compression and when setting the default subvolume.
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

2b0ce2c2

Merge branch 'qgroup' of git://git.jan-o-sch.net/btrfs-unstable into for-linus · b478b2ba

Committed by Chris Mason on Jul 25, 2012

Conflicts:
	fs/btrfs/ioctl.c
	fs/btrfs/ioctl.h
	fs/btrfs/transaction.c
	fs/btrfs/transaction.h
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

b478b2ba

25 Jul, 2012 2 commits

Btrfs: add helper for tree enumeration · e6793769

Committed by Arne Jansen on Sep 13, 2011

Often no exact match is wanted but just the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.
Signed-off-by: Arne Jansen <sensille@gmx.net>

e6793769
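The "exact match, otherwise the next lower or higher item" lookup can be illustrated on a sorted array. This is a sketch of the search semantics only, not the btrfs helper: `find_slot()` is a hypothetical name and a flat array stands in for the B-tree.

```c
#include <stddef.h>

/*
 * Search a sorted array for target.
 * Returns the index of an exact match if there is one. Otherwise returns
 * the index of the next higher key (find_higher != 0) or the next lower
 * key (find_higher == 0), or -1 when no such neighbour exists.
 */
static long find_slot(const unsigned long *keys, size_t n,
		      unsigned long target, int find_higher)
{
	size_t lo = 0, hi = n;              /* search window is [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid] < target)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* lo is now the first index whose key is >= target (or n). */
	if (lo < n && keys[lo] == target)
		return (long)lo;
	if (find_higher)
		return lo < n ? (long)lo : -1;
	return lo > 0 ? (long)lo - 1 : -1;
}
```

Putting the corner cases (empty range, target below the first key, target above the last key) in one helper is what removes the duplicated handling the commit message mentions.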

btrfs: allow cross-subvolume file clone · 362a20c5

Committed by David Sterba on Aug 01, 2011

Lift the EXDEV condition and allow different root trees for files being
cloned, then pass source inode's root when searching for extents.
Cloning is not allowed to cross vfsmounts, i.e. when two subvolumes from
one filesystem are mounted separately.
Signed-off-by: David Sterba <dsterba@suse.cz>

362a20c5

24 Jul, 2012 9 commits

Btrfs: improve multi-thread buffer read · 67c9684f

Committed by Liu Bo on Jul 20, 2012

While testing with my buffer read fio jobs[1], I find that btrfs does not
perform well enough.

Here is a scenario in fio jobs:

We have 4 threads, "t1 t2 t3 t4", starting to buffer read the same file,
and all of them will race on add_to_page_cache_lru(), and if one thread
successfully puts its page into the page cache, it takes the responsibility
to read the page's data.

And what's more, reading a page needs a period of time to finish, in which
other threads can slide in and process the remaining pages:

     t1          t2          t3          t4
   add Page1
   read Page1  add Page2
     |         read Page2  add Page3
     |            |        read Page3  add Page4
     |            |           |        read Page4
-----|------------|-----------|-----------|--------
     v            v           v           v
    bio          bio         bio         bio

Now we have four bios, each of which holds only one page since we need to
maintain consecutive pages in bio.  Thus, we can end up with far more bios
than we need.

Here we're going to
a) delay the real read-page section and
b) try to put more pages into page cache.

With that said, we can make each bio hold more pages and reduce the number
of bios we need.

Here are some numbers taken from fio results:
         w/o patch                 w patch
       -------------  --------  ---------------
READ:    745MB/s        +25%       934MB/s

[1]:
[global]
group_reporting
thread
numjobs=4
bs=32k
rw=read
ioengine=sync
directory=/mnt/btrfs/

[READ]
filename=foobar
size=2000M
invalidate=1
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

67c9684f
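The effect of batching can be modeled with a tiny counter. This is a simplified model, not the btrfs read path: `count_bios()` is a hypothetical name, and it assumes a bio can hold any run of consecutive page indices with no size limit.

```c
#include <stddef.h>

/*
 * Count how many bios one submitter needs for the pages it collected,
 * given in ascending order. A new bio starts whenever a page does not
 * directly follow the previous one.
 */
static size_t count_bios(const unsigned long *pages, size_t n)
{
	size_t bios = 0;

	for (size_t i = 0; i < n; i++)
		if (i == 0 || pages[i] != pages[i - 1] + 1)
			bios++;
	return bios;
}
```

In the scenario from the commit message each of the four threads ends up submitting a single page, so the model gives one bio per thread and four in total. If one thread collects pages 1 to 4 before issuing the read, the same model gives a single bio.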

Btrfs: make btrfs's allocation smoothly with preallocation · df57dbe6

Committed by Liu Bo on Jul 23, 2012

For backref walking, we've introduced delayed ref's sequence.  However,
it changes our preallocation behavior.

The story is that when we preallocate an extent and then mark it written
piece by piece, the ideal case should be that we don't need to COW the
extent, which is why we use 'preallocate'.

But we may not make use of preallocation, since when we check for cross refs on
the extent, we may have two ref entries which have the same content except
the sequence value, and we recognize them as cross refs and do COW to allocate
another extent.

So we end up with several pieces of space instead of a whole extent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

df57dbe6

Btrfs: lock the transition from dirty to writeback for an eb · 51561ffe

Committed by Josef Bacik on Jul 20, 2012

There is a small window where an eb can have no IO bits set on it, which
could potentially result in extent_buffer_under_io() returning false when we
want it to return true, which could result in not fun things happening. So
in order to protect this case we need to hold the refs_lock when we make
this transition to make sure we get reliable results out of
extent_buffer_under_io(). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

51561ffe

Btrfs: fix potential race in extent buffer freeing · 594831c4

Committed by Josef Bacik on Jul 20, 2012

This sounds sort of impossible but it is the only thing I can think of and
at the very least it is theoretically possible so here it goes.

If we are in try_release_extent_buffer we will check that the ref count on
the extent buffer is 1 and not under IO, and then go down and clear the tree
ref. If between this check and clearing the tree ref somebody else comes in
and grabs a ref on the eb and then marks it dirty before
try_release_extent_buffer() does its tree ref clear we can end up with a
dirty eb that will be freed while it is still dirty which will result in a
panic. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

594831c4

Btrfs: don't return true in releasepage unless we actually freed the eb · e64860aa

Committed by Josef Bacik on Jul 20, 2012

I noticed while looking at an extent_buffer race that we will
unconditionally return 1 if we get down to release_extent_buffer after
clearing the tree ref.  However we can easily race in here and get a ref on
the eb and not actually free the eb.  So make release_extent_buffer return 1
if it free'd the eb and 0 if not so we can be a little kinder to the vm.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

e64860aa

Btrfs: suppress printk() if all device I/O stats are zero · a98cdb85

Committed by Stefan Behrens on Jul 17, 2012

Code is added to suppress the I/O stats printing at mount time if all
statistic values are zero.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

a98cdb85

Btrfs: remove unwanted printk() for btrfs device I/O stats · 5021976d

Committed by Stefan Behrens on Jul 17, 2012

People complained about the annoying kernel log message
"btrfs: no dev_stats entry found ... (OK on first mount after mkfs)"
every time a filesystem is mounted for the first time after running
mkfs. Since the distribution of the btrfs-progs is not synchronized
to the kernel version, mkfs like it is now will be used also in the
future. Then this message is not useful to find errors, it is just
annoying. This commit removes the printk().
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

5021976d

Btrfs: rewrite BTRFS_SETGET_FUNCS · 18077bb4

Committed by Li Zefan on Jul 09, 2012

BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and
btrfs_foo() functions, which read and write specific fields in the
extent buffer.

The total number of set/get functions is ~200, but in fact we only
need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64.

It results in a reduction of ~37K bytes.

   text    data     bss     dec     hex filename
 629661   12489     216  642366   9cd3e fs/btrfs/btrfs.o.orig
 592637   12489     216  605342   93c9e fs/btrfs/btrfs.o
Signed-off-by: Li Zefan <lizefan@huawei.com>

18077bb4
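The idea of many thin accessors sharing a few width-specific helpers can be sketched as follows. This is not the kernel macro: `SETGET_U32`, `get_u32()`, `set_u32()` and `struct demo_item` are hypothetical, only the u32 width is shown, and a plain byte buffer stands in for the extent buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shared helpers: one get/set pair per field width (u32 shown here). */
static uint32_t get_u32(const uint8_t *buf, size_t off)
{
	uint8_t b[4];

	memcpy(b, buf + off, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static void set_u32(uint8_t *buf, size_t off, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		buf[off + i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical on-disk layout used only for this example. */
struct demo_item {
	uint32_t flags;
	uint32_t nlink;
};

/* Each generated accessor is a one-line wrapper passing a field offset. */
#define SETGET_U32(name, type, member)                                 \
static inline uint32_t get_##name(const uint8_t *buf)                  \
{                                                                      \
	return get_u32(buf, offsetof(type, member));                   \
}                                                                      \
static inline void set_##name(uint8_t *buf, uint32_t v)                \
{                                                                      \
	set_u32(buf, offsetof(type, member), v);                       \
}

SETGET_U32(demo_flags, struct demo_item, flags)
SETGET_U32(demo_nlink, struct demo_item, nlink)
```

Since the per-field code shrinks to an offset computation, the bulk of the logic exists once per width instead of once per field, which is the kind of text-size saving the commit reports.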

Btrfs: zero unused bytes in inode item · 293f7e07

Committed by Li Zefan on Jul 10, 2012

The otime field is not zeroed, so users will see random otime in an old
filesystem with a new kernel which has otime support in the future.

The reserved bytes are also not zeroed, and we'll have compatibility
issue if we make use of those bytes.
Signed-off-by: Li Zefan <lizefan@huawei.com>

293f7e07
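A minimal sketch of the "zero everything, then fill what you know" pattern follows. `struct disk_inode` and `fill_inode()` are hypothetical and far smaller than the real inode item; the point is only that unused and reserved bytes get a defined value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record with a not-yet-used field and padding. */
struct disk_inode {
	uint64_t size;
	uint64_t otime_sec;     /* not filled in by this writer */
	uint8_t reserved[16];   /* reserved for future use */
};

/* Zero the whole record first so unused bytes never carry stale memory. */
static void fill_inode(struct disk_inode *di, uint64_t size)
{
	memset(di, 0, sizeof(*di));
	di->size = size;
}
```

A later format revision can then treat zero as "not set" for those bytes instead of having to guess whether old records contain garbage.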

openeuler / Kernel synced successfully almost 2 years ago
