Commits · 962197babeccc1f4cc8aa28ad844df80bdc85ed0 · xiphi1978 / linux

02 Oct, 2012 3 commits

btrfs: polish names of kmem caches · 837e1972

Committed by David Sterba on Sep 07, 2012

Use case:

  watch 'grep btrfs < /proc/slabinfo'

This makes it easy to watch all caches in one go.
Signed-off-by: David Sterba <dsterba@suse.cz>

837e1972

Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag · 9e8a4a8b

Committed by Liu Bo on Sep 05, 2012

We're going to use the EXTENT_DEFRAG flag to indicate which range
belongs to defragmentation so that we can implement snapshot-aware defrag:

We set the EXTENT_DEFRAG flag when dirtying the extents that need to be
defragmented, so that later on the writeback thread can differentiate
between normal writeback and writeback started by defragmentation.
Original-Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

9e8a4a8b

Btrfs: fix btrfs send for inline items and compression · 74dd17fb

Committed by Chris Mason on Aug 07, 2012

The btrfs send code was assuming the offset of the file item into the
extent translated to bytes on disk.  If we're compressed, this isn't
true, and so it was off into extents owned by other files.

It was also improperly handling inline extents.  This solves a crash
where we may have gone past the end of the file extent item by not
testing early enough for an inline extent.  It also solves problems
where we have a hole between the end of the inline item and the start
of the full extent.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

74dd17fb

29 Aug, 2012 1 commit

Btrfs: revert checksum error statistic which can cause a BUG() · 5ee0844d

Committed by Stefan Behrens on Aug 27, 2012

Commit 442a4f63 added btrfs device statistic counters for detected IO
and checksum errors to Linux 3.5.
The statistic part that counts checksum errors in
end_bio_extent_readpage() can cause a BUG() in a subfunction:
"kernel BUG at fs/btrfs/volumes.c:3762!"
That part is reverted with the current patch.
However, the counting of checksum errors in the scrub context remains
active, and the counting of detected IO errors (read, write or flush
errors) in all contexts remains active.

Cc: stable <stable@vger.kernel.org> # 3.5
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5ee0844d

24 Jul, 2012 5 commits

Btrfs: improve multi-thread buffer read · 67c9684f

Committed by Liu Bo on Jul 20, 2012

While testing with my buffer read fio jobs[1], I find that btrfs does not
perform well enough.

Here is a scenario in fio jobs:

We have 4 threads, "t1 t2 t3 t4", starting to buffer read the same file.
All of them race on add_to_page_cache_lru(), and whichever thread
successfully puts its page into the page cache takes responsibility
for reading that page's data.

What's more, reading a page takes time to finish, during which
other threads can slide in and process the remaining pages:

     t1          t2          t3          t4
   add Page1
   read Page1  add Page2
     |         read Page2  add Page3
     |            |        read Page3  add Page4
     |            |           |        read Page4
-----|------------|-----------|-----------|--------
     v            v           v           v
    bio          bio         bio         bio

Now we have four bios, each of which holds only one page since we need to
maintain consecutive pages in bio.  Thus, we can end up with far more bios
than we need.

Here we're going to
a) delay the real read-page section and
b) try to put more pages into page cache.

With that said, we can make each bio hold more pages and reduce the number
of bios we need.

Here are some numbers taken from fio results:
         w/o patch                 w patch
       -------------  --------  ---------------
READ:    745MB/s        +25%       934MB/s

[1]:
[global]
group_reporting
thread
numjobs=4
bs=32k
rw=read
ioengine=sync
directory=/mnt/btrfs/

[READ]
filename=foobar
size=2000M
invalidate=1
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

67c9684f
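
The effect described in 67c9684f can be illustrated with a small userspace sketch. This is not the kernel code: `count_bios` is a hypothetical helper, and it only models the rule stated in the commit message that a bio holds consecutive pages. It counts how many bios a sorted list of page indices would need.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): count the bios needed
 * for a sorted list of page indices, assuming one bio can only hold
 * pages with consecutive indices.
 */
static size_t count_bios(const unsigned long *pages, size_t n)
{
    size_t bios = 0;

    for (size_t i = 0; i < n; i++) {
        /* A new bio starts at the first page and after every gap. */
        if (i == 0 || pages[i] != pages[i - 1] + 1)
            bios++;
    }
    return bios;
}
```

With four threads each submitting a single page, every call sees one page and yields one bio, for four bios in total. If pages 1 to 4 are first collected and then submitted together, the same helper reports a single bio, which is the saving the patch aims for.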

Btrfs: lock the transition from dirty to writeback for an eb · 51561ffe

Committed by Josef Bacik on Jul 20, 2012

There is a small window where an eb can have no IO bits set on it, which
could potentially result in extent_buffer_under_io() returning false when we
want it to return true, which could result in not fun things happening. So
in order to protect this case we need to hold the refs_lock when we make
this transition to make sure we get reliable results out of
extent_buffer_under_io(). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

51561ffe

Btrfs: fix potential race in extent buffer freeing · 594831c4

Committed by Josef Bacik on Jul 20, 2012

This sounds sort of impossible but it is the only thing I can think of and
at the very least it is theoretically possible so here it goes.

If we are in try_release_extent_buffer we will check that the ref count on
the extent buffer is 1 and not under IO, and then go down and clear the tree
ref. If between this check and clearing the tree ref somebody else comes in
and grabs a ref on the eb and then marks it dirty before
try_release_extent_buffer() does its tree ref clear we can end up with a
dirty eb that will be freed while it is still dirty which will result in a
panic. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

594831c4

Btrfs: don't return true in releasepage unless we actually freed the eb · e64860aa

Committed by Josef Bacik on Jul 20, 2012

I noticed while looking at an extent_buffer race that we will
unconditionally return 1 if we get down to release_extent_buffer after
clearing the tree ref.  However we can easily race in here and get a ref on
the eb and not actually free the eb.  So make release_extent_buffer return 1
if it free'd the eb and 0 if not so we can be a little kinder to the vm.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

e64860aa

btrfs read error corrected message floods the console during recovery · d5b025d5

Committed by Anand Jain on Jul 02, 2012

Changing printk_in_rcu to printk_ratelimited_in_rcu will suffice
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

d5b025d5

12 Jul, 2012 1 commit

Btrfs: fix typo in convert_extent_bit · 10983f2e

Committed by Liu Bo on Jul 11, 2012

It should be convert_extent_bit.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

10983f2e

03 Jul, 2012 1 commit

Btrfs: hold a ref on the inode during writepages · 7fd1a3f7

Committed by Josef Bacik on Jun 27, 2012

We can race with unlink and not actually be able to do our igrab in
btrfs_add_ordered_extent. This will result in all sorts of problems.
Instead of doing the complicated work to try and handle returning an error
properly from btrfs_add_ordered_extent, just hold a ref to the inode during
writepages. If we cannot grab a ref we know we're freeing this inode anyway
and can just drop the dirty pages on the floor, because screw them we're
going to invalidate them anyway. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

7fd1a3f7

15 Jun, 2012 1 commit

Btrfs: use rcu to protect device->name · 606686ee

Committed by Josef Bacik on Jun 04, 2012

Al pointed out that we can just toss out the old name on a device and add a
new one arbitrarily, so anybody who uses device->name in printk could
possibly use free'd memory. Instead of adding locking around all of this he
suggested doing it with RCU, so I've introduced a struct rcu_string that
does just that and have gone through and protected all accesses to
device->name that aren't under the uuid_mutex with rcu_read_lock(). This
protects us and I will use it for dealing with removing the device that we
used to mount the file system in a later patch. Thanks,
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>

606686ee

30 May, 2012 4 commits

Btrfs: add device counters for detected IO and checksum errors · 442a4f63

Committed by Stefan Behrens on May 25, 2012

The goal is to detect when drives start to get an increased error rate,
which indicates that they should be replaced soon. Therefore statistic
counters are added that count IO errors (read, write and flush).
Additionally, software-detected errors such as checksum errors and
corrupted blocks are counted.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

442a4f63

Btrfs: use fastpath in extent state ops as much as possible · d1ac6e41

Committed by Liu Bo on May 10, 2012

Fully utilize our extent state's new helper functions to use
fastpath as much as possible.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>

d1ac6e41

Btrfs: finish ordered extents in their own thread · 5fd02043

Committed by Josef Bacik on May 02, 2012

We noticed that the ordered extent completion doesn't really rely on having
a page and that it could be done independently of ending the writeback on a
page. This patch makes us not do the threaded endio stuff for normal
buffered writes and direct writes so we can end page writeback as soon as
possible (in irq context) and only start threads to do the ordered work when
it is actually done. Compression needs to be reworked some to take
advantage of this as well, but atm it has to do a find_get_page in its endio
handler so it must be done in its own thread. This makes direct writes
quite a bit faster. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

5fd02043

Btrfs: fix compile warnings in extent_io.c · d7dbe9e7

Committed by Josef Bacik on Apr 23, 2012

These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

d7dbe9e7

26 May, 2012 1 commit

Btrfs: dummy extent buffers for tree mod log · 815a51c7

Committed by Jan Schmidt on May 16, 2012

The tree modification log needs two ways to create dummy extent buffers,
once by allocating a fresh one (to rebuild an old root) and once by
cloning an existing one (to make private rewind modifications to it).
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

815a51c7

11 May, 2012 4 commits

Btrfs: remove the useless assignment to *entry in function tree_insert of file extent_io.c · fd5e62a3

Committed by Wang Sheng-Hui on Apr 06, 2012

In tree_insert, var *entry is used in the loop only, and is useless
outside the loop. Remove the useless assignment after the loop.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>

fd5e62a3

Btrfs: fix the comment for find_first_extent_bit · 477d7eaf

Committed by Wang Sheng-Hui on Apr 06, 2012

The return value of find_first_extent_bit is 1 or 0, never < 0.
If something was found, it returns 0; if nothing was found, it returns 1.
Fix the comment.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>

477d7eaf

Btrfs: fix btrfs_release_extent_buffer_page with the right usage of num_extent_pages · 39bab87b

Committed by Wang Sheng-Hui on Apr 06, 2012

num_extent_pages returns the number of pages in the specific range, not
the index of the last page in the eb range.

btrfs_release_extent_buffer_page is called with start_idx set to 0 in the
current code, so it's not a problem yet. But the logic is indeed wrong.

Fix it here.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>

39bab87b

Btrfs: cleanup the comment for clear_state_bit in extent_io.c · 1b303fc0

Committed by Wang Sheng-Hui on Apr 06, 2012

No 'delete' arg is used for clear_state_bit.
Cleanup the comment.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>

1b303fc0

05 May, 2012 1 commit

Btrfs: fix page leak when allocing extent buffers · 17de39ac

Committed by Josef Bacik on May 04, 2012

If we happen to alloc an extent buffer and then alloc a page and notice that
page is already attached to an extent buffer, we will only unlock it and
free our existing eb. Any pages currently attached to that eb will be
properly freed, but we don't do the page_cache_release() on the page where
we noticed the other extent buffer which can cause us to leak pages and I
hope cause the weird issues we've been seeing in this area. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

17de39ac

19 Apr, 2012 3 commits

Btrfs: always store the mirror we read the eb from · 5cf1ab56

Committed by Josef Bacik on Apr 16, 2012

A user reported a panic where we were trying to fix a bad mirror but the
mirror number we were giving was 0, which is invalid. This is because we
don't do the transid verification until after the read, so as far as the
read code is concerned the read was a success. So instead store the mirror
we read from so that if there is some failure post read we know which mirror
to try next and which mirror needs to be fixed if we find a good copy of the
block. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

5cf1ab56

Btrfs: avoid possible use-after-free in clear_extent_bit() · cdc6a395

Committed by Li Zefan on Mar 12, 2012

clear_extent_bit()
{
    next_node = rb_next(&state->rb_node);
    ...
    clear_state_bit(state);  <-- this may free next_node
    if (next_node) {
        state = rb_entry(next_node);
        ...
    }
}

clear_state_bit() calls merge_state() which may free the next node
of the passing extent_state, so clear_extent_bit() may end up
referencing freed memory.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

cdc6a395
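
The hazard in cdc6a395 is a general one: caching a pointer to the next node before calling something that may merge and free that node. The following is a minimal userspace sketch of the safe ordering, not the actual patch. The `struct state` list, `new_state`, `merge_with_next`, `clear_bits_all` and `count_states` are all hypothetical stand-ins, and a singly linked list replaces the kernel's rb-tree.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an extent_state: a range with flag bits. */
struct state {
    int start, end;
    unsigned bits;
    struct state *next;
};

static struct state *new_state(int start, int end, unsigned bits,
                               struct state *next)
{
    struct state *s = malloc(sizeof(*s));

    if (s) {
        s->start = start;
        s->end = end;
        s->bits = bits;
        s->next = next;
    }
    return s;
}

/* Merge s with its successor if adjacent with equal bits; frees it. */
static void merge_with_next(struct state *s)
{
    struct state *n = s->next;

    if (n && n->start == s->end + 1 && n->bits == s->bits) {
        s->end = n->end;
        s->next = n->next;
        free(n);
    }
}

/* Clearing bits may trigger a merge, which may free the next node. */
static void clear_state_bit(struct state *s, unsigned bits)
{
    s->bits &= ~bits;
    merge_with_next(s);
}

/* Safe ordering: look up the successor only AFTER the clear. */
static void clear_bits_all(struct state *head, unsigned bits)
{
    struct state *s = head;

    while (s) {
        clear_state_bit(s, bits);
        s = s->next; /* read after the clear, so never a freed node */
    }
}

static int count_states(const struct state *head)
{
    int n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}
```

Had `clear_bits_all` saved `s->next` before calling `clear_state_bit`, it could have continued from a node that the merge had just freed, which is the pattern the commit message warns about.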

Btrfs: return void from clear_state_bit · 8e52acf7

Committed by Li Zefan on Mar 12, 2012

Currently it returns a set of bits that were cleared, but this return
value is not used at all.

Moreover it doesn't seem to be useful, because we may clear the bits
of a few extent_states, but only the cleared bits of the last one are
returned.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

8e52acf7

13 Apr, 2012 2 commits

Btrfs: check return value of bio_alloc() properly · e627ee7b

Committed by Tsutomu Itoh on Apr 12, 2012

bio_alloc() has the possibility of returning NULL.
So, it is necessary to check the return value.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e627ee7b

Btrfs: fix uninit variable in repair_eb_io_failure · d95603b2

Committed by Chris Mason on Apr 12, 2012

We'd have to be passing bogus extent buffers for this uninit variable to
actually be used, but set it to zero just in case.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d95603b2

27 Mar, 2012 8 commits

Btrfs: deal with read errors on extent buffers differently · ea466794

Committed by Josef Bacik on Mar 26, 2012

Since we need to read and write extent buffers in their entirety we can't use
the normal bio_readpage_error stuff since it only works on a per page basis. So
instead make it so that if we see an io error in endio we just mark the eb as
having an IO error and then in btree_read_extent_buffer_pages we will manually
try other mirrors and then overwrite the bad mirror if we find a good copy.
This works with larger than page size blocks. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ea466794

Btrfs: loop waiting on writeback · a098d8e8

Committed by Chris Mason on Mar 21, 2012

lock_extent_buffer_for_io needs to loop around and make sure the
writeback bits are not set.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a098d8e8

Btrfs: ensure an entire eb is written at once · 0b32f4bb

Committed by Josef Bacik on Mar 13, 2012

This patch simplifies how we track our extent buffers. Previously we could exit
writepages with only having written half of an extent buffer, which meant we had
to track the state of the pages and the state of the extent buffers differently.
Now we only read in entire extent buffers and write out entire extent buffers.
This allows us to simply set bits in our bflags to indicate the state of the eb
and we no longer have to do things like track uptodate with our iotree. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0b32f4bb

Btrfs: introduce mark_extent_buffer_accessed · 5df4235e

Committed by Josef Bacik on Mar 15, 2012

Because an eb can have multiple pages we need to make sure that all pages within
the eb are marked as accessed, since releasepage can be called against any page
in the eb. This will keep us from possibly evicting hot eb's when we're doing
larger than pagesize eb's. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

5df4235e

Btrfs: introduce free_extent_buffer_stale · 3083ee2e

Committed by Josef Bacik on Mar 09, 2012

Because btrfs cow's we can end up with extent buffers that are no longer
necessary just sitting around in memory. So instead of evicting these pages, we
could end up evicting things we actually care about. Thus we have
free_extent_buffer_stale for use when we are freeing tree blocks. This will
make it so that the ref for the eb being in the radix tree is dropped as soon as
possible and then is freed when the refcount hits 0 instead of waiting to be
released by releasepage. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

3083ee2e

Btrfs: only use the existing eb if its count isn't 0 · 115391d2

Committed by Josef Bacik on Mar 09, 2012

We can run into a problem where we find an eb for our existing page already on
the radix tree but it has a ref count of 0. It hasn't been removed by RCU
yet so this can cause issues where we will use the EB after free. So do
atomic_inc_not_zero on the exists->refs and if it is zero just do
synchronize_rcu() and try again. We won't have to worry about new allocators
coming in since they will block on the page lock at this point. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

115391d2
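
The "increment unless zero" idea that 115391d2 relies on can be sketched in portable C11. This is only a userspace illustration of the semantics, not the kernel's atomic_inc_not_zero implementation, and `ref_get_unless_zero` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: take a reference only if the object is still
 * live, i.e. its count is non-zero. Returns true when a ref was taken.
 */
static bool ref_get_unless_zero(atomic_int *refs)
{
    int old = atomic_load(refs);

    while (old != 0) {
        /* On failure the CAS reloads 'old' with the current value. */
        if (atomic_compare_exchange_weak(refs, &old, old + 1))
            return true;
    }
    return false; /* count was 0: the object is already being freed */
}
```

A caller that gets `false` must not touch the object. In the commit's description the lookup then waits with synchronize_rcu() and retries rather than reusing the dying eb.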

Btrfs: set page->private to the eb · 4f2de97a

Committed by Josef Bacik on Mar 07, 2012

We spend a lot of time looking up extent buffers from pages when we could just
store the pointer to the eb the page is associated with in page->private. This
patch does just that, and it makes things a little simpler and reduces a bit of
CPU overhead involved with doing metadata IO. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

4f2de97a

Btrfs: allow metadata blocks larger than the page size · 727011e0

Committed by Chris Mason on Aug 06, 2010

A few years ago the btrfs code to support blocks larger than
the page size was disabled to fix a few corner cases in the
page cache handling.  This fixes the code to properly support
large metadata blocks again.

Since current kernels will crash early and often with larger
metadata blocks, this adds an incompat bit so that older kernels
can't mount it.

This also does away with different blocksizes for nodes and leaves.
You get a single block size for all tree blocks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

727011e0

22 Mar, 2012 5 commits

btrfs: replace many BUG_ONs with proper error handling · 79787eaa

Committed by Jeff Mahoney on Mar 12, 2012

 btrfs currently handles most errors with BUG_ON. This patch is a work-in-
 progress but aims to handle most errors other than internal logic
 errors and ENOMEM more gracefully.

 This iteration prevents most crashes but can run into lockups with
 the page lock on occasion when the timing "works out."
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

79787eaa

btrfs: split extent_state ops · 3fbe5c02

Committed by Jeff Mahoney on Mar 01, 2012

set_extent_bit can do exclusive locking but only when called by lock_extent*.

Drop the exclusive bits argument except when called by lock_extent.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

3fbe5c02

btrfs: drop gfp_t from lock_extent · d0082371

Committed by Jeff Mahoney on Mar 01, 2012

 lock_extent and unlock_extent are always called with GFP_NOFS, drop the
 argument and use GFP_NOFS consistently.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

d0082371

btrfs: return void in functions without error conditions · 143bede5

Committed by Jeff Mahoney on Mar 01, 2012

Signed-off-by: Jeff Mahoney <jeffm@suse.com>

143bede5

btrfs: ->submit_bio_hook error push-up · 355808c2

Committed by Jeff Mahoney on Oct 03, 2011

This pushes failures from the submit_bio_hook callbacks,
btrfs_submit_bio_hook and btree_submit_bio_hook into the callers, including
callers of submit_one_bio where it catches the failures with BUG_ON.

It also pushes up through the ->readpage_io_failed_hook to
end_bio_extent_writepage where the error is already caught with BUG_ON.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

355808c2