- 31 Dec 2015, 1 commit
-
-
Committed by Zheng Liu
Subject: [PATCH v2] bcache: fix a livelock in btree lock
Date: Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)

This commit tries to fix a livelock in bcache. This livelock might
happen when we cause a huge number of cache misses simultaneously.

When we get a cache miss, bcache will execute the following path:

    ->cached_dev_make_request()
      ->cached_dev_read()
        ->cached_lookup()
          ->bch->btree_map_keys()
            ->btree_root()  <------------------------
              ->bch_btree_map_keys_recurse()        |
                ->cache_lookup_fn()                 |
                  ->cached_dev_cache_miss()         |
                    ->bch_btree_insert_check_key() -|
                    [If btree->seq is not equal to seq + 1, we should
                     return EINTR and traverse the btree again.]

In the bch_btree_insert_check_key() function we first need to check the
upgrade flag (op->lock == -1); when this flag is true we need to release
the read btree->lock and try to take the write btree->lock. While this
write lock is taken and released, btree->seq is monotonically increased
in order to detect modification by other threads during the cache miss
(see btree.h:74). But if a flood of requests causes many cache misses
at once, we can hit a livelock because btree->seq is always being
changed by others, so no one can make progress.

This commit takes the write btree->lock if it encounters a race while
traversing the btree. Although this sacrifices some scalability, it
ensures that only one thread can modify the btree at a time.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Tested-by: Joshua Schmid <jschmid@suse.com>
Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: Joshua Schmid <jschmid@suse.com>
Cc: Zhu Yanhai <zhu.yanhai@gmail.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
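A minimal userspace model of the race and the fix, assuming a
pthread_rwlock in place of btree->lock and illustrative function names
(a didactic sketch, not the bcache source):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_rwlock_t btree_lock = PTHREAD_RWLOCK_INITIALIZER;
    static unsigned long seq;   /* bumped on every write-lock acquisition */

    /* Returns true on success, false meaning "retry the traversal"
     * (the kernel path returns -EINTR instead). */
    static bool insert_check_key(bool force_write)
    {
        if (!force_write) {
            pthread_rwlock_rdlock(&btree_lock);
            unsigned long old = seq;

            /* Upgrade: drop the read lock, take the write lock.
             * A racing writer may slip in between the two calls. */
            pthread_rwlock_unlock(&btree_lock);
            pthread_rwlock_wrlock(&btree_lock);
            ++seq;

            if (seq != old + 1) {
                /* Lost the race.  The old behavior retried read-locked,
                 * so under many simultaneous misses seq always moved
                 * and nobody made progress: the livelock. */
                pthread_rwlock_unlock(&btree_lock);
                return false;
            }
        } else {
            /* The fix: after losing once, retraverse holding the write
             * lock directly, so this thread cannot lose again. */
            pthread_rwlock_wrlock(&btree_lock);
            ++seq;
        }

        /* ... insert the check key under the write lock ... */
        pthread_rwlock_unlock(&btree_lock);
        return true;
    }

    int main(void)
    {
        bool force_write = false;

        while (!insert_check_key(force_write))
            force_write = true;   /* the retry takes the write lock */
        puts("inserted");
        return 0;
    }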
-
- 29 Jul 2015, 1 commit
-
-
Committed by Christoph Hellwig
Currently we have two different ways to signal an I/O error on a BIO:

(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being
persistent when bios are queued up, and of not being passed along from
child to parent bio in the ever more popular chaining scenario. Having
both mechanisms available has the additional drawback of utterly
confusing driver authors and introducing bugs where various I/O
submitters only deal with one of them, and the others have to add
boilerplate code to deal with both kinds of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
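A schematic before/after of a driver's completion callback under this
change (a sketch for illustration, not a compilable kernel unit;
handle_error() is a hypothetical helper):

    /* Old style: two competing channels for the same information. */
    static void my_end_io_old(struct bio *bio, int error)
    {
        if (error || !test_bit(BIO_UPTODATE, &bio->bi_flags))
            handle_error(error ? error : -EIO);
        bio_put(bio);
    }

    /* New style: one persistent field that survives queueing and is
     * propagated from child to parent when bios are chained. */
    static void my_end_io_new(struct bio *bio)
    {
        if (bio->bi_error)
            handle_error(bio->bi_error);
        bio_put(bio);
    }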
-
- 05 Aug 2014, 6 commits
-
-
Committed by Slava Pestov
bcache_flash_dev.ktest would reliably crash with 8k and 16k bucket
sizes before; now it passes.

Change-Id: Ib542232235e39298c3a7548fe52b645cabb823d1
-
Committed by Slava Pestov
If we goto out_nocoalesce after we free new_nodes[0], we end up freeing
new_nodes[0] again. This was generating a lockdep warning. The fix is
to set new_nodes[0] to NULL, since the out_nocoalesce path safely
ignores NULL entries in the new_nodes array.

This regression was introduced in 2d7f9531.

Change-Id: I76564d7257800583214376b4bacf236cda90c89c
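A runnable toy of the pattern behind the fix (the array name mirrors the
commit; everything else is illustrative):

    #include <stdlib.h>

    #define NODES 4

    int main(void)
    {
        void *new_nodes[NODES];
        int i;

        for (i = 0; i < NODES; i++)
            new_nodes[i] = malloc(16);

        /* Error path: node 0 was already torn down above... */
        free(new_nodes[0]);
        new_nodes[0] = NULL;   /* the fix: clear the slot so the cleanup
                                  below cannot free it a second time */
        goto out_nocoalesce;

    out_nocoalesce:
        for (i = 0; i < NODES; i++)
            if (new_nodes[i])  /* NULL entries are safely ignored */
                free(new_nodes[i]);
        return 0;
    }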
-
Committed by Slava Pestov
'b' was NULL.

Change-Id: Icac0fd04afa2d23f213d96d51afd53374e6dd0c0
-
Committed by Kent Overstreet
There's no point in blocking on these allocations, since our fallback
paths will probably go faster than blocking.

Change-Id: I733ca202c25cb36bde02607a0a60552229a4241c
-
Committed by Kent Overstreet
This was very wrong - mempool_alloc() only guarantees success with
GFP_WAIT. bcache uses GFP_NOWAIT in various other places where we have
a fallback; circuits must've gotten crossed when writing this code or
something.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
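The contract in question, as a hedged kernel-style sketch (the fallback
is whatever the call site already has, e.g. bypassing the cache):

    /* mempool_alloc() only guarantees forward progress when the caller
     * can sleep (GFP_WAIT, today's __GFP_DIRECT_RECLAIM).  With
     * GFP_NOWAIT it can return NULL under memory pressure, so every
     * such call site must check the result and take its fallback. */
    struct bio *bio = mempool_alloc(pool, GFP_NOWAIT);
    if (!bio)
        return -ENOMEM;   /* caller falls back instead of waiting */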
-
Committed by Slava Pestov
Tested:
- sometimes the bcache_tier test would hang on startup with a failure
  to allocate the btree root -- no longer seeing this

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
- 19 Mar 2014, 10 commits
-
-
Committed by Kent Overstreet
gc_gen was a temporary used to recalculate last_gc, but since we only
need bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can
just update last_gc directly.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
This was originally added as an optimization that for various reasons
isn't needed anymore, but it does add a lot of nasty corner cases (and
it was responsible for some recently fixed bugs). Just get rid of it
now.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
This changes the bucket allocation reserves to use _real_ reserves -
separate freelists - instead of watermarks, which if nothing else makes
the current code saner to reason about, and is going to be important in
the future when we add support for multiple btrees.

It also adds btree_check_reserve(), which checks (and locks) the
reserves for both bucket allocation and memory allocation for btree
nodes. The old code just kinda sorta assumed that since (e.g. for btree
node splits) it had the root locked, no other threads could try to make
use of the same reserve. This technically should have been ok for
memory allocation (we should always have a reserve there, since the
btree node cache is used as a reserve and we preallocate it), but
multiple btrees will mean that locking the root won't be sufficient
anymore, and for the bucket allocation reserve it was technically
possible for the old code to deadlock.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
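A runnable toy model of the watermark-to-freelist change (the RESERVE_*
names follow the commit's intent, but the structs are illustrative, not
bcache's):

    #include <stdbool.h>
    #include <stdio.h>

    enum reserve { RESERVE_BTREE, RESERVE_PRIO, RESERVE_MOVINGGC,
                   RESERVE_NONE, RESERVE_NR };

    #define RESERVE_CAP 8

    struct freelist {
        long bucket[RESERVE_CAP];
        int n;
    };

    static struct freelist reserves[RESERVE_NR];

    /* Allocation pops from the caller's own freelist instead of testing
     * a shared free count against a per-caller watermark, so one
     * consumer draining buckets can't starve another's reserve. */
    static bool alloc_bucket(enum reserve r, long *out)
    {
        struct freelist *fl = &reserves[r];

        if (!fl->n)
            return false;
        *out = fl->bucket[--fl->n];
        return true;
    }

    /* btree_check_reserve()-style check: a node split needs several
     * buckets, so verify them up front instead of failing midway. */
    static bool check_reserve(enum reserve r, int needed)
    {
        return reserves[r].n >= needed;
    }

    int main(void)
    {
        long b;

        reserves[RESERVE_BTREE].n = 2;
        printf("split ok?  %d\n", check_reserve(RESERVE_BTREE, 3)); /* 0 */
        printf("alloc ok?  %d\n", alloc_bucket(RESERVE_BTREE, &b)); /* 1 */
        return 0;
    }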
-
Committed by Kent Overstreet
With the locking rework in the last patch, this shouldn't be needed
anymore - btree_node_write_work() only takes b->write_lock, which is
never held for very long.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
Add a new lock, b->write_lock, which is required to actually modify -
or write - a btree node; this lock is only held for short durations.

This means we can write out a btree node without taking b->lock, which
_is_ held for long durations - solving a deadlock when
btree_flush_write() (from the journalling code) is called with a btree
node locked. Right now this just occurs in bch_btree_set_root(), but
with an upcoming journalling rework it's going to happen a lot more.

This also means b->lock is now more of a read/intent lock instead of a
read/write lock - but not completely, since it still blocks readers.
May turn it into a real intent lock at some point in the future.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
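A sketch of the write-out path this enables (kernel-style, assuming the
names in the commit; not guaranteed to match the final source):

    /* Deferred write-out of a dirty btree node.  Only b->write_lock is
     * taken, and only for the duration of the write itself, so this can
     * run even while a traversal holds b->lock for a long time. */
    static void btree_node_write_work(struct work_struct *w)
    {
        struct btree *b = container_of(to_delayed_work(w),
                                       struct btree, work);

        mutex_lock(&b->write_lock);
        if (btree_node_dirty(b))
            __bch_btree_node_write(b, NULL);
        mutex_unlock(&b->write_lock);
    }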
-
Committed by Kent Overstreet
This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free()
puts the bucket on the unused freelist, where it can be reused right
away without any ordering requirements. It would be better to wait on
at least a journal write to go down before reusing the bucket.

bch_btree_set_root() does this, and inserting into non leaf nodes is
completely synchronous, so we should be ok - but future patches are
just going to get rid of the unused freelist: it was needed in the past
for various reasons but shouldn't be anymore.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
This means the garbage collection code can better check for data and
metadata pointers to the same buckets.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
This hasn't been used or even enabled in ages.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
The on disk bucket gens are allowed to be out of date, when we reuse
buckets that didn't have any live data in them. To deal with this, the
initial gc has to update the bucket gen when we find a pointer gen
newer than the bucket's gen.

Unfortunately we weren't doing this for pointers in the journal that
we're about to replay.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
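The invariant being enforced, as a self-contained sketch (gen_after()
is reconstructed from its usual 8-bit wrapping definition; the rest is
illustrative):

    #include <stdint.h>

    /* gens wrap at 8 bits; "a after b" means a is ahead of b by less
     * than half the ring, so stale pointers compare as not-after. */
    static inline uint8_t gen_after(uint8_t a, uint8_t b)
    {
        uint8_t r = a - b;

        return r > 128 ? 0 : r;
    }

    /* Initial-gc marking: if a pointer (from the btree *or* from a
     * journal entry about to be replayed) carries a newer gen than the
     * on-disk bucket gen, trust the pointer and bump the bucket. */
    static void initial_mark_pointer(uint8_t *bucket_gen, uint8_t ptr_gen)
    {
        if (gen_after(ptr_gen, *bucket_gen))
            *bucket_gen = ptr_gen;
    }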
-
Committed by Kent Overstreet
The code to fix up incorrect bucket prios incorrectly failed to skip
btree node freeing keys.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
- 30 Jan 2014, 2 commits
-
-
Committed by Kent Overstreet
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Darrick J. Wong
The BUG_ON at the end of __bch_btree_mark_key can be triggered due to
an integer overflow error:

    BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13);
    ...
    SET_GC_SECTORS_USED(g, min_t(unsigned,
            GC_SECTORS_USED(g) + KEY_SIZE(k),
            (1 << 14) - 1));

    BUG_ON(!GC_SECTORS_USED(g));

In bcache.h, the SECTORS_USED bitfield is defined to be 13 bits wide.
While the SET_ code tries to ensure that the field doesn't overflow by
clamping it to (1 << 14) - 1 == 16383, this is incorrect because 16383
requires 14 bits. Therefore, if GC_SECTORS_USED() + KEY_SIZE() = 8192,
the SET_ statement tries to store 8192 into a 13-bit field. In a 13-bit
field, 8192 becomes zero, thus triggering the BUG_ON.

Therefore, create a field width constant and a max value constant, and
use those to create the bitfield and check the inputs to
SET_GC_SECTORS_USED. Arguably the BITMASK() template ought to have
BUG_ON checks for too-large values, but that's a separate patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
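A runnable demonstration of the truncation and of the constant-based
fix (the GC_SECTORS_USED_BITS / MAX_GC_SECTORS_USED names mirror the
fix's approach but are illustrative, not the exact kernel identifiers):

    #include <assert.h>
    #include <stdio.h>

    #define GC_SECTORS_USED_BITS 13
    #define MAX_GC_SECTORS_USED  ((1U << GC_SECTORS_USED_BITS) - 1)

    struct bucket {
        unsigned gc_sectors_used : GC_SECTORS_USED_BITS; /* 13 bits */
    };

    int main(void)
    {
        struct bucket g = { 0 };
        unsigned used = 4096, key_size = 4096;  /* sum: 8192 == 1 << 13 */

        /* Buggy clamp: (1 << 14) - 1 needs 14 bits, so 8192 passes the
         * clamp untouched and is silently truncated to 0 on store. */
        unsigned buggy = used + key_size;
        if (buggy > (1U << 14) - 1)
            buggy = (1U << 14) - 1;
        g.gc_sectors_used = buggy;
        printf("buggy store: %u\n", g.gc_sectors_used);  /* prints 0 */

        /* Fixed clamp: derive the max from the field width. */
        unsigned fixed = used + key_size;
        if (fixed > MAX_GC_SECTORS_USED)
            fixed = MAX_GC_SECTORS_USED;
        g.gc_sectors_used = fixed;
        printf("fixed store: %u\n", g.gc_sectors_used);  /* prints 8191 */
        assert(g.gc_sectors_used != 0);                  /* no BUG_ON */
        return 0;
    }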
-
- 09 Jan 2014, 19 commits
-
-
Committed by Kent Overstreet
We need to return -EINTR after a split because we invalidated iterators
(and freed the btree node) - but if we were finished inserting, we
don't want to redo the traversal.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
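The rule, as a schematic (illustrative names, not the kernel code):

    /* After a split, the node we were inserting into is gone and any
     * iterators into it are invalid.  Unfinished work must restart the
     * traversal; finished work must not, or we'd redo the insert. */
    if (did_split)
        return insert_done ? 0 : -EINTR;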
-
Committed by Kent Overstreet
Now handling overlapping extents/keys is a method that's specific to
what the btree node contains.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
More work to disentangle various code from struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
More work to disentangle various code from struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
More work to disentangle bset.c from struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
Helper function to explicitly check how much space is free in a btree
node.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
Soon, bset.c won't need to depend on struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
More work to disentangle bset.c from the rest of the code.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
More disentangling of bset.c from the rest of the bcache code - soon,
the sorting routines won't have any dependencies on any outside
structs.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
Only use extent comparison for comparing extents, so we're not using
START_KEY() on other key types (i.e. btree pointers).

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
More refactoring:

    node() -> bset_bkey_idx()
    end()  -> bset_bkey_last()

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
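What the renamed helpers compute, in a simplified self-contained sketch
(types reduced for illustration; bcache's real structs differ):

    #include <stdint.h>

    /* Keys are variable-length runs of u64s laid out back to back, so
     * indexing is by u64 offset rather than by array element. */
    struct bset {
        uint32_t keys;      /* total u64s of key data in d[] */
        uint64_t d[];
    };

    struct bkey;            /* opaque here; never dereferenced */

    static inline struct bkey *bset_bkey_idx(struct bset *i, unsigned off)
    {
        return (struct bkey *)(i->d + off);     /* off is in u64s */
    }

    static inline struct bkey *bset_bkey_last(struct bset *i)
    {
        return (struct bkey *)(i->d + i->keys); /* one past the end */
    }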
-
Committed by Kent Overstreet
Getting away from KEY_PTRS and moving toward KEY_U64s - and getting rid
of magic 2s.

Also - split out the part that checks against journal entry size so as
to avoid a dependency on struct cache_set in bset.c.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
Used this fixed code to find and fix the bug fixed by
a4d885097b0ac0cd1337f171f2d4b83e946094d4.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
That was a terrible name for a macro; add some better helpers to
replace it.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
This error path shouldn't have been hit in practice... and we've got
reworked reserve code coming soon, so it shouldn't _ever_ be hit... but
if we've got code for this error path, it should be correct.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
We need a reserve for allocating buckets for new btree nodes - and now
that we've got multiple btrees, it really needs to be per btree.

This reworks the reserves so we've got separate freelists for each
reserve instead of watermarks, which seems to make things a bit
cleaner, and it adds some code so that btree_split() can make sure the
reserve is available before it starts.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
- 17 Dec 2013, 1 commit
-
-
Committed by Nicholas Swenson
The garbage collector needs to check keys in the writeback keybuf to
make sure it's not invalidating buckets to which the writeback keys
point.

Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-