Commits · 2ef9ccbfcb90cf84bdba320a571b18b05c41101b · openanolis / cloud-kernel

31 Dec, 2015 1 commit

bcache: fix a livelock when we cause a huge number of cache misses · 2ef9ccbf

Authored by Zheng Liu on Nov 29, 2015

Subject :	[PATCH v2] bcache: fix a livelock in btree lock
Date :	Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)

This commit tries to fix a livelock in bcache.  This livelock might
happen when we cause a huge number of cache misses simultaneously.

When we get a cache miss, bcache will execute the following path.

->cached_dev_make_request()
  ->cached_dev_read()
    ->cache_lookup()
      ->bch_btree_map_keys()
        ->btree_root()  <------------------------
          ->bch_btree_map_keys_recurse()        |
            ->cache_lookup_fn()                 |
              ->cached_dev_cache_miss()         |
                ->bch_btree_insert_check_key() -|
                  [If btree->seq is not equal to seq + 1, we should return
                   EINTR and traverse btree again.]

In bch_btree_insert_check_key() we first check the upgrade flag
(op->lock == -1); when it is set, we release the read lock on btree->lock
and try to take the write lock.  Taking and releasing the write lock
increments btree->seq monotonically, so that other threads can detect
that the node was modified during a cache miss (see btree.h:74).
But if many requests cause cache misses at the same time, we can hit a
livelock because btree->seq is constantly being changed by other threads.
Thus no one can make progress.

This commit takes the write btree->lock if it encounters a race while
traversing the btree.  This sacrifices scalability, but it ensures that
only one thread can modify the btree at a time.
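
The retry behaviour described above can be sketched in user space. This is a single-threaded model under stated assumptions, not the bcache code: `struct node`, `upgrade_and_insert()`, `lookup_attempts()` and the `racing_writers` parameter are hypothetical names, and "other threads" are simulated by bumping the sequence number inside the unlock window.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a btree node; not the bcache struct btree. */
struct node {
    unsigned seq;   /* bumped on every write lock and every write unlock */
};

static void write_lock(struct node *b)   { b->seq++; }
static void write_unlock(struct node *b) { b->seq++; }

/*
 * Upgrade from the read lock to the write lock. `racing_writers` models how
 * many other threads took and released the write lock in the window between
 * our read unlock and our write lock.
 */
static int upgrade_and_insert(struct node *b, unsigned racing_writers)
{
    unsigned seq = b->seq;              /* sampled while holding the read lock */

    for (unsigned i = 0; i < racing_writers; i++) {
        write_lock(b);                  /* another thread wins the race */
        write_unlock(b);
    }

    write_lock(b);
    if (b->seq != seq + 1) {            /* node changed behind our back */
        write_unlock(b);
        return -EINTR;                  /* caller must traverse again */
    }
    /* ... the check key would be inserted here ... */
    write_unlock(b);
    return 0;
}

/*
 * Returns the number of attempts needed, or 0 if the lookup gave up.
 * With `take_write_lock_on_retry`, a retry is assumed to hold the write lock
 * for the whole traversal, so no other writer can enter the window.
 */
static unsigned lookup_attempts(struct node *b, unsigned racing_writers,
                                bool take_write_lock_on_retry,
                                unsigned max_attempts)
{
    bool exclusive = false;

    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (upgrade_and_insert(b, exclusive ? 0 : racing_writers) == 0)
            return attempt;
        if (take_write_lock_on_retry)
            exclusive = true;
    }
    return 0;
}
```

In this model, a lookup that keeps losing the race never finishes unless the retry becomes exclusive, which is the trade-off the message describes: less concurrency in exchange for guaranteed progress.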

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Tested-by: Joshua Schmid <jschmid@suse.com>
Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: Joshua Schmid <jschmid@suse.com>
Cc: Zhu Yanhai <zhu.yanhai@gmail.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

08 Nov, 2015 1 commit

block: change ->make_request_fn() and users to return a queue cookie · dece1635

Authored by Jens Axboe on Nov 05, 2015

No functional changes in this patch, but it prepares us for returning
a more useful cookie related to the IO that was queued up.

Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>

06 Nov, 2015 1 commit

bcache: Really show state of work pending bit · 8d090f47

Authored by Petr Mladek on Oct 05, 2015

WORK_STRUCT_PENDING is a mask for testing the pending bit.
test_bit() expects the number of the bit and we need to
use WORK_STRUCT_PENDING_BIT there.

Also work_data_bits() is defined in workqueue.h now.

I noticed this just by chance when looking at how
WORK_STRUCT_PENDING_BIT is used. The change is compile
tested.
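
The difference between a mask and a bit number can be shown with a small user-space sketch. The names `PENDING_BIT`, `PENDING` and `test_bit_model()` are hypothetical stand-ins, and the values (bit index 0, mask 1) are an assumption modeled on the workqueue flags, not copied from the kernel headers.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumed values: bit index 0, so the mask is 1. */
enum {
    PENDING_BIT = 0,                  /* bit index */
    PENDING     = 1 << PENDING_BIT,   /* mask */
};

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* User-space stand-in for test_bit(): `nr` is a bit index, not a mask. */
static bool test_bit_model(unsigned nr, const unsigned long *addr)
{
    return (addr[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/* Buggy form: passes the mask (1), so it actually inspects bit 1. */
static bool pending_buggy(const unsigned long *data)
{
    return test_bit_model(PENDING, data);
}

/* Fixed form: passes the bit index (0). */
static bool pending_fixed(const unsigned long *data)
{
    return test_bit_model(PENDING_BIT, data);
}
```

With only bit 0 set in `data`, `pending_fixed()` reports the work as pending while `pending_buggy()` does not, because the mask value is silently interpreted as bit index 1.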

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

14 Aug, 2015 1 commit

bcache: remove driver private bio splitting code · 749b61da

Authored by Kent Overstreet on Nov 23, 2013

The bcache driver has always accepted arbitrarily large bios and split
them internally.  Now that every driver must accept arbitrarily large
bios this code isn't necessary anymore.

Cc: linux-bcache@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

29 Jul, 2015 1 commit

block: add a bi_error field to struct bio · 4246a0b6

Authored by Christoph Hellwig on Jul 20, 2015

Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being persistent
when bios are queued up, and of not being passed along from child to parent
bio in the ever more popular chaining scenario.  Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
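
As a rough illustration of why storing the errno in the bio helps with chaining, here is a reduced user-space model. `struct err_bio`, `err_bio_endio()` and `record_error()` are hypothetical names, and the propagation rule (first error wins) is an assumption of this sketch rather than a statement about the block layer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct bio. */
struct err_bio {
    int error;                             /* 0 on success, negative errno on failure */
    void (*end_io)(struct err_bio *bio);   /* completion callback, no error argument */
    struct err_bio *parent;                /* next bio in the chain, or NULL */
    int seen_error;                        /* what the submitter observed */
};

/* Example completion callback: reads the error from the bio itself. */
static void record_error(struct err_bio *bio)
{
    bio->seen_error = bio->error;
}

/* Complete a bio; the error lives in the bio, so it can follow the chain. */
static void err_bio_endio(struct err_bio *bio)
{
    if (bio->parent && bio->error && !bio->parent->error)
        bio->parent->error = bio->error;   /* first error wins (assumption) */
    if (bio->end_io)
        bio->end_io(bio);
    if (bio->parent)
        err_bio_endio(bio->parent);
}
```

Because the callback takes no separate error argument, a value such as -ENOSPC set on a child survives queuing and reaches the parent's completion handler unchanged.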

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

17 Jul, 2015 1 commit

block: have drivers use blk_queue_max_discard_sectors() · 2bb4cd5c

Authored by Jens Axboe on Jul 14, 2015

Some drivers use it now, others just set the limits field manually.
But in preparation for splitting this into a hard and soft limit,
ensure that they all call the proper function for setting the hw
limit for discards.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

11 Jul, 2015 1 commit

bcache: don't embed 'return' statements in closure macros · 77b5a084

Authored by Jens Axboe on Mar 06, 2015

This is horribly confusing, it breaks the flow of the code without
it being apparent in the caller.

Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Christoph Hellwig <hch@lst.de>

01 Jul, 2015 2 commits

MAINTAINERS: BCACHE: Kent Overstreet has changed email address · d1aa1ab3

Authored by Joe Perches on Jun 30, 2015

Kent's email address in MAINTAINERS seems to be invalid.
This was his last sign-off address, so use that if appropriate.

Fix the S: status entry while there.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bcache: use kvfree() in various places · 958b4338

Authored by Pekka Enberg on Jun 30, 2015

Use kvfree() instead of open-coding it.

Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

02 Jun, 2015 1 commit

writeback: separate out include/linux/backing-dev-defs.h · 66114cad

Authored by Tejun Heo on May 22, 2015

With the planned cgroup writeback support, backing-dev related
declarations will be more widely used across block and cgroup;
unfortunately, including backing-dev.h from include/linux/blkdev.h
makes a cyclic include dependency quite likely.

This patch separates out backing-dev-defs.h which only has the
essential definitions and updates blkdev.h to include it.  c files
which need access to more backing-dev details now include
backing-dev.h directly.  This takes backing-dev.h off the common
include dependency chain making it a lot easier to use it across block
and cgroup.

v2: fs/fat build failure fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>

22 May, 2015 1 commit

block: remove management of bi_remaining when restoring original bi_end_io · 326e1dbb

Authored by Mike Snitzer on May 22, 2015

Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
non-chains") regressed all existing callers that followed this pattern:
 1) saving a bio's original bi_end_io
 2) wiring up an intermediate bi_end_io
 3) restoring the original bi_end_io from intermediate bi_end_io
 4) calling bio_endio() to execute the restored original bi_end_io

The regression was due to BIO_CHAIN only ever getting set if
bio_inc_remaining() is called.  For the above pattern it isn't set until
step 3 above (step 2 would've needed to establish BIO_CHAIN).  As such
the first bio_endio(), in step 2 above, never decremented __bi_remaining
before calling the intermediate bi_end_io -- leaving __bi_remaining with
the value 1 instead of 0.  When bio_inc_remaining() occurred during step
3 it brought it to a value of 2.  When the second bio_endio() was
called, in step 4 above, it should've called the original bi_end_io but
it didn't because there was an extra reference that wasn't dropped (due
to atomic operations being optimized away since BIO_CHAIN wasn't set
upfront).

Fix this issue by removing the __bi_remaining management complexity for
all callers that use the above pattern -- bio_chain() is the only
interface that _needs_ to be concerned with __bi_remaining.  For the
above pattern callers just expect the bi_end_io they set to get called!
Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
that aren't associated with the bio_chain() interface.

Also, the bio_inc_remaining() interface has been moved local to bio.c.

Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

06 May, 2015 1 commit

bio: skip atomic inc/dec of ->bi_cnt for most use cases · dac56212

Authored by Jens Axboe on Apr 17, 2015

Struct bio has a reference count that controls when it can be freed.
The most common use case is allocating the bio, which then returns with
a single reference to it, doing IO, and then dropping that single
reference. We can remove this atomic_dec_and_test() in the completion
path, if nobody else is holding a reference to the bio.

If someone does call bio_get() on the bio, then we flag the bio as
now having a valid count and that we must properly honor the reference
count when it's being put.
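
The idea can be sketched in user space as follows. This is a model under assumptions, not the block layer code: `struct ref_bio`, the `reffed` flag and the `frees` counter are hypothetical, and a plain `int` stands in for the atomic counter.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a bio with a lazily used reference count. */
struct ref_bio {
    bool reffed;   /* set once someone takes an extra reference */
    int  cnt;      /* plain int here; the real counter is atomic */
};

static int frees;  /* counts frees so the behaviour can be observed */

static struct ref_bio *ref_bio_alloc(void)
{
    struct ref_bio *bio = malloc(sizeof(*bio));

    if (bio) {
        bio->reffed = false;
        bio->cnt = 1;              /* the allocator's single reference */
    }
    return bio;
}

/* Taking an extra reference marks the count as meaningful. */
static void ref_bio_get(struct ref_bio *bio)
{
    bio->reffed = true;
    bio->cnt++;
}

static void ref_bio_put(struct ref_bio *bio)
{
    if (!bio->reffed) {            /* common case: sole owner, skip the counter */
        free(bio);
        frees++;
        return;
    }
    if (--bio->cnt == 0) {         /* someone called get: honor the count */
        free(bio);
        frees++;
    }
}
```

A bio that is allocated and put once is freed without touching the counter; a bio that saw `ref_bio_get()` is freed only on the last put.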

Tested-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

24 Nov, 2014 1 commit

md/bcache: use generic io stats accounting functions to simplify io stat accounting · aae4933d

Authored by Gu Zheng on Nov 24, 2014

Use generic io stats accounting helper functions (generic_{start,end}_io_acct)
to simplify io stat accounting.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: Kent Overstreet <kmo@datera.io>
Signed-off-by: Jens Axboe <axboe@fb.com>

05 Oct, 2014 1 commit

block: disable entropy contributions for nonrot devices · b277da0a

Authored by Mike Snitzer on Oct 04, 2014

Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set
QUEUE_FLAG_NONROT.

Historically, all block devices have automatically made entropy
contributions.  But as previously stated in commit e2e1a148 ("block: add
sysfs knob for turning off disk entropy contributions"):
    - On SSD disks, the completion times aren't as random as they
      are for rotational drives. So it's questionable whether they
      should contribute to the random pool in the first place.
    - Calling add_disk_randomness() has a lot of overhead.

There are more reliable sources for randomness than non-rotational block
devices.  From a security perspective it is better to err on the side of
caution than to allow entropy contributions from unreliable "random"
sources.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

05 Aug, 2014 22 commits

bcache: Drop unneeded blk_sync_queue() calls · 0781c874

Authored by Kent Overstreet on Jul 07, 2014

This is needed for the queue/block device we created (it's done by
blk_cleanup_queue(), which we do call), but calling it for the block devices we
only opened is pointless.

Change-Id: I53dfded14ed15b9581d10ca8399d5e1b3abbf9f2

bcache: add mutex lock for bch_is_open · 789d21db

Authored by Jianjian Huo on Jul 13, 2014

Since bch_is_open() iterates over the linked lists bch_cache_sets and
uncached_devices, it needs to hold bch_register_lock.

Signed-off-by: Jianjian Huo <samuel.huo@gmail.com>

bcache: Correct printing of btree_gc_max_duration_ms · 5b25abad

Authored by Surbhi Palande on Apr 17, 2014

time_stats::btree_gc_max_duration_ms is not bit-shifted by 8.

Fixes BUG #138

Change-Id: I44fc6e1d0579674016acc533f1a546b080e5371a
Signed-off-by: Surbhi Palande <sap@daterainc.com>

bcache: try to set b->parent properly · 2452cc89

Authored by Slava Pestov on Jul 12, 2014

bcache_flash_dev.ktest would reliably crash with 8k and 16k bucket size
before; now it passes.

Change-Id: Ib542232235e39298c3a7548fe52b645cabb823d1

bcache: fix memory corruption in init error path · c9a78332

Authored by Slava Pestov on Jun 19, 2014

If register_cache_set() failed, we would touch ca->set after
it had already been freed. Also, fix an assertion to catch
this.

Change-Id: I748e5f5b223e2d9b2602075dec2f997cced2394d

bcache: fix crash with incomplete cache set · bf0c55c9

Authored by Slava Pestov on Jul 11, 2014

Change-Id: I6abde52afe917633480caaf4e2518f42a816d886

bcache: Fix more early shutdown bugs · d83353b3

Authored by Kent Overstreet on Jun 11, 2014

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: fix use-after-free in btree_gc_coalesce() · 400ffaa2

Authored by Slava Pestov on Jul 12, 2014

If we goto out_nocoalesce after we free new_nodes[0], we end up freeing
new_nodes[0] again. This was generating a lockdep warning. The fix is
to set new_nodes[0] to NULL, since the out_nocoalesce path safely
ignores NULL entries in the new_nodes array.
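
The pattern behind this fix (clear a pointer after freeing it so that a shared cleanup path skips it) can be sketched generically. `cleanup_nodes()`, `coalesce_then_fail()`, `NR_NODES` and the `frees` counter are hypothetical names for illustration; the sketch uses malloc/free rather than btree nodes.

```c
#include <stddef.h>
#include <stdlib.h>

#define NR_NODES 4

static int frees;   /* counts releases so a double free would show up */

/* Shared error-path cleanup: releases every entry still set, skips NULL. */
static void cleanup_nodes(void *nodes[NR_NODES])
{
    for (size_t i = 0; i < NR_NODES; i++) {
        if (nodes[i]) {
            free(nodes[i]);
            nodes[i] = NULL;
            frees++;
        }
    }
}

/* Releases nodes[0] early, then bails out to the shared cleanup path. */
static void coalesce_then_fail(void *nodes[NR_NODES])
{
    free(nodes[0]);
    frees++;
    nodes[0] = NULL;        /* without this, cleanup_nodes() frees it again */
    cleanup_nodes(nodes);
}
```

Each entry is released exactly once: the early release accounts for the first entry, and the cleanup loop handles only the entries that are still non-NULL.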

This regression was introduced in 2d7f9531.

Change-Id: I76564d7257800583214376b4bacf236cda90c89c

bcache: Fix an infinite loop in journal replay · 6b708de6

Authored by Kent Overstreet on Jun 02, 2014

When running with multiple cache devices, if one of the devices has a completely
empty journal but we'd already found some journal entries on a previous device
we'd go into an infinite loop.

Change-Id: I1dcdc0d738192746de28f40e8b08825b0dea5e2b
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: fix crash in bcache_btree_node_alloc_fail tracepoint · 913dc33f

Authored by Slava Pestov on May 23, 2014

'b' was NULL.

Change-Id: Icac0fd04afa2d23f213d96d51afd53374e6dd0c0

bcache: bcache_write tracepoint was crashing · 60ae81ee

Authored by Slava Pestov on May 22, 2014

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: fix typo in bch_bkey_equal_header · 8e094808

Authored by Slava Pestov on Jun 30, 2014

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Allocate bounce buffers with GFP_NOWAIT · 501d52a9

Authored by Kent Overstreet on May 19, 2014

There's no point in blocking on these allocations, since our fallback paths will
probably go faster than blocking.

Change-Id: I733ca202c25cb36bde02607a0a60552229a4241c

bcache: Make sure to pass GFP_WAIT to mempool_alloc() · bcf090e0

Authored by Kent Overstreet on May 19, 2014

This was very wrong: mempool_alloc() only guarantees success with GFP_WAIT.
bcache uses GFP_NOWAIT in various other places where we have a fallback;
circuits must've gotten crossed when writing this code or something.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: fix uninterruptible sleep in writeback thread · 9e5c3535

Authored by Slava Pestov on May 01, 2014

There were two issues here:

- writeback thread did not start until the device first became dirty
- writeback thread used uninterruptible sleep once running

Without this patch I see kernel warnings printed and a load average of
1.52 after booting my test VM. With this patch the warnings are gone and
the load average is near 0.00 as expected.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: wait for buckets when allocating new btree root · c5aa4a31

Authored by Slava Pestov on Apr 21, 2014

Tested:
- sometimes bcache_tier test would hang on startup with a failure
  to allocate the btree root -- no longer seeing this

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: fix crash on shutdown in passthrough mode · a664d0f0

Authored by Slava Pestov on May 20, 2014

We never started the writeback thread in this case, so don't stop it.

bcache: fix lockdep warnings on shutdown · e5112201

Authored by Slava Pestov on Apr 29, 2014

bcache allocator: send discards with correct size · 8b326d3a

Authored by Slava Pestov on Apr 21, 2014

bcache: Fix to remove the rcu_sched stalls. · dbd810ab

Authored by Surbhi Palande on Apr 10, 2014

The while loop was executing infinitely.
This fix ends the while loop gracefully.

Signed-off-by: Surbhi Palande <sap@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix a journal replay bug · 9aa61a99

Authored by Kent Overstreet on Apr 10, 2014

Journal replay wasn't validating pointers with bch_extent_invalid() before
dereferencing them; fixed.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix a bug when detaching · 5b1016e6

Authored by Kent Overstreet on Mar 19, 2014

After detaching a backing device from a cache set, a bit wasn't getting
reset, meaning the second detach wouldn't work correctly.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

18 Apr, 2014 1 commit

arch: Mass conversion of smp_mb__*() · 4e857c58

Authored by Peter Zijlstra on Mar 17, 2014

Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

19 Mar, 2014 3 commits

bcache: remove nested function usage · cb851149

Authored by John Sheu on Mar 17, 2014

Uninlined nested functions can cause crashes when using ftrace, as they don't
follow the normal calling convention and confuse the ftrace function graph
tracer as it examines the stack.

Also, nested functions are supported as a gcc extension, but may fail on other
compilers (e.g. llvm).
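
A typical way to remove a nested function is to lift it to file scope and pass the captured variables explicitly. The sketch below is a generic illustration, not code from this patch: `count_above()`, `above()` and `struct above_ctx` are hypothetical names.

```c
#include <stddef.h>

/*
 * Before (GCC-only nested function, shown as a comment so that this
 * example still builds with other compilers):
 *
 *   size_t count_above(const int *v, size_t n, int limit)
 *   {
 *       int above(int x) { return x > limit; }   // captures `limit`
 *       ...
 *   }
 */

/* After: the captured variable moves into an explicit context struct. */
struct above_ctx {
    int limit;
};

/* Ordinary static function; follows the normal calling convention. */
static int above(const struct above_ctx *ctx, int x)
{
    return x > ctx->limit;
}

static size_t count_above(const int *v, size_t n, int limit)
{
    struct above_ctx ctx = { .limit = limit };
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        count += above(&ctx, v[i]);
    return count;
}
```

The helper no longer depends on the enclosing stack frame, so tracers and non-GCC compilers see a plain function.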

Signed-off-by: John Sheu <john.sheu@gmail.com>

bcache: Kill bucket->gc_gen · 3a2fd9d5

Authored by Kent Overstreet on Feb 27, 2014

gc_gen was a temporary used to recalculate last_gc, but since we only need
bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update
last_gc directly.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Kill unused freelist · 2531d9ee

Authored by Kent Overstreet on Mar 17, 2014

This was originally added as an optimization that for various reasons isn't
needed anymore, but it does add a lot of nasty corner cases (and it was
responsible for some recently fixed bugs). Just get rid of it now.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

openanolis / cloud-kernel: last synced successfully more than 1 year ago
