提交 · cc23196631fbcd1bc3eafedbb712413fdbf946a3 · openanolis / cloud-kernel

11 11月, 2013 25 次提交

bcache: Clean up cache_lookup_fn · cc231966

由 Kent Overstreet 提交于 7月 24, 2013

There was some looping in submit_partial_cache_hit() and
submit_partial_cache_hit() that isn't needed anymore - originally, we
wouldn't necessarily process the full hit or miss all at once because
when splitting the bio, we took into account the restrictions of the
device we were sending it to.

But, device bio size restrictions are now handled elsewhere, with a
wrapper around generic_make_request() - so that looping has been
unnecessary for awhile now and we can now do quite a bit of cleanup.

And if we trim the key we're reading from to match the subset we're
actually reading, we don't have to explicitly calculate bi_sector
anymore. Neat.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

cc231966

bcache: Convert bch_btree_read_async() to bch_btree_map_keys() · 2c1953e2

由 Kent Overstreet 提交于 7月 24, 2013

This is a fairly straightforward conversion, mostly reshuffling -
op->lookup_done goes away, replaced by MAP_DONE/MAP_CONTINUE. And the
code for handling cache hits and misses wasn't really btree code, so it
gets moved to request.c.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

2c1953e2

bcache: Move some stuff to btree.c · df8e8970

由 Kent Overstreet 提交于 7月 24, 2013

With the new btree_map() functions, we don't need to export the stuff
needed for traversing the btree anymore.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

df8e8970

bcache: Add btree_map() functions · 48dad8ba

由 Kent Overstreet 提交于 9月 10, 2013

Lots of stuff has been open coding its own btree traversal - which is
generally pretty simple code, but there are a few subtleties.

This adds new new functions, bch_btree_map_nodes() and
bch_btree_map_keys(), which do the traversal for you. Everything that's
open coding btree traversal now (with the exception of garbage
collection) is slowly going to be converted to these two functions;
being able to write other code at a higher level of abstraction  is a
big improvement w.r.t. overall code quality.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

48dad8ba

bcache: Convert writeback to a kthread · 5e6926da

由 Kent Overstreet 提交于 7月 24, 2013

This simplifies the writeback flow control quite a bit - previously, it
was conceptually two coroutines, refill_dirty() and read_dirty(). This
makes the code quite a bit more straightforward.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

5e6926da

bcache: Convert gc to a kthread · 72a44517

由 Kent Overstreet 提交于 10月 24, 2013

We needed a dedicated rescuer workqueue for gc anyways... and gc was
conceptually a dedicated thread, just one that wasn't running all the
time. Switch it to a dedicated thread to make the code a bit more
straightforward.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

72a44517

bcache: Convert bucket_wait to wait_queue_head_t · 35fcd848

由 Kent Overstreet 提交于 7月 24, 2013

At one point we did do fancy asynchronous waiting stuff with
bucket_wait, but that's all gone (and bucket_wait is used a lot less
than it used to be). So use the standard primitives.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

35fcd848

bcache: Convert try_wait to wait_queue_head_t · e8e1d468

由 Kent Overstreet 提交于 7月 24, 2013

We never waited on c->try_wait asynchronously, so just use the standard
primitives.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

e8e1d468

bcache: Move keylist out of btree_op · 0b93207a

由 Kent Overstreet 提交于 7月 24, 2013

Slowly working on pruning struct btree_op - the aim is for it to only
contain things that are actually necessary for traversing the btree.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

0b93207a

bcache: Refactor journalling flow control · a34a8bfd

由 Kent Overstreet 提交于 10月 24, 2013

Making things less asynchronous that don't need to be - bch_journal()
only has to block when the journal or journal entry is full, which is
emphatically not a fast path. So make it a normal function that just
returns when it finishes, to make the code and control flow easier to
follow.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

a34a8bfd

K
bcache: Refactor read request code a bit · cdd972b1
由 Kent Overstreet 提交于 9月 10, 2013
```
More refactoring, and renaming.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
```
cdd972b1

bcache: Refactor request_write() · 84f0db03

由 Kent Overstreet 提交于 7月 24, 2013

Try to improve some of the naming a bit to be more consistent, and also
improve the flow of control in request_write() a bit.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

84f0db03

K
bcache: Clean up keylist code · c2f95ae2
由 Kent Overstreet 提交于 7月 24, 2013
```
More random refactoring.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
```
c2f95ae2

bcache: Add explicit keylist arg to btree_insert() · 4f3d4014

由 Kent Overstreet 提交于 9月 10, 2013

Some refactoring - better to explicitly pass stuff around instead of
having it all in the "big bag of state", struct btree_op. Going to prune
struct btree_op quite a bit over time.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

4f3d4014

bcache: Convert btree_insert_check_key() to btree_insert_node() · e7c590eb

由 Kent Overstreet 提交于 9月 10, 2013

This was the main point of all this refactoring - now,
btree_insert_check_key() won't fail just because the leaf node happened
to be full.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

e7c590eb

bcache: Insert multiple keys at a time · 403b6cde

由 Kent Overstreet 提交于 7月 24, 2013

We'll often end up with a list of adjacent keys to insert -
because bch_data_insert() may have to fragment the data it writes.

Originally, to simplify things and avoid having to deal with corner
cases bch_btree_insert() would pass keys from this list one at a time to
btree_insert_recurse() - mainly because the list of keys might span leaf
nodes, so it was easier this way.

With the btree_insert_node() refactoring, it's now a lot easier to just
pass down the whole list and have btree_insert_recurse() iterate over
leaf nodes until it's done.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

403b6cde

bcache: Add btree_insert_node() · 26c949f8

由 Kent Overstreet 提交于 9月 10, 2013

The flow of control in the old btree insertion code was rather -
backwards; we'd recurse down the btree (in btree_insert_recurse()), and
then if we needed to split the keys to be inserted into the parent node
would be effectively returned up to btree_insert_recurse(), which would
notice there was more work to do and finish the insertion.

The main problem with this was that the full logic for btree insertion
could only be used by calling btree_insert_recurse; if you'd gotten to a
btree leaf some other way and had a key to insert, if it turned out that
node needed to be split you were SOL.

This inverts the flow of control so btree_insert_node() does _full_
btree insertion, including splitting - and takes a (leaf) btree node to
insert into as a parameter.

This means we can now _correctly_ handle cache misses - for cache
misses, we need to insert a fake "check" key into the btree when we
discover we have a cache miss - while we still have the btree locked.
Previously, if the btree node was full inserting a cache miss would just
fail.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

26c949f8

bcache: Explicitly track btree node's parent · d6fd3b11

由 Kent Overstreet 提交于 7月 24, 2013

This is prep work for the reworked btree insertion code.

The way we set b->parent is ugly and hacky... the problem is, when
btree_split() or garbage collection splits or rewrites a btree node, the
parent changes for all its (potentially already cached) children.

I may change this later and add some code to look through the btree node
cache and find all our cached child nodes and change the parent pointer
then...
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

d6fd3b11

bcache: Remove unnecessary check in should_split() · 8304ad4d

由 Kent Overstreet 提交于 7月 24, 2013

Checking i->seq was redundant, because since ages ago we always
initialize the new bset when advancing b->written
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

8304ad4d

bcache: Stripe size isn't necessarily a power of two · 2d679fc7

由 Kent Overstreet 提交于 8月 17, 2013

Originally I got this right... except that the divides didn't use
do_div(), which broke 32 bit kernels. When I went to fix that, I forgot
that the raid stripe size usually isn't a power of two... doh
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

2d679fc7

bcache: Add on error panic/unregister setting · 77c320eb

由 Kent Overstreet 提交于 7月 11, 2013

Works kind of like the ext4 setting, to panic or remount read only on
errors.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

77c320eb

bcache: Use blkdev_issue_discard() · 49b1212d

由 Kent Overstreet 提交于 7月 24, 2013

The old asynchronous discard code was really a relic from when all the
allocation code was asynchronous - now that allocation runs out of a
dedicated thread there's no point in keeping around all that complicated
machinery.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

49b1212d

bcache: Fix a lockdep splat · dd9ec84d

由 Kent Overstreet 提交于 10月 24, 2013

bch_keybuf_del() takes a spinlock that can't be taken in interrupt context -
whoops. Fortunately, this code isn't enabled by default (you have to toggle a
sysfs thing).
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

dd9ec84d

K

bcache: Fix a journalling performance bug · 7857d5d4
由 Kent Overstreet 提交于 10月 08, 2013

7857d5d4

bcache: Fix dirty_data accounting · 1fa8455d

由 Kent Overstreet 提交于 11月 10, 2013

Dirty data accounting wasn't quite right - firstly, we were adding the key we're
inserting after it could have merged with another dirty key already in the
btree, and secondly we could sometimes pass the wrong offset to
bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is
important when tracking dirty data by stripe.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10

1fa8455d

23 10月, 2013 1 次提交

bcache: Fixed incorrect order of arguments to bio_alloc_bioset() · d4eddd42

由 Kent Overstreet 提交于 10月 22, 2013

Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d4eddd42

11 10月, 2013 1 次提交

bcache: Fix a null ptr deref regression · 2fe80d3b

由 Kent Overstreet 提交于 10月 10, 2013

Commit c0f04d88 ("bcache: Fix flushes in writeback mode") was fixing
a reported data corruption bug, but it seems some last minute
refactoring or rebasing introduced a null pointer deref.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Reported-by: NGabriel de Perthuis <g2p.code@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2fe80d3b

25 9月, 2013 10 次提交

bcache: Fix flushes in writeback mode · c0f04d88

由 Kent Overstreet 提交于 9月 23, 2013

In writeback mode, when we get a cache flush we need to make sure we
issue a flush to the backing device.

The code for sending down an extra flush was wrong - by cloning the bio
we were probably getting flags that didn't make sense for a bare flush,
and also the old code was firing for FUA bios, for which we don't need
to send a flush to the backing device.

This was causing data corruption somehow - the mechanism was never
determined, but this patch fixes it for the users that were seeing it.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c0f04d88

bcache: Fix for handling overlapping extents when reading in a btree node · 84786438

由 Kent Overstreet 提交于 9月 23, 2013

btree_sort_fixup() was overly clever, because it was trying to avoid
pulling a key off the btree iterator in more than one place.

This led to a really obscure bug where we'd break early from the loop in
btree_sort_fixup() if the current key overlapped with keys in more than
one older set, and the next key it overlapped with was zero size.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

84786438

bcache: Fix a shrinker deadlock · a698e08c

由 Kent Overstreet 提交于 9月 23, 2013

GFP_NOIO means we could be getting called recursively - mca_alloc() ->
mca_data_alloc() - definitely can't use mutex_lock(bucket_lock) then.
Whoops.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a698e08c

bcache: Fix a dumb CPU spinning bug in writeback · 79e3dab9

由 Kent Overstreet 提交于 9月 23, 2013

schedule_timeout() != schedule_timeout_uninterruptible()
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79e3dab9

bcache: Fix a flush/fua performance bug · 1394d676

由 Kent Overstreet 提交于 9月 23, 2013

bch_journal_meta() was missing the flush to make the journal write
actually go down (instead of waiting up to journal_delay_ms)...

Whoops
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1394d676

bcache: Fix a writeback performance regression · c2a4f318

由 Kent Overstreet 提交于 9月 23, 2013

Background writeback works by scanning the btree for dirty data and
adding those keys into a fixed size buffer, then for each dirty key in
the keybuf writing it to the backing device.

When read_dirty() finishes and it's time to scan for more dirty data, we
need to wait for the outstanding writeback IO to finish - they still
take up slots in the keybuf (so that foreground writes can check for
them to avoid races) - without that wait, we'll continually rescan when
we'll be able to add at most a key or two to the keybuf, and that takes
locks that starves foreground IO. Doh.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c2a4f318

bcache: Correct printf()-style format length modifier · 61cbd250

由 Geert Uytterhoeven 提交于 9月 23, 2013

Fix

drivers/md/bcache/btree.c: In function ‘bch_btree_node_read’:
drivers/md/bcache/btree.c:259: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘size_t’
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

61cbd250

bcache: Fix for when no journal entries are found · c426c4fd

由 Kent Overstreet 提交于 9月 23, 2013

The journal replay code didn't handle this case, causing it to go into
an infinite loop...
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c426c4fd

bcache: Strip endline when writing the label through sysfs · aee6f1cf

由 Gabriel de Perthuis 提交于 9月 23, 2013

sysfs attributes with unusual characters have crappy failure modes
in Squeeze (udev 164); later versions of udev are unaffected.

This should make these characters more unusual.
Signed-off-by: NGabriel de Perthuis <g2p.code@gmail.com>
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aee6f1cf

bcache: Fix a dumb journal discard bug · 6d9d21e3

由 Kent Overstreet 提交于 9月 23, 2013

That switch statement was obviously wrong, leading to some sort of weird
spinning on rare occasion with discards enabled...
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6d9d21e3

11 9月, 2013 1 次提交

drivers: convert shrinkers to new count/scan API · 7dc19d5a

由 Dave Chinner 提交于 8月 28, 2013

Convert the driver shrinkers to the new API.  Most changes are compile
tested only because I either don't have the hardware or it's staging
stuff.

FWIW, the md and android code is pretty good, but the rest of it makes me
want to claw my eyes out.  The amount of broken code I just encountered is
mind boggling.  I've added comments explaining what is broken, but I fear
that some of the code would be best dealt with by being dragged behind the
bike shed, burying in mud up to it's neck and then run over repeatedly
with a blunt lawn mower.

Special mention goes to the zcache/zcache2 drivers.  They can't co-exist
in the build at the same time, they are under different menu options in
menuconfig, they only show up when you've got the right set of mm
subsystem options configured and so even compile testing is an exercise in
pulling teeth.  And that doesn't even take into account the horrible,
broken code...

[glommer@openvz.org: fixes for i915, android lowmem, zcache, bcache]
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NGlauber Costa <glommer@openvz.org>
Acked-by: NMel Gorman <mgorman@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7dc19d5a

12 7月, 2013 2 次提交

bcache: Allocation kthread fixes · 79826c35

由 Kent Overstreet 提交于 7月 10, 2013

The alloc kthread should've been using try_to_freeze() - and also there
was the potential for the alloc kthread to get woken up after it had
shut down, which would have been bad.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>

79826c35

bcache: Fix GC_SECTORS_USED() calculation · 29ebf465

由 Kent Overstreet 提交于 7月 11, 2013

Part of the job of garbage collection is to add up however many sectors
of live data it finds in each bucket, but that doesn't work very well if
it doesn't reset GC_SECTORS_USED() when it starts. Whoops.

This wouldn't have broken anything horribly, but allocation tries to
preferentially reclaim buckets that are mostly empty and that's not
gonna work with an incorrect GC_SECTORS_USED() value.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10

29ebf465

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功