Commits · ee811287c9f241641899788cbfc9d70ed96ba3a5 · openeuler / Kernel

09 Jan, 2014 21 commits

bcache: Rename/shuffle various code around · ee811287

Committed by Kent Overstreet on Dec 17, 2013

More work to disentangle bset.c from the rest of the code:
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

ee811287

bcache: Add struct bset_sort_state · 67539e85

Committed by Kent Overstreet on Sep 10, 2013

More disentangling bset.c from the rest of the bcache code - soon, the
sorting routines won't have any dependencies on any outside structs.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

67539e85

bcache: Split out sort_extent_cmp() · 911c9610

Committed by Kent Overstreet on Jul 28, 2013

Only use extent comparison for comparing extents, so we're not using
START_KEY() on other key types (i.e. btree pointers)
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

911c9610

bcache: Bkey indexing renaming · fafff81c

Committed by Kent Overstreet on Dec 17, 2013

More refactoring:

node() -> bset_bkey_idx()
end() -> bset_bkey_last()
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

fafff81c

bcache: Make bch_keylist_realloc() take u64s, not nptrs · 085d2a3d

Committed by Kent Overstreet on Nov 11, 2013

Getting away from KEY_PTRS and moving toward KEY_U64s - and getting rid of magic
2s

Also - split out the part that checks against journal entry size so as to avoid
a dependency on struct cache_set in bset.c
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

085d2a3d

bcache: Remove/fix some header dependencies · 9a02b7ee

Committed by Kent Overstreet on Dec 20, 2013

In the process of disentangling/libraryizing bset.c from the rest of the
bcache code.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

9a02b7ee

bcache: Use a mempool for mergesort temporary space · 0a451145

Committed by Kent Overstreet on Dec 18, 2013

It was a single element mempool before, it's slightly cleaner to just use a real
mempool.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

0a451145

bcache: Btree verify code improvements · 78b77bf8

Committed by Kent Overstreet on Dec 17, 2013

Used this fixed code to find and fix the bug fixed by
a4d885097b0ac0cd1337f171f2d4b83e946094d4.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

78b77bf8

bcache: kill index() · 88b9f8c4

Committed by Kent Overstreet on Dec 17, 2013

That was a terrible name for a macro, add some better helpers to replace it.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

88b9f8c4

bcache: Trivial error handling fix · 5c41c8a7

Committed by Kent Overstreet on Jul 08, 2013

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

5c41c8a7

bcache/md: Use raid stripe size · c78afc62

Committed by Kent Overstreet on Jul 11, 2013

Now that we've got code for raid5/6 stripe awareness, bcache just needs
to know about the stripes and when writing partial stripes is expensive
- we probably don't want to enable this optimization for raid1 or 10,
even though they have stripes. So add a flag to queue_limits.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

c78afc62

bcache: Do bkey_put() in btree_split() error path · 5f5837d2

Committed by Kent Overstreet on Dec 16, 2013

This error path shouldn't have been hit in practice.. and we've got reworked
reserve code coming soon so that it shouldn't _ever_ be hit... but if we've got
code for this error path it should be correct.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

5f5837d2

bcache: Rework allocator reserves · 78365411

Committed by Kent Overstreet on Dec 17, 2013

We need a reserve for allocating buckets for new btree nodes - and now that
we've got multiple btrees, it really needs to be per btree.

This reworks the reserves so we've got separate freelists for each reserve
instead of watermarks, which seems to make things a bit cleaner, and it adds
some code so that btree_split() can make sure the reserve is available before it
starts.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

78365411

bcache: kill closure locking code · 1dd13c8d

Committed by Kent Overstreet on Dec 20, 2013

Also flesh out the documentation a bit
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

1dd13c8d

bcache: kill closure locking usage · cb7a583e

Committed by Kent Overstreet on Dec 16, 2013

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

cb7a583e

bcache: Zero less memory · a5ae4300

Committed by Kent Overstreet on Sep 10, 2013

Another minor performance optimization
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

a5ae4300

bcache: Don't touch bucket gen for dirty ptrs · d56d000a

Committed by Kent Overstreet on Aug 09, 2013

Unnecessary since a bucket that has dirty pointers pointing to it can
never be invalidated - and skipping it is a measurable performance
boost, since the bucket gen will usually be a cache miss.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

d56d000a

bcache: Minor btree cache fix · b0f32a56

Committed by Kent Overstreet on Dec 10, 2013

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

b0f32a56

bcache: Performance fix for when journal entry is full · 5775e213

Committed by Kent Overstreet on Dec 10, 2013

We were unnecessarily waiting on a journal write to complete when we just needed
to start a journal write and start setting up the next one.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

5775e213

bcache: Minor journal fix · b3fa7e77

Committed by Kent Overstreet on Aug 05, 2013

The real fix is where we check the bytes we need against how much is
remaining - we also need to check for a journal entry bigger than our
buffer, we'll never write those and it would be bad if we tried to read
one.

Also improve the diagnostic messages.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

b3fa7e77

bcache: Data corruption fix · ef71ec00

Committed by Kent Overstreet on Dec 17, 2013

The code that handles overlapping extents that we've just read back in from disk
was depending on the behaviour of the code that handles overlapping extents as
we're inserting into a btree node in the case of an insert that forced an
existing extent to be split: on insert, if we had to split we'd also insert a
new extent to represent the top part of the old extent - and then that new
extent would get written out.

The code that read the extents back in thus did not bother with splitting extents -
if it saw an extent that overlapped in the middle of an older extent, it would
trim the old extent to only represent the bottom part, assuming that the
original insert would've inserted a new extent to represent the top part.

I still haven't figured out _how_ it can happen, but I'm now pretty convinced
(and testing has confirmed) that there's some kind of an obscure corner case
(probably involving extent merging, and multiple overwrites in different sets)
that breaks this. The fix is to change the mergesort fixup code to split extents
itself when required.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10

ef71ec00
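
The splitting behaviour this commit message describes can be illustrated with a minimal user-space sketch. This is not the bcache mergesort fixup code: `struct extent` and `fixup_overlap()` are hypothetical names, extents are modelled as plain half-open sector ranges, and treating the "bottom" part as the lower sector range is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified extent: covers sectors [start, end). */
struct extent {
	uint64_t start;
	uint64_t end;
};

/*
 * A newer extent overwrites part of an older one. Trim `old` so that it
 * no longer overlaps `newer`. If `newer` lands strictly inside `old`,
 * the old extent is split: `old` keeps the bottom part and `*top`
 * receives the part above `newer`. Returns true only when a split
 * happened (and `*top` was written).
 */
static bool fixup_overlap(struct extent *old, const struct extent *newer,
			  struct extent *top)
{
	assert(old->start < old->end && newer->start < newer->end);

	if (newer->end <= old->start || newer->start >= old->end)
		return false;			/* no overlap at all */

	if (newer->start > old->start && newer->end < old->end) {
		top->start = newer->end;	/* top part survives */
		top->end = old->end;
		old->end = newer->start;	/* bottom part survives */
		return true;
	}

	if (newer->start <= old->start)
		/* front of old is overwritten; old may become empty */
		old->start = newer->end < old->end ? newer->end : old->end;
	else
		/* back of old is overwritten */
		old->end = newer->start;
	return false;
}
```

The point of the sketch is that the fixup path produces the top part itself instead of assuming that an earlier insert already wrote one out.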

17 Dec, 2013 10 commits

bcache: New writeback PD controller · 16749c23

Committed by Kent Overstreet on Nov 11, 2013

The old writeback PD controller could get into states where it had throttled all
the way down and take way too long to recover - it was too complicated to really
understand what it was doing.

This rewrites a good chunk of it to hopefully be simpler and make more sense,
and it also pays more attention to units which should make the behaviour a bit
easier to understand.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

16749c23

bcache: bugfix for race between moving_gc and bucket_invalidate · 6d3d1a9c

Committed by Kent Overstreet on Dec 16, 2013

There is a possibility for a bucket to be invalidated by the allocator
while moving_gc was copying its contents to another bucket, if the
bucket only held cached data. To prevent this, moving checks for
a stale ptr (to an invalidated bucket), before and after reads.
If it finds one, it simply ignores moving that data. This only
affects bcache if the moving_gc was turned on, note that it's
off by default.
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

6d3d1a9c
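
The "check before and after the read" pattern from this commit message can be sketched in user-space C as follows. This is only a model: the generation-counter scheme for detecting stale pointers, and the names `struct bucket`, `struct ptr`, `ptr_stale()`, `try_move()`, `read_ok()` and `read_raced()`, are assumptions made for illustration and are not taken from the bcache source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a bucket's generation is bumped on invalidation. */
struct bucket {
	uint8_t gen;
};

/* A pointer remembers the generation it was created against. */
struct ptr {
	size_t bucket;
	uint8_t gen;
};

static bool ptr_stale(const struct bucket *b, const struct ptr *p)
{
	return b[p->bucket].gen != p->gen;
}

/* Stand-in for the read I/O; the allocator may invalidate meanwhile. */
typedef void (*read_fn)(struct bucket *b, const struct ptr *p);

/* Returns true only if the data may safely be moved. */
static bool try_move(struct bucket *b, const struct ptr *p, read_fn do_read)
{
	assert(b && p && do_read);

	if (ptr_stale(b, p))
		return false;	/* already invalidated: skip the read */
	do_read(b, p);
	if (ptr_stale(b, p))
		return false;	/* invalidated during the read: drop it */
	return true;
}

/* A read during which nothing else happens. */
static void read_ok(struct bucket *b, const struct ptr *p)
{
	(void)b;
	(void)p;
}

/* A read that races with invalidation of the same bucket. */
static void read_raced(struct bucket *b, const struct ptr *p)
{
	b[p->bucket].gen++;
}
```

Checking only before the read would miss the racing case, which is why the second check is needed.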

bcache: fix for gc and writeback race · bf0a628a

Committed by Nicholas Swenson on Nov 26, 2013

Garbage collector needs to check keys in the writeback keybuf to
make sure it's not invalidating buckets to which the writeback
keys point.
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bf0a628a

bcache: bugfix - moving_gc now moves only correct buckets · 981aa8c0

Committed by Nicholas Swenson on Nov 07, 2013

Removed gc_move_threshold because picking buckets only by
threshold could lead to moving extra buckets (i.e. if there are
buckets at the threshold that aren't supposed to be moved
due to space considerations).

This is replaced by a GC_MOVE bit in the gc_mark bitmask.
Now only marked buckets get moved.
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

981aa8c0

bcache: fix for gc crashing when no sectors are used · bee63f40

Committed by Nicholas Swenson on Oct 31, 2013

Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bee63f40

bcache: Fix heap_peek() macro · 97d11a66

Committed by Nicholas Swenson on Oct 23, 2013

Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

97d11a66

bcache: Fix for can_attach_cache() · 9eb8ebeb

Committed by Nicholas Swenson on Oct 22, 2013

Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

9eb8ebeb

bcache: Fix dirty_data accounting · d24a6e10

Committed by Kent Overstreet on Nov 10, 2013

Dirty data accounting wasn't quite right - firstly, we were adding the key we're
inserting after it could have merged with another dirty key already in the
btree, and secondly we could sometimes pass the wrong offset to
bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is
important when tracking dirty data by stripe.

NOTE FOR BACKPORTERS: For 3.10 (and 3.11?) there's other accounting fixes
necessary that got squashed in with other patches; the full patch against 3.10
is 408cc2f47eeac93a, available at:
  git://evilpiepirate.org/~kent/linux-bcache.git bcache-3.10-writeback-fixes
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 2a46036..4a12b2f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1817,7 +1817,8 @@ static bool fix_overlapping_extents(struct btree *b, struct bkey *insert,
 			if (KEY_START(k) > KEY_START(insert) + sectors_found)
 				goto check_failed;

-			if (KEY_PTRS(replace_key) != KEY_PTRS(k))
+			if (KEY_PTRS(k) != KEY_PTRS(replace_key) ||
+			    KEY_DIRTY(k) != KEY_DIRTY(replace_key))
 				goto check_failed;

 			/* skip past gen */

d24a6e10

bcache: Use uninterruptible sleep in writeback · ce2b3f59

Committed by Kent Overstreet on Nov 28, 2013

We're just waiting on kthread_should_stop(), nothing else, so
interruptible sleep was wrong here.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

ce2b3f59

bcache: kthread don't set writeback task to INTERUPTIBLE · f665c0f8

Committed by Stefan Priebe on Nov 16, 2013

at the beginning (schedule_timeout_interruptible) and others
do this on their own

This prevents wrong load average calculation (load of 1 per thread)
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

f665c0f8

14 Dec, 2013 2 commits

dm array: fix a reference counting bug in shadow_ablock · ed9571f0

Committed by Joe Thornber on Dec 13, 2013

An old array block could have its reference count decremented below
zero when it is being replaced in the btree by a new array block.

The fix is to increment the old ablock's reference count just before
inserting a new ablock into the btree.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+

ed9571f0

dm space map: disallow decrementing a reference count below zero · 5b564d80

Committed by Joe Thornber on Dec 13, 2013

The old behaviour, returning -EINVAL if a ref_count of 0 would be
decremented, was removed in commit f722063e ("dm space map: optimise
sm_ll_dec and sm_ll_inc").  To fix this regression we return an error
code from the mutator function pointer passed to sm_ll_mutate() and have
dec_ref_count() return -EINVAL if the old ref_count is 0.

Add a DMERR to reflect the potential seriousness of this error.

Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path.

With this fix the following dmts regression test now passes:
 dmtest run --suite cache -n /metadata_use_kernel/

The next patch fixes the higher-level dm-array code that exposed this
regression.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.12+

5b564d80
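
The idea of letting the mutator report an error, as this commit message describes, can be sketched in user-space C. This is a simplified model and not the dm persistent-data code: the `mutate_fn` signature and the `mutate()` helper are hypothetical, and `inc_ref_count()`/`dec_ref_count()` here only borrow the names mentioned in the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mutator signature: computes the new count or fails. */
typedef int (*mutate_fn)(uint32_t old, uint32_t *new_count);

static int inc_ref_count(uint32_t old, uint32_t *new_count)
{
	*new_count = old + 1;
	return 0;
}

static int dec_ref_count(uint32_t old, uint32_t *new_count)
{
	if (old == 0)
		return -EINVAL;	/* refuse to decrement below zero */
	*new_count = old - 1;
	return 0;
}

/* Apply a mutator to one counter; leave the counter untouched on error. */
static int mutate(uint32_t *count, mutate_fn fn)
{
	uint32_t new_count;
	int r;

	assert(count && fn);

	r = fn(*count, &new_count);
	if (r)
		return r;	/* propagate the mutator's error code */
	*count = new_count;
	return 0;
}
```

Because the error comes back through the mutator's return value, the caller can unwind (for example, release a lock) before returning it.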

11 Dec, 2013 7 commits

dm stats: initialize read-only module parameter · 76f5bee5

Committed by Mikulas Patocka on Dec 05, 2013

The module parameter stats_current_allocated_bytes in dm-mod is
read-only.  This parameter informs the user about memory
consumption.  It is not supposed to be changed by the user.

However, despite being read-only, this parameter can be set on
modprobe or insmod command line:
modprobe dm-mod stats_current_allocated_bytes=12345

The kernel doesn't expect that this variable can be non-zero at module
initialization and if the user sets it, it results in warning.

This patch initializes the variable in the module init routine, so
that user-supplied value is ignored.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.12+

76f5bee5

dm bufio: initialize read-only module parameters · 4cb57ab4

Committed by Mikulas Patocka on Dec 05, 2013

Some module parameters in dm-bufio are read-only. These parameters
inform the user about memory consumption. They are not supposed to be
changed by the user.

However, despite being read-only, these parameters can be set on
modprobe or insmod command line, for example:
modprobe dm-bufio current_allocated_bytes=12345

The kernel doesn't expect that these variables can be non-zero at module
initialization and if the user sets them, it results in BUG.

This patch initializes the variables in the module init routine, so that
user-supplied values are ignored.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.2+

4cb57ab4

dm cache: actually resize cache · 08844800

Committed by Vincent Pelletier on Nov 30, 2013

Commit f494a9c6 ("dm cache: cache
shrinking support") broke cache resizing support.

dm_cache_resize() is called with cache->cache_size before it gets
updated to new_size, so it is a no-op.  But the dm-cache superblock is
updated with the new_size even though the backing dm-array is not
resized.  Fix this by passing the new_size to dm_cache_resize().
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

08844800
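
The no-op described in this commit message is easy to reproduce in a toy model. The sketch below is user-space C with hypothetical names (`struct backing_array`, `struct cache`, `array_resize()`, `resize_buggy()`, `resize_fixed()`); it only mirrors the ordering problem and is not the dm-cache code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the backing array and the cache. */
struct backing_array {
	uint32_t nr_entries;
};

struct cache {
	uint32_t cache_size;
	struct backing_array array;
};

static void array_resize(struct backing_array *a, uint32_t nr_entries)
{
	a->nr_entries = nr_entries;
}

/* Buggy ordering: the array is "resized" to the size it already has. */
static void resize_buggy(struct cache *c, uint32_t new_size)
{
	assert(c);
	array_resize(&c->array, c->cache_size);	/* still the old size */
	c->cache_size = new_size;		/* recorded size moves on */
}

/* Fixed ordering: pass the requested size to the resize call. */
static void resize_fixed(struct cache *c, uint32_t new_size)
{
	assert(c);
	array_resize(&c->array, new_size);
	c->cache_size = new_size;
}
```

After `resize_buggy()` the recorded size and the backing array disagree, which is the inconsistency the commit message points out.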

dm cache policy mq: fix promotions to occur as expected · af95e7a6

Committed by Joe Thornber on Nov 15, 2013

Micro benchmarks that repeatedly issued IO to a single block were
failing to cause a promotion from the origin device to the cache.  Fix
this by not updating the stats during map() if -EWOULDBLOCK will be
returned.

The mq policy will only update stats, consider migration, etc, once per
tick period (a unit of time established between dm-cache core and the
policies).

When the IO thread calls the policy's map method, if it would like to
migrate the associated block it returns -EWOULDBLOCK, the IO then gets
handed over to a worker thread which handles the migration.  The worker
thread calls map again, to check the migration is still needed (avoids a
race among other things).  *BUT*, before this fix, if we were still in
the same tick period the stats were already updated by the previous map
call -- so the migration would no longer be requested.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

af95e7a6

dm thin: allow pool in read-only mode to transition to read-write mode · 9b7aaa64

Committed by Joe Thornber on Dec 04, 2013

A thin-pool may be in read-only mode because the pool's data or metadata
space was exhausted.  To allow for recovery, by adding more space to the
pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE
mode.  Otherwise, running out of space will render the pool permanently
read-only.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

9b7aaa64

dm thin: re-establish read-only state when switching to fail mode · 5383ef3a

Committed by Joe Thornber on Dec 04, 2013

If the thin-pool transitioned to fail mode and the thin-pool's table
were reloaded for some reason: the new table's default pool mode would
be read-write, though it will transition to fail mode during resume.

When the pool mode transitions directly from PM_WRITE to PM_FAIL we need
to re-establish the intermediate read-only state in both the metadata
and persistent-data block manager (as is usually done with the normal
pool mode transition sequence: PM_WRITE -> PM_READ_ONLY -> PM_FAIL).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

5383ef3a

dm thin: always fallback the pool mode if commit fails · 020cc3b5

Committed by Joe Thornber on Dec 04, 2013

Rename commit_or_fallback() to commit().  Now all previous calls to
commit() will trigger the pool mode to fallback if the commit fails.

Also, check the error returned from commit() in alloc_data_block().
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

020cc3b5

openeuler / Kernel synced successfully more than 1 year ago