Commits · 134bf30c06f057d6b8d90132e8f8b3cd2be79572 · openeuler / Kernel

27 Jul, 2015 2 commits

dm cache policy smq: fix alloc_bitset check that always evaluates as false · 134bf30c

Committed by Colin Ian King on Jul 23, 2015

static analysis by cppcheck has found a check on alloc_bitset that
always evaluates as false and hence never finds an allocation failure:

[drivers/md/dm-cache-policy-smq.c:1689]: (warning) Logical conjunction
  always evaluates to false: !EXPR && EXPR.

Fix this by removing the incorrect mq->cache_hit_bits check
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
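
The warning above can be illustrated with a minimal user-space sketch. The helper names are hypothetical and this is not the driver code; it only shows why a condition of the shape `!EXPR && EXPR` can never report an allocation failure, and what the check looks like once the second operand is removed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper with the shape cppcheck flagged: a pointer
 * cannot be both NULL and non-NULL, so this is always false. */
static bool alloc_failed_buggy(const void *p)
{
    return !p && p;
}

/* Same check with the contradictory operand removed. */
static bool alloc_failed_fixed(const void *p)
{
    return !p;
}
```

With a NULL pointer the first helper still returns false, so a failed allocation would go unnoticed; the second helper returns true.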

134bf30c

dm thin: return -ENOSPC when erroring retry list due to out of data space · 0a927c2f

Committed by Mike Snitzer on Jul 21, 2015

Otherwise -EIO would be returned when -ENOSPC should be used
consistently.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

0a927c2f

24 Jul, 2015 6 commits

md/raid5: clear R5_NeedReplace when no longer needed. · e6030cb0

Committed by NeilBrown on Jul 17, 2015

This flag is currently never cleared, which can in rare cases
trigger a warn-on if it is still set but the block isn't
InSync.

So clear it when it isn't needed, which includes if the replacement
device has failed.
Signed-off-by: NeilBrown <neilb@suse.com>

e6030cb0

Fix read-balancing during node failure · 90382ed9

Committed by Goldwyn Rodrigues on Jun 24, 2015

During a node failure, we need to suspend read balancing so that the
reads are directed to the first device and stale data is not read.
Suspending writes is not required because these would be recorded and
synced eventually.

A new flag MD_CLUSTER_SUSPEND_READ_BALANCING is set in recover_prep().
area_resyncing() will respond true for the entire devices if this
flag is set and the request type is READ. The flag is cleared
in recover_done().
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reported-By: David Teigland <teigland@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>

90382ed9

md-cluster: fix bitmap sub-offset in bitmap_read_sb · 33e38ac6

Committed by Goldwyn Rodrigues on Jul 01, 2015

bitmap_read_sb is modifying mddev->bitmap_info.offset. This works for
the first bitmap read. However, when multiple bitmaps need to be opened
by the same node, it ends up corrupting the offset. Fix it by using a
local variable.

Also, bitmap_read_sb is not required in bitmap_copy_from_slot since
it is called in bitmap_create. Remove bitmap_read_sb().
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

33e38ac6

md: Return error if request_module fails and returns positive value · b0c26a79

Committed by Goldwyn Rodrigues on Jul 22, 2015

request_module() can return 256 (process exited) in some cases,
which is not as specified in the documentation before the
request_module() definition. Convert the error to -ENOENT.

The positive error number results in bitmap_create() returning
a value that is meant to be an error but doesn't look like one,
so it is dereferenced as a pointer and causes a crash.

(not needed for stable as this is "experimental" code)
Fixes: edb39c9d ("Introduce md_cluster_operations to handle cluster functions")
Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
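
The failure mode described above can be sketched in user space. The helpers below imitate the kernel's error-pointer convention (small negative errno values encoded in a pointer); the names `err_ptr`, `is_err` and `normalize_module_result` are hypothetical stand-ins and the exact shape of the patch is not shown in the message, so treat this as an illustration under those assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode an error code in a pointer, in the style of the kernel's ERR_PTR(). */
static inline void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Only the top MAX_ERRNO addresses (negative errno values) count as errors. */
static inline bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Assumed normalisation: map any positive "exit status" style result
 * to -ENOENT so callers see a recognisable error; pass others through. */
static int normalize_module_result(int ret)
{
    return ret > 0 ? -ENOENT : ret;
}
```

A value such as 256 wrapped by `err_ptr()` does not satisfy `is_err()`, so a caller would treat it as a valid pointer and dereference it; after normalisation the same result is recognised as an error.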

b0c26a79

md: Skip cluster setup in case of error while reading bitmap · f7357273

Committed by Goldwyn Rodrigues on Jul 22, 2015

If the bitmap read fails, the error code set is -EINVAL. However,
we don't check for errors and go ahead with cluster_setup.
Skip the cluster setup in case of error.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

f7357273

md/raid1: fix test for 'was read error from last working device'. · 34cab6f4

Committed by NeilBrown on Jul 24, 2015

When we get a read error from the last working device, we don't
try to repair it, and don't fail the device.  We simply report a
read error to the caller.

However the current test for 'is this the last working device' is
wrong.
When there is only one fully working device, it assumes that a
non-faulty device is that device.  However a spare which is rebuilding
would be non-faulty but not the only working device.

So change the test from "!Faulty" to "In_sync".  If ->degraded says
there is only one fully working device and this device is in_sync,
this must be the one.

This bug has existed since we allowed read_balance to read from
a recovering spare in v3.0
Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Fixes: 76073054 ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>
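
The difference between the two tests can be shown with a small user-space model. The flag bits, the `struct dev` type and the `degraded == raid_disks - 1` comparison are assumptions made for this sketch; the message only states that the test changes from "!Faulty" to "In_sync".

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the per-device state flags. */
enum { FLAG_FAULTY = 1u << 0, FLAG_IN_SYNC = 1u << 1 };

struct dev { unsigned flags; };

/* Old test: any non-faulty device is assumed to be the last working one. */
static bool is_last_working_old(int degraded, int raid_disks, const struct dev *d)
{
    return degraded == raid_disks - 1 && !(d->flags & FLAG_FAULTY);
}

/* New test: the device must actually be in sync. */
static bool is_last_working_new(int degraded, int raid_disks, const struct dev *d)
{
    return degraded == raid_disks - 1 && (d->flags & FLAG_IN_SYNC) != 0;
}
```

A rebuilding spare has neither flag set, so the old test wrongly accepts it while the new test rejects it; a fully in-sync device passes both.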

34cab6f4

23 Jul, 2015 1 commit

md: Skip cluster setup for dm-raid · d3b178ad

Committed by Goldwyn Rodrigues on Jul 22, 2015

There is a bug that the bitmap superblock isn't initialised properly for
dm-raid, so new fields can contain garbage.
(dm-raid does initialisation in the kernel - md initialised the
 superblock in mdadm).

This means that for dm-raid we cannot currently trust the new ->nodes
field. So:
 - use __GFP_ZERO to initialise the superblock properly for all new
    arrays
 - initialise all fields in bitmap_info in bitmap_new_disk_sb
 - ignore ->nodes for dm arrays (yes, this is a hack)

This bug exposes dm-raid to a bug in the (still experimental) md-cluster
code, so it is suitable for -stable.  It does cause crashes.

References: https://bugzilla.kernel.org/show_bug.cgi?id=100491
Cc: stable@vger.kernel.org (v4.1)
Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

d3b178ad

22 Jul, 2015 3 commits

md: flush ->event_work before stopping array. · ee5d004f

Committed by NeilBrown on Jul 22, 2015

The 'event_work' worker used by dm-raid may still be running
when the array is stopped.  This can result in an oops.

So flush the workqueue on which it is run after detaching
and before destroying the device.
Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org (2.6.38+ please delay 2 weeks after -final release)
Fixes: 9d09e663 ("dm: raid456 basic support")

ee5d004f

md/raid10: always set reshape_safe when initializing reshape_position. · 299b0685

Committed by NeilBrown on Jul 06, 2015

'reshape_position' tracks where in the reshape we have reached.
'reshape_safe' tracks where in the reshape we have safely recorded
in the metadata.

These are compared to determine when to update the metadata.
So it is important that reshape_safe is initialised properly.
Currently it isn't.  When starting a reshape from the beginning
it usually has the correct value by luck.  But when reducing the
number of devices in a RAID10, it has the wrong value and this leads
to the metadata not being updated correctly.
This can lead to corruption if the reshape is not allowed to complete.

This patch is suitable for any -stable kernel which supports RAID10
reshape, which is 3.5 and later.

Fixes: 3ea7daa5 ("md/raid10: add reshape support")
Cc: stable@vger.kernel.org (v3.5+ please wait for -final to be out for 2 weeks)
Signed-off-by: NeilBrown <neilb@suse.com>

299b0685

md/raid5: avoid races when changing cache size. · 2d5b569b

Committed by NeilBrown on Jul 06, 2015

Cache size can grow or shrink due to various pressures at
any time.  So when we resize the cache as part of a 'grow'
operation (i.e. change the size to allow more devices) we need
to block that automatic growing/shrinking.

So introduce a mutex.  auto grow/shrink uses mutex_trylock()
and just doesn't bother if there is a blockage.
Resizing the whole cache holds the mutex to ensure that
the correct number of new stripes is allocated.

This bug can result in some stripes not being freed when an
array is stopped.  This leads to the kmem_cache not being
freed and a subsequent array can try to use the same kmem_cache
and get confused.

Fixes: edbe83ab ("md/raid5: allow the stripe_cache to grow and shrink.")
Cc: stable@vger.kernel.org (4.1 - please delay until 2 weeks after release of 4.2)
Signed-off-by: NeilBrown <neilb@suse.com>
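
The locking scheme described above (opportunistic try-lock for automatic adjustment, exclusive hold for a full resize) can be sketched in user space. A C11 `atomic_flag` stands in for the kernel mutex here, and the struct and function names are hypothetical; unlike a real mutex, this stand-in never blocks, so it only models the try-lock side faithfully.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the stripe cache and its resize lock. */
struct stripe_cache {
    atomic_flag resize_lock;
    int nr_stripes;
};

/* Returns true if the lock was taken, false if someone else holds it. */
static bool cache_trylock(struct stripe_cache *c)
{
    return !atomic_flag_test_and_set(&c->resize_lock);
}

static void cache_unlock(struct stripe_cache *c)
{
    atomic_flag_clear(&c->resize_lock);
}

/* Automatic growth: give up quietly if a resize is in progress. */
static bool auto_grow(struct stripe_cache *c)
{
    if (!cache_trylock(c))
        return false;
    c->nr_stripes++;
    cache_unlock(c);
    return true;
}
```

While a full resize holds the lock, `auto_grow()` returns false and leaves `nr_stripes` untouched, so the resize sees a stable stripe count.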

2d5b569b

17 Jul, 2015 3 commits

dm cache: avoid calls to prealloc_free_structs() if possible · 665022d7

Committed by Mike Snitzer on Jul 16, 2015

If no work was performed then prealloc_data_structs() wasn't ever called
so there isn't any need to call prealloc_free_structs().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

665022d7

dm cache: avoid preallocation if no work in writeback_some_dirty_blocks() · e782eff5

Committed by Mike Snitzer on Jul 16, 2015

Refactor writeback_some_dirty_blocks() to avoid prealloc_data_structs()
if the policy doesn't have any dirty blocks ready for writeback.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

e782eff5

dm cache: do not wake_worker() in free_migration() · 386cb7cd

Committed by Mike Snitzer on Jul 16, 2015

All methods that queue work call wake_worker() as you'd expect.
E.g. cell_defer, defer_bio, quiesce_migration (which is called by
writeback, promote, demote_then_promote, invalidate, discard, etc).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

386cb7cd

16 Jul, 2015 3 commits

dm cache: display 'needs_check' in status if it is set · 255eac20

Committed by Mike Snitzer on Jul 15, 2015

There is currently no way to see that the needs_check flag has been set
in the metadata.  Display 'needs_check' in the cache status if it is set
in the cache metadata.

Also, update cache documentation.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

255eac20

dm thin: display 'needs_check' in status if it is set · e4c78e21

Committed by Mike Snitzer on Jul 15, 2015

There is currently no way to see that the needs_check flag has been set
in the metadata.  Display 'needs_check' in the thin-pool status if it is
set in the thinp metadata.

Also, update thinp documentation.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

e4c78e21

dm thin: stay in out-of-data-space mode once no_space_timeout expires · bcc696fa

Committed by Mike Snitzer on Jul 15, 2015

This fixes an issue where running out of data space would cause the
thin-pool's metadata to become read-only. There was no reason to make
metadata read-only -- calling set_pool_mode() with PM_READ_ONLY was a
misguided way to error all queued and future write IOs. We can
accomplish the same by degrading from PM_OUT_OF_DATA_SPACE to
PM_OUT_OF_DATA_SPACE with error_if_no_space enabled.

Otherwise, the use of PM_READ_ONLY could cause a race where commit() was
started before the PM_READ_ONLY transition but dm_pool_commit_metadata()
would go on to fail because the block manager had transitioned to
read-only. The return of -EPERM from dm_pool_commit_metadata(), due to
attempting to commit while in read-only mode, caused the thin-pool to
set 'needs_check' because a metadata_operation_failed(). This needless
cascade of failures makes life for users more difficult than needed.
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

bcc696fa

13 Jul, 2015 1 commit

dm: fix use after free crash due to incorrect cleanup sequence · b06075a9

Committed by Mikulas Patocka on Jul 10, 2015

Linux 4.2-rc1 Commit 0f20972f ("dm: factor out a common
cleanup_mapped_device()") moved a common cleanup code to a separate
function.  Unfortunately, that commit incorrectly changed the order of
cleanup, so that it destroys the mapped_device's srcu structure
'io_barrier' before destroying its workqueue.

The function that is executed on the workqueue (dm_wq_work) uses the srcu
structure, thus it may use it after being freed.  That results in a
crash in the LVM test suite's mirror-vgreduce-removemissing.sh test.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 0f20972f ("dm: factor out a common cleanup_mapped_device()")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b06075a9

11 Jul, 2015 1 commit

bcache: don't embed 'return' statements in closure macros · 77b5a084

Committed by Jens Axboe on Mar 06, 2015

This is horribly confusing, it breaks the flow of the code without
it being apparent in the caller.
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Christoph Hellwig <hch@lst.de>

77b5a084

09 Jul, 2015 1 commit

Revert "dm: only run the queue on completion if congested or no requests pending" · 621739b0

Committed by Mike Snitzer on Jul 08, 2015

This reverts commit 9a0e609e.
(Resolved a conflict during revert due to commit bfebd1cd that came
after)

This revert is motivated by a couple failure reports on request-based DM
multipath testbeds:
1) Netapp reported that their multipath fault injection test under heavy
   IO load can stall longer than 300 seconds.
2) IBM reported elevated lock contention in their testbed (likely due to
   increased back pressure due to IO not being dispatched as quickly):
   https://www.redhat.com/archives/dm-devel/2015-July/msg00057.html
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.1+

621739b0

06 Jul, 2015 3 commits

dm btree: silence lockdep lock inversion in dm_btree_del() · 1c751879

Committed by Joe Thornber on Jul 03, 2015

Allocate memory using GFP_NOIO when deleting a btree.  dm_btree_del()
can be called via an ioctl and we don't want to recurse into the FS or
block layer.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

1c751879

dm thin: allocate the cell_sort_array dynamically · a822c83e

Committed by Joe Thornber on Jul 03, 2015

Given the pool's cell_sort_array holds 8192 pointers it triggers an
order 5 allocation via kmalloc.  This order 5 allocation is prone to
failure as system memory gets more fragmented over time.

Fix this by allocating the cell_sort_array using vmalloc.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
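
The allocation-order arithmetic behind this message can be worked through with a short sketch. It assumes 8-byte pointers and 4 KiB pages, neither of which is stated in the message, and `alloc_order` is a hypothetical helper rather than a kernel function. Under those assumptions 8192 pointers occupy exactly 64 KiB (16 pages, order 4), and any additional bytes in the same contiguous allocation, for instance if the array is embedded in a larger structure, push the request to order 5.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Smallest order such that (1 << order) pages cover the request. */
static unsigned alloc_order(size_t bytes)
{
    size_t pages = (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    unsigned order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

An order 5 request needs 32 physically contiguous pages, which becomes harder to satisfy as memory fragments; a vmalloc-style allocation only needs the pages to be virtually contiguous.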

a822c83e

dm btree remove: fix bug in redistribute3 · 4c7e3093

Committed by Dennis Yang on Jun 26, 2015

redistribute3() shares entries out across 3 nodes.  Some entries were
being moved the wrong way, breaking the ordering.  This manifested as a
BUG() in dm-btree-remove.c:shift() when entries were removed from the
btree.

For additional context see:
https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html
Signed-off-by: Dennis Yang <shinrairis@gmail.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

4c7e3093

01 Jul, 2015 2 commits

MAINTAINERS: BCACHE: Kent Overstreet has changed email address · d1aa1ab3

Committed by Joe Perches on Jun 30, 2015

Kent's email address in MAINTAINERS seems to be invalid.
This was his last sign-off address, so use that if appropriate.

Fix the S: status entry while there.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d1aa1ab3

bcache: use kvfree() in various places · 958b4338

Committed by Pekka Enberg on Jun 30, 2015

Use kvfree() instead of open-coding it.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

958b4338

26 Jun, 2015 4 commits

dm cache policy smq: fix "default" version to be 1.4.0 · b5451e45

Committed by Mike Snitzer on Jun 26, 2015

Commit bccab6a0 ("dm cache: switch the "default" cache replacement
policy from mq to smq") should've incremented the "default" policy's
version number to 1.4.0 rather than reverting to version 1.0.0.
Reported-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b5451e45

Revert "block, dm: don't copy bios for request clones" · 78d8e58a

Committed by Mike Snitzer on Jun 26, 2015

This reverts commit 5f1b670d.

Justification for revert as reported in this dm-devel post:
https://www.redhat.com/archives/dm-devel/2015-June/msg00160.html

this change should not be pushed to mainline yet.

Firstly, Christoph has a newer version of the patch that fixes silent
data corruption problem:
  https://www.redhat.com/archives/dm-devel/2015-May/msg00229.html

And the new version still depends on LLDDs to always complete requests
to the end when error happens, while block API doesn't enforce such a
requirement. If the assumption is ever broken, the inconsistency between
request and bio (e.g. rq->__sector and rq->bio) will cause silent data
corruption:
  https://www.redhat.com/archives/dm-devel/2015-June/msg00022.html
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

78d8e58a

Revert "dm: do not allocate any mempools for blk-mq request-based DM" · 4e6e36c3

Committed by Mike Snitzer on Jun 26, 2015

This reverts commit cbc4e3c1.
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

4e6e36c3

drivers/md/md.c: use strreplace() · 90a9befb

Committed by Rasmus Villemoes on Jun 25, 2015

There's no point in starting over when we meet a '/'.  This also
eliminates a stack variable and a little .text.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
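
The single-pass replacement that the message alludes to can be sketched in user space as follows. `replace_char` is a hypothetical stand-in written for illustration, not the kernel's strreplace() itself, and the device-style name in the usage note is made up.

```c
/* Replace every occurrence of 'old' with 'new_ch' in one pass over the
 * string, never restarting from the beginning. Returns a pointer to the
 * terminating NUL. */
static char *replace_char(char *s, char old, char new_ch)
{
    for (; *s; ++s)
        if (*s == old)
            *s = new_ch;
    return s;
}
```

For example, applying `replace_char(name, '/', '!')` to a buffer holding `"cciss/c0d0"` leaves `"cciss!c0d0"` in place and returns a pointer to the end of the string.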

90a9befb

25 Jun, 2015 3 commits

md: clear Blocked flag on failed devices when array is read-only. · ab16bfc7

Committed by Neil Brown on Jun 17, 2015

The Blocked flag indicates that a device has failed but that this
fact hasn't been recorded in the metadata yet.  Writes to such
devices cannot be allowed until the metadata has been updated.

On a read-only array, the Blocked flag will never be cleared.
This prevents the device being removed from the array.

If the metadata is being handled by the kernel
(i.e. !mddev->external), then we can be sure that if the array is
switched to writable, then a metadata update will happen and will
record the failure.  So we don't need the flag set.

If metadata is externally managed, it is up to the external manager
to clear the 'blocked' flag.
Reported-by: XiaoNi <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

ab16bfc7

md: unlock mddev_lock on an error path. · 9a8c0fa8

Committed by NeilBrown on Jun 25, 2015

This error path returns while still holding the lock - bad.

Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
Cc: stable@vger.kernel.org (v4.0+)
Signed-off-by: NeilBrown <neilb@suse.com>

9a8c0fa8

md: clear mddev->private when it has been freed. · bd691922

Committed by NeilBrown on Jun 25, 2015

If ->private is set when ->run is called, it is assumed to be
a 'config'  prepared as part of 'reshape'.

So it is important when we free that config, that we also clear ->private.
This is not often a problem as the mddev will normally be discarded
shortly after the config is freed.
However if an 'assemble' races with a final close, the assemble can use
the old mddev which has a stale ->private.  This leads to any of
various sorts of crashes.

So clear ->private after calling ->free().
Reported-by: Nate Clark <nate@neworld.us>
Cc: stable@vger.kernel.org (v4.0+)
Fixes: afa0f557 ("md: rename ->stop to ->free")
Signed-off-by: NeilBrown <neilb@suse.com>

bd691922

24 Jun, 2015 2 commits

vfs: add seq_file_path() helper · 2726d566

Committed by Miklos Szeredi on Jun 19, 2015

Turn
	seq_path(..., &file->f_path, ...);
into
	seq_file_path(..., file, ...);
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2726d566

vfs: add file_path() helper · 9bf39ab2

Committed by Miklos Szeredi on Jun 19, 2015

Turn
	d_path(&file->f_path, ...);
into
	file_path(file, ...);
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9bf39ab2

18 Jun, 2015 5 commits

dm stats: add support for request-based DM devices · e262f347

Committed by Mikulas Patocka on Jun 09, 2015

This makes it possible to use dm stats with DM multipath.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

e262f347

dm stats: collect and report histogram of IO latencies · dfcfac3e

Committed by Mikulas Patocka on Jun 09, 2015

Add an option to dm statistics to collect and report a histogram of
IO latencies.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dfcfac3e

dm stats: support precise timestamps · c96aec34

Committed by Mikulas Patocka on Jun 09, 2015

Make it possible to use precise timestamps with nanosecond granularity
in dm statistics.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

c96aec34

dm stats: fix divide by zero if 'number_of_areas' arg is zero · dd4c1b7d

Committed by Mikulas Patocka on Jun 05, 2015

If the number_of_areas argument was zero the kernel would crash on
div-by-zero.  Add better input validation.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.12+
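
The kind of input validation described above can be sketched as follows. The function name, its parameters and the round-up division are assumptions made for illustration; the message only says that a zero `number_of_areas` must be rejected before it is used as a divisor.

```c
#include <errno.h>

/* Hypothetical helper: split a region of 'len' sectors into
 * 'number_of_areas' areas and report the per-area step size.
 * Rejects a zero divisor instead of dividing by it. */
static int area_step(unsigned long long len,
                     unsigned long long number_of_areas,
                     unsigned long long *step)
{
    if (number_of_areas == 0)
        return -EINVAL;

    /* Round up so the areas cover the whole region. */
    *step = (len + number_of_areas - 1) / number_of_areas;
    return 0;
}
```

Calling `area_step(100, 0, &step)` returns `-EINVAL` without touching `step`, while `area_step(100, 8, &step)` succeeds with a step of 13.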

dd4c1b7d

dm cache: switch the "default" cache replacement policy from mq to smq · bccab6a0

Committed by Mike Snitzer on Jun 17, 2015

The Stochastic multiqueue (SMQ) policy (vs MQ) offers the promise of
less memory utilization, improved performance and increased adaptability
in the face of changing workloads.  SMQ also does not have any
cumbersome tuning knobs.

Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target.  Doing so will cause all of the
mq policy's hints to be dropped.  Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.

In the future the "mq" policy will just silently make use of "smq" and
the mq code will be removed.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>

bccab6a0

openeuler / Kernel synced successfully more than 1 year ago