1. 25 July 2013, 2 commits
    • md/raid5: fix interaction of 'replace' and 'recovery'. · f94c0b66
      Committed by NeilBrown
      If a device in a RAID4/5/6 is being replaced while another is being
      recovered, then the writes to the replacement device currently don't
      happen, resulting in corruption when the replacement completes and the
      new drive takes over.
      
      This is because the replacement writes are only triggered when
      's.replacing' is set and not when the similar 's.sync' is set (which
      is the case during resync and recovery - it means all devices need to
      be read).
      
      So schedule those writes when s.replacing is set as well.
      
      In this case we cannot use "STRIPE_INSYNC" to record that the
      replacement has happened as that is needed for recording that any
      parity calculation is complete.  So introduce STRIPE_REPLACED to
      record if the replacement has happened.
      
      For safety we should also check that STRIPE_COMPUTE_RUN is not set.
      This has a similar effect to the "s.locked == 0" test.  The latter
      ensures that no IO has been flagged but not started.  The former
      checks whether any parity calculation has been flagged but not started.
      We must wait for both of these to complete before triggering the
      'replace'.
      
      Add a similar test to the subsequent check for "are we finished yet".
      This possibly isn't needed (is subsumed in the STRIPE_INSYNC test),
      but it makes it more obvious that the REPLACE will happen before we
      think we are finished.
      
      Finally if a NeedReplace device is not UPTODATE then that is an
      error.  We really must trigger a warning.
      
      This bug was introduced in commit 9a3e1101
      (md/raid5:  detect and handle replacements during recovery.)
      which introduced replacement for raid5.
      That was in 3.3-rc3, so any stable kernel since then would benefit
      from this fix.
      
      Cc: stable@vger.kernel.org (3.3+)
      Reported-by: qindehua <13691222965@163.com>
      Tested-by: qindehua <qindehua@163.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      f94c0b66
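      A minimal, compilable sketch of the gate described above, using stand-in
      types and the flag semantics named in the commit message (an
      illustration only, not the upstream raid5.c code):

        #include <stdbool.h>
        #include <stdio.h>

        /* Stand-in for the relevant stripe state. */
        struct stripe_state {
                bool replacing;    /* a 'replace' wants this stripe          */
                bool sync;         /* resync/recovery reads the whole stripe */
                int  locked;       /* IOs flagged but not yet completed      */
                bool compute_run;  /* a parity calculation has been flagged  */
                bool replaced;     /* replacement writes already scheduled   */
        };

        /* Replacement writes may be scheduled only once no IO and no parity
         * calculation is outstanding, and only once per stripe; the fix is to
         * trigger on 'sync' (resync/recovery) as well as 'replacing'. */
        static bool can_schedule_replacement_writes(const struct stripe_state *s)
        {
                return (s->replacing || s->sync) &&
                       s->locked == 0 &&
                       !s->compute_run &&
                       !s->replaced;
        }

        int main(void)
        {
                struct stripe_state s = { .sync = true, .replacing = true };
                printf("schedule replacement writes: %s\n",
                       can_schedule_replacement_writes(&s) ? "yes" : "no");
                return 0;
        }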
    • md/raid10: remove use-after-free bug. · 0eb25bb0
      Committed by NeilBrown
      We always need to be careful when calling generic_make_request, as it
      can start a chain of events which might free something that we are
      using.
      
      Here is one place I wasn't careful enough.  If the wbio2 is not in
      use, then it might get freed at the first generic_make_request call.
      So perform all necessary tests first.
      
      This bug was introduced in 3.3-rc3 (24afd80d) and can cause an
      oops, so the fix is suitable for any -stable kernel since then.
      
      Cc: stable@vger.kernel.org (3.3+)
      Signed-off-by: NeilBrown <neilb@suse.de>
      0eb25bb0
  2. 18 July 2013, 3 commits
    • md/raid1: fix bio handling problems in process_checks() · 30bc9b53
      Committed by NeilBrown
      The recent change to use bio_copy_data() in raid1 when repairing
      an array is faulty.
      
      The underlying layer may have changed the bio in various ways using
      bio_advance and these need to be undone not just for the 'sbio' which
      is being copied to, but also the 'pbio' (primary) which is being
      copied from.
      
      So perform the reset on all bios that were read from and do it early.
      
      This also ensures that the sbio->bi_io_vec[j].bv_len passed to
      memcmp is correct.
      
      This fixes a crash during a 'check' of a RAID1 array.  The crash was
      introduced in 3.10 so this is suitable for 3.10-stable.
      
      Cc: stable@vger.kernel.org (3.10)
      Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      30bc9b53
    • md: Remove recent change which allows devices to skip recovery. · 5024c298
      Committed by NeilBrown
      commit 7ceb17e8
          md: Allow devices to be re-added to a read-only array.
      
      allowed a bit more than just that.  It also allows devices to be added
      to a read-write array and to end up skipping recovery.
      
      This patch removes the offending piece of code pending a rewrite for a
      subsequent release.
      
      More specifically:
       If the array has a bitmap, then the device will still need a bitmap
       based resync ('saved_raid_disk' is set under different conditions
       if a bitmap is present).
       If the array doesn't have a bitmap, then this is correct as long as
       nothing has been written to the array since the metadata was checked
       by ->validate_super.  However there is no locking to ensure that there
       was no write.
      
      Bug was introduced in 3.10 and causes data corruption so
      patch is suitable for 3.10-stable.
      
      Cc: stable@vger.kernel.org (3.10)
      Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      5024c298
    • md/raid10: fix two problems with RAID10 resync. · 7bb23c49
      Committed by NeilBrown
      1/ When a difference between blocks is found, data is copied from
         one bio to the other.  However bv_len is used as the length to
         copy and this could be zero.  So use r10_bio->sectors to calculate
         length instead.
         Using bv_len was probably always a bit dubious, but the introduction
         of bio_advance made it much more likely to be a problem.
      
      2/ When preparing some blocks for sync, we don't set BIO_UPTODATE
         except on bios that we schedule for a read.  This ensures that
         missing/failed devices don't confuse the loop at the top of
         sync_request_write().
         Commit 8be185f2 "raid10: Use bio_reset()"
         removed a loop which set BIO_UPTODATE on all appropriate bios.
         So we need to re-add that flag.
      
      These bugs were introduced in 3.10, so this patch is suitable for
      3.10-stable, and removes a potential source of data corruption.
      
      Cc: stable@vger.kernel.org (3.10)
      Reported-by: Brassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      7bb23c49
  3. 11 July 2013, 12 commits
  4. 04 July 2013, 2 commits
    • md/raid10: fix bug which causes all RAID10 reshapes to move no data. · 13765120
      Committed by NeilBrown
      The recent commit:
      commit 7e83ccbe
          md/raid10: Allow skipping recovery when clean arrays are assembled
      
      causes raid10 to skip a recovery in certain cases where it is safe to
      do so.  Unfortunately it also causes a reshape to be skipped which is
      never safe.  The result is that an attempt to reshape a RAID10 will
      appear to complete instantly, but no data will have been moved so the
      array will now contain garbage.
      (If nothing is written, you can recover by simply performing the
      reverse reshape which will also complete instantly).
      
      Bug was introduced in 3.10, so this is suitable for 3.10-stable.
      
      Cc: stable@vger.kernel.org (3.10)
      Cc: Martin Wilck <mwilck@arcor.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
      13765120
    • md/raid5: allow 5-device RAID6 to be reshaped to 4-device. · fdcfbbb6
      Committed by NeilBrown
      There is a bug in 'check_reshape' for raid5.c.  It checks
      that the new minimum number of devices is large enough (which is
      good), but it does so also after the reshape has started (bad).
      
      This is bad because
       - the calculation is now wrong as mddev->raid_disks has changed
         already, and
       - it is pointless because it is now too late to stop.
      
      So only perform that test when reshape has not been committed to.
      Signed-off-by: NeilBrown <neilb@suse.de>
      fdcfbbb6
  5. 03 July 2013, 1 commit
    • md/raid10: fix two bugs affecting RAID10 reshape. · 78eaa0d4
      Committed by NeilBrown
      1/ If a RAID10 is being reshaped to a smaller number of devices
       and is stopped while this is ongoing, then when the array is
       reassembled the 'mirrors' array will be allocated too small.
       This will lead to an access error or memory corruption.
      
      2/ A sanity test performed when a reshaping RAID10 array is restarted
       is slightly incorrect.
      
      Due to the first bug, this is suitable for any -stable
      kernel since 3.5 where this code was introduced.
      
      Cc: stable@vger.kernel.org (v3.5+)
      Signed-off-by: NeilBrown <neilb@suse.de>
      78eaa0d4
  6. 26 June 2013, 2 commits
    • MD: Remember the last sync operation that was performed · c4a39551
      Committed by Jonathan Brassow
      MD:  Remember the last sync operation that was performed
      
      This patch adds a field to the mddev structure to track the last
      sync operation that was performed.  This is especially useful when
      it comes to what is recorded in mismatch_cnt in sysfs.  If the
      last operation was "data-check", then it reports the number of
      discrepancies found by the user-initiated check.  If it was a
      "repair" operation, then it is reporting the number of
      discrepancies repaired, and so on.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      c4a39551
    • md: fix buglet in RAID5 -> RAID0 conversion. · eea136d6
      Committed by NeilBrown
      RAID5 uses a 'per-array' value for the 'size' of each device.
      RAID0 uses a 'per-device' value - it can be different for each device.
      
      When converting a RAID5 to a RAID0 we must ensure that the per-device
      size of each device matches the per-array size for the RAID5, else
      the array will change size.
      
      If the metadata cannot record a changed per-device size (as is the
      case with v0.90 metadata) the array could get bigger on restart.  This
      does not cause data corruption, so it is not a big issue and is mainly
      yet another reason not to use 0.90.
      Signed-off-by: NeilBrown <neilb@suse.de>
      eea136d6
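      A kernel-style fragment illustrating the idea (a sketch based on the
      description above, not the literal patch): during the takeover, the
      array-wide per-device size is copied into each member device's own size
      field so the resulting RAID0 geometry matches the old RAID5.

        rdev_for_each(rdev, mddev)              /* iterate over member devices */
                rdev->sectors = mddev->dev_sectors;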
  7. 18 June 2013, 1 commit
  8. 14 June 2013, 8 commits
    • md/raid10: check In_sync flag in 'enough()'. · 725d6e57
      Committed by NeilBrown
      It isn't really enough to check that the rdev is present; we also need
      to be sure that the device is still In_sync.
      
      Doing this requires using rcu_dereference to access the rdev, and
      holding the rcu_read_lock() to ensure the rdev doesn't disappear while
      we look at it.
      Signed-off-by: NeilBrown <neilb@suse.de>
      725d6e57
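      A kernel-style fragment showing the access pattern described above (a
      sketch based on the commit message, not the exact upstream diff):

        int has_enough = 0;
        struct md_rdev *rdev;

        rcu_read_lock();
        rdev = rcu_dereference(conf->mirrors[disk].rdev);
        if (rdev && test_bit(In_sync, &rdev->flags))
                has_enough = 1;     /* device is present *and* still In_sync */
        rcu_read_unlock();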
    • md/raid10: locking changes for 'enough()'. · 635f6416
      Committed by NeilBrown
      As 'enough' accesses conf->prev and conf->geo, which can change
      spontaneously, it should guard against changes.
      This can be done with device_lock as start_reshape holds device_lock
      while updating 'geo' and end_reshape holds it while updating 'prev'.
      
      So 'error' needs to hold 'device_lock'.
      
      On the other hand, raid10_end_read_request knows which of the two it
      really wants to access, and as it is an active request on that one,
      the value cannot change underneath it.
      
      So change _enough to take a flag rather than a pointer, pass the
      appropriate flag from raid10_end_read_request(), and remove the locking.
      
      All other calls to 'enough' are made with reconfig_mutex held, so
      neither 'prev' nor 'geo' can change.
      Signed-off-by: NeilBrown <neilb@suse.de>
      635f6416
    • md: replace strict_strto*() with kstrto*() · b29bebd6
      Committed by Jingoo Han
      The usage of strict_strtoul() is not preferred, because
      strict_strtoul() is obsolete. Thus, kstrtoul() should be
      used.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      b29bebd6
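      A sketch of the mechanical conversion; the store handler below is a
      hypothetical example rather than a function from md.c, but kstrtoul()
      keeps the same calling convention as the old strict_strtoul():

        static ssize_t example_store(struct mddev *mddev, const char *buf, size_t len)
        {
                unsigned long val;
                int rv = kstrtoul(buf, 10, &val);  /* was: strict_strtoul(buf, 10, &val) */

                if (rv)
                        return rv;                 /* -EINVAL or -ERANGE */
                /* ... apply val ... */
                return len;
        }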
    • md: Wait for md_check_recovery before attempting device removal. · 90f5f7ad
      Committed by Hannes Reinecke
      When a device has failed, it needs to be removed from the personality
      module before it can be removed from the array as a whole.
      The first step is performed by md_check_recovery() which is called
      from the raid management thread.
      
      So when a HOT_REMOVE ioctl arrives, wait briefly for md_check_recovery
      to have run.  This increases the chance that the ioctl will succeed.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Neil Brown <nfbrown@suse.de>
      90f5f7ad
    • dm-raid: silence compiler warning on rebuilds_per_group. · 3f6bbd3f
      Committed by NeilBrown
      This doesn't really need to be initialised, but it doesn't hurt,
      silences the compiler, and as it is a counter it makes sense for it to
      start at zero.
      Signed-off-by: NeilBrown <neilb@suse.de>
      3f6bbd3f
    • DM RAID: Fix raid_resume not reviving failed devices in all cases · a4dc163a
      Committed by Jonathan Brassow
      DM RAID:  Fix raid_resume not reviving failed devices in all cases
      
      When a device fails in a RAID array, it is marked as Faulty.  Later,
      md_check_recovery is called which (through the call chain) calls
      'hot_remove_disk' in order to have the personalities remove the device
      from use in the array.
      
      Sometimes, it is possible for the array to be suspended before the
      personalities get their chance to perform 'hot_remove_disk'.  This is
      normally not an issue.  If the array is deactivated, then the failed
      device will be noticed when the array is reinstantiated.  If the
      array is resumed and the disk is still missing, md_check_recovery will
      be called upon resume and 'hot_remove_disk' will be called at that
      time.  However, (for dm-raid) if the device has been restored,
      a resume on the array would cause it to attempt to revive the device
      by calling 'hot_add_disk'.  If 'hot_remove_disk' had not been called,
      a situation is then created where the device is thought to concurrently
      be the replacement and the device to be replaced.  Thus, the device
      is first sync'ed with the rest of the array (because it is the replacement
      device) and then marked Faulty and removed from the array (because
      it is also the device being replaced).
      
      The solution is to check and see if the device had properly been removed
      before the array was suspended.  This is done by seeing whether the
      device's 'raid_disk' field is -1 - a condition that implies that
      'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1)
      -> hot_remove_disk' has been called.  If 'raid_disk' is not -1, then
      'hot_remove_disk' must be called to complete the removal of the previously
      faulty device before it can be revived via 'hot_add_disk'.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      a4dc163a
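      A kernel-style sketch of the check described above (reconstructed from
      the description; it is not the upstream dm-raid diff):

        if (test_bit(Faulty, &rdev->flags) && rdev->raid_disk >= 0) {
                /* hot_remove_disk never ran, so complete the removal first;
                 * only then can the device be revived via hot_add_disk. */
                if (mddev->pers->hot_remove_disk(mddev, rdev) == 0)
                        rdev->raid_disk = -1;
        }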
    • DM RAID: Break-up untidy function · f381e71b
      Committed by Jonathan Brassow
      DM RAID:  Break-up untidy function
      
      Clean-up excessive indentation by moving some code in raid_resume()
      into its own function.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      f381e71b
    • DM RAID: Add ability to restore transiently failed devices on resume · 9092c02d
      Committed by Jonathan Brassow
      DM RAID: Add ability to restore transiently failed devices on resume
      
      This patch adds code to the resume function to check over the devices
      in the RAID array.  If any are found to be marked as failed and their
      superblocks can be read, an attempt is made to reintegrate them into
      the array.  This allows the user to refresh the array with a simple
      suspend and resume of the array - rather than having to load a
      completely new table, allocate and initialize all the structures and
      throw away the old instantiation.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      9092c02d
  9. 13 June 2013, 4 commits
    • md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place · 5026d7a9
      Committed by H. Peter Anvin
      There are cases where the kernel will believe that the WRITE SAME
      command is supported by a block device which does not, in fact,
      support WRITE SAME.  This currently happens for SATA drives behind a
      SAS controller, but there are probably a hundred other ways that can
      happen, including drive firmware bugs.
      
      After receiving an error for WRITE SAME the block layer will retry the
      request as a plain write of zeroes, but mdraid will consider the
      failure as fatal and consider the drive failed.  This has the effect
      that all the mirrors containing a specific set of data are each
      offlined in very rapid succession resulting in data loss.
      
      However, just bouncing the request back up to the block layer isn't
      ideal either, because the whole initial request-retry sequence should
      be inside the write bitmap fence, which probably means that md needs
      to do its own conversion of WRITE SAME to write zero.
      
      Until the failure scenario has been sorted out, disable WRITE SAME for
      raid1, raid5, and raid10.
      
      [neilb: added raid5]
      
      This patch is appropriate for any -stable since 3.7 when write_same
      support was added.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      5026d7a9
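      A kernel-style sketch of the mitigation (illustrative, not the literal
      patch): advertising a WRITE SAME limit of zero on the md request queue
      keeps the block layer from issuing WRITE SAME to the array at all:

        blk_queue_max_write_same_sectors(mddev->queue, 0);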
    • md/raid1,raid10: use freeze_array in place of raise_barrier in various places. · e2d59925
      Committed by NeilBrown
      Various places in raid1 and raid10 are calling raise_barrier when they
      really should call freeze_array.
      The former is only intended to be called from "make_request".
      The latter has extra checks for 'nr_queued' and makes a call to
      flush_pending_writes(), so it is safe to call it from within the
      management thread.
      
      Using raise_barrier will sometimes deadlock.  Using freeze_array
      should not.
      
      As 'freeze_array' currently expects one request to be pending (in
      handle_read_error - the only previous caller), we need to pass
      it the number of pending requests (extra) to ignore.
      
      The deadlock was made particularly noticeable by commits
      050b6615 (raid10) and 6b740b8d (raid1) which
      appeared in 3.4, so the fix is appropriate for any -stable
      kernel since then.
      
      This patch probably won't apply directly to some early kernels and
      will need to be applied by hand.
      
      Cc: stable@vger.kernel.org
      Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      e2d59925
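      A kernel-style sketch of the resulting interface (based on the
      description above; field and helper names follow raid1.c conventions,
      but this is not the exact upstream code):

        /* 'extra' is the number of the caller's own requests still pending,
         * which the wait condition must tolerate. */
        static void freeze_array(struct r1conf *conf, int extra)
        {
                spin_lock_irq(&conf->resync_lock);
                conf->barrier++;
                conf->nr_waiting++;
                wait_event_lock_irq_cmd(conf->wait_barrier,
                                        conf->nr_pending == conf->nr_queued + extra,
                                        conf->resync_lock,
                                        flush_pending_writes(conf));
                spin_unlock_irq(&conf->resync_lock);
        }

      handle_read_error(), the previous sole caller, keeps its behaviour by
      passing 1; callers on the management-thread path with no request of
      their own pass 0.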
    • md/raid1: consider WRITE as successful only if at least one non-Faulty and... · 3056e3ae
      Committed by Alex Lyakas
      md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.
      
      Without that fix, the following scenario could happen:
      
      - RAID1 with drives A and B; drive B was freshly-added and is rebuilding
      - Drive A fails
      - WRITE request arrives to the array. It is failed by drive A, so
      r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B
      succeeds in writing it, so the same r1_bio is marked as
      R1BIO_Uptodate.
      - r1_bio arrives to handle_write_finished, badblocks are disabled,
      md_error()->error() does nothing because we don't fail the last drive
      of raid1
      - raid_end_bio_io()  calls call_bio_endio()
      - As a result, in call_bio_endio():
              if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
                      clear_bit(BIO_UPTODATE, &bio->bi_flags);
      this code doesn't clear the BIO_UPTODATE flag, and the whole master
      WRITE succeeds, back to the upper layer.
      
      So we returned success to the upper layer, even though we had written
      the data onto the rebuilding drive only. But when we want to read the
      data back, we would not read from the rebuilding drive, so this data
      is lost.
      
      [neilb - applied identical change to raid10 as well]
      
      This bug can result in lost data, so it is suitable for any
      -stable kernel.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      3056e3ae
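      A kernel-style sketch of the kind of check this implies in the
      write-completion path (reconstructed from the description, not the
      literal diff): only completion on a device that is In_sync and not
      Faulty may mark the master bio up to date:

        if (test_bit(In_sync, &rdev->flags) && !test_bit(Faulty, &rdev->flags))
                set_bit(R1BIO_Uptodate, &r1_bio->state);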
    • md: md_stop_writes() should always freeze recovery. · 6b6204ee
      Committed by NeilBrown
      __md_stop_writes() will currently sometimes freeze recovery.
      So any caller must be ready for that to happen, and indeed they are.
      
      However if __md_stop_writes() doesn't freeze recovery, then
      a recovery could start before mddev_suspend() is called, which
      could be awkward.  This can particularly cause problems for dm-raid.
      
      So change __md_stop_writes() to always freeze recovery.  This is safe
      and more predictable.
      Reported-by: Brassow Jonathan <jbrassow@redhat.com>
      Tested-by: Brassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      6b6204ee
  10. 30 May 2013, 1 commit
  11. 20 May 2013, 1 commit
    • dm thin: fix metadata dev resize detection · 610bba8b
      Committed by Alasdair G Kergon
      Fix detection of the need to resize the dm thin metadata device.
      
      The code incorrectly tried to extend the metadata device when it
      didn't need to due to a merging error with patch 24347e95 ("dm thin:
      detect metadata device resizing").
      
        device-mapper: transaction manager: couldn't open metadata space map
        device-mapper: thin metadata: tm_open_with_sm failed
        device-mapper: thin: aborting transaction failed
        device-mapper: thin: switching pool to failure mode
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      610bba8b
  12. 15 May 2013, 3 commits
    • bcache: Fix error handling in init code · f59fce84
      Committed by Kent Overstreet
      This code appears to have rotted... fix various bugs and do some
      refactoring.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      f59fce84
    • bcache: drop "select CLOSURES" · bbb1c3b5
      Committed by Paul Bolle
      The Kconfig entry for BCACHE selects CLOSURES. But there's no Kconfig
      symbol CLOSURES. That symbol was used in development versions of bcache,
      but was removed when the closures code was no longer provided as a
      kernel library. It can safely be dropped.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      bbb1c3b5
    • bcache: Fix incompatible pointer type warning · 867e1162
      Committed by Emil Goode
      The 'release' function pointer in struct block_device_operations
      should point to a function that returns void.
      
      Sparse warnings:
      
      drivers/md/bcache/super.c:656:27: warning:
      	incorrect type in initializer (different base types)
      	drivers/md/bcache/super.c:656:27:
      	expected void ( *release )( ... )
      	drivers/md/bcache/super.c:656:27:
      	got int ( static [toplevel] *<noident> )( ... )
      
      drivers/md/bcache/super.c:656:2: warning:
      	initialization from incompatible pointer type [enabled by default]
      
      drivers/md/bcache/super.c:656:2: warning:
      	(near initialization for ‘bcache_ops.release’) [enabled by default]
      Signed-off-by: Emil Goode <emilgoode@gmail.com>
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      867e1162
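      For reference, a sketch of the corrected shape (the function name is
      illustrative; the point is that .release is declared void and takes a
      struct gendisk * and an fmode_t):

        static void bcache_release(struct gendisk *disk, fmode_t mode)
        {
                /* drop the reference taken by the open method */
        }

        static const struct block_device_operations bcache_ops = {
                .release = bcache_release,
                .owner   = THIS_MODULE,
        };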