- 04 Jul 2013, 2 commits
-
-
Committed by NeilBrown
The recent commit 7e83ccbe ("md/raid10: Allow skipping recovery when clean arrays are assembled") causes raid10 to skip a recovery in certain cases where it is safe to do so. Unfortunately it also causes a reshape to be skipped, which is never safe. The result is that an attempt to reshape a RAID10 will appear to complete instantly, but no data will have been moved, so the array will now contain garbage. (If nothing is written, you can recover by simply performing the reverse reshape, which will also complete instantly.) The bug was introduced in 3.10, so this is suitable for 3.10-stable. Cc: stable@vger.kernel.org (3.10) Cc: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
There is a bug in 'check_reshape' for raid5.c. It checks that the new minimum number of devices is large enough (which is good), but it does so also after the reshape has started (bad). This is bad because:
- the calculation is now wrong, as mddev->raid_disks has changed already, and
- it is pointless, because it is now too late to stop.
So only perform that test when a reshape has not been committed to. Signed-off-by: NeilBrown <neilb@suse.de>
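For illustration, a minimal sketch of such a guard (the field names follow md's struct mddev, but the minimum-device computation is simplified and this is not the literal raid5.c change):

    /* Illustrative sketch, not the literal raid5 patch: only validate the
     * new minimum device count while no reshape has been committed to
     * (reshape_position == MaxSector means "no reshape in progress"). */
    if (mddev->delta_disks < 0 && mddev->reshape_position == MaxSector) {
        int min = mddev->level == 6 ? 4 : 2;    /* simplified minimum */

        if (mddev->raid_disks + mddev->delta_disks < min)
            return -EINVAL;
    }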
-
- 03 Jul 2013, 1 commit
-
-
Committed by NeilBrown
1/ If a RAID10 is being reshaped to a smaller number of devices and is stopped while this is ongoing, then when the array is reassembled the 'mirrors' array will be allocated too small. This will lead to an access error or memory corruption.
2/ A sanity test for when a reshaping RAID10 array is restarted is slightly incorrect.
Due to the first bug, this is suitable for any -stable kernel since 3.5, where this code was introduced. Cc: stable@vger.kernel.org (v3.5+) Signed-off-by: NeilBrown <neilb@suse.de>
-
- 26 Jun 2013, 2 commits
-
-
Committed by Jonathan Brassow
MD: Remember the last sync operation that was performed
This patch adds a field to the mddev structure to track the last sync operation that was performed. This is especially useful when it comes to what is recorded in mismatch_cnt in sysfs. If the last operation was "data-check", then it reports the number of discrepancies found by the user-initiated check. If it was a "repair" operation, then it is reporting the number of discrepancies repaired, and so on. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
RAID5 uses a 'per-array' value for the 'size' of each device. RAID0 uses a 'per-device' value - it can be different for each device. When converting a RAID5 to a RAID0 we must ensure that the per-device size of each device matches the per-array size for the RAID5, else the array will change size. If the metadata cannot record a changed per-device size (as is the case with v0.90 metadata) the array could get bigger on restart. This does not cause data corruption, so it is not a big issue and is mainly yet another reason not to use 0.90. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 14 Jun 2013, 8 commits
-
-
Committed by NeilBrown
It isn't really enough to check that the rdev is present; we need to also be sure that the device is still In_sync. Doing this requires using rcu_dereference to access the rdev, and holding the rcu_read_lock() to ensure the rdev doesn't disappear while we look at it. Signed-off-by: NeilBrown <neilb@suse.de>
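A minimal sketch of the access pattern being described (illustrative only; 'conf', 'mirrors', 'disk' and 'usable' stand in for the raid10 per-array structures and local variables, and this is not the literal patch):

    /* Illustrative sketch of the RCU-protected In_sync check. */
    struct md_rdev *rdev;
    int usable = 0;

    rcu_read_lock();
    rdev = rcu_dereference(conf->mirrors[disk].rdev);
    if (rdev && test_bit(In_sync, &rdev->flags))
        usable = 1;    /* device is present and still in sync */
    rcu_read_unlock();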
-
Committed by NeilBrown
As 'enough' accesses conf->prev and conf->geo, which can change spontaneously, it should guard against changes. This can be done with device_lock, as start_reshape holds device_lock while updating 'geo' and end_reshape holds it while updating 'prev'. So 'error' needs to hold 'device_lock'. On the other hand, raid10_end_read_request knows which of the two it really wants to access, and as it is an active request on that one, the value cannot change underneath it. So change _enough to take a flag rather than a pointer, pass the appropriate flag from raid10_end_read_request(), and remove the locking. All other calls to 'enough' are made with reconfig_mutex held, so neither 'prev' nor 'geo' can change. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Jingoo Han
The usage of strict_strtoul() is not preferred, because strict_strtoul() is obsolete. Thus, kstrtoul() should be used. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: NeilBrown <neilb@suse.de>
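For reference, a minimal before/after sketch of the conversion in a typical md sysfs store handler (the handler name and the 'example_field' member are hypothetical, not taken from the patch):

    /* Hypothetical sysfs store handler illustrating the conversion. */
    static ssize_t example_store(struct mddev *mddev, const char *buf, size_t len)
    {
        unsigned long n;
        int rv;

        /* old, obsolete style:
         *     if (strict_strtoul(buf, 10, &n))
         *             return -EINVAL;
         */
        rv = kstrtoul(buf, 10, &n);    /* preferred replacement */
        if (rv < 0)
            return rv;

        mddev->example_field = n;      /* hypothetical field */
        return len;
    }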
-
Committed by Hannes Reinecke
When a device has failed, it needs to be removed from the personality module before it can be removed from the array as a whole. The first step is performed by md_check_recovery(), which is called from the raid management thread. So when a HOT_REMOVE ioctl arrives, wait briefly for md_check_recovery to have run. This increases the chance that the ioctl will succeed. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Neil Brown <nfbrown@suse.de>
-
Committed by NeilBrown
This doesn't really need to be initialised, but it doesn't hurt, silences the compiler, and as it is a counter it makes sense for it to start at zero. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Jonathan Brassow
DM RAID: Fix raid_resume not reviving failed devices in all cases
When a device fails in a RAID array, it is marked as Faulty. Later, md_check_recovery is called, which (through the call chain) calls 'hot_remove_disk' in order to have the personalities remove the device from use in the array. Sometimes, it is possible for the array to be suspended before the personalities get their chance to perform 'hot_remove_disk'. This is normally not an issue. If the array is deactivated, then the failed device will be noticed when the array is reinstantiated. If the array is resumed and the disk is still missing, md_check_recovery will be called upon resume and 'hot_remove_disk' will be called at that time.
However, (for dm-raid) if the device has been restored, a resume on the array would cause it to attempt to revive the device by calling 'hot_add_disk'. If 'hot_remove_disk' had not been called, a situation is then created where the device is thought to concurrently be the replacement and the device to be replaced. Thus, the device is first sync'ed with the rest of the array (because it is the replacement device) and then marked Faulty and removed from the array (because it is also the device being replaced).
The solution is to check and see if the device had properly been removed before the array was suspended. This is done by seeing whether the device's 'raid_disk' field is -1 - a condition that implies that 'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1) -> hot_remove_disk' has been called. If 'raid_disk' is not -1, then 'hot_remove_disk' must be called to complete the removal of the previously faulty device before it can be revived via 'hot_add_disk'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
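A minimal sketch of the check described above (illustrative only; the surrounding loop and variable names are simplified, and the exact call into the personality is an assumption, not the literal dm-raid.c patch):

    /* Illustrative sketch: before reviving a previously Faulty device,
     * make sure the personality has actually finished removing it.
     * raid_disk == -1 means the md_check_recovery ->
     * remove_and_add_spares -> hot_remove_disk chain already ran. */
    if (test_bit(Faulty, &rdev->flags)) {
        if (rdev->raid_disk >= 0 &&
            mddev->pers->hot_remove_disk(mddev, rdev) != 0)
            return;    /* removal incomplete; don't revive yet */

        /* ... now safe to attempt revival via hot_add_disk() ... */
    }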
-
Committed by Jonathan Brassow
DM RAID: Break-up untidy function
Clean up excessive indentation by moving some code in raid_resume() into its own function. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Jonathan Brassow
DM RAID: Add ability to restore transiently failed devices on resume
This patch adds code to the resume function to check over the devices in the RAID array. If any are found to be marked as failed and their superblocks can be read, an attempt is made to reintegrate them into the array. This allows the user to refresh the array with a simple suspend and resume of the array - rather than having to load a completely new table, allocate and initialize all the structures and throw away the old instantiation. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 13 Jun 2013, 4 commits
-
-
Committed by H. Peter Anvin
There are cases where the kernel will believe that the WRITE SAME command is supported by a block device which does not, in fact, support WRITE SAME. This currently happens for SATA drives behind a SAS controller, but there are probably a hundred other ways that can happen, including drive firmware bugs. After receiving an error for WRITE SAME the block layer will retry the request as a plain write of zeroes, but mdraid will consider the failure as fatal and consider the drive failed. This has the effect that all the mirrors containing a specific set of data are each offlined in very rapid succession, resulting in data loss. However, just bouncing the request back up to the block layer isn't ideal either, because the whole initial request-retry sequence should be inside the write bitmap fence, which probably means that md needs to do its own conversion of WRITE SAME to write zero. Until the failure scenario has been sorted out, disable WRITE SAME for raid1, raid5, and raid10. [neilb: added raid5] This patch is appropriate for any -stable since 3.7, when write_same support was added. Cc: stable@vger.kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
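Disabling WRITE SAME for an md personality amounts to advertising a zero limit on its request queue; a minimal sketch of that idea, placed in the personality's setup path (illustrative, not necessarily the literal patch):

    /* Illustrative sketch: advertise "no WRITE SAME" on the md queue so
     * the block layer never sends the command down to this array. */
    if (mddev->queue)
        blk_queue_max_write_same_sectors(mddev->queue, 0);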
-
Committed by NeilBrown
Various places in raid1 and raid10 are calling raise_barrier when they really should call freeze_array. The former is only intended to be called from "make_request". The latter has extra checks for 'nr_queued' and makes a call to flush_pending_writes(), so it is safe to call it from within the management thread. Using raise_barrier will sometimes deadlock. Using freeze_array should not. As 'freeze_array' currently expects one request to be pending (in handle_read_error - the only previous caller), we need to pass it the number of pending requests (extra) to ignore. The deadlock was made particularly noticeable by commits 050b6615 (raid10) and 6b740b8d (raid1), which appeared in 3.4, so the fix is appropriate for any -stable kernel since then. This patch probably won't apply directly to some early kernels and will need to be applied by hand. Cc: stable@vger.kernel.org Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Alex Lyakas
md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.
Without that fix, the following scenario could happen:
- RAID1 with drives A and B; drive B was freshly added and is rebuilding.
- Drive A fails.
- A WRITE request arrives at the array. It is failed by drive A, so r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B succeeds in writing it, so the same r1_bio is marked as R1BIO_Uptodate.
- r1_bio arrives at handle_write_finished, badblocks are disabled, md_error()->error() does nothing because we don't fail the last drive of raid1.
- raid_end_bio_io() calls call_bio_endio().
- As a result, in call_bio_endio() the code "if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) clear_bit(BIO_UPTODATE, &bio->bi_flags);" doesn't clear the BIO_UPTODATE flag, and the whole master WRITE succeeds, back to the upper layer.
So we returned success to the upper layer, even though we had written the data onto the rebuilding drive only. But when we want to read the data back, we would not read from the rebuilding drive, so this data is lost. [neilb - applied identical change to raid10 as well] This bug can result in lost data, so it is suitable for any -stable kernel. Cc: stable@vger.kernel.org Signed-off-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: NeilBrown <neilb@suse.de>
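The essence of the fix can be sketched as follows (illustrative only; the mirror lookup and flag names follow raid1's conventions, but this is not the literal patch): only a completion on a device that is In_sync and not Faulty may mark the master write as up to date.

    /* Illustrative sketch of the write-completion rule: a successful write
     * only counts toward R1BIO_Uptodate if the drive is neither Faulty
     * nor still rebuilding. */
    struct md_rdev *rdev = conf->mirrors[mirror].rdev;

    if (test_bit(In_sync, &rdev->flags) &&
        !test_bit(Faulty, &rdev->flags))
        set_bit(R1BIO_Uptodate, &r1_bio->state);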
-
Committed by NeilBrown
__md_stop_writes() will currently sometimes freeze recovery. So any caller must be ready for that to happen, and indeed they are. However if __md_stop_writes() doesn't freeze recovery, then a recovery could start before mddev_suspend() is called, which could be awkward. This can particularly cause problems for dm-raid. So change __md_stop_writes() to always freeze recovery. This is safe and more predictable. Reported-by: Brassow Jonathan <jbrassow@redhat.com> Tested-by: Brassow Jonathan <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
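In md, freezing recovery is expressed by setting the MD_RECOVERY_FROZEN bit in mddev->recovery; a minimal sketch of doing so unconditionally at the top of the stop-writes path (illustrative, not the literal diff):

    static void __md_stop_writes(struct mddev *mddev)
    {
        /* Illustrative sketch: freeze recovery unconditionally so a new
         * recovery cannot start between here and mddev_suspend(). */
        set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);

        /* ... stop the sync thread, flush pending writes, etc. ... */
    }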
-
- 30 May 2013, 1 commit
-
-
Committed by Kent Overstreet
The patch that converted raid5 to use bio_reset() forgot to initialize bi_vcnt. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: NeilBrown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 May 2013, 1 commit
-
-
Committed by Alasdair G Kergon
Fix detection of the need to resize the dm thin metadata device. The code incorrectly tried to extend the metadata device when it didn't need to, due to a merging error with patch 24347e95 ("dm thin: detect metadata device resizing").
device-mapper: transaction manager: couldn't open metadata space map
device-mapper: thin metadata: tm_open_with_sm failed
device-mapper: thin: aborting transaction failed
device-mapper: thin: switching pool to failure mode
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
- 15 May 2013, 3 commits
-
-
Committed by Kent Overstreet
This code appears to have rotted... fix various bugs and do some refactoring. Signed-off-by: Kent Overstreet <koverstreet@google.com>
-
Committed by Paul Bolle
The Kconfig entry for BCACHE selects CLOSURES. But there's no Kconfig symbol CLOSURES. That symbol was used in development versions of bcache, but was removed when the closures code was no longer provided as a kernel library. It can safely be dropped. Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
-
Committed by Emil Goode
The function pointer release in struct block_device_operations should point to functions declared as void.
Sparse warnings:
drivers/md/bcache/super.c:656:27: warning: incorrect type in initializer (different base types)
drivers/md/bcache/super.c:656:27: expected void ( *release )( ... )
drivers/md/bcache/super.c:656:27: got int ( static [toplevel] *<noident> )( ... )
drivers/md/bcache/super.c:656:2: warning: initialization from incompatible pointer type [enabled by default]
drivers/md/bcache/super.c:656:2: warning: (near initialization for ‘bcache_ops.release’) [enabled by default]
Signed-off-by: Emil Goode <emilgoode@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
-
- 10 May 2013, 18 commits
-
-
Committed by Joe Thornber
Share configuration option processing code between the dm cache ctr and message functions. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Alasdair G Kergon
Move process_config_option() in dm-cache-target.c to make the next patch more readable. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Joe Thornber
Generate a dm event when the amount of remaining thin pool metadata space falls below a certain level. The threshold is taken to be a quarter of the size of the metadata device, with a minimum threshold of 4MB. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
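The threshold arithmetic can be sketched as follows (illustrative only; the helper name and the assumption of 4KB metadata blocks are mine, not taken from the patch):

    /* Illustrative sketch: low-water mark for thin pool metadata,
     * a quarter of the metadata device but never below 4MB.
     * Assumes 4KB metadata blocks, so 4MB == 1024 blocks. */
    static dm_block_t calc_metadata_threshold(dm_block_t total_metadata_blocks)
    {
        dm_block_t quarter = total_metadata_blocks / 4;
        dm_block_t min_blocks = 1024;    /* 4MB worth of 4KB blocks */

        return quarter > min_blocks ? quarter : min_blocks;
    }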
-
Committed by Joe Thornber
Add a threshold callback to dm persistent data space maps. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Joe Thornber
Add a threshold callback function to the persistent data space map interface for a subsequent patch to use. dm-thin and dm-cache are interested in knowing when they're getting low on metadata or data blocks. This patch introduces a new method for registering a callback against a threshold. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
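The shape of such an interface addition can be sketched as follows (the typedef and method name here are assumptions for illustration, not necessarily the exact API added by the patch):

    /* Hypothetical sketch of a threshold-callback hook on the
     * persistent-data space map interface. */
    typedef void (*dm_sm_threshold_fn)(void *context);

    struct dm_space_map {
        /* ... existing count/alloc/inc/dec operations ... */

        /* invoke fn(context) once the number of free blocks
         * drops to or below 'threshold' */
        int (*register_threshold_callback)(struct dm_space_map *sm,
                                           dm_block_t threshold,
                                           dm_sm_threshold_fn fn,
                                           void *context);
    };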
-
Committed by Joe Thornber
Allow the dm thin pool metadata device to be extended. Whenever a pool is resumed, detect whether the size of the metadata device has increased, and if so, extend the metadata to use the new space. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Joe Thornber
Support extending a dm persistent data metadata space map. The extend itself is implemented by switching back to the bootstrap allocator and pointing to the new space. The extra bitmap indexes are then allocated from the new space, and finally we switch back to the proper space map ops and tweak the reference counts. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Joe Thornber
If a thin pool is created in read-only-metadata mode then only open the metadata device read-only. Previously it was always opened with FMODE_READ | FMODE_WRITE. (Note that dm_get_device() still allows read-only dm devices to be used read-write at the moment: if I create a read-only linear device for the metadata, via dmsetup load --readonly, then I can still create a rw pool out of it.) Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
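A minimal sketch of the mode selection when acquiring the metadata device (illustrative; the flag and variable names around dm_get_device() are simplified, not the literal dm-thin.c change):

    /* Illustrative sketch: open the metadata device read-only when the
     * pool was created with the read-only-metadata feature. */
    fmode_t metadata_mode = FMODE_READ |
        (read_only_metadata ? 0 : FMODE_WRITE);
    struct dm_dev *metadata_dev;
    int r;

    r = dm_get_device(ti, metadata_path, metadata_mode, &metadata_dev);
    if (r)
        return r;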
-
Committed by Joe Thornber
Refactor device size functions in preparation for similar metadata device resizing functions. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Joe Thornber
Use struct assignment rather than memcpy in dm cache. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
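For reference, the generic C idiom being adopted (shown on a hypothetical struct, not dm-cache's actual types): struct assignment is type-checked by the compiler and sized automatically, whereas memcpy repeats the size by hand.

    /* Hypothetical example of the idiom; not dm-cache code. */
    struct cache_stats {
        unsigned long hits;
        unsigned long misses;
    };

    void copy_stats(struct cache_stats *dst, const struct cache_stats *src)
    {
        /* before: memcpy(dst, src, sizeof(*dst)); */
        *dst = *src;    /* struct assignment: type-safe, no size argument */
    }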
-
Committed by Joe Thornber
Fix up some typos in dm-cache comments. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Alasdair G Kergon
Correct the documented requirement on the return code from dm cache policy lookup functions stated in the policy module header file. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Joe Thornber
Fix some typos in dm-space-map-metadata.c error messages. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Joe Thornber
Tune the dm cache migration throttling:
i) Issue a tick every second, just in case there's no i/o going through.
ii) Drop the migration threshold right down to something suitable for background work.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Mike Snitzer
Enable WRITE SAME support in dm multipath. As far as multipath is concerned it is just another write request. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Bharata B Rao <bharata.rao@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Mike Snitzer
If device_not_write_same_capable() returns true then the iterate_devices loop in dm_table_supports_write_same() should return false. Reported-by: Bharata B Rao <bharata.rao@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.8+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
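A sketch of the intended control flow (illustrative; it follows dm-table's usual target-iteration conventions but is not the literal diff): if any underlying device reports itself as not WRITE SAME capable, the whole table must be treated as unsupported.

    /* Illustrative sketch of the fixed logic: one incapable device
     * disables WRITE SAME for the entire table. */
    unsigned i = 0;
    struct dm_target *ti;

    while (i < dm_table_get_num_targets(t)) {
        ti = dm_table_get_target(t, i++);

        if (!ti->type->iterate_devices ||
            ti->type->iterate_devices(ti, device_not_write_same_capable, NULL))
            return false;    /* some device cannot do WRITE SAME */
    }
    return true;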
-
Committed by Mikulas Patocka
This patch uses memalloc_noio_save to avoid a possible deadlock in dm-bufio (it could happen only with a large block size, at most PAGE_SIZE << MAX_ORDER, typically 8MiB). __vmalloc doesn't fully respect gfp flags. The specified gfp flags are used for allocation of the requested pages, the structures vmap_area, vmap_block and vm_struct, and the radix tree nodes. However, the kernel pagetables are always allocated with GFP_KERNEL. Thus the allocation of pagetables can recurse back into the I/O layer and cause a deadlock. This patch uses the function memalloc_noio_save to set the per-process PF_MEMALLOC_NOIO flag and the function memalloc_noio_restore to restore it. When this flag is set, all allocations in the process are done with an implied GFP_NOIO flag, so the deadlock can't happen. This should be backported to stable kernels, but they don't have the PF_MEMALLOC_NOIO flag and the memalloc_noio_save/memalloc_noio_restore functions. So, PF_MEMALLOC should be set and restored instead. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
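The save/restore pattern looks roughly like this (a sketch using the memalloc_noio API of that era; the allocation call and surrounding context are simplified and not dm-bufio's actual code):

    /* Illustrative sketch: every allocation issued between save and
     * restore behaves as GFP_NOIO, so page-table allocations done inside
     * __vmalloc() cannot recurse back into the I/O layer. */
    unsigned noio_flag;
    void *buf;

    noio_flag = memalloc_noio_save();
    buf = __vmalloc(block_size, GFP_NOIO | __GFP_HIGHMEM, PAGE_KERNEL);
    memalloc_noio_restore(noio_flag);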
-
Committed by Wei Yongjun
Return -ENOMEM instead of success if unable to allocate the pending exception mempool in snapshot_ctr. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
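The error-path shape being fixed is the common constructor pattern below (a sketch; the mempool parameters and label names are illustrative, not the literal dm-snap.c code):

    /* Illustrative sketch: on mempool allocation failure the constructor
     * must propagate -ENOMEM rather than fall through with r still 0. */
    s->pending_pool = mempool_create_slab_pool(MIN_IOS, pending_cache);
    if (!s->pending_pool) {
        ti->error = "Could not allocate mempool for pending exceptions";
        r = -ENOMEM;    /* previously left at 0, i.e. reported "success" */
        goto bad_pending_pool;
    }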
-