Commits · 1501efadc524a0c99494b576923091589a52d2a4 · openeuler / Kernel

14 Jan, 2016 3 commits

md/raid: only permit hot-add of compatible integrity profiles · 1501efad

Committed by Dan Williams on Jan 13, 2016

It is not safe for an integrity profile to be changed while i/o is
in-flight in the queue.  Prevent adding new disks or otherwise online
spares to an array if the device has an incompatible integrity profile.

The original change to the blk_integrity_unregister implementation in
md, commit c7bfced9 "md: suspend i/o during runtime
blk_integrity_unregister" introduced an immediate hang regression.

This policy of disallowing changes to the integrity profile once one has
been established is shared with DM.

Here is an abbreviated log from a test run that:
1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [   59.076127]
2/ Tries to add an integrity-disabled device (pmem1m) [   90.489209]
3/ Retries with an integrity-enabled device (pmem1s) [  205.671277]

[   59.076127] md/raid1:md0: active with 1 out of 2 mirrors
[   59.078302] md: data integrity enabled on md0
[..]
[   90.489209] md0: incompatible integrity profile for pmem1m
[..]
[  205.671277] md: super_written gets error=-5
[  205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device.
[  205.677386] md/raid1:md0: Operation continuing on 1 devices.
[  205.683037] RAID1 conf printout:
[  205.684699]  --- wd:1 rd:2
[  205.685972]  disk 0, wo:0, o:1, dev:pmem0s
[  205.687562]  disk 1, wo:1, o:1, dev:pmem1s
[  205.691717] md: recovery of RAID array md0

Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister")
Cc: <stable@vger.kernel.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Reported-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
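
The hot-add policy described in the commit above can be sketched as a small Python model. This is not the kernel code: `IntegrityProfile`, its fields, and `can_hot_add` are hypothetical names for illustration. The sketch also assumes that an array with no established profile accepts any disk, which the commit message does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntegrityProfile:
    """Hypothetical stand-in for a block device integrity profile."""
    name: str
    tuple_size: int
    tag_size: int


def can_hot_add(array_profile: Optional[IntegrityProfile],
                disk_profile: Optional[IntegrityProfile]) -> bool:
    """Decide whether a disk may be hot-added to a running array.

    Assumption of this sketch: an array without an established profile
    accepts any disk. Once a profile is established, only a disk with an
    identical profile is accepted, so the profile never has to change
    while i/o is in flight.
    """
    if array_profile is None:
        return True
    return array_profile == disk_profile
```

In terms of the test log, an array built on an integrity-enabled device would reject an integrity-disabled disk (`disk_profile` of `None`) and accept a second device carrying the same profile.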

MD: add journal with array suspended · 87d4d916

Committed by Shaohua Li on Jan 06, 2016

Hot-adding a journal disk in recovery thread context brings a lot of trouble,
as IO could be running. Unlike spare disk hot add, adding a journal disk
with the array suspended makes more sense and the implementation is much easier.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md: set MD_HAS_JOURNAL in correct places · a62ab49e

Committed by Shaohua Li on Jan 06, 2016

Set MD_HAS_JOURNAL when an array is loaded or the journal is initialized.
This avoids setting the flag too early during journal disk hot-add.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

07 Jan, 2016 2 commits

md: Remove 'ready' field from mddev. · 274d8cbd

Committed by NeilBrown on Jan 04, 2016

This field is always set in tandem with ->pers, and when it is tested
->pers is also tested.  So ->ready is not needed.

It was needed once, but code rearrangement and locking changes have
removed that need.
Signed-off-by: NeilBrown <neilb@suse.com>

md: remove unnecesary md_new_event_inintr · bb9ef716

Committed by Guoqing Jiang on Dec 28, 2015

md_new_event has not called sysfs_notify since commit 72a23c21
("Make sure all changes to md/sync_action are notified."), so we
can use md_new_event and delete md_new_event_inintr.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

06 Jan, 2016 9 commits

raid5-cache: add journal hot add/remove support · f6b6ec5c

Committed by Shaohua Li on Dec 21, 2015

Add support for journal disk hot add/remove. The md part is mostly trivial
checks. The raid5 part is a little tricky. For hot-remove, we can't wait
for pending writes, as it's called from raid5d and the wait would deadlock.
We simply fail the hot-remove. A hot-remove retry can succeed
eventually, since if the journal disk is faulty all pending writes will
fail and finish. For hot-add, since an array supporting journal but
without a journal disk will be marked read-only, we are safe to hot-add
the journal without stopping IO (it should be read IO, while the journal
only handles write IO).
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

drivers: md: use ktime_get_real_seconds() · 9ebc6ef1

Committed by Deepa Dinamani on Dec 21, 2015

get_seconds() API is not y2038 safe on 32 bit systems and the API
is deprecated. Replace it with calls to ktime_get_real_seconds()
API instead. Change mddev structure types to time64_t accordingly.

32 bit signed timestamps will overflow in the year 2038.

Change the user interface mdu_array_info_s structure timestamps:
ctime and utime values used in ioctls GET_ARRAY_INFO and
SET_ARRAY_INFO to unsigned int. This will extend the field to last
until the year 2106.
The long term plan is to get rid of ctime and utime values in
this structure as this information can be read from the on-disk
meta data directly.

Clamp the time64_t timestamps to positive values with a max of U32_MAX
when returning from GET_ARRAY_INFO ioctl to accommodate the above changes
in the data type of timestamps to unsigned int.

v0.90 on disk meta data uses u32 for maintaining time stamps.
So this will also last until year 2106.
Assumption is that the usage of v0.90 will be deprecated by
year 2106.

Timestamp fields in the on disk meta data for v1.0 version already
use 64 bit data types. Remove the truncation of the bits while
writing to or reading from these from the disk.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NeilBrown <neilb@suse.com>
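
The year limits and the clamping step mentioned in the commit above can be checked with a short Python sketch. The helper names `clamp_to_u32` and `utc_year` are hypothetical and only illustrate the arithmetic. They are not the kernel's implementation.

```python
from datetime import datetime, timedelta, timezone

S32_MAX = 2**31 - 1   # largest signed 32-bit timestamp
U32_MAX = 2**32 - 1   # largest unsigned 32-bit timestamp

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def clamp_to_u32(seconds: int) -> int:
    """Clamp a 64-bit signed timestamp into the range [0, U32_MAX]."""
    return max(0, min(seconds, U32_MAX))


def utc_year(seconds: int) -> int:
    """Return the UTC calendar year of a Unix timestamp."""
    return (EPOCH + timedelta(seconds=seconds)).year
```

Here `utc_year(S32_MAX)` gives 2038 and `utc_year(U32_MAX)` gives 2106, which matches the two limits quoted in the message. Negative or oversized 64-bit values are pinned to the ends of the unsigned 32-bit range.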

md: avoid warning for 32-bit sector_t · 3312c951

Committed by Arnd Bergmann on Dec 21, 2015

When CONFIG_LBDAF is not set, sector_t is only 32-bits wide, which
means we cannot have devices with more than 2TB, and the code that
is trying to handle compatibility support for large devices in
md version 0.90 is meaningless but also causes a compile-time warning:

drivers/md/md.c: In function 'super_90_load':
drivers/md/md.c:1029:19: warning: large integer implicitly truncated to unsigned type [-Woverflow]
drivers/md/md.c: In function 'super_90_rdev_size_change':
drivers/md/md.c:1323:17: warning: large integer implicitly truncated to unsigned type [-Woverflow]

This adds a check for CONFIG_LBDAF to avoid even getting into this
code path, and also adds an explicit cast to let the compiler know
it doesn't have to warn about the truncation.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NeilBrown <neilb@suse.com>

md: update comment for md_allow_write · abf3508d

Committed by Guoqing Jiang on Dec 21, 2015

MD_CHANGE_CLEAN was replaced with MD_CHANGE_PENDING by
commit 070dc6 ("md: resolve confusion of MD_CHANGE_CLEAN"),
so update the comment accordingly.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: Defer MD reloading to mddev->thread · 15858fa5

Committed by Guoqing Jiang on Dec 21, 2015

Reloading of superblock must be performed under reconfig_mutex. However,
this cannot be done with md_reload_sb because it would deadlock with
the message DLM lock. So, we defer it to md_check_recovery(), which is
executed by mddev->thread.

This introduces a new flag, MD_RELOAD_SB, which, if set, causes the
superblock to be reloaded. good_device_nr is also added to 'struct mddev';
it holds the number of good devices within the cluster raid.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: append some actions when change bitmap from clustered to none · f6a2dc64

Committed by Guoqing Jiang on Dec 21, 2015

For clustered raid, we need to take extra actions when changing the
bitmap to none.

1. Check whether all the bitmap locks can be acquired. If so,
   we can continue the change, since the cluster raid is only active
   on the current node. Otherwise return failure and unlock the related
   bitmap locks.
2. Set nodes to 0 and then leave the cluster environment.
3. Release the other nodes' bitmap locks.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: Allow spare devices to be marked as faulty · 09afd2a8

Committed by Goldwyn Rodrigues on Dec 21, 2015

If a spare device was marked faulty, it would not be reflected
in receiving nodes because it would mark it as activated and continue.
Continue the operation, so it may be set as faulty.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: Fix the remove sequence with the new MD reload code · 54a88392

Committed by Goldwyn Rodrigues on Dec 21, 2015

The remove disk message does not need metadata_update_start(), but
can be an independent message.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: remove a disk asynchronously from cluster environment · 659b254f

Committed by Guoqing Jiang on Dec 21, 2015

For cluster raid, if a disk can't be reached from one node, the
other nodes receive the REMOVE message for that disk.

In a receiving node, we can't call md_kick_rdev_from_array to remove
the disk from the array synchronously, since the disk might still be busy
on this node. So set a ClusterRemove flag on the disk, then
let the thread do the removal eventually.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

21 Dec, 2015 1 commit

md: remove check for MD_RECOVERY_NEEDED in action_store. · 312045ee

Committed by NeilBrown on Dec 21, 2015

md currently doesn't allow a 'sync_action' such as 'reshape' to be set
while MD_RECOVERY_NEEDED is set.

This is a problem, particularly since commit 738a2738 as that can
cause ->check_shape to call mddev_resume() which sets
MD_RECOVERY_NEEDED.  So by the time we come to start 'reshape' it is
very likely that MD_RECOVERY_NEEDED is still set.

Testing for this flag is not really needed and is in any case very
racy as it can be set at any moment - asynchronously.  Any race
between setting a sync_action and setting MD_RECOVERY_NEEDED must
already be handled properly in some locked code, probably
md_check_recovery(), so remove the test here.

The test on MD_RECOVERY_RUNNING is also racy in the 'reshape' case
so we should test it again after getting mddev_lock().

As this fixes a race and a regression which can cause 'reshape' to
fail, it is suitable for -stable kernels since 4.1
Reported-by: Xiao Ni <xni@redhat.com>
Fixes: 738a2738 ("md/raid5: fix allocation of 'scribble' array.")
Cc: stable@vger.kernel.org (v4.1+)
Signed-off-by: NeilBrown <neilb@suse.com>

18 Dec, 2015 3 commits

Fix remove_and_add_spares removes drive added as spare in slot_store · cb01c549

Committed by Goldwyn Rodrigues on Dec 18, 2015

Commit 2910ff17 introduced a regression which would remove a recently
added spare via slot_store. Revert the part of the patch which touches
slot_store() and add the disk directly using pers->hot_add_disk().

Fixes: 2910ff17 ("md: remove_and_add_spares() to activate specific
rdev")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md: fix bug due to nested suspend · 0dc10e50

Committed by Mikulas Patocka on Dec 18, 2015

The patch c7bfced9, committed to 4.4-rc, causes a crash in the LVM test
shell/lvchange-raid.sh. The kernel crashes with this BUG; the reason is
that we attempt to suspend a device that is already suspended. See also
https://bugzilla.redhat.com/show_bug.cgi?id=1283491

This patch fixes the bug by changing functions mddev_suspend and
mddev_resume to always nest.
The number of nested calls to mddev_nested_suspend is kept in the
variable mddev->suspended.
[neilb: made mddev_suspend() always nest instead of introduce mddev_nested_suspend]

kernel BUG at drivers/md/md.c:317!
CPU: 3 PID: 32754 Comm: lvm Not tainted 4.4.0-rc2 #1
task: 0000000047076040 ti: 0000000047014000 task.ti: 0000000047014000

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000000000000001111 Not tainted
r00-03  000000000804000f 00000000102c5280 0000000010c7522c 000000007e3d1810
r04-07  0000000010c6f000 000000004ef37f20 000000007e3d1dd0 000000007e3d1810
r08-11  000000007c9f1600 0000000000000000 0000000000000001 ffffffffffffffff
r12-15  0000000010c1d000 0000000000000041 00000000f98d63c8 00000000f98e49e4
r16-19  00000000f98e49e4 00000000c138fd06 00000000f98d63c8 0000000000000001
r20-23  0000000000000002 000000004ef37f00 00000000000000b0 00000000000001d1
r24-27  00000000424783a0 000000007e3d1dd0 000000007e3d1810 00000000102b2000
r28-31  0000000000000001 0000000047014840 0000000047014930 0000000000000001
sr00-03  0000000007040800 0000000000000000 0000000000000000 0000000007040800
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000102c538c 00000000102c5390
 IIR: 03ffe01f    ISR: 0000000000000000  IOR: 00000000102b2748
 CPU:        3   CR30: 0000000047014000 CR31: 0000000000000000
 ORIG_R28: 00000000000000b0
 IAOQ[0]: mddev_suspend+0x10c/0x160 [md_mod]
 IAOQ[1]: mddev_suspend+0x110/0x160 [md_mod]
 RP(r2): raid1_add_disk+0xd4/0x2c0 [raid1]
Backtrace:
 [<0000000010c7522c>] raid1_add_disk+0xd4/0x2c0 [raid1]
 [<0000000010c20078>] raid_resume+0x390/0x418 [dm_raid]
 [<00000000105833e8>] dm_table_resume_targets+0xc0/0x188 [dm_mod]
 [<000000001057f784>] dm_resume+0x144/0x1e0 [dm_mod]
 [<0000000010587dd4>] dev_suspend+0x1e4/0x568 [dm_mod]
 [<0000000010589278>] ctl_ioctl+0x1e8/0x428 [dm_mod]
 [<0000000010589518>] dm_compat_ctl_ioctl+0x18/0x68 [dm_mod]
 [<0000000040377b88>] compat_SyS_ioctl+0xd0/0x1558

Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
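
The "always nest" behaviour described in the commit above amounts to a counted suspend. The following Python model is only a sketch under that reading. `SuspendCounter` and its `quiesced` attribute are hypothetical names, and the real code operates on `mddev->suspended` in C.

```python
class SuspendCounter:
    """Toy model of a nesting suspend/resume counter."""

    def __init__(self) -> None:
        self.suspended = 0      # number of outstanding suspend calls
        self.quiesced = False   # True while i/o is held off

    def suspend(self) -> None:
        self.suspended += 1
        if self.suspended == 1:
            # Only the outermost suspend actually quiesces i/o.
            self.quiesced = True

    def resume(self) -> None:
        if self.suspended == 0:
            raise RuntimeError("resume without matching suspend")
        self.suspended -= 1
        if self.suspended == 0:
            # Only the outermost resume lets i/o flow again.
            self.quiesced = False
```

With this shape, a caller that suspends an already suspended device just increments the counter instead of hitting an assertion, and i/o restarts only when every suspend has been matched by a resume.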

MD: change journal disk role to disk 0 · 9b15603d

Committed by Shaohua Li on Dec 18, 2015

Neil pointed out that setting the journal disk role to raid_disks will
confuse reshape if we eventually support reshape. Switch the role to 0 (we
should be fine as long as the value is >= 0) and skip sysfs file creation to
avoid an error.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

08 Nov, 2015 1 commit

block: change ->make_request_fn() and users to return a queue cookie · dece1635

Committed by Jens Axboe on Nov 05, 2015

No functional changes in this patch, but it prepares us for returning
a more useful cookie related to the IO that was queued up.
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>

01 Nov, 2015 8 commits

MD: when RAID journal is missing/faulty, block RESTART_ARRAY_RW · 339421de

Committed by Song Liu on Oct 08, 2015

When a RAID-4/5/6 array suffers from a missing journal device, we put
the array in read-only state. We should not allow transition to
read-write states (clean and active) before replacing the journal device.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

MD: set journal disk ->raid_disk · f2076e7d

Committed by Shaohua Li on Oct 08, 2015

Set the journal disk's ->raid_disk to a value >= 0. I chose raid_disks + 1
instead of 0 because we already have a disk with ->raid_disk 0, and reusing
it would cause a sysfs entry creation conflict. A lot of places assume a disk
with ->raid_disk >= 0 is a normal raid disk, so we add a check for the
journal disk.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

MD: kick out journal disk if it's not fresh · a3dfbdaa

Committed by Song Liu on Oct 08, 2015

When the journal disk is faulty and we are reassembling the raid array, the
journal disk is old. We don't allow the journal disk to be added to the raid
array. Since the journal disk is then missing from the array, raid5 will mark
the array read-only.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

MD: add new bit to indicate raid array with journal · a97b7896

Committed by Song Liu on Oct 08, 2015

If a raid array has the journal feature bit set, add a new bit to indicate
this. If the array is started without a journal disk present, we know
there is something wrong.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

MD: fix info output for journal disk · 9efdca16

Committed by Shaohua Li on Oct 12, 2015

A journal disk can be faulty. The Journal and Faulty states aren't
mutually exclusive.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md: show journal for journal disk in disk state sysfs · ac6096e9

Committed by Shaohua Li on Oct 04, 2015

The state sysfs entry of a journal disk should indicate that it is a journal.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

skip match_mddev_units check for special roles · 0b020e85

Committed by Song Liu on Sep 03, 2015

match_mddev_units is used to check whether 2 RAID arrays share
the same disk(s). Arrays that share disk(s) will not do resync at the
same time, for better performance (fewer HDD seeks). However, this
check should not apply to Spare, Faulty, and Journal disks, as
they do not participate in resync.

In this patch, match_mddev_units skips the check for disks with flag
"Faulty" or "Journal" or raid_disk < 0.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
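
The skip rule from the commit above can be illustrated with a small Python model. This is a sketch, not the kernel function: `Rdev`, `takes_part_in_resync`, and `arrays_share_disk` are hypothetical names, and comparing disks by a plain string is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Flags named in the commit message as excluded from the check.
SKIPPED_FLAGS = {"Faulty", "Journal"}


@dataclass
class Rdev:
    disk: str                      # underlying disk name (simplified)
    raid_disk: int                 # slot in the array, negative for spares
    flags: Set[str] = field(default_factory=set)


def takes_part_in_resync(rdev: Rdev) -> bool:
    """A disk counts only if it holds a slot and is not Faulty/Journal."""
    return rdev.raid_disk >= 0 and not (rdev.flags & SKIPPED_FLAGS)


def arrays_share_disk(array_a: List[Rdev], array_b: List[Rdev]) -> bool:
    """True if two arrays share a disk that participates in resync."""
    disks_a = {r.disk for r in array_a if takes_part_in_resync(r)}
    disks_b = {r.disk for r in array_b if takes_part_in_resync(r)}
    return not disks_a.isdisjoint(disks_b)
```

Two arrays that only overlap on a spare (`raid_disk < 0`) or on a journal disk are treated as independent here, so their resyncs would not need to be serialized.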

md: skip resync for raid array with journal · bd18f646

Committed by Shaohua Li on Sep 02, 2015

If a raid array has a journal, the journal can guarantee consistency,
so we can skip resync after an unclean shutdown. The exceptions are raid
creation and user-initiated resync, for which we still do a raid resync.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

31 Oct, 2015 1 commit

Revert "md: allow a partially recovered device to be hot-added to an array." · d01552a7

Committed by NeilBrown on Oct 31, 2015

This reverts commit 7eb41885.

This commit is poorly justified, I can find no discussion in email,
and it clearly causes a problem.

If a device which is being recovered fails and is subsequently
re-added to an array, there could easily have been changes to the
array *before* the point where the recovery was up to.  So the
recovery must start again from the beginning.

If a spare is being recovered and fails, then when it is re-added we
really should do a bitmap-based recovery up to the recovery-offset,
and then a full recovery from there.  Before this reversion, we only
did the "full recovery from there" which is not correct.  After this
reversion we will do a full recovery from the start, which is safer
but not ideal.

It will be left to a future patch to arrange the two different styles
of recovery.
Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org (3.14+)
Fixes: 7eb41885 ("md: allow a partially recovered device to be hot-added to an array.")

24 Oct, 2015 4 commits

md: override md superblock recovery_offset for journal device · 3069aa8d

Committed by Shaohua Li on Aug 13, 2015

The journal device stores data in a log structure. We need to record the log
start. Here we override md superblock recovery_offset for this purpose.
This field of a journal device is meaningless otherwise.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

MD: add a new disk role to present write journal device · bac624f3

Committed by Song Liu on Aug 13, 2015

The next patches will use a disk for raid5/6 journaling. We need a new disk
role to represent the journal device, and add MD_FEATURE_JOURNAL to
feature_map for backward compatibility.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

MD: replace special disk roles with macros · c4d4c91b

Committed by Song Liu on Aug 13, 2015

Add the following two macros for special roles: spare and faulty

MD_DISK_ROLE_SPARE	0xffff
MD_DISK_ROLE_FAULTY	0xfffe

Add MD_DISK_ROLE_MAX	0xff00 as the maximal possible regular role,
and minimal value of special role.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
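
To show how the role macros from the commit above partition the 16-bit role space, here is a short Python sketch. The constant values are taken from the message. `describe_role` is a hypothetical helper, and treating a role equal to MD_DISK_ROLE_MAX as a regular role is an assumption, since the message describes that value as both the maximal regular role and the minimal special one.

```python
MD_DISK_ROLE_SPARE = 0xFFFF
MD_DISK_ROLE_FAULTY = 0xFFFE
MD_DISK_ROLE_MAX = 0xFF00


def describe_role(role: int) -> str:
    """Classify a 16-bit disk role value."""
    if role == MD_DISK_ROLE_SPARE:
        return "spare"
    if role == MD_DISK_ROLE_FAULTY:
        return "faulty"
    if role > MD_DISK_ROLE_MAX:
        # Reserved range above the regular roles (assumed boundary).
        return "other special role"
    return f"raid disk {role}"
```

Using named constants instead of bare `0xffff` and `0xfffe` comparisons leaves room for further special roles in the reserved range, such as the journal role introduced by a later patch in this series.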

md-cluster: Call update_raid_disks() if another node --grow's raid_disks · 28c1b9fd

Committed by Goldwyn Rodrigues on Oct 22, 2015

To incorporate --grow feature executed on one node, other nodes need to
acknowledge the change in number of disks. Call update_raid_disks()
to update internal data structures.

This leads to the call chain check_reshape() -> md_allow_write() ->
md_update_sb(), which results in a deadlock. md_allow_write() is called so
that memory can be allocated safely (allocation might trigger writeback,
which might write to raid1). This is not required for md with a bitmap.

In the clustered case, we don't perform md_update_sb() in md_allow_write(),
but in do_md_run(). Also we disable safemode for clustered mode.

mddev->recovery_cp need not be set in check_sb_changes() because this
is required only when a node reads another node's bitmap. mddev->recovery_cp
(which is read from sb->resync_offset), is set only if mddev is in_sync.
Since we disabled safemode, in_sync is set to zero.
In a clustered environment, the MD may not be in sync because another
node could be writing to it. So make sure that in_sync is not set in
case of clustered node in __md_stop_writes().
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

22 Oct, 2015 3 commits

md: suspend i/o during runtime blk_integrity_unregister · c7bfced9

Committed by Dan Williams on Oct 21, 2015

Synchronize pending i/o against a change in the integrity profile to
avoid the possibility of spurious integrity errors.  Given linear_add()
is suspending the mddev before manipulating the mddev, do the same for
the other personalities.
Acked-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown · 9609b994

Committed by Dan Williams on Oct 21, 2015

Now that the integrity profile is statically allocated there is no work
to do when shutting down an integrity enabled block device.

Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: James Bottomley <JBottomley@Odin.com>
Acked-by: NeilBrown <neilb@suse.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: Inline blk_integrity in struct gendisk · 25520d55

Committed by Martin K. Petersen on Oct 21, 2015

Up until now the integrity profile has been dynamically allocated and
attached to struct gendisk after the disk has been made active.

This causes problems because NVMe devices need to register the profile
prior to the partition table being read due to a mandatory metadata
buffer requirement. In addition, DM goes through hoops to deal with
preallocating, but not initializing integrity profiles.

Since the integrity profile is small (4 bytes + a pointer), Christoph
suggested moving it to struct gendisk proper. This requires several
changes:

 - Moving the blk_integrity definition to genhd.h.

 - Inlining blk_integrity in struct gendisk.

 - Removing the dynamic allocation code.

 - Adding helper functions which allow gendisk to set up and tear down
   the integrity sysfs dir when a disk is added/deleted.

 - Adding a blk_integrity_revalidate() callback for updating the stable
   pages bdi setting.

 - The calls that depend on whether a device has an integrity profile or
   not now key off of the bi->profile pointer.

 - Simplifying the integrity support routines in DM (Mike Snitzer).
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

13 Oct, 2015 2 commits

md: check the return value for metadata_update_start · 23b63f9f

Committed by Guoqing Jiang on Oct 12, 2015

We shouldn't run the related functions of md_cluster_ops if
metadata_update_start returned failure.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>

md-cluster: only call kick_rdev_from_array after remove disk successfully · a9720903

Committed by Guoqing Jiang on Oct 12, 2015

For cluster raid, we should not kick the disk from the array if it can't be
removed from the array successfully.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

12 Oct, 2015 3 commits

md-cluster: Fix adding of new disk with new reload code · dbb64f86

Committed by Goldwyn Rodrigues on Oct 01, 2015

Adding the disk worked incorrectly with the new reload code. Fix it:

 - No operation should be performed on rdev marked as Candidate
 - After a metadata update operation, kick disk if role is 0xfffe
   else clear Candidate bit and continue with the regular change check.
 - Saving the mode of the lock resource to check if token lock is already
   locked, because it can be called twice while adding a disk. However,
   unlock_comm() must be called only once.
 - add_new_disk() is called by the node initiating the --add operation.
   If it needs to be canceled, call add_new_disk_cancel(). The operation
   is completed by md_update_sb() which will write and unlock the
   communication.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

md-cluster: Perform resync/recovery under a DLM lock · c186b128

Committed by Goldwyn Rodrigues on Sep 30, 2015

Resync or recovery must be performed by only one node at a time.
A DLM lock resource, resync_lockres, provides the mutual exclusion
so that only one node performs the recovery/resync at a time.

If a node is unable to get the resync_lockres, because recovery is
being performed by another node, it sets MD_RECOVER_NEEDED so as
to schedule recovery in the future.

Remove the debug message in resync_info_update()
used during development.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
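
The try-lock-or-defer pattern described in the commit above can be modelled in a few lines of Python. This is only a sketch: a `threading.Lock` stands in for the DLM lock resource, and `Node`, `try_start_resync`, and `recovery_needed` are hypothetical names.

```python
import threading


class Node:
    """Toy model of a cluster node competing for the resync lock."""

    def __init__(self, name: str, resync_lock: threading.Lock) -> None:
        self.name = name
        self.resync_lock = resync_lock   # stand-in for resync_lockres
        self.recovery_needed = False     # stand-in for the "needed" flag

    def try_start_resync(self) -> bool:
        # Non-blocking attempt: never wait for another node's resync.
        if self.resync_lock.acquire(blocking=False):
            self.recovery_needed = False
            return True
        # Another node holds the lock, so remember to retry later.
        self.recovery_needed = True
        return False

    def finish_resync(self) -> None:
        self.resync_lock.release()
```

A node that loses the race does no resync work and only records that recovery is still needed. Once the holder calls `finish_resync()`, a later retry by the other node can succeed.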

md-cluster: Perform a lazy update · 2aa82191

Committed by Goldwyn Rodrigues on Sep 28, 2015

In a clustered environment, a change such as marking a device faulty
can be recorded by any of the nodes. This is communicated to all the
nodes, and re-recording such a change is unnecessary and quite often
pretty disruptive.

With this patch, just before the update, we check for changes,
and if the changes are already in the superblock, we abort the update
after clearing all the flags.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
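
The lazy update idea in the commit above can be sketched as follows. This Python model is an illustration under simplifying assumptions: superblock state is a plain dict, pending flags are a set of strings, and `lazy_update` is a hypothetical name.

```python
from typing import Dict, Set


def lazy_update(on_disk: Dict[str, str],
                in_memory: Dict[str, str],
                pending_flags: Set[str]) -> bool:
    """Write the in-memory state only if it differs from what is on disk.

    Returns True if a write happened, False if the update was skipped.
    """
    if on_disk == in_memory:
        # Another node already recorded this change: clear the flags
        # and abort the update.
        pending_flags.clear()
        return False
    # Simplified "write": copy the in-memory state over the on-disk one.
    on_disk.clear()
    on_disk.update(in_memory)
    pending_flags.clear()
    return True
```

When every node is told about a device going faulty, only the first one to record it performs a write. The others find the on-disk state already matching and skip the redundant update.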

openeuler / Kernel synced successfully more than 1 year ago