Commits · 5d8817833c7609c24da9a92f71c53caa9c1424eb · openeuler / Kernel

29 Jul, 2016 1 commit

MD: fix null pointer deference · 5d881783

Authored by Shaohua Li on Jul 28, 2016

The md device might not have a personality (for example, a ddf raid array). The
issue was introduced by 8430e7e0 ("md: disconnect device from personality
before trying to remove it").
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

5d881783

20 Jul, 2016 3 commits

md: add missing sysfs_notify on array_state update · 573275b5

Authored by Tomasz Majchrzak on Jun 30, 2016

Changeset 6791875e added an early return from a function, so there is no
sysfs notification for 'active' and 'clean' state changes.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

573275b5

Fix kernel module refcount handling · 4cb9da7d

Authored by Alexey Obitotskiy on Jun 23, 2016

md loads the raidX module and increments its refcount each time the level
changes, but never decrements it. As a result, the raid0 module cannot be
unloaded after a reshape, because a raid0 reshape changes the level to raid4
and back to raid0.
Signed-off-by: Aleksey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

4cb9da7d
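
The imbalance described in 4cb9da7d can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel patch: `struct personality`, `get_pers()`, `put_pers()`, `change_level()` and `reshape_refcount()` are hypothetical names, and the plain integer counter stands in for the module refcount.

```c
#include <stddef.h>

/* Hypothetical stand-in for a raid personality module and its refcount. */
struct personality {
    const char *name;
    int refcount;
};

/* Take a reference when an array starts using a personality. */
static void get_pers(struct personality *p) { p->refcount++; }

/* Drop the reference once the array no longer uses the personality. */
static void put_pers(struct personality *p) { p->refcount--; }

/*
 * Switch an array from one personality to another. The new personality
 * is pinned; the old one is released only if release_old is set.
 * release_old == 0 models the leak: every level change adds a reference
 * that is never dropped.
 */
static struct personality *change_level(struct personality *old,
                                        struct personality *new_pers,
                                        int release_old)
{
    get_pers(new_pers);
    if (release_old && old != NULL)
        put_pers(old);
    return new_pers;
}

/* Model a raid0 reshape (raid0 -> raid4 -> raid0); returns raid0's refcount. */
static int reshape_refcount(int release_old)
{
    struct personality raid0 = { "raid0", 0 };
    struct personality raid4 = { "raid4", 0 };
    struct personality *cur;

    cur = change_level(NULL, &raid0, release_old);  /* array starts as raid0 */
    cur = change_level(cur, &raid4, release_old);   /* reshape: to raid4 */
    cur = change_level(cur, &raid0, release_old);   /* reshape: back to raid0 */
    (void)cur;
    return raid0.refcount;
}
```

In this model, a leaked reference leaves raid0 with a count above the single reference the running array actually needs, which is the condition that would keep the module from being unloaded.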

md: use seconds granularity for error logging · 0e3ef49e

Authored by Arnd Bergmann on Jun 17, 2016

The md code stores the exact time of the last error in the
last_read_error variable using a timespec structure. It only
ever uses the seconds portion of that though, so we can
use a scalar for it.

There won't be an overflow in 2038 here, because it already
used monotonic time and 32-bit is enough for that, but I've
decided to use time64_t for consistency in the conversion.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Shaohua Li <shli@fb.com>

0e3ef49e

14 Jun, 2016 3 commits

md: reduce the number of synchronize_rcu() calls when multiple devices fail. · d787be40

Authored by NeilBrown on Jun 02, 2016

Every time a device is removed with ->hot_remove_disk() a synchronize_rcu() call is made,
which can take several milliseconds in some cases.
If lots of devices fail at once - as could happen with a large RAID10 where one set
of devices is removed all at once - these delays can add up to be very inconvenient.

As failure is not reversible we can check for that first, setting a
separate flag if it is found, and then call synchronize_rcu() once for
all the flagged devices.  The ->hot_remove_disk() function can then skip the
synchronize_rcu() step if the flag is set.

[fix build error (Shaohua)]
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

d787be40
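
The two-pass idea in d787be40 can be sketched in userspace as follows. This is a model under stated assumptions rather than the md code: `struct dev`, the `remove_synced` flag, `expensive_barrier()` (a counter standing in for synchronize_rcu()) and `remove_faulty()` are all hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a member device; not the kernel's struct md_rdev. */
struct dev {
    bool faulty;         /* failure is irreversible once set */
    bool remove_synced;  /* a grace period has already been waited for */
    bool removed;
};

static int barrier_calls;   /* counts the expensive grace-period waits */

/* Stand-in for synchronize_rcu(): only counts invocations here. */
static void expensive_barrier(void) { barrier_calls++; }

/* Stand-in for ->hot_remove_disk(): skips the wait if already covered. */
static void hot_remove(struct dev *d)
{
    if (!d->remove_synced)
        expensive_barrier();
    d->removed = true;
}

/* Remove all faulty devices; returns how many barrier waits were needed. */
static int remove_faulty(struct dev *devs, size_t n)
{
    bool any = false;
    size_t i;

    barrier_calls = 0;

    /* Pass 1: flag every device that has failed. */
    for (i = 0; i < n; i++) {
        if (devs[i].faulty) {
            devs[i].remove_synced = true;
            any = true;
        }
    }

    /* One wait covers every flagged device. */
    if (any)
        expensive_barrier();

    /* Pass 2: remove the flagged devices without waiting again. */
    for (i = 0; i < n; i++)
        if (devs[i].remove_synced)
            hot_remove(&devs[i]);

    return barrier_calls;
}
```

With three failed devices out of four, the model waits once instead of three times; the saving grows with the number of devices that fail together.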

md: disconnect device from personality before trying to remove it. · 8430e7e0

Authored by NeilBrown on Jun 02, 2016

When the HOT_REMOVE_DISK ioctl is used to remove a device, we
call remove_and_add_spares() which will remove it from the personality
if possible.  This improves the chances that the removal will succeed.

When writing "remove" to dev-XX/state, we don't.  So that can fail more easily.

So add the remove_and_add_spares() into "remove" handling.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

8430e7e0

MD:Update superblock when err == 0 in size_store · 4ba1e788

Authored by Xiao Ni on Jun 12, 2016

This is a simple check before updating the superblock. The superblock
should be updated only when update_size returns 0.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>

4ba1e788

10 Jun, 2016 1 commit

md: use a mutex to protect a global list · 5b1f5bc3

Authored by Cong Wang on Jun 08, 2016

We saw a list corruption in the list all_detected_devices:

 WARNING: CPU: 16 PID: 226 at lib/list_debug.c:29 __list_add+0x3c/0xa9()
 list_add corruption. next->prev should be prev (ffff880859d58320), but was ffff880859ce74c0. (next=ffffffff81abfdb0).
 Modules linked in: ahci libahci libata sd_mod scsi_mod
 CPU: 16 PID: 226 Comm: kworker/u241:4 Not tainted 4.1.20 #1
 Hardware name: Dell Inc. PowerEdge C6220/04GD66, BIOS 2.2.3 11/07/2013
 Workqueue: events_unbound async_run_entry_fn
  0000000000000000 ffff880859a5baf8 ffffffff81502872 ffff880859a5bb48
  0000000000000009 ffff880859a5bb38 ffffffff810692a5 ffff880859ee8828
  ffffffff812ad02c ffff880859d58320 ffffffff81abfdb0 ffff880859eb90c0
 Call Trace:
  [<ffffffff81502872>] dump_stack+0x4d/0x63
  [<ffffffff810692a5>] warn_slowpath_common+0xa1/0xbb
  [<ffffffff812ad02c>] ? __list_add+0x3c/0xa9
  [<ffffffff81069305>] warn_slowpath_fmt+0x46/0x48
  [<ffffffff812ad02c>] __list_add+0x3c/0xa9
  [<ffffffff81406f28>] md_autodetect_dev+0x41/0x62
  [<ffffffff81285862>] rescan_partitions+0x25f/0x29d
  [<ffffffff81506372>] ? mutex_lock+0x13/0x31
  [<ffffffff811a090f>] __blkdev_get+0x1aa/0x3cd
  [<ffffffff811a0b91>] blkdev_get+0x5f/0x294
  [<ffffffff81377ceb>] ? put_device+0x17/0x19
  [<ffffffff8128227c>] ? disk_put_part+0x12/0x14
  [<ffffffff812836f3>] add_disk+0x29d/0x407
  [<ffffffff81384345>] ? __pm_runtime_use_autosuspend+0x5c/0x64
  [<ffffffffa004a724>] sd_probe_async+0x115/0x1af [sd_mod]
  [<ffffffff81083177>] async_run_entry_fn+0x72/0x12c
  [<ffffffff8107c44c>] process_one_work+0x198/0x2ce
  [<ffffffff8107cac7>] worker_thread+0x1dd/0x2bb
  [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
  [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
  [<ffffffff81080d9c>] kthread+0xae/0xb6
  [<ffffffff81080000>] ? param_array_set+0x40/0xfa
  [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
  [<ffffffff81508152>] ret_from_fork+0x42/0x70
  [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61

I suspect this is because no lock protects this
global list: autostart_arrays() is called in the ioctl() path,
where no lock is held.

Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>

5b1f5bc3

04 Jun, 2016 2 commits

md: simplify the code with md_kick_rdev_from_array · db767672

Authored by Guoqing Jiang on Jun 02, 2016

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

db767672

md-cluster: fix deadlock issue when add disk to an recoverying array · bb8bf15b

Authored by Guoqing Jiang on Jun 02, 2016

Adding a disk to an array that is performing recovery
is a little complicated: we need to both reap the
sync thread and add the disk, and doing so
caused the following deadlock.

linux44:~ # ps aux|grep md|grep D
root      1822  0.0  0.0      0     0 ?        D    16:50   0:00 [md127_resync]
root      1848  0.0  0.0  19860   952 pts/0    D+   16:50   0:00 mdadm --manage /dev/md127 --re-add /dev/vdb
linux44:~ # cat /proc/1848/stack
[<ffffffff8107afde>] kthread_stop+0x6e/0x120
[<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod]
[<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod]
[<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod]
[<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod]
[<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140
[<ffffffff811a6b98>] vfs_write+0xb8/0x1e0
[<ffffffff811a75b8>] SyS_write+0x48/0xa0
[<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
[<00007f068ea1ed30>] 0x7f068ea1ed30
linux44:~ # cat /proc/1822/stack
[<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod]
[<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod]
[<ffffffff8107ad94>] kthread+0xb4/0xc0
[<ffffffff8152a518>] ret_from_fork+0x58/0x90

                        Task1848                                Task1822
md_attr_store (held reconfig_mutex by call mddev_lock())
                        action_store
			md_reap_sync_thread
			md_unregister_thread
			kthread_stop                    md_wakeup_thread(mddev->thread);
						wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING))

md_check_recovery is triggered by waking up mddev->thread,
but it can't clear the MD_CHANGE_PENDING flag since it can't
take the lock, which is already held by md_attr_store.

To solve the deadlock, we move "->resync_finish()"
from md_do_sync to md_reap_sync_thread (after md_update_sb).
MD_HELD_RESYNC_LOCK is also introduced, since it is possible
that a node can't get the resync lock in md_do_sync.

We then no longer need to wait for MD_CHANGE_PENDING to be
cleared, since the metadata should be updated after md_update_sb,
so just call resync_finish if MD_HELD_RESYNC_LOCK is set.

We also unified the code after the skip label, since setting PENDING
in the non-clustered case should be harmless.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

bb8bf15b

10 May, 2016 2 commits

md: set MD_CHANGE_PENDING in a atomic region · 85ad1d13

Authored by Guoqing Jiang on May 03, 2016

Some code waits for a metadata update by:

1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
2. setting MD_CHANGE_PENDING and waking the management thread
3. waiting for MD_CHANGE_PENDING to be cleared

If the first two are done without locking, the code in md_update_sb()
which checks if it needs to repeat might test if an update is needed
before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
in the wait returning early.

So make sure all places that set MD_CHANGE_PENDING do so atomically;
bit_clear_unless (suggested by Neil) is introduced for this purpose.

Cc: Martin Kepplinger <martink@posteo.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: <linux-kernel@vger.kernel.org>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

85ad1d13
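
The "clear a bit unless another bit is set" operation named in 85ad1d13 can be approximated in userspace with C11 atomics. This is a sketch of the technique, not the kernel's bit_clear_unless macro: `clear_bits_unless()`, `set_bits()` and the `CHANGE_*` bit values are hypothetical and exist only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only. */
enum {
    CHANGE_DEVS    = 1 << 0,
    CHANGE_CLEAN   = 1 << 1,
    CHANGE_PENDING = 1 << 2,
};

/*
 * Atomically clear the bits in `clear` unless any bit in `test` is set.
 * Returns true if the bits were cleared, false if a `test` bit blocked it.
 */
static bool clear_bits_unless(atomic_ulong *flags, unsigned long clear,
                              unsigned long test)
{
    unsigned long old = atomic_load(flags);

    do {
        if (old & test)
            return false;   /* an update was requested; leave flags alone */
        /* On failure, compare-exchange reloads `old` and we retry. */
    } while (!atomic_compare_exchange_weak(flags, &old, old & ~clear));

    return true;
}

/* Set several bits in one atomic step (request flag plus pending flag). */
static void set_bits(atomic_ulong *flags, unsigned long bits)
{
    atomic_fetch_or(flags, bits);
}
```

Because the request flag and the pending flag are set in a single atomic operation, and the clear is refused whenever a request flag is visible, the updater in this model cannot clear the pending flag for a request it has not yet seen.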

md: md.c: fix oops in mddev_suspend for raid0 · 092398dc

Authored by Heinz Mauelshagen on May 03, 2016

Introduced by upstream commit 70d9798b

Unlike other personalities, raid0 does not create mddev->thread,
so the unconditional access to it in mddev_suspend() causes an oops.

The patch checks for mddev->thread in order to keep the
intention of the aforementioned commit.

Fixes: 70d9798b ("MD: warn for potential deadlock")
Cc: stable@vger.kernel.org (4.5+)
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>

092398dc
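
The guard described in 092398dc amounts to testing the thread pointer before dereferencing it. A minimal userspace sketch, assuming hypothetical types `struct md_thread_model` and `struct mddev_model` and an integer task id in place of the kernel's task pointer:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models; the real structures live in the md driver. */
struct md_thread_model {
    int tsk;                        /* id of the task running the thread */
};

struct mddev_model {
    struct md_thread_model *thread; /* NULL for personalities without one */
};

/*
 * Returns true only when the array has a personality thread and the
 * caller is that thread. Checking the pointer first keeps the test safe
 * for arrays (such as raid0 in the commit message) that have no thread.
 */
static bool suspend_called_from_pers_thread(const struct mddev_model *mddev,
                                            int current_task)
{
    return mddev->thread != NULL && mddev->thread->tsk == current_task;
}
```

The warning from the earlier commit would then fire only when this predicate is true, and arrays without a thread simply never trigger it.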

05 May, 2016 4 commits

md-cluster: wakeup thread if activated a spare disk · a578183e

Authored by Guoqing Jiang on May 02, 2016

When a device is re-added, it will ultimately need
to be activated, and that happens in md_check_recovery,
so we need to set MD_RECOVERY_NEEDED right after
remove_and_add_spares.

A specific issue without the change is that when
one node performs fail/remove/re-add on a disk, the
slave nodes cannot add the disk back to the array as
expected (it is added as missing instead of in sync). So
give the slave nodes a chance to resync.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

a578183e

md-cluster: change array_sectors and update size are not supported · ab5a98b1

Authored by Guoqing Jiang on May 02, 2016

Currently, some features are not yet supported,
such as changing array_sectors and updating the size, so
return EINVAL for them and list them in the documentation.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

ab5a98b1

md-cluser: make resync_finish only called after pers->sync_request · 2c97cf13

Authored by Guoqing Jiang on May 02, 2016

It is not reasonable for cluster raid to release the resync
lock before the last pers->sync_request has finished.

As the metadata will be changed when a node performs resync,
we need to inform other nodes to update their metadata, so the
MD_CHANGE_PENDING flag is set before finishing resync.

metadata_update_finish is then moved ahead to ensure that the
METADATA_UPDATED msg is sent before resync finishes, and
metadata_update_start needs to be run after the "repeat:" label
accordingly.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

2c97cf13

md-cluster: change resync lock from asynchronous to synchronous · 41a9a0dc

Authored by Guoqing Jiang on May 02, 2016

If multiple nodes choose to attempt a resync at the same time
they need to be serialized so they don't duplicate effort. This
serialization is done by locking the 'resync' DLM lock.

Currently if a node cannot get the lock immediately it doesn't
request notification when the lock becomes available (i.e.
DLM_LKF_NOQUEUE is set), so it may not reliably find out when it
is safe to try again.

Rather than trying to arrange an async wake-up when the lock
becomes available, switch to using synchronous locking - this is
a lot easier to think about.  As it is not permitted to block in
the 'raid1d' thread, move the locking to the resync thread.  So
the resync thread is forked immediately, but it blocks until the
resync lock is available. Once the lock is held it checks again
whether any resync action is needed.

A particular symptom of the current problem is that a node can
get stuck with "resync=pending" indefinitely.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

41a9a0dc

26 Apr, 2016 1 commit

MD: make bio mergeable · 9c573de3

Authored by Shaohua Li on Apr 25, 2016

blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
But once the bio is dispatched to the underlying disk, the blk_queue_split
checks no longer apply, so the bio may become mergeable again.

In the reported bug, this causes a performance drop for trim against raid0:
https://bugzilla.kernel.org/show_bug.cgi?id=117051
Reported-and-tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
Fixes: 6ac45aeb ("block: avoid to merge splitted bio")
Cc: stable@vger.kernel.org (v4.3+)
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Neil Brown <neilb@suse.de>
Reviewed-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

9c573de3

13 Apr, 2016 1 commit

md: update to using blk_queue_write_cache() · 56883a7e

Authored by Jens Axboe on Mar 30, 2016

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

56883a7e

01 Apr, 2016 2 commits

MD: add rdev reference for super write · ed3b98c7

Authored by Shaohua Li on Mar 29, 2016

Xiao Ni reported the crash below:
[26396.335146] BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
[26396.342990] IP: [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
[26396.349449] PGD 0
[26396.351468] Oops: 0002 [#1] SMP
[26396.354898] Modules linked in: ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_td
[26396.408404] CPU: 5 PID: 3261 Comm: loop0 Not tainted 4.5.0 #1
[26396.414140] Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 3.2.2 09/15/2014
[26396.421608] task: ffff8808339be680 ti: ffff8808365f4000 task.ti: ffff8808365f4000
[26396.429074] RIP: 0010:[<ffffffffa0425b00>]  [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
[26396.437952] RSP: 0018:ffff8808365f7c38  EFLAGS: 00010046
[26396.443252] RAX: ffffffffa0425ae0 RBX: ffff8804336a7900 RCX: ffffe8f9f7b41198
[26396.450371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8804336a7900
[26396.457489] RBP: ffff8808365f7c50 R08: 0000000000000005 R09: 00001801e02ce3d7
[26396.464608] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[26396.471728] R13: ffff8808338d9a00 R14: 0000000000000000 R15: ffff880833f9fe00
[26396.478849] FS:  00007f9e5066d740(0000) GS:ffff880237b40000(0000) knlGS:0000000000000000
[26396.486922] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[26396.492656] CR2: 00000000000002a8 CR3: 00000000019ea000 CR4: 00000000000006e0
[26396.499775] Stack:
[26396.501781]  ffff8804336a7900 0000000000000000 0000000000000000 ffff8808365f7c68
[26396.509199]  ffffffff81308cd0 ffff8804336a7900 ffff8808365f7ca8 ffffffff81310637
[26396.516618]  00000000a0233a00 ffff880833f9fe00 0000000000000000 ffff880833fb0000
[26396.524038] Call Trace:
[26396.526485]  [<ffffffff81308cd0>] bio_endio+0x40/0x60
[26396.531529]  [<ffffffff81310637>] blk_update_request+0x87/0x320
[26396.537439]  [<ffffffff8131a20a>] blk_mq_end_request+0x1a/0x70
[26396.543261]  [<ffffffff81313889>] blk_flush_complete_seq+0xd9/0x2a0
[26396.549517]  [<ffffffff81313ccf>] flush_end_io+0x15f/0x240
[26396.554993]  [<ffffffff8131a22a>] blk_mq_end_request+0x3a/0x70
[26396.560815]  [<ffffffff8131a314>] __blk_mq_complete_request+0xb4/0xe0
[26396.567246]  [<ffffffff8131a35c>] blk_mq_complete_request+0x1c/0x20
[26396.573506]  [<ffffffffa04182df>] loop_queue_work+0x6f/0x72c [loop]
[26396.579764]  [<ffffffff81697844>] ? __schedule+0x2b4/0x8f0
[26396.585242]  [<ffffffff810a7812>] kthread_worker_fn+0x52/0x170
[26396.591065]  [<ffffffff810a77c0>] ? kthread_create_on_node+0x1a0/0x1a0
[26396.597582]  [<ffffffff810a7238>] kthread+0xd8/0xf0
[26396.602453]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
[26396.607929]  [<ffffffff8169bdcf>] ret_from_fork+0x3f/0x70
[26396.613319]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60

md_super_write() and the corresponding md_super_wait() are generally called
with reconfig_mutex locked, which prevents the disk from disappearing. There is
one case where this rule is broken: write_sb_page in bitmap.c doesn't hold the
mutex. next_active_rdev does increase the rdev reference, but it decreases
the reference too early (e.g., before IO finishes), so the disk can disappear
in that window. We unconditionally increase the rdev reference in
md_super_write() to avoid the race.
Reported-and-tested-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Signed-off-by: Shaohua Li <shli@fb.com>

ed3b98c7
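
The reference-holding pattern from ed3b98c7 can be modelled in userspace as below. This is a sketch under stated assumptions, not the md implementation: `struct rdev_model`, its `nr_pending` counter, and the functions `super_write_submit()`, `super_write_done()` and `try_remove()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of a member device with a pending-IO reference count. */
struct rdev_model {
    int nr_pending;  /* references held by in-flight IO */
    bool removed;
};

/* Submitting a superblock write takes a reference on the device. */
static void super_write_submit(struct rdev_model *rdev)
{
    rdev->nr_pending++;
}

/* The completion handler drops the reference once the IO has finished. */
static void super_write_done(struct rdev_model *rdev)
{
    rdev->nr_pending--;
}

/* Removal is refused while any IO still references the device. */
static bool try_remove(struct rdev_model *rdev)
{
    if (rdev->nr_pending > 0)
        return false;
    rdev->removed = true;
    return true;
}
```

Holding the reference from submission until completion closes the window in which the device could go away while the completion handler still needs it.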

md: fix a trivial typo in comments · 466ad292

Authored by Wei Fang on Mar 21, 2016

Fix a trivial typo in md_ioctl().
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Shaohua Li <shli@fb.com>

466ad292

27 Feb, 2016 2 commits

MD: warn for potential deadlock · 70d9798b

Authored by Shaohua Li on Feb 24, 2016

The personality thread shouldn't call mddev_suspend(), because
mddev_suspend() waits for all IO to finish, but IO is handled in the
personality thread, so this could cause a deadlock. To catch this early, add a
warning if mddev_suspend() is called from the personality thread.
Suggested-by: NeilBrown <neilb@suse.com>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

70d9798b

md: Drop sending a change uevent when stopping · 399146b8

Authored by Sebastian Parschauer on Feb 17, 2016

When an MD device is stopped, its device node /dev/mdX may still
exist afterwards, or it may be recreated by udev. The next open() call
can lead to the creation of an inoperable MD device. The reason for
this is that a change event (KOBJ_CHANGE) is sent to udev, which
races against the remove event (KOBJ_REMOVE) from md_free().
So drop sending the change event.

A change is likely also required in mdadm, as many versions send the
change event to udev as well.

Neil mentioned that the change event is a workaround for old kernels.
Commit: 934d9c23 ("md: destroy partitions and notify udev when md array is stopped.")
New mdadm can handle device removal now, so this isn't required any more.

Cc: NeilBrown <neilb@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>

399146b8

14 Jan, 2016 3 commits

md/raid: only permit hot-add of compatible integrity profiles · 1501efad

Authored by Dan Williams on Jan 13, 2016

It is not safe for an integrity profile to be changed while i/o is
in-flight in the queue.  Prevent adding new disks or otherwise online
spares to an array if the device has an incompatible integrity profile.

The original change to the blk_integrity_unregister implementation in
md, commit c7bfced9 "md: suspend i/o during runtime
blk_integrity_unregister", introduced an immediate hang regression.

This policy of disallowing changes to the integrity profile once one has
been established is shared with DM.

Here is an abbreviated log from a test run that:
1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [   59.076127]
2/ Tries to add an integrity-disabled device (pmem1m) [   90.489209]
3/ Retries with an integrity-enabled device (pmem1s) [  205.671277]

[   59.076127] md/raid1:md0: active with 1 out of 2 mirrors
[   59.078302] md: data integrity enabled on md0
[..]
[   90.489209] md0: incompatible integrity profile for pmem1m
[..]
[  205.671277] md: super_written gets error=-5
[  205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device.
[  205.677386] md/raid1:md0: Operation continuing on 1 devices.
[  205.683037] RAID1 conf printout:
[  205.684699]  --- wd:1 rd:2
[  205.685972]  disk 0, wo:0, o:1, dev:pmem0s
[  205.687562]  disk 1, wo:1, o:1, dev:pmem1s
[  205.691717] md: recovery of RAID array md0

Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister")
Cc: <stable@vger.kernel.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Reported-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>

1501efad

MD: add journal with array suspended · 87d4d916

Authored by Shaohua Li on Jan 06, 2016

Hot-adding a journal disk in recovery thread context brings a lot of trouble,
as IO could be running. Unlike spare disk hot add, adding a journal disk
with the array suspended makes more sense and the implementation is much easier.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

87d4d916

md: set MD_HAS_JOURNAL in correct places · a62ab49e

Authored by Shaohua Li on Jan 06, 2016

Set MD_HAS_JOURNAL when an array is loaded or the journal is initialized.
This avoids setting the flag too early during journal disk hot add.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

a62ab49e

10 Jan, 2016 2 commits

badblocks: rename badblocks_free to badblocks_exit · d3b407fb

Authored by Dan Williams on Jan 06, 2016

For symmetry with badblocks_init() make it clear that this path only
destroys incremental allocations of a badblocks instance, and does not
free the badblocks instance itself.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d3b407fb

md: convert to use the generic badblocks code · fc974ee2

Authored by Vishal Verma on Dec 24, 2015

Retain badblocks as part of rdev, but use the accessor functions from
include/linux/badblocks for all manipulation.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

fc974ee2

07 Jan, 2016 2 commits

md: Remove 'ready' field from mddev. · 274d8cbd

Authored by NeilBrown on Jan 04, 2016

This field is always set in tandem with ->pers, and when it is tested
->pers is also tested.  So ->ready is not needed.

It was needed once, but code rearrangement and locking changes have
removed that need.
Signed-off-by: NeilBrown <neilb@suse.com>

274d8cbd

md: remove unnecesary md_new_event_inintr · bb9ef716

Authored by Guoqing Jiang on Dec 28, 2015

md_new_event has not called sysfs_notify since commit 72a23c21
("Make sure all changes to md/sync_action are notified."), so we
can use md_new_event and delete md_new_event_inintr.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

bb9ef716

06 Jan, 2016 9 commits

raid5-cache: add journal hot add/remove support · f6b6ec5c

Authored by Shaohua Li on Dec 21, 2015

Add support for journal disk hot add/remove. The md part is mostly trivial
checks. The raid5 part is a little tricky. For hot-remove, we can't wait for
pending writes as it's called from raid5d; the wait would cause a deadlock.
We simply fail the hot-remove. A hot-remove retry can succeed
eventually, since if the journal disk is faulty all pending writes will
fail and finish. For hot-add, since an array supporting a journal but
without a journal disk will be marked read-only, it is safe to hot add a
journal without stopping IO (it should be read IO, while the journal only
handles write IO).
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

f6b6ec5c

drivers: md: use ktime_get_real_seconds() · 9ebc6ef1

Authored by Deepa Dinamani on Dec 21, 2015

get_seconds() API is not y2038 safe on 32 bit systems and the API
is deprecated. Replace it with calls to ktime_get_real_seconds()
API instead. Change mddev structure types to time64_t accordingly.

32 bit signed timestamps will overflow in the year 2038.

Change the user interface mdu_array_info_s structure timestamps:
ctime and utime values used in ioctls GET_ARRAY_INFO and
SET_ARRAY_INFO to unsigned int. This will extend the field to last
until the year 2106.
The long term plan is to get rid of ctime and utime values in
this structure as this information can be read from the on-disk
meta data directly.

Clamp the time64_t timestamps to positive values with a max of U32_MAX
when returning from GET_ARRAY_INFO ioctl to accommodate above changes
in the data type of timestamps to unsigned int.

v0.90 on disk meta data uses u32 for maintaining time stamps.
So this will also last until year 2106.
Assumption is that the usage of v0.90 will be deprecated by
year 2106.

Timestamp fields in the on disk meta data for v1.0 version already
use 64 bit data types. Remove the truncation of the bits while
writing to or reading from these from the disk.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NeilBrown <neilb@suse.com>

9ebc6ef1
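
The clamping step described in 9ebc6ef1 can be sketched as a small standalone function. This is an illustration of the arithmetic only: `clamp_time_to_u32()` is a hypothetical name, and `int64_t` / `uint32_t` stand in for the kernel's time64_t and the unsigned int ioctl fields.

```c
#include <stdint.h>

/*
 * Clamp a signed 64-bit timestamp into the range of an unsigned 32-bit
 * field: negative values become 0, values above UINT32_MAX saturate.
 */
static uint32_t clamp_time_to_u32(int64_t t)
{
    if (t < 0)
        return 0;
    if (t > (int64_t)UINT32_MAX)
        return UINT32_MAX;
    return (uint32_t)t;
}
```

An unsigned 32-bit count of seconds since 1970 reaches its maximum in the year 2106, which is where the date quoted in the commit message comes from.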

md: avoid warning for 32-bit sector_t · 3312c951

Authored by Arnd Bergmann on Dec 21, 2015

When CONFIG_LBDAF is not set, sector_t is only 32-bits wide, which
means we cannot have devices with more than 2TB, and the code that
is trying to handle compatibility support for large devices in
md version 0.90 is meaningless but also causes a compile-time warning:

drivers/md/md.c: In function 'super_90_load':
drivers/md/md.c:1029:19: warning: large integer implicitly truncated to unsigned type [-Woverflow]
drivers/md/md.c: In function 'super_90_rdev_size_change':
drivers/md/md.c:1323:17: warning: large integer implicitly truncated to unsigned type [-Woverflow]

This adds a check for CONFIG_LBDAF to avoid even getting into this
code path, and also adds an explicit cast to let the compiler know
it doesn't have to warn about the truncation.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NeilBrown <neilb@suse.com>

3312c951

md: update comment for md_allow_write · abf3508d

Authored by Guoqing Jiang on Dec 21, 2015

MD_CHANGE_CLEAN was replaced with MD_CHANGE_PENDING by
commit 070dc6 ("md: resolve confusion of MD_CHANGE_CLEAN"),
so update the comment accordingly.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

abf3508d

md-cluster: Defer MD reloading to mddev->thread · 15858fa5

Authored by Guoqing Jiang on Dec 21, 2015

Reloading of the superblock must be performed under reconfig_mutex. However,
this cannot be done with md_reload_sb because it would deadlock with
the message DLM lock. So we defer it to md_check_recovery(), which is
executed by mddev->thread.

This introduces a new flag, MD_RELOAD_SB, which, if set, causes the
superblock to be reloaded. good_device_nr is also added to 'struct mddev'; it is
used to get the number of good devices within the cluster raid.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

15858fa5

md-cluster: append some actions when change bitmap from clustered to none · f6a2dc64

Authored by Guoqing Jiang on Dec 21, 2015

For clustered raid, we need to take extra actions when changing
the bitmap to none.

1. Check whether all the bitmap locks can be acquired. If so,
   we can continue the change, since cluster raid is only active
   on the current node. Otherwise return failure and unlock the related
   bitmap locks.
2. Set nodes to 0 and then leave the cluster environment.
3. Release the other nodes' bitmap locks.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

f6a2dc64

md-cluster: Allow spare devices to be marked as faulty · 09afd2a8

Authored by Goldwyn Rodrigues on Dec 21, 2015

If a spare device was marked faulty, this would not be reflected
on receiving nodes, because they would mark it as activated and continue.
Continue the operation instead, so it may be set as faulty.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

09afd2a8

md-cluster: Fix the remove sequence with the new MD reload code · 54a88392

Authored by Goldwyn Rodrigues on Dec 21, 2015

The remove disk message does not need metadata_update_start(), but
can be an independent message.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

54a88392

md-cluster: remove a disk asynchronously from cluster environment · 659b254f

Authored by Guoqing Jiang on Dec 21, 2015

For cluster raid, if a disk cannot be reached from one node, the
other nodes receive the REMOVE message for the disk.

On a receiving node, we can't call md_kick_rdev_from_array to remove
the disk from the array synchronously, since the disk might still be busy
on this node. So set a ClusterRemove flag on the disk and
let the thread do the removal eventually.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

659b254f

21 Dec, 2015 1 commit

md: remove check for MD_RECOVERY_NEEDED in action_store. · 312045ee

Authored by NeilBrown on Dec 21, 2015

md currently doesn't allow a 'sync_action' such as 'reshape' to be set
while MD_RECOVERY_NEEDED is set.

This is a problem, particularly since commit 738a2738 as that can
cause ->check_shape to call mddev_resume() which sets
MD_RECOVERY_NEEDED.  So by the time we come to start 'reshape' it is
very likely that MD_RECOVERY_NEEDED is still set.

Testing for this flag is not really needed and is in any case very
racy as it can be set at any moment - asynchronously.  Any race
between setting a sync_action and setting MD_RECOVERY_NEEDED must
already be handled properly in some locked code, probably
md_check_recovery(), so remove the test here.

The test on MD_RECOVERY_RUNNING is also racy in the 'reshape' case
so we should test it again after getting mddev_lock().

As this fixes a race and a regression which can cause 'reshape' to
fail, it is suitable for -stable kernels since 4.1.
Reported-by: Xiao Ni <xni@redhat.com>
Fixes: 738a2738 ("md/raid5: fix allocation of 'scribble' array.")
Cc: stable@vger.kernel.org (v4.1+)
Signed-off-by: NeilBrown <neilb@suse.com>

312045ee

18 Dec, 2015 1 commit

Fix remove_and_add_spares removes drive added as spare in slot_store · cb01c549

Authored by Goldwyn Rodrigues on Dec 18, 2015

Commit 2910ff17 introduced a regression which would remove a recently
added spare via slot_store. Revert the part of the patch which touches
slot_store() and add the disk directly using pers->hot_add_disk().

Fixes: 2910ff17 ("md: remove_and_add_spares() to activate specific rdev")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>

cb01c549
