- 08 11月, 2016 1 次提交
-
-
由 Tomasz Majchrzak 提交于
Add new rdev flag which external metadata handler can use to switch on/off bad block support. If new bad block is encountered, notify it via rdev 'unacknowledged_bad_blocks' sysfs file. If bad block has been cleared, notify update to rdev 'bad_blocks' sysfs file. When bad blocks support is being removed, just clear rdev flag. It is not necessary to reset badblocks->shift field. If there are bad blocks cleared or added at the same time, it is ok for those changes to be applied to the structure. The array is in blocked state and the drive which cannot handle bad blocks any more will be removed from the array before it is unlocked. Simplify state_show function by adding a separator at the end of each string and overwrite last separator with new line. Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Reviewed-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 29 10月, 2016 1 次提交
-
-
由 NeilBrown 提交于
mddev->curr_resync usually records where the current resync is up to, but during the starting phase it has some "magic" values. 1 - means that the array is trying to start a resync, but has yielded to another array which shares physical devices, and also needs to start a resync 2 - means the array is trying to start resync, but has found another array which shares physical devices and has already started resync. 3 - means that resync has commensed, but it is possible that nothing has actually been resynced yet. It is important that this value not be visible to user-space and particularly that it doesn't get written to the metadata, as the resync or recovery checkpoint. In part, this is because it may be slightly higher than the correct value, though this is very rare. In part, because it is not a multiple of 4K, and some devices only support 4K aligned accesses. There are two places where this value is propagates into either ->curr_resync_completed or ->recovery_cp or ->recovery_offset. These currently avoid the propagation of values 1 and 3, but will allow 3 to leak through. Change them to only propagate the value if it is > 3. As this can cause an array to fail, the patch is suitable for -stable. Cc: stable@vger.kernel.org (v3.7+) Reported-by: NViswesh <viswesh.vichu@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 25 10月, 2016 1 次提交
-
-
由 Tomasz Majchrzak 提交于
If there is a bad block on a disk and there is a recovery performed from this disk, the same bad block is reported for a new disk. It involves setting MD_CHANGE_PENDING flag in rdev_set_badblocks. For external metadata this flag is not being cleared as array state is reported as 'clean'. The read request to bad block in RAID5 array gets stuck as it is waiting for a flag to be cleared - as per commit c3cce6cd ("md/raid5: ensure device failure recorded before write request returns."). The meaning of MD_CHANGE_PENDING and MD_CHANGE_CLEAN flags has been clarified in commit 070dc6dd ("md: resolve confusion of MD_CHANGE_CLEAN"), however MD_CHANGE_PENDING flag has been used in personality error handlers since and it doesn't fully comply with initial purpose. It was supposed to notify that write request is about to start, however now it is also used to request metadata update. Initially (in md_allow_write, md_write_start) MD_CHANGE_PENDING flag has been set and in_sync has been set to 0 at the same time. Error handlers just set the flag without modifying in_sync value. Sysfs array state is a single value so now it reports 'clean' when MD_CHANGE_PENDING flag is set and in_sync is set to 1. Userspace has no idea it is expected to take some action. Swap the order that array state is checked so 'write_pending' is reported ahead of 'clean' ('write_pending' is a misleading name but it is too late to rename it now). Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 04 10月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
if all disks in an array are non-rotational, set the array non-rotational. This only works for array with all disks populated at startup. Support for disk hotadd/hotremove could be added later if necessary. Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 22 9月, 2016 4 次提交
-
-
由 Shaohua Li 提交于
lockdep reports a potential deadlock. Fix this by droping the mutex before md_import_device [ 1137.126601] ====================================================== [ 1137.127013] [ INFO: possible circular locking dependency detected ] [ 1137.127013] 4.8.0-rc4+ #538 Not tainted [ 1137.127013] ------------------------------------------------------- [ 1137.127013] mdadm/16675 is trying to acquire lock: [ 1137.127013] (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff81243cf3>] __blkdev_get+0x63/0x450 [ 1137.127013] but task is already holding lock: [ 1137.127013] (detected_devices_mutex){+.+.+.}, at: [<ffffffff81a5138c>] md_ioctl+0x2ac/0x1f50 [ 1137.127013] which lock already depends on the new lock. [ 1137.127013] the existing dependency chain (in reverse order) is: [ 1137.127013] -> #1 (detected_devices_mutex){+.+.+.}: [ 1137.127013] [<ffffffff810b6f19>] lock_acquire+0xb9/0x220 [ 1137.127013] [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0 [ 1137.127013] [<ffffffff81a4eeaf>] md_autodetect_dev+0x3f/0x90 [ 1137.127013] [<ffffffff81595be8>] rescan_partitions+0x1a8/0x2c0 [ 1137.127013] [<ffffffff81590081>] __blkdev_reread_part+0x71/0xb0 [ 1137.127013] [<ffffffff815900e5>] blkdev_reread_part+0x25/0x40 [ 1137.127013] [<ffffffff81590c4b>] blkdev_ioctl+0x51b/0xa30 [ 1137.127013] [<ffffffff81242bf1>] block_ioctl+0x41/0x50 [ 1137.127013] [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0 [ 1137.127013] [<ffffffff81215321>] SyS_ioctl+0x41/0x70 [ 1137.127013] [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 1137.127013] -> #0 (&bdev->bd_mutex){+.+.+.}: [ 1137.127013] [<ffffffff810b6af2>] __lock_acquire+0x1662/0x1690 [ 1137.127013] [<ffffffff810b6f19>] lock_acquire+0xb9/0x220 [ 1137.127013] [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0 [ 1137.127013] [<ffffffff81243cf3>] __blkdev_get+0x63/0x450 [ 1137.127013] [<ffffffff81244307>] blkdev_get+0x227/0x350 [ 1137.127013] [<ffffffff812444f6>] blkdev_get_by_dev+0x36/0x50 [ 1137.127013] [<ffffffff81a46d65>] lock_rdev+0x35/0x80 [ 1137.127013] [<ffffffff81a49bb4>] md_import_device+0xb4/0x1b0 [ 1137.127013] [<ffffffff81a513d6>] md_ioctl+0x2f6/0x1f50 [ 1137.127013] [<ffffffff815909b3>] blkdev_ioctl+0x283/0xa30 [ 1137.127013] [<ffffffff81242bf1>] block_ioctl+0x41/0x50 [ 1137.127013] [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0 [ 1137.127013] [<ffffffff81215321>] SyS_ioctl+0x41/0x70 [ 1137.127013] [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 1137.127013] other info that might help us debug this: [ 1137.127013] Possible unsafe locking scenario: [ 1137.127013] CPU0 CPU1 [ 1137.127013] ---- ---- [ 1137.127013] lock(detected_devices_mutex); [ 1137.127013] lock(&bdev->bd_mutex); [ 1137.127013] lock(detected_devices_mutex); [ 1137.127013] lock(&bdev->bd_mutex); [ 1137.127013] *** DEADLOCK *** Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
cluster_info and bitmap_info.nodes also need to be cleared when array is stopped. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
When stop clustered raid while it is pending on resync, MD_STILL_CLOSED flag could be cleared since udev rule is triggered to open the mddev. So obviously array can't be stopped soon and returns EBUSY. mdadm -Ss md-raid-arrays.rules set MD_STILL_CLOSED md_open() ... ... ... clear MD_STILL_CLOSED do_md_stop We make below changes to resolve this issue: 1. rename MD_STILL_CLOSED to MD_CLOSING since it is set when stop array and it means we are stopping array. 2. let md_open returns early if CLOSING is set, so no other threads will open array if one thread is trying to close it. 3. no need to clear CLOSING bit in md_open because 1 has ensure the bit is cleared, then we also don't need to test CLOSING bit in do_md_stop. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
The new_disk_ack could return failure if WAITING_FOR_NEWDISK is not set, so we need to kick the dev from array in case failure happened. And we missed to check err before call new_disk_ack othwise we could kick a rdev which isn't in array, thanks for the reminder from Shaohua. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 09 9月, 2016 1 次提交
-
-
由 Guoqing Jiang 提交于
The md-cluster is compiled as module by default, if it is compiled by built-in way, then we can't make md-cluster works. [64782.630008] md/raid1:md127: active with 2 out of 2 mirrors [64782.630528] md-cluster module not found. [64782.630530] md127: Could not setup cluster service (-2) Fixes: edb39c9d ("Introduce md_cluster_operations to handle cluster functions") Cc: stable@vger.kernel.org (v4.1+) Reported-by: NMarc Smith <marc.smith@mcc.edu> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 25 8月, 2016 1 次提交
-
-
由 Song Liu 提交于
Currently, the code sets MD_JOURNAL_CLEAN when the array has MD_FEATURE_JOURNAL and the recovery_cp is MaxSector. The array will be MD_JOURNAL_CLEAN even if the journal device is missing. With this patch, the MD_JOURNAL_CLEAN is only set when the journal device presents. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 18 8月, 2016 2 次提交
-
-
由 Artur Paszkiewicz 提交于
This fixes a long-standing bug that caused a flood of messages like: "md: delaying data-check of md1 until md2 has finished (they share one or more physical units)" It can be reproduced like this: 1. Create at least 3 raid1 arrays on a pair of disks, each on different partitions. 2. Request a sync operation like 'check' or 'repair' on 2 arrays by writing to their md/sync_action attribute files. One operation should start and one should be delayed and a message like the above will be printed. 3. Issue a write to the third array. Each write will cause 2 copies of the message to be printed. This happens when wake_up(&resync_wait) is called, usually by md_check_recovery(). Then the delayed sync thread again prints the message and is put to sleep. This patch adds a check in md_do_sync() to prevent printing this message more than once for the same pair of devices. Reported-by: NSven Koehler <sven.koehler@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=151801Signed-off-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
The ret is not needed anymore since we have already move resync_start into md_do_sync in commit 41a9a0dc. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 17 8月, 2016 1 次提交
-
-
由 Song Liu 提交于
GET_ARRAY_INFO counts journal as spare (spare_disks), which is not accurate. This patch fixes this. Reported-by: NYi Zhang <yizhan@redhat.com> Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 08 8月, 2016 1 次提交
-
-
由 Jens Axboe 提交于
Since commit 63a4cc24, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 29 7月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
The md device might not have personality (for example, ddf raid array). The issue is introduced by 8430e7e0(md: disconnect device from personality before trying to remove it) Reported-by: Nkernel test robot <xiaolong.ye@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 20 7月, 2016 3 次提交
-
-
由 Tomasz Majchrzak 提交于
Changeset 6791875e has added early return from a function so there is no sysfs notification for 'active' and 'clean' state change. Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Alexey Obitotskiy 提交于
md loads raidX modules and increments module refcount each time level has changed but does not decrement it. You are unable to unload raid0 module after reshape because raid0 reshape changes level to raid4 and back to raid0. Signed-off-by: NAleksey Obitotskiy <aleksey.obitotskiy@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Arnd Bergmann 提交于
The md code stores the exact time of the last error in the last_read_error variable using a timespec structure. It only ever uses the seconds portion of that though, so we can use a scalar for it. There won't be an overflow in 2038 here, because it already used monotonic time and 32-bit is enough for that, but I've decided to use time64_t for consistency in the conversion. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 14 6月, 2016 3 次提交
-
-
由 NeilBrown 提交于
Every time a device is removed with ->hot_remove_disk() a synchronize_rcu() call is made which can delay several milliseconds in some case. If lots of devices fail at once - as could happen with a large RAID10 where one set of devices are removed all at once - these delays can add up to be very inconcenient. As failure is not reversible we can check for that first, setting a separate flag if it is found, and then all synchronize_rcu() once for all the flagged devices. Then ->hot_remove_disk() function can skip the synchronize_rcu() step if the flag is set. fix build error(Shaohua) Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
When the HOT_REMOVE_DISK ioctl is used to remove a device, we call remove_and_add_spares() which will remove it from the personality if possible. This improves the chances that the removal will succeed. When writing "remove" to dev-XX/state, we don't. So that can fail more easily. So add the remove_and_add_spares() into "remove" handling. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Xiao Ni 提交于
This is a simple check before updating the superblock. It should update the superblock when update_size return 0. Signed-off-by: NXiao Ni <xni@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 10 6月, 2016 1 次提交
-
-
由 Cong Wang 提交于
We saw a list corruption in the list all_detected_devices: WARNING: CPU: 16 PID: 226 at lib/list_debug.c:29 __list_add+0x3c/0xa9() list_add corruption. next->prev should be prev (ffff880859d58320), but was ffff880859ce74c0. (next=ffffffff81abfdb0). Modules linked in: ahci libahci libata sd_mod scsi_mod CPU: 16 PID: 226 Comm: kworker/u241:4 Not tainted 4.1.20 #1 Hardware name: Dell Inc. PowerEdge C6220/04GD66, BIOS 2.2.3 11/07/2013 Workqueue: events_unbound async_run_entry_fn 0000000000000000 ffff880859a5baf8 ffffffff81502872 ffff880859a5bb48 0000000000000009 ffff880859a5bb38 ffffffff810692a5 ffff880859ee8828 ffffffff812ad02c ffff880859d58320 ffffffff81abfdb0 ffff880859eb90c0 Call Trace: [<ffffffff81502872>] dump_stack+0x4d/0x63 [<ffffffff810692a5>] warn_slowpath_common+0xa1/0xbb [<ffffffff812ad02c>] ? __list_add+0x3c/0xa9 [<ffffffff81069305>] warn_slowpath_fmt+0x46/0x48 [<ffffffff812ad02c>] __list_add+0x3c/0xa9 [<ffffffff81406f28>] md_autodetect_dev+0x41/0x62 [<ffffffff81285862>] rescan_partitions+0x25f/0x29d [<ffffffff81506372>] ? mutex_lock+0x13/0x31 [<ffffffff811a090f>] __blkdev_get+0x1aa/0x3cd [<ffffffff811a0b91>] blkdev_get+0x5f/0x294 [<ffffffff81377ceb>] ? put_device+0x17/0x19 [<ffffffff8128227c>] ? disk_put_part+0x12/0x14 [<ffffffff812836f3>] add_disk+0x29d/0x407 [<ffffffff81384345>] ? __pm_runtime_use_autosuspend+0x5c/0x64 [<ffffffffa004a724>] sd_probe_async+0x115/0x1af [sd_mod] [<ffffffff81083177>] async_run_entry_fn+0x72/0x12c [<ffffffff8107c44c>] process_one_work+0x198/0x2ce [<ffffffff8107cac7>] worker_thread+0x1dd/0x2bb [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15 [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15 [<ffffffff81080d9c>] kthread+0xae/0xb6 [<ffffffff81080000>] ? param_array_set+0x40/0xfa [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61 [<ffffffff81508152>] ret_from_fork+0x42/0x70 [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61 I suspect it is because there is no lock protecting this global list, autostart_arrays() is called in ioctl() path where there is no lock. Cc: Shaohua Li <shli@kernel.org> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 08 6月, 2016 3 次提交
-
-
由 Mike Christie 提交于
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Mike Christie 提交于
Separate the op from the rq_flag_bits and have md set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Mike Christie 提交于
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: NMike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 04 6月, 2016 2 次提交
-
-
由 Guoqing Jiang 提交于
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
Add a disk to an array which is performing recovery is a little complicated, we need to do both reap the sync thread and perform add disk for the case, then it caused deadlock as follows. linux44:~ # ps aux|grep md|grep D root 1822 0.0 0.0 0 0 ? D 16:50 0:00 [md127_resync] root 1848 0.0 0.0 19860 952 pts/0 D+ 16:50 0:00 mdadm --manage /dev/md127 --re-add /dev/vdb linux44:~ # cat /proc/1848/stack [<ffffffff8107afde>] kthread_stop+0x6e/0x120 [<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod] [<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod] [<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod] [<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod] [<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140 [<ffffffff811a6b98>] vfs_write+0xb8/0x1e0 [<ffffffff811a75b8>] SyS_write+0x48/0xa0 [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b [<00007f068ea1ed30>] 0x7f068ea1ed30 linux44:~ # cat /proc/1822/stack [<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod] [<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod] [<ffffffff8107ad94>] kthread+0xb4/0xc0 [<ffffffff8152a518>] ret_from_fork+0x58/0x90 Task1848 Task1822 md_attr_store (held reconfig_mutex by call mddev_lock()) action_store md_reap_sync_thread md_unregister_thread kthread_stop md_wakeup_thread(mddev->thread); wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING)) md_check_recovery is triggered by wakeup mddev->thread, but it can't clear MD_CHANGE_PENDING flag since it can't get lock which was held by md_attr_store already. To solve the deadlock problem, we move "->resync_finish()" from md_do_sync to md_reap_sync_thread (after md_update_sb), also MD_HELD_RESYNC_LOCK is introduced since it is possible that node can't get resync lock in md_do_sync. Then we do not need to wait for MD_CHANGE_PENDING is cleared or not since metadata should be updated after md_update_sb, so just call resync_finish if MD_HELD_RESYNC_LOCK is set. We also unified the code after skip label, since set PENDING for non-clustered case should be harmless. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 10 5月, 2016 2 次提交
-
-
由 Guoqing Jiang 提交于
Some code waits for a metadata update by: 1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN) 2. setting MD_CHANGE_PENDING and waking the management thread 3. waiting for MD_CHANGE_PENDING to be cleared If the first two are done without locking, the code in md_update_sb() which checks if it needs to repeat might test if an update is needed before step 1, then clear MD_CHANGE_PENDING after step 2, resulting in the wait returning early. So make sure all places that set MD_CHANGE_PENDING are atomicial, and bit_clear_unless (suggested by Neil) is introduced for the purpose. Cc: Martin Kepplinger <martink@posteo.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: <linux-kernel@vger.kernel.org> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Heinz Mauelshagen 提交于
Introduced by upstream commit 70d9798b The raid0 personality does not create mddev->thread as oposed to other personalities leading to its unconditional access in mddev_suspend() causing an oops. Patch checks for mddev->thread in order to keep the intention of aforementioned commit. Fixes: 70d9798b ("MD: warn for potential deadlock") Cc: stable@vger.kernel.org (4.5+) Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 05 5月, 2016 4 次提交
-
-
由 Guoqing Jiang 提交于
When a device is re-added, it will ultimately need to be activated and that happens in md_check_recovery, so we need to set MD_RECOVERY_NEEDED right after remove_and_add_spares. A specifical issue without the change is that when one node perform fail/remove/readd on a disk, but slave nodes could not add the disk back to array as expected (added as missed instead of in sync). So give slave nodes a chance to do resync. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
Currently, some features are not supported yet, such as change array_sectors and update size, so return EINVAL for them and listed it in document. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
It is not reasonable that cluster raid to release resync lock before the last pers->sync_request has finished. As the metadata will be changed when node performs resync, we need to inform other nodes to update metadata, so the MD_CHANGE_PENDING flag is set before finish resync. Then metadata_update_finish is move ahead to ensure that METADATA_UPDATED msg is sent before finish resync, and metadata_update_start need to be run after "repeat:" label accordingly. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
If multiple nodes choose to attempt do resync at the same time they need to be serialized so they don't duplicate effort. This serialization is done by locking the 'resync' DLM lock. Currently if a node cannot get the lock immediately it doesn't request notification when the lock becomes available (i.e. DLM_LKF_NOQUEUE is set), so it may not reliably find out when it is safe to try again. Rather than trying to arrange an async wake-up when the lock becomes available, switch to using synchronous locking - this is a lot easier to think about. As it is not permitted to block in the 'raid1d' thread, move the locking to the resync thread. So the rsync thread is forked immediately, but it blocks until the resync lock is available. Once the lock is locked it checks again if any resync action is needed. A particular symptom of the current problem is that a node can get stuck with "resync=pending" indefinitely. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 26 4月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
blk_queue_split marks bio unmergeable, which makes sense for normal bio. But if dispatching the bio to underlayer disk, the blk_queue_split checks are invalid, hence it's possible the bio becomes mergeable. In the reported bug, this bug causes trim against raid0 performance slash https://bugzilla.kernel.org/show_bug.cgi?id=117051Reported-and-tested-by: NPark Ju Hyung <qkrwngud825@gmail.com> Fixes: 6ac45aeb(block: avoid to merge splitted bio) Cc: stable@vger.kernel.org (v4.3+) Cc: Ming Lei <ming.lei@canonical.com> Cc: Neil Brown <neilb@suse.de> Reviewed-by: NJens Axboe <axboe@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 13 4月, 2016 1 次提交
-
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 01 4月, 2016 2 次提交
-
-
由 Shaohua Li 提交于
Xiao Ni reported below crash: [26396.335146] BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8 [26396.342990] IP: [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod] [26396.349449] PGD 0 [26396.351468] Oops: 0002 [#1] SMP [26396.354898] Modules linked in: ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_td [26396.408404] CPU: 5 PID: 3261 Comm: loop0 Not tainted 4.5.0 #1 [26396.414140] Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 3.2.2 09/15/2014 [26396.421608] task: ffff8808339be680 ti: ffff8808365f4000 task.ti: ffff8808365f4000 [26396.429074] RIP: 0010:[<ffffffffa0425b00>] [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod] [26396.437952] RSP: 0018:ffff8808365f7c38 EFLAGS: 00010046 [26396.443252] RAX: ffffffffa0425ae0 RBX: ffff8804336a7900 RCX: ffffe8f9f7b41198 [26396.450371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8804336a7900 [26396.457489] RBP: ffff8808365f7c50 R08: 0000000000000005 R09: 00001801e02ce3d7 [26396.464608] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [26396.471728] R13: ffff8808338d9a00 R14: 0000000000000000 R15: ffff880833f9fe00 [26396.478849] FS: 00007f9e5066d740(0000) GS:ffff880237b40000(0000) knlGS:0000000000000000 [26396.486922] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [26396.492656] CR2: 00000000000002a8 CR3: 00000000019ea000 CR4: 00000000000006e0 [26396.499775] Stack: [26396.501781] ffff8804336a7900 0000000000000000 0000000000000000 ffff8808365f7c68 [26396.509199] ffffffff81308cd0 ffff8804336a7900 ffff8808365f7ca8 ffffffff81310637 [26396.516618] 00000000a0233a00 ffff880833f9fe00 0000000000000000 ffff880833fb0000 [26396.524038] Call Trace: [26396.526485] [<ffffffff81308cd0>] bio_endio+0x40/0x60 [26396.531529] [<ffffffff81310637>] blk_update_request+0x87/0x320 [26396.537439] [<ffffffff8131a20a>] blk_mq_end_request+0x1a/0x70 [26396.543261] [<ffffffff81313889>] blk_flush_complete_seq+0xd9/0x2a0 [26396.549517] [<ffffffff81313ccf>] flush_end_io+0x15f/0x240 [26396.554993] [<ffffffff8131a22a>] blk_mq_end_request+0x3a/0x70 [26396.560815] [<ffffffff8131a314>] __blk_mq_complete_request+0xb4/0xe0 [26396.567246] [<ffffffff8131a35c>] blk_mq_complete_request+0x1c/0x20 [26396.573506] [<ffffffffa04182df>] loop_queue_work+0x6f/0x72c [loop] [26396.579764] [<ffffffff81697844>] ? __schedule+0x2b4/0x8f0 [26396.585242] [<ffffffff810a7812>] kthread_worker_fn+0x52/0x170 [26396.591065] [<ffffffff810a77c0>] ? kthread_create_on_node+0x1a0/0x1a0 [26396.597582] [<ffffffff810a7238>] kthread+0xd8/0xf0 [26396.602453] [<ffffffff810a7160>] ? kthread_park+0x60/0x60 [26396.607929] [<ffffffff8169bdcf>] ret_from_fork+0x3f/0x70 [26396.613319] [<ffffffff810a7160>] ? kthread_park+0x60/0x60 md_super_write() and corresponding md_super_wait() generally are called with reconfig_mutex locked, which prevents disk disappears. There is one case this rule is broken. write_sb_page of bitmap.c doesn't hold the mutex. next_active_rdev does increase rdev reference, but it decreases the reference too early (eg, before IO finish). disk can disappear at the window. We unconditionally increase rdev reference in md_super_write() to avoid the race. Reported-and-tested-by: NXiao Ni <xni@redhat.com> Reviewed-by: NNeil Brown <neilb@suse.de> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Wei Fang 提交于
Fix a trivial typo in md_ioctl(). Signed-off-by: NWei Fang <fangwei1@huawei.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 27 2月, 2016 2 次提交
-
-
由 Shaohua Li 提交于
The personality thread shouldn't call mddev_suspend(). Because mddev_suspend() will for all IO finish, but IO is handled in personality thread, so this could cause deadlock. To trigger this early, add a warning if mddev_suspend() is called from personality thread. Suggested-by: NNeilBrown <neilb@suse.com> Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Sebastian Parschauer 提交于
When stopping an MD device, then its device node /dev/mdX may still exist afterwards or it is recreated by udev. The next open() call can lead to creation of an inoperable MD device. The reason for this is that a change event (KOBJ_CHANGE) is sent to udev which races against the remove event (KOBJ_REMOVE) from md_free(). So drop sending the change event. A change is likely also required in mdadm as many versions send the change event to udev as well. Neil mentioned the change event is a workaround for old kernel Commit: 934d9c23 ("md: destroy partitions and notify udev when md array is stopped.") new mdadm can handle device remove now, so this isn't required any more. Cc: NeilBrown <neilb@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NSebastian Parschauer <sebastian.riemer@profitbricks.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 14 1月, 2016 1 次提交
-
-
由 Dan Williams 提交于
It is not safe for an integrity profile to be changed while i/o is in-flight in the queue. Prevent adding new disks or otherwise online spares to an array if the device has an incompatible integrity profile. The original change to the blk_integrity_unregister implementation in md, commmit c7bfced9 "md: suspend i/o during runtime blk_integrity_unregister" introduced an immediate hang regression. This policy of disallowing changes the integrity profile once one has been established is shared with DM. Here is an abbreviated log from a test run that: 1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [ 59.076127] 2/ Tries to add an integrity-disabled device (pmem1m) [ 90.489209] 3/ Retries with an integrity-enabled device (pmem1s) [ 205.671277] [ 59.076127] md/raid1:md0: active with 1 out of 2 mirrors [ 59.078302] md: data integrity enabled on md0 [..] [ 90.489209] md0: incompatible integrity profile for pmem1m [..] [ 205.671277] md: super_written gets error=-5 [ 205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device. [ 205.677386] md/raid1:md0: Operation continuing on 1 devices. [ 205.683037] RAID1 conf printout: [ 205.684699] --- wd:1 rd:2 [ 205.685972] disk 0, wo:0, o:1, dev:pmem0s [ 205.687562] disk 1, wo:1, o:1, dev:pmem1s [ 205.691717] md: recovery of RAID array md0 Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister") Cc: <stable@vger.kernel.org> Cc: Mike Snitzer <snitzer@redhat.com> Reported-by: NNeilBrown <neilb@suse.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-