- 10 11月, 2016 1 次提交
-
-
由 NeilBrown 提交于
bitmap_flush() finishes with bitmap_update_sb(), and that finishes with write_page(..., 1), so write_page() will wait for all writes to complete. So there is no point calling md_super_wait() immediately afterwards. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 08 11月, 2016 6 次提交
-
-
由 NeilBrown 提交于
When adding devices to, or removing device from, an array we need to update the metadata. However we don't need to do it synchronously as data integrity doesn't depend on these changes being recorded instantly. So avoid the synchronous call to md_update_sb and just set a flag so that the thread will do it. This can reduce the number of updates performed when lots of devices are being added or removed. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
1/ using pr_debug() for a number of messages reduces the noise of md, but still allows them to be enabled when needed. 2/ try to be consistent in the usage of pr_err() and pr_warn(), and document the intention 3/ When strings have been split onto multiple lines, rejoin into a single string. The cost of having lines > 80 chars is less than the cost of not being able to easily search for a particular message. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
1/ don't print a warning if allocation fails. page_alloc() does that already. 2/ always check return status for error. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Tomasz Majchrzak 提交于
When raid1/raid10 array fails to write to one of the drives, the request is added to bio_end_io_list and finished by personality thread. The thread doesn't handle it as long as MD_CHANGE_PENDING flag is set. In case of external metadata this flag is cleared, however the thread is not woken up. It causes request to be blocked for few seconds (until another action on the array wakes up the thread) or to get stuck indefinitely. Wake up personality thread once MD_CHANGE_PENDING has been cleared. Moving 'restart_array' call after the flag is cleared it not a solution because in read-write mode the call doesn't wake up the thread. Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Tomasz Majchrzak 提交于
If external metadata handler supports bad blocks and unacknowledged bad blocks are present, don't report disk via sysfs as faulty. Such situation can be still handled so disk just has to be blocked for a moment. It makes it consistent with kernel state as corresponding rdev flag is also not set. When the disk in being unblocked there are few cases: 1. Disk has been in blocked and faulty state, it is being unblocked but it still remains in faulty state. Metadata handler will remove it from array in the next call. 2. There is no bad block support in external metadata handler and bad blocks are present - put the disk in blocked and faulty state (see case 1). 3. There is bad block support in external metadata handler and all bad blocks are acknowledged - clear all flags, continue. 4. There is bad block support in external metadata handler but there are still unacknowledged bad blocks - clear all flags, continue. It is fine to clear Blocked flag because it was probably not set anyway (if it was it is case 1). BlockedBadBlocks flag can also be cleared because the request waiting for it will set it again when it finds out that some bad block is still not acknowledged. Recovery is not necessary but there are no problems if the flag is set. Sysfs rdev state is still reported as blocked (due to unacknowledged bad blocks) so metadata handler will process remaining bad blocks and unblock disk again. Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Tomasz Majchrzak 提交于
Add new rdev flag which external metadata handler can use to switch on/off bad block support. If new bad block is encountered, notify it via rdev 'unacknowledged_bad_blocks' sysfs file. If bad block has been cleared, notify update to rdev 'bad_blocks' sysfs file. When bad blocks support is being removed, just clear rdev flag. It is not necessary to reset badblocks->shift field. If there are bad blocks cleared or added at the same time, it is ok for those changes to be applied to the structure. The array is in blocked state and the drive which cannot handle bad blocks any more will be removed from the array before it is unlocked. Simplify state_show function by adding a separator at the end of each string and overwrite last separator with new line. Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Reviewed-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 29 10月, 2016 1 次提交
-
-
由 NeilBrown 提交于
mddev->curr_resync usually records where the current resync is up to, but during the starting phase it has some "magic" values. 1 - means that the array is trying to start a resync, but has yielded to another array which shares physical devices, and also needs to start a resync 2 - means the array is trying to start resync, but has found another array which shares physical devices and has already started resync. 3 - means that resync has commensed, but it is possible that nothing has actually been resynced yet. It is important that this value not be visible to user-space and particularly that it doesn't get written to the metadata, as the resync or recovery checkpoint. In part, this is because it may be slightly higher than the correct value, though this is very rare. In part, because it is not a multiple of 4K, and some devices only support 4K aligned accesses. There are two places where this value is propagates into either ->curr_resync_completed or ->recovery_cp or ->recovery_offset. These currently avoid the propagation of values 1 and 3, but will allow 3 to leak through. Change them to only propagate the value if it is > 3. As this can cause an array to fail, the patch is suitable for -stable. Cc: stable@vger.kernel.org (v3.7+) Reported-by: NViswesh <viswesh.vichu@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 25 10月, 2016 1 次提交
-
-
由 Tomasz Majchrzak 提交于
If there is a bad block on a disk and there is a recovery performed from this disk, the same bad block is reported for a new disk. It involves setting MD_CHANGE_PENDING flag in rdev_set_badblocks. For external metadata this flag is not being cleared as array state is reported as 'clean'. The read request to bad block in RAID5 array gets stuck as it is waiting for a flag to be cleared - as per commit c3cce6cd ("md/raid5: ensure device failure recorded before write request returns."). The meaning of MD_CHANGE_PENDING and MD_CHANGE_CLEAN flags has been clarified in commit 070dc6dd ("md: resolve confusion of MD_CHANGE_CLEAN"), however MD_CHANGE_PENDING flag has been used in personality error handlers since and it doesn't fully comply with initial purpose. It was supposed to notify that write request is about to start, however now it is also used to request metadata update. Initially (in md_allow_write, md_write_start) MD_CHANGE_PENDING flag has been set and in_sync has been set to 0 at the same time. Error handlers just set the flag without modifying in_sync value. Sysfs array state is a single value so now it reports 'clean' when MD_CHANGE_PENDING flag is set and in_sync is set to 1. Userspace has no idea it is expected to take some action. Swap the order that array state is checked so 'write_pending' is reported ahead of 'clean' ('write_pending' is a misleading name but it is too late to rename it now). Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 04 10月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
if all disks in an array are non-rotational, set the array non-rotational. This only works for array with all disks populated at startup. Support for disk hotadd/hotremove could be added later if necessary. Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 22 9月, 2016 4 次提交
-
-
由 Shaohua Li 提交于
lockdep reports a potential deadlock. Fix this by droping the mutex before md_import_device [ 1137.126601] ====================================================== [ 1137.127013] [ INFO: possible circular locking dependency detected ] [ 1137.127013] 4.8.0-rc4+ #538 Not tainted [ 1137.127013] ------------------------------------------------------- [ 1137.127013] mdadm/16675 is trying to acquire lock: [ 1137.127013] (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff81243cf3>] __blkdev_get+0x63/0x450 [ 1137.127013] but task is already holding lock: [ 1137.127013] (detected_devices_mutex){+.+.+.}, at: [<ffffffff81a5138c>] md_ioctl+0x2ac/0x1f50 [ 1137.127013] which lock already depends on the new lock. [ 1137.127013] the existing dependency chain (in reverse order) is: [ 1137.127013] -> #1 (detected_devices_mutex){+.+.+.}: [ 1137.127013] [<ffffffff810b6f19>] lock_acquire+0xb9/0x220 [ 1137.127013] [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0 [ 1137.127013] [<ffffffff81a4eeaf>] md_autodetect_dev+0x3f/0x90 [ 1137.127013] [<ffffffff81595be8>] rescan_partitions+0x1a8/0x2c0 [ 1137.127013] [<ffffffff81590081>] __blkdev_reread_part+0x71/0xb0 [ 1137.127013] [<ffffffff815900e5>] blkdev_reread_part+0x25/0x40 [ 1137.127013] [<ffffffff81590c4b>] blkdev_ioctl+0x51b/0xa30 [ 1137.127013] [<ffffffff81242bf1>] block_ioctl+0x41/0x50 [ 1137.127013] [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0 [ 1137.127013] [<ffffffff81215321>] SyS_ioctl+0x41/0x70 [ 1137.127013] [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 1137.127013] -> #0 (&bdev->bd_mutex){+.+.+.}: [ 1137.127013] [<ffffffff810b6af2>] __lock_acquire+0x1662/0x1690 [ 1137.127013] [<ffffffff810b6f19>] lock_acquire+0xb9/0x220 [ 1137.127013] [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0 [ 1137.127013] [<ffffffff81243cf3>] __blkdev_get+0x63/0x450 [ 1137.127013] [<ffffffff81244307>] blkdev_get+0x227/0x350 [ 1137.127013] [<ffffffff812444f6>] blkdev_get_by_dev+0x36/0x50 [ 1137.127013] [<ffffffff81a46d65>] lock_rdev+0x35/0x80 [ 1137.127013] [<ffffffff81a49bb4>] md_import_device+0xb4/0x1b0 [ 1137.127013] [<ffffffff81a513d6>] md_ioctl+0x2f6/0x1f50 [ 1137.127013] [<ffffffff815909b3>] blkdev_ioctl+0x283/0xa30 [ 1137.127013] [<ffffffff81242bf1>] block_ioctl+0x41/0x50 [ 1137.127013] [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0 [ 1137.127013] [<ffffffff81215321>] SyS_ioctl+0x41/0x70 [ 1137.127013] [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 1137.127013] other info that might help us debug this: [ 1137.127013] Possible unsafe locking scenario: [ 1137.127013] CPU0 CPU1 [ 1137.127013] ---- ---- [ 1137.127013] lock(detected_devices_mutex); [ 1137.127013] lock(&bdev->bd_mutex); [ 1137.127013] lock(detected_devices_mutex); [ 1137.127013] lock(&bdev->bd_mutex); [ 1137.127013] *** DEADLOCK *** Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
cluster_info and bitmap_info.nodes also need to be cleared when array is stopped. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
When stop clustered raid while it is pending on resync, MD_STILL_CLOSED flag could be cleared since udev rule is triggered to open the mddev. So obviously array can't be stopped soon and returns EBUSY. mdadm -Ss md-raid-arrays.rules set MD_STILL_CLOSED md_open() ... ... ... clear MD_STILL_CLOSED do_md_stop We make below changes to resolve this issue: 1. rename MD_STILL_CLOSED to MD_CLOSING since it is set when stop array and it means we are stopping array. 2. let md_open returns early if CLOSING is set, so no other threads will open array if one thread is trying to close it. 3. no need to clear CLOSING bit in md_open because 1 has ensure the bit is cleared, then we also don't need to test CLOSING bit in do_md_stop. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
The new_disk_ack could return failure if WAITING_FOR_NEWDISK is not set, so we need to kick the dev from array in case failure happened. And we missed to check err before call new_disk_ack othwise we could kick a rdev which isn't in array, thanks for the reminder from Shaohua. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 09 9月, 2016 1 次提交
-
-
由 Guoqing Jiang 提交于
The md-cluster is compiled as module by default, if it is compiled by built-in way, then we can't make md-cluster works. [64782.630008] md/raid1:md127: active with 2 out of 2 mirrors [64782.630528] md-cluster module not found. [64782.630530] md127: Could not setup cluster service (-2) Fixes: edb39c9d ("Introduce md_cluster_operations to handle cluster functions") Cc: stable@vger.kernel.org (v4.1+) Reported-by: NMarc Smith <marc.smith@mcc.edu> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 25 8月, 2016 1 次提交
-
-
由 Song Liu 提交于
Currently, the code sets MD_JOURNAL_CLEAN when the array has MD_FEATURE_JOURNAL and the recovery_cp is MaxSector. The array will be MD_JOURNAL_CLEAN even if the journal device is missing. With this patch, the MD_JOURNAL_CLEAN is only set when the journal device presents. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 18 8月, 2016 2 次提交
-
-
由 Artur Paszkiewicz 提交于
This fixes a long-standing bug that caused a flood of messages like: "md: delaying data-check of md1 until md2 has finished (they share one or more physical units)" It can be reproduced like this: 1. Create at least 3 raid1 arrays on a pair of disks, each on different partitions. 2. Request a sync operation like 'check' or 'repair' on 2 arrays by writing to their md/sync_action attribute files. One operation should start and one should be delayed and a message like the above will be printed. 3. Issue a write to the third array. Each write will cause 2 copies of the message to be printed. This happens when wake_up(&resync_wait) is called, usually by md_check_recovery(). Then the delayed sync thread again prints the message and is put to sleep. This patch adds a check in md_do_sync() to prevent printing this message more than once for the same pair of devices. Reported-by: NSven Koehler <sven.koehler@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=151801Signed-off-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
The ret is not needed anymore since we have already move resync_start into md_do_sync in commit 41a9a0dc. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 17 8月, 2016 1 次提交
-
-
由 Song Liu 提交于
GET_ARRAY_INFO counts journal as spare (spare_disks), which is not accurate. This patch fixes this. Reported-by: NYi Zhang <yizhan@redhat.com> Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 08 8月, 2016 1 次提交
-
-
由 Jens Axboe 提交于
Since commit 63a4cc24, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 29 7月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
The md device might not have personality (for example, ddf raid array). The issue is introduced by 8430e7e0(md: disconnect device from personality before trying to remove it) Reported-by: Nkernel test robot <xiaolong.ye@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 20 7月, 2016 3 次提交
-
-
由 Tomasz Majchrzak 提交于
Changeset 6791875e has added early return from a function so there is no sysfs notification for 'active' and 'clean' state change. Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Alexey Obitotskiy 提交于
md loads raidX modules and increments module refcount each time level has changed but does not decrement it. You are unable to unload raid0 module after reshape because raid0 reshape changes level to raid4 and back to raid0. Signed-off-by: NAleksey Obitotskiy <aleksey.obitotskiy@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Arnd Bergmann 提交于
The md code stores the exact time of the last error in the last_read_error variable using a timespec structure. It only ever uses the seconds portion of that though, so we can use a scalar for it. There won't be an overflow in 2038 here, because it already used monotonic time and 32-bit is enough for that, but I've decided to use time64_t for consistency in the conversion. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 14 6月, 2016 3 次提交
-
-
由 NeilBrown 提交于
Every time a device is removed with ->hot_remove_disk() a synchronize_rcu() call is made which can delay several milliseconds in some case. If lots of devices fail at once - as could happen with a large RAID10 where one set of devices are removed all at once - these delays can add up to be very inconcenient. As failure is not reversible we can check for that first, setting a separate flag if it is found, and then all synchronize_rcu() once for all the flagged devices. Then ->hot_remove_disk() function can skip the synchronize_rcu() step if the flag is set. fix build error(Shaohua) Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
When the HOT_REMOVE_DISK ioctl is used to remove a device, we call remove_and_add_spares() which will remove it from the personality if possible. This improves the chances that the removal will succeed. When writing "remove" to dev-XX/state, we don't. So that can fail more easily. So add the remove_and_add_spares() into "remove" handling. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Xiao Ni 提交于
This is a simple check before updating the superblock. It should update the superblock when update_size return 0. Signed-off-by: NXiao Ni <xni@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 10 6月, 2016 1 次提交
-
-
由 Cong Wang 提交于
We saw a list corruption in the list all_detected_devices: WARNING: CPU: 16 PID: 226 at lib/list_debug.c:29 __list_add+0x3c/0xa9() list_add corruption. next->prev should be prev (ffff880859d58320), but was ffff880859ce74c0. (next=ffffffff81abfdb0). Modules linked in: ahci libahci libata sd_mod scsi_mod CPU: 16 PID: 226 Comm: kworker/u241:4 Not tainted 4.1.20 #1 Hardware name: Dell Inc. PowerEdge C6220/04GD66, BIOS 2.2.3 11/07/2013 Workqueue: events_unbound async_run_entry_fn 0000000000000000 ffff880859a5baf8 ffffffff81502872 ffff880859a5bb48 0000000000000009 ffff880859a5bb38 ffffffff810692a5 ffff880859ee8828 ffffffff812ad02c ffff880859d58320 ffffffff81abfdb0 ffff880859eb90c0 Call Trace: [<ffffffff81502872>] dump_stack+0x4d/0x63 [<ffffffff810692a5>] warn_slowpath_common+0xa1/0xbb [<ffffffff812ad02c>] ? __list_add+0x3c/0xa9 [<ffffffff81069305>] warn_slowpath_fmt+0x46/0x48 [<ffffffff812ad02c>] __list_add+0x3c/0xa9 [<ffffffff81406f28>] md_autodetect_dev+0x41/0x62 [<ffffffff81285862>] rescan_partitions+0x25f/0x29d [<ffffffff81506372>] ? mutex_lock+0x13/0x31 [<ffffffff811a090f>] __blkdev_get+0x1aa/0x3cd [<ffffffff811a0b91>] blkdev_get+0x5f/0x294 [<ffffffff81377ceb>] ? put_device+0x17/0x19 [<ffffffff8128227c>] ? disk_put_part+0x12/0x14 [<ffffffff812836f3>] add_disk+0x29d/0x407 [<ffffffff81384345>] ? __pm_runtime_use_autosuspend+0x5c/0x64 [<ffffffffa004a724>] sd_probe_async+0x115/0x1af [sd_mod] [<ffffffff81083177>] async_run_entry_fn+0x72/0x12c [<ffffffff8107c44c>] process_one_work+0x198/0x2ce [<ffffffff8107cac7>] worker_thread+0x1dd/0x2bb [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15 [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15 [<ffffffff81080d9c>] kthread+0xae/0xb6 [<ffffffff81080000>] ? param_array_set+0x40/0xfa [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61 [<ffffffff81508152>] ret_from_fork+0x42/0x70 [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61 I suspect it is because there is no lock protecting this global list, autostart_arrays() is called in ioctl() path where there is no lock. Cc: Shaohua Li <shli@kernel.org> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 08 6月, 2016 3 次提交
-
-
由 Mike Christie 提交于
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Mike Christie 提交于
Separate the op from the rq_flag_bits and have md set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Mike Christie 提交于
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: NMike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 04 6月, 2016 2 次提交
-
-
由 Guoqing Jiang 提交于
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
Add a disk to an array which is performing recovery is a little complicated, we need to do both reap the sync thread and perform add disk for the case, then it caused deadlock as follows. linux44:~ # ps aux|grep md|grep D root 1822 0.0 0.0 0 0 ? D 16:50 0:00 [md127_resync] root 1848 0.0 0.0 19860 952 pts/0 D+ 16:50 0:00 mdadm --manage /dev/md127 --re-add /dev/vdb linux44:~ # cat /proc/1848/stack [<ffffffff8107afde>] kthread_stop+0x6e/0x120 [<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod] [<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod] [<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod] [<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod] [<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140 [<ffffffff811a6b98>] vfs_write+0xb8/0x1e0 [<ffffffff811a75b8>] SyS_write+0x48/0xa0 [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b [<00007f068ea1ed30>] 0x7f068ea1ed30 linux44:~ # cat /proc/1822/stack [<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod] [<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod] [<ffffffff8107ad94>] kthread+0xb4/0xc0 [<ffffffff8152a518>] ret_from_fork+0x58/0x90 Task1848 Task1822 md_attr_store (held reconfig_mutex by call mddev_lock()) action_store md_reap_sync_thread md_unregister_thread kthread_stop md_wakeup_thread(mddev->thread); wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING)) md_check_recovery is triggered by wakeup mddev->thread, but it can't clear MD_CHANGE_PENDING flag since it can't get lock which was held by md_attr_store already. To solve the deadlock problem, we move "->resync_finish()" from md_do_sync to md_reap_sync_thread (after md_update_sb), also MD_HELD_RESYNC_LOCK is introduced since it is possible that node can't get resync lock in md_do_sync. Then we do not need to wait for MD_CHANGE_PENDING is cleared or not since metadata should be updated after md_update_sb, so just call resync_finish if MD_HELD_RESYNC_LOCK is set. We also unified the code after skip label, since set PENDING for non-clustered case should be harmless. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 10 5月, 2016 2 次提交
-
-
由 Guoqing Jiang 提交于
Some code waits for a metadata update by: 1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN) 2. setting MD_CHANGE_PENDING and waking the management thread 3. waiting for MD_CHANGE_PENDING to be cleared If the first two are done without locking, the code in md_update_sb() which checks if it needs to repeat might test if an update is needed before step 1, then clear MD_CHANGE_PENDING after step 2, resulting in the wait returning early. So make sure all places that set MD_CHANGE_PENDING are atomicial, and bit_clear_unless (suggested by Neil) is introduced for the purpose. Cc: Martin Kepplinger <martink@posteo.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: <linux-kernel@vger.kernel.org> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Heinz Mauelshagen 提交于
Introduced by upstream commit 70d9798b The raid0 personality does not create mddev->thread as oposed to other personalities leading to its unconditional access in mddev_suspend() causing an oops. Patch checks for mddev->thread in order to keep the intention of aforementioned commit. Fixes: 70d9798b ("MD: warn for potential deadlock") Cc: stable@vger.kernel.org (4.5+) Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 05 5月, 2016 4 次提交
-
-
由 Guoqing Jiang 提交于
When a device is re-added, it will ultimately need to be activated and that happens in md_check_recovery, so we need to set MD_RECOVERY_NEEDED right after remove_and_add_spares. A specifical issue without the change is that when one node perform fail/remove/readd on a disk, but slave nodes could not add the disk back to array as expected (added as missed instead of in sync). So give slave nodes a chance to do resync. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
Currently, some features are not supported yet, such as change array_sectors and update size, so return EINVAL for them and listed it in document. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
It is not reasonable that cluster raid to release resync lock before the last pers->sync_request has finished. As the metadata will be changed when node performs resync, we need to inform other nodes to update metadata, so the MD_CHANGE_PENDING flag is set before finish resync. Then metadata_update_finish is move ahead to ensure that METADATA_UPDATED msg is sent before finish resync, and metadata_update_start need to be run after "repeat:" label accordingly. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
If multiple nodes choose to attempt do resync at the same time they need to be serialized so they don't duplicate effort. This serialization is done by locking the 'resync' DLM lock. Currently if a node cannot get the lock immediately it doesn't request notification when the lock becomes available (i.e. DLM_LKF_NOQUEUE is set), so it may not reliably find out when it is safe to try again. Rather than trying to arrange an async wake-up when the lock becomes available, switch to using synchronous locking - this is a lot easier to think about. As it is not permitted to block in the 'raid1d' thread, move the locking to the resync thread. So the rsync thread is forked immediately, but it blocks until the resync lock is available. Once the lock is locked it checks again if any resync action is needed. A particular symptom of the current problem is that a node can get stuck with "resync=pending" indefinitely. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 26 4月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
blk_queue_split marks bio unmergeable, which makes sense for normal bio. But if dispatching the bio to underlayer disk, the blk_queue_split checks are invalid, hence it's possible the bio becomes mergeable. In the reported bug, this bug causes trim against raid0 performance slash https://bugzilla.kernel.org/show_bug.cgi?id=117051Reported-and-tested-by: NPark Ju Hyung <qkrwngud825@gmail.com> Fixes: 6ac45aeb(block: avoid to merge splitted bio) Cc: stable@vger.kernel.org (v4.3+) Cc: Ming Lei <ming.lei@canonical.com> Cc: Neil Brown <neilb@suse.de> Reviewed-by: NJens Axboe <axboe@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-