- 17 3月, 2017 3 次提交
-
-
由 Artur Paszkiewicz 提交于
Include information about PPL location and size into mdp_superblock_1 and copy it to/from rdev. Because PPL is mutually exclusive with bitmap, put it in place of 'bitmap_offset'. Add a new flag MD_FEATURE_PPL for 'feature_map', analogically to MD_FEATURE_BITMAP_OFFSET. Add MD_HAS_PPL to mddev->flags to indicate that PPL is enabled on an array. Signed-off-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
To update size for cluster raid, we need to make sure all nodes can perform the change successfully. However, it is possible that some of them can't do it due to failure (bitmap_resize could fail). So we need to consider the issue before we set the capacity unconditionally, and we use below steps to perform sanity check. 1. A change the size, then broadcast METADATA_UPDATED msg. 2. B and C receive METADATA_UPDATED change the size excepts call set_capacity, sync_size is not update if the change failed. Also call bitmap_update_sb to sync sb to disk. 3. A checks other node's sync_size, if sync_size has been updated in all nodes, then send CHANGE_CAPACITY msg otherwise send msg to revert previous change. 4. B and C call set_capacity if receive CHANGE_CAPACITY msg, otherwise pers->resize will be called to restore the old value. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
Previously, when node received METADATA_UPDATED msg, it just need to wakeup mddev->thread, then md_reload_sb will be called eventually. We taken the asynchronous way to avoid a deadlock issue, the deadlock issue could happen when one node is receiving the METADATA_UPDATED msg (wants reconfig_mutex) and trying to run the path: md_check_recovery -> mddev_trylock(hold reconfig_mutex) -> md_update_sb-metadata_update_start (want EX on token however token is got by the sending node) Since we will support resizing for clustered raid, and we need the metadata update handling to be synchronous so that the initiating node can detect failure, so we need to change the way for handling METADATA_UPDATED msg. But, we obviously need to avoid above deadlock with the sync way. To make this happen, we considered to not hold reconfig_mutex to call md_reload_sb, if some other thread has already taken reconfig_mutex and waiting for the 'token', then process_recvd_msg() can safely call md_reload_sb() without taking the mutex. This is because we can be certain that no other thread will take the mutex, and we also certain that the actions performed by md_reload_sb() won't interfere with anything that the other thread is in the middle of. To make this more concrete, we added a new cinfo->state bit MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD Which is set in lock_token() just before dlm_lock_sync() is called, and cleared just after. As lock_token() is always called with reconfig_mutex() held (the specific case is the resync_info_update which is distinguished well in previous patch), if process_recvd_msg() finds that the new bit is set, then the mutex must be held by some other thread, and it will keep waiting. So process_metadata_update() can call md_reload_sb() if either mddev_trylock() succeeds, or if MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD is set. The tricky bit is what to do if neither of these apply. We need to wait. Fortunately mddev_unlock() always calls wake_up() on mddev->thread->wqueue. So we can get lock_token() to call wake_up() on that when it sets the bit. There are also some related changes inside this commit: 1. remove RELOAD_SB related codes since there are not valid anymore. 2. mddev is added into md_cluster_info then we can get mddev inside lock_token. 3. add new parameter for lock_token to distinguish reconfig_mutex is held or not. And, we need to set MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD in below: 1. set it before unregister thread, otherwise a deadlock could appear if stop a resyncing array. This is because md_unregister_thread(&cinfo->recv_thread) is blocked by recv_daemon -> process_recvd_msg -> process_metadata_update. To resolve the issue, MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD is also need to be set before unregister thread. 2. set it in metadata_update_start to fix another deadlock. a. Node A sends METADATA_UPDATED msg (held Token lock). b. Node B wants to do resync, and is blocked since it can't get Token lock, but MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD is not set since the callchain (md_do_sync -> sync_request -> resync_info_update -> sendmsg -> lock_comm -> lock_token) doesn't hold reconfig_mutex. c. Node B trys to update sb (held reconfig_mutex), but stopped at wait_event() in metadata_update_start since we have set MD_CLUSTER_SEND_LOCK flag in lock_comm (step 2). d. Then Node B receives METADATA_UPDATED msg from A, of course recv_daemon is blocked forever. Since metadata_update_start always calls lock_token with reconfig_mutex, we need to set MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD here as well, and lock_token don't need to set it twice unless lock_token is invoked from lock_comm. Finally, thanks to Neil for his great idea and help! Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 11 3月, 2017 2 次提交
-
-
由 Jason Yan 提交于
The sb->layout is of type __le32, so we shoud use le32_to_cpu. Signed-off-by: NJason Yan <yanaijie@huawei.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Jason Yan 提交于
The sb->super_offset should be big-endian, but the rdev->sb_start is in host byte order, so fix this by adding cpu_to_le64. Signed-off-by: NJason Yan <yanaijie@huawei.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 10 3月, 2017 3 次提交
-
-
由 NeilBrown 提交于
These arrays, created with "mdadm --build" don't benefit from a limit. The default will be used, which is '0' and is interpreted as "don't impose a limit". Reported-by: ian_bruce@mail.ru Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
raid1_resize and raid5_resize should also check the mddev->queue if run underneath dm-raid. And both set_capacity and revalidate_disk are used in pers->resize such as raid1, raid10 and raid5. So move them from personality file to common code. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Shaohua Li 提交于
Nobody is using mddev_check_plugged(), so delete the dead code Signed-off-by: NShaohua Li <shli@fb.com>
-
- 02 3月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 2月, 2017 3 次提交
-
-
由 Ming Lei 提交于
Firstly bio_clone_mddev() is used in raid normal I/O and isn't in resync I/O path. Secondly all the direct access to bvec table in raid happens on resync I/O except for write behind of raid1, in which we still use bio_clone() for allocating new bvec table. So this patch replaces bio_clone() with bio_clone_fast() in bio_clone_mddev(). Also kill bio_clone_mddev() and call bio_clone_fast() directly, as suggested by Christoph Hellwig. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <tom.leiming@gmail.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Ming Lei 提交于
mddev is never NULL and neither is ->bio_set, so remove the check. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <tom.leiming@gmail.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Ming Lei 提交于
The current behaviour is to fall back to allocate bio from 'fs_bio_set', that isn't a correct way because it might cause deadlock. So this patch simply return failure if mddev->bio_set can't be created. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <tom.leiming@gmail.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 14 2月, 2017 1 次提交
-
-
由 NeilBrown 提交于
Commit: cbd19983 ("md: Fix unfortunate interaction with evms") change mddev_put() so that it would not destroy an md device while ->ctime was non-zero. Unfortunately, we didn't make sure to clear ->ctime when unloading the module, so it is possible for an md device to remain after module unload. An attempt to open such a device will trigger an invalid memory reference in: get_gendisk -> kobj_lookup -> exact_lock -> get_disk when tring to access disk->fops, which was in the module that has been removed. So ensure we clear ->ctime in md_exit(), and explain how that is useful, as it isn't immediately obvious when looking at the code. Fixes: cbd19983 ("md: Fix unfortunate interaction with evms") Tested-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 02 2月, 2017 1 次提交
-
-
由 Jan Kara 提交于
We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 25 1月, 2017 1 次提交
-
-
由 Song Liu 提交于
For safer operation, all arrays start in write-through mode, which has been better tested and is more mature. And actually the write-through/write-mode isn't persistent after array restarted, so we always start array in write-through mode. However, if recovery found data-only stripes before the shutdown (from previous write-back mode), it is not safe to start the array in write-through mode, as write-through mode can not handle stripes with data in write-back cache. To solve this problem, we flush all data-only stripes in r5l_recovery_log(). When r5l_recovery_log() returns, the array starts with empty cache in write-through mode. This logic is implemented in r5c_recovery_flush_data_only_stripes(): 1. enable write back cache 2. flush all stripes 3. wake up conf->mddev->thread 4. wait for all stripes get flushed (reuse wait_for_quiescent) 5. disable write back cache The wait in 4 will be waked up in release_inactive_stripe_list() when conf->active_stripes reaches 0. It is safe to wake up mddev->thread here because all the resource required for the thread has been initialized. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 09 12月, 2016 2 次提交
-
-
由 Shaohua Li 提交于
The mddev->flags are used for different purposes. There are a lot of places we check/change the flags without masking unrelated flags, we could check/change unrelated flags. These usage are most for superblock write, so spearate superblock related flags. This should make the code clearer and also fix real bugs. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Shaohua Li 提交于
Fixes: 90f5f7ad("md: Wait for md_check_recovery before attempting device removal.") Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 06 12月, 2016 1 次提交
-
-
由 NeilBrown 提交于
md_open() gets a counted reference on an mddev using mddev_find(). If it ends up returning an error, it must drop this reference. There are two error paths where the reference is not dropped. One only happens if the process is signalled and an awkward time, which is quite unlikely. The other was introduced recently in commit af8d8e6f. Change the code to ensure the drop the reference when returning an error, and make it harded to re-introduce this sort of bug in the future. Reported-by: NMarc Smith <marc.smith@mcc.edu> Fixes: af8d8e6f ("md: changes for MD_STILL_CLOSED flag") Signed-off-by: NNeilBrown <neilb@suse.com> Acked-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 24 11月, 2016 2 次提交
-
-
由 Shaohua Li 提交于
__md_stop_writes currently doesn't stop raid5-cache reclaim thread. It's possible the reclaim thread is still running and doing write, which doesn't match what __md_stop_writes should do. The extra ->quiesce() call should not harm any raid types. For raid5-cache, this will guarantee we reclaim all caches before we update superblock. Signed-off-by: NShaohua Li <shli@fb.com> Reviewed-by: NNeilBrown <neilb@suse.de> Cc: Song Liu <songliubraving@fb.com>
-
由 Shaohua Li 提交于
There is mechanism to suspend a kernel thread. Use it instead of playing create/destroy game. Signed-off-by: NShaohua Li <shli@fb.com> Reviewed-by: NNeilBrown <neilb@suse.de> Cc: Song Liu <songliubraving@fb.com>
-
- 23 11月, 2016 2 次提交
-
-
由 NeilBrown 提交于
This can only be supported on personalities which ensure that md_error() never causes an array to enter the 'failed' state. i.e. if marking a device Faulty would cause some data to be inaccessible, the device is status is left as non-Faulty. This is true for RAID1 and RAID10. If we get a failure writing metadata but the device doesn't fail, it must be the last device so we re-write without FAILFAST to improve chance of success. We also flag the device as LastDev so that future metadata updates don't waste time on failfast writes. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
This patch just adds a 'failfast' per-device flag which can be stored in v0.90 or v1.x metadata. The flag is not used yet but the intent is that it can be used for mirrored (raid1/raid10) arrays where low latency is more important than keeping all devices on-line. Setting the flag for a device effectively gives permission for that device to be marked as Faulty and excluded from the array on the first error. The underlying driver will be directed not to retry requests that result in failures. There is a proviso that the device must not be marked faulty if that would cause the array as a whole to fail, it may only be marked Faulty if the array remains functional, but is degraded. Failures on read requests will cause the device to be marked as Faulty immediately so that further reads will avoid that device. No attempt will be made to correct read errors by over-writing with the correct data. It is expected that if transient errors, such as cable unplug, are possible, then something in user-space will revalidate failed devices and re-add them when they appear to be working again. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 19 11月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
superblock write is an expensive operation. With raid5-cache, it can be called regularly. Tracing to help performance debug. Signed-off-by: NShaohua Li <shli@fb.com> Cc: NeilBrown <neilb@suse.com>
-
- 10 11月, 2016 1 次提交
-
-
由 NeilBrown 提交于
bitmap_flush() finishes with bitmap_update_sb(), and that finishes with write_page(..., 1), so write_page() will wait for all writes to complete. So there is no point calling md_super_wait() immediately afterwards. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 08 11月, 2016 6 次提交
-
-
由 NeilBrown 提交于
When adding devices to, or removing device from, an array we need to update the metadata. However we don't need to do it synchronously as data integrity doesn't depend on these changes being recorded instantly. So avoid the synchronous call to md_update_sb and just set a flag so that the thread will do it. This can reduce the number of updates performed when lots of devices are being added or removed. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
1/ using pr_debug() for a number of messages reduces the noise of md, but still allows them to be enabled when needed. 2/ try to be consistent in the usage of pr_err() and pr_warn(), and document the intention 3/ When strings have been split onto multiple lines, rejoin into a single string. The cost of having lines > 80 chars is less than the cost of not being able to easily search for a particular message. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
1/ don't print a warning if allocation fails. page_alloc() does that already. 2/ always check return status for error. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Tomasz Majchrzak 提交于
When raid1/raid10 array fails to write to one of the drives, the request is added to bio_end_io_list and finished by personality thread. The thread doesn't handle it as long as MD_CHANGE_PENDING flag is set. In case of external metadata this flag is cleared, however the thread is not woken up. It causes request to be blocked for few seconds (until another action on the array wakes up the thread) or to get stuck indefinitely. Wake up personality thread once MD_CHANGE_PENDING has been cleared. Moving 'restart_array' call after the flag is cleared it not a solution because in read-write mode the call doesn't wake up the thread. Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Tomasz Majchrzak 提交于
If external metadata handler supports bad blocks and unacknowledged bad blocks are present, don't report disk via sysfs as faulty. Such situation can be still handled so disk just has to be blocked for a moment. It makes it consistent with kernel state as corresponding rdev flag is also not set. When the disk in being unblocked there are few cases: 1. Disk has been in blocked and faulty state, it is being unblocked but it still remains in faulty state. Metadata handler will remove it from array in the next call. 2. There is no bad block support in external metadata handler and bad blocks are present - put the disk in blocked and faulty state (see case 1). 3. There is bad block support in external metadata handler and all bad blocks are acknowledged - clear all flags, continue. 4. There is bad block support in external metadata handler but there are still unacknowledged bad blocks - clear all flags, continue. It is fine to clear Blocked flag because it was probably not set anyway (if it was it is case 1). BlockedBadBlocks flag can also be cleared because the request waiting for it will set it again when it finds out that some bad block is still not acknowledged. Recovery is not necessary but there are no problems if the flag is set. Sysfs rdev state is still reported as blocked (due to unacknowledged bad blocks) so metadata handler will process remaining bad blocks and unblock disk again. Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Tomasz Majchrzak 提交于
Add new rdev flag which external metadata handler can use to switch on/off bad block support. If new bad block is encountered, notify it via rdev 'unacknowledged_bad_blocks' sysfs file. If bad block has been cleared, notify update to rdev 'bad_blocks' sysfs file. When bad blocks support is being removed, just clear rdev flag. It is not necessary to reset badblocks->shift field. If there are bad blocks cleared or added at the same time, it is ok for those changes to be applied to the structure. The array is in blocked state and the drive which cannot handle bad blocks any more will be removed from the array before it is unlocked. Simplify state_show function by adding a separator at the end of each string and overwrite last separator with new line. Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Reviewed-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 01 11月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 29 10月, 2016 1 次提交
-
-
由 NeilBrown 提交于
mddev->curr_resync usually records where the current resync is up to, but during the starting phase it has some "magic" values. 1 - means that the array is trying to start a resync, but has yielded to another array which shares physical devices, and also needs to start a resync 2 - means the array is trying to start resync, but has found another array which shares physical devices and has already started resync. 3 - means that resync has commensed, but it is possible that nothing has actually been resynced yet. It is important that this value not be visible to user-space and particularly that it doesn't get written to the metadata, as the resync or recovery checkpoint. In part, this is because it may be slightly higher than the correct value, though this is very rare. In part, because it is not a multiple of 4K, and some devices only support 4K aligned accesses. There are two places where this value is propagates into either ->curr_resync_completed or ->recovery_cp or ->recovery_offset. These currently avoid the propagation of values 1 and 3, but will allow 3 to leak through. Change them to only propagate the value if it is > 3. As this can cause an array to fail, the patch is suitable for -stable. Cc: stable@vger.kernel.org (v3.7+) Reported-by: NViswesh <viswesh.vichu@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 25 10月, 2016 1 次提交
-
-
由 Tomasz Majchrzak 提交于
If there is a bad block on a disk and there is a recovery performed from this disk, the same bad block is reported for a new disk. It involves setting MD_CHANGE_PENDING flag in rdev_set_badblocks. For external metadata this flag is not being cleared as array state is reported as 'clean'. The read request to bad block in RAID5 array gets stuck as it is waiting for a flag to be cleared - as per commit c3cce6cd ("md/raid5: ensure device failure recorded before write request returns."). The meaning of MD_CHANGE_PENDING and MD_CHANGE_CLEAN flags has been clarified in commit 070dc6dd ("md: resolve confusion of MD_CHANGE_CLEAN"), however MD_CHANGE_PENDING flag has been used in personality error handlers since and it doesn't fully comply with initial purpose. It was supposed to notify that write request is about to start, however now it is also used to request metadata update. Initially (in md_allow_write, md_write_start) MD_CHANGE_PENDING flag has been set and in_sync has been set to 0 at the same time. Error handlers just set the flag without modifying in_sync value. Sysfs array state is a single value so now it reports 'clean' when MD_CHANGE_PENDING flag is set and in_sync is set to 1. Userspace has no idea it is expected to take some action. Swap the order that array state is checked so 'write_pending' is reported ahead of 'clean' ('write_pending' is a misleading name but it is too late to rename it now). Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 04 10月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
if all disks in an array are non-rotational, set the array non-rotational. This only works for array with all disks populated at startup. Support for disk hotadd/hotremove could be added later if necessary. Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 22 9月, 2016 4 次提交
-
-
由 Shaohua Li 提交于
lockdep reports a potential deadlock. Fix this by droping the mutex before md_import_device [ 1137.126601] ====================================================== [ 1137.127013] [ INFO: possible circular locking dependency detected ] [ 1137.127013] 4.8.0-rc4+ #538 Not tainted [ 1137.127013] ------------------------------------------------------- [ 1137.127013] mdadm/16675 is trying to acquire lock: [ 1137.127013] (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff81243cf3>] __blkdev_get+0x63/0x450 [ 1137.127013] but task is already holding lock: [ 1137.127013] (detected_devices_mutex){+.+.+.}, at: [<ffffffff81a5138c>] md_ioctl+0x2ac/0x1f50 [ 1137.127013] which lock already depends on the new lock. [ 1137.127013] the existing dependency chain (in reverse order) is: [ 1137.127013] -> #1 (detected_devices_mutex){+.+.+.}: [ 1137.127013] [<ffffffff810b6f19>] lock_acquire+0xb9/0x220 [ 1137.127013] [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0 [ 1137.127013] [<ffffffff81a4eeaf>] md_autodetect_dev+0x3f/0x90 [ 1137.127013] [<ffffffff81595be8>] rescan_partitions+0x1a8/0x2c0 [ 1137.127013] [<ffffffff81590081>] __blkdev_reread_part+0x71/0xb0 [ 1137.127013] [<ffffffff815900e5>] blkdev_reread_part+0x25/0x40 [ 1137.127013] [<ffffffff81590c4b>] blkdev_ioctl+0x51b/0xa30 [ 1137.127013] [<ffffffff81242bf1>] block_ioctl+0x41/0x50 [ 1137.127013] [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0 [ 1137.127013] [<ffffffff81215321>] SyS_ioctl+0x41/0x70 [ 1137.127013] [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 1137.127013] -> #0 (&bdev->bd_mutex){+.+.+.}: [ 1137.127013] [<ffffffff810b6af2>] __lock_acquire+0x1662/0x1690 [ 1137.127013] [<ffffffff810b6f19>] lock_acquire+0xb9/0x220 [ 1137.127013] [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0 [ 1137.127013] [<ffffffff81243cf3>] __blkdev_get+0x63/0x450 [ 1137.127013] [<ffffffff81244307>] blkdev_get+0x227/0x350 [ 1137.127013] [<ffffffff812444f6>] blkdev_get_by_dev+0x36/0x50 [ 1137.127013] [<ffffffff81a46d65>] lock_rdev+0x35/0x80 [ 1137.127013] [<ffffffff81a49bb4>] md_import_device+0xb4/0x1b0 [ 1137.127013] [<ffffffff81a513d6>] md_ioctl+0x2f6/0x1f50 [ 1137.127013] [<ffffffff815909b3>] blkdev_ioctl+0x283/0xa30 [ 1137.127013] [<ffffffff81242bf1>] block_ioctl+0x41/0x50 [ 1137.127013] [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0 [ 1137.127013] [<ffffffff81215321>] SyS_ioctl+0x41/0x70 [ 1137.127013] [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 1137.127013] other info that might help us debug this: [ 1137.127013] Possible unsafe locking scenario: [ 1137.127013] CPU0 CPU1 [ 1137.127013] ---- ---- [ 1137.127013] lock(detected_devices_mutex); [ 1137.127013] lock(&bdev->bd_mutex); [ 1137.127013] lock(detected_devices_mutex); [ 1137.127013] lock(&bdev->bd_mutex); [ 1137.127013] *** DEADLOCK *** Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
cluster_info and bitmap_info.nodes also need to be cleared when array is stopped. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
When stop clustered raid while it is pending on resync, MD_STILL_CLOSED flag could be cleared since udev rule is triggered to open the mddev. So obviously array can't be stopped soon and returns EBUSY. mdadm -Ss md-raid-arrays.rules set MD_STILL_CLOSED md_open() ... ... ... clear MD_STILL_CLOSED do_md_stop We make below changes to resolve this issue: 1. rename MD_STILL_CLOSED to MD_CLOSING since it is set when stop array and it means we are stopping array. 2. let md_open returns early if CLOSING is set, so no other threads will open array if one thread is trying to close it. 3. no need to clear CLOSING bit in md_open because 1 has ensure the bit is cleared, then we also don't need to test CLOSING bit in do_md_stop. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
The new_disk_ack could return failure if WAITING_FOR_NEWDISK is not set, so we need to kick the dev from array in case failure happened. And we missed to check err before call new_disk_ack othwise we could kick a rdev which isn't in array, thanks for the reminder from Shaohua. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 09 9月, 2016 1 次提交
-
-
由 Guoqing Jiang 提交于
The md-cluster is compiled as module by default, if it is compiled by built-in way, then we can't make md-cluster works. [64782.630008] md/raid1:md127: active with 2 out of 2 mirrors [64782.630528] md-cluster module not found. [64782.630530] md127: Could not setup cluster service (-2) Fixes: edb39c9d ("Introduce md_cluster_operations to handle cluster functions") Cc: stable@vger.kernel.org (v4.1+) Reported-by: NMarc Smith <marc.smith@mcc.edu> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 25 8月, 2016 1 次提交
-
-
由 Song Liu 提交于
Currently, the code sets MD_JOURNAL_CLEAN when the array has MD_FEATURE_JOURNAL and the recovery_cp is MaxSector. The array will be MD_JOURNAL_CLEAN even if the journal device is missing. With this patch, the MD_JOURNAL_CLEAN is only set when the journal device presents. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-