- 02 11月, 2017 7 次提交
-
-
由 NeilBrown 提交于
The '2' argument means "wake up anything that is waiting". This is an inelegant part of the design and was added to help support management of suspend_lo/suspend_hi setting. Now that suspend_lo/hi is managed in mddev_suspend/resume, that need is gone. These is still a couple of places where we call 'quiesce' with an argument of '2', but they can safely be changed to call ->quiesce(.., 1); ->quiesce(.., 0) which achieve the same result at the small cost of pausing IO briefly. This removes a small "optimization" from suspend_{hi,lo}_store, but it isn't clear that optimization served a useful purpose. The code now is a lot clearer. Suggested-by: NShaohua Li <shli@kernel.org> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
There are various deadlocks that can occur when a thread holds reconfig_mutex and calls ->quiesce(mddev, 1). As some write request block waiting for metadata to be updated (e.g. to record device failure), and as the md thread updates the metadata while the reconfig mutex is held, holding the mutex can stop write requests completing, and this prevents ->quiesce(mddev, 1) from completing. ->quiesce() is now usually called from mddev_suspend(), and it is always called with reconfig_mutex held. So at this time it is safe for the thread to update metadata without explicitly taking the lock. So add 2 new flags, one which says the unlocked updates is allowed, and one which ways it is happening. Then allow it while the quiesce completes, and then wait for it to finish. Reported-and-tested-by: NXiao Ni <xni@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
mddev_suspend() is a more general interface than calling ->quiesce() and is so more extensible. A future patch will make use of this. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
responding to ->suspend_lo and ->suspend_hi is similar to responding to ->suspended. It is best to wait in the common core code without incrementing ->active_io. This allows mddev_suspend()/mddev_resume() to work while requests are waiting for suspend_lo/hi to change. This is will be important after a subsequent patch which uses mddev_suspend() to synchronize updating for suspend_lo/hi. So move the code for testing suspend_lo/hi out of raid1.c and raid5.c, and place it in md.c Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
bitmap_create() allocates memory with GFP_KERNEL and so can wait for IO. If called while the array is quiesced, it could wait indefinitely for write out to the array - deadlock. So call bitmap_create() before quiescing the array. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
Most often mddev_suspend() is called with reconfig_mutex held. Make this a requirement in preparation a subsequent patch. Also require reconfig_mutex to be held for mddev_resume(), partly for symmetry and partly to guarantee no races with incr/decr of mddev->suspend. Taking the mutex in r5c_disable_writeback_async() is a little tricky as this is called from a work queue via log->disable_writeback_work, and flush_work() is called on that while holding ->reconfig_mutex. If the work item hasn't run before flush_work() is called, the work function will not be able to get the mutex. So we use mddev_trylock() inside the wait_event() call, and have that abort when conf->log is set to NULL, which happens before flush_work() is called. We wait in mddev->sb_wait and ensure this is woken when any of the conditions change. This requires waking mddev->sb_wait in mddev_unlock(). This is only like to trigger extra wake_ups of threads that needn't be woken when metadata is being written, and that doesn't happen often enough that the cost would be noticeable. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
Having both a bitmap and a journal is pointless. Attempting to do so can corrupt the bitmap if the journal replay happens before the bitmap is initialized. Rather than try to avoid this corruption, simply refuse to allow arrays with both a bitmap and a journal. So: - if raid5_run sees both are present, fail. - if adding a bitmap finds a journal is present, fail - if adding a journal finds a bitmap is present, fail. Cc: stable@vger.kernel.org (4.10+) Signed-off-by: NNeilBrown <neilb@suse.com> Tested-by: NJoshua Kinard <kumba@gentoo.org> Acked-by: NJoshua Kinard <kumba@gentoo.org> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 17 10月, 2017 1 次提交
-
-
由 Mike Snitzer 提交于
Motivated by the desire to illiminate the imprecise nature of DM-specific patches being unnecessarily sent to both the MD maintainer and mailing-list. Which is born out of the fact that DM files also reside in drivers/md/ Now all MD-specific files in drivers/md/ start with either "raid" or "md-" and the MAINTAINERS file has been updated accordingly. Shaohua: don't change module name Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 09 10月, 2017 1 次提交
-
-
由 Guoqing Jiang 提交于
Since commit 4ad23a97 ("MD: use per-cpu counter for writes_pending"), the wait_queue is only got invoked if THREAD_WAKEUP is not set previously. With above change, I can see process_metadata_update could always hang on the wait queue, because mddev->thread could stay on 'D' status and the THREAD_WAKEUP flag is not cleared since there are lots of place to wake up mddev->thread. Then deadlock happened as follows: linux175:~ # ps aux|grep md|grep D root 20117 0.0 0.0 0 0 ? D 03:45 0:00 [md0_raid1] root 20125 0.0 0.0 0 0 ? D 03:45 0:00 [md0_cluster_rec] linux175:~ # cat /proc/20117/stack [<ffffffffa0635604>] dlm_lock_sync+0x94/0xd0 [md_cluster] [<ffffffffa0635674>] lock_token+0x34/0xd0 [md_cluster] [<ffffffffa0635804>] metadata_update_start+0x64/0x110 [md_cluster] [<ffffffffa04d985b>] md_update_sb.part.58+0x9b/0x860 [md_mod] [<ffffffffa04da035>] md_update_sb+0x15/0x30 [md_mod] [<ffffffffa04dc066>] md_check_recovery+0x266/0x490 [md_mod] [<ffffffffa06450e2>] raid1d+0x42/0x810 [raid1] [<ffffffffa04d2252>] md_thread+0x122/0x150 [md_mod] [<ffffffff81091741>] kthread+0x101/0x140 linux175:~ # cat /proc/20125/stack [<ffffffffa0636679>] recv_daemon+0x3f9/0x5c0 [md_cluster] [<ffffffffa04d2252>] md_thread+0x122/0x150 [md_mod] [<ffffffff81091741>] kthread+0x101/0x140 So let's revert the part of code in the commit to resovle the problem since we can't get lots of benefits of previous change. Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending") Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 06 10月, 2017 1 次提交
-
-
由 NeilBrown 提交于
A recent patch aimed to cause md_write_start() to fail (rather than block) when the mddev was suspending, so as to avoid deadlocks. Unfortunately the test in wait_event() was wrong, and it didn't change behaviour at all. We wait_event() must wait until the metadata is written OR the array is suspending. Fixes: cc27b0c7 ("md: fix deadlock between mddev_suspend() and md_write_start()") Cc: stable@vger.kernel.org Reported-by: NXiao Ni <xni@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 28 9月, 2017 2 次提交
-
-
由 Shaohua Li 提交于
md_submit_flush_data calls pers->make_request, which missed the suspend check. Fix it with the new md_handle_request API. Reported-by: NNate Dailey <nate.dailey@stratus.com> Tested-by: NNate Dailey <nate.dailey@stratus.com> Fix: cc27b0c7(md: fix deadlock between mddev_suspend() and md_write_start()) Cc: stable@vger.kernel.org Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Shaohua Li 提交于
With commit cc27b0c7, pers->make_request could bail out without handling the bio. If that happens, we should retry. The commit fixes md_make_request but not other call sites. Separate the request handling part, so other call sites can use it. Reported-by: NNate Dailey <nate.dailey@stratus.com> Fix: cc27b0c7(md: fix deadlock between mddev_suspend() and md_write_start()) Cc: stable@vger.kernel.org Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 28 8月, 2017 1 次提交
-
-
由 Pawel Baldysiak 提交于
Increase PPL area to 1MB and use it as circular buffer to store PPL. The entry with highest generation number is the latest one. If PPL to be written is larger then space left in a buffer, rewind the buffer to the start (don't wrap it). Signed-off-by: NPawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 26 8月, 2017 2 次提交
-
-
由 Cihangir Akturk 提交于
Since commit f1514638 ("fs: seq_file - add event counter to simplify poll() support"), md.c code has been no longer used the private field of the struct seq_file, but seq_release_private() has been continued to be used to release the allocated seq_file. This seems to have been forgotten. So convert it to use seq_release() instead of seq_release_private(). Signed-off-by: NCihangir Akturk <cakturk@gmail.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Alexey Obitotskiy 提交于
In case of external metadata arrays spare disks are added to containers first. mdadm keeps monitoring /proc/mdstat output and when spare disk is available, it moves it from the container to the array. The problem is there is no notification of new spare disk in the container and mdadm waits a long time (until timeout) before it takes the action. Signed-off-by: NAlexey Obitotskiy <aleksey.obitotskiy@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 24 8月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 8月, 2017 1 次提交
-
-
由 Shaohua Li 提交于
->safemode should be triggered by mdadm for external metadaa array, otherwise array's state confuses mdadm. Fixes: 33182d15(md: always clear ->safemode when md_check_recovery gets the mddev lock.) Cc: NeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 08 8月, 2017 2 次提交
-
-
由 NeilBrown 提交于
md_write_start() needs to clear the in_sync flag is it is set, or if there might be a race with set_in_sync() such that the later will set it very soon. In the later case it is sufficient to take the spinlock to synchronize with set_in_sync(), and then set the flag if needed. The current test is incorrect. It should be: if "flag is set" or "race is possible" "flag is set" is trivially "mddev->in_sync". "race is possible" should be tested by "mddev->sync_checkers". If sync_checkers is 0, then there can be no race. set_in_sync() will wait in percpu_ref_switch_to_atomic_sync() for an RCU grace period, and as md_write_start() holds the rcu_read_lock(), set_in_sync() will be sure ot see the update to writes_pending. If sync_checkers is > 0, there could be race. If md_write_start() happened entirely between if (!mddev->in_sync && percpu_ref_is_zero(&mddev->writes_pending)) { and mddev->in_sync = 1; in set_in_sync(), then it would not see that is_sync had been set, and set_in_sync() would not see that writes_pending had been incremented. This bug means that in_sync is sometimes not set when it should be. Consequently there is a small chance that the array will be marked as "clean" when in fact it is inconsistent. Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending") cc: stable@vger.kernel.org (v4.12+) Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
If ->safemode == 1, md_check_recovery() will try to get the mddev lock and perform various other checks. If mddev->in_sync is zero, it will call set_in_sync, and clear ->safemode. However if mddev->in_sync is not zero, ->safemode will not be cleared. When md_check_recovery() drops the mddev lock, the thread is woken up again. Normally it would just check if there was anything else to do, find nothing, and go to sleep. However as ->safemode was not cleared, it will take the mddev lock again, then wake itself up when unlocking. This results in an infinite loop, repeatedly calling md_check_recovery(), which RCU or the soft-lockup detector will eventually complain about. Prior to commit 4ad23a97 ("MD: use per-cpu counter for writes_pending"), safemode would only be set to one when the writes_pending counter reached zero, and would be cleared again when writes_pending is incremented. Since that patch, safemode is set more freely, but is not reliably cleared. So in md_check_recovery() clear ->safemode before checking ->in_sync. Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending") Cc: stable@vger.kernel.org (4.12+) Reported-by: NDominik Brodowski <linux@dominikbrodowski.net> Reported-by: NDavid R <david@unsolicited.net> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 26 7月, 2017 1 次提交
-
-
由 Shaohua Li 提交于
spin_is_locked always returns 0 for UP case, so ignores it Reported-by: NJoshua Kinard <kumba@gentoo.org> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 04 7月, 2017 1 次提交
-
-
由 Shaohua Li 提交于
bioset_free() will take a mutex, so can't get called with spinlock hold. Fix: 5a85071c(md: use a separate bio_set for synchronous IO.) Cc: NeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 24 6月, 2017 1 次提交
-
-
由 Shaohua Li 提交于
rdev->mddev could be null in start time. Reported-by: NMing Lei <ming.lei@redhat.com> Fix: 5a85071c(md: use a separate bio_set for synchronous IO.) Cc: NeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 22 6月, 2017 1 次提交
-
-
由 NeilBrown 提交于
md devices allocate a bio_set and use it for two distinct purposes. mddev->bio_set is used to clone bios as part of sending upper level requests down to lower level devices, and it is also use for synchronous IO such as superblock and bitmap updates, and for correcting read errors. This multiple usage can lead to deadlocks. It is likely that cloned bios might be queued for write and to be waiting for a metadata update before the write can be permitted. If the cloning exhausted mddev->bio_set, the metadata update may not be able to proceed. This scenario has been seen during heavy testing, with lots of IO and lots of memory pressure. Address this by adding a new bio_set specifically for synchronous IO. All synchronous IO goes directly to the underlying device and is not queued at the md level, so request using entries from the new mddev->sync_set will complete in a timely fashion. Requests that use mddev->bio_set will sometimes need to wait for synchronous IO, but will no longer risk deadlocking that iO. Also: small simplification in mddev_put(): there is no need to wait until the spinlock is released before calling bioset_free(). Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 19 6月, 2017 3 次提交
-
-
由 NeilBrown 提交于
bio_clone() is no longer used. Only bio_clone_bioset() or bio_clone_fast(). This is for the best, as bio_clone() used fs_bio_set, and filesystems are unlikely to want to use bio_clone(). So remove bio_clone() and all references. This includes a fix to some incorrect documentation. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 NeilBrown 提交于
"flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to bioset_create() and discard bioset_create_nobvec(). Note that the bio_split allocations in drivers/md/raid* do not need the bvec mempool - they should have used bioset_create_nobvec(). Suggested-by: NChristoph Hellwig <hch@infradead.org> Reviewed-by: NChristoph Hellwig <hch@infradead.org> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 NeilBrown 提交于
blk_queue_split() is always called with the last arg being q->bio_split, where 'q' is the first arg. Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses q->bio_split. This is inconsistent and unnecessary. Remove the last arg and always use q->bio_split inside blk_queue_split() Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed) Reviewed-by: NJavier González <javier@cnexlabs.com> Tested-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 6月, 2017 1 次提交
-
-
由 Lidong Zhong 提交于
The value for spare spot of sb->dev_roles is changed from MD_DISK_ROLE_FAULTY to MD_DISK_ROLE_SPARE to keep align with the value when the superblock is firstly created in userspace. Signed-off-by: NLidong Zhong <lzhong@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 14 6月, 2017 1 次提交
-
-
由 NeilBrown 提交于
If mddev_suspend() races with md_write_start() we can deadlock with mddev_suspend() waiting for the request that is currently in md_write_start() to complete the ->make_request() call, and md_write_start() waiting for the metadata to be updated to mark the array as 'dirty'. As metadata updates done by md_check_recovery() only happen then the mddev_lock() can be claimed, and as mddev_suspend() is often called with the lock held, these threads wait indefinitely for each other. We fix this by having md_write_start() abort if mddev_suspend() is happening, and ->make_request() aborts if md_write_start() aborted. md_make_request() can detect this abort, decrease the ->active_io count, and wait for mddev_suspend(). Reported-by: NNix <nix@esperi.org.uk> Fix: 68866e42(MD: no sync IO while suspended) Cc: stable@vger.kernel.org Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 09 6月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 06 6月, 2017 1 次提交
-
-
由 NeilBrown 提交于
The new per-cpu counter for writes_pending is initialised in md_alloc(), which is not called by dm-raid. So dm-raid fails when md_write_start() is called. Move the initialization to the personality modules that need it. This way it is always initialised when needed, but isn't unnecessarily initialized (requiring memory allocation) when the personality doesn't use writes_pending. Reported-by: NHeinz Mauelshagen <heinzm@redhat.com> Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending") Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 05 6月, 2017 1 次提交
-
-
由 Amir Goldstein 提交于
The md private helper uuid_equal() collides with a generic helper of the same name. Rename the md private helper to md_uuid_equal() and do the same for md_sb_equal(). Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NShaohua Li <shli@kernel.org> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
-
- 01 6月, 2017 1 次提交
-
-
由 Jan Kara 提交于
Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. CC: linux-raid@vger.kernel.org CC: Shaohua Li <shli@kernel.org> Fixes: b685d3d6 CC: stable@vger.kernel.org Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 09 5月, 2017 1 次提交
-
-
由 Artur Paszkiewicz 提交于
This essentially reverts commit b5470dc5 ("md: resolve external metadata handling deadlock in md_allow_write") with some adjustments. Since commit 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.") changing array_state to 'active' does not use mddev_lock() and will not cause a deadlock with md_allow_write(). This revert simplifies userspace tools that write to sysfs attributes like "stripe_cache_size" or "consistency_policy" because it removes the need for special handling for external metadata arrays, checking for EAGAIN and retrying the write. Signed-off-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 21 4月, 2017 1 次提交
-
-
由 NeilBrown 提交于
1/ If an array has any read-only devices when it is started, the array itself must be read-only 2/ A read-only device cannot be added to an array after it is started. 3/ Setting an array to read-write should not succeed if any member devices are read-only Reported-and-Tested-by: NNanda Kishore Chinnaram <Nanda_Kishore_Chinna@dell.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 13 4月, 2017 2 次提交
-
-
由 NeilBrown 提交于
md allows a new array device to be created by simply opening a device file. This make it difficult to remove the device and udev is likely to open the device file as part of processing the REMOVE event. There is an alternate mechanism for creating arrays by writing to the new_array module parameter. When using tools that work with this parameter, it is best to disable the old semantics. This new module parameter allows that. Signed-off-by: NNeilBrown <neilb@suse.com> Acted-by: NColy Li <colyli@suse.de> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
The intention when creating the "new_array" parameter and the possibility of having array names line "md_HOME" was to transition away from the old way of creating arrays and to eventually only use this new way. The "old" way of creating array is to create a device node in /dev and then open it. The act of opening creates the array. This is problematic because sometimes the device node can be opened when we don't want to create an array. This can easily happen when some rule triggered by udev looks at a device as it is being destroyed. The node in /dev continues to exist for a short period after an array is stopped, and opening it during this time recreates the array (as an inactive array). Unfortunately no clear plan for the transition was created. It is now time to fix that. This patch allows devices with numeric names, like "md999" to be created by writing to "new_array". This will only work if the minor number given is not already in use. This will allow mdadm to support the creation of arrays with numbers > 511 (currently not possible) by writing to new_array. mdadm can, at some point, use this approach to create *all* arrays, which will allow the transition to only using the new-way. Signed-off-by: NNeilBrown <neilb@suse.com> Acted-by: NColy Li <colyli@suse.de> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 11 4月, 2017 2 次提交
-
-
由 Zhilong Liu 提交于
md.c: it needs to release the mddev lock before the array_size_store() returns. Fixes: ab5a98b1 ("md-cluster: change array_sectors and update size are not supported") Signed-off-by: NZhilong Liu <zlliu@suse.com> Reviewed-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
if called md_set_readonly and set MD_CLOSING bit, the mddev cannot be opened any more due to the MD_CLOING bit wasn't cleared. Thus it needs to be cleared in md_ioctl after any call to md_set_readonly() or do_md_stop(). Signed-off-by: NNeilBrown <neilb@suse.com> Fixes: af8d8e6f ("md: changes for MD_STILL_CLOSED flag") Cc: stable@vger.kernel.org (v4.9+) Signed-off-by: NZhilong Liu <zlliu@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 23 3月, 2017 2 次提交
-
-
由 NeilBrown 提交于
The 'writes_pending' counter is used to determine when the array is stable so that it can be marked in the superblock as "Clean". Consequently it needs to be updated frequently but only checked for zero occasionally. Recent changes to raid5 cause the count to be updated even more often - once per 4K rather than once per bio. This provided justification for making the updates more efficient. So we replace the atomic counter a percpu-refcount. This can be incremented and decremented cheaply most of the time, and can be switched to "atomic" mode when more precise counting is needed. As it is possible for multiple threads to want a precise count, we introduce a "sync_checker" counter to count the number of threads in "set_in_sync()", and only switch the refcount back to percpu mode when that is zero. We need to be careful about races between set_in_sync() setting ->in_sync to 1, and md_write_start() setting it to zero. md_write_start() holds the rcu_read_lock() while checking if the refcount is in percpu mode. If it is, then we know a switch to 'atomic' will not happen until after we call rcu_read_unlock(), in which case set_in_sync() will see the elevated count, and not set in_sync to 1. If it is not in percpu mode, we take the mddev->lock to ensure proper synchronization. It is no longer possible to quickly check if the count is zero, which we previously did to update a timer or to schedule the md_thread. So now we do these every time we decrement that counter, but make sure they are fast. mod_timer() already optimizes the case where the timeout value doesn't actually change. We leverage that further by always rounding off the jiffies to the timeout value. This may delay the marking of 'clean' slightly, but ensure we only perform atomic operation here when absolutely needed. md_wakeup_thread() current always calls wake_up(), even if THREAD_WAKEUP is already set. That too can be optimised to avoid calls to wake_up(). Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
If ->in_sync is being set just as md_write_start() is being called, it is possible that set_in_sync() won't see the elevated ->writes_pending, and md_write_start() won't see the set ->in_sync. To close this race, re-test ->writes_pending after setting ->in_sync, and add memory barriers to ensure the increment of ->writes_pending will be seen by the time of this second test, or the new ->in_sync will be seen by md_write_start(). Add a spinlock to array_state_show() to ensure this temporary instability is never visible from userspace. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-