- 08 Aug 2014, 3 commits
-
-
Committed by NeilBrown
An array can only accept a bitmap if it will call bitmap_daemon_work periodically, which means it needs a thread running. If there is no thread, don't allow a bitmap to be added. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
If an array has a bitmap, then it cannot be converted to raid0. Reported-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Xiao Ni
When we calculate the recovery speed, the numerator contains the number of sectors already recovered; we need to subtract the sectors that have not finished recovery. Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 02 Aug 2014, 2 commits
-
-
Committed by Anssi Hannula
nr_dirty is updated without locking, causing it to drift so that it is non-zero (either a small positive integer, or a very large one when an underflow occurs) even when there are no actual dirty blocks. This was due to a race between the workqueue and the map function accessing nr_dirty in parallel without proper protection. People were seeing underruns due to a race on increment/decrement of nr_dirty, see: https://lkml.org/lkml/2014/6/3/648 Fix this by using an atomic_t for nr_dirty. Reported-by: roma1390@gmail.com Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
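A minimal sketch of the pattern (the structure name is illustrative, not the actual dm-cache field layout): with an atomic_t, concurrent increments from the map path and decrements from the workqueue cannot lose updates, so the counter can no longer drift.

```c
#include <linux/atomic.h>

/* illustrative stand-in for the structure that owns nr_dirty */
struct dirty_counter_sketch {
	atomic_t nr_dirty;
};

static void inc_dirty(struct dirty_counter_sketch *c)
{
	atomic_inc(&c->nr_dirty);	/* atomic read-modify-write, no lock needed */
}

static void dec_dirty(struct dirty_counter_sketch *c)
{
	atomic_dec(&c->nr_dirty);	/* cannot lose a racing inc_dirty() */
}

static int nr_dirty(struct dirty_counter_sketch *c)
{
	return atomic_read(&c->nr_dirty);
}
```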
-
Committed by Greg Thelen
1d3d4437 ("vmscan: per-node deferred work") added a flags field to struct shrinker, assuming that all shrinkers were zero-filled. The dm bufio shrinker is not zero-filled, which leaves arbitrary kmalloc() data in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE, but there are proposed patches which add other bits to shrinker.flags (e.g. memcg awareness). Rather than simply initializing the shrinker, this patch uses kzalloc() when allocating the dm_bufio_client to ensure that the embedded shrinker and any other similar structures are zeroed. This fixes theoretical over-aggressive shrinking of dm bufio objects: if the uninitialized dm_bufio_client.shrinker.flags contains SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for each NUMA node rather than just once. This has been broken since 3.12. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.12+
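A hedged sketch of the allocation pattern (the struct name is a stand-in for dm_bufio_client): kzalloc() guarantees the embedded shrinker, including its flags field, starts out zeroed instead of inheriting stale slab contents.

```c
#include <linux/slab.h>
#include <linux/shrinker.h>

/* hypothetical stand-in for struct dm_bufio_client */
struct bufio_client_sketch {
	struct shrinker shrinker;	/* embedded: .flags must start at 0 */
	/* ... other members ... */
};

static struct bufio_client_sketch *client_alloc_sketch(void)
{
	/*
	 * kzalloc, not kmalloc: shrinker.flags cannot accidentally contain
	 * a leftover SHRINKER_NUMA_AWARE bit from previous slab users.
	 */
	return kzalloc(sizeof(struct bufio_client_sketch), GFP_KERNEL);
}
```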
-
- 31 Jul 2014, 2 commits
-
-
Committed by NeilBrown
The way md devices are traditionally created in the kernel is simply to open the device with the desired major/minor number. This can be problematic as some support tools, notably udev and programs run by udev, can open a device just to see what is there, and find that it has created something. It is easy for a race to cause udev to open an md device just after it was destroyed, causing it to suddenly re-appear. For some time we have had an alternate way to create md devices: echo md_somename > /sys/module/md_mod/parameters/new_array This will always use a minor number of 512 or higher, which mdadm normally avoids. Using this makes the creation-by-opening unnecessary, but does not disable it, so it is still there to cause problems. This patch disables probing for devices with a major of 9 (MD_MAJOR) and a minor of 512 and up. Devices created by writing to new_array therefore cannot be re-created by opening the node in /dev. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Currently we don't abort recovery on a write error if the write error to the recovering device was triggered by normal IO (as opposed to recovery IO). This means that for one bitmap region, the recovery might write to the recovering device for a few sectors, then not bother for subsequent sectors (as it never writes to failed devices). In this case the bitmap bit will be cleared, but it really shouldn't be. The result is that if the recovering device fails and is then re-added (after fixing whatever hardware problem triggered the failure), the second recovery won't redo the region it was in the middle of, so some of the device will not be recovered properly. If we abort the recovery, the region being processed will be cancelled (bit not cleared) and the whole region will be retried. As the bug can result in data corruption the patch is suitable for -stable. For kernels prior to 3.11 there is a conflict in raid10.c which will require care. Original-from: jiao hui <jiaohui@bwstor.com.cn> Reported-and-tested-by: jiao hui <jiaohui@bwstor.com.cn> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org
-
- 16 Jul 2014, 3 commits
-
-
Committed by NeilBrown
The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions: one which uses io_schedule() and one which just uses schedule(). So: rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function, and introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal, so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function; David Howells confirms this should be uniformly "uninterruptible". The main remaining user of wait_on_bit{,_lock}_action is NFS, which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps' (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack will, so the distinction will still be visible, only with different function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since the first version of this patch (against 3.15) two new action functions appeared, one in NFS and one in CIFS; CIFS also now uses an action function that makes the same freezer-aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Mike Snitzer
The block size for the dm-cache's data device must remain fixed for the life of the cache. Disallow any attempt to change the cache's data block size. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org
-
Committed by Mike Snitzer
The block size for the thin-pool's data device must remain fixed for the life of the thin-pool. Disallow any attempt to change the thin-pool's data block size. It should be noted that attempting to change the data block size via a thin-pool table reload will be ignored as a side-effect of the thin-pool handover that the thin-pool target does during thin-pool table reload. Here is an example outcome of attempting to load a thin-pool table that reduced the thin-pool's data block size from 1024K to 512K. Before: kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks After: kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object kernel: device-mapper: ioctl: error adding target to table Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org
-
- 11 Jul 2014, 4 commits
-
-
Committed by Jun'ichi Nomura
Commit e8099177 ("dm mpath: push back requests instead of queueing") modified multipath_busy() to return true if !pg_ready(). pg_ready() checks the current state of the multipath device and may return false even if a new IO is needed to change the state. Bart Van Assche reported that he hit a multipath IO lockup when he was performing cable pull tests. Analysis showed that the multipath device had a single path group with both paths active, but that the path group itself was not active. During the multipath device state transitions 'queue_io' got set but nothing could clear it: clearing 'queue_io' only happens in __choose_pgpath(), but it won't be called if multipath_busy() returns true due to pg_ready() returning false when 'queue_io' is set. As such the !pg_ready() check in multipath_busy() is wrong, because new IO will not be sent to the multipath target and the multipath state change won't happen. That results in multipath IO lockup. The intent of multipath_busy() is to avoid unnecessary cycles of dequeue + request_fn + requeue if it is known that the multipath device will requeue. Such "busy" situations are: the path group is being activated, or there is no path and the multipath is set up to requeue if no path. Fix multipath_busy() to return "busy" early only for these specific situations, as in the sketch below. Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.15
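A rough sketch of the intent of the fixed check, using a simplified stand-in structure; the field names are assumptions, not the real struct multipath layout.

```c
#include <linux/types.h>

/* simplified stand-in; the real dm-mpath state is more involved */
struct mpath_sketch {
	bool pg_init_in_progress;	/* a path group is being activated */
	unsigned int nr_valid_paths;
	bool queue_if_no_path;
};

static bool multipath_busy_sketch(struct mpath_sketch *m)
{
	/* requeueing is pointless while a path group activation is pending */
	if (m->pg_init_in_progress)
		return true;

	/* no usable path, and we are configured to queue until one appears */
	if (!m->nr_valid_paths && m->queue_if_no_path)
		return true;

	/*
	 * Otherwise let the request be dispatched so __choose_pgpath() can
	 * run and clear 'queue_io' as part of the normal state transition.
	 */
	return false;
}
```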
-
Committed by Joe Thornber
There's a race condition between the atomic_dec_and_test(&io->count) in dec_count() and the waking of the sync_io() thread. If the thread is spuriously woken immediately after the decrement it may exit, making the on-stack io struct invalid, yet dec_count could still be using it. Fix this race by using a completion in sync_io() and dec_count(). Reported-by: Minfei Huang <huangminfei@ucloud.cn> Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org
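A hedged sketch of the completion-based pattern (simplified io structure, not the actual dm-io code): the waiter blocks in wait_for_completion() and cannot return, and thus cannot invalidate the on-stack state, until dec_count has finished its complete() call.

```c
#include <linux/atomic.h>
#include <linux/completion.h>

/* simplified stand-in for dm-io's on-stack io structure */
struct io_sketch {
	atomic_t count;
	struct completion *wait;	/* set only for synchronous IO */
};

static void dec_count_sketch(struct io_sketch *io)
{
	if (atomic_dec_and_test(&io->count) && io->wait)
		complete(io->wait);	/* last touch of the waiter's state */
}

static void sync_io_sketch(struct io_sketch *io, unsigned int num_regions)
{
	DECLARE_COMPLETION_ONSTACK(done);

	atomic_set(&io->count, num_regions);
	io->wait = &done;

	/* ... submit the IO; each region completion calls dec_count_sketch() ... */

	/* unlike a bare wait queue, a spurious wakeup cannot end the wait early */
	wait_for_completion(&done);
}
```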
-
Committed by Jana Saout
Signed-off-by: Jana Saout <jana@saout.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mikulas Patocka
Commit 2c140a24 ("dm: allow remove to be deferred") introduced a deferred removal feature for the device mapper. When this feature is used (by passing the flag DM_DEFERRED_REMOVE to the DM_DEV_REMOVE_CMD ioctl) and the user tries to remove a device that is currently in use, the device will be removed automatically in the future when the last user closes it. Device mapper used the system workqueue to perform deferred removals. However, some targets (dm-raid1, dm-mpath, dm-stripe) flush work items scheduled for the system workqueue from their destructor. If the destructor itself is called from the system workqueue during deferred removal, it introduces a possible deadlock - the workqueue tries to flush itself. Fix this possible deadlock by introducing a new workqueue for deferred removals. We allocate just one workqueue for all dm targets. The ability of dm targets to process IOs isn't dependent on deferred removal of unused targets, so a deadlock due to the shared workqueue isn't possible. Also, clean up local_init() to eliminate the potential for returning success on failure. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.13+
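A hedged sketch of the workqueue split (queue and helper names are assumptions): deferred removals go to their own workqueue, so a target destructor that flushes the system workqueue is never asked to flush the queue it is itself running on.

```c
#include <linux/workqueue.h>
#include <linux/errno.h>

/* hypothetical module-local workqueue dedicated to deferred removals */
static struct workqueue_struct *deferred_remove_wq;

static int deferred_remove_init_sketch(void)
{
	deferred_remove_wq = alloc_workqueue("kdmremove_sketch", WQ_UNBOUND, 1);
	return deferred_remove_wq ? 0 : -ENOMEM;
}

static void schedule_deferred_remove_sketch(struct work_struct *w)
{
	/* not system_wq: flushing system_wq from a destructor stays safe */
	queue_work(deferred_remove_wq, w);
}
```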
-
- 03 Jul 2014, 2 commits
-
-
Committed by NeilBrown
When we write to a degraded array which has a bitmap, we make sure the relevant bit in the bitmap remains set when the write completes (so a 're-add' can quickly rebuild a temporarily-missing device). If, immediately after such a write starts, we incorporate a spare, commence recovery, and skip over the region where the write is happening (because the 'needs recovery' flag isn't set yet), then that write will not get to the new device. Once the recovery finishes the new device will be trusted, but will have incorrect data, leading to possible corruption. We cannot set the 'needs recovery' flag when we start the write as we do not know easily if the write will be "degraded" or not. That depends on details of the particular raid level and particular write request. This patch fixes a corruption issue of long standing and so is suitable for any -stable kernel. It applies correctly to 3.0 at least and will need minor editing for earlier kernels. Reported-by: Bill <billstuff2001@sbcglobal.net> Tested-by: Bill <billstuff2001@sbcglobal.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
If an array has a bitmap, then when we set the "has bitmap" flag we incorrectly clear the "is clean" flag. "is clean" isn't really important when a bitmap is present, but it is best to get it right anyway. Reported-by: George Duffield <forumscollective@gmail.com> Link: http://lkml.kernel.org/CAG__1a4MRV6gJL38XLAurtoSiD3rLBTmWpcS5HYvPpSfPR88UQ@mail.gmail.com Fixes: 36fa3063 (v2.6.14) Signed-off-by: NeilBrown <neilb@suse.de>
-
- 12 Jun 2014, 2 commits
-
-
Committed by Lukas Czerner
DM thinp already checks whether the discard_granularity of the data device is a factor of the thin-pool block size. But when using the dm-thin-pool's discard passdown support, DM thinp was not selecting the max of the underlying data device's discard_granularity and the thin-pool's block size. Update set_discard_limits() to set discard_granularity to the max of these values. This enables blkdev_issue_discard() to properly align the discards that are sent to the DM thin device on a full block boundary. As such each discard will now cover an entire DM thin-pool block and the block will be reclaimed. Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
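A minimal sketch of the granularity selection (parameter names are illustrative): whichever alignment is coarser wins, so blkdev_issue_discard() emits discards that cover whole pool blocks.

```c
#include <linux/kernel.h>	/* max() */
#include <linux/blkdev.h>

static void set_discard_granularity_sketch(struct queue_limits *limits,
					   unsigned int data_dev_granularity,
					   unsigned int pool_block_bytes)
{
	/* take the max so each discard spans complete thin-pool blocks,
	 * which is what allows the blocks to actually be reclaimed */
	limits->discard_granularity = max(data_dev_granularity,
					  pool_block_bytes);
}
```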
-
Committed by Heinz Mauelshagen
Split the single per-bio-prison lock by using per-bucket locking. Per-bucket locking benefits both the dm-thin and dm-cache targets by reducing bio-prison lock contention. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
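A hedged sketch of per-bucket locking (the bucket count and structure layout are invented, not the real bio-prison): each hash bucket carries its own spinlock, so only cells that hash to the same bucket contend.

```c
#include <linux/spinlock.h>
#include <linux/hash.h>
#include <linux/log2.h>
#include <linux/list.h>
#include <linux/types.h>

#define NR_BUCKETS_SKETCH 1024		/* illustrative bucket count */

struct prison_sketch {
	struct {
		spinlock_t lock;
		struct hlist_head cells;
	} buckets[NR_BUCKETS_SKETCH];
};

static void prison_insert_sketch(struct prison_sketch *p, u64 key,
				 struct hlist_node *cell)
{
	unsigned int b = hash_64(key, ilog2(NR_BUCKETS_SKETCH));
	unsigned long flags;

	/* contention is limited to this one bucket rather than a single
	 * prison-wide lock shared by every cell */
	spin_lock_irqsave(&p->buckets[b].lock, flags);
	hlist_add_head(cell, &p->buckets[b].cells);
	spin_unlock_irqrestore(&p->buckets[b].lock, flags);
}
```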
-
- 10 Jun 2014, 1 commit
-
-
Committed by Eivind Sarto
The raid5 sync_request() processing calls handle_stripe() within the context of the resync-thread. The resync-thread issues the first set of read requests and this adds execution latency and slows down the scheduling of the next sync_request(). The current rebuild/resync speed of raid5 is not much faster than what rotational HDDs can sustain. Testing the following patch on a 6-drive array, I can increase the rebuild speed from 100 MB/s to 175 MB/s. The sync_request() now just sets STRIPE_HANDLE and releases the stripe. This creates some more parallelism between the resync-thread and the raid5 kernel daemon. Signed-off-by: Eivind Sarto <esarto@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 05 Jun 2014, 1 commit
-
-
Committed by hui jiao
A chunk-aligned read increases the counter active_aligned_reads and decreases it after the sub-device handles it successfully. But when a read error occurs, the read is redispatched by raid5d, and active_aligned_reads will not be decreased until we can grab a stripe head in retry_aligned_read. Now suppose a barrier IO comes, sets conf->quiesce to 2, and waits until both active_stripes and active_aligned_reads are zero. The retried chunk-aligned read gets stuck at get_active_stripe, waiting until conf->quiesce becomes 0. retry_aligned_read and the barrier IO are now waiting for each other. One possible solution is to ignore conf->quiesce and let the retried aligned read finish. I reproduced this deadlock and tested this patch on CentOS 6.0. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 04 Jun 2014, 9 commits
-
-
Committed by Mike Snitzer
There is no need for code other than DM core to use dm_set_device_limits, so remove its EXPORT_SYMBOL_GPL. Also, clean up a couple of whitespace nits. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mike Snitzer
Add DM core support for disabling WRITE SAME on first failure to both request-based and bio-based targets. The need to disable WRITE SAME stems from SCSI enabling it by default but then disabling it when it fails. When SCSI does this it returns "permanent target failure, do not retry" using -EREMOTEIO. Update DM core to only disable WRITE SAME on failure if the returned error is -EREMOTEIO. Commit f84cb8a4 ("dm mpath: disable WRITE SAME if it fails") implemented multipath-specific disabling of WRITE SAME if it fails. However, as that commit detailed, the multipath-only solution doesn't go far enough if bio-based DM targets are stacked on top of the request-based dm-multipath target (as is commonly done using dm-linear to support partitions on multipath devices, via kpartx). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Alex Chen <alex.chen@huawei.com>
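A hedged sketch of the completion-path check (the helper name and parameters are assumptions, not the DM core API): WRITE SAME is turned off for the device only when the error is the SCSI "do not retry" code.

```c
#include <linux/blkdev.h>
#include <linux/errno.h>
#include <linux/types.h>

static void maybe_disable_write_same_sketch(struct request_queue *q,
					    bool was_write_same, int error)
{
	/*
	 * -EREMOTEIO is how SCSI signals "permanent target failure, do not
	 * retry"; any other error is not treated as "WRITE SAME unsupported".
	 */
	if (was_write_same && error == -EREMOTEIO)
		q->limits.max_write_same_sectors = 0;
}
```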
-
Committed by Joe Thornber
era_ctr() may call era_destroy() before era->md is initialized, so era_destroy() must only close the metadata object if it is not NULL. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Naohiro Aota <naota@elisp.net> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.15+
-
Committed by Mike Snitzer
Update the DM thin provisioning target's allocation failure error to be consistent with commit a9d6ceb8 ("[SCSI] return ENOSPC on thin provisioning failure"). The DM thin target now returns -ENOSPC rather than -EIO when block allocation fails due to the pool being out of data space (and the 'error_if_no_space' thin-pool feature is enabled). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
-
Committed by Joe Thornber
Factor out a pool_work interface that noflush_work makes use of to wait for and complete work items (in terms of a proper completion struct). This allows discontinuing the use of a custom completion built from an atomic_t and wait_event. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mikulas Patocka
Change the snapshot-origin target so that only write bios are split on chunk boundaries. Read bios are passed unchanged to the underlying device, so they don't have to be split. Later, we could change the target so that it accepts a larger write bio if it spans an area that is completely covered by snapshot exceptions. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mikulas Patocka
Allocate a per-target dm_origin structure. This is a prerequisite for the next commit ("dm snapshot: do not split read bios sent to snapshot-origin target"), which adds a new member to this structure. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mikulas Patocka
The function dm_accept_partial_bio allows the target to specify how many sectors of the current bio it will process. If the target only wants to accept part of the bio, it calls dm_accept_partial_bio and the DM core sends the rest of the data in the next bio. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
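A hedged usage sketch (the target and its 8-sector limit are invented for illustration): a map function accepts only the leading part of an oversized bio and lets DM core resubmit the remainder.

```c
#include <linux/device-mapper.h>
#include <linux/bio.h>

static int example_map_sketch(struct dm_target *ti, struct bio *bio)
{
	unsigned int max_sectors = 8;	/* pretend this target handles 4KiB at a time */

	/* accept only the first max_sectors; DM core sends the rest of the
	 * data in a follow-up bio */
	if (bio_sectors(bio) > max_sectors)
		dm_accept_partial_bio(bio, max_sectors);

	/* ... remap the bio to the underlying device here ... */
	return DM_MAPIO_REMAPPED;
}
```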
-
Committed by Mikulas Patocka
It is impossible to create bios with 2^23 or more sectors (the size is stored as a 32-bit byte count in the bio), so we convert some sector_t values to unsigned integers. This is needed for the next commit ("dm: introduce dm_accept_partial_bio"), which replaces integer value arguments with pointers, so the size of the integer must match. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 29 May 2014, 7 commits
-
-
Committed by Shaohua Li
The stripe cache has two goals: 1. cache data, so that next time the data can be found in the stripe cache and disk access can be avoided; 2. stable data - data is copied from the bio to the stripe cache and parity is calculated from the copy, so data written to disk comes from the stripe cache and isn't impacted if the upper layer changes the bio data. In my environment, I can guarantee 2 will not happen, and BDI_CAP_STABLE_WRITES can guarantee 2 too. For 1, it's not common either: the block plug mechanism will dispatch a bunch of sequential small requests together, and since I'm using an SSD with a small chunk size, it's a rare case that the stripe cache is really useful. So I'd like to avoid the copy from bio to stripe cache, and it's very helpful for performance. In my 1M randwrite tests, avoiding the copy increases performance by more than 30%. Of course, this shouldn't be enabled by default: it has been reported that enabling BDI_CAP_STABLE_WRITES can harm some workloads, so I added an option to control it. Neilb: changed BUG_ON to WARN_ON; removed some assignments from raid5_build_block which are now not needed. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
file_page_index(store, 0) is *always* 0. This is because the bitmap sb, at 256 bytes, is *always* less than one page. So subtracting it has no effect and the code should be removed. Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Eivind Sarto
The (lockless) release_list reduces lock contention, but there is excessive queueing and dequeuing of stripes on this list. A stripe will currently be queued on the release_list with a stripe reference count > 1. This can cause the raid5 kernel thread(s) to dequeue the stripe and decrement the refcount without doing any other useful processing of the stripe. There are two cases when the stripe can be put on the release_list multiple times before it is actually handled by the kernel thread(s). 1) make_request() activates the stripe processing in 4k increments. When a write request is large enough to span multiple chunks of a stripe_head, the first 4k chunk adds the stripe to the plug list. The next 4k chunk that is processed for the same stripe puts the stripe on the release_list with a refcount=2. This can cause the kernel thread to process and decrement the stripe before the stripe is unplugged, which again will put it back on the release_list. 2) Whenever IO is scheduled on a stripe (pre-read and/or write), the stripe refcount is set to the number of active IOs (for each chunk). The stripe is released as each IO completes, and can be queued and dequeued multiple times on the release_list, until its refcount finally reaches zero. This simple patch will ensure a stripe is only queued on the release_list when its refcount=1 and it is ready to be handled by the kernel thread(s). I added some instrumentation to raid5 and counted the number of times stripes were queued on the release_list for a variety of write IO sizes. Without this patch the number of times stripes got queued on the release_list was 100-500% higher than with the patch. The excess queuing will increase with the IO size. The patch also improved throughput by 5-10%. Signed-off-by: Eivind Sarto <esarto@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Julia Lawall and coccinelle report that md_clear_badblocks always returns 0, despite appearing to have an error path. The error path really should return an error code; ENOSPC is reasonably appropriate. Reported-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
If it is found that we need to pre-read some blocks before a write can succeed, we normally set STRIPE_DELAYED and don't actually perform the read until STRIPE_PREREAD_ACTIVE subsequently gets set. However for a degraded RAID6 we currently perform the reads as soon as we see that a write is pending. This significantly hurts throughput. So: when handle_stripe_dirtying finds a block that it wants on a device that is failed, set STRIPE_DELAYED instead of doing nothing, and when fetch_block detects that a read might be required to satisfy a write, only perform the read if STRIPE_PREREAD_ACTIVE is set and we would actually need to read something to complete the write. This also helps RAID5, though less often, as RAID5 supports a read-modify-write cycle. For RAID5 the read is performed too early only if the write is not a full 4K aligned write (i.e. not an R5_OVERWRITE). Also clean up a couple of horrible bits of formatting. Reported-by: Patrik Horník <patrik@dsl.sk> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Read-only arrays should not be changed. This includes changing the level, layout, size, or number of devices. So reject those changes for read-only arrays. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Commit 8313b8e5 ("md: fix problem when adding device to read-only array with bitmap") added a call to md_reap_sync_thread() which causes a reshape thread to be interrupted (in particular, it could cause md_thread() to never even call md_do_sync()). However it didn't set MD_RECOVERY_INTR, so ->finish_reshape() would not know that the reshape didn't complete. This only happens when mddev->ro is set, and normally reshape threads don't run in that situation. But raid5 and raid10 can start a reshape thread during "run" if the array is in the middle of a reshape. They do this even if ->ro is set. So it is best to set MD_RECOVERY_INTR before aborting the sync thread, just in case. Though it is rare for this to trigger a problem, it can cause data corruption because the reshape isn't finished properly. So it is suitable for any stable kernel to which the offending commit was applied (3.2 or later). Fixes: 8313b8e5 Cc: stable@vger.kernel.org (3.2+) Signed-off-by: NeilBrown <neilb@suse.de>
-
- 28 May 2014, 1 commit
-
-
Committed by NeilBrown
If mddev->ro is set, md_do_sync will (correctly) abort. However in that case MD_RECOVERY_INTR isn't set. If a RESHAPE had been requested, then ->finish_reshape() will be called and it will think the reshape was successful even though nothing happened. Normally a resync will not be requested if ->ro is set, but if an array is stopped while a reshape is on-going, then when the array is started the reshape will be restarted. If the array is also set read-only at this point, the reshape will instantly appear to succeed, resulting in data corruption. Consequently, this patch is suitable for any -stable kernel. Cc: stable@vger.kernel.org (any) Signed-off-by: NeilBrown <neilb@suse.de>
-
- 27 May 2014, 2 commits
-
-
Committed by Hannes Reinecke
lockdep complains about a circular locking dependency. And indeed, we need to release the lock before calling dm_table_run_md_queue_async(). As such, commit 4cdd2ad7 ("dm mpath: fix lock order inconsistency in multipath_ioctl") must also be reverted in addition to fixing the lock order in the other dm_table_run_md_queue_async() callers. Reported-by: Bart van Assche <bvanassche@acm.org> Tested-by: Bart van Assche <bvanassche@acm.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Heinz Mauelshagen
The DM cache target cannot cope with discards that span multiple cache blocks, so each discard bio that spans more than one cache block must be split by the DM core. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.9+
-
- 21 May 2014, 1 commit
-
-
Committed by Mike Snitzer
Commit 85ad643b ("dm thin: add timeout to stop out-of-data-space mode holding IO forever") introduced a fixed 60 second timeout. Users may want to either disable or modify this timeout. Allow the out-of-data-space timeout to be configured using the 'no_space_timeout' dm-thin-pool module param. Setting it to 0 will disable the timeout, resulting in IO being queued until more data space is added to the thin-pool. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.14+
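A hedged sketch of the module-parameter plumbing (variable and helper names approximate the idea, not the verbatim dm-thin code): the timeout value is taken from a writable module parameter, and zero skips arming the timer entirely.

```c
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/stat.h>

static unsigned int no_space_timeout_secs = 60;	/* the old fixed default */
module_param_named(no_space_timeout, no_space_timeout_secs, uint,
		   S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(no_space_timeout,
	"Out-of-data-space timeout in seconds (0 = queue IO until space is added)");

static void arm_no_space_timeout_sketch(struct workqueue_struct *wq,
					struct delayed_work *dw)
{
	/* 0 disables the timeout: IO simply stays queued */
	if (no_space_timeout_secs)
		queue_delayed_work(wq, dw, no_space_timeout_secs * HZ);
}
```

At runtime the value would then be adjustable through the module's parameters directory in sysfs (the exact path depends on the module name), or set at load time as a modprobe option.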
-