- 14 10月, 2014 9 次提交
-
-
由 NeilBrown 提交于
We don't really need the full mddev_lock here, and having to drop it is messy. RCU is enough to protect these lists. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Chao Yu 提交于
printk may cause long time lapse if value of printk_delay in sysctl is configured large by user. If register_md_personality takes long time to print in spinlock pers_lock, we may encounter high CPU usage rate when there are other pers_lock competitors who may be blocked to spin. We can avoid this condition by moving printk out of coverage of pers_lock spinlock. Signed-off-by: NChao Yu <chao2.yu@samsung.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
We don't really need that for_each loop, or those MD_BUGs. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Having both is a waste - just use the one. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
md_super_wait is really just wait_event() open-coded. So use the macro instead. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
In general we don't allow an array to be stopped if it is in use. However if the array hasn't really been started yet, then any apparent use is an anomily, probably due to 'udev' or similar having a look to see what is there. This means that if something goes wrong while assembling an array it cannot reliably be un-assembled - STOP_ARRAY could fail. There is no value here, so change do_md_stop() to succeed despite concurrent opens if the array has not yet been activated. i.e. if ->pers is NULL. Reported-by: N"Baldysiak, Pawel" <pawel.baldysiak@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
process_checks() always returns '0', so change it to 'void'. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Markus Stockhausen 提交于
raid5: fix init_stripe() inconsistencies 1) remove_hash() is not necessary. We will only be called right after get_free_stripe(). There we have already a call to remove_hash(). 2) Tracing prints out the sector of the freed stripe and not the sector that we want to initialize. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 09 10月, 2014 3 次提交
-
-
由 NeilBrown 提交于
Using {set,clear}_bit is more consistent than shifting and masking. No functional change. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If two threads call bitmap_unplug at the same time, then one might schedule all the writes, and the other might decide that it doesn't need to wait. But really it does. It rarely hurts to wait when it isn't absolutely necessary, and the current code doesn't really focus on 'absolutely necessary' anyway. So just wait always. This can potentially lead to data corruption if a crash happens at an awkward time and data was written before the bitmap was updated. It is very unlikely, but this should go to -stable just to be safe. Appropriate for any -stable. Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org (please delay until 3.18 is released)
-
- 02 10月, 2014 1 次提交
-
-
由 NeilBrown 提交于
It has come to my attention (thanks Martin) that 'discard_zeroes_data' is only a hint. Some devices in some cases don't do what it says on the label. The use of DISCARD in RAID5 depends on reads from discarded regions being predictably zero. If a write to a previously discarded region performs a read-modify-write cycle it assumes that the parity block was consistent with the data blocks. If all were zero, this would be the case. If some are and some aren't this would not be the case. This could lead to data corruption after a device failure when data needs to be reconstructed from the parity. As we cannot trust 'discard_zeroes_data', ignore it by default and so disallow DISCARD on all raid4/5/6 arrays. As many devices are trustworthy, and as there are benefits to using DISCARD, add a module parameter to over-ride this caution and cause DISCARD to work if discard_zeroes_data is set. If a site want to enable DISCARD on some arrays but not on others they should select DISCARD support at the filesystem level, and set the raid456 module parameter. raid456.devices_handle_discard_safely=Y As this is a data-safety issue, I believe this patch is suitable for -stable. DISCARD support for RAID456 was added in 3.7 Cc: Shaohua Li <shli@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Heinz Mauelshagen <heinzm@redhat.com> Cc: stable@vger.kernel.org (3.7+) Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: NMike Snitzer <snitzer@redhat.com> Fixes: 620125f2Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 22 9月, 2014 8 次提交
-
-
由 NeilBrown 提交于
If a devices is being recovered it is not InSync and is not Faulty. If a read error is experienced on that device, fix_read_error() will be called, but it ignores non-InSync devices. So it will neither fix the error nor fail the device. It is incorrect that fix_read_error() ignores non-InSync devices. It should only ignore Faulty devices. So fix it. This became a bug when we allowed reading from a device that was being recovered. It is suitable for any subsequent -stable kernel. Fixes: da8840a7 Cc: stable@vger.kernel.org (v3.5+) Reported-by: NAlexander Lyakas <alex.bolshoy@gmail.com> Tested-by: NAlexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Both normal IO and resync IO can be retried with reschedule_retry() and so be counted into ->nr_queued, but only normal IO gets counted in ->nr_pending. Before the recent improvement to RAID1 resync there could only possibly have been one or the other on the queue. When handling a read failure it could only be normal IO. So when handle_read_error() called freeze_array() the fact that freeze_array only compares ->nr_queued against ->nr_pending was safe. But now that these two types can interleave, we can have both normal and resync IO requests queued, so we need to count them both in nr_pending. This error can lead to freeze_array() hanging if there is a read error, so it is suitable for -stable. Fixes: 79ef3a8a cc: stable@vger.kernel.org (v3.13+) Reported-by: NBrassow Jonathan <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
raise_barrier() uses next_resync as part of its calculations, so it really should be updated first, instead of afterwards. next_resync is always used under resync_lock so update it under resync lock to, just before it is used. That is safest. This could cause normal IO and resync IO to interact badly so it suitable for -stable. Fixes: 79ef3a8a cc: stable@vger.kernel.org (v3.13+) Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
next_resync is (approximately) the location for the next resync request. However it does *not* reliably determine the earliest location at which resync might be happening. This is because resync requests can complete out of order, and we only limit the number of current requests, not the distance from the earliest pending request to the latest. mddev->curr_resync_completed is a reliable indicator of the earliest position at which resync could be happening. It is updated less frequently, but is actually reliable which is more important. So use it to determine if a write request is before the region being resynced and so safe from conflict. This error can allow resync IO to interfere with normal IO which could lead to data corruption. Hence: stable. Fixes: 79ef3a8a cc: stable@vger.kernel.org (v3.13+) Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
The resync/recovery process for raid1 was recently changed so that writes could happen in parallel with resync providing they were in different regions of the device. There is a problem though: While a write request will always wait for conflicting resync to complete, a resync request will *not* always wait for conflicting writes to complete. Two changes are needed to fix this: 1/ raise_barrier (which waits until it is safe to do resync) must wait until current_window_requests is zero 2/ wait_battier (which waits at the start of a new write request) must update current_window_requests if the request could possible conflict with a concurrent resync. As concurrent writes and resync can lead to data loss, this patch is suitable for -stable. Fixes: 79ef3a8a Cc: stable@vger.kernel.org (v3.13+) Cc: majianpeng <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If there are outstanding writes when close_sync is called, the change to ->start_next_window might cause them to decrement the wrong counter when they complete. Fix this by merging the two counters into the one that will be decremented. Having an incorrect value in a counter can cause raise_barrier() to hangs, so this is suitable for -stable. Fixes: 79ef3a8a cc: stable@vger.kernel.org (v3.13+) Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
commit 79ef3a8a made it possible for reads to happen concurrently with resync. This means that we need to be more careful where read_balancing is allowed during resync - we can no longer be sure that any resync that has already started will definitely finish. So keep read_balancing to before recovery_cp, which is conservative but safe. This bug makes it possible to read from a device that doesn't have up-to-date data, so it can cause data corruption. So it is suitable for any kernel since 3.11. Fixes: 79ef3a8a cc: stable@vger.kernel.org (v3.13+) Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
r1_bio->start_next_window is not initialised in the READ case, so allow_barrier may incorrectly decrement conf->current_window_requests which can cause raise_barrier() to block forever. Fixes: 79ef3a8a cc: stable@vger.kernel.org (v3.13+) Reported-by: NBrassow Jonathan <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 10 9月, 2014 1 次提交
-
-
由 Anssi Hannula 提交于
When a writeback or a promotion of a block is completed, the cell of that block is removed from the prison, the block is marked as clean, and the clear_dirty() callback of the cache policy is called. Unfortunately, performing those actions in this order allows an incoming new write bio for that block to come in before clearing the dirty status is completed and therefore possibly causing one of these two scenarios: Scenario A: Thread 1 Thread 2 cell_defer() . - cell removed from prison . - detained bios queued . . incoming write bio . remapped to cache . set_dirty() called, . but block already dirty . => it does nothing clear_dirty() . - block marked clean . - policy clear_dirty() called . Result: Block is marked clean even though it is actually dirty. No writeback will occur. Scenario B: Thread 1 Thread 2 cell_defer() . - cell removed from prison . - detained bios queued . clear_dirty() . - block marked clean . . incoming write bio . remapped to cache . set_dirty() called . - block marked dirty . - policy set_dirty() called - policy clear_dirty() called . Result: Block is properly marked as dirty, but policy thinks it is clean and therefore never asks us to writeback it. This case is visible in "dmsetup status" dirty block count (which normally decreases to 0 on a quiet device). Fix these issues by calling clear_dirty() before calling cell_defer(). Incoming bios for that block will then be detained in the cell and released only after clear_dirty() has completed, so the race will not occur. Found by inspecting the code after noticing spurious dirty counts (scenario B). Signed-off-by: NAnssi Hannula <anssi.hannula@iki.fi> Acked-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
- 29 8月, 2014 1 次提交
-
-
由 Mikulas Patocka 提交于
The DM crypt target accesses memory beyond allocated space resulting in a crash on 32 bit x86 systems. This bug is very old (it dates back to 2.6.25 commit 3a7f6c99 "dm crypt: use async crypto"). However, this bug was masked by the fact that kmalloc rounds the size up to the next power of two. This bug wasn't exposed until 3.17-rc1 commit 298a9fa0 ("dm crypt: use per-bio data"). By switching to using per-bio data there was no longer any padding beyond the end of a dm-crypt allocated memory block. To minimize allocation overhead dm-crypt puts several structures into one block allocated with kmalloc. The block holds struct ablkcipher_request, cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))), struct dm_crypt_request and an initialization vector. The variable dmreq_start is set to offset of struct dm_crypt_request within this memory block. dm-crypt allocates the block with this size: cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size. When accessing the initialization vector, dm-crypt uses the function iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq + 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1). dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request structure. However, when dm-crypt accesses the initialization vector, it takes a pointer to the end of dm_crypt_request, aligns it, and then uses it as the initialization vector. If the end of dm_crypt_request is not aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the alignment causes the initialization vector to point beyond the allocated space. Fix this bug by calculating the variable iv_size_padding and adding it to the allocated size. Also correct the alignment of dm_crypt_request. struct dm_crypt_request is specific to dm-crypt (it isn't used by the crypto subsystem at all), so it is aligned on __alignof__(struct dm_crypt_request). Also align per_bio_data_size on ARCH_KMALLOC_MINALIGN, so that it is aligned as if the block was allocated with kmalloc. Reported-by: NKrzysztof Kolasa <kkolasa@winsoft.pl> Tested-by: NMilan Broz <gmazyland@gmail.com> Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 19 8月, 2014 4 次提交
-
-
由 NeilBrown 提交于
Most places which allocate an r10_bio zero the ->state, some don't. As the r10_bio comes from a mempool, and the allocation function uses kzalloc it is often zero anyway. But sometimes it isn't and it is best to be safe. I only noticed this because of the bug fixed by an earlier patch where the r10_bios allocated for a reshape were left around to be used by a subsequent resync. In that case the R10BIO_IsReshape flag caused problems. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If raid10 reshape fails to find somewhere to read a block from, it returns without freeing memory... Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When a raid10 commences a resync/recovery/reshape it allocates some buffer space. When a resync/recovery completes the buffer space is freed. But not when the reshape completes. This can result in a small memory leak. There is a subtle side-effect of this bug. When a RAID10 is reshaped to a larger array (more devices), the reshape is immediately followed by a "resync" of the new space. This "resync" will use the buffer space which was allocated for "reshape". This can cause problems including a "BUG" in the SCSI layer. So this is suitable for -stable. Cc: stable@vger.kernel.org (v3.5+) Fixes: 3ea7daa5Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
raid10 reshape clears unwanted bits from a bio->bi_flags using a method which, while clumsy, worked until 3.10 when BIO_OWNS_VEC was added. Since then it clears that bit but shouldn't. This results in a memory leak. So change to used the approved method of clearing unwanted bits. As this causes a memory leak which can consume all of memory the fix is suitable for -stable. Fixes: a38352e0 Cc: stable@vger.kernel.org (v3.10+) Reported-by: mdraid.pkoch@dfgh.net (Peter Koch) Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 18 8月, 2014 2 次提交
-
-
由 NeilBrown 提交于
During recovery of a double-degraded RAID6 it is possible for some blocks not to be recovered properly, leading to corruption. If a write happens to one block in a stripe that would be written to a missing device, and at the same time that stripe is recovering data to the other missing device, then that recovered data may not be written. This patch skips, in the double-degraded case, an optimisation that is only safe for single-degraded arrays. Bug was introduced in 2.6.32 and fix is suitable for any kernel since then. In an older kernel with separate handle_stripe5() and handle_stripe6() functions the patch must change handle_stripe6(). Cc: stable@vger.kernel.org (2.6.32+) Fixes: 6c0069c0 Cc: Yuri Tikhonov <yur@emcraft.com> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: N"Manibalan P" <pmanibalan@amiindia.co.in> Tested-by: N"Manibalan P" <pmanibalan@amiindia.co.in> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423Signed-off-by: NNeilBrown <neilb@suse.de> Acked-by: NDan Williams <dan.j.williams@intel.com>
-
由 NeilBrown 提交于
If a stripe in a raid6 array received a write to each data block while the array is degraded, and if any of these writes to a missing device are not page-aligned, then a live-lock happens. In this case the P and Q blocks need to be read so that the part of the missing block which is *not* being updated by the write can be constructed. Due to a logic error, these blocks are not loaded, so the update cannot proceed and the stripe is 'handled' repeatedly in an infinite loop. This bug is unlikely as most writes are page aligned. However as it can lead to a livelock it is suitable for -stable. It was introduced in 3.16. Cc: stable@vger.kernel.org (v3.16) Fixed: 67f45548Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 11 8月, 2014 1 次提交
-
-
由 Jeff Moyer 提交于
Commit 05f1dd53 ("block: add queue flag for disabling SG merging") introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE. This gets set by default in blk_mq_init_queue for mq-enabled devices. The effect of the flag is to bypass the SG segment merging. Instead, the bio->bi_vcnt is used as the number of hardware segments. With a device mapper target on top of a device with QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments than a driver is prepared to handle. I ran into this when backporting the virtio_blk mq support. It triggerred this BUG_ON, in virtio_queue_rq: BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); The queue's max is set here: blk_queue_max_segments(q, vblk->sg_elems-2); Basically, what happens is that a bio is built up for the dm device (which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using bio_add_page. That path will call into __blk_recalc_rq_segments, so what you end up with is bi_phys_segments being much smaller than bi_vcnt (and bi_vcnt grows beyond the maximum sg elements). Then, when the bio is submitted, it gets cloned. When the cloned bio is submitted, it will end up in blk_recount_segments, here: if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags)) bio->bi_phys_segments = bio->bi_vcnt; and now we've set bio->bi_phys_segments to a number that is beyond what was registered as queue_max_segments by the driver. The right way to fix this is to propagate the queue flag up the stack. The rules for propagating the flag are simple: - if the flag is set for any underlying device, it must be set for the upper device - consequently, if the flag is not set for any underlying device, it should not be set for the upper device. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.16+
-
- 08 8月, 2014 3 次提交
-
-
由 NeilBrown 提交于
An array can only accept a bitmap if it will call bitmap_daemon_work periodically, which means it needs a thread running. If there is no thread, don't allow a bitmap to be added. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If an array has a bitmap, then it cannot be converted to raid0. Reported-by: NXiao Ni <xni@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Xiao Ni 提交于
When we calculate the speed of recovery, the numerator that contains the recovery done sectors. It's need to subtract the sectors which don't finish recovery. Signed-off-by: NXiao Ni <xni@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 05 8月, 2014 7 次提交
-
-
由 Kent Overstreet 提交于
this is needed for the queue/block device we created (it's done by blk_cleanup_queue() which we do call) - but calling it for the block devices we only opened is pointless. Change-Id: I53dfded14ed15b9581d10ca8399d5e1b3abbf9f2
-
由 Jianjian Huo 提交于
Since bch_is_open will iterate linked list bch_cache_sets and uncached_devices, it needs bch_register_lock. Signed-off-by: NJianjian Huo <samuel.huo@gmail.com>
-
由 Surbhi Palande 提交于
time_stats::btree_gc_max_duration_mc is not bit shifted by 8 Fixes BUG #138 Change-Id: I44fc6e1d0579674016acc533f1a546b080e5371a Signed-off-by: NSurbhi Palande <sap@daterainc.com>
-
由 Slava Pestov 提交于
bcache_flash_dev.ktest would reliably crash with 8k and 16k bucket size before; now it passes. Change-Id: Ib542232235e39298c3a7548fe52b645cabb823d1
-
由 Slava Pestov 提交于
If register_cache_set() failed, we would touch ca->set after it had already been freed. Also, fix an assertion to catch this. Change-Id: I748e5f5b223e2d9b2602075dec2f997cced2394d
-
由 Slava Pestov 提交于
Change-Id: I6abde52afe917633480caaf4e2518f42a816d886
-
由 Kent Overstreet 提交于
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
-