- 22 9月, 2016 3 次提交
-
-
由 Shaohua Li 提交于
register_shrinker() now can fail. When it happens, shrinker.nr_deferred is null. We use it to determine if unregister_shrinker is required. Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Chao Yu 提交于
register_shrinker can fail after commit 1d3d4437 ("vmscan: per-node deferred work"), we should detect the failure of it, otherwise we may fail to register shrinker after raid5 configuration was setup successfully. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Shaohua Li 提交于
raid5 will split bio to proper size internally, there is no point to use underlayer disk's max_hw_sectors. In my qemu system, without the change, the raid5 only receives 128k size bio, which reduces the chance of bio merge sending to underlayer disks. Signed-off-by: NShaohua Li <shli@fb.com>
-
- 10 9月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
commit 5f9d1fde(raid5: fix memory leak of bio integrity data) moves bio_reset to bio_endio. But it introduces a small race condition. It does bio_reset after raid5_release_stripe, which could make the stripe reusable and hence reuse the bio just before bio_reset. Moving bio_reset before raid5_release_stripe is called should fix the race. Reported-and-tested-by: NStefan Priebe - Profihost AG <s.priebe@profihost.ag> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 07 9月, 2016 1 次提交
-
-
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Neil Brown <neilb@suse.com> Cc: linux-raid@vger.kernel.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160818125731.27256-10-bigeasy@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 01 9月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
If there aren't enough stripes, reshape will hang. We have a check for this in new reshape, but miss it for reshape resume, hence we could see hang in reshape resume. This patch forces enough stripes existed if reshape resumes. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 25 8月, 2016 3 次提交
-
-
由 Shaohua Li 提交于
bio_reset doesn't change bi_io_vec and bi_max_vecs, so we don't need to set them every time. bi_private will be set before the bio is dispatched. Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Shaohua Li 提交于
Yi reported a memory leak of raid5 with DIF/DIX enabled disks. raid5 doesn't alloc/free bio, instead it reuses bios. There are two issues in current code: 1. the code calls bio_init (from init_stripe->raid5_build_block->bio_init) then bio_reset (ops_run_io). The bio is reused, so likely there is integrity data attached. bio_init will clear a pointer to integrity data and makes bio_reset can't release the data 2. bio_reset is called before dispatching bio. After bio is finished, it's possible we don't free bio's integrity data (eg, we don't call bio_reset again) Both issues will cause memory leak. The patch moves bio_init to stripe creation and bio_reset to bio end io. This will fix the two issues. Reported-by: NYi Zhang <yizhan@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Song Liu 提交于
Currently, the code sets MD_JOURNAL_CLEAN when the array has MD_FEATURE_JOURNAL and the recovery_cp is MaxSector. The array will be MD_JOURNAL_CLEAN even if the journal device is missing. With this patch, the MD_JOURNAL_CLEAN is only set when the journal device presents. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 08 8月, 2016 1 次提交
-
-
由 Jens Axboe 提交于
Since commit 63a4cc24, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 06 8月, 2016 1 次提交
-
-
由 Alexey Obitotskiy 提交于
After array enters in faulty state (e.g. number of failed drives becomes more then accepted for raid5 level) it sets error flags (one of this flags is MD_CHANGE_PENDING). For internal metadata arrays MD_CHANGE_PENDING cleared into md_update_sb, but not for external metadata arrays. MD_CHANGE_PENDING flag set prevents to finish all new or non-finished IOs to array and hold them in pending state. In some cases this can leads to deadlock situation. For example, we have faulty array (2 of 4 drives failed) and udev handle array state changes and blkid started (or other userspace application that used array to read/write) but unable to finish reads due to IO hold. At the same time we unable to get exclusive access to array (to stop array in our case) because another external application still use this array. Fix makes possible to return IO with errors immediately. So external application can finish working with array and give exclusive access to other applications to perform required management actions with array. Signed-off-by: NAlexey Obitotskiy <aleksey.obitotskiy@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 02 8月, 2016 1 次提交
-
-
由 ZhengYuan Liu 提交于
The counter conf->empty_inactive_list_nr is only used for determine if the raid5 is congested which is deal with in function raid5_congested(). It was increased in get_free_stripe() when conf->inactive_list got to be empty and decreased in release_inactive_stripe_list() when splice temp_inactive_list to conf->inactive_list. However, this may have a problem when raid5_get_active_stripe or stripe_add_to_batch_list was called, because these two functions may call list_del_init(&sh->lru) to delete sh from "conf->inactive_list + hash" which may cause "conf->inactive_list + hash" to be empty when atomic_inc_not_zero(&sh->count) got false. So a check should be done at these two point and increase empty_inactive_list_nr accordingly. Otherwise the counter may get to be negative number which would influence async readahead from VFS. Signed-off-by: NZhengYuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 21 7月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
These two are confusing leftover of the old world order, combining values of the REQ_OP_ and REQ_ namespaces. For callers that don't special case we mostly just replace bi_rw with bio_data_dir or op_is_write, except for the few cases where a switch over the REQ_OP_ values makes more sense. Any check for READA is replaced with an explicit check for REQ_RAHEAD. Also remove the READA alias for REQ_RAHEAD. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NMike Christie <mchristi@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 6月, 2016 5 次提交
-
-
由 NeilBrown 提交于
Every time a device is removed with ->hot_remove_disk() a synchronize_rcu() call is made which can delay several milliseconds in some case. If lots of devices fail at once - as could happen with a large RAID10 where one set of devices are removed all at once - these delays can add up to be very inconcenient. As failure is not reversible we can check for that first, setting a separate flag if it is found, and then all synchronize_rcu() once for all the flagged devices. Then ->hot_remove_disk() function can skip the synchronize_rcu() step if the flag is set. fix build error(Shaohua) Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
It is important that we never increment rdev->nr_pending on a Faulty device as ->hot_remove_disk() assumes that once the Faulty flag is visible no code will take a new reference. Some places take a new reference after only check In_sync. This should be safe as the two are changed together. However to make the code more obviously safe, add checks for 'Faulty' as well. Note: the actual rule is: Never increment nr_pending if Faulty is set and Blocked is clear, never clear Faulty, and never set Blocked without holding a reference through nr_pending. fix build error (Shaohua) Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
Being in the middle of resync is no longer protection against failed rdevs disappearing. So add rcu protection. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
The rdev could be freed while handle_failed_sync is running, so rcu protection is needed. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 08 6月, 2016 3 次提交
-
-
由 Mike Christie 提交于
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Mike Christie 提交于
We don't need bi_rw to be so large on 64 bit archs, so reduce it to unsigned int. Signed-off-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Mike Christie 提交于
Separate the op from the rq_flag_bits and have md set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 26 5月, 2016 1 次提交
-
-
由 Song Liu 提交于
In current handle_stripe_dirtying, the code prefers rmw with PARITY_ENABLE_RMW; while prefers rcw with PARITY_PREFER_RMW. This patch reverses this behavior. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 10 5月, 2016 2 次提交
-
-
由 Guoqing Jiang 提交于
Some code waits for a metadata update by: 1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN) 2. setting MD_CHANGE_PENDING and waking the management thread 3. waiting for MD_CHANGE_PENDING to be cleared If the first two are done without locking, the code in md_update_sb() which checks if it needs to repeat might test if an update is needed before step 1, then clear MD_CHANGE_PENDING after step 2, resulting in the wait returning early. So make sure all places that set MD_CHANGE_PENDING are atomicial, and bit_clear_unless (suggested by Neil) is introduced for the purpose. Cc: Martin Kepplinger <martink@posteo.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: <linux-kernel@vger.kernel.org> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Heinz Mauelshagen 提交于
In case md runs underneath the dm-raid target, the mddev does not have a request queue or gendisk, thus avoid accesses. This patch adds a missing conditional to the raid5 personality. Signed-of-by: NHeinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 30 4月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
If device has R5_LOCKED set, it's legit device has R5_SkipCopy set and page != orig_page. After R5_LOCKED is clear, handle_stripe_clean_event will clear the SkipCopy flag and set page to orig_page. So the warning is unnecessary. Reported-by: NJoey Liao <joeyliao@qnap.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 18 3月, 2016 1 次提交
-
-
由 Anna-Maria Gleixner 提交于
The raid456_cpu_notify() hotplug callback lacks handling of the CPU_UP_CANCELED case. That means if CPU_UP_PREPARE fails, the scratch buffer is leaked. Add handling for CPU_UP_CANCELED[_FROZEN] hotplug notifier transitions to free the scratch buffer. CC: Shaohua Li <shli@kernel.org> CC: linux-raid@vger.kernel.org Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 10 3月, 2016 2 次提交
-
-
由 Shaohua Li 提交于
Neil recently fixed an obscure race in break_stripe_batch_list. Debug would be quite convenient if we know the stripe state. This is what this patch does. Signed-off-by: NShaohua Li <shli@fb.com>
-
由 NeilBrown 提交于
break_stripe_batch_list breaks up a batch and copies some flags from the batch head to the members, preserving others. It doesn't preserve or copy STRIPE_PREREAD_ACTIVE. This is not normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a stripe_head is added to a batch, and is not set on stripe_heads already in a batch. However there is no locking to ensure one thread doesn't set the flag after it has just been cleared in another. This does occasionally happen. md/raid5 maintains a count of the number of stripe_heads with STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes. When break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently this could becomes incorrect and will never again return to zero. md/raid5 delays the handling of some stripe_heads until preread_active_stripes becomes zero. So when the above mention race happens, those stripe_heads become blocked and never progress, resulting is write to the array handing. So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE in the members of a batch. URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741 URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153 URL: http://thread.gmane.org/5649C0E9.2030204@zoner.cz Reported-by: Martin Svec <martin.svec@zoner.cz> (and others) Tested-by: NTom Weber <linux@junkyard.4t2.com> Fixes: 1b956f7a ("md/raid5: be more selective about distributing flags across batch.") Cc: stable@vger.kernel.org (v4.1 and later) Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 27 2月, 2016 2 次提交
-
-
由 Shaohua Li 提交于
Revert commit e9e4c377(md/raid5: per hash value and exclusive wait_for_stripe) The problem is raid5_get_active_stripe waits on conf->wait_for_stripe[hash]. Assume hash is 0. My test release stripes in this order: - release all stripes with hash 0 - raid5_get_active_stripe still sleeps since active_stripes > max_nr_stripes * 3 / 4 - release all stripes with hash other than 0. active_stripes becomes 0 - raid5_get_active_stripe still sleeps, since nobody wakes up wait_for_stripe[0] The system live locks. The problem is active_stripes isn't a per-hash count. Revert the patch makes the live lock go away. Cc: stable@vger.kernel.org (v4.2+) Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Shaohua Li 提交于
check_reshape() is called from raid5d thread. raid5d thread shouldn't call mddev_suspend(), because mddev_suspend() waits for all IO finish but IO is handled in raid5d thread, we could easily deadlock here. This issue is introduced by 738a2738 ("md/raid5: fix allocation of 'scribble' array.") Cc: stable@vger.kernel.org (v4.1+) Reported-and-tested-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 26 2月, 2016 1 次提交
-
-
由 Jes Sorensen 提交于
'max_discard_sectors' is in sectors, while 'stripe' is in bytes. This fixes the problem where DISCARD would get disabled on some larger RAID5 configurations (6 or more drives in my testing), while it worked as expected with smaller configurations. Fixes: 620125f2 ("MD: raid5 trim support") Cc: stable@vger.kernel.org v3.7+ Signed-off-by: NJes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 21 1月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
These short function names are hard to search. Rename them to make vim happy. Signed-off-by: NShaohua Li <shli@fb.com>
-
- 06 1月, 2016 2 次提交
-
-
由 Shaohua Li 提交于
Add support for journal disk hot add/remove. Mostly trival checks in md part. The raid5 part is a little tricky. For hot-remove, we can't wait pending write as it's called from raid5d. The wait will cause deadlock. We simplily fail the hot-remove. A hot-remove retry can success eventually since if journal disk is faulty all pending write will be failed and finish. For hot-add, since an array supporting journal but without journal disk will be marked read-only, we are safe to hot add journal without stopping IO (should be read IO, while journal only handles write IO). Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Roman Gushchin 提交于
The stripe_add_to_batch_list() function is called only if stripe_can_batch() returned true, so there is no need for double check. Signed-off-by: NRoman Gushchin <klamm@yandex-team.ru> Cc: Neil Brown <neilb@suse.com> Cc: linux-raid@vger.kernel.org Signed-off-by: NNeilBrown <neilb@suse.com>
-
- 01 11月, 2015 6 次提交
-
-
由 Shaohua Li 提交于
Set journal disk ->raid_disk to >=0, I choose raid_disks + 1 instead of 0, because we already have a disk with ->raid_disk 0 and this causes sysfs entry creation conflict. A lot of places assumes disk with ->raid_disk >=0 is normal raid disk, so we add check for journal disk. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
If raid array is expected to have journal (eg, journal is set in MD superblock feature map) and the array is started without journal disk, start the array readonly. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
There are 3 places the raid5-cache dispatches IO. The discard IO error doesn't matter, so we ignore it. The superblock write IO error can be handled in MD core. The remaining are log write and flush. When the IO error happens, we mark log disk faulty and fail all write IO. Read IO is still allowed to run. Userspace will get a notification too and corresponding daemon can choose setting raid array readonly for example. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
raid5-cache uses journal disk rdev->bdev, rdev->mddev in several places. Don't allow journal disk disappear magically. On the other hand, we do need to update superblock for other disks to bump up ->events, so next time journal disk will be identified as stale. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
Move reclaim stop to quiesce handling, where is safer for this stuff. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
With log enabled, bio is written to raid disks after the bio is settled down in log disk. The recovery guarantees we can recovery the bio data from log disk, so we we skip FLUSH IO. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-