- 18 Mar 2016, 3 commits
-
-
Committed by Anna-Maria Gleixner

The raid456_cpu_notify() hotplug callback lacks handling of the CPU_UP_CANCELED case. That means that if CPU_UP_PREPARE fails, the scratch buffer is leaked. Add handling for the CPU_UP_CANCELED[_FROZEN] hotplug notifier transitions to free the scratch buffer.

CC: Shaohua Li <shli@kernel.org>
CC: linux-raid@vger.kernel.org
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Shaohua Li <shli@fb.com>
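As a rough sketch of the fix described above (the function and helper names are taken from the raid5.c of that era; details may differ from the actual patch), the notifier ends up shaped like this:

    static int raid456_cpu_notify(struct notifier_block *nfb,
                                  unsigned long action, void *hcpu)
    {
            struct r5conf *conf = container_of(nfb, struct r5conf, cpu_notify);
            long cpu = (long)hcpu;
            struct raid5_percpu *percpu = per_cpu_ptr(conf->percpu, cpu);

            switch (action) {
            case CPU_UP_PREPARE:
            case CPU_UP_PREPARE_FROZEN:
                    if (alloc_scratch_buffer(conf, percpu))
                            return notifier_from_errno(-ENOMEM);
                    break;
            case CPU_UP_CANCELED:           /* new: UP_PREPARE failed, */
            case CPU_UP_CANCELED_FROZEN:    /* so free the scratch buffer */
            case CPU_DEAD:
            case CPU_DEAD_FROZEN:
                    free_scratch_buffer(conf, percpu);
                    break;
            default:
                    break;
            }
            return NOTIFY_OK;
    }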
-
Committed by Shaohua Li

This is the raid10 counterpart of the bug fixed by Nate ("raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang").

Fixes: 95af587e ("md/raid10: ensure device failure recorded before write request returns")
Cc: stable@vger.kernel.org (v4.3+)
Cc: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Nate Dailey

If raid1d is handling a mix of read and write errors, handle_read_error's call to freeze_array can get stuck. This can happen because, though the bio_end_io_list is initially drained, writes can be added to it via handle_write_finished as the retry_list is processed. These writes contribute to nr_pending but are not included in nr_queued.

If a later entry on the retry_list triggers a call to handle_read_error, freeze_array hangs waiting for nr_pending == nr_queued + extra. The writes on the bio_end_io_list aren't included in nr_queued, so the condition is never satisfied.

To prevent the hang, include bio_end_io_list writes in nr_queued. There's probably a better way to handle decrementing nr_queued, but this seemed like the safest way to avoid breaking surrounding code. I'm happy to supply the script I used to reproduce this hang.

Fixes: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
Cc: stable@vger.kernel.org (v4.3+)
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Shaohua Li <shli@fb.com>
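A minimal sketch of the accounting change (field names come from the commit message; the actual patch may differ): when handle_write_finished parks a failed write on bio_end_io_list, bump nr_queued just as retry_list additions do, so freeze_array's wait condition can be met.

    spin_lock_irqsave(&conf->device_lock, flags);
    list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
    conf->nr_queued++;              /* previously missing for this list */
    spin_unlock_irqrestore(&conf->device_lock, flags);
    /* let freeze_array() re-check nr_pending == nr_queued + extra */
    wake_up(&conf->wait_barrier);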
-
- 15 Mar 2016, 4 commits
-
-
Committed by Guoqing Jiang

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Guoqing Jiang

The "return 0" is not needed, since bitmap_checkpage will ultimately return 0 for that case anyway.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Guoqing Jiang

Since bitmap_start_sync does not return until sync_blocks is at least PAGE_SIZE>>9, the BUG_ON is no longer needed.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Ming Lei

Inside multipath_make_request(), multipath maps the incoming bio onto a low-level device's bio, but it is wrong to copy the bio into the mapped bio via '*mapped_bio = *bio'. For example, .__bi_remaining is carried over into the copy, so if the incoming bio was chained to via bio splitting, .bi_end_io can't be called for the mapped bio at all in the completion path. This patch fixes the issue by cloning the bio instead.

Cc: stable@vger.kernel.org (v3.14+)
Reported-and-tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Shaohua Li <shli@fb.com>
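A before/after sketch of the mapping (bio_clone_mddev() was the md-layer clone helper of that era, and multipath_end_request the multipath completion handler; the exact fix may differ):

    /* before (buggy): a plain struct copy carries internal state such as
     * __bi_remaining into the mapped bio, so completions are miscounted
     * when the original bio was split/chained */
    *mapped_bio = *bio;

    /* after: clone, so the low-level bio starts with fresh completion state */
    mapped_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
    mapped_bio->bi_private = mp_bh;
    mapped_bio->bi_end_io = multipath_end_request;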
-
- 10 Mar 2016, 2 commits
-
-
Committed by Shaohua Li

Neil recently fixed an obscure race in break_stripe_batch_list. Debugging would be much more convenient if we knew the stripe state, which is what this patch records.

Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by NeilBrown

break_stripe_batch_list breaks up a batch and copies some flags from the batch head to the members, preserving others. It doesn't preserve or copy STRIPE_PREREAD_ACTIVE. This is not normally a problem, as STRIPE_PREREAD_ACTIVE is cleared when a stripe_head is added to a batch and is not set on stripe_heads already in a batch.

However, there is no locking to ensure one thread doesn't set the flag just after it has been cleared in another. This does occasionally happen.

md/raid5 maintains a count of the number of stripe_heads with STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes. When break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently, this count becomes incorrect and never again returns to zero.

md/raid5 delays the handling of some stripe_heads until preread_active_stripes becomes zero. So when the above-mentioned race happens, those stripe_heads become blocked and never progress, resulting in writes to the array hanging.

So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE in the members of a batch.

URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
URL: http://thread.gmane.org/5649C0E9.2030204@zoner.cz
Reported-by: Martin Svec <martin.svec@zoner.cz> (and others)
Tested-by: Tom Weber <linux@junkyard.4t2.com>
Fixes: 1b956f7a ("md/raid5: be more selective about distributing flags across batch.")
Cc: stable@vger.kernel.org (v4.1 and later)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
- 08 Mar 2016, 1 commit
-
-
Committed by Eric Engestrom

daemon_sleep is unsigned, so testing whether it is 0 or less than 1 amounts to the same thing.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Shaohua Li <shli@fb.com>
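In other words (illustrative only; the 5*HZ default here is a made-up stand-in, not the value in bitmap.c):

    unsigned long daemon_sleep = 0;

    /* for an unsigned type these two tests are equivalent */
    if (daemon_sleep < 1)           /* before */
            daemon_sleep = 5 * HZ;
    if (daemon_sleep == 0)          /* after: same test, clearer intent */
            daemon_sleep = 5 * HZ;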
-
- 27 Feb 2016, 4 commits
-
-
Committed by Shaohua Li

The personality thread shouldn't call mddev_suspend(): mddev_suspend() waits for all IO to finish, but IO is handled in the personality thread, so this can deadlock. To catch this early, add a warning if mddev_suspend() is called from the personality thread.

Suggested-by: NeilBrown <neilb@suse.com>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Sebastian Parschauer

When an MD device is stopped, its device node /dev/mdX may still exist afterwards, or it may be recreated by udev. The next open() call can then lead to the creation of an inoperable MD device. The reason for this is that a change event (KOBJ_CHANGE) is sent to udev, which races against the remove event (KOBJ_REMOVE) from md_free(). So drop sending the change event.

A change is likely also required in mdadm, as many versions send the change event to udev as well.

Neil mentioned that the change event was a workaround for old kernels (commit 934d9c23, "md: destroy partitions and notify udev when md array is stopped."); new mdadm can handle device removal now, so this is no longer required.

Cc: NeilBrown <neilb@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Shaohua Li

Revert commit e9e4c377 ("md/raid5: per hash value and exclusive wait_for_stripe").

The problem is that raid5_get_active_stripe waits on conf->wait_for_stripe[hash]. Assume hash is 0. My test releases stripes in this order:

- release all stripes with hash 0
- raid5_get_active_stripe still sleeps, since active_stripes > max_nr_stripes * 3 / 4
- release all stripes with hash other than 0; active_stripes becomes 0
- raid5_get_active_stripe still sleeps, since nobody wakes up wait_for_stripe[0]

The system livelocks. The root cause is that active_stripes isn't a per-hash count. Reverting the patch makes the livelock go away.

Cc: stable@vger.kernel.org (v4.2+)
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Shaohua Li

check_reshape() is called from the raid5d thread. The raid5d thread shouldn't call mddev_suspend(), because mddev_suspend() waits for all IO to finish while IO is handled in the raid5d thread; we could easily deadlock here. This issue was introduced by 738a2738 ("md/raid5: fix allocation of 'scribble' array.").

Cc: stable@vger.kernel.org (v4.1+)
Reported-and-tested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
- 26 Feb 2016, 1 commit
-
-
Committed by Jes Sorensen

'max_discard_sectors' is in sectors, while 'stripe' is in bytes. This fixes the problem where DISCARD would get disabled on some larger RAID5 configurations (6 or more drives in my testing), while it worked as expected with smaller configurations.

Fixes: 620125f2 ("MD: raid5 trim support")
Cc: stable@vger.kernel.org v3.7+
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
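A sketch of the unit mismatch (names from the commit message; the discard_supported flag is an illustrative stand-in, and 'stripe' is assumed to hold the stripe size in bytes):

    /* before: bytes compared against a limit expressed in 512-byte
     * sectors -- wrongly disables DISCARD once stripe (bytes) exceeds
     * the sector limit on larger arrays */
    if (mddev->queue->limits.max_discard_sectors < stripe)
            discard_supported = false;

    /* after: convert bytes to sectors before comparing */
    if (mddev->queue->limits.max_discard_sectors < (stripe >> 9))
            discard_supported = false;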
-
- 22 Feb 2016, 1 commit
-
-
Committed by Mike Snitzer

Using request-based DM mpath configured with the following stacking (.request_fn DM mpath on top of scsi-mq paths):

echo Y > /sys/module/scsi_mod/parameters/use_blk_mq
echo N > /sys/module/dm_mod/parameters/use_blk_mq

'struct dm_rq_target_io' would leak if a request is requeued before a blk-mq clone is allocated (or fails to allocate). free_rq_tio() wasn't being called. kmemleak reported:

unreferenced object 0xffff8800b90b98c0 (size 112):
  comm "kworker/7:1H", pid 5692, jiffies 4295056109 (age 78.589s)
  hex dump (first 32 bytes):
    00 d0 5c 2c 03 88 ff ff 40 00 bf 01 00 c9 ff ff  ..\,....@.......
    e0 d9 b1 34 00 88 ff ff 00 00 00 00 00 00 00 00  ...4............
  backtrace:
    [<ffffffff81672b6e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811dbb63>] kmem_cache_alloc+0xc3/0x1e0
    [<ffffffff8117eae5>] mempool_alloc_slab+0x15/0x20
    [<ffffffff8117ec1e>] mempool_alloc+0x6e/0x170
    [<ffffffffa00029ac>] dm_old_prep_fn+0x3c/0x180 [dm_mod]
    [<ffffffff812fbd78>] blk_peek_request+0x168/0x290
    [<ffffffffa0003e62>] dm_request_fn+0xb2/0x1b0 [dm_mod]
    [<ffffffff812f66e3>] __blk_run_queue+0x33/0x40
    [<ffffffff812f9585>] blk_delay_work+0x25/0x40
    [<ffffffff81096fff>] process_one_work+0x14f/0x3d0
    [<ffffffff81097715>] worker_thread+0x125/0x4b0
    [<ffffffff8109ce88>] kthread+0xd8/0xf0
    [<ffffffff8167cb8f>] ret_from_fork+0x3f/0x70
    [<ffffffffffffffff>] 0xffffffffffffffff

crash> struct -o dm_rq_target_io
struct dm_rq_target_io {
    ...
}
SIZE: 112

Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 25 Jan 2016, 2 commits
-
-
Committed by Shaohua Li

page->index already accounts for the node offset, so the node_offset calculation in write_sb_page is useless and confusing.

Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: NeilBrown <neilb@suse.com>
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Shaohua Li

There are several places where we allocate a dlm_lock_resource but never free it. leave() needs to free a lock resource too (from Guoqing).

Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Guoqing Jiang <gqjiang@suse.com>
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
- 21 Jan 2016, 1 commit
-
-
Committed by Shaohua Li

These short function names are hard to search for. Rename them to make vim happy.

Signed-off-by: Shaohua Li <shli@fb.com>
-
- 14 Jan 2016, 4 commits
-
-
Committed by Dan Williams

It is not safe for an integrity profile to be changed while I/O is in flight in the queue. Prevent adding new disks or otherwise onlining spares to an array if the device has an incompatible integrity profile.

The original change to the blk_integrity_unregister implementation in md, commit c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister"), introduced an immediate hang regression.

This policy of disallowing changes to the integrity profile once one has been established is shared with DM.

Here is an abbreviated log from a test run that:
1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [ 59.076127]
2/ Tries to add an integrity-disabled device (pmem1m) [ 90.489209]
3/ Retries with an integrity-enabled device (pmem1s) [ 205.671277]

[ 59.076127] md/raid1:md0: active with 1 out of 2 mirrors
[ 59.078302] md: data integrity enabled on md0
[..]
[ 90.489209] md0: incompatible integrity profile for pmem1m
[..]
[ 205.671277] md: super_written gets error=-5
[ 205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device.
[ 205.677386] md/raid1:md0: Operation continuing on 1 devices.
[ 205.683037] RAID1 conf printout:
[ 205.684699] --- wd:1 rd:2
[ 205.685972] disk 0, wo:0, o:1, dev:pmem0s
[ 205.687562] disk 1, wo:1, o:1, dev:pmem1s
[ 205.691717] md: recovery of RAID array md0

Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister")
Cc: <stable@vger.kernel.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Reported-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Shaohua Li

Handle journal hot-add in quiesce to avoid creating duplicated threads.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Shaohua Li

Hot-adding a journal disk in recovery thread context brings a lot of trouble, as IO could be running. Unlike spare disk hot-add, adding a journal disk with the array suspended makes more sense, and the implementation is much easier.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Shaohua Li

Set MD_HAS_JOURNAL when an array is loaded or a journal is initialized. This avoids the flag being set too early during journal disk hot-add.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
- 10 Jan 2016, 2 commits
-
-
Committed by Dan Williams

For symmetry with badblocks_init(), make it clear that this path only destroys the incremental allocations of a badblocks instance and does not free the badblocks instance itself.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Vishal Verma

Retain badblocks as part of rdev, but use the accessor functions from include/linux/badblocks for all manipulation.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 09 Jan 2016, 1 commit
-
-
Committed by Mikulas Patocka

When there is an error copying a chunk, dm-snapshot can incorrectly hold the associated bios indefinitely, resulting in hung IO.

The function copy_callback sets pe->error if there was an error copying the chunk, and then calls complete_exception. complete_exception calls pending_complete on error; otherwise it calls commit_exception with commit_callback (and commit_callback calls pending_complete).

The persistent exception store (dm-snap-persistent.c) assumes that calls to prepare_exception and commit_exception are paired: persistent_prepare_exception increases ps->pending_count and persistent_commit_exception decreases it. If there is a copy error, persistent_prepare_exception is called but persistent_commit_exception is not. As a result, ps->pending_count never returns to zero, which causes some pending exceptions (and their associated bios) to be held forever.

Fix this by unconditionally calling commit_exception regardless of whether the copy was successful. A new "valid" parameter is added to commit_exception; when the copy fails this parameter is set to zero so that the chunk that failed to copy (and all following chunks) is not recorded in the snapshot store. Also, remove commit_callback now that it is merely a wrapper around pending_complete.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
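A sketch of the reworked completion path based on the description above (pe->copy_error stands in for the error flag set by copy_callback; the exact signatures in dm-snap.c may differ):

    static void complete_exception(struct dm_snap_pending_exception *pe)
    {
            struct dm_snapshot *s = pe->snap;

            /* always pair prepare_exception with commit_exception;
             * valid == 0 on a copy error keeps the failed chunk (and
             * everything after it) out of the snapshot store */
            s->store->type->commit_exception(s->store, &pe->e,
                                             !pe->copy_error,
                                             pending_complete, pe);
    }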
-
- 07 Jan 2016, 3 commits
-
-
Committed by Mike Snitzer

Commit 3d5f6733 ("dm thin metadata: speed up discard of partially mapped volumes"), or some other dm-thinp change during the Linux 4.5 development window, really should have bumped these target versions.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by NeilBrown

This field is always set in tandem with ->pers, and whenever it is tested ->pers is also tested, so ->ready is not needed. It was needed once, but code rearrangement and locking changes have removed that need.

Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang

md_new_event stopped calling sysfs_notify as of commit 72a23c21 ("Make sure all changes to md/sync_action are notified."), so we can use md_new_event and delete md_new_event_inintr.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
- 06 Jan 2016, 11 commits
-
-
Committed by Christoph Hellwig

And propagate the error up the stack so we can add the stripe to no_stripes_list and retry our log operation later. This avoids blocking raid5d due to reclaim, and it allows us to get rid of the deadlock-prone GFP_NOFAIL allocation.

[shli: add missing mempool_destroy()]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Christoph Hellwig

We only have a limited number in flight, so use a page-based mempool.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
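A sketch of the idea (mempool_create_page_pool() is the stock kernel helper; the pool size shown is an assumption for illustration):

    #define R5L_POOL_SIZE 4     /* assumed bound on in-flight meta pages */

    log->meta_pool = mempool_create_page_pool(R5L_POOL_SIZE, 0);
    if (!log->meta_pool)
            return -ENOMEM;

    /* allocations may sleep but are guaranteed to make progress, and
     * pages return to the pool on I/O completion */
    struct page *page = mempool_alloc(log->meta_pool, GFP_NOIO);
    mempool_free(page, log->meta_pool);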
-
Committed by Christoph Hellwig

This allows us to make guaranteed forward progress.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Shaohua Li

Add support for journal disk hot add/remove. This is mostly trivial checks in the md part.

The raid5 part is a little tricky. For hot-remove, we can't wait for pending writes, as this is called from raid5d and the wait would deadlock. We simply fail the hot-remove; a hot-remove retry can eventually succeed, since if the journal disk is faulty, all pending writes will fail and finish. For hot-add, since an array that supports a journal but lacks a journal disk is marked read-only, it is safe to hot-add a journal without stopping IO (which should be read IO, as the journal only handles write IO).

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Deepa Dinamani

The get_seconds() API is not y2038-safe on 32-bit systems and is deprecated. Replace it with calls to the ktime_get_real_seconds() API instead, and change the mddev structure types to time64_t accordingly. 32-bit signed timestamps will overflow in the year 2038.

Change the user-interface mdu_array_info_s structure timestamps, ctime and utime, used in the GET_ARRAY_INFO and SET_ARRAY_INFO ioctls, to unsigned int. This extends the fields to last until the year 2106. The long-term plan is to get rid of the ctime and utime values in this structure, as this information can be read from the on-disk metadata directly.

Clamp the time64_t timestamps to positive values with a max of U32_MAX when returning from the GET_ARRAY_INFO ioctl, to accommodate the above change of the timestamp data type to unsigned int.

v0.90 on-disk metadata uses u32 for maintaining timestamps, so this will also last until the year 2106. The assumption is that usage of v0.90 will be deprecated by 2106. The timestamp fields in the on-disk metadata for version 1.0 already use 64-bit data types; remove the truncation of the bits when writing them to or reading them from disk.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NeilBrown <neilb@suse.com>
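A sketch of the clamp on the ioctl path (clamp_t() and U32_MAX are stock kernel macros; the exact field handling follows the description above and may differ from the patch):

    mdu_array_info_t info;

    /* internal timestamps are time64_t; the ABI fields are 32-bit
     * unsigned, good until the year 2106 */
    info.ctime = clamp_t(time64_t, mddev->ctime, 0, U32_MAX);
    info.utime = clamp_t(time64_t, mddev->utime, 0, U32_MAX);

    /* and the deprecated call is replaced internally: */
    mddev->utime = ktime_get_real_seconds();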
-
Committed by Arnd Bergmann

When CONFIG_LBDAF is not set, sector_t is only 32 bits wide, which means we cannot have devices with more than 2TB, and the code that tries to handle compatibility support for large devices in md version 0.90 is meaningless. It also causes a compile-time warning:

drivers/md/md.c: In function 'super_90_load':
drivers/md/md.c:1029:19: warning: large integer implicitly truncated to unsigned type [-Woverflow]
drivers/md/md.c: In function 'super_90_rdev_size_change':
drivers/md/md.c:1323:17: warning: large integer implicitly truncated to unsigned type [-Woverflow]

This adds a check for CONFIG_LBDAF to avoid even getting into this code path, and also adds an explicit cast to let the compiler know it doesn't have to warn about the truncation.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NeilBrown <neilb@suse.com>
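A sketch of the shape of the fix in super_90_load() (an assumption based on the description; the actual commit may differ):

    /* v0.90 metadata cannot describe components at or beyond 2^32
     * sectors; without CONFIG_LBDAF such devices cannot exist at all,
     * so skip the workaround entirely, and cast to silence -Woverflow */
    if (IS_ENABLED(CONFIG_LBDAF) && (u64)rdev->sectors >= (2ULL << 32) &&
        sb->level >= 1)
            rdev->sectors = (sector_t)(2ULL << 32) - 2;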
-
Committed by Christoph Hellwig

Once the I/O has completed we don't need the meta page anymore. As the io_units can live on for a long time, this reduces memory pressure a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Christoph Hellwig

It's only used for one kind of move, so make that explicit. Also clean up the code a bit by using list_for_each_safe.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang

MD_CHANGE_CLEAN was replaced with MD_CHANGE_PENDING in commit 070dc6 ("md: resolve confusion of MD_CHANGE_CLEAN"), so make the change accordingly.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang

1. Fix unbalanced parentheses.
2. Add more description of how MD_CLUSTER_SEND_LOCKED_ALREADY is cleared after being set in add_new_disk.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang

Communication can happen through multiple threads. It is possible that one thread steps over another thread's sequence, so we use mutexes to protect both the send and receive sequences.

Send communication is locked through a state bit, MD_CLUSTER_SEND_LOCK. Communication is locked with bit manipulation in order to allow "lock and hold" for the add operation. In the case of an add operation, if the lock is held, MD_CLUSTER_SEND_LOCKED_ALREADY is set.

When md_update_sb() calls metadata_update_start(), it checks (in a single statement, to avoid races) whether the communication is already locked. If yes, it merely returns zero; otherwise it locks the token lock resource.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
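A sketch of the "lock and hold" pattern (the bit and field names come from the commit message; lock_token()/unlock_token() and the DLM details are simplified assumptions):

    static int lock_comm(struct md_cluster_info *cinfo)
    {
            /* take the send lock via an atomic bit, sleeping until free */
            wait_event(cinfo->wait,
                       !test_and_set_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state));
            return lock_token(cinfo);   /* EX lock on the token lockres */
    }

    static void unlock_comm(struct md_cluster_info *cinfo)
    {
            unlock_token(cinfo);
            clear_bit_unlock(MD_CLUSTER_SEND_LOCK, &cinfo->state);
            wake_up(&cinfo->wait);
    }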
-