- 14 1月, 2016 4 次提交
-
-
由 Dan Williams 提交于
It is not safe for an integrity profile to be changed while i/o is in-flight in the queue. Prevent adding new disks or otherwise online spares to an array if the device has an incompatible integrity profile. The original change to the blk_integrity_unregister implementation in md, commmit c7bfced9 "md: suspend i/o during runtime blk_integrity_unregister" introduced an immediate hang regression. This policy of disallowing changes the integrity profile once one has been established is shared with DM. Here is an abbreviated log from a test run that: 1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [ 59.076127] 2/ Tries to add an integrity-disabled device (pmem1m) [ 90.489209] 3/ Retries with an integrity-enabled device (pmem1s) [ 205.671277] [ 59.076127] md/raid1:md0: active with 1 out of 2 mirrors [ 59.078302] md: data integrity enabled on md0 [..] [ 90.489209] md0: incompatible integrity profile for pmem1m [..] [ 205.671277] md: super_written gets error=-5 [ 205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device. [ 205.677386] md/raid1:md0: Operation continuing on 1 devices. [ 205.683037] RAID1 conf printout: [ 205.684699] --- wd:1 rd:2 [ 205.685972] disk 0, wo:0, o:1, dev:pmem0s [ 205.687562] disk 1, wo:1, o:1, dev:pmem1s [ 205.691717] md: recovery of RAID array md0 Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister") Cc: <stable@vger.kernel.org> Cc: Mike Snitzer <snitzer@redhat.com> Reported-by: NNeilBrown <neilb@suse.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
Handle journal hotadd in quiesce to avoid creating duplicated threads. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
Hot add journal disk in recovery thread context brings a lot of trouble as IO could be running. Unlike spare disk hot add, adding journal disk with array suspended makes more sense and implmentation is much easier. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
Set MD_HAS_JOURNAL when a array is loaded or journal is initialized. This is to avoid the flags set too early in journal disk hotadd. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
- 07 1月, 2016 2 次提交
-
-
由 NeilBrown 提交于
This field is always set in tandem with ->pers, and when it is tested ->pers is also tested. So ->ready is not needed. It was needed once, but code rearrangement and locking changes have removed that needed. Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Guoqing Jiang 提交于
md_new_event had removed sysfs_notify since 'commit 72a23c21 ("Make sure all changes to md/sync_action are notified.")', so we can use md_new_event and delete md_new_event_inintr. Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
- 06 1月, 2016 18 次提交
-
-
由 Christoph Hellwig 提交于
And propagate the error up the stack so we can add the stripe to no_stripes_list and retry our log operation later. This avoids blocking raid5d due to reclaim, an it allows to get rid of the deadlock-prone GFP_NOFAIL allocation. shli: add missing mempool_destroy() Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Christoph Hellwig 提交于
We only have a limited number in flight, so use a page based mempool. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Christoph Hellwig 提交于
This allows us to make guaranteed forward progress. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
Add support for journal disk hot add/remove. Mostly trival checks in md part. The raid5 part is a little tricky. For hot-remove, we can't wait pending write as it's called from raid5d. The wait will cause deadlock. We simplily fail the hot-remove. A hot-remove retry can success eventually since if journal disk is faulty all pending write will be failed and finish. For hot-add, since an array supporting journal but without journal disk will be marked read-only, we are safe to hot add journal without stopping IO (should be read IO, while journal only handles write IO). Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Deepa Dinamani 提交于
get_seconds() API is not y2038 safe on 32 bit systems and the API is deprecated. Replace it with calls to ktime_get_real_seconds() API instead. Change mddev structure types to time64_t accordingly. 32 bit signed timestamps will overflow in the year 2038. Change the user interface mdu_array_info_s structure timestamps: ctime and utime values used in ioctls GET_ARRAY_INFO and SET_ARRAY_INFO to unsigned int. This will extend the field to last until the year 2106. The long term plan is to get rid of ctime and utime values in this structure as this information can be read from the on-disk meta data directly. Clamp the tim64_t timestamps to positive values with a max of U32_MAX when returning from GET_ARRAY_INFO ioctl to accommodate above changes in the data type of timestamps to unsigned int. v0.90 on disk meta data uses u32 for maintaining time stamps. So this will also last until year 2106. Assumption is that the usage of v0.90 will be deprecated by year 2106. Timestamp fields in the on disk meta data for v1.0 version already use 64 bit data types. Remove the truncation of the bits while writing to or reading from these from the disk. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Arnd Bergmann 提交于
When CONFIG_LBDAF is not set, sector_t is only 32-bits wide, which means we cannot have devices with more than 2TB, and the code that is trying to handle compatibility support for large devices in md version 0.90 is meaningless but also causes a compile-time warning: drivers/md/md.c: In function 'super_90_load': drivers/md/md.c:1029:19: warning: large integer implicitly truncated to unsigned type [-Woverflow] drivers/md/md.c: In function 'super_90_rdev_size_change': drivers/md/md.c:1323:17: warning: large integer implicitly truncated to unsigned type [-Woverflow] This adds a check for CONFIG_LBDAF to avoid even getting into this code path, and also adds an explicit cast to let the compiler know it doesn't have to warn about the truncation. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Christoph Hellwig 提交于
Once the I/O completed we don't need the meta page anymore. As the iounits can live on for a long time this reduces memory pressure a bit. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Christoph Hellwig 提交于
It's only used for one kind of move, so make that explicit. Also clean up the code a bit by using list_for_each_safe. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Guoqing Jiang 提交于
MD_CHANGE_CLEAN had been replaced with MD_CHANGE_PENDING after commit 070dc6 ("md: resolve confusion of MD_CHANGE_CLEAN"), so make the change accordingly. Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Guoqing Jiang 提交于
1. fix unbalanced parentheses. 2. add more description about that MD_CLUSTER_SEND_LOCKED_ALREADY will be cleared after set it in add_new_disk. Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Guoqing Jiang 提交于
Communication can happen through multiple threads. It is possible that one thread steps over another threads sequence. So, we use mutexes to protect both the send and receive sequences. Send communication is locked through state bit, MD_CLUSTER_SEND_LOCK. Communication is locked with bit manipulation in order to allow "lock and hold" for the add operation. In case of an add operation, if the lock is held, MD_CLUSTER_SEND_LOCKED_ALREADY is set. When md_update_sb() calls metadata_update_start(), it checks (in a single statement to avoid races), if the communication is already locked. If yes, it merely returns zero, else it locks the token lockresource. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Guoqing Jiang 提交于
Reloading of superblock must be performed under reconfig_mutex. However, this cannot be done with md_reload_sb because it would deadlock with the message DLM lock. So, we defer it in md_check_recovery() which is executed by mddev->thread. This introduces a new flag, MD_RELOAD_SB, which if set, will reload the superblock. And good_device_nr is also added to 'struct mddev' which is used to get the num of the good device within cluster raid. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Guoqing Jiang 提交于
For clustered raid, we need to do extra actions when change bitmap to none. 1. check if all the bitmap lock could be get or not, if yes then we can continue the change since cluster raid is only active in current node. Otherwise return fail and unlock the related bitmap locks 2. set nodes to 0 and then leave cluster environment. 3. release other nodes's bitmap lock. Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Goldwyn Rodrigues 提交于
If a spare device was marked faulty, it would not be reflected in receiving nodes because it would mark it as activated and continue. Continue the operation, so it may be set as faulty. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Goldwyn Rodrigues 提交于
The remove disk message does not need metadata_update_start(), but can be an independent message. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Guoqing Jiang 提交于
For cluster raid, if one disk couldn't be reach in one node, then other nodes would receive the REMOVE message for the disk. In receiving node, we can't call md_kick_rdev_from_array to remove the disk from array synchronously since the disk might still be busy in this node. So let's set a ClusterRemove flag on the disk, then let the thread to do the removal job eventually. Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Goldwyn Rodrigues 提交于
If a RESYNCING message with (0,0) has been sent before, do not send it again. This avoids a resync ping pong between the nodes. We read the bitmap lockresource's LVB to figure out the previous value of the RESYNCING message. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Roman Gushchin 提交于
The stripe_add_to_batch_list() function is called only if stripe_can_batch() returned true, so there is no need for double check. Signed-off-by: NRoman Gushchin <klamm@yandex-team.ru> Cc: Neil Brown <neilb@suse.com> Cc: linux-raid@vger.kernel.org Signed-off-by: NNeilBrown <neilb@suse.com>
-
- 21 12月, 2015 1 次提交
-
-
由 NeilBrown 提交于
md currently doesn't allow a 'sync_action' such as 'reshape' to be set while MD_RECOVERY_NEEDED is set. This s a problem, particularly since commit 738a2738 as that can cause ->check_shape to call mddev_resume() which sets MD_RECOVERY_NEEDED. So by the time we come to start 'reshape' it is very likely that MD_RECOVERY_NEEDED is still set. Testing for this flag is not really needed and is in any case very racy as it can be set at any moment - asynchronously. Any race between setting a sync_action and setting MD_RECOVERY_NEEDED must already be handled properly in some locked code, probably md_check_recovery(), so remove the test here. The test on MD_RECOVERY_RUNNING is also racy in the 'reshape' case so we should test it again after getting mddev_lock(). As this fixes a race and a regression which can cause 'reshape' to fail, it is suitable for -stable kernels since 4.1 Reported-by: NXiao Ni <xni@redhat.com> Fixes: 738a2738 ("md/raid5: fix allocation of 'scribble' array.") Cc: stable@vger.kernel.org (v4.1+) Signed-off-by: NNeilBrown <neilb@suse.com>
-
- 18 12月, 2015 4 次提交
-
-
由 Goldwyn Rodrigues 提交于
Commit 2910ff17 introduced a regression which would remove a recently added spare via slot_store. Revert part of the patch which touches slot_store() and add the disk directly using pers->hot_add_disk() Fixes: 2910ff17 ("md: remove_and_add_spares() to activate specific rdev") Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NPawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Mikulas Patocka 提交于
The patch c7bfced9 committed to 4.4-rc causes crash in LVM test shell/lvchange-raid.sh. The kernel crashes with this BUG, the reason is that we attempt to suspend a device that is already suspended. See also https://bugzilla.redhat.com/show_bug.cgi?id=1283491 This patch fixes the bug by changing functions mddev_suspend and mddev_resume to always nest. The number of nested calls to mddev_nested_suspend is kept in the variable mddev->suspended. [neilb: made mddev_suspend() always nest instead of introduce mddev_nested_suspend] kernel BUG at drivers/md/md.c:317! CPU: 3 PID: 32754 Comm: lvm Not tainted 4.4.0-rc2 #1 task: 0000000047076040 ti: 0000000047014000 task.ti: 0000000047014000 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001000000000000001111 Not tainted r00-03 000000000804000f 00000000102c5280 0000000010c7522c 000000007e3d1810 r04-07 0000000010c6f000 000000004ef37f20 000000007e3d1dd0 000000007e3d1810 r08-11 000000007c9f1600 0000000000000000 0000000000000001 ffffffffffffffff r12-15 0000000010c1d000 0000000000000041 00000000f98d63c8 00000000f98e49e4 r16-19 00000000f98e49e4 00000000c138fd06 00000000f98d63c8 0000000000000001 r20-23 0000000000000002 000000004ef37f00 00000000000000b0 00000000000001d1 r24-27 00000000424783a0 000000007e3d1dd0 000000007e3d1810 00000000102b2000 r28-31 0000000000000001 0000000047014840 0000000047014930 0000000000000001 sr00-03 0000000007040800 0000000000000000 0000000000000000 0000000007040800 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000102c538c 00000000102c5390 IIR: 03ffe01f ISR: 0000000000000000 IOR: 00000000102b2748 CPU: 3 CR30: 0000000047014000 CR31: 0000000000000000 ORIG_R28: 00000000000000b0 IAOQ[0]: mddev_suspend+0x10c/0x160 [md_mod] IAOQ[1]: mddev_suspend+0x110/0x160 [md_mod] RP(r2): raid1_add_disk+0xd4/0x2c0 [raid1] Backtrace: [<0000000010c7522c>] raid1_add_disk+0xd4/0x2c0 [raid1] [<0000000010c20078>] raid_resume+0x390/0x418 [dm_raid] [<00000000105833e8>] dm_table_resume_targets+0xc0/0x188 [dm_mod] [<000000001057f784>] dm_resume+0x144/0x1e0 [dm_mod] [<0000000010587dd4>] dev_suspend+0x1e4/0x568 [dm_mod] [<0000000010589278>] ctl_ioctl+0x1e8/0x428 [dm_mod] [<0000000010589518>] dm_compat_ctl_ioctl+0x18/0x68 [dm_mod] [<0000000040377b88>] compat_SyS_ioctl+0xd0/0x1558 Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister") Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Shaohua Li 提交于
Neil pointed out setting journal disk role to raid_disks will confuse reshape if we support reshape eventually. Switching the role to 0 (we should be fine as long as the value >=0) and skip sysfs file creation to avoid error. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NNeilBrown <neilb@suse.com>
-
由 Artur Paszkiewicz 提交于
The commit c31df25f ("md/raid10: make sync_request_write() call bio_copy_data()") replaced manual data copying with bio_copy_data() but it doesn't work as intended. The source bio (fbio) is already processed, so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt. Because of this, bio_copy_data() either does not copy anything, or worse, copies data from the ->bi_next bio if it is set. This causes wrong data to be written to drives during resync and sometimes lockups/crashes in bio_copy_data(): [ 517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319] [ 517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [ 517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1 [ 517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015 [ 517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000 [ 517.468529] RIP: 0010:[<ffffffff812e1888>] [<ffffffff812e1888>] bio_copy_data+0xc8/0x3c0 [ 517.478164] RSP: 0018:ffff880150dfbc98 EFLAGS: 00000246 [ 517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000 [ 517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0 [ 517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980 [ 517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000 [ 517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000 [ 517.525412] FS: 0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000 [ 517.534844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0 [ 517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 517.566144] Stack: [ 517.568626] ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000 [ 517.577659] 0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800 [ 517.586715] ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000 [ 517.595773] Call Trace: [ 517.598747] [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10] [ 517.605610] [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2 [ 517.611987] [<ffffffff814ff206>] md_thread+0x106/0x140 [ 517.618072] [<ffffffff810c1d80>] ? wait_woken+0x80/0x80 [ 517.624252] [<ffffffff814ff100>] ? super_1_load+0x520/0x520 [ 517.630817] [<ffffffff8109ef89>] kthread+0xc9/0xe0 [ 517.636506] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70 [ 517.643653] [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70 [ 517.649929] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70 Signed-off-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: NShaohua Li <shli@kernel.org> Cc: stable@vger.kernel.org (v4.2+) Fixes: c31df25f ("md/raid10: make sync_request_write() call bio_copy_data()") Signed-off-by: NNeilBrown <neilb@suse.com>
-
- 10 12月, 2015 3 次提交
-
-
由 Joe Thornber 提交于
If dm_btree_del()'s call to push_frame() fails, e.g. due to btree_node_validator finding invalid metadata, the dm_btree_del() error path must unlock all frames (which have active dm-bufio buffers) that were pushed onto the del_stack. Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio buffers have leaked, e.g.: device-mapper: bufio: leaked buffer 3, hold count 1, list 0 Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
由 Joe Thornber 提交于
When applying block operations (BOPs) do not remove them from the uncommitted BOP ring-buffer until after they've been applied -- in case we recurse. Also, perform BOP_INC operation, in dm_sm_metadata_create() and sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather than using direct calls to sm_ll_inc(). Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
由 Joe Thornber 提交于
When you take a metadata snapshot the btree roots for the mapping and details tree need to have their reference counts incremented so they persist for the lifetime of the metadata snap. The roots being incremented were those currently written in the superblock, which could possibly be out of date if concurrent IO is triggering new mappings, breaking of sharing, etc. Fix this by performing a commit with the metadata lock held while taking a metadata snapshot. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
- 03 12月, 2015 2 次提交
-
-
由 Joe Thornber 提交于
dm_btree_remove_leaves() only unmaps a contiguous region so we need a loop, in __remove_range(), to handle ranges that contain multiple regions. A new btree function, dm_btree_lookup_next(), is introduced which is more efficiently able to skip over regions of the thin device which aren't mapped. __remove_range() uses dm_btree_lookup_next() for each iteration of __remove_range()'s loop. Also, improve description of dm_btree_remove_leaves(). Fixes: 6550f075 ("dm thin metadata: add dm_thin_remove_range()") Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.1+
-
由 Mike Snitzer 提交于
The block allocated at the start of btree_split_sibling() is never released if later insert_at() fails. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
- 24 11月, 2015 1 次提交
-
-
由 Mike Snitzer 提交于
When establishing a thin device's discard limits we cannot rely on the underlying thin-pool device's discard capabilities (which are inherited from the thin-pool's underlying data device) given that DM thin devices must provide discard support even when the thin-pool's underlying data device doesn't support discards. Users were exposed to this thin device discard limits regression if their thin-pool's underlying data device does _not_ support discards. This regression caused all upper-layers that called the blkdev_issue_discard() interface to not be able to issue discards to thin devices (because discard_granularity was 0). This regression wasn't caught earlier because the device-mapper-test-suite's extensive 'thin-provisioning' discard tests are only ever performed against thin-pool's with data devices that support discards. Fix is to have thin_io_hints() test the pool's 'discard_enabled' feature rather than inferring whether or not a thin device's discard support should be enabled by looking at the thin-pool's discard_granularity. Fixes: 21607670 ("dm thin: disable discard support for thin devices if pool's is disabled") Reported-by: NMike Gerber <mike@sprachgewalt.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.1+
-
- 20 11月, 2015 1 次提交
-
-
由 Mikulas Patocka 提交于
A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE), __add_wait_queue, spin_unlock_irq and then tests kthread_should_stop(). It is possible that the processor reorders memory accesses so that kthread_should_stop() is executed before __set_current_state(). If such reordering happens, there is a possible race on thread termination: CPU 0: calls kthread_should_stop() it tests KTHREAD_SHOULD_STOP bit, returns false CPU 1: calls kthread_stop(cc->write_thread) sets the KTHREAD_SHOULD_STOP bit calls wake_up_process on the kernel thread, that sets the thread state to TASK_RUNNING CPU 0: sets __set_current_state(TASK_INTERRUPTIBLE) spin_unlock_irq(&cc->write_thread_wait.lock) schedule() - and the process is stuck and never terminates, because the state is TASK_INTERRUPTIBLE and wake_up_process on CPU 1 already terminated Fix this race condition by using a new flag DM_CRYPT_EXIT_THREAD to signal that the kernel thread should exit. The flag is set and tested while holding cc->write_thread_wait.lock, so there is no possibility of racy access to the flag. Also, remove the unnecessary set_task_state(current, TASK_RUNNING) following the schedule() call. When the process was woken up, its state was already set to TASK_RUNNING. Other kernel code also doesn't set the state to TASK_RUNNING following schedule() (for example, do_wait_for_common in completion.c doesn't do it). Fixes: dc267621 ("dm crypt: offload writes to thread") Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 18 11月, 2015 3 次提交
-
-
由 Junichi Nomura 提交于
In multipath_prepare_ioctl(), - pgpath is a path selected from available paths - m->queue_io is true if we cannot send a request immediately to paths, either because: * there is no available path * the path group needs activation (pg_init) - pg_init is not started - pg_init is still running - m->queue_if_no_path is true if the device is configured to queue I/O if there are no available paths If !pgpath && !m->queue_if_no_path, the handler should return -EIO. However in the course of refactoring the condition check has broken and returns success in that case. Since bdev points to the dm device itself, dm_blk_ioctl() calls __blk_dev_driver_ioctl() for itself and recurses until crash. You could reproduce the problem like this: # dmsetup create mp --table '0 1024 multipath 0 0 0 0' # sg_inq /dev/mapper/mp <crash> [ 172.648615] BUG: unable to handle kernel paging request at fffffffc81b10268 [ 172.662843] PGD 19dd067 PUD 0 [ 172.666269] Thread overran stack, or stack corrupted [ 172.671808] Oops: 0000 [#1] SMP ... Fix the condition check with some clarifications. Fixes: e56f81e0 ("dm: refactor ioctl handling") Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
(Ab)using the @bdev passed to dm_blk_ioctl() opens the potential for targets' .prepare_ioctl to fail if they go on to check the bdev for !NULL. Fixes: e56f81e0 ("dm: refactor ioctl handling") Reported-by: NJunichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Junichi Nomura 提交于
dm-mpath retries ioctl, when no path is readily available and the device is configured to queue I/O in such a case. If you want to stop the retry before multipathd decides to turn off queueing mode, you could send signal for the process to exit from the loop. However the check of fatal signal has not carried along when commit 6c182cd8 ("dm mpath: fix ioctl deadlock when no paths") moved the loop from dm-mpath to dm core. As a result, we can't terminate such a process in the retry loop. Easy reproducer of the situation is: # dmsetup create mp --table '0 1024 multipath 0 0 0 0' # dmsetup message mp 0 'queue_if_no_path' # sg_inq /dev/mapper/mp then you should be able to terminate sg_inq by pressing Ctrl+C. Fixes: 6c182cd8 ("dm mpath: fix ioctl deadlock when no paths") Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
- 16 11月, 2015 1 次提交
-
-
由 Mike Snitzer 提交于
A thin-pool that is in out-of-data-space (OODS) mode may transition back to write mode -- without the admin adding more space to the thin-pool -- if/when blocks are released (either by deleting thin devices or discarding provisioned blocks). But as part of the thin-pool's earlier transition to out-of-data-space mode the thin-pool may have set the 'error_if_no_space' flag to true if the no_space_timeout expires without more space having been made available. That implementation detail, of changing the pool's error_if_no_space setting, needs to be reset back to the default that the user specified when the thin-pool's table was loaded. Otherwise we'll drop the user requested behaviour on the floor when this out-of-data-space to write mode transition occurs. Reported-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NJoe Thornber <ejt@redhat.com> Fixes: 2c43fd26 ("dm thin: fix missing out-of-data-space to write mode transition if blocks are released") Cc: stable@vger.kernel.org
-