- 03 8月, 2022 20 次提交
-
-
由 Mikulas Patocka 提交于
When we ran the lvm test "shell/integrity-blocksize-3.sh" on a kernel with kasan, we got failure in write_page. The reason for the failure is that md_bitmap_destroy is called before destroying the thread and the thread may be waiting in the function write_page for the bio to complete. When the thread finishes waiting, it executes "if (test_bit(BITMAP_WRITE_ERROR, &bitmap->flags))", which triggers the kasan warning. Note that the commit 48df498d that caused this bug claims that it is neede for md-cluster, you should check md-cluster and possibly find another bugfix for it. BUG: KASAN: use-after-free in write_page+0x18d/0x680 [md_mod] Read of size 8 at addr ffff889162030c78 by task mdX_raid1/5539 CPU: 10 PID: 5539 Comm: mdX_raid1 Not tainted 5.19.0-rc2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x45/0x57a ? __lock_text_start+0x18/0x18 ? write_page+0x18d/0x680 [md_mod] kasan_report+0xa8/0xe0 ? write_page+0x18d/0x680 [md_mod] kasan_check_range+0x13f/0x180 write_page+0x18d/0x680 [md_mod] ? super_sync+0x4d5/0x560 [dm_raid] ? md_bitmap_file_kick+0xa0/0xa0 [md_mod] ? rs_set_dev_and_array_sectors+0x2e0/0x2e0 [dm_raid] ? mutex_trylock+0x120/0x120 ? preempt_count_add+0x6b/0xc0 ? preempt_count_sub+0xf/0xc0 md_update_sb+0x707/0xe40 [md_mod] md_reap_sync_thread+0x1b2/0x4a0 [md_mod] md_check_recovery+0x533/0x960 [md_mod] raid1d+0xc8/0x2a20 [raid1] ? var_wake_function+0xe0/0xe0 ? psi_group_change+0x411/0x500 ? preempt_count_sub+0xf/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? raid1_end_read_request+0x2a0/0x2a0 [raid1] ? preempt_count_sub+0xf/0xc0 ? _raw_spin_unlock_irqrestore+0x19/0x40 ? del_timer_sync+0xa9/0x100 ? try_to_del_timer_sync+0xc0/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? __list_del_entry_valid+0x68/0xa0 ? finish_wait+0xa3/0x100 md_thread+0x161/0x260 [md_mod] ? unregister_md_personality+0xa0/0xa0 [md_mod] ? _raw_spin_lock_irqsave+0x78/0xc0 ? prepare_to_wait_event+0x2c0/0x2c0 ? unregister_md_personality+0xa0/0xa0 [md_mod] kthread+0x148/0x180 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 5522: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x80/0xa0 md_bitmap_create+0xa8/0xe80 [md_mod] md_run+0x777/0x1300 [md_mod] raid_ctr+0x249c/0x4a30 [dm_raid] dm_table_add_target+0x2b0/0x620 [dm_mod] table_load+0x1c8/0x400 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 5680: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x40 kasan_set_free_info+0x20/0x40 __kasan_slab_free+0xf7/0x140 kfree+0x80/0x240 md_bitmap_free+0x1c3/0x280 [md_mod] __md_stop+0x21/0x120 [md_mod] md_stop+0x9/0x40 [md_mod] raid_dtr+0x1b/0x40 [dm_raid] dm_table_destroy+0x98/0x1e0 [dm_mod] __dm_destroy+0x199/0x360 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: 48df498d ("md: move bitmap_destroy to the beginning of __md_stop") Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Two callers of md_alloc want to use the newly allocated devices, so return it instead of letting them find it cumbersomely after the allocation. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-and-tested-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
autorun_devices should not be limited to the controls for the legacy probe on open, so just call md_alloc directly. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-and-tested-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yang Li 提交于
Eliminate the following coccicheck warning: ./drivers/md/md.c:8208:2-3: Unneeded semicolon Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Signed-off-by: NYang Li <yang.lee@linux.alibaba.com> Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Stephen Rothwell 提交于
After merging the block tree, today's linux-next build (x86_64 allmodconfig) failed like this: drivers/md/md.c:717:22: error: 'mddev_find' defined but not used [-Werror=unused-function] 717 | static struct mddev *mddev_find(dev_t unit) | ^~~~~~~~~~ cc1: all warnings being treated as errors Caused by commit 4500d5c17910 ("md: simplify md_open") Make mddev_find() available only for non-modular builds. Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220721131132.070be166@canb.auug.org.auSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Now that devices are on the all_mddevs list until the gendisk is freed, there can't be any duplicates. Remove the global list lookup and just grab a reference. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
This ensures device names don't get prematurely reused. Instead add a deleted flag to skip already deleted devices in mddev_get and other places that only want to see live mddevs. Reported-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a reference when we drop the lock and delete the now unused for_each_mddev macro. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a reference when we drop the lock. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Just do a plain list_for_each that only grabs a mddev reference in the case where the thread sleeps and restarts the list iteration. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
This splits the code into nicely readable chunks and also avoids the refcount inc/dec manipulations. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The md_free name is rather misleading, so pick a better one. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Ensure that all private data is only freed once all accesses are done. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Error handling in md_alloc is a mess. Untangle it to just free the mddev directly before add_disk is called and thus the gendisk is globally visible. After that clear the hold flag and let the mddev_put take care of cleaning up the mddev through the usual mechanisms. Fixes: 5e55e2f5 ("[PATCH] md: convert compile time warnings into runtime warnings") Fixes: 9be68dd7 ("md: add error handling support for add_disk()") Fixes: 7ad10691 ("md: properly unwind when failing to add the kobject in md_alloc") Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Once a kobject is initialized, the containing object should not be directly freed. So delay initialization until it is added. Also remove the kobject_del call as the last put will remove the kobject as well. The explicitly delete isn't needed here, and dropping it will simplify further fixes. With this md_free now does not need to check that ->gendisk is non-NULL as it is always set by the time that kobject_init is called on mddev->kobj. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Guoqing Jiang 提交于
Since the bug which commit 8b48ec23 ("md: don't unregister sync_thread with reconfig_mutex held") fixed is related with action_store path, other callers which reap sync_thread didn't need to be changed. Let's pull md_unregister_thread from md_reap_sync_thread, then fix previous bug with belows. 1. unlock mddev before md_reap_sync_thread in action_store. 2. save reshape_position before unlock, then restore it to ensure position not changed accidentally by others. Signed-off-by: NGuoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Chris Webb 提交于
Boot-time assembly of arrays with md= command-line arguments breaks when CONFIG_BLOCK_LEGACY_AUTOLOAD is unset. md_setup_drive() in md-autodetect.c calls blkdev_get_by_dev(), assuming this implicitly creates the block device. Fix this by attempting to md_alloc() the array first. As in the probe path, ignore any error as failure is caught by blkdev_get_by_dev() anyway. Signed-off-by: NChris Webb <chris@arachsys.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Logan Gunthorpe 提交于
The mdadm test 07layouts randomly produces a kernel hung task deadlock. The deadlock is caused by the suspend_lo/suspend_hi files being set by the mdadm background process during reshape and not being cleared because the process hangs. (Leaving aside the issue of the fragility of freezing kernel tasks by buggy userspace processes...) When the background mdadm process hangs it, is waiting (without a timeout) on a change to the sync_completed file signalling that the reshape has completed. The process is woken up a couple times when the reshape finishes but it is woken up before MD_RECOVERY_RUNNING is cleared so sync_completed_show() reports 0 instead of "none". To fix this, notify the sysfs file in md_reap_sync_thread() after MD_RECOVERY_RUNNING has been cleared. This wakes up mdadm and causes it to continue and write to suspend_lo/suspend_hi to allow IO to continue. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Logan Gunthorpe 提交于
The 07layouts test in mdadm fails on some systems. The failure presents itself as the backup file not being removed before the next layout is grown into: mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup: File exists This is because the background mdadm process, which is responsible for cleaning up this backup file gets into an infinite loop waiting for the reshape to start. mdadm checks the mdstat file if a reshape is going and, if it is not, it waits for an event on the file or times out in 5 seconds. On faster machines, the reshape may complete before the 5 seconds times out, and thus the background mdadm process loops waiting for a reshape to start that has already occurred. mdadm reads the mdstat file to start, but mdstat does not report that the reshape has begun, even though it has indeed begun. So the mdstat_wait() call (in mdadm) which polls on the mdstat file won't ever return until timing out. The reason mdstat reports the reshape has started is due to an issue in status_resync(). recovery_active is subtracted from curr_resync which will result in a value of zero for the first chunk of reshaped data, and the resulting read will report no reshape in progress. To fix this, if "resync - recovery_active" is an overloaded value, force the value to be MD_RESYNC_ACTIVE so the code reports a resync in progress. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Logan Gunthorpe 提交于
Comments in the code document special values used for mddev->curr_resync. Make this clearer by using an enum to label these values. The only functional change is a couple places use the wrong comparison operator that implied 3 is another special value. They are all fixed to imply that 3 or greater is an active resync. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 7月, 2022 2 次提交
-
-
由 Bart Van Assche 提交于
Improve uniformity in the kernel of handling of request operation and flags by passing these as a single argument. Cc: Song Liu <song@kernel.org> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-32-bvanassche@acm.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Replace the remaining calls of bdevname with snprintf using the %pg format specifier. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220713055317.1888500-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 6月, 2022 1 次提交
-
-
由 Christoph Hellwig 提交于
blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 6月, 2022 1 次提交
-
-
由 Guoqing Jiang 提交于
The 07reshape5intr test is broke because of below path. md_reap_sync_thread -> mddev_unlock -> md_unregister_thread(&mddev->sync_thread) And md_check_recovery is triggered by, mddev_unlock -> md_wakeup_thread(mddev->thread) then mddev->reshape_position is set to MaxSector in raid5_finish_reshape since MD_RECOVERY_INTR is cleared in md_check_recovery, which means feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's reshape_position can't be updated accordingly. Fixes: 8b48ec23 ("md: don't unregister sync_thread with reconfig_mutex held") Reported-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NGuoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: NSong Liu <song@kernel.org>
-
- 23 5月, 2022 4 次提交
-
-
由 Xiao Ni 提交于
Now io_acct_set is alloc and free in personality. Remove the codes that free io_acct_set in md_free and md_stop. Fixes: 0c031fd3 (md: Move alloc/free acct bioset in to personality) Signed-off-by: NXiao Ni <xni@redhat.com> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Christoph Hellwig 提交于
Use the %pg format specifier to save on stack consumption and code size. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
Generally, the md_unregister_thread is called with reconfig_mutex, but raid_message in dm-raid doesn't hold reconfig_mutex to unregister thread, so md_unregister_thread can be called simulitaneously from two call sites in theory. Then after previous commit which remove the protection of reconfig_mutex for md_unregister_thread completely, the potential issue could be worse than before. Let's take pers_lock at the beginning of function to ensure reentrancy. Reported-by: NDonald Buczek <buczek@molgen.mpg.de> Signed-off-by: NGuoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
Unregister sync_thread doesn't need to hold reconfig_mutex since it doesn't reconfigure array. And it could cause deadlock problem for raid5 as follows: 1. process A tried to reap sync thread with reconfig_mutex held after echo idle to sync_action. 2. raid5 sync thread was blocked if there were too many active stripes. 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer) which causes the number of active stripes can't be decreased. 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able to hold reconfig_mutex. More details in the link: https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t And add one parameter to md_reap_sync_thread since it could be called by dm-raid which doesn't hold reconfig_mutex. Reported-and-tested-by: NDonald Buczek <buczek@molgen.mpg.de> Signed-off-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: NSong Liu <song@kernel.org>
-
- 26 4月, 2022 5 次提交
-
-
由 David Sloan 提交于
There are several instances where magic numbers are used in md.c instead of the defined constants in md_p.h. This patch set improves code readability by replacing all occurrences of 0xffff, 0xfffe, and 0xfffd when relating to md roles with their equivalent defined constant. Signed-off-by: NDavid Sloan <david.sloan@eideticom.com> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Heming Zhao 提交于
This commit includes two topics: 1> replace deprecated strlcpy change strlcpy to strscpy for strlcpy is marked as deprecated in Documentation/process/deprecated.rst 2> remove duplicated strlcpy line in md_bitmap_read_sb@md-bitmap.c there are two duplicated strlcpy(), the history: - commit cf921cc1 ("Add node recovery callbacks") introduced the first usage of strlcpy(). - commit b97e9257 ("Use separate bitmaps for each nodes in the cluster") introduced the second strlcpy(). this time, the two strlcpy() are same, we can remove anyone safely. - commit d3b178ad ("md: Skip cluster setup for dm-raid") added dm-raid special handling. And the "nodes" value is the key of this patch. but from this patch, strlcpy() which was introduced by b97e9257 become necessary. - commit 3c462c88 ("md: Increment version for clustered bitmaps") used clustered major version to only handle in clustered env. this patch could look a polishment for clustered code logic. So cf921cc1 became useless after d3b178ad, we could remove it safely. Signed-off-by: NHeming Zhao <heming.zhao@suse.com> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Xiaomeng Tong 提交于
The bug is here: if (!rdev || rdev->desc_nr != nr) { The list iterator value 'rdev' will *always* be set and non-NULL by rdev_for_each_rcu(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element found (In fact, it will be a bogus pointer to an invalid struct object containing the HEAD). Otherwise it will bypass the check and lead to invalid memory access passing the check. To fix the bug, use a new variable 'iter' as the list iterator, while using the original variable 'pdev' as a dedicated pointer to point to the found element. Cc: stable@vger.kernel.org Fixes: 70bcecdb ("md-cluster: Improve md_reload_sb to be less error prone") Signed-off-by: NXiaomeng Tong <xiam0nd.tong@gmail.com> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Xiaomeng Tong 提交于
The bug is here: if (!rdev) The list iterator value 'rdev' will *always* be set and non-NULL by rdev_for_each(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element found. Otherwise it will bypass the NULL check and lead to invalid memory access passing the check. To fix the bug, use a new variable 'iter' as the list iterator, while using the original variable 'rdev' as a dedicated pointer to point to the found element. Cc: stable@vger.kernel.org Fixes: 2aa82191 ("md-cluster: Perform a lazy update") Acked-by: NGuoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: NXiaomeng Tong <xiam0nd.tong@gmail.com> Acked-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Mariusz Tkaczyk 提交于
There is no direct mechanism to determine raid failure outside personality. It is done by checking rdev->flags after executing md_error(). If "faulty" flag is not set then -EBUSY is returned to userspace. -EBUSY means that array will be failed after drive removal. Mdadm has special routine to handle the array failure and it is executed if -EBUSY is returned by md. There are at least two known reasons to not consider this mechanism as correct: 1. drive can be removed even if array will be failed[1]. 2. -EBUSY seems to be wrong status. Array is not busy, but removal process cannot proceed safe. -EBUSY expectation cannot be removed without breaking compatibility with userspace. In this patch first issue is resolved by adding support for MD_BROKEN flag for RAID1 and RAID10. Support for RAID456 is added in next commit. The idea is to set the MD_BROKEN if we are sure that raid is in failed state now. This is done in each error_handler(). In md_error() MD_BROKEN flag is checked. If is set, then -EBUSY is returned to userspace. As in previous commit, it causes that #mdadm --set-faulty is able to fail array. Previously proposed workaround is valid if optional functionality[1] is disabled. [1] commit 9a567843("md: allow last device to be forcibly removed from RAID1/RAID10.") Reviewd-by: NXiao Ni <xni@redhat.com> Signed-off-by: NMariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Signed-off-by: NSong Liu <song@kernel.org>
-
- 18 4月, 2022 2 次提交
-
-
由 Christoph Hellwig 提交于
Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: NChao Yu <chao@kernel.org> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Add a helper to check the nonrot flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 3月, 2022 1 次提交
-
-
由 Eric Dumazet 提交于
Calling mdelay(1000) from process context, even while a reboot is in progress, does not make sense. Using msleep() allows other threads to make progress. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: linux-raid@vger.kernel.org Signed-off-by: NSong Liu <song@kernel.org>
-
- 04 2月, 2022 1 次提交
-
-
由 Christoph Hellwig 提交于
Pass a block_device to bio_clone_fast and __bio_clone_fast and give the functions more suitable names. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 2月, 2022 1 次提交
-
-
由 Song Liu 提交于
Leon reported NULL pointer deref with nowait support: [ 15.123761] device-mapper: raid: Loading target version 1.15.1 [ 15.124185] device-mapper: raid: Ignoring chunk size parameter for RAID 1 [ 15.124192] device-mapper: raid: Choosing default region size of 4MiB [ 15.129524] BUG: kernel NULL pointer dereference, address: 0000000000000060 [ 15.129530] #PF: supervisor write access in kernel mode [ 15.129533] #PF: error_code(0x0002) - not-present page [ 15.129535] PGD 0 P4D 0 [ 15.129538] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 15.129541] CPU: 5 PID: 494 Comm: ldmtool Not tainted 5.17.0-rc2-1-mainline #1 9fe89d43dfcb215d2731e6f8851740520778615e [ 15.129546] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F36e 10/14/2021 [ 15.129549] RIP: 0010:blk_queue_flag_set+0x7/0x20 [ 15.129555] Code: 00 00 00 0f 1f 44 00 00 48 8b 35 e4 e0 04 02 48 8d 57 28 bf 40 01 \ 00 00 e9 16 c1 be ff 66 0f 1f 44 00 00 0f 1f 44 00 00 89 ff <f0> 48 0f ab 7e 60 \ 31 f6 89 f7 c3 66 66 2e 0f 1f 84 00 00 00 00 00 [ 15.129559] RSP: 0018:ffff966b81987a88 EFLAGS: 00010202 [ 15.129562] RAX: ffff8b11c363a0d0 RBX: ffff8b11e294b070 RCX: 0000000000000000 [ 15.129564] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001d [ 15.129566] RBP: ffff8b11e294b058 R08: 0000000000000000 R09: 0000000000000000 [ 15.129568] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b11e294b070 [ 15.129570] R13: 0000000000000000 R14: ffff8b11e294b000 R15: 0000000000000001 [ 15.129572] FS: 00007fa96e826780(0000) GS:ffff8b18deb40000(0000) knlGS:0000000000000000 [ 15.129575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.129577] CR2: 0000000000000060 CR3: 000000010b8ce000 CR4: 00000000003506e0 [ 15.129580] Call Trace: [ 15.129582] <TASK> [ 15.129584] md_run+0x67c/0xc70 [md_mod 1e470c1b6bcf1114198109f42682f5a2740e9531] [ 15.129597] raid_ctr+0x134a/0x28ea [dm_raid 6a645dd7519e72834bd7e98c23497eeade14cd63] [ 15.129604] ? dm_split_args+0x63/0x150 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129615] dm_table_add_target+0x188/0x380 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129625] table_load+0x13b/0x370 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129635] ? dev_suspend+0x2d0/0x2d0 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129644] ctl_ioctl+0x1bd/0x460 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129655] dm_ctl_ioctl+0xa/0x20 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129663] __x64_sys_ioctl+0x8e/0xd0 [ 15.129667] do_syscall_64+0x5c/0x90 [ 15.129672] ? syscall_exit_to_user_mode+0x23/0x50 [ 15.129675] ? do_syscall_64+0x69/0x90 [ 15.129677] ? do_syscall_64+0x69/0x90 [ 15.129679] ? syscall_exit_to_user_mode+0x23/0x50 [ 15.129682] ? do_syscall_64+0x69/0x90 [ 15.129684] ? do_syscall_64+0x69/0x90 [ 15.129686] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 15.129689] RIP: 0033:0x7fa96ecd559b [ 15.129692] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c \ c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff \ ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48 [ 15.129696] RSP: 002b:00007ffcaf85c258 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [ 15.129699] RAX: ffffffffffffffda RBX: 00007fa96f1b48f0 RCX: 00007fa96ecd559b [ 15.129701] RDX: 00007fa97017e610 RSI: 00000000c138fd09 RDI: 0000000000000003 [ 15.129702] RBP: 00007fa96ebab583 R08: 00007fa97017c9e0 R09: 00007ffcaf85bf27 [ 15.129704] R10: 0000000000000001 R11: 0000000000000206 R12: 00007fa97017e610 [ 15.129706] R13: 00007fa97017e640 R14: 00007fa97017e6c0 R15: 00007fa97017e530 [ 15.129709] </TASK> This is caused by missing mddev->queue check for setting QUEUE_FLAG_NOWAIT Fix this by moving the QUEUE_FLAG_NOWAIT logic to under mddev->queue check. Fixes: f51d46d0 ("md: add support for REQ_NOWAIT") Reported-by: NLeon Möller <jkhsjdhjs@totally.rip> Tested-by: NLeon Möller <jkhsjdhjs@totally.rip> Cc: Vishal Verma <vverma@digitalocean.com> Signed-off-by: NSong Liu <song@kernel.org>
-
- 02 2月, 2022 2 次提交
-
-
由 Christoph Hellwig 提交于
Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Pass the block_device and operation that we plan to use this bio for to bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-