- 29 July 2022, 2 commits
-
-
Committed by Mike Christie
The specs state that if you send a reserve down a path that is already the holder, success must be returned, and if it goes down a path that is not the holder, reservation conflict must be returned. Windows failover clustering will send a second reservation and expects that a device returns success. The problem for multipathing is that for an All Registrants reservation we can send the reserve down any path, but for all other reservation types there is one path that is the holder.

To handle this we could add PR state to dm, but that can get nasty. Look at target_core_pr.c for an example of the type of things we'd have to track. It will also get more complicated because other initiators can change the state, so we would have to add in async event/sense handling.

This commit, and the 3 commits that follow, try to keep dm simple and just keep doing passthrough. This commit modifies dm_call_pr to be able to find the first usable path that can execute our pr_op, then return. When dm_pr_reserve is converted to dm_call_pr in the next commit, for the normal case we will use the same path for every reserve.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
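To picture the "first usable path" behavior, here is a minimal self-contained sketch; struct pr_path and try_pr_op are hypothetical stand-ins for the dm.c internals, not the real interfaces:

```
#include <errno.h>

/* Hypothetical stand-ins for the dm.c internals; illustration only. */
struct pr_path { int dev; };

static int call_pr_on_first_usable(struct pr_path *paths, int npaths,
                                   int (*try_pr_op)(struct pr_path *p))
{
        int i, ret = -ENODEV;           /* no usable path found at all */

        for (i = 0; i < npaths; i++) {
                ret = try_pr_op(&paths[i]);
                if (ret != -EOPNOTSUPP)
                        break;          /* op ran: success or a real error */
        }
        return ret;
}
```

The point is simply that iteration stops at the first path that can actually execute the PR op, so repeated reserves naturally land on the same path.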
-
Committed by Mike Snitzer
Otherwise PR ops may be issued while the broader DM device is being reconfigured, etc.

Fixes: 9c72bad1 ("dm: call PR reserve/unreserve on each underlying device")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
- 16 July 2022, 1 commit
-
-
Committed by Luo Meng
Fault injection on the pool metadata device reports:

  BUG: KASAN: use-after-free in dm_pool_register_metadata_threshold+0x40/0x80
  Read of size 8 at addr ffff8881b9d50068 by task dmsetup/950

  CPU: 7 PID: 950 Comm: dmsetup Tainted: G W 5.19.0-rc6 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   print_address_description.constprop.0.cold+0xeb/0x3f4
   kasan_report.cold+0xe6/0x147
   dm_pool_register_metadata_threshold+0x40/0x80
   pool_ctr+0xa0a/0x1150
   dm_table_add_target+0x2c8/0x640
   table_load+0x1fd/0x430
   ctl_ioctl+0x2c4/0x5a0
   dm_ctl_ioctl+0xa/0x10
   __x64_sys_ioctl+0xb3/0xd0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

This can be easily reproduced using:

  echo offline > /sys/block/sda/device/state
  dd if=/dev/zero of=/dev/mapper/thin bs=4k count=10
  dmsetup load pool --table "0 20971520 thin-pool /dev/sda /dev/sdb 128 0 0"

If a metadata commit fails, the transaction will be aborted and the metadata space maps will be destroyed. If a DM table reload then happens for this failed thin-pool, a use-after-free will occur in dm_sm_register_threshold_callback (called from dm_pool_register_metadata_threshold).

Fix this in dm_pool_register_metadata_threshold() by returning the -EINVAL error if the thin-pool is in fail mode. Also fail pool_ctr() with a new error message: "Error registering metadata threshold".

Fixes: ac8c3f3d ("dm thin: generate event when metadata threshold passed")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
- 15 July 2022, 6 commits
-
-
Committed by Mikulas Patocka
Change dm-writecache so that it counts the number of blocks discarded instead of the number of discard bios. This makes it consistent with the read and write statistics counters, which were changed to count the number of blocks instead of bios.

Fixes: e3a35d03 ("dm writecache: add event counters")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
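Counting blocks instead of bios boils down to dividing the I/O size by the cache block size. A minimal self-contained sketch, assuming a power-of-two block size expressed as a shift and block-aligned I/O (the real dm-writecache field names may differ):

```
#include <stdint.h>

#define SECTOR_SHIFT 9                  /* 512-byte sectors */

/* Blocks spanned by an I/O of 'sectors' 512-byte sectors, for a
 * power-of-two block size of (1 << block_size_bits) bytes.
 * Assumes block-aligned I/O; illustrative only. */
static inline uint64_t io_blocks(uint64_t sectors, unsigned block_size_bits)
{
        return (sectors << SECTOR_SHIFT) >> block_size_bits;
}
```

Because dm_accept_partial_bio can split and requeue a bio, a per-bio counter over-counts; summing per-block like this stays accurate across splits.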
-
Committed by Mikulas Patocka
Change dm-writecache so that it counts the number of blocks written instead of the number of write bios. Bios can be split and requeued using the dm_accept_partial_bio function, so counting bios caused inaccurate results.

Fixes: e3a35d03 ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Mikulas Patocka
Change dm-writecache so that it counts the number of blocks read instead of the number of read bios. Bios can be split and requeued using the dm_accept_partial_bio function, so counting bios caused inaccurate results.

Fixes: e3a35d03 ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Mikulas Patocka
The functions writecache_map_remap_origin and writecache_bio_copy_ssd only return a single value, thus they can be made to return void. This helps simplify the following IO accounting changes.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Mikulas Patocka
dm-kcopyd doesn't access the allocated pages directly; it only passes them to dm-io, which adds them to a bio list. Thus, we can allocate the pages from high memory. This will reduce pressure on low memory when there are a large number of kcopyd jobs in progress.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Mikulas Patocka
dm-writecache has the capability to limit the number of writeback jobs in progress. However, this feature was off by default. As such, there were some out-of-memory crashes observed when lowering the low watermark while the cache is full. This commit enables the writeback limit by default. It is set to 256MiB or 1/16 of total system memory, whichever is smaller.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
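The default described above ("256MiB or 1/16 of total system memory, whichever is smaller") is simple min() arithmetic. A self-contained sketch, assuming 4 KiB pages and illustrative names:

```
#include <stdint.h>

#define PAGE_SHIFT 12                   /* assume 4 KiB pages for the example */

/* Sketch: default writeback limit in bytes, given total RAM in pages. */
static uint64_t default_writeback_limit(uint64_t totalram_pages)
{
        uint64_t one_sixteenth = (totalram_pages << PAGE_SHIFT) / 16;
        uint64_t cap = 256ULL << 20;    /* 256 MiB */

        return cap < one_sixteenth ? cap : one_sixteenth;
}
```

For any machine with more than 4 GiB of RAM the 256 MiB cap wins, which bounds writeback memory regardless of system size.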
-
- 07 July 2022, 9 commits
-
-
Committed by Zhang Jiaming
Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Jiang Jian
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Steven Lung
Replace "neccessarily" with "necessarily".

Signed-off-by: Steven Lung <1030steven@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by JeongHyeon Lee
Resolves: ERROR: else should follow close brace '}'

Signed-off-by: JeongHyeon Lee <jhs2.lee@samsung.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Mike Snitzer
Rename from "tgt" to "ti" so that all of the dm-table.c code uses the same naming for dm_target variables.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Mike Snitzer
All callers of dm_table_get_target() are expected to do proper bounds checking on the index they pass. Move dm_table_get_target() to dm-core.h to make it extra clear that only DM core code should be using it. Switch it to be inlined while at it.

Standardize all DM core callers to use the same for-loop pattern (see the sketch below) and make associated variables as local as possible. Rename some variables (e.g. s/table/t/ and s/tgt/ti/) along the way.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
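The standardized for-loop pattern referred to above looks roughly like this sketch; dm_table_get_target() and num_targets match the names used in these commits, while the per-target check is illustrative and kernel context is assumed:

```
/* Sketch of the standardized DM-core loop over a table's targets. */
static bool all_targets_pass_check(struct dm_table *t)
{
        for (unsigned int i = 0; i < t->num_targets; i++) {
                struct dm_target *ti = dm_table_get_target(t, i);

                if (!some_per_target_predicate(ti))     /* hypothetical check */
                        return false;
        }
        return true;
}
```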
-
Committed by Mike Snitzer
It is more efficient and readable to just access table->num_targets directly.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Ming Lei
Commit 61b6e2e5 ("dm: fix BLK_STS_DM_REQUEUE handling when dm_io represents split bio") reverted DM core's bio splitting back to using bio_split()+bio_chain() because it was found that otherwise DM's BLK_STS_DM_REQUEUE would trigger a live-lock waiting for bio completion that would never occur.

Restore using bio_trim()+bio_inc_remaining(), like was done in commit 7dd76d1f ("dm: improve bio splitting and associated IO accounting"), but this time with proper handling for the above scenario, which is covered in more detail in the commit header for 61b6e2e5.

Solve this issue by adding a two-staged dm_io requeue mechanism that uses the new dm_bio_rewind() via dm_io_rewind():

1) Requeue the dm_io into the requeue_list added to struct mapped_device, and schedule it via the newly added requeue work. This workqueue just clones the dm_io->orig_bio (which DM saves and ensures its end sector isn't modified). dm_io_rewind() uses the sectors and sectors_offset members of the dm_io that are recorded relative to the end of orig_bio: dm_bio_rewind()+bio_trim() are then used to make that cloned bio reflect the subset of the original bio that is represented by the dm_io that is being requeued.

2) The second-stage requeue is the same as the original requeue, but io->orig_bio points to the new cloned bio (which matches the requeued dm_io as described above).

This allows DM core to shift the need for bio cloning from bio-split time (during IO submission) to the less likely BLK_STS_DM_REQUEUE handling (after IO completes with that error).

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Ming Lei
Commit 7759eb23 ("block: remove bio_rewind_iter()") removed a similar API for the following reasons:

```
It is pointed that bio_rewind_iter() is one very bad API[1]:

1) bio size may not be restored after rewinding
2) it causes some bogus change, such as 5151842b (block: reset
   bi_iter.bi_done after splitting bio)
3) rewinding really makes things complicated wrt. bio splitting
4) unnecessary updating of .bi_done in fast path

[1] https://marc.info/?t=153549924200005&r=1&w=2

So this patch takes Kent's suggestion to restore one bio into its
original state via saving bio iterator(struct bvec_iter) in
bio_integrity_prep(), given now bio_rewind_iter() is only used by
bio integrity code.
```

However, saving off a copy of the 32-byte bio->bi_iter in case a rewind is needed isn't efficient, because it bloats per-bio data for what is an unlikely case. That suggestion also ignores the need to restore crypto and integrity info.

Add a dm_bio_rewind() API for a use-case that is much more narrow than the previous, more generic rewind code that was reverted:

1) Most bios have a fixed end sector, since bio splits are done from the front of the bio. If the driver just records how many sectors lie between the current bio's start sector and the original bio's end sector, the original position can be restored. Keeping the original bio's end sector fixed is a _hard_ requirement for this interface!

2) If a bio's end sector won't change (usually bio_trim() isn't called, or in the case of DM it preserves the original bio), the user can restore the original position by storing the sector offset from the current ->bi_iter.bi_sector to the bio's end sector; together with saving the bio size, only 8 bytes are needed to restore to the original bio (see the sketch below).

3) DM's requeue use case: when BLK_STS_DM_REQUEUE happens, DM core needs to restore an "original bio" that represents the current dm_io to be requeued (which may be a subset of the original bio). By storing the sector offset from the original bio's end sector and the dm_io's size, dm_bio_rewind() can restore such an original bio. See commit 7dd76d1f ("dm: improve bio splitting and associated IO accounting") for more details on how DM does this. Leveraging this allows DM core to shift the need for bio cloning from bio-split time (during IO submission) to the less likely BLK_STS_DM_REQUEUE handling (after IO completes with that error).

4) Unlike the original rewind API, dm_bio_rewind() doesn't add .bi_done to bvec_iter, so there is no effect on the fast path.

Implement dm_bio_rewind() by factoring out clear helpers that it calls: dm_bio_integrity_rewind, dm_bio_crypt_rewind and dm_bio_rewind_iter.

DM is able to ensure that dm_bio_rewind() is used safely but, given the constraint that the bio's end must never change, other hypothetical future callers may not take the same care. So make dm_bio_rewind() and all supporting code local to DM to avoid the risk of hypothetical abuse. A "dm_" prefix was added to all functions to avoid any namespace collisions.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
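The 8-byte bookkeeping from point 2) is just two saved 32-bit values. A self-contained sketch of the save/restore arithmetic, assuming the bio's end sector never changes (member names are illustrative, not the actual dm_io fields):

```
#include <stdint.h>

typedef uint64_t sector_t;              /* as in the kernel */

/* With a fixed end sector, (offset from end, size) is enough to
 * restore a bio's original position: 8 bytes of state total. */
struct rewind_state {
        uint32_t sectors_from_end;      /* end - original start */
        uint32_t sectors;               /* original size in sectors */
};

static void rewind_save(struct rewind_state *s, sector_t start,
                        uint32_t size, sector_t end)
{
        s->sectors_from_end = (uint32_t)(end - start);
        s->sectors = size;
}

static void rewind_restore(const struct rewind_state *s, sector_t end,
                           sector_t *start, uint32_t *size)
{
        *start = end - s->sectors_from_end;     /* original start sector */
        *size = s->sectors;
}
```

This is why the fixed-end-sector constraint is hard: if the end moved, the saved offset would point at the wrong start.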
-
- 30 June 2022, 3 commits
-
-
Committed by Ming Lei
If either BLK_STS_DM_REQUEUE or BLK_STS_AGAIN is returned for POLLED io, we requeue the original bio into the deferred list and kick md->wq to re-submit it to the block layer. Improve the handling in the following way:

1) Factor out dm_handle_requeue() for handling dm_io requeue.
2) Unify handling for BLK_STS_DM_REQUEUE and BLK_STS_AGAIN: clear REQ_POLLED for BLK_STS_DM_REQUEUE too, for the sake of simplicity, given BLK_STS_DM_REQUEUE is very unusual.
3) Queue md->wq explicitly in dm_handle_requeue(), so requeue handling becomes more robust.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Christoph Hellwig
The current split between dm_table_alloc_md_mempools and dm_alloc_md_mempools is rather arbitrary, so merge the two into one easy-to-follow function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Christoph Hellwig
dm_get_reserved_rq_based_ios is only used in the core dm code, so remove the export.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
- 28 June 2022, 1 commit
-
-
Committed by Christoph Hellwig
blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 June 2022, 1 commit
-
-
Committed by Christoph Hellwig
max_io_len always passes an explicitly non-zero chunk_sectors into blk_max_size_offset. That means much of blk_max_size_offset is not needed and can be open coded to simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20220614090934.570632-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 June 2022, 2 commits
-
-
Committed by Mikulas Patocka
Commit 85e123c2 ("dm mirror log: round up region bitmap size to BITS_PER_LONG") introduced a regression on 64-bit architectures in the lvm testsuite tests lvcreate-mirror, mirror-names and vgsplit-operation.

If the device is shrunk, we need to clear log bits beyond the end of the device. The code clears bits up to a 32-bit boundary and then calculates lc->sync_count by summing set bits up to a 64-bit boundary (the commit changed that; previously, this boundary was 32-bit too). So, it was using some non-zeroed bits in the calculation and this caused misbehavior.

Fix this regression by clearing bits up to the BITS_PER_LONG boundary.

Fixes: 85e123c2 ("dm mirror log: round up region bitmap size to BITS_PER_LONG")
Cc: stable@vger.kernel.org
Reported-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Ming Lei
Commit 7dd76d1f ("dm: improve bio splitting and associated IO accounting") removed using a cloned bio when dm io splitting is needed. Using bio_trim()+bio_inc_remaining() rather than bio_split()+bio_chain() causes multiple dm_io instances to share the same original bio, and it works fine if IOs complete successfully.

But a regression was caused for the case when BLK_STS_DM_REQUEUE is returned from any one of DM's cloned bios (whose dm_io share the same orig_bio). In this BLK_STS_DM_REQUEUE case, only the mapped subset of the original bio for the current exact dm_io needs to be re-submitted. However, since the original bio is shared among all dm_io instances, the ->orig_bio actually only represents the last dm_io instance, so requeue can't work as expected. Also, when more than one dm_io is requeued, the same original bio is requeued from all dm_io completion handlers, which causes a race.

Fix this issue by still allocating one clone bio for completing io only; then io accounting can rely on ->orig_bio being unmodified. This is needed because the dm_io's sector_offset and sectors members are recorded relative to an unmodified ->orig_bio.

In the future, we can go back to using bio_trim()+bio_inc_remaining() for dm's io splitting, and only delay needing a bio clone until handling BLK_STS_DM_REQUEUE, but that approach is a bit complicated (so it needs a development cycle):

1) the bio clone needs to be done in task context
2) a block interface for unwinding a bio is required

Fixes: 7dd76d1f ("dm: improve bio splitting and associated IO accounting")
Reported-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
- 22 June 2022, 2 commits
-
-
Committed by Mike Snitzer
Commit 52919840 ("dm: fix bio polling to handle possibile BLK_STS_AGAIN") inadvertently introduced an early return from dm_io_complete() without first queueing the bio to DM if BLK_STS_AGAIN occurs and bio-polling is _not_ being used.

Fix this by only returning early from dm_io_complete() if the bio has first been properly queued to DM. Otherwise, the bio will never finish via bio_endio.

Fixes: 52919840 ("dm: fix bio polling to handle possibile BLK_STS_AGAIN")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
Committed by Nikos Tsironis
During postsuspend dm-era does the following:

1. Archives the current era
2. Commits the metadata, as part of the RPC call for archiving the current era
3. Stops the worker

Until the worker stops, it might write to the metadata again. Moreover, these writes are not flushed to disk immediately, but are cached by the dm-bufio client, which writes them back asynchronously.

As a result, the committed metadata of a suspended dm-era device might not be consistent with the in-core metadata. In some cases, this can result in the corruption of the on-disk metadata. Suppose the following sequence of events:

1. Load a new table, e.g. a snapshot-origin table, to a device with a dm-era table
2. Suspend the device
3. dm-era commits its metadata, but the worker does a few more metadata writes until it stops, as part of digesting an archived writeset
4. These writes are cached by the dm-bufio client
5. Load the dm-era table to another device
6. The new instance of the dm-era target loads the committed, on-disk metadata, which doesn't include the extra writes done by the worker after the metadata commit
7. Resume the new device
8. The new dm-era target instance starts using the metadata
9. Resume the original device
10. The destructor of the old dm-era target instance is called and destroys the dm-bufio client, which results in flushing the cached writes to disk
11. These writes might overwrite the writes done by the new dm-era instance, hence corrupting its metadata

Fix this by committing the metadata after the worker stops running.

stop_worker uses flush_workqueue to flush the current work. However, the work item may re-queue itself, and flush_workqueue doesn't wait for re-queued works to finish. This could result in the worker changing the metadata after it has been committed, or writing to the metadata concurrently with the commit in the postsuspend thread. Use drain_workqueue instead, which waits until the work and all re-queued works finish; a sketch follows below.

Fixes: eec40579 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
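A minimal sketch of the workqueue half of the fix; the struct era field names follow the commit's description and should be treated as illustrative (kernel context assumed):

```
/* flush_workqueue() does not wait for work that re-queues itself;
 * drain_workqueue() loops until the queue is truly empty, including
 * re-queued items.  Sketch only, not the literal dm-era code. */
static void stop_worker(struct era *era)
{
        atomic_set(&era->suspended, 1);
        drain_workqueue(era->wq);       /* was: flush_workqueue(era->wq) */
}
```

With the worker guaranteed quiescent, the final metadata commit can safely run afterwards.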
-
- 17 June 2022, 3 commits
-
-
Committed by Mikulas Patocka
The code in dm-log rounds up bitset_size to 32 bits. It then uses find_next_zero_bit_le on the allocated region. find_next_zero_bit_le accesses the bitmap using unsigned long pointers. So, on 64-bit architectures, it may access 4 bytes beyond the allocated size.

Fix this bug by rounding up bitset_size to BITS_PER_LONG.

This bug was found by running the lvm2 testsuite with kasan.

Fixes: 29121bd0 ("[PATCH] dm mirror log: bitset_size fix")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
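The fix is a rounding change. A self-contained sketch of sizing a bitmap in whole longs, so that unsigned-long-wide accesses like find_next_zero_bit_le() stay inside the allocation:

```
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(long))

/* Round a bit count up to whole longs, matching the unsigned-long
 * pointer accesses done by find_next_zero_bit_le(). */
static inline size_t bitset_size_in_bytes(size_t nr_bits)
{
        size_t nr_longs = (nr_bits + BITS_PER_LONG - 1) / BITS_PER_LONG;

        return nr_longs * sizeof(long);
}
```

Rounding to 32 bits only is exactly the bug: on a 64-bit machine the last partial long is read in full, 4 bytes past the allocation.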
-
Committed by Mikulas Patocka
Starting with the commit 63a225c9fd20, device mapper has an optimization whereby it takes the cheaper table lock (dm_get_live_table_fast instead of dm_get_live_table) if the bio has REQ_NOWAIT. Bios with REQ_NOWAIT must not block in the target request routine; if they did, we would be blocking while holding rcu_read_lock, which is prohibited.

The targets that are suitable for the REQ_NOWAIT optimization (and that don't block in the map routine) have the flag DM_TARGET_NOWAIT set. Device mapper tests whether all the targets and all the devices in a table support nowait (see the function dm_table_supports_nowait) and sets or clears the QUEUE_FLAG_NOWAIT flag on its request queue according to this check.

There's a test in submit_bio_noacct: "if ((bio->bi_opf & REQ_NOWAIT) && !blk_queue_nowait(q)) goto not_supported" - this makes sure that REQ_NOWAIT bios can't enter a request queue that doesn't support them.

This mechanism works to prevent REQ_NOWAIT bios from reaching dm targets that don't support the REQ_NOWAIT flag (and that may block in the map routine) - except that there is a small race condition: submit_bio_noacct checks the QUEUE_FLAG_NOWAIT flag without holding any locks. Immediately after this check, the device mapper table may be reloaded with a table that doesn't support REQ_NOWAIT (for example, if we start moving the logical volume or if we activate a snapshot). However, the REQ_NOWAIT bio that already passed the check in submit_bio_noacct would be sent to device mapper, where it could be redirected to a dm target that doesn't support REQ_NOWAIT - the result is sleeping while we hold rcu_read_lock.

In order to fix this race, we double-check whether the target supports REQ_NOWAIT while we hold the table lock (so that the table can't change under us).

Fixes: 563a225c ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
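A rough sketch of the shape of the fix, re-checking nowait support while the live-table reference is held; simplified and assuming kernel context (the exact dm.c code differs):

```
/* Sketch: inside the dm_submit_bio() path, re-check nowait support
 * under the live-table reference, because the earlier
 * submit_bio_noacct() queue-flag check was lockless. */
map = dm_get_live_table(md, &srcu_idx);
if ((bio->bi_opf & REQ_NOWAIT) && !dm_table_supports_nowait(map)) {
        dm_put_live_table(md, srcu_idx);
        bio_wouldblock_error(bio);      /* complete with -EAGAIN semantics */
        return;
}
```

This is the classic lockless-check-plus-locked-recheck pattern: the cheap unlocked test stays on the fast path, and the authoritative test happens where the table cannot change underneath it.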
-
Committed by Mikulas Patocka
dm_put_live_table_bio is called from the end of dm_submit_bio. However, at this point, the bio may already be finished and the caller may have freed it. Consequently, dm_put_live_table_bio accesses the stale "bio" pointer.

Fix this bug by loading the bi_opf value and passing it to dm_get_live_table_bio and dm_put_live_table_bio instead of the bio.

This bug was found by running the lvm2 testsuite with kasan.

Fixes: 563a225c ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
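The pattern of the fix, capturing what is needed from the bio before anything that may complete and free it, looks roughly like this (simplified, kernel context assumed):

```
/* Sketch: save bi_opf up front; once the bio is submitted it may
 * complete and be freed, so it must not be dereferenced again on
 * the way out of dm_submit_bio(). */
const unsigned int bio_opf = bio->bi_opf;

map = dm_get_live_table_bio(md, &srcu_idx, bio_opf);
/* ... split and submit; 'bio' may now be finished and freed ... */
dm_put_live_table_bio(md, srcu_idx, bio_opf);   /* no stale bio access */
```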
-
- 16 June 2022, 2 commits
-
-
Committed by Logan Gunthorpe
bio_alloc_bioset() takes a block device, the number of vectors, the OP flags, the GFP mask and the bio set. However, when the prototype was changed, the call site in ppl_do_flush() had the OP flags and the GFP flags reversed. This introduced some sparse errors:

  drivers/md/raid5-ppl.c:632:57: warning: incorrect type in argument 3
                                 (different base types)
  drivers/md/raid5-ppl.c:632:57:    expected unsigned int opf
  drivers/md/raid5-ppl.c:632:57:    got restricted gfp_t [usertype]
  drivers/md/raid5-ppl.c:633:61: warning: incorrect type in argument 4
                                 (different base types)
  drivers/md/raid5-ppl.c:633:61:    expected restricted gfp_t [usertype] gfp_mask
  drivers/md/raid5-ppl.c:633:61:    got unsigned long long

The sparse error introduction may not have been reported correctly by 0day due to other work that was cleaning up other sparse errors in this area.

Fixes: 609be106 ("block: pass a block_device and opf to bio_alloc_bioset")
Cc: stable@vger.kernel.org # 5.18+
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
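For reference, a sketch of the corrected call with the arguments in prototype order; the specific OP flags shown here are illustrative, not necessarily the exact ones used in ppl_do_flush():

```
/* bio_alloc_bioset(bdev, nr_vecs, opf, gfp_mask, bs): the OP flags
 * are argument 3 and the GFP mask argument 4; the bug passed them
 * swapped.  Flags shown are illustrative. */
bio = bio_alloc_bioset(bdev, 0, REQ_OP_WRITE | REQ_PREFLUSH,
                       GFP_NOIO, &ppl_conf->flush_bs);
```

This is also a good example of why sparse's restricted types (gfp_t vs. plain flags) exist: the compiler alone accepts the swapped integers silently.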
-
Committed by Guoqing Jiang
The 07reshape5intr test is broken because of the path below:

  md_reap_sync_thread
    -> mddev_unlock
    -> md_unregister_thread(&mddev->sync_thread)

And md_check_recovery is triggered by:

  mddev_unlock -> md_wakeup_thread(mddev->thread)

Then mddev->reshape_position is set to MaxSector in raid5_finish_reshape, since MD_RECOVERY_INTR is cleared in md_check_recovery, which means feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and the superblock's reshape_position can't be updated accordingly.

Fixes: 8b48ec23 ("md: don't unregister sync_thread with reconfig_mutex held")
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
-
- 15 June 2022, 1 commit
-
-
Committed by Benjamin Marzinski
After commit 82f6cdcc ("dm: switch dm_io booleans over to proper flags"), dm_start_io_acct stopped atomically checking and setting was_accounted, which turned into the DM_IO_ACCOUNTED flag. This opened the possibility of a race where IO accounting is started twice for duplicate bios. To remove the race, check the flag while holding the io->lock.

Fixes: 82f6cdcc ("dm: switch dm_io booleans over to proper flags")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
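A sketch of the race-free shape, folding the test and set of DM_IO_ACCOUNTED into one critical section; helper and field names follow dm-core conventions but kernel context is assumed and details are simplified:

```
/* Sketch: test-and-set DM_IO_ACCOUNTED atomically under io->lock so
 * duplicate bios can't both start accounting. */
unsigned long flags;

spin_lock_irqsave(&io->lock, flags);
if (dm_io_flagged(io, DM_IO_ACCOUNTED)) {
        spin_unlock_irqrestore(&io->lock, flags);
        return;                         /* already accounted */
}
dm_io_set_flag(io, DM_IO_ACCOUNTED);
spin_unlock_irqrestore(&io->lock, flags);
/* ... start accounting exactly once ... */
```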
-
- 11 June 2022, 1 commit
-
-
Committed by Mike Snitzer
After commit ca522482 ("dm: pass NULL bdev to bio_alloc_clone"), clone_endio() only calls dm_zone_endio() when DM targets remap the clone bio's bdev to something other than the md->disk->part0 default. However, if a DM target (e.g. dm-crypt) stacked on top of dm-zoned does not remap the clone bio using bio_set_dev(), then dm_zone_endio() is not called at completion of the bios and zone locks are not properly unlocked. This triggers a hang, in dm_zone_map_bio(), when blktests block/004 is run for dm-crypt on zoned block devices. To avoid the hang, simply remove the clone_endio() check that verifies the target remapped the clone bio to a device other than the default.

Fixes: ca522482 ("dm: pass NULL bdev to bio_alloc_clone")
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
- 09 June 2022, 1 commit
-
-
Committed by Christoph Hellwig
The use of bioset_init_from_src meant that the pre-allocated pools weren't used for anything except parameter passing, and the integrity pool creation got completely lost for the actual live mapped_device. Fix that by assigning the actual preallocated dm_md_mempools to the mapped_device and using that for I/O instead of creating new mempools.

Fixes: 2a2a4c51 ("dm: use bioset_init_from_src() to copy bio_set")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
- 01 June 2022, 2 commits
-
-
Committed by Sarthak Kukreti
The device-mapper framework provides a mechanism to mark targets as immutable (and hence fail table reloads that try to change the target type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's feature flags to prevent switching the verity target with a different target type.

Fixes: a4ffc152 ("dm: add verity target")
Cc: stable@vger.kernel.org
Signed-off-by: Sarthak Kukreti <sarthakkukreti@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
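The change itself is a one-flag addition to the target's target_type declaration; roughly (other fields elided):

```
/* Sketch: marking dm-verity immutable via its target_type features. */
static struct target_type verity_target = {
        .name     = "verity",
        .features = DM_TARGET_IMMUTABLE, /* reject reloads that change target type */
        /* ... version, module, ctr/dtr/map/status callbacks elided ... */
};
```

With the flag set, DM core refuses a table reload that would replace a verity mapping with, say, a plain linear one, closing off a verification-bypass path.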
-
Committed by Mike Snitzer
It was reported that the "generic/250" test in xfstests (which uses the dm-error target) demonstrates a regression where the kernel crashes in bioset_exit().

Since commit cfc97abc ("dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset"), the bioset_init() for the dm_io bioset will set up the bioset's per-cpu alloc cache if all devices have QUEUE_FLAG_POLL set. But there was a bug where a target that doesn't have any data devices (and that doesn't even set the .iterate_devices dm target callback) would incorrectly return true from dm_table_supports_poll().

Fix this by updating dm_table_supports_poll() to follow dm-table.c's well-worn pattern for testing that _all_ targets in a DM table do in fact have underlying devices that set QUEUE_FLAG_POLL.

NOTE: An additional block fix is still needed so that bio_alloc_cache_destroy() clears the bioset's ->cache member. Otherwise, a DM device's table reload that transitions the DM device's bioset from using a per-cpu alloc cache to _not_ using one will result in bioset_exit() crashing in bio_alloc_cache_destroy(), because dm's dm_io bioset ("io_bs") was left with a stale ->cache member.

Fixes: cfc97abc ("dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset")
Reported-by: Matthew Wilcox <willy@infradead.org>
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-
- 28 May 2022, 1 commit
-
-
Committed by Coly Li
The kworker routine update_writeback_rate() is scheduled to update the writeback rate every 5 seconds by default. Before calling __update_writeback_rate() to do the real job, the semaphore dc->writeback_lock should be held by the kworker routine.

At the same time, the bcache writeback thread routine bch_writeback_thread() also needs to hold dc->writeback_lock before flushing dirty data back to the backing device. If the dirty data set is large, it might take a very long time for bch_writeback_thread() to scan all the dirty buckets and release dc->writeback_lock. In such a case, update_writeback_rate() can be starved for long enough that the kernel reports a soft lockup warning started like:

  watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713]

This soft lockup condition is unnecessary, because after the writeback thread finishes its job and releases dc->writeback_lock, the kworker update_writeback_rate() may continue to work and everything is fine indeed.

This patch avoids the unnecessary soft lockup by the following method (see the sketch below):

- Add a new member to struct cached_dev: dc->rate_update_retry (0 by default)
- In update_writeback_rate(), call down_read_trylock(&dc->writeback_lock) first; if it fails, lock contention has happened
- If dc->rate_update_retry <= BCH_WBRATE_UPDATE_MAX_SKIPS (15), don't acquire the lock and reschedule the kworker for the next try
- If dc->rate_update_retry > BCH_WBRATE_UPDATE_MAX_SKIPS, don't retry anymore and call down_read(&dc->writeback_lock) to wait for the lock

By the above method, in the worst case update_writeback_rate() may retry for 1+ minutes before blocking on dc->writeback_lock by calling down_read(). For a 4TB cache device with 1TB of dirty data, 90%+ of the unnecessary soft lockup warning messages can be avoided.

When retrying to acquire dc->writeback_lock in update_writeback_rate(), of course the writeback rate cannot be updated. This is fair, because when the kworker is blocked on the lock contention of dc->writeback_lock, the writeback rate cannot be updated either.

This change follows Jens Axboe's suggestion for a clearer and simpler version.

Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20220528124550.32834-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
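A sketch of the retry logic described in the list above, with the kworker plumbing elided; member names follow the commit text and kernel context is assumed:

```
#define BCH_WBRATE_UPDATE_MAX_SKIPS 15

/* Sketch of update_writeback_rate()'s lock handling: skip up to
 * BCH_WBRATE_UPDATE_MAX_SKIPS contended periods, then block. */
static bool wbrate_try_lock(struct cached_dev *dc)
{
        if (down_read_trylock(&dc->writeback_lock)) {
                dc->rate_update_retry = 0;
                return true;            /* lock held: update the rate */
        }
        if (dc->rate_update_retry <= BCH_WBRATE_UPDATE_MAX_SKIPS) {
                dc->rate_update_retry++;
                return false;           /* skip: reschedule the kworker */
        }
        down_read(&dc->writeback_lock); /* too many skips: wait for it */
        dc->rate_update_retry = 0;
        return true;
}
```

The bounded skip count is what keeps the watchdog quiet without letting the rate update be postponed forever.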
-
- 27 May 2022, 2 commits
-
-
Committed by Jia-Ju Bai
The function kzalloc() in detached_dev_do_request() can fail, so its return value should be checked.

Fixes: bc082a55 ("bcache: fix inaccurate io state for detached bcache devices")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20220527152818.27545-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Coly Li
The local variables check_state (in bch_btree_check()) and state (in bch_sectors_dirty_init()) should be fully zero-filled, because before being moved onto the stack they were dynamically allocated with kzalloc(), which returns zeroed memory.

Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-