1. 26 Apr 2022 (6 commits)
    • md: replace deprecated strlcpy & remove duplicated line · 92d9aac9
      Committed by Heming Zhao
      This commit includes two topics:
      
      1> replace deprecated strlcpy
      
      Change strlcpy() to strscpy(), since strlcpy() is marked as deprecated
      in Documentation/process/deprecated.rst.
      
      2> remove duplicated strlcpy line
      
      In md_bitmap_read_sb() in md-bitmap.c there are two duplicated
      strlcpy() calls. The history:
      
      - commit cf921cc1 ("Add node recovery callbacks") introduced the first
        usage of strlcpy().
      
      - commit b97e9257 ("Use separate bitmaps for each nodes in the cluster")
        introduced the second strlcpy(). At that point the two strlcpy()
        calls were identical, so either one could be removed safely.
      
      - commit d3b178ad ("md: Skip cluster setup for dm-raid") added special
        handling for dm-raid. The "nodes" value is the key of this patch:
        from d3b178ad on, the strlcpy() introduced by b97e9257 becomes
        necessary.
      
      - commit 3c462c88 ("md: Increment version for clustered bitmaps") used
        the clustered major version so that this code only runs in a
        clustered environment. That patch can be seen as a cleanup of the
        clustered code logic.
      
      So the strlcpy() introduced by cf921cc1 became useless after d3b178ad,
      and we can remove it safely.
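
      A minimal sketch of the replacement pattern (illustrative buffer; not
      the actual md-bitmap.c hunk):

        /* strscpy() returns the copied length, or -E2BIG on truncation,
         * unlike the deprecated strlcpy() which returns the source length */
        char name[64];

        if (strscpy(name, src, sizeof(name)) < 0)
                pr_warn("md: name truncated\n");
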
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md/bitmap: don't set sb values if can't pass sanity check · e68cb83a
      Committed by Heming Zhao
      If the bitmap area contains invalid data, the kernel will crash and
      mdadm then reports "Segmentation fault".
      This is a cluster-md specific bug. In a non-clustered environment,
      mdadm handles the broken-metadata case. In a clustered array, only
      kernel space handles the bitmap slot info. But even though this bug
      only happens in a clustered environment, the current sanity check is
      wrong and the code should be changed.
      
      How to trigger (fault injection):
      
      dd if=/dev/zero bs=1M count=1 oflag=direct of=/dev/sda
      dd if=/dev/zero bs=1M count=1 oflag=direct of=/dev/sdb
      mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda /dev/sdb
      mdadm -Ss
      echo aaa > magic.txt
       == the following modifies the slot 2 bitmap data ==
      dd if=magic.txt of=/dev/sda seek=16384 bs=1 count=3 <== destroy magic
      dd if=/dev/zero of=/dev/sda seek=16436 bs=1 count=4 <== ZERO chunksize
      mdadm -A /dev/md0 /dev/sda /dev/sdb
       == kernel crashes. mdadm outputs "Segmentation fault" ==
      
      Reason for the kernel crash:

      In md_bitmap_read_sb() (called by md_bitmap_create()), a bad bitmap
      magic did not block the chunksize assignment, and the zero value made
      DIV_ROUND_UP_SECTOR_T() trigger a "divide error".
      
      Crash log:
      
      kernel: md: md0 stopped.
      kernel: md/raid1:md0: not clean -- starting background reconstruction
      kernel: md/raid1:md0: active with 2 out of 2 mirrors
      kernel: dlm: ... ...
      kernel: md-cluster: Joined cluster 44810aba-38bb-e6b8-daca-bc97a0b254aa slot 1
      kernel: md0: invalid bitmap file superblock: bad magic
      kernel: md_bitmap_copy_from_slot can't get bitmap from slot 2
      kernel: md-cluster: Could not gather bitmaps from slot 2
      kernel: divide error: 0000 [#1] SMP NOPTI
      kernel: CPU: 0 PID: 1603 Comm: mdadm Not tainted 5.14.6-1-default
      kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      kernel: RIP: 0010:md_bitmap_create+0x1d1/0x850 [md_mod]
      kernel: RSP: 0018:ffffc22ac0843ba0 EFLAGS: 00010246
      kernel: ... ...
      kernel: Call Trace:
      kernel:  ? dlm_lock_sync+0xd0/0xd0 [md_cluster 77fe..7a0]
      kernel:  md_bitmap_copy_from_slot+0x2c/0x290 [md_mod 24ea..d3a]
      kernel:  load_bitmaps+0xec/0x210 [md_cluster 77fe..7a0]
      kernel:  md_bitmap_load+0x81/0x1e0 [md_mod 24ea..d3a]
      kernel:  do_md_run+0x30/0x100 [md_mod 24ea..d3a]
      kernel:  md_ioctl+0x1290/0x15a0 [md_mod 24ea....d3a]
      kernel:  ? mddev_unlock+0xaa/0x130 [md_mod 24ea..d3a]
      kernel:  ? blkdev_ioctl+0xb1/0x2b0
      kernel:  block_ioctl+0x3b/0x40
      kernel:  __x64_sys_ioctl+0x7f/0xb0
      kernel:  do_syscall_64+0x59/0x80
      kernel:  ? exit_to_user_mode_prepare+0x1ab/0x230
      kernel:  ? syscall_exit_to_user_mode+0x18/0x40
      kernel:  ? do_syscall_64+0x69/0x80
      kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
      kernel: RIP: 0033:0x7f4a15fa722b
      kernel: ... ...
      kernel: ---[ end trace 8afa7612f559c868 ]---
      kernel: RIP: 0010:md_bitmap_create+0x1d1/0x850 [md_mod]
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: fix an incorrect NULL check in md_reload_sb · 64c54d92
      Committed by Xiaomeng Tong
      The bug is here:
      	if (!rdev || rdev->desc_nr != nr) {
      
      The list iterator value 'rdev' will *always* be set and non-NULL
      by rdev_for_each_rcu(), so it is incorrect to assume that the
      iterator value will be NULL if the list is empty or no element is
      found (in fact, it will be a bogus pointer to an invalid struct
      object containing the list HEAD). Such a bogus pointer can pass the
      check and lead to an invalid memory access.
      
      To fix the bug, use a new variable 'iter' as the list iterator,
      while using the original variable 'rdev' as a dedicated pointer to
      point to the found element.
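
      A hedged sketch of the pattern (error handling elided; not the exact
      md.c hunk):

        /* iterate with 'iter'; only publish a real match through 'rdev' */
        struct md_rdev *rdev = NULL, *iter;

        rdev_for_each_rcu(iter, mddev)
                if (iter->desc_nr == nr) {
                        rdev = iter;
                        break;
                }

        if (!rdev)
                return;         /* nothing found: 'rdev' is genuinely NULL */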
      
      Cc: stable@vger.kernel.org
      Fixes: 70bcecdb ("md-cluster: Improve md_reload_sb to be less error prone")
      Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: fix an incorrect NULL check in does_sb_need_changing · fc873834
      Committed by Xiaomeng Tong
      The bug is here:
      	if (!rdev)
      
      The list iterator value 'rdev' will *always* be set and non-NULL
      by rdev_for_each(), so it is incorrect to assume that the iterator
      value will be NULL if the list is empty or no element is found.
      Such a bogus pointer can pass the NULL check and lead to an invalid
      memory access.
      
      To fix the bug, use a new variable 'iter' as the list iterator,
      while using the original variable 'rdev' as a dedicated pointer to
      point to the found element.
      
      Cc: stable@vger.kernel.org
      Fixes: 2aa82191 ("md-cluster: Perform a lazy update")
      Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
      Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • raid5: introduce MD_BROKEN · 57668f0a
      Committed by Mariusz Tkaczyk
      The raid456 module used to allow reaching the failed state. That was
      changed by fb73b357 ("raid5: block failing device if raid will be
      failed"), but the change introduced a bug: if raid5 now fails during
      IO, it may result in a hung task that never completes. The Faulty
      flag on the device is needed to process all requests and is checked
      many times, mainly in analyze_stripe().
      Allow setting Faulty on a drive again, and set MD_BROKEN if the raid
      is failed.
      
      As a result, this level is again allowed to reach the failed state,
      but communication with userspace (via the -EBUSY status) is
      preserved.
      
      This restores the possibility to fail an array via the
      '#mdadm --set-faulty' command; the remaining concerns will be
      addressed by additional verification on the mdadm side.
      
      Reproduction steps:
       mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
       mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
       mkfs.xfs /dev/md126 -f
       mount /dev/md126 /mnt/root/
      
       fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
      --bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
      --time_based --group_reporting --name=throughput-test-job
      --eta-newline=1 &
      
       echo 1 > /sys/block/nvme2n1/device/device/remove
       echo 1 > /sys/block/nvme1n1/device/device/remove
      
       [ 1475.787779] Call Trace:
       [ 1475.793111] __schedule+0x2a6/0x700
       [ 1475.799460] schedule+0x38/0xa0
       [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
       [ 1475.813856] ? finish_wait+0x80/0x80
       [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
       [ 1475.828281] ? finish_wait+0x80/0x80
       [ 1475.834727] ? finish_wait+0x80/0x80
       [ 1475.841127] ? finish_wait+0x80/0x80
       [ 1475.847480] md_handle_request+0x119/0x190
       [ 1475.854390] md_make_request+0x8a/0x190
       [ 1475.861041] generic_make_request+0xcf/0x310
       [ 1475.868145] submit_bio+0x3c/0x160
       [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
       [ 1475.882070] iomap_dio_bio_actor+0x175/0x390
       [ 1475.889149] iomap_apply+0xff/0x310
       [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.909974] iomap_dio_rw+0x2f2/0x490
       [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.923680] ? atime_needs_update+0x77/0xe0
       [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
       [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
       [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
       [ 1475.953403] aio_read+0xd5/0x180
       [ 1475.959395] ? _cond_resched+0x15/0x30
       [ 1475.965907] io_submit_one+0x20b/0x3c0
       [ 1475.972398] __x64_sys_io_submit+0xa2/0x180
       [ 1475.979335] ? do_io_getevents+0x7c/0xc0
       [ 1475.986009] do_syscall_64+0x5b/0x1a0
       [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
       [ 1476.000255] RIP: 0033:0x7f11fc27978d
       [ 1476.006631] Code: Bad RIP value.
       [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
      
      Cc: stable@vger.kernel.org
      Fixes: fb73b357 ("raid5: block failing device if raid will be failed")
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: Set MD_BROKEN for RAID1 and RAID10 · 9631abdb
      Committed by Mariusz Tkaczyk
      There is no direct mechanism to determine raid failure outside the
      personality. It is done by checking rdev->flags after executing
      md_error(). If the "faulty" flag is not set, then -EBUSY is returned
      to userspace. -EBUSY means that the array will be failed after the
      drive removal.

      Mdadm has a special routine to handle array failure, and it is
      executed when -EBUSY is returned by md.

      There are at least two known reasons not to consider this mechanism
      correct:
      1. a drive can be removed even if the array will be failed [1].
      2. -EBUSY seems to be the wrong status: the array is not busy, but
         the removal cannot proceed safely.
      
      The -EBUSY expectation cannot be removed without breaking
      compatibility with userspace. This patch resolves the first issue by
      adding support for the MD_BROKEN flag to RAID1 and RAID10. Support
      for RAID456 is added in the next commit.
      
      The idea is to set MD_BROKEN once we are sure the raid is in a failed
      state. This is done in each error_handler(). md_error() then checks
      the MD_BROKEN flag; if it is set, -EBUSY is returned to userspace.
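
      A hedged sketch of the described flow (simplified;
      raid_is_now_failed() is an illustrative helper, not a real symbol):

        /* personality error_handler(): mark the array broken once it is
         * truly failed, then fail the member device as before */
        static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
        {
                if (raid_is_now_failed(mddev, rdev))
                        set_bit(MD_BROKEN, &mddev->flags);
                set_bit(Faulty, &rdev->flags);
                /* md_error() later tests MD_BROKEN, and the ioctl path
                 * then reports -EBUSY to userspace */
        }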
      
      As in the previous commit, this makes '#mdadm --set-faulty' able to
      fail the array. The previously proposed workaround remains valid if
      the optional functionality [1] is disabled.
      
      [1] commit 9a567843 ("md: allow last device to be forcibly removed
          from RAID1/RAID10.")
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: Song Liu <song@kernel.org>
  2. 18 Apr 2022 (6 commits)
  3. 16 Apr 2022 (1 commit)
    • dm: fix bio length of empty flush · 92b914e2
      Committed by Shin'ichiro Kawasaki
      Commit 92986f6b ("dm: use bio_clone_fast in alloc_io/alloc_tio")
      removed the bio_clone_fast() call from alloc_tio() when ci->io->tio
      is available. In this case, ci->bio is not copied to
      ci->io->tio.clone. This is fine, since init_clone_info() sets the
      same values for ci->bio and ci->io->tio.clone.
      
      However, when an incoming bio has the REQ_PREFLUSH flag,
      __send_empty_flush() prepares a zero-length bio on the stack and sets
      it as ci->bio. At this point ci->io->tio.clone still has a non-zero
      length. When alloc_tio() chooses this ci->io->tio.clone as the bio to
      map, it is passed to targets as a non-empty flush bio. This causes a
      bio length check failure in dm-zoned and unexpected operations such
      as a dm_accept_partial_bio() call.
      
      To avoid the non-empty flush bio, set the length of ci->io->tio.clone
      to zero in __send_empty_flush().
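
      A hedged sketch of the described change inside __send_empty_flush()
      (surrounding lines abbreviated; may not match the upstream hunk
      exactly):

        /* the flush bio built on the stack is already zero-length ... */
        ci->bio = &flush_bio;
        ci->sector_count = 0;
        /* ... but the reused per-io clone must be zeroed as well, so that
         * targets never see a stale non-zero size on an empty flush */
        ci->io->tio.clone.bi_iter.bi_size = 0;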
      
      Fixes: 92986f6b ("dm: use bio_clone_fast in alloc_io/alloc_tio")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
  4. 15 Apr 2022 (1 commit)
  5. 14 Apr 2022 (3 commits)
  6. 02 Apr 2022 (2 commits)
    • dm: fix bio polling to handle possibile BLK_STS_AGAIN · 52919840
      Committed by Ming Lei
      Expanded testing of DM's bio polling support (using more fio threads
      to dm-linear on top of null_blk) exposed the possibility for polled
      bios to hang (repeatedly polling in io_uring) when null_blk responds
      with BLK_STS_AGAIN (due to lack of resources):
      
      1) io_complete_rw_iopoll() is called from blkdev_bio_end_io_async() to
         signal that the kiocb is done; this is the completion interface
         between the block layer and io_uring

      2) io_complete_rw_iopoll() is called from io_do_iopoll()

      3) dm returns BLK_STS_AGAIN for one bio (on behalf of the underlying
         driver), then io_complete_rw_iopoll() is called, but io_do_iopoll()
         doesn't handle -EAGAIN at all (due to the logic in
         io_rw_should_reissue())

      4) the reason for dm's BLK_STS_AGAIN is that the underlying null_blk
         driver ran out of requests (easier to reproduce by setting a low
         hw_queue_depth)

      5) dm should handle BLK_STS_AGAIN for polled underlying IO and may
         retry in the dm layer
      
      This fix adds REQ_POLLED-specific BLK_STS_AGAIN handling to
      dm_io_complete(): it clears REQ_POLLED and requeues the bio to DM
      using queue_io().
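
      A hedged sketch of that handling (variable names and placement
      simplified; not the exact dm_io_complete() hunk):

        /* a polled bio that came back as BLK_STS_AGAIN is downgraded to
         * IRQ mode and resubmitted through DM's own requeue path */
        if (io_error == BLK_STS_AGAIN && (bio->bi_opf & REQ_POLLED)) {
                bio->bi_opf &= ~REQ_POLLED;     /* retry without polling */
                queue_io(md, bio);              /* requeue to DM */
                return;
        }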
      
      Fixes: b99fdcdc ("dm: support bio polling")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      [snitzer: revised header, reused dm_io_complete's REQ_POLLED case]
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    • dm: fix dm_io and dm_target_io flags race condition on Alpha · aad5b23e
      Committed by Mikulas Patocka
      Early Alpha processors cannot write a single byte or a 16-bit word;
      they read 8 bytes, modify the value in registers, and write back 8
      bytes.

      This can cause a race condition in struct dm_io if the fields 'flags'
      and 'io_count' are modified simultaneously.

      Fix this bug by using 32-bit flags when we are on Alpha and compiling
      for a processor that doesn't have the byte-word extension.
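
      A hedged sketch of the approach (the Kconfig symbols used to detect
      the byte-word extension, and the typedef itself, are assumptions for
      illustration):

        /* Alpha without BWX (EV56 and later have it): only whole-word
         * stores are safe, so widen the flags to avoid a racy
         * read-modify-write of neighbouring struct members */
        #if defined(CONFIG_ALPHA) && !defined(CONFIG_ALPHA_EV56) && \
            !defined(CONFIG_ALPHA_EV6) && !defined(CONFIG_ALPHA_EV67)
        typedef unsigned int dm_io_flags_t;     /* 32-bit stores only */
        #else
        typedef unsigned short dm_io_flags_t;   /* sub-word stores are fine */
        #endif
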
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Fixes: bd4a6dd2 ("dm: reduce size of dm_io and dm_target_io structs")
      [snitzer: Jens allowed this change since Mikulas owns a relevant Alpha!]
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
  7. 01 Apr 2022 (2 commits)
  8. 22 Mar 2022 (4 commits)
  9. 11 Mar 2022 (6 commits)
  10. 10 Mar 2022 (1 commit)
    • dm: support bio polling · b99fdcdc
      Committed by Ming Lei
      Support bio polling (REQ_POLLED) with the following approach:
      
      1) only support IO polling for normal READ/WRITE; other, abnormal IOs
      still fall back to IRQ mode, so the target io (and DM's clone bio) is
      exactly inside the dm io.
      
      2) hold one refcnt on io->io_count after submitting this dm bio with
      REQ_POLLED
      
      3) support dm-native bio splitting: any dm io instance associated
      with the current bio will be added to one list whose head is stored
      in bio->bi_private, which is restored before ending this bio
      
      4) implement the .poll_bio() callback: call bio_poll() on the single
      target bio inside each dm io, retrieved via bio->bi_bio_drv_data, and
      call dm_io_dec_pending() after the target io is done in .poll_bio()
      (see the sketch after this list)
      
      5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
      which is based on Jeffle's previous patch.
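
      A hedged sketch of the .poll_bio() flow from 3) and 4) above (the
      saved-list helper, the link field, and the completion check are
      illustrative, not the exact dm.c code):

        static int dm_poll_bio(struct bio *bio, struct io_comp_batch *iob,
                               unsigned int flags)
        {
                /* head of the dm_io list saved at submission, see 3) */
                struct dm_io *io = dm_get_saved_io_list(bio);   /* illustrative */
                int polled = 0;

                for (; io; io = io->next_polled) {      /* illustrative link */
                        /* poll the single cloned target bio in this dm_io */
                        polled += bio_poll(&io->tio.clone, iob, flags);
                        if (atomic_read(&io->io_count) == 1)
                                dm_io_dec_pending(io, BLK_STS_OK);
                }
                return polled;
        }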
      
      These changes are good for a 30-35% IOPS improvement for polled IO.
      
      For detailed test results please see (Jens, thanks for testing!):
      https://listman.redhat.com/archives/dm-devel/2022-March/049868.html
      or https://marc.info/?l=linux-block&m=164684246214700&w=2
      Tested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  11. 09 Mar 2022 (6 commits)
  12. 08 Mar 2022 (1 commit)
  13. 07 Mar 2022 (1 commit)