1. 15 Jul, 2022 · 1 commit
  2. 23 May, 2022 · 1 commit
  3. 26 Apr, 2022 · 1 commit
    • md: Set MD_BROKEN for RAID1 and RAID10 · 9631abdb
      Mariusz Tkaczyk authored
      There is no direct mechanism to determine a RAID failure outside the
      personality. It is done by checking rdev->flags after executing
      md_error(). If the "faulty" flag is not set, then -EBUSY is returned to
      userspace. -EBUSY means that removing the drive would fail the array.
      
      Mdadm has a special routine to handle array failure, and it is executed
      when -EBUSY is returned by md.
      
      There are at least two known reasons not to consider this mechanism
      correct:
      1. The drive can be removed even if doing so fails the array [1].
      2. -EBUSY seems to be the wrong status. The array is not busy, but the
         removal process cannot proceed safely.
      
      The -EBUSY expectation cannot be removed without breaking compatibility
      with userspace. This patch resolves the first issue by adding support
      for the MD_BROKEN flag to RAID1 and RAID10. Support for RAID456 is added
      in the next commit.
      
      The idea is to set MD_BROKEN once we are sure that the array is now in a
      failed state. This is done in each error_handler(). md_error() then
      checks the MD_BROKEN flag; if it is set, -EBUSY is returned to userspace
      (see the sketch after this entry).
      
      As in the previous commit, this means that mdadm --set-faulty is able to
      fail the array. The previously proposed workaround remains valid when
      the optional functionality [1] is disabled.
      
      [1] commit 9a567843 ("md: allow last device to be forcibly removed from
          RAID1/RAID10.")
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: Song Liu <song@kernel.org>
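      A minimal, hedged sketch of the two-step flow described above (the
      personality's error handler decides, md_error() reports). It is a
      compilable userspace model, not the raid1.c/md.c code: the struct, the
      flag bit and all helper names are illustrative assumptions, and only the
      idea of setting MD_BROKEN on the last working mirror and reporting
      -EBUSY once it is set comes from the commit message.

         #include <errno.h>
         #include <stdio.h>

         #define MD_BROKEN (1u << 0)    /* illustrative bit, not the kernel value */

         struct model_array {
             unsigned long flags;       /* stands in for mddev->flags */
             int raid_disks;            /* configured mirrors */
             int degraded;              /* mirrors already failed */
         };

         /* Personality-side error handler: mark the array broken only when
          * the device being failed is the last working mirror. */
         static void model_error_handler(struct model_array *md)
         {
             if (md->raid_disks - md->degraded <= 1)
                 md->flags |= MD_BROKEN;
             else
                 md->degraded++;
         }

         /* Core-side md_error() analogue: after the personality handler has
          * run, report -EBUSY to userspace if the array is now broken. */
         static int model_md_error(struct model_array *md)
         {
             model_error_handler(md);
             return (md->flags & MD_BROKEN) ? -EBUSY : 0;
         }

         int main(void)
         {
             struct model_array md = { .flags = 0, .raid_disks = 2, .degraded = 0 };

             printf("fail 1st mirror -> %d\n", model_md_error(&md));   /* 0 */
             printf("fail 2nd mirror -> %d\n", model_md_error(&md));   /* -EBUSY (-16) */
             return 0;
         }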
  4. 18 Apr, 2022 · 3 commits
  5. 09 Mar, 2022 · 1 commit
  6. 08 Mar, 2022 · 1 commit
  7. 07 Mar, 2022 · 1 commit
  8. 23 Feb, 2022 · 1 commit
  9. 04 Feb, 2022 · 1 commit
  10. 02 Feb, 2022 · 2 commits
  11. 07 Jan, 2022 · 2 commits
  12. 04 Jan, 2022 · 1 commit
    • md/raid1: fix missing bitmap update w/o WriteMostly devices · 46669e86
      Song Liu authored
      Commit [1] causes missing bitmap updates when there aren't any
      WriteMostly devices.
      
      Detailed steps to reproduce, by Norbert (which somehow didn't make it to
      lore):
      
         # setup md10 (raid1) with two drives (1 GByte sparse files)
         dd if=/dev/zero of=disk1 bs=1024k seek=1024 count=0
         dd if=/dev/zero of=disk2 bs=1024k seek=1024 count=0
      
         losetup /dev/loop11 disk1
         losetup /dev/loop12 disk2
      
         mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/loop11 /dev/loop12
      
         # add bitmap (aka write-intent log)
         mdadm /dev/md10 --grow --bitmap=internal
      
         echo check > /sys/block/md10/md/sync_action
      
         root:# cat /sys/block/md10/md/mismatch_cnt
         0
         root:#
      
         # remove member drive disk2 (loop12)
         mdadm /dev/md10 -f loop12 ; mdadm /dev/md10 -r loop12
      
         # modify degraded md device
         dd if=/dev/urandom of=/dev/md10 bs=512 count=1
      
         # no blocks recorded as out of sync on the remaining member disk1/loop11
         root:# mdadm -X /dev/loop11 | grep Bitmap
                   Bitmap : 16 bits (chunks), 0 dirty (0.0%)
         root:#
      
         # re-add disk2, nothing synced because of empty bitmap
         mdadm /dev/md10 --re-add /dev/loop12
      
         # check integrity again
         echo check > /sys/block/md10/md/sync_action
      
         # disk1 and disk2 are no longer in sync, reads return different data
         root:# cat /sys/block/md10/md/mismatch_cnt
         128
         root:#
      
         # clean up
         mdadm -S /dev/md10
         losetup -d /dev/loop11
         losetup -d /dev/loop12
         rm disk1 disk2
      
      Fix this by moving the WriteMostly check into the if condition around
      alloc_behind_master_bio() (see the sketch after this entry).
      
      [1] commit fd3b6975 ("md/raid1: only allocate write behind bio for WriteMostly device")
      Fixes: fd3b6975 ("md/raid1: only allocate write behind bio for WriteMostly device")
      Cc: stable@vger.kernel.org # v5.12+
      Cc: Guoqing Jiang <guoqing.jiang@linux.dev>
      Cc: Jens Axboe <axboe@kernel.dk>
      Reported-by: Norbert Warmuth <nwarmuth@t-online.de>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Song Liu <song@kernel.org>
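      A hedged, compilable model of the condition reshaping described above.
      The struct fields and function names are invented for illustration, and
      the real raid1_write_request() does far more; only the point that the
      WriteMostly test moves from the outer first-clone branch into the inner
      behind-bio condition is taken from the message.

         #include <stdbool.h>
         #include <stdio.h>

         /* Illustrative per-write context; not raid1.c variables. */
         struct wctx {
             bool first_clone;      /* first device cloned for this write */
             bool write_mostly;     /* the rdev is marked WriteMostly */
             bool behind_possible;  /* bitmap present, behind-write limits ok */
             int  bitmap_updates;   /* write-intent bitmap bookkeeping done */
             int  behind_allocs;    /* alloc_behind_master_bio() calls */
         };

         /* Buggy shape: the WriteMostly test gates the whole first-clone
          * block, so the bitmap bookkeeping is skipped when no device is
          * WriteMostly. */
         static void buggy_shape(struct wctx *c)
         {
             if (c->first_clone && c->write_mostly) {
                 if (c->behind_possible)
                     c->behind_allocs++;
                 c->bitmap_updates++;    /* only reached for WriteMostly */
             }
         }

         /* Fixed shape per the message: the WriteMostly test moves into the
          * inner condition for the behind-bio allocation, so the bitmap
          * bookkeeping in the outer block always runs for the first clone. */
         static void fixed_shape(struct wctx *c)
         {
             if (c->first_clone) {
                 if (c->behind_possible && c->write_mostly)
                     c->behind_allocs++;
                 c->bitmap_updates++;    /* runs with or without WriteMostly */
             }
         }

         int main(void)
         {
             struct wctx a = { .first_clone = true };
             struct wctx b = a;

             buggy_shape(&a);
             fixed_shape(&b);
             printf("bitmap updates: buggy=%d fixed=%d\n",
                    a.bitmap_updates, b.bitmap_updates);    /* 0 vs 1 */
             return 0;
         }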
  13. 19 Oct, 2021 · 2 commits
  14. 28 Aug, 2021 · 1 commit
    • raid1: ensure write behind bio has less than BIO_MAX_VECS sectors · 6607cd31
      Guoqing Jiang authored
      We can't split a write-behind bio with more than BIO_MAX_VECS sectors;
      otherwise the call trace below is triggered, because we could allocate
      an oversized write-behind bio later (see the sketch after this entry).
      
      [ 8.097936] bvec_alloc+0x90/0xc0
      [ 8.098934] bio_alloc_bioset+0x1b3/0x260
      [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
      [ 8.100988] ? __bio_clone_fast+0xa8/0xe0
      [ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
      [ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
      [ 8.104084] submit_bio_noacct+0x139/0x530
      [ 8.105127] submit_bio+0x78/0x1d0
      [ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
      [ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
      [ 8.108300] ? do_writepages+0x41/0x100
      [ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
      [ 8.110406] do_writepages+0x41/0x100
      [ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
      [ 8.112513] file_write_and_wait_range+0x61/0xb0
      [ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
      [ 8.114607] __x64_sys_fsync+0x33/0x60
      [ 8.115635] do_syscall_64+0x33/0x40
      [ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Thanks to Christoph for the comment.
      
      [1] https://bugs.archlinux.org/task/70992
      
      Cc: stable@vger.kernel.org # v5.12+
      Reported-by: Jens Stutte <jens@chianterastutte.eu>
      Tested-by: Jens Stutte <jens@chianterastutte.eu>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
      Signed-off-by: Song Liu <songliubraving@fb.com>
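      A hedged sketch of the kind of size cap the message implies: limit the
      sectors handled in one pass so that a write-behind bio built later (one
      page per bio_vec) can never need more than BIO_MAX_VECS vectors. The
      function name, its parameters and the standalone constants below are
      assumptions for the sketch, not the actual raid1.c change.

         #include <stdio.h>

         #define BIO_MAX_VECS 256       /* 256 in recent kernels */
         #define PAGE_SIZE    4096      /* assumed 4 KiB pages */
         #define SECTOR_SHIFT 9         /* 512-byte sectors */

         /* Clamp the per-pass sector count when a write-behind bio may be
          * allocated for it later. */
         static int cap_write_behind_sectors(int max_sectors, int write_behind,
                                             int have_bitmap)
         {
             int limit = BIO_MAX_VECS * (PAGE_SIZE >> SECTOR_SHIFT);  /* 2048 */

             if (write_behind && have_bitmap && max_sectors > limit)
                 max_sectors = limit;
             return max_sectors;
         }

         int main(void)
         {
             /* A 3 MiB request (6144 sectors) is clamped to 2048 sectors when
              * write-behind is in play, and left alone otherwise. */
             printf("%d\n", cap_write_behind_sectors(6144, 1, 1));  /* 2048 */
             printf("%d\n", cap_write_behind_sectors(6144, 0, 1));  /* 6144 */
             return 0;
         }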
  15. 24 Jul, 2021 · 1 commit
  16. 15 Jun, 2021 · 2 commits
  17. 24 Apr, 2021 · 1 commit
  18. 28 Jan, 2021 · 1 commit
  19. 25 Jan, 2021 · 1 commit
  20. 05 Dec, 2020 · 1 commit
  21. 09 Jul, 2020 · 1 commit
  22. 01 Jul, 2020 · 1 commit
  23. 14 May, 2020 · 1 commit
    • md/raid1: release pending accounting for an I/O only after write-behind is also finished · c91114c2
      David Jeffery authored
      When using RAID1 and write-behind, md can deadlock when errors occur.
      With write-behind, r1bio structs can be accounted by raid1 as queued but
      not counted as pending. The pending count is dropped when the original
      bio is returned as complete, but write-behind for the r1bio may still be
      active.
      
      This breaks the accounting used in some conditions to know when the
      raid1 md device has reached an idle state, and it can result in calls to
      freeze_array() deadlocking: freeze_array() never completes, because a
      negative "unqueued" value is calculated when the queued count is larger
      than the pending count.
      
      To properly account for write-behind, move the call to allow_barrier()
      from call_bio_endio() to raid_end_bio_io(). When using write-behind, md
      can call call_bio_endio() before all write-behind I/O is complete. Using
      raid_end_bio_io() as the point to call allow_barrier() releases the
      pending count at a point where all I/O for an r1bio, even write-behind,
      is done (see the sketch after this entry).
      
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
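      A hedged toy model of the accounting mismatch described above:
      freeze_array() can only finish once the pending count has caught up with
      the queued count, so releasing the pending count in call_bio_endio(),
      before write-behind finishes, leaves the counts permanently
      inconsistent. All names and numbers below are illustrative, not the
      raid1.c structures.

         #include <stdio.h>

         struct toy_conf {
             int nr_pending;    /* released by allow_barrier() */
             int nr_queued;     /* r1bios still being handled */
         };

         /* In the spirit of freeze_array(): it can only complete when the
          * pending count is at least the queued count; if pending was dropped
          * too early, the wait never ends. */
         static int freeze_would_complete(const struct toy_conf *c)
         {
             return c->nr_pending >= c->nr_queued;
         }

         int main(void)
         {
             /* Old placement: the original bio has ended (pending already
              * released) while the r1bio with write-behind is still queued. */
             struct toy_conf old_shape = { .nr_pending = 0, .nr_queued = 1 };

             /* New placement: allow_barrier() runs in raid_end_bio_io(), after
              * write-behind finishes, so the counts stay consistent. */
             struct toy_conf new_shape = { .nr_pending = 1, .nr_queued = 1 };

             printf("old placement completes: %d\n",
                    freeze_would_complete(&old_shape));   /* 0: deadlock */
             printf("new placement completes: %d\n",
                    freeze_would_complete(&new_shape));   /* 1 */
             return 0;
         }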
  24. 14 Jan, 2020 · 5 commits
  25. 12 Dec, 2019 · 1 commit
  26. 12 Nov, 2019 · 1 commit
  27. 25 Oct, 2019 · 1 commit
    • md: improve handling of bio with REQ_PREFLUSH in md_flush_request() · 775d7831
      David Jeffery authored
      If pers->make_request fails in md_flush_request(), the bio is lost. To
      fix this, pass back a bool to indicate whether the original make_request
      call should continue to handle the I/O, instead of assuming the flush
      logic will push it to completion.
      
      Convert md_flush_request() to return a bool and no longer call the raid
      driver's make_request function. If the return is true, then the md flush
      logic has completed (or will complete) the bio and the md make_request
      call is done. If false, then the md make_request function needs to keep
      processing it like a normal bio. Let the original call to
      md_handle_request() handle any need to retry sending the bio to the raid
      driver's make_request function, should that be needed (see the sketch
      after this entry).
      
      Also mark md_flush_request and the make_request function pointer as
      __must_check to issue warnings should these critical return values be
      ignored.
      
      Fixes: 2bc13b83 ("md: batch flush requests.")
      Cc: stable@vger.kernel.org # v4.19+
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
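      A hedged sketch of the calling convention described above, with void
      pointers standing in for struct mddev and struct bio so that it compiles
      on its own; only the true/false contract is taken from the message, the
      rest is an assumption for illustration.

         #include <stdbool.h>
         #include <stdio.h>

         /* Returns true when the md flush machinery has taken ownership of
          * the bio (it has completed it or will complete it); returns false
          * when the caller must keep processing the bio as a normal request. */
         static bool md_flush_request_model(void *mddev, void *bio)
         {
             (void)mddev;
             (void)bio;
             return true;    /* pretend the flush logic handled it */
         }

         /* Shape of a personality's make_request after the conversion. */
         static bool personality_make_request_model(void *mddev, void *bio,
                                                    bool has_preflush)
         {
             if (has_preflush && md_flush_request_model(mddev, bio))
                 return true;    /* flush logic owns the bio; nothing left to do */

             /* ... otherwise handle the bio as a normal read/write ... */
             return true;
         }

         int main(void)
         {
             printf("%d\n", personality_make_request_model(NULL, NULL, true));
             return 0;
         }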
  28. 04 Sep, 2019 · 1 commit
    • md/raid1: fail run raid1 array when active disk less than one · 07f1a685
      Yufen Yu authored
      When running the test case:
        mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] --assume-clean --bitmap=internal
        mdadm -S /dev/md1
        mdadm -A /dev/md1 /dev/sd[b-c] --run --force
      
        mdadm --zero /dev/sda
        mdadm /dev/md1 -a /dev/sda
      
        echo offline > /sys/block/sdc/device/state
        echo offline > /sys/block/sdb/device/state
        sleep 5
        mdadm -S /dev/md1
      
        echo running > /sys/block/sdb/device/state
        echo running > /sys/block/sdc/device/state
        mdadm -A /dev/md1 /dev/sd[a-c] --run --force
      
      mdadm fails to run, with kernel messages as follows:
      [  172.986064] md: kicking non-fresh sdb from array!
      [  173.004210] md: kicking non-fresh sdc from array!
      [  173.022383] md/raid1:md1: active with 0 out of 4 mirrors
      [  173.022406] md1: failed to create bitmap (-5)
      
      In fact, when the raid1 array has fewer than one active disk,
      raid1_run() needs to return failure (see the sketch after this entry).
      
      Reviewed-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
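      A hedged sketch of the check the message asks for, with a toy struct in
      place of the raid1 conf/mddev state; only the rule "fewer than one
      active disk means refuse to start" is taken from the message, the names
      and return value are assumptions.

         #include <stdio.h>

         struct toy_raid1 {
             int raid_disks;    /* configured mirrors */
             int degraded;      /* mirrors missing or failed */
         };

         /* Refuse to start the array when it has fewer than one active
          * (in-sync) disk, instead of coming up "active with 0 out of N
          * mirrors" and failing later at bitmap creation. */
         static int toy_raid1_run(const struct toy_raid1 *conf)
         {
             if (conf->raid_disks - conf->degraded < 1)
                 return -1;     /* fail assembly up front */
             return 0;
         }

         int main(void)
         {
             struct toy_raid1 dead = { .raid_disks = 4, .degraded = 4 };
             struct toy_raid1 ok   = { .raid_disks = 4, .degraded = 2 };

             printf("0 of 4 mirrors -> %d\n", toy_raid1_run(&dead));  /* -1 */
             printf("2 of 4 mirrors -> %d\n", toy_raid1_run(&ok));    /*  0 */
             return 0;
         }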
  29. 08 Aug, 2019 · 2 commits