Commits · 52f019d43c229afd65dc11c8c1b05b6436bf6765 · openeuler / Kernel

25 Jan, 2021 · 1 commit

block: add a hard-readonly flag to struct gendisk · 52f019d4

Authored by Christoph Hellwig on Jan 09, 2021

Commit 20bd1d02 ("scsi: sd: Keep disk read-only when re-reading
partition") addressed a long-standing problem with user read-only
policy being overridden as a result of a device-initiated revalidate.
The commit has since been reverted due to a regression that left some
USB devices read-only indefinitely.

To fix the underlying problems with revalidate we need to keep track
of hardware state and user policy separately.

The gendisk has been updated to reflect the current hardware state set
by the device driver. This is done to allow returning the device to
the hardware state once the user clears the BLKROSET flag.

The resulting semantics are as follows:

- If BLKROSET sets a given partition read-only, that partition will
remain read-only even if the underlying storage stack initiates a
revalidate. However, the BLKRRPART ioctl will cause the partition
table to be dropped and any user policy on partitions will be lost.

- If BLKROSET has not been set, both the whole disk device and any
partitions will reflect the current write-protect state of the
underlying device.
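
The semantics listed above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the `disk_model` and `part_model` structs and the `model_*` helpers are hypothetical names, not the kernel's data layout or API. The point is that the driver-reported hardware state and the BLKROSET user policy live in separate flags, and the effective state is derived from both.

```c
#include <stdbool.h>

/* Hypothetical model, not the kernel's layout: the hardware state is
 * kept per disk, the user policy per partition. */
struct disk_model {
    bool hw_read_only;              /* set by the device driver */
};

struct part_model {
    const struct disk_model *disk;
    bool user_read_only;            /* set or cleared via BLKROSET */
};

/* Driver-side update, e.g. after a revalidate. User policy is untouched. */
static void model_set_hw_read_only(struct disk_model *d, bool ro)
{
    d->hw_read_only = ro;
}

/* User-side update, standing in for the BLKROSET ioctl. */
static void model_blkroset(struct part_model *p, bool ro)
{
    p->user_read_only = ro;
}

/* A partition is read-only if either the user or the hardware says so. */
static bool model_part_read_only(const struct part_model *p)
{
    return p->user_read_only || p->disk->hw_read_only;
}
```

Because the two flags are independent, clearing the user policy in this model returns the partition to whatever the hardware currently reports, which is the behavior the commit message describes.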

Based on a patch from Martin K. Petersen <martin.petersen@oracle.com>.

Reported-by: Oleksii Kurochko <olkuroch@cisco.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10 Dec, 2020 · 3 commits

scsi: block: Do not accept any requests while suspended · 52abca64

Authored by Alan Stern on Dec 08, 2020

blk_queue_enter() accepts BLK_MQ_REQ_PM requests independent of the runtime
power management state. Now that SCSI domain validation no longer depends
on this behavior, modify the behavior of blk_queue_enter() as follows:

   - Do not accept any requests while suspended.

   - Only process power management requests while suspending or resuming.

Submitting BLK_MQ_REQ_PM requests to a device that is runtime suspended
causes runtime-suspended devices not to resume as they should. The request
which should cause a runtime resume instead gets issued directly, without
resuming the device first. Of course the device can't handle it properly,
the I/O fails, and the device remains suspended.

The problem is fixed by checking that the queue's runtime-PM status isn't
RPM_SUSPENDED before allowing a request to be issued, and queuing a
runtime-resume request if it is.  In particular, the inline
blk_pm_request_resume() routine is renamed blk_pm_resume_queue() and the
code is unified by merging the surrounding checks into the routine.  If the
queue isn't set up for runtime PM, or there currently is no restriction on
allowed requests, the request is allowed.  Likewise if the BLK_MQ_REQ_PM
flag is set and the status isn't RPM_SUSPENDED.  Otherwise a runtime resume
is queued and the request is blocked until conditions are more suitable.
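
The admission rule in the paragraph above can be sketched as a stand-alone decision function. This is a simplified model, not the kernel routine: `model_queue`, its fields, and `model_pm_resume_queue()` are hypothetical, `pm_only` stands in for "there is a restriction on allowed requests", and a counter stands in for queuing a runtime resume.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the runtime-PM states named in the text. */
enum model_rpm_status {
    MODEL_RPM_ACTIVE,
    MODEL_RPM_RESUMING,
    MODEL_RPM_SUSPENDED,
    MODEL_RPM_SUSPENDING
};

struct model_queue {
    bool runtime_pm_enabled;        /* queue is set up for runtime PM */
    bool pm_only;                   /* requests are currently restricted */
    enum model_rpm_status rpm_status;
    unsigned int resume_requests;   /* counts queued runtime resumes */
};

/* Returns true if the request may be issued now. Otherwise a runtime
 * resume is recorded and the caller is expected to wait and retry. */
static bool model_pm_resume_queue(bool pm, struct model_queue *q)
{
    if (!q->runtime_pm_enabled || !q->pm_only)
        return true;                /* no restriction applies */
    if (pm && q->rpm_status != MODEL_RPM_SUSPENDED)
        return true;                /* PM request while suspending or resuming */
    q->resume_requests++;           /* stands in for queuing a runtime resume */
    return false;
}
```

In this model a power-management request sent to a fully suspended queue is refused and triggers a resume, which is the case the commit message singles out as previously broken.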

[ bvanassche: modified commit message and removed Cc: stable because
  without the previous patches from this series this patch would break
  parallel SCSI domain validation + introduced queue_rpm_status() ]

Link: https://lore.kernel.org/r/20201209052951.16136-9-bvanassche@acm.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Can Guo <cang@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-and-tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT · a4d34da7

Authored by Bart Van Assche on Dec 08, 2020

Remove flag RQF_PREEMPT and BLK_MQ_REQ_PREEMPT since these are no longer
used by any kernel code.

Link: https://lore.kernel.org/r/20201209052951.16136-8-bvanassche@acm.org
Cc: Can Guo <cang@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Martin Kepplinger <martin.kepplinger@puri.sm>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: block: Introduce BLK_MQ_REQ_PM · 0854bcdc

Authored by Bart Van Assche on Dec 08, 2020

Introduce the BLK_MQ_REQ_PM flag. This flag makes the request allocation
functions set RQF_PM. This is the first step towards removing
BLK_MQ_REQ_PREEMPT.

Link: https://lore.kernel.org/r/20201209052951.16136-3-bvanassche@acm.org
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Can Guo <cang@codeaurora.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

05 Dec, 2020 · 2 commits

block: remove the request_queue argument to the block_bio_remap tracepoint · 1c02fca6

Authored by Christoph Hellwig on Dec 03, 2020

The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: simplify and extend the block_bio_merge tracepoint class · e8a676d6

Authored by Christoph Hellwig on Dec 03, 2020

The block_bio_merge tracepoint class can be reused for most bio-based
tracepoints.  For that it just needs to lose the superfluous q and rq
parameters.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

02 Dec, 2020 · 7 commits

block: switch partition lookup to use struct block_device · 8446fe92

Authored by Christoph Hellwig on Nov 24, 2020

Use struct block_device to look up partitions on a disk.  This removes
all usage of struct hd_struct from the I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: allocate struct hd_struct as part of struct bdev_inode · cb8432d6

Authored by Christoph Hellwig on Nov 26, 2020

Allocate hd_struct together with struct block_device to pre-load
the lifetime rule changes in preparation for merging the two structures.

Note that part0 was previously embedded into struct gendisk, but is
a separate allocation now, and already points to the block_device instead
of the hd_struct.  The lifetime of struct gendisk is still controlled by
the struct device embedded in the part0 hd_struct.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move the policy field to struct block_device · 83950d35

Authored by Christoph Hellwig on Nov 23, 2020

Move the policy field to struct block_device and rename it to the
more descriptive bd_read_only.  Also turn the field into a bool as it
is used as such.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move make_it_fail to struct block_device · b309e993

Authored by Christoph Hellwig on Nov 23, 2020

Move the make_it_fail flag to struct block_device and turn it into a bool
in preparation for killing struct hd_struct.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move the start_sect field to struct block_device · 29ff57c6

Authored by Christoph Hellwig on Nov 24, 2020

Move the start_sect field to struct block_device in preparation
for killing struct hd_struct.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move disk stat accounting to struct block_device · 15e3d2c5

Authored by Christoph Hellwig on Nov 24, 2020

Move the dkstats and stamp fields to struct block_device in preparation
for killing struct hd_struct.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove the nr_sects field in struct hd_struct · a782483c

Authored by Christoph Hellwig on Nov 26, 2020

Now that the hd_struct always has a block device attached to it, there is
no need for having two size fields that just get out of sync.

Additionally the field in hd_struct did not use proper serialization,
possibly allowing for torn writes.  By only using the block_device field
this problem also gets fixed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14 Oct, 2020 · 1 commit

block: add zone specific block statuses · 3b481d91

Authored by Keith Busch on Sep 24, 2020

A zoned device with limited resources to open or activate zones may
return an error when the host exceeds those limits. The same command may
be successful if retried later, but the host needs to wait for specific
zone states before it should expect a retry to succeed. Have the block
layer provide an appropriate status for these conditions so applications
can distinguish this error for special handling.

Cc: linux-api@vger.kernel.org
Cc: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

09 Oct, 2020 · 1 commit

block: ratelimit handle_bad_sector() message · f4ac712e

Authored by Tetsuo Handa on Oct 08, 2020

syzbot is reporting an unkillable task [1], because the caller fails to
handle a corrupted filesystem image which attempts to access beyond
the end of the device. While we need to fix the caller, flooding the
console with handle_bad_sector() messages is unlikely to be useful.

[1] https://syzkaller.appspot.com/bug?id=f1f49fb971d7a3e01bd8ab8cff2ff4572ccf3092

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

06 Oct, 2020 · 1 commit

block: make blk_crypto_rq_bio_prep() able to fail · 93f221ae

Authored by Eric Biggers on Sep 15, 2020

blk_crypto_rq_bio_prep() assumes its gfp_mask argument always includes
__GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.

However, blk_crypto_rq_bio_prep() might be called with GFP_ATOMIC via
setup_clone() in drivers/md/dm-rq.c.

This case isn't currently reachable with a bio that actually has an
encryption context.  However, it's fragile to rely on this.  Just make
blk_crypto_rq_bio_prep() able to fail.

Suggested-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

25 Sep, 2020 · 3 commits

block: add QUEUE_FLAG_NOWAIT · 021a2446

Authored by Mike Snitzer on Sep 23, 2020

Add QUEUE_FLAG_NOWAIT to allow a block device to advertise support for
REQ_NOWAIT. Bio-based devices may set QUEUE_FLAG_NOWAIT where
applicable.

Update QUEUE_FLAG_MQ_DEFAULT to include QUEUE_FLAG_NOWAIT.  Also
update submit_bio_checks() to verify it is set for REQ_NOWAIT bios.
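
The check described above can be sketched as a pure function over two flag words. This is a minimal model, not the kernel code: the `MODEL_*` bit values and the `model_check_nowait()` helper are hypothetical, and returning a "not supported" status for the rejected case is an assumption of this sketch.

```c
/* Hypothetical bit assignments, for illustration only. */
#define MODEL_REQ_NOWAIT        (1u << 0)   /* bio asks not to block */
#define MODEL_QUEUE_FLAG_NOWAIT (1u << 0)   /* queue advertises support */

enum model_status { MODEL_OK, MODEL_NOTSUPP };

/* A bio that requests no-wait behavior is only accepted when the
 * target queue has advertised that it can honor it. */
static enum model_status model_check_nowait(unsigned int bio_flags,
                                            unsigned int queue_flags)
{
    if ((bio_flags & MODEL_REQ_NOWAIT) &&
        !(queue_flags & MODEL_QUEUE_FLAG_NOWAIT))
        return MODEL_NOTSUPP;
    return MODEL_OK;
}
```

Bios that do not set the no-wait bit pass through unchanged regardless of the queue flag.
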

Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bdi: remove BDI_CAP_CGROUP_WRITEBACK · ed7b6b4f

Authored by Christoph Hellwig on Sep 24, 2020

Just checking SB_I_CGROUPWB for cgroup writeback support is enough.
Either the file system allocates its own bdi (e.g. btrfs), in which case
it is known to support cgroup writeback, or the bdi comes from the block
layer, which always supports cgroup writeback.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bdi: initialize ->ra_pages and ->io_pages in bdi_init · 55b2598e

Authored by Christoph Hellwig on Sep 24, 2020

Set up a readahead size by default, as very few users have a good
reason to change it.  This means coda, ecryptfs, and orangefs now
set up the values while they were previously missing it, while ubifs,
mtd and vboxsf manually set it to 0 to avoid readahead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Acked-by: Richard Weinberger <richard@nod.at> [ubifs, mtd]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

12 Sep, 2020 · 1 commit

block: introduce part_[begin|end]_io_acct · 7b26410b

Authored by Song Liu on Aug 31, 2020

These functions can be used to enable iostat for partitions on devices
like md and bcache.

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

04 Sep, 2020 · 1 commit

blk-mq: Record nr_active_requests per queue for when using shared sbitmap · bccf5e26

Authored by John Garry on Aug 19, 2020

The per-hctx nr_active value can no longer be used to fairly assign a share
of tag depth per request queue for when using a shared sbitmap, as it does
not consider that the tags are shared tags over all hctx's.

For this case, record the nr_active_requests per request_queue, and make
the judgement based on that value.
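
One way to picture the per-queue judgement is the sketch below. It is an illustration under assumptions, not the kernel's implementation: `model_queue`, `model_may_queue()`, and the plain ceiling division used for the fair share are all hypothetical choices made for this example.

```c
#include <stdbool.h>

/* Hypothetical model: one counter per request queue instead of one
 * counter per hardware context. */
struct model_queue {
    unsigned int nr_active_requests;
};

/* Decide whether a queue may take another tag from a tag set that is
 * shared by `active_queues` queues. Each queue gets an equal share of
 * the shared depth, rounded up. */
static bool model_may_queue(const struct model_queue *q,
                            unsigned int shared_depth,
                            unsigned int active_queues)
{
    unsigned int share;

    if (active_queues == 0)
        return true;                /* nobody to share with */
    share = (shared_depth + active_queues - 1) / active_queues;
    return q->nr_active_requests < share;
}
```

Counting per queue matters here because, with a shared sbitmap, requests issued through any hardware context draw from the same pool of tags.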

Co-developed-with: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

02 Sep, 2020 · 4 commits

block: better deal with the delayed not supported case in blk_cloned_rq_check_limits · 8327cce5

Authored by Ritika Srivastava on Sep 01, 2020

If WRITE_ZERO/WRITE_SAME operation is not supported by the storage,
blk_cloned_rq_check_limits() will return IO error which will cause
device-mapper to fail the paths.

Instead, if the queue limit is set to 0, return BLK_STS_NOTSUPP.
BLK_STS_NOTSUPP will be ignored by device-mapper and will not fail the
paths.
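
The distinction drawn above can be sketched as follows. This is a simplified model rather than the kernel function: the `model_blk_status` enum and `model_check_limits()` are hypothetical names, and sizes are plain sector counts.

```c
/* Hypothetical status codes standing in for the block layer's. */
enum model_blk_status {
    MODEL_STS_OK,
    MODEL_STS_NOTSUPP,   /* operation not supported by the device */
    MODEL_STS_IOERR      /* generic I/O error */
};

/* Compare a cloned request's size against the queue limit for its
 * operation. A limit of 0 means the operation is not supported at all,
 * which is reported differently from a request that is merely too big. */
static enum model_blk_status model_check_limits(unsigned int rq_sectors,
                                                unsigned int max_sectors)
{
    if (rq_sectors > max_sectors) {
        if (max_sectors == 0)
            return MODEL_STS_NOTSUPP;
        return MODEL_STS_IOERR;
    }
    return MODEL_STS_OK;
}
```

A caller such as a multipath layer can then treat the "not supported" result as a property of the operation instead of a sign that the path has failed.
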

Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ritika Srivastava <ritika.srivastava@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Return blk_status_t instead of errno codes · 143d2600

Authored by Ritika Srivastava on Sep 01, 2020

Replace returning legacy errno codes with blk_status_t in
blk_cloned_rq_check_limits().

Signed-off-by: Ritika Srivastava <ritika.srivastava@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: use BLK_MQ_NO_TAG for no tag · e44a6a23

Authored by Xianting Tian on Aug 27, 2020

Replace various magic -1 constants for tags with BLK_MQ_NO_TAG.

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Move bio merge related functions into blk-merge.c · 8e756373

Authored by Baolin Wang on Aug 28, 2020

It's better to move bio merge related functions into blk-merge.c,
which contains all merge related functions.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

01 Sep, 2020 · 1 commit

block: ensure bdi->io_pages is always initialized · de1b0ee4

Authored by Jens Axboe on Aug 31, 2020

If a driver leaves the limit settings as the defaults, then we don't
initialize bdi->io_pages. This means that file systems may need to
work around bdi->io_pages == 0, which is somewhat messy.

Initialize the default value just like we do for ->ra_pages.

Cc: stable@vger.kernel.org
Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

08 Jul, 2020 · 1 commit

block: remove a bogus warning in __submit_bio_noacct_mq · 0e6e255e

Authored by Christoph Hellwig on Jul 07, 2020

If blk_mq_submit_bio flushes the plug list, bios for other disks can
show up on current->bio_list.  As that doesn't involve any stacking of
block device it is entirely harmless and we should not warn about
this case.

Fixes: ff93ea0c ("block: shortcut __submit_bio_noacct for blk-mq drivers")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

03 Jul, 2020 · 1 commit

block: initialize current->bio_list[1] in __submit_bio_noacct_mq · 7c792f33

Authored by Christoph Hellwig on Jul 02, 2020

bio_alloc_bioset references current->bio_list[1], so we need to
initialize it for the blk-mq submission path as well.

Fixes: ff93ea0c ("block: shortcut __submit_bio_noacct for blk-mq drivers")
Reported-by: Qian Cai <cai@lca.pw>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

01 Jul, 2020 · 8 commits

block: remove direct_make_request · 5a6c35f9

Authored by Christoph Hellwig on Jul 01, 2020

Now that submit_bio_noacct has a decent blk-mq fast path there is no
more need for this bypass.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: shortcut __submit_bio_noacct for blk-mq drivers · ff93ea0c

Authored by Christoph Hellwig on Jul 01, 2020

For blk-mq drivers bios can only be inserted for the same queue.  So
bypass the complicated sorting logic in __submit_bio_noacct with
a simpler blk-mq submission helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: refactor submit_bio_noacct · 566acf2d

Authored by Christoph Hellwig on Jul 01, 2020

Split out a __submit_bio_noacct helper for the actual de-recursion
algorithm, and simplify the loop by using a continue when we can't
enter the queue for a bio.
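
The idea behind the de-recursion algorithm can be sketched in user space. This is a toy model, not the kernel code: every `model_*` name is hypothetical, a small global ring buffer stands in for the per-task pending list, and a "stacked" device is simulated by a bio whose `level` counts how many times it is remapped downward before completing.

```c
#include <stddef.h>

#define MODEL_MAX_PENDING 16

/* level > 0 means the bio targets a stacked device that resubmits it
 * one level down; level == 0 means it reaches real hardware. */
struct model_bio { int level; };

static struct model_bio *pending[MODEL_MAX_PENDING];
static size_t head, tail;       /* ring buffer indices */
static int active;              /* nonzero while the submit loop runs */
static int depth, max_depth;    /* handler nesting, for demonstration */
static int completed;

static void model_submit(struct model_bio *bio);

/* Stands in for a driver's handler, which may resubmit the bio. */
static void model_handle(struct model_bio *bio)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    if (bio->level > 0) {
        bio->level--;
        model_submit(bio);      /* queued, not handled recursively */
    } else {
        completed++;
    }
    depth--;
}

static void model_submit(struct model_bio *bio)
{
    if (active) {
        /* Called from inside a handler: defer to the outer loop. */
        pending[tail++ % MODEL_MAX_PENDING] = bio;
        return;
    }
    active = 1;
    do {
        model_handle(bio);
        bio = (head != tail) ? pending[head++ % MODEL_MAX_PENDING] : NULL;
    } while (bio);
    active = 0;
}
```

However deep the stack of devices, the handler nesting in this model never exceeds one, because nested submissions are turned into iterations of the outer loop.
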

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: rename generic_make_request to submit_bio_noacct · ed00aabd

Authored by Christoph Hellwig on Jul 01, 2020

generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move ->make_request_fn to struct block_device_operations · c62b37d9

Authored by Christoph Hellwig on Jul 01, 2020

The make_request_fn is a little weird in that it sits directly in
struct request_queue instead of an operation vector.  Replace it with
a block_device_operations method called submit_bio (which describes much
better what it does).  Also remove the request_queue argument to it, as
the queue can be derived pretty trivially from the bio.
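
The shape of this change can be sketched with a small operations vector. The types below are a hypothetical user-space model, not the kernel's definitions: the entry point lives in a per-disk table of function pointers, takes only the bio, and recovers the queue from it.

```c
#include <stddef.h>

struct model_bio;

/* Hypothetical operations vector with a submit_bio method. */
struct model_block_device_operations {
    int (*submit_bio)(struct model_bio *bio);
};

struct model_queue { int submitted; };

struct model_disk {
    struct model_queue queue;
    const struct model_block_device_operations *fops;
};

struct model_bio { struct model_disk *disk; };

/* A driver's method: no queue argument, the queue comes from the bio. */
static int model_driver_submit_bio(struct model_bio *bio)
{
    struct model_queue *q = &bio->disk->queue;

    q->submitted++;
    return 0;
}

static const struct model_block_device_operations model_fops = {
    .submit_bio = model_driver_submit_bio,
};

/* Core-side dispatch: look the method up through the disk. */
static int model_submit(struct model_bio *bio)
{
    const struct model_block_device_operations *ops = bio->disk->fops;

    if (ops == NULL || ops->submit_bio == NULL)
        return -1;              /* no bio-based entry point registered */
    return ops->submit_bio(bio);
}
```

Keeping the method in the operations table groups it with the driver's other callbacks, instead of storing a lone function pointer in the queue structure.
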

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove the nr_sectors variable in generic_make_request_checks · e439ab71

Authored by Christoph Hellwig on Jul 01, 2020

The variable is only used once, so just open code the bio_sector()
there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove the NULL queue check in generic_make_request_checks · 833f84e2

Authored by Christoph Hellwig on Jul 01, 2020

All registered disks must have a valid queue pointer, so don't bother to
log a warning for that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: tidy up a warning in bio_check_ro · c8178674

Authored by Christoph Hellwig on Jul 01, 2020

The "generic_make_request: " prefix has no value, and will soon become
stale.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

29 Jun, 2020 · 1 commit

blk-cgroup: remove blkcg_bio_issue_check · db18a53e

Authored by Christoph Hellwig on Jun 27, 2020

blkcg_bio_issue_check is a giant inline function that does three entirely
different things.  Factor out the blk-cgroup related bio initialization
into a new helper, and then open code the sequence in the only caller,
relying on the fact that all the actual functionality is stubbed out for
non-cgroup builds.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

24 Jun, 2020 · 3 commits

block: create the request_queue debugfs_dir on registration · 85e0cbbb

Authored by Luis Chamberlain on Jun 19, 2020

We were creating the request_queue debugfs_dir only
for make_request block drivers (multiqueue), but never for
request-based block drivers. We did this as we were only
creating non-blktrace additional debugfs files on that directory
for make_request drivers. However, since blktrace *always* creates
that directory anyway, we special-case the use of that directory
on blktrace. Other than this being an eyesore, this exposes
request-based block drivers to the same debugfs fragile
race that used to exist with make_request block drivers
where if we start adding files onto that directory we can later
run a race with a double removal of dentries on the directory
if we don't deal with this carefully on blktrace.

Instead, just simplify things by always creating the request_queue
debugfs_dir on request_queue registration. Rename the mutex also to
reflect the fact that this is used outside of the blktrace context.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: revert back to synchronous request_queue removal · e8c7d14a

Authored by Luis Chamberlain on Jun 19, 2020

Commit dc9edc44 ("block: Fix a blk_exit_rl() regression") merged on
v4.12 moved the work behind blk_release_queue() into a workqueue after a
splat floated around which indicated some work on blk_release_queue()
could sleep in blk_exit_rl(). This splat would be possible when a driver
called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue()
as its final call) from an atomic context.

blk_put_queue() decrements the refcount for the request_queue kobject, and
upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is
now removed through commit db6d9952 ("block: remove request_list code")
on v5.0, we reserve the right to be able to sleep within
blk_release_queue() context.
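
The refcount behavior described above can be sketched with a minimal model. All names here are hypothetical and the model is single-threaded; it only shows that, with synchronous removal, the release work has already run by the time the final put returns.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an object with a kobject-style refcount. */
struct model_queue {
    unsigned int refcount;
    bool released;
};

/* Release work. In the scenario described in the text this may sleep,
 * so the final put must not happen in atomic context. */
static void model_release(struct model_queue *q)
{
    q->released = true;
}

static void model_get(struct model_queue *q)
{
    q->refcount++;
}

/* Drop one reference. When the count reaches 0 the release runs right
 * here, before the function returns, instead of being deferred. */
static void model_put(struct model_queue *q)
{
    if (--q->refcount == 0)
        model_release(q);
}
```

A caller that drops the last reference and then checks the object's state sees it already released, which is the expectation the commit message attributes to userspace scripts.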

The last reference for the request_queue must not be called from atomic
context. *When* the last reference to the request_queue reaches 0 varies,
and so let's take the opportunity to document when that is expected to
happen and also document the context of the related calls as best as
possible so we can avoid future issues, and with the hopes that the
synchronous request_queue removal sticks.

We revert back to synchronous request_queue removal because asynchronous
removal creates a regression with expected userspace interaction with
several drivers. An example is when removing the loopback driver, one
uses ioctls from userspace to do so, but upon return and if successful,
one expects the device to be removed. Likewise if one races to add another
device the new one may not be added as it is still being removed. This was
expected behavior before and it now fails as the device is still present
and busy still. Moving to asynchronous request_queue removal could have
broken many scripts which relied on the removal to have been completed if
there was no error. Document this expectation as well so that this
doesn't regress userspace again.

Using asynchronous request_queue removal however has helped us find
other bugs. In the future we can test what could break with this
arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE.

While at it, update the docs with the context expectations for the
request_queue / gendisk refcount decrement, and make these
expectations explicit by using might_sleep().

Fixes: dc9edc44 ("block: Fix a blk_exit_rl() regression")
Suggested-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: yu kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: clarify context for refcount increment helpers · 763b5892

Authored by Luis Chamberlain on Jun 19, 2020

Let us clarify the context under which the helpers to increment the
refcount for the gendisk and request_queue can be called. We
make this explicit on the places where we may sleep with might_sleep().

We don't address the decrement context yet, as that needs some extra
work and fixes, but will be addressed in the next patch.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
