提交 · 760f83ea63a7b2c12b3b1e656b24cbb880b6b814 · openeuler / Kernel

29 4月, 2020 2 次提交

block: cleanup the memory stall accounting in submit_bio · 760f83ea

由 Christoph Hellwig 提交于 4月 28, 2020

Instead of a convoluted chain just check for REQ_OP_READ directly,
and keep all the memory stall code together in a single unlikely
branch.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

760f83ea

block: improve the submit_bio and generic_make_request documentation · 3fdd4086

由 Christoph Hellwig 提交于 4月 28, 2020

The current documentation is a little weird, as it doesn't clearly
explain which function to use, and also has the guts of the information
on generic_make_request, which is the internal interface for stacking
drivers.

Fix this up by properly documenting submit_bio, and only documenting
the differences and the use case for generic_make_request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3fdd4086

25 4月, 2020 2 次提交

block: bypass ->make_request_fn for blk-mq drivers · 8cf7961d

由 Christoph Hellwig 提交于 4月 25, 2020

Call blk_mq_make_request when no ->make_request_fn is set.  This is
safe now that blk_alloc_queue always sets up the pointer for make_request
based drivers.  This avoids an indirect call in the blk-mq driver I/O
fast path, which is rather expensive due to spectre mitigations.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8cf7961d

block: remove create_io_context · 3e82c348

由 Christoph Hellwig 提交于 4月 25, 2020

create_io_context just has a single caller, which also happens to not
even use the return value.  Just open code it there.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3e82c348

23 4月, 2020 1 次提交

block: move dma_pad handling from blk_rq_map_sg into the callers · bdf8710d

由 Christoph Hellwig 提交于 4月 14, 2020

There are only two callers of blk_rq_map_sg/__blk_rq_map_sg that set
the dma_pad value in the queue.  Move the handling into those callers
instead of burdening the common code, and move the ->extra_len field
from struct request to struct scsi_cmnd.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bdf8710d

30 3月, 2020 1 次提交

block: return NULL in blk_alloc_queue() on error · 654a3667

由 Chaitanya Kulkarni 提交于 3月 29, 2020

This patch fixes follwoing warning:

block/blk-core.c: In function ‘blk_alloc_queue’:
block/blk-core.c:558:10: warning: returning ‘int’ from a function with return type ‘struct request_queue *’ makes pointer from integer without a cast [-Wint-conversion]
   return -EINVAL;

Fixes: 3d745ea5 ("block: simplify queue allocation")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

654a3667

28 3月, 2020 1 次提交

block: simplify queue allocation · 3d745ea5

由 Christoph Hellwig 提交于 3月 27, 2020

Current make_request based drivers use either blk_alloc_queue_node or
blk_alloc_queue to allocate a queue, and then set up the make_request_fn
function pointer and a few parameters using the blk_queue_make_request
helper. Simplify this by passing the make_request pointer to
blk_alloc_queue, and while at it merge the _node variant into the main
helper by always passing a node_id, and remove the superfluous gfp_mask
parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3d745ea5

25 3月, 2020 2 次提交

block/diskstats: replace time_in_queue with sum of request times · 8cd5b8fc

由 Konstantin Khlebnikov 提交于 3月 25, 2020

Column "time_in_queue" in diskstats is supposed to show total waiting time
of all requests. I.e. value should be equal to the sum of times from other
columns. But this is not true, because column "time_in_queue" is counted
separately in jiffies rather than in nanoseconds as other times.

This patch removes redundant counter for "time_in_queue" and shows total
time of read, write, discard and flush requests.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8cd5b8fc

block/diskstats: more accurate approximation of io_ticks for slow disks · 2b8bd423

由 Konstantin Khlebnikov 提交于 3月 25, 2020

Currently io_ticks is approximated by adding one at each start and end of
requests if jiffies counter has changed. This works perfectly for requests
shorter than a jiffy or if one of requests starts/ends at each jiffy.

If disk executes just one request at a time and they are longer than two
jiffies then only first and last jiffies will be accounted.

Fix is simple: at the end of request add up into io_ticks jiffies passed
since last update rather than just one jiffy.

Example: common HDD executes random read 4k requests around 12ms.

fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
iostat -x 10 sdb

Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch:

Before:

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0,00 0,00 82,60 0,00 330,40 0,00 8,00 0,96 12,09 12,09 0,00 1,02 8,43

After:

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0,00 0,00 82,50 0,00 330,00 0,00 8,00 1,00 12,10 12,10 0,00 12,12 99,99

Now io_ticks does not loose time between start and end of requests, but
for queue-depth > 1 some I/O time between adjacent starts might be lost.

For load estimation "%util" is not as useful as average queue length,
but it clearly shows how often disk queue is completely empty.

Fixes: 5b18b5a7 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2b8bd423

12 3月, 2020 4 次提交

block: cleanup for _blk/blk_rq_prep_clone · 361301a2

由 Guoqing Jiang 提交于 3月 09, 2020

Both cmd and sense had been moved to scsi_request, so remove
the related comments to avoid confusion.

And as Bart suggested, move _blk_rq_prep_clone into the only
caller (blk_rq_prep_clone).
Signed-off-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

361301a2

block: remove redundant setting of QUEUE_FLAG_DYING · fc4cc772

由 Guoqing Jiang 提交于 3月 09, 2020

Previously, blk_cleanup_queue has called blk_set_queue_dying to set the
flag, no need to do it again.
Signed-off-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fc4cc772

block: use bio_{wouldblock,io}_error in direct_make_request · 35ed78b3

由 Guoqing Jiang 提交于 3月 09, 2020

Use the two functions to simplify code.
Signed-off-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

35ed78b3

block: fix comment for blk_cloned_rq_check_limits · 0d720318

由 Guoqing Jiang 提交于 3月 09, 2020

Since the later description mentioned "checked against the new queue
limits", so make the change to avoid confusion.
Signed-off-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0d720318

02 3月, 2020 1 次提交

block: Remove used kblockd_schedule_work_on() · e959e540

由 Daniel Wagner 提交于 3月 02, 2020

Commit ee63cfa7 ("block: add kblockd_schedule_work_on()")
introduced the helper in 2016. Remove it because since then no caller
was added.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NDaniel Wagner <dwagner@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e959e540

18 12月, 2019 1 次提交

block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT · c58c1f83

由 Roman Penyaev 提交于 12月 17, 2019

Non-mq devs do not honor REQ_NOWAIT so give a chance to the caller to repeat
request gracefully on -EAGAIN error.

The problem is well reproduced using io_uring:

   mkfs.ext4 /dev/ram0
   mount /dev/ram0 /mnt

   # Preallocate a file
   dd if=/dev/zero of=/mnt/file bs=1M count=1

   # Start fio with io_uring and get -EIO
   fio --rw=write --ioengine=io_uring --size=1M --direct=1 --name=job --filename=/mnt/file
Signed-off-by: NRoman Penyaev <rpenyaev@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c58c1f83

12 12月, 2019 1 次提交

block: fix NULL pointer dereference in account statistics with IDE · ecb6186c

由 Logan Gunthorpe 提交于 12月 10, 2019

The IDE driver creates some passthru requests which never get
submitted to the block layer in such a way that blk_account_io_start()
gets called. However, the driver still calls __blk_mq_end_request() in
ide_end_rq() which will call blk_account_io_completion() which tries
to dereferences req->part which is never set. See ide_prep_sense() for
an example of where these requests come from.

To fix this, blk_account_io_completion() and blk_account_io_done()
should do nothing if req->part is not set.

The back trace of this bug is:

    BUG: kernel NULL pointer dereference, address: 000002ac
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    *pde = 00000000
    Oops: 0002 [#1]
    CPU: 0 PID: 237 Comm: kworker/0:1H Not tainted
    5.4.0-rc2-00011-g48d9b0d4 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1
    04/01/2014
    Workqueue: kblockd drive_rq_insert_work
    EIP: blk_account_io_completion+0x7a/0xf0
    Code: 89 54 24 08 31 d2 89 4c 24 04 31 c9 c7 04 24 02 00 00 00 c1 ee
    09 e8 f5 21 a6 ff e8 70 5c a7 ff 8b 53 60 8d 04 bd 00 00 00 00 <01> b4
    02 ac 02 00 00 8b 9a 88 02 00 00 85 db 74 11 85 d2 74 51 8b
    EAX: 00000000 EBX: f5b80000 ECX: 00000000 EDX: 00000000
    ESI: 00000000 EDI: 00000000 EBP: f3031e70 ESP: f3031e54
    DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010046
    CR0: 80050033 CR2: 000002ac CR3: 03c25000 CR4: 000406d0
    Call Trace:
     <IRQ>
      blk_update_request+0x85/0x420
      ide_end_rq+0x38/0xa0
      ide_complete_rq+0x3d/0x70
      cdrom_newpc_intr+0x258/0xba0
      ide_intr+0x135/0x250
      __handle_irq_event_percpu+0x3e/0x250
      handle_irq_event_percpu+0x1f/0x50
      handle_irq_event+0x32/0x60
      handle_level_irq+0x6c/0x110
      handle_irq+0x72/0xa0
      </IRQ>
      do_IRQ+0x45/0xad
      common_interrupt+0x115/0x11c

Fixes: 48d9b0d4 ("block: account statistics for passthrough requests")
Reported-by: Nkernel test robot <rong.a.chen@intel.com>
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ecb6186c

10 12月, 2019 1 次提交

treewide: Use sizeof_field() macro · c593642c

由 Pankaj Bharadiya 提交于 12月 09, 2019

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
Signed-off-by: NPankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.comCo-developed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net

c593642c

13 11月, 2019 1 次提交

block: Remove partition support for zoned block devices · 5eac3eb3

由 Damien Le Moal 提交于 11月 11, 2019

No known partitioning tool supports zoned block devices, especially the
host managed flavor with strong sequential write constraints.
Furthermore, there are also no known user nor use cases for partitioned
zoned block devices.

This patch removes partition device creation for zoned block devices,
which allows simplifying the processing of zone commands for zoned
block devices. A warning is added if a partition table is found on the
device.

For report zones operations no zone sector information remapping is
necessary anymore, simplifying the code. Of note is that remapping of
zone reports for DM targets is still necessary as done by
dm_remap_zone_report().

Similarly, remaping of a zone reset bio is not necessary anymore.
Testing for the applicability of the zone reset all request also becomes
simpler and only needs to check that the number of sectors of the
requested zone range is equal to the disk capacity.
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5eac3eb3

07 11月, 2019 1 次提交

block: add zone open, close and finish operations · 6c1b1da5

由 Ajay Joshi 提交于 10月 27, 2019

Zoned block devices (ZBC and ZAC devices) allow an explicit control
over the condition (state) of zones. The operations allowed are:
* Open a zone: Transition to open condition to indicate that a zone will
  actively be written
* Close a zone: Transition to closed condition to release the drive
  resources used for writing to a zone
* Finish a zone: Transition an open or closed zone to the full
  condition to prevent write operations

To enable this control for in-kernel zoned block device users, define
the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE
and REQ_OP_ZONE_FINISH as well as the generic function
blkdev_zone_mgmt() for submitting these operations on a range of zones.
This results in blkdev_reset_zones() removal and replacement with this
new zone magement function. Users of blkdev_reset_zones() (f2fs and
dm-zoned) are updated accordingly.

Contains contributions from Matias Bjorling, Hans Holmberg,
Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.
Reviewed-by: NJavier González <javier@javigon.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAjay Joshi <ajay.joshi@wdc.com>
Signed-off-by: NMatias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: NHans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6c1b1da5

07 10月, 2019 2 次提交

block: Reduce sysfs_lock locking inside blk_cleanup_queue() · 73f1c77e

由 Bart Van Assche 提交于 9月 30, 2019

Since blk_cleanup_queue() is called after blk_unregister_queue() and
since that last function removes all sysfs attributes, serializing
any code in blk_cleanup_queue() against sysfs callback methods nor against
I/O scheduler changes is necessary. Hence remove the syfs_lock locking
calls from the start of blk_cleanup_queue().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

73f1c77e

block: Remove "dying" checks from sysfs callbacks · bae85c15

由 Bart Van Assche 提交于 9月 30, 2019

Block drivers must call del_gendisk() before blk_cleanup_queue().
del_gendisk() calls kobject_del() and kobject_del() waits until any
ongoing sysfs callback functions have finished. In other words, the
sysfs callback functions won't be called for a queue in the dying
state. Hence remove the "dying" checks from the sysfs callback
functions.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bae85c15

18 9月, 2019 1 次提交

block: centralize PI remapping logic to the block layer · 54d4e6ab

由 Max Gurtovoy 提交于 9月 16, 2019

Currently t10_pi_prepare/t10_pi_complete functions are called during the
NVMe and SCSi layers command preparetion/completion, but their actual
place should be the block layer since T10-PI is a general data integrity
feature that is used by block storage protocols. Introduce .prepare_fn
and .complete_fn callbacks within the integrity profile that each type
can implement according to its needs.
Suggested-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Suggested-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>

Fixed to not call queue integrity functions if BLK_DEV_INTEGRITY
isn't defined in the config.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

54d4e6ab

29 8月, 2019 1 次提交

block/rq_qos: add rq_qos_merge() · d3e65fff

由 Tejun Heo 提交于 8月 28, 2019

Add a merge hook for rq_qos.  This will be used by io.weight.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d3e65fff

28 8月, 2019 1 次提交

block: split .sysfs_lock into two locks · cecf5d87

由 Ming Lei 提交于 8月 27, 2019

The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store
path. Meantime, inside block's .show/.store callback, q->sysfs_lock is
required.

However, when mq & iosched kobjects are removed via
blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
too. This way causes AB-BA lock because the kernfs built-in lock of
'kn-count' is required inside kobject_del() too, see the lockdep warning[1].

On the other hand, it isn't necessary to acquire q->sysfs_lock for
both blk_mq_unregister_dev() & elv_unregister_queue() because
clearing REGISTERED flag prevents storing to 'queue/scheduler'
from being happened. Also sysfs write(store) is exclusive, so no
necessary to hold the lock for elv_unregister_queue() when it is
called in switching elevator path.

So split .sysfs_lock into two: one is still named as .sysfs_lock for
covering sync .store, the other one is named as .sysfs_dir_lock
for covering kobjects and related status change.

sysfs itself can handle the race between add/remove kobjects and
showing/storing attributes under kobjects. For switching scheduler
via storing to 'queue/scheduler', we use the queue flag of
QUEUE_FLAG_REGISTERED with .sysfs_lock for avoiding the race, then
we can avoid to hold .sysfs_lock during removing/adding kobjects.

[1]  lockdep warning
    ======================================================
    WARNING: possible circular locking dependency detected
    5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
    ------------------------------------------------------
    rmmod/777 is trying to acquire lock:
    00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72

    but task is already holding lock:
    00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&q->sysfs_lock){+.+.}:
           __lock_acquire+0x95f/0xa2f
           lock_acquire+0x1b4/0x1e8
           __mutex_lock+0x14a/0xa9b
           blk_mq_hw_sysfs_show+0x63/0xb6
           sysfs_kf_seq_show+0x11f/0x196
           seq_read+0x2cd/0x5f2
           vfs_read+0xc7/0x18c
           ksys_read+0xc4/0x13e
           do_syscall_64+0xa7/0x295
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    -> #0 (kn->count#202){++++}:
           check_prev_add+0x5d2/0xc45
           validate_chain+0xed3/0xf94
           __lock_acquire+0x95f/0xa2f
           lock_acquire+0x1b4/0x1e8
           __kernfs_remove+0x237/0x40b
           kernfs_remove_by_name_ns+0x59/0x72
           remove_files+0x61/0x96
           sysfs_remove_group+0x81/0xa4
           sysfs_remove_groups+0x3b/0x44
           kobject_del+0x44/0x94
           blk_mq_unregister_dev+0x83/0xdd
           blk_unregister_queue+0xa0/0x10b
           del_gendisk+0x259/0x3fa
           null_del_dev+0x8b/0x1c3 [null_blk]
           null_exit+0x5c/0x95 [null_blk]
           __se_sys_delete_module+0x204/0x337
           do_syscall_64+0xa7/0x295
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&q->sysfs_lock);
                                   lock(kn->count#202);
                                   lock(&q->sysfs_lock);
      lock(kn->count#202);

     *** DEADLOCK ***

    2 locks held by rmmod/777:
     #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
     #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

    stack backtrace:
    CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
    Call Trace:
     dump_stack+0x9a/0xe6
     check_noncircular+0x207/0x251
     ? print_circular_bug+0x32a/0x32a
     ? find_usage_backwards+0x84/0xb0
     check_prev_add+0x5d2/0xc45
     validate_chain+0xed3/0xf94
     ? check_prev_add+0xc45/0xc45
     ? mark_lock+0x11b/0x804
     ? check_usage_forwards+0x1ca/0x1ca
     __lock_acquire+0x95f/0xa2f
     lock_acquire+0x1b4/0x1e8
     ? kernfs_remove_by_name_ns+0x59/0x72
     __kernfs_remove+0x237/0x40b
     ? kernfs_remove_by_name_ns+0x59/0x72
     ? kernfs_next_descendant_post+0x7d/0x7d
     ? strlen+0x10/0x23
     ? strcmp+0x22/0x44
     kernfs_remove_by_name_ns+0x59/0x72
     remove_files+0x61/0x96
     sysfs_remove_group+0x81/0xa4
     sysfs_remove_groups+0x3b/0x44
     kobject_del+0x44/0x94
     blk_mq_unregister_dev+0x83/0xdd
     blk_unregister_queue+0xa0/0x10b
     del_gendisk+0x259/0x3fa
     ? disk_events_poll_msecs_store+0x12b/0x12b
     ? check_flags+0x1ea/0x204
     ? mark_held_locks+0x1f/0x7a
     null_del_dev+0x8b/0x1c3 [null_blk]
     null_exit+0x5c/0x95 [null_blk]
     __se_sys_delete_module+0x204/0x337
     ? free_module+0x39f/0x39f
     ? blkcg_maybe_throttle_current+0x8a/0x718
     ? rwlock_bug+0x62/0x62
     ? __blkcg_punt_bio_submit+0xd0/0xd0
     ? trace_hardirqs_on_thunk+0x1a/0x20
     ? mark_held_locks+0x1f/0x7a
     ? do_syscall_64+0x4c/0x295
     do_syscall_64+0xa7/0x295
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7fb696cdbe6b
    Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
    RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
    RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
    RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
    RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
    R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
    R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cecf5d87

19 8月, 2019 1 次提交

block: remove struct request_queue queue_head · 988721db

由 Junxiao Bi 提交于 8月 16, 2019

The dispatch list is not used any more, as the legacy block IO stack
has been removed.
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

988721db

14 8月, 2019 1 次提交

block: annotate refault stalls from IO submission · b8e24a93

由 Johannes Weiner 提交于 8月 08, 2019

psi tracks the time tasks wait for refaulting pages to become
uptodate, but it does not track the time spent submitting the IO. The
submission part can be significant if backing storage is contended or
when cgroup throttling (io.latency) is in effect - a lot of time is
spent in submit_bio(). In that case, we underreport memory pressure.

Annotate submit_bio() to account submission time as memory stall when
the bio is reading userspace workingset pages.
Tested-by: NSuren Baghdasaryan <surenb@google.com>
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b8e24a93

05 8月, 2019 2 次提交

blk-zoned: implement REQ_OP_ZONE_RESET_ALL · 6e33dbf2

由 Chaitanya Kulkarni 提交于 8月 01, 2019

This implements REQ_OP_ZONE_RESET_ALL as a special case of the block
device zone reset operations where we just simply issue bio with the
newly introduced req op.

We issue this req op when the number of sectors is equal to the device's
partition's number of sectors and device has no partitions.

We also add support so that blk_op_str() can print the new reset-all
zone operation.

This patch also adds a generic make request check for newly
introduced REQ_OP_ZONE_RESET_ALL req_opf. We simply return error
when queue is zoned and reset-all flag is not set for
REQ_OP_ZONE_RESET_ALL.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6e33dbf2

block: Fix a comment in blk_cleanup_queue() · 67ed8b73

由 Bart Van Assche 提交于 8月 01, 2019

Change a reference to the legacy block layer into a reference to blk-mq.
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

67ed8b73

11 7月, 2019 1 次提交

block: Disable write plugging for zoned block devices · b49773e7

由 Damien Le Moal 提交于 7月 11, 2019

Simultaneously writing to a sequential zone of a zoned block device
from multiple contexts requires mutual exclusion for BIO issuing to
ensure that writes happen sequentially. However, even for a well
behaved user correctly implementing such synchronization, BIO plugging
may interfere and result in BIOs from the different contextx to be
reordered if plugging is done outside of the mutual exclusion section,
e.g. the plug was started by a function higher in the call chain than
the function issuing BIOs.

         Context A                     Context B

   | blk_start_plug()
   | ...
   | seq_write_zone()
     | mutex_lock(zone)
     | bio-0->bi_iter.bi_sector = zone->wp
     | zone->wp += bio_sectors(bio-0)
     | submit_bio(bio-0)
     | bio-1->bi_iter.bi_sector = zone->wp
     | zone->wp += bio_sectors(bio-1)
     | submit_bio(bio-1)
     | mutex_unlock(zone)
     | return
   | -----------------------> | seq_write_zone()
  				| mutex_lock(zone)
     				| bio-2->bi_iter.bi_sector = zone->wp
     				| zone->wp += bio_sectors(bio-2)
				| submit_bio(bio-2)
				| mutex_unlock(zone)
   | <------------------------- |
   | blk_finish_plug()

In the above example, despite the mutex synchronization ensuring the
correct BIO issuing order 0, 1, 2, context A BIOs 0 and 1 end up being
issued after BIO 2 of context B, when the plug is released with
blk_finish_plug().

While this problem can be addressed using the blk_flush_plug_list()
function (in the above example, the call must be inserted before the
zone mutex lock is released), a simple generic solution in the block
layer avoid this additional code in all zoned block device user code.
The simple generic solution implemented with this patch is to introduce
the internal helper function blk_mq_plug() to access the current
context plug on BIO submission. This helper returns the current plug
only if the target device is not a zoned block device or if the BIO to
be plugged is not a write operation. Otherwise, the caller context plug
is ignored and NULL returned, resulting is all writes to zoned block
device to never be plugged.
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b49773e7

10 7月, 2019 2 次提交

blkcg: implement REQ_CGROUP_PUNT · d3f77dfd

由 Tejun Heo 提交于 6月 27, 2019

When a shared kthread needs to issue a bio for a cgroup, doing so
synchronously can lead to priority inversions as the kthread can be
trapped waiting for that cgroup.  This patch implements
REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing
to a dedicated per-blkcg work item to avoid such priority inversions.

This will be used to fix priority inversions in btrfs compression and
should be generally useful as we grow filesystem support for
comprehensive IO control.

Cc: Chris Mason <clm@fb.com>
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d3f77dfd

block: init flush rq ref count to 1 · b554db14

由 Josef Bacik 提交于 3月 07, 2019

We discovered a problem in newer kernels where a disconnect of a NBD
device while the flush request was pending would result in a hang.  This
is because the blk mq timeout handler does

        if (!refcount_inc_not_zero(&rq->ref))
                return true;

to determine if it's ok to run the timeout handler for the request.
Flush_rq's don't have a ref count set, so we'd skip running the timeout
handler for this request and it would just sit there in limbo forever.

Fix this by always setting the refcount of any request going through
blk_init_rq() to 1.  I tested this with a nbd-server that dropped flush
requests to verify that it hung, and then tested with this patch to
verify I got the timeout as expected and the error handling kicked in.
Thanks,
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b554db14

21 6月, 2019 8 次提交

block: update print_req_error() · b0e5168a

由 Chaitanya Kulkarni 提交于 6月 20, 2019

Improve the print_req_error with additional request fields which are
helpful for debugging. Use newly introduced blk_op_str() to print the
REQ_OP_XXX in the string format.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b0e5168a

block: add centralize REQ_OP_XXX to string helper · e47bc4ed

由 Chaitanya Kulkarni 提交于 6月 20, 2019

In order to centralize the REQ_OP_XXX to string conversion which can be
used in the block layer and different places in the kernel like f2fs,
this patch adds a new helper function along with an array similar to the
one present in the blk-mq-debugfs.c.

We keep this helper functionality centralize under blk-core.c instead of
blk-mq-debugfs.c since blk-core.c is configured using CONFIG_BLOCK and
it will not be dependent on blk-mq-debugfs.c which is configured using
CONFIG_BLK_DEBUG_FS.

Next patch adjusts the code in the blk-mq-debugfs.c with newly
introduced helper.
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e47bc4ed

block: improve print_req_error · 178cc590

由 Christoph Hellwig 提交于 6月 20, 2019

Print the calling function instead of print_req_error as a prefix, and
print the operation and op_flags separately instead of the whole field.
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

178cc590

block: mark blk_rq_bio_prep as inline · 1aa0a133

由 Christoph Hellwig 提交于 6月 06, 2019

This function just has a few trivial assignments, has two callers with
one of them being in the fastpath.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1aa0a133

block: simplify blk_recalc_rq_segments · e9cd19c0

由 Christoph Hellwig 提交于 6月 06, 2019

Return the segement and let the callers assign them, which makes the code
a littler more obvious.  Also pass the request instead of q plus bio
chain, allowing for the use of rq_for_each_bvec.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e9cd19c0

block: remove the bi_phys_segments field in struct bio · 14ccb66b

由 Christoph Hellwig 提交于 6月 06, 2019

We only need the number of segments in the blk-mq submission path.
Remove the field from struct bio, and return it from a variant of
blk_queue_split instead of that it can passed as an argument to
those functions that need the value.

This also means we stop recounting segments except for cloning
and partial segments.

To keep the number of arguments in this how path down remove
pointless struct request_queue arguments from any of the functions
that had it and grew a nr_segs argument.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

14ccb66b

block: remove blk_init_request_from_bio · f924cdde

由 Christoph Hellwig 提交于 6月 06, 2019

lightnvm should have never used this function, as it is sending
passthrough requests, so switch it to blk_rq_append_bio like all the
other passthrough request users.  Inline blk_init_request_from_bio into
the only remaining caller.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: NJavier González <javier@javigon.com>
Reviewed-by: NMatias Bjørling <mb@lightnvm.io>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f924cdde

block: initialize the write priority in blk_rq_bio_prep · 0c8cf8c2

由 Christoph Hellwig 提交于 6月 06, 2019

The priority field also makes sense for passthrough requests, so
initialize it in blk_rq_bio_prep.
Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0c8cf8c2

20 6月, 2019 1 次提交

blk-core: Remove blk_end_request*() declarations · 3a211b71

由 Pavel Begunkov 提交于 5月 23, 2019

Commit a1ce35fa ("block: remove dead elevator code")
deleted blk_end_request() and friends, but some declaration are still
left. Purge them.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3a211b71

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功