Commits · 3a605e32a7f8f78d844b4272c257029c337a4352 · openeuler / Kernel

23 Dec, 2021 3 commits

nvme: drop unused variable ctrl in nvme_setup_cmd · 3a605e32

Committed by Geliang Tang on Dec 22, 2021

The variable 'ctrl' became useless since the code using it was dropped
from nvme_setup_cmd() in the commit 292ddf67bbd5 ("nvme: increment
request genctr on completion"). Fix it to get rid of this compilation
warning in the nvme-5.17 branch:

 drivers/nvme/host/core.c: In function ‘nvme_setup_cmd’:
 drivers/nvme/host/core.c:993:20: warning: unused variable ‘ctrl’ [-Wunused-variable]
   struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
                     ^~~~

Fixes: 292ddf67bbd5 ("nvme: increment request genctr on completion")
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: increment request genctr on completion · e4fdb2b1

Committed by Keith Busch on Dec 13, 2021

The nvme request generation counter is intended to catch duplicate
completions. Incrementing the counter on submission means duplicates can
only be caught if the request tag is reallocated and dispatched prior to
the driver observing the corrupted CQE. Incrementing on completion
removes this window, making it possible to detect duplicate completions
in consecutive entries.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
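
The effect of moving the increment can be illustrated with a small userspace model. This is a sketch under assumptions, not the driver's code: the `toy_` names, the 4-bit generation field, and the 12-bit tag field are chosen for illustration only. Each request slot carries a generation counter that is folded into the command ID. A completion is accepted only if its generation matches the slot, and the counter is bumped on completion, so an immediate duplicate no longer matches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout (assumption): low 12 bits = tag, next 4 bits = generation. */
#define TOY_GEN_SHIFT 12
#define TOY_GEN_MASK  0xfu
#define TOY_TAG_MASK  0xfffu

/* Hypothetical per-request state: only the generation counter matters here. */
struct toy_req {
    uint8_t genctr;
};

/* Build the command ID that would be sent to the device for this slot. */
static uint16_t toy_cid(const struct toy_req *req, uint16_t tag)
{
    return (uint16_t)(((req->genctr & TOY_GEN_MASK) << TOY_GEN_SHIFT) |
                      (tag & TOY_TAG_MASK));
}

/*
 * Handle one completion entry. Returns true if the completion is accepted,
 * false if it is stale or a duplicate (generation mismatch or bad tag).
 */
static bool toy_complete(struct toy_req *reqs, size_t nreqs, uint16_t cid)
{
    uint16_t tag = cid & TOY_TAG_MASK;
    uint8_t gen = (cid >> TOY_GEN_SHIFT) & TOY_GEN_MASK;

    if (tag >= nreqs)
        return false;
    if ((reqs[tag].genctr & TOY_GEN_MASK) != gen)
        return false;

    /* Increment on completion: a repeated entry for this cid now mismatches. */
    reqs[tag].genctr++;
    return true;
}
```

In this model, completing the same command ID twice in a row is rejected the second time, without waiting for the tag to be reallocated and dispatched again.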

nvme-fabrics: print out valid arguments when reading from /dev/nvme-fabrics · f18ee3d9

Committed by Hannes Reinecke on Dec 07, 2021

Currently applications have a hard time figuring out which
nvme-over-fabrics arguments are supported for any given kernel;
the ioctl will return an error code on failure, and the application
has to guess whether this was due to an invalid argument or due
to a connection or controller error.
With this patch applications can read a list of supported
arguments by simply reading from /dev/nvme-fabrics, allowing
them to validate the connection string.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

17 Dec, 2021 1 commit

block: remove the rsxx driver · 3427f2b2

Committed by Christoph Hellwig on Dec 16, 2021

This driver was for rare and short-lived high-end enterprise hardware
and hasn't been maintained since 2014, which also means it never got
converted to use blk-mq.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14 Dec, 2021 6 commits

rsxx: Drop PCI legacy power management · ac6f6548

Committed by Bjorn Helgaas on Dec 08, 2021

The rsxx driver doesn't support device suspend, so remove
rsxx_pci_suspend(), the legacy PCI .suspend() method, completely.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-5-helgaas@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: convert to generic power management · cd97b7e0

Committed by Vaibhav Gupta on Dec 08, 2021

Convert mtip32xx from legacy PCI power management to the generic power
management framework.

Previously, mtip32xx used legacy PCI power management, where
mtip_pci_suspend() and mtip_pci_resume() were responsible for both
device-specific things and generic PCI things:

  mtip_pci_suspend
    mtip_block_suspend(dd)              <-- device-specific
    pci_save_state(pdev)                <-- generic PCI
    pci_set_power_state(pdev, pci_choose_state(pdev, state))

  mtip_pci_resume
    pci_set_power_state(PCI_D0)         <-- generic PCI
    pci_restore_state(pdev)             <-- generic PCI
    pcim_enable_device(pdev)            <-- generic PCI
    pci_set_master(pdev)                <-- generic PCI
    mtip_block_resume(dd)               <-- device-specific

With generic power management, the PCI bus PM methods do the generic PCI
things, and the driver needs only the device-specific part, i.e.,

  suspend_devices_and_enter
    dpm_suspend_start(PMSG_SUSPEND)
      pci_pm_suspend                    # PCI bus .suspend() method
        mtip_pci_suspend                # dev->driver->pm->suspend
          mtip_block_suspend            <-- device-specific
    suspend_enter
      dpm_suspend_noirq(PMSG_SUSPEND)
        pci_pm_suspend_noirq            # PCI bus .suspend_noirq() method
          pci_save_state                <-- generic PCI
          pci_prepare_to_sleep          <-- generic PCI
            pci_set_power_state
    ...
    dpm_resume_end(PMSG_RESUME)
      pci_pm_resume                     # PCI bus .resume() method
        pci_restore_standard_config
          pci_set_power_state(PCI_D0)   <-- generic PCI
          pci_restore_state             <-- generic PCI
        mtip_pci_resume                 # dev->driver->pm->resume
          mtip_block_resume             <-- device-specific

[bhelgaas: commit log]

Link: https://lore.kernel.org/r/20210114115423.52414-2-vaibhavgupta40@gmail.com
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-4-helgaas@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: remove pointless drvdata lookups · 9e541f14

Committed by Bjorn Helgaas on Dec 08, 2021

Previously we passed a struct pci_dev * to mtip_check_surprise_removal(),
which immediately looked up the driver_data. But all callers already have
the driver_data pointer, so just pass it directly and skip the extra
lookup. No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-3-helgaas@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: remove pointless drvdata checking · 2920417c

Committed by Bjorn Helgaas on Dec 08, 2021

The .suspend() and .resume() methods are only called after the .probe()
method (mtip_pci_probe()) has set the drvdata and returned success.

Therefore, if we get to mtip_pci_suspend() or mtip_pci_resume(), the
drvdata must be valid. Drop the unnecessary checking.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-2-helgaas@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drbd: Use struct_group() to zero algs · 52a0cab3

Committed by Kees Cook on Nov 18, 2021

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add a struct_group() for the algs so that memset() can correctly reason
about the size.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20211118203712.1288866-1-keescook@chromium.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
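
The idea behind the grouping can be sketched in userspace C. This is a simplified stand-in, not the kernel macro or the DRBD structure: `toy_struct_group`, `struct toy_conf`, and the field sizes are assumptions made for illustration. The group gives the adjacent fields a single name, so `memset()` targets one object whose size covers exactly those fields instead of writing past the end of the first field.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for struct_group(): the same members are declared
 * twice inside a union, once anonymously (so c->verify_alg still works)
 * and once under NAME (so &c->NAME and sizeof(c->NAME) span the group).
 * Requires C11 anonymous structs/unions.
 */
#define toy_struct_group(NAME, ...)      \
    union {                              \
        struct { __VA_ARGS__ };          \
        struct { __VA_ARGS__ } NAME;     \
    }

/* Hypothetical configuration struct with two adjacent algorithm names. */
struct toy_conf {
    int timeout;
    toy_struct_group(algs,
        char verify_alg[16];
        char csums_alg[16];
    );
    int rate;
};

/* Zero both algorithm fields as one bounded object. */
static void toy_clear_algs(struct toy_conf *c)
{
    memset(&c->algs, 0, sizeof(c->algs));
}
```

The fields before and after the group are left untouched, and a bounds checker can see that the write stays inside `algs`.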

loop: make autoclear operation asynchronous · 322c4293

Committed by Tetsuo Handa on Dec 13, 2021

syzbot is reporting a circular locking problem at __loop_clr_fd() [1],
because commit 87579e9b ("loop: use worker per cgroup instead of kworker")
calls destroy_workqueue() with disk->open_mutex held.

This circular dependency cannot be broken unless we call __loop_clr_fd()
without holding disk->open_mutex. Therefore, defer __loop_clr_fd() from
lo_release() to a WQ context.

Link: https://syzkaller.appspot.com/bug?extid=643e4ce4b6ad1347d372 [1]
Reported-by: syzbot <syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Tested-by: syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1ed7df28-ebd6-71fb-70e5-1c2972e05ddb@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>

11 Dec, 2021 1 commit

null_blk: cast command status to integer · c5eafd79

Committed by Jens Axboe on Dec 10, 2021

kernel test robot reports that sparse now triggers a warning on null_blk:

>> drivers/block/null_blk/main.c:1577:55: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected int ioerror @@     got restricted blk_status_t [usertype] error @@
   drivers/block/null_blk/main.c:1577:55: sparse:     expected int ioerror
   drivers/block/null_blk/main.c:1577:55: sparse:     got restricted blk_status_t [usertype] error

because blk_mq_add_to_batch() takes an integer instead of a blk_status_t.
Just cast this to an integer to silence it, null_blk is the odd one out
here since the command status is the "right" type. If we change the
function type, then we'll have to do that for other callers too (existing
and future ones).

Fixes: 2385ebf3 ("block: null_blk: batched complete poll requests")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
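
The kind of cast described above can be sketched as follows. This is a userspace model under assumptions: the `toy_`/`TOY_` names and status values are hypothetical, and the annotation macros only mimic how sparse-checked ("bitwise") types are typically declared. Under a normal compiler the annotations expand to nothing, so the explicit cast exists only to tell the checker that converting the restricted status type to a plain integer is intentional.

```c
#include <stdint.h>

/* Annotations are meaningful only to sparse (__CHECKER__); otherwise empty. */
#ifdef __CHECKER__
#define TOY_BITWISE __attribute__((bitwise))
#define TOY_FORCE   __attribute__((force))
#else
#define TOY_BITWISE
#define TOY_FORCE
#endif

/* Hypothetical restricted status type, modeled on a bitwise typedef. */
typedef uint8_t TOY_BITWISE toy_status_t;

#define TOY_STS_OK    ((TOY_FORCE toy_status_t)0)
#define TOY_STS_IOERR ((TOY_FORCE toy_status_t)10)

/* Hypothetical batching helper that takes a plain int error, like the
 * function discussed above. Returns 1 if the request may join the batch. */
static int toy_add_to_batch(int ioerror)
{
    return ioerror == 0;
}

/* The caller holds the "right" type and casts explicitly at the call site. */
static int toy_poll_one(toy_status_t error)
{
    return toy_add_to_batch((TOY_FORCE int)error);
}
```

Casting at the one odd call site keeps the helper's signature unchanged for every other caller, which is the trade-off the commit message describes.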

10 Dec, 2021 1 commit

pktdvd: stop using bdi congestion framework. · db67097a

Committed by NeilBrown on Dec 09, 2021

The bdi congestion framework isn't widely used and should be
deprecated.

pktdvd makes use of it to track congestion, but this can be done
entirely internally to pktdvd, so it doesn't need to use the framework.

So introduce a "congested" flag.  When waiting for bio_queue_size to
drop, set this flag and use a var_waitqueue() to wait for it.  When
bio_queue_size does drop and this flag is set, clear the flag and call
wake_up_var().

We don't use a wait_var_event macro for the waiting as we need to set
the flag and drop the spinlock before calling schedule(), and while that
is possible with __wait_var_event(), the result is not easy to read.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/163910843527.9928.857338663717630212@noble.neil.brown.name
Signed-off-by: Jens Axboe <axboe@kernel.dk>

03 Dec, 2021 4 commits

block: null_blk: batched complete poll requests · 2385ebf3

Committed by Ming Lei on Dec 03, 2021

Complete poll requests via blk_mq_add_to_batch() and
blk_mq_end_request_batch(), so that the batched completion code path
can be covered by running null_blk tests.

This also shows a ~14% IOPS boost on 't/io_uring /dev/nullb0'
in my test.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211203081703.3506020-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

floppy: Add max size check for user space request · 545a3249

Committed by Xiongwei Song on Nov 16, 2021

We need to check the maximum size of a request coming from user space
before allocating pages. If the request size exceeds the limit, return
-EINVAL. This check avoids the warning below from the page allocator.

WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 current_gfp_context include/linux/sched/mm.h:195 [inline]
WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 __alloc_pages+0x45d/0x500 mm/page_alloc.c:5356
Modules linked in:
CPU: 3 PID: 16525 Comm: syz-executor.3 Not tainted 5.15.0-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:__alloc_pages+0x45d/0x500 mm/page_alloc.c:5344
Code: be c9 00 00 00 48 c7 c7 20 4a 97 89 c6 05 62 32 a7 0b 01 e8 74 9a 42 07 e9 6a ff ff ff 0f 0b e9 a0 fd ff ff 40 80 e5 3f eb 88 <0f> 0b e9 18 ff ff ff 4c 89 ef 44 89 e6 45 31 ed e8 1e 76 ff ff e9
RSP: 0018:ffffc90023b87850 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 1ffff92004770f0b RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000033 RDI: 0000000000010cc1
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff81bb4686 R11: 0000000000000001 R12: ffffffff902c1960
R13: 0000000000000033 R14: 0000000000000000 R15: ffff88804cf64a30
FS:  0000000000000000(0000) GS:ffff88802cd00000(0063) knlGS:00000000f44b4b40
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 000000002c921000 CR3: 000000004f507000 CR4: 0000000000150ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
 __get_free_pages+0x8/0x40 mm/page_alloc.c:5418
 raw_cmd_copyin drivers/block/floppy.c:3113 [inline]
 raw_cmd_ioctl drivers/block/floppy.c:3160 [inline]
 fd_locked_ioctl+0x12e5/0x2820 drivers/block/floppy.c:3528
 fd_ioctl drivers/block/floppy.c:3555 [inline]
 fd_compat_ioctl+0x891/0x1b60 drivers/block/floppy.c:3869
 compat_blkdev_ioctl+0x3b8/0x810 block/ioctl.c:662
 __do_compat_sys_ioctl+0x1c7/0x290 fs/ioctl.c:972
 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
 __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
 do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203
 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

Reported-by: syzbot+23a02c7df2cf2bc93fa2@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20211116131033.27685-1-sxwjean@me.com
Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
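
The pattern of validating a user-supplied length before allocating can be sketched in userspace C. This is a minimal illustration, not the floppy driver's code: `TOY_MAX_LEN`, `toy_alloc_request()`, and the use of `malloc()` in place of the page allocator are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical upper bound on a single request buffer (4 MiB here). */
#define TOY_MAX_LEN (1L << 22)

/*
 * Validate an untrusted length before allocating a buffer for it.
 * Returns 0 on success with *out set, -EINVAL if the length is out of
 * range, or -ENOMEM if the allocation itself fails.
 */
static int toy_alloc_request(long length, void **out)
{
    *out = NULL;

    /* Reject non-positive and oversized lengths up front, so the
     * allocator is never asked for a size it cannot satisfy. */
    if (length <= 0 || length >= TOY_MAX_LEN)
        return -EINVAL;

    *out = malloc((size_t)length);
    return *out ? 0 : -ENOMEM;
}
```

Rejecting the request early turns an allocator warning into an ordinary error return that the caller can report to user space.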

floppy: Fix hang in watchdog when disk is ejected · fb48febc

Committed by Tasos Sahanidis on Sep 03, 2021

When the watchdog detects a disk change, it calls cancel_activity(),
which in turn tries to cancel the fd_timer delayed work.

In the above scenario, fd_timer_fn is set to fd_watchdog(), meaning
it is trying to cancel its own work.
This results in a hang as cancel_delayed_work_sync() is waiting for the
watchdog (itself) to return, which never happens.

This can be reproduced relatively consistently by attempting to read a
broken floppy, and ejecting it while IO is being attempted and retried.

To resolve this, this patch calls cancel_delayed_work() instead, which
cancels the work without waiting for the watchdog to return and finish.

Before this regression was introduced, the code in this section used
del_timer(), and not del_timer_sync() to delete the watchdog timer.

Link: https://lore.kernel.org/r/399e486c-6540-db27-76aa-7a271b061f76@tasossah.com
Fixes: 070ad7e7 ("floppy: convert to delayed work and single-thread wq")
Signed-off-by: Tasos Sahanidis <tasos@tasossah.com>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: allow zero poll queues · 2bfdbe8b

Committed by Ming Lei on Dec 03, 2021

From the user's viewpoint, there isn't any reason to disallow zero poll
queues.

Also, sometimes we need to compare I/O between poll mode and irq mode,
so disallowing zero poll queues is bad.

Fixes: 15dfc662 ("null_blk: Fix handling of submit_queues and poll_queues attributes")
Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211203023935.3424042-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

29 Nov, 2021 24 commits

loop: don't hold lo_mutex during __loop_clr_fd() · 6050fa4c

Committed by Tetsuo Handa on Nov 24, 2021

syzbot is reporting a circular locking problem at __loop_clr_fd() [1],
because commit 87579e9b ("loop: use worker per cgroup instead of kworker")
calls destroy_workqueue() with lo->lo_mutex held.

Since all functions where lo->lo_state matters are already checking
lo->lo_state with lo->lo_mutex held (in order to avoid racing with e.g.
ioctl(LOOP_CTL_REMOVE)), and __loop_clr_fd() can be called from either
ioctl(LOOP_CLR_FD) xor close(), lo->lo_state == Lo_rundown is considered
as an exclusive lock for __loop_clr_fd(). Therefore, hold lo->lo_mutex
inside __loop_clr_fd() only when asserting/updating lo->lo_state.

Since ioctl(LOOP_CLR_FD) depends on lo->lo_state == Lo_bound, a valid
lo->lo_backing_file must have been assigned by ioctl(LOOP_SET_FD) or
ioctl(LOOP_CONFIGURE). Thus, we can remove lo->lo_backing_file test,
and convert __loop_clr_fd() into a void function.

Link: https://syzkaller.appspot.com/bug?extid=63614029dfb79abd4383 [1]
Reported-by: syzbot <syzbot+63614029dfb79abd4383@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/8ebe3b2e-8975-7f26-0620-7144a3b8b8cd@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>

scsi: remove the gendisk argument to scsi_ioctl · a30e3441

Committed by Christoph Hellwig on Nov 26, 2021

Now that blk_execute_rq does not take a gendisk argument there is no need
to pass it through the scsi_ioctl callchain either.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove the gendisk argument to blk_execute_rq · b84ba30b

Committed by Christoph Hellwig on Nov 26, 2021

Remove the gendisk argument to blk_execute_rq and blk_execute_rq_nowait
given that it is unused now. Also convert the boolean at_head parameter
to actually use the bool type while touching the prototype.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove the ->rq_disk field in struct request · f3fa33ac

Committed by Christoph Hellwig on Nov 26, 2021

Just use the disk attached to the request_queue instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: don't check ->rq_disk in merges · 79bb1dbd

Committed by Christoph Hellwig on Nov 26, 2021

There is a 1:1 relationship between request_queues and gendisks now, so
no need for these extra checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtd_blkdevs: remove the sector out of range check in do_blktrans_request · 82baa324

Committed by Christoph Hellwig on Nov 26, 2021

The block layer already performs this check, no need to duplicate it in
the driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Remove redundant initialization of variable ret · af22fef3

Committed by Colin Ian King on Nov 26, 2021

The variable ret is being initialized with a value that is never
read, it is being updated later on. The assignment is redundant and
can be removed.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20211126230652.1175636-1-colin.i.king@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: simplify ioc_lookup_icq · eca5892a

Committed by Christoph Hellwig on Nov 26, 2021

Remove the ioc argument as it always points to current->io_context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: simplify ioc_create_icq · 18b74c4d

Committed by Christoph Hellwig on Nov 26, 2021

Remove the ioc and gfp_mask arguments, which are hard coded by the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: return the io_context from create_task_io_context · d538ea4c

Committed by Christoph Hellwig on Nov 26, 2021

Grab a reference to the newly allocated or existing io_context in
create_task_io_context and return it. This simplifies the callers and
removes the need for double lookups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: use alloc_io_context in __copy_io · 8ffc1368

Committed by Christoph Hellwig on Nov 26, 2021

In __copy_io we know that the newly allocated task_struct does not have
an I/O context yet and is not exiting. So just allocate the I/O context
struct and install it directly. There is no need to lock the task
either as it is just being created.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: factor out a alloc_io_context helper · a0f14d8b

Committed by Christoph Hellwig on Nov 26, 2021

Factor out a helper that just allocates an I/O context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove get_io_context_active · 50569c24

Committed by Christoph Hellwig on Nov 26, 2021

Fold it into its only caller, and remove a lot of the debug checks
that are not needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move the remaining elv.icq handling to the I/O scheduler · 222ee581

Committed by Christoph Hellwig on Nov 26, 2021

After the prepare side has been moved to the only I/O scheduler that
cares, do the same for the cleanup and the NULL initialization.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move blk_mq_sched_assign_ioc to blk-ioc.c · 87dd1d63

Committed by Christoph Hellwig on Nov 26, 2021

Move blk_mq_sched_assign_ioc so that many interfaces from the file can
be marked static. Rename the function to ioc_find_get_icq as well and
return the icq to simplify the interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: mark put_io_context_active static · 33047425

Committed by Christoph Hellwig on Nov 26, 2021

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revert "block: Provide blk_mq_sched_get_icq()" · c2a32464

Committed by Christoph Hellwig on Nov 26, 2021

This reverts commit 4896c4e64ba5d5d5acdbcf68c5910dd4f6d8fa62.

The helper is not needed any more.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bfq: use bfq_bic_lookup in bfq_limit_depth · a0725c22

Committed by Christoph Hellwig on Nov 26, 2021

No need to create a new I/O context if there is none present yet in
->limit_depth.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bfq: simplify bfq_bic_lookup · 836b394b

Committed by Christoph Hellwig on Nov 26, 2021

Remove the unused bfqd argument, and hardcode ioc to current->io_context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fork: move copy_io to block/blk-ioc.c · 88c9a2ce

Committed by Christoph Hellwig on Nov 26, 2021

Move the copying of the I/O context to the block layer as that is where
we can use the proper low-level interfaces.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

RDMA/qib: rename copy_io to qib_copy_io · e92a559e

Committed by Christoph Hellwig on Nov 26, 2021

Add the proper module prefix to avoid conflicts with a function
in the scheduler.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: use bio->bi_opf after bio is checked · 5f480b1a

Committed by Ming Lei on Nov 27, 2021

bio->bi_opf isn't finalized before checking the bio, so use it after
submit_bio_checks() returns.

Fixes: 5b13bc8a ("blk-mq: cleanup request allocation")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bfq: Do not let waker requests skip proper accounting · c65e6fd4

Committed by Jan Kara on Nov 25, 2021

Commit 7cc4ffc5 ("block, bfq: put reqs of waker and woken in
dispatch list") added a condition to bfq_insert_request() which added
waker's requests directly to dispatch list. The rationale was that
completing waker's IO is needed to get more IO for the current queue.
Although this rationale is valid, there is a hole in it. The waker does
not necessarily serve the IO only for the current queue, and maybe its
current IO is not needed for the current queue to make progress.
Furthermore, injecting IO like this completely bypasses any service
accounting within bfq, and thus we do not properly track how much service
the waker's queue is getting or that the waker is actually doing any IO.
Depending on the conditions, this can result in the waker getting too
much or too little service.

Consider for example the following job file:

[global]
directory=/mnt/repro/
rw=write
size=8g
time_based
runtime=30
ramp_time=10
blocksize=1m
direct=0
ioengine=sync

[slowwriter]
numjobs=1
prioclass=2
prio=7
fsync=200

[fastwriter]
numjobs=1
prioclass=2
prio=0
fsync=200

Although the processes have very different IO priorities, they get the
same amount of service. The reason is that bfq identifies these processes as
having waker-wakee relationship and once that happens, IO from
fastwriter gets injected during slowwriter's time slice. As a result bfq
is not aware that fastwriter has any IO to do and constantly schedules
only slowwriter's queue. Thus fastwriter is forced to compete with
slowwriter's IO all the time instead of getting its share of time based
on IO priority.

Drop the special injection condition from bfq_insert_request(). As a
result, requests will be tracked and queued in a normal way and on next
dispatch bfq_select_queue() can decide whether the waker's inserted
requests should be injected during the current queue's timeslice or not.

Fixes: 7cc4ffc5 ("block, bfq: put reqs of waker and woken in dispatch list")
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-8-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bfq: Log waker detections · 1eb17f5e

Committed by Jan Kara on Nov 25, 2021

Waker-wakee relationships are important in deciding whether one queue
can preempt the other one. Print information about detected waker-wakee
relationships so that scheduling decisions can be better understood from
block traces.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-7-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
