Commits · 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f · openeuler / Kernel

08 Sep, 2021 1 commit

blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queues · 7f2a6a69

Committed by Song Liu on Sep 07, 2021

Limiting the number of requests to BLK_MAX_REQUEST_COUNT at blk_plug hurts
performance for large md arrays. [1] shows that the resync speed of an md
array drops when the array has more than 16 HDDs.

Fix this by allowing more requests in the plug queue. The multiple_queues
flag is used to apply the higher limit only to multiple-queue cases.

[1] https://lore.kernel.org/linux-raid/CAFDAVznS71BXW8Jxv6k9dXc2iR3ysX3iZRBww_rzA8WifBFxGg@mail.gmail.com/

Tested-by: Marcin Wanat <marcin.wanat@gmail.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
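
The effect of the higher limit can be sketched with a small userspace model. This is not the kernel code: the value 16 for BLK_MAX_REQUEST_COUNT and the helper names are assumptions chosen for illustration, and the flush rule (flush once the plug is full) is a simplification.

```python
# Simplified userspace model of the plug request limit; not kernel code.
BLK_MAX_REQUEST_COUNT = 16  # assumed value, for illustration only


def plug_max_rq_count(multiple_queues: bool) -> int:
    """Return the plug limit: 4x the base limit when multiple queues are involved."""
    if multiple_queues:
        return BLK_MAX_REQUEST_COUNT * 4
    return BLK_MAX_REQUEST_COUNT


def count_flushes(n_requests: int, multiple_queues: bool) -> int:
    """Count how often a plug would be flushed while queuing n_requests."""
    limit = plug_max_rq_count(multiple_queues)
    pending = 0
    flushes = 0
    for _ in range(n_requests):
        if pending >= limit:  # plug is full: flush before adding more
            flushes += 1
            pending = 0
        pending += 1
    return flushes
```

In this model, a burst of requests spread over many member disks triggers far fewer flushes with the 4x limit, which is the batching benefit the commit message describes for large md arrays.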

7f2a6a69

07 Sep, 2021 4 commits

block: move fs/block_dev.c to block/bdev.c · 0dca4462

Committed by Christoph Hellwig on Sep 07, 2021

Move it together with the rest of the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210907141303.1371844-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0dca4462

block: split out operations on block special files · cd82cca7

Committed by Christoph Hellwig on Sep 07, 2021

Add a new block/fops.c for all the file and address_space operations
that provide the block special file support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210907141303.1371844-2-hch@lst.de
[axboe: correct trailing whitespace while at it]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cd82cca7

blk-throttle: fix UAF by deleteing timer in blk_throtl_exit() · 884f0e84

Committed by Li Jinlin on Sep 07, 2021

The pending timer has been set up in blk_throtl_init(). However, the
timer is not deleted in blk_throtl_exit(). This means that the timer
handler may still be running after freeing the timer, which would
result in a use-after-free.

Fix by calling del_timer_sync() to delete the timer in blk_throtl_exit().

Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
Link: https://lore.kernel.org/r/20210907121242.2885564-1-lijinlin3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
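
The pattern of stopping a timer synchronously before freeing its owner can be illustrated in userspace. The sketch below is only an analogy built on Python's threading.Timer; the class and attribute names are hypothetical and it does not reproduce the blk-throttle code.

```python
import threading


class ThrottleData:
    """Userspace analogy: an object that owns a pending timer."""

    def __init__(self, delay: float):
        self.fired = False
        self.freed = False
        self.timer = threading.Timer(delay, self._handler)
        self.timer.start()

    def _handler(self):
        # Running after teardown would be the analogue of a use-after-free.
        assert not self.freed, "timer handler ran after teardown"
        self.fired = True

    def exit(self):
        # Analogue of del_timer_sync(): cancel the timer, then wait until
        # the timer thread (and any running handler) has finished.
        self.timer.cancel()
        self.timer.join()
        self.freed = True
```

Calling exit() before the delay expires guarantees that the handler can no longer run once the object is marked as freed.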

884f0e84

block: genhd: don't call blkdev_show() with major_names_lock held · dfbb3409

Committed by Tetsuo Handa on Sep 07, 2021

If CONFIG_BLK_DEV_LOOP && CONFIG_MTD are enabled (at least; there might be
other combinations), lockdep complains about a circular locking dependency
at __loop_clr_fd(), because major_names_lock serves as a locking dependency
aggregating hub across multiple block modules.

 ======================================================
 WARNING: possible circular locking dependency detected
 5.14.0+ #757 Tainted: G            E
 ------------------------------------------------------
 systemd-udevd/7568 is trying to acquire lock:
 ffff88800f334d48 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x70/0x560

 but task is already holding lock:
 ffff888014a7d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x4d/0x400 [loop]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #6 (&lo->lo_mutex){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_killable_nested+0x17/0x20
        lo_open+0x23/0x50 [loop]
        blkdev_get_by_dev+0x199/0x540
        blkdev_open+0x58/0x90
        do_dentry_open+0x144/0x3a0
        path_openat+0xa57/0xda0
        do_filp_open+0x9f/0x140
        do_sys_openat2+0x71/0x150
        __x64_sys_openat+0x78/0xa0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #5 (&disk->open_mutex){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_nested+0x17/0x20
        bd_register_pending_holders+0x20/0x100
        device_add_disk+0x1ae/0x390
        loop_add+0x29c/0x2d0 [loop]
        blk_request_module+0x5a/0xb0
        blkdev_get_no_open+0x27/0xa0
        blkdev_get_by_dev+0x5f/0x540
        blkdev_open+0x58/0x90
        do_dentry_open+0x144/0x3a0
        path_openat+0xa57/0xda0
        do_filp_open+0x9f/0x140
        do_sys_openat2+0x71/0x150
        __x64_sys_openat+0x78/0xa0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #4 (major_names_lock){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_nested+0x17/0x20
        blkdev_show+0x19/0x80
        devinfo_show+0x52/0x60
        seq_read_iter+0x2d5/0x3e0
        proc_reg_read_iter+0x41/0x80
        vfs_read+0x2ac/0x330
        ksys_read+0x6b/0xd0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #3 (&p->lock){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_nested+0x17/0x20
        seq_read_iter+0x37/0x3e0
        generic_file_splice_read+0xf3/0x170
        splice_direct_to_actor+0x14e/0x350
        do_splice_direct+0x84/0xd0
        do_sendfile+0x263/0x430
        __se_sys_sendfile64+0x96/0xc0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #2 (sb_writers#3){.+.+}-{0:0}:
        lock_acquire+0xbe/0x1f0
        lo_write_bvec+0x96/0x280 [loop]
        loop_process_work+0xa68/0xc10 [loop]
        process_one_work+0x293/0x480
        worker_thread+0x23d/0x4b0
        kthread+0x163/0x180
        ret_from_fork+0x1f/0x30

 -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
        lock_acquire+0xbe/0x1f0
        process_one_work+0x280/0x480
        worker_thread+0x23d/0x4b0
        kthread+0x163/0x180
        ret_from_fork+0x1f/0x30

 -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
        validate_chain+0x1f0d/0x33e0
        __lock_acquire+0x92d/0x1030
        lock_acquire+0xbe/0x1f0
        flush_workqueue+0x8c/0x560
        drain_workqueue+0x80/0x140
        destroy_workqueue+0x47/0x4f0
        __loop_clr_fd+0xb4/0x400 [loop]
        blkdev_put+0x14a/0x1d0
        blkdev_close+0x1c/0x20
        __fput+0xfd/0x220
        task_work_run+0x69/0xc0
        exit_to_user_mode_prepare+0x1ce/0x1f0
        syscall_exit_to_user_mode+0x26/0x60
        do_syscall_64+0x4c/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 other info that might help us debug this:

 Chain exists of:
   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&lo->lo_mutex);
                                lock(&disk->open_mutex);
                                lock(&lo->lo_mutex);
   lock((wq_completion)loop0);

  *** DEADLOCK ***

 2 locks held by systemd-udevd/7568:
  #0: ffff888012554128 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_put+0x4c/0x1d0
  #1: ffff888014a7d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x4d/0x400 [loop]

 stack backtrace:
 CPU: 0 PID: 7568 Comm: systemd-udevd Tainted: G            E     5.14.0+ #757
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
 Call Trace:
  dump_stack_lvl+0x79/0xbf
  print_circular_bug+0x5d6/0x5e0
  ? stack_trace_save+0x42/0x60
  ? save_trace+0x3d/0x2d0
  check_noncircular+0x10b/0x120
  validate_chain+0x1f0d/0x33e0
  ? __lock_acquire+0x953/0x1030
  ? __lock_acquire+0x953/0x1030
  __lock_acquire+0x92d/0x1030
  ? flush_workqueue+0x70/0x560
  lock_acquire+0xbe/0x1f0
  ? flush_workqueue+0x70/0x560
  flush_workqueue+0x8c/0x560
  ? flush_workqueue+0x70/0x560
  ? sched_clock_cpu+0xe/0x1a0
  ? drain_workqueue+0x41/0x140
  drain_workqueue+0x80/0x140
  destroy_workqueue+0x47/0x4f0
  ? blk_mq_freeze_queue_wait+0xac/0xd0
  __loop_clr_fd+0xb4/0x400 [loop]
  ? __mutex_unlock_slowpath+0x35/0x230
  blkdev_put+0x14a/0x1d0
  blkdev_close+0x1c/0x20
  __fput+0xfd/0x220
  task_work_run+0x69/0xc0
  exit_to_user_mode_prepare+0x1ce/0x1f0
  syscall_exit_to_user_mode+0x26/0x60
  do_syscall_64+0x4c/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f0fd4c661f7
 Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 13 fc ff ff
 RSP: 002b:00007ffd1c9e9fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
 RAX: 0000000000000000 RBX: 00007f0fd46be6c8 RCX: 00007f0fd4c661f7
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
 RBP: 0000000000000006 R08: 000055fff1eaf400 R09: 0000000000000000
 R10: 00007f0fd46be6c8 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000002f08 R15: 00007ffd1c9ea050

Commit 1c500ad7 ("loop: reduce the loop_ctl_mutex scope") is meant to
break the "loop_ctl_mutex => &lo->lo_mutex" dependency chain. But enabling
a different block module still forms a circular locking dependency
through the shared major_names_lock mutex.

The simplest fix is to call the probe function without holding
major_names_lock [1], but Christoph Hellwig does not like that idea.
Therefore, instead of holding major_names_lock in blkdev_show(),
introduce a different lock for blkdev_show() in order to break the
"sb_writers#$N => &p->lock => major_names_lock" dependency chain.

Link: https://lkml.kernel.org/r/b2af8a5b-3c1b-204e-7f56-bea0b15848d6@i-love.sakura.ne.jp [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/18a02da2-0bf3-550e-b071-2b4ab13c49f0@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
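
The dependency chain from the lockdep report can be checked with a small cycle detector. The sketch below is a userspace illustration, not lockdep itself: an edge (A, B) means "B was acquired while A was held", the edges are read off the trace above, and the lock name blkdev_show_lock is a hypothetical stand-in for the new lock introduced by the fix.

```python
def has_cycle(edges):
    """Return True if the directed lock-dependency graph contains a cycle."""
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, []).append(acquired)
        graph.setdefault(acquired, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


# Dependencies #0..#6 from the report, plus the new acquisition in __loop_clr_fd().
BEFORE = [
    ("(wq_completion)loop0", "(work_completion)(&lo->rootcg_work)"),
    ("(work_completion)(&lo->rootcg_work)", "sb_writers#3"),
    ("sb_writers#3", "&p->lock"),
    ("&p->lock", "major_names_lock"),
    ("major_names_lock", "&disk->open_mutex"),
    ("&disk->open_mutex", "&lo->lo_mutex"),
    ("&lo->lo_mutex", "(wq_completion)loop0"),
]

# After the fix, blkdev_show() takes a separate lock (hypothetical name here),
# so &p->lock no longer leads to major_names_lock.
AFTER = [
    ("&p->lock", "blkdev_show_lock") if edge == ("&p->lock", "major_names_lock") else edge
    for edge in BEFORE
]
```

With the BEFORE edges the detector finds the cycle that lockdep reports; with the AFTER edges the chain through &p->lock ends at the new lock and no cycle remains.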

dfbb3409

04 Sep, 2021 1 commit

mm: remove flush_kernel_dcache_page · f358afc5

Committed by Christoph Hellwig on Sep 02, 2021

flush_kernel_dcache_page is a rather confusing interface that implements a
subset of flush_dcache_page by not being able to properly handle page
cache mapped pages.

The only callers left are in the exec code, as all other previous callers
were incorrect because they could have dealt with page cache pages. Replace
the calls to flush_kernel_dcache_page with calls to flush_dcache_page,
which for all architectures either does exactly the same thing or
contains one or more of the following:

 1) an optimization to defer the cache flush for page cache pages not
    mapped into userspace
 2) additional flushing for mapped page cache pages if cache aliases
    are possible

Link: https://lkml.kernel.org/r/20210712060928.4161649-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f358afc5

03 Sep, 2021 1 commit

bio: fix kerneldoc documentation for bio_alloc_kiocb() · 0ef47db1

Committed by Jens Axboe on Sep 03, 2021

Apparently the last fixup got butter fingered a bit, the correct variable
name is 'nr_vecs', not 'nr_iovecs'.

Link: https://lore.kernel.org/lkml/20210903164939.02f6e8c5@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0ef47db1

02 Sep, 2021 2 commits

block, bfq: honor already-setup queue merges · 2d52c58b

Committed by Paolo Valente on Aug 02, 2021

The function bfq_setup_merge prepares the merging between two
bfq_queues, say bfqq and new_bfqq. To this goal, it assigns
bfqq->new_bfqq = new_bfqq. Then, each time some I/O for bfqq arrives,
the process that generated that I/O is disassociated from bfqq and
associated with new_bfqq (merging is actually a redirection). In this
respect, bfq_setup_merge increases new_bfqq->ref in advance, adding
the number of processes that are expected to be associated with
new_bfqq.

Unfortunately, the stable-merging mechanism interferes with this
setup. After bfqq->new_bfqq has been set by bfq_setup_merge, and
before all the expected processes have been associated with
bfqq->new_bfqq, bfqq may happen to be stably merged with a different
queue than the current bfqq->new_bfqq. In this case, bfqq->new_bfqq
gets changed. So, some of the processes that have been already
accounted for in the ref counter of the previous new_bfqq will not be
associated with that queue. This creates an imbalance, because those
references will never be decremented.

This commit fixes this issue by reestablishing the previous, natural
behaviour: once bfqq->new_bfqq has been set, it will not be changed
until all expected redirections have occurred.

Signed-off-by: Davide Zini <davidezini2@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20210802141352.74353-2-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
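
The reference imbalance described above can be reproduced in a toy model. The sketch below is a deliberately simplified assumption, not the BFQ implementation: pending_refs stands for references taken in advance that have not yet been matched by a redirected process, and all class and function names are hypothetical.

```python
class BfqQueue:
    """Toy stand-in for a bfq_queue."""

    def __init__(self, name, procs=0):
        self.name = name
        self.procs = procs      # processes still attached to this queue
        self.pending_refs = 0   # refs taken in advance, not yet matched by a redirect
        self.new_bfqq = None    # merge target, once a merge has been set up


def setup_merge(bfqq, target, honor_existing):
    """Schedule a merge of bfqq into target, taking references in advance."""
    if honor_existing and bfqq.new_bfqq is not None:
        return bfqq.new_bfqq    # fixed behaviour: keep the already-setup merge
    target.pending_refs += bfqq.procs
    bfqq.new_bfqq = target
    return target


def redirect_one(bfqq):
    """One process issues I/O and is redirected to the current merge target."""
    bfqq.procs -= 1
    bfqq.new_bfqq.pending_refs -= 1


def simulate(honor_existing):
    bfqq = BfqQueue("bfqq", procs=2)
    first, second = BfqQueue("first"), BfqQueue("second")
    setup_merge(bfqq, first, honor_existing)
    redirect_one(bfqq)                         # one of two processes arrives
    setup_merge(bfqq, second, honor_existing)  # stable merging picks another queue
    redirect_one(bfqq)                         # the remaining process arrives
    return first.pending_refs, second.pending_refs
```

Without honoring the existing merge, the first target is left with one reference that no redirection will ever consume; when the existing merge is honored, both counters return to zero.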

2d52c58b

block/mq-deadline: Move dd_queued() to fix defined but not used warning · 55a51ea1

Committed by Geert Uytterhoeven on Aug 30, 2021

If CONFIG_BLK_DEBUG_FS=n:

    block/mq-deadline.c:274:12: warning: ‘dd_queued’ defined but not used [-Wunused-function]
      274 | static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
          |            ^~~~~~~~~

Fix this by moving dd_queued() just before the sole function that calls
it.

Fixes: 7b05bf77 ("Revert "block/mq-deadline: Prioritize high-priority requests"")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: 38ba64d1 ("block/mq-deadline: Track I/O statistics")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210830091128.1854266-1-geert@linux-m68k.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

55a51ea1

27 Aug, 2021 1 commit

Revert "block/mq-deadline: Prioritize high-priority requests" · 7b05bf77

Committed by Jens Axboe on Aug 26, 2021

This reverts commit fb926032.

Zhen reports that this commit slows down mq-deadline on a 128 thread
box, going from 258K IOPS to 170-180K. My testing shows that Optane
gen2 IOPS goes from 2.3M IOPS to 1.2M IOPS on a 64 thread box.

Looking in detail at the code, the main culprit here is needing to sum
percpu counters in the dispatch hot path, leading to very high CPU
utilization there. To make matters worse, the code currently needs to
sum 2 percpu counters, and it does so in the most naive way of iterating
possible CPUs _twice_.

Since we're close to release, revert this commit and we can re-do it
with regular per-priority counters instead for the 5.15 kernel.

Link: https://lore.kernel.org/linux-block/20210826144039.2143-1-thunder.leizhen@huawei.com/
Reported-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7b05bf77

25 Aug, 2021 7 commits

block, bfq: cleanup the repeated declaration · 1e294970

Committed by Shaokun Zhang on Aug 25, 2021

Function 'bfq_entity_to_bfqq' is declared twice, so remove the
repeated declaration and blank line.

Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/1629872391-46399-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1e294970

blk-crypto: fix check for too-large dun_bytes · cc40b722

Committed by Eric Biggers on Aug 24, 2021

dun_bytes needs to be less than or equal to the IV size of the
encryption mode, not just less than or equal to BLK_CRYPTO_MAX_IV_SIZE.

Currently this doesn't matter since blk_crypto_init_key() is never
actually passed invalid values, but we might as well fix this.

Fixes: a892c8d5 ("block: Inline encryption support for blk-mq")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210825055918.51975-1-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
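
The difference between the two checks can be shown with a short model. The sketch below is illustrative only: the per-mode IV sizes, the value 32 for BLK_CRYPTO_MAX_IV_SIZE, the rejection of zero, and the function names are assumptions rather than a copy of blk_crypto_init_key().

```python
BLK_CRYPTO_MAX_IV_SIZE = 32  # assumed global upper bound

# Assumed per-mode IV sizes in bytes, for illustration only.
MODE_IV_SIZE = {
    "AES-256-XTS": 16,
    "Adiantum": 32,
}


def dun_bytes_ok_old(mode: str, dun_bytes: int) -> bool:
    """Old check: only compares against the global maximum IV size."""
    return 0 < dun_bytes <= BLK_CRYPTO_MAX_IV_SIZE


def dun_bytes_ok_new(mode: str, dun_bytes: int) -> bool:
    """New check: compares against the IV size of the selected mode."""
    return 0 < dun_bytes <= MODE_IV_SIZE[mode]
```

Under these assumptions, a 32-byte DUN for a mode with a 16-byte IV passes the old check but is rejected by the new one.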

cc40b722

mq-deadline: Fix request accounting · b6d2b054

Committed by Bart Van Assche on Aug 24, 2021

The block layer may call the I/O scheduler .finish_request() callback
without having called the .insert_requests() callback. Make sure that the
mq-deadline I/O statistics are correct if the block layer inserts an I/O
request that bypasses the I/O scheduler. This patch prevents lower
priority I/O from being delayed longer than necessary for mixed I/O
priority workloads.

Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Fixes: 08a9ad8b ("block/mq-deadline: Add cgroup support")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210824170520.1659173-1-bvanassche@acm.org
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b6d2b054

blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN · 4d643b66

Committed by Niklas Cassel on Aug 11, 2021

A user space process should not need the CAP_SYS_ADMIN capability set
in order to perform a BLKREPORTZONE ioctl.

Getting the zone report is required in order to get the write pointer.
Neither read() nor write() requires CAP_SYS_ADMIN, so it is reasonable
that a user space process that can read/write from/to the device, also
can get the write pointer. (Since e.g. writes have to be at the write
pointer.)

Fixes: 3ed05a98 ("blk-zoned: implement ioctls")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: stable@vger.kernel.org # v4.10+
Link: https://lore.kernel.org/r/20210811110505.29649-3-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4d643b66

blk-zoned: allow zone management send operations without CAP_SYS_ADMIN · ead3b768

Committed by Niklas Cassel on Aug 11, 2021

Zone management send operations (BLKRESETZONE, BLKOPENZONE, BLKCLOSEZONE
and BLKFINISHZONE) should be allowed under the same permissions as write().
(write() does not require CAP_SYS_ADMIN).

Additionally, other ioctls like BLKSECDISCARD and BLKZEROOUT only check if
the fd was successfully opened with FMODE_WRITE.
(They do not require CAP_SYS_ADMIN).

Currently, zone management send operations require both CAP_SYS_ADMIN
and that the fd was successfully opened with FMODE_WRITE.

Remove the CAP_SYS_ADMIN requirement, so that zone management send
operations match the access control requirement of write(), BLKSECDISCARD
and BLKZEROOUT.

Fixes: 3ed05a98 ("blk-zoned: implement ioctls")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: stable@vger.kernel.org # v4.10+
Link: https://lore.kernel.org/r/20210811110505.29649-2-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
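
The change in access control can be summarised as a before/after check. The sketch below is a simplified model: the FMODE_WRITE value, the order of the checks, the returned error codes, and the function names are assumptions made for illustration.

```python
import errno

FMODE_WRITE = 0x2  # assumed flag value, for illustration only


def zone_mgmt_check_old(fmode: int, has_cap_sys_admin: bool) -> int:
    """Before: both CAP_SYS_ADMIN and a writable fd are required."""
    if not has_cap_sys_admin:
        return -errno.EACCES
    if not (fmode & FMODE_WRITE):
        return -errno.EBADF
    return 0


def zone_mgmt_check_new(fmode: int) -> int:
    """After: only a writable fd is required, matching write()."""
    if not (fmode & FMODE_WRITE):
        return -errno.EBADF
    return 0
```

An unprivileged process holding a writable fd is refused by the old check and accepted by the new one, while a read-only fd is refused in both cases.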

ead3b768

block: refine the disk_live check in del_gendisk · 9f286992

Committed by Christoph Hellwig on Aug 24, 2021

Hidden gendisks will never be marked live.

Fixes: 40b3a52f ("block: add a sanity check for a live disk in del_gendisk")
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210824144310.1487816-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9f286992

partitions/efi: Support non-standard GPT location · 466d9c49

Committed by Dmitry Osipenko on Aug 20, 2021

Support looking up GPT at a non-standard location specified by a block
device driver.

Acked-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210820004536.15791-3-digetx@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

466d9c49

24 Aug, 2021 22 commits

bio: fix page leak bio_add_hw_page failure · d9cf3bd5

Committed by Pavel Begunkov on Jul 19, 2021

__bio_iov_append_get_pages() doesn't put the pages that were not appended
when bio_add_hw_page() fails, potentially leaking them. Fix it. Also do
the same for __bio_iov_iter_get_pages(), even though it looks like it
can't be triggered by userspace in this case.

Fixes: 0512a75b ("block: Introduce REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org # 5.8+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1edfa6a2ffd66d55e6345a477df5387d2c1415d0.1626653825.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d9cf3bd5

block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT · c4b2b7d1

Committed by Christoph Hellwig on Aug 24, 2021

This might have been a neat debug aid when the extended dev_t was
added, but that time is long gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210824075216.1179406-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c4b2b7d1

block: remove a pointless call to MINOR() in device_add_disk · 539711d7

Committed by Christoph Hellwig on Aug 24, 2021

blk_alloc_ext_minor already returns just a minor number, so no need to
mask the high bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210824075216.1179406-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

539711d7

bio: improve kerneldoc documentation for bio_alloc_kiocb() · 3d5b3fbe

Committed by Jens Axboe on Aug 13, 2021

We're missing a description for the 'nr_vecs' parameter. While in there,
clarify that freeing a bio allocated through this function must be done
from process context.

Fixes: 1cbbd31c4ada ("bio: add allocation cache abstraction")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3d5b3fbe

block: provide bio_clear_hipri() helper · 270a1c91

Committed by Jens Axboe on Aug 12, 2021

Any case that turns off REQ_HIPRI must also clear BIO_PERCPU_CACHE,
as non-polled IO may complete through hard/soft IRQ and hence isn't
safe for our polled bio alloc cache.

Provide a helper that does just that, and use it in the merging code as
well if we split a bio and turn off polling.

Fixes: be863b9e ("block: clear BIO_PERCPU_CACHE flag if polling isn't supported")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
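
The invariant behind the helper (turning off polling must also make the bio ineligible for the per-cpu cache) can be sketched as follows. The bit positions and the Bio class are hypothetical; only the flag names come from the commit message.

```python
REQ_HIPRI = 1 << 0         # hypothetical bit position
BIO_PERCPU_CACHE = 1 << 1  # hypothetical bit position


class Bio:
    """Toy stand-in for struct bio with an op-flags word and a flags word."""

    def __init__(self, opf: int = 0, flags: int = 0):
        self.opf = opf
        self.flags = flags


def bio_clear_hipri(bio: Bio) -> None:
    """Turn off polling and, with it, eligibility for the per-cpu bio cache."""
    bio.flags &= ~BIO_PERCPU_CACHE  # non-polled completion may run in IRQ context
    bio.opf &= ~REQ_HIPRI
```

Routing every "disable polling" site through one helper keeps the two flags from drifting apart, which is the point the commit message makes about the split/merge path.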

270a1c91

block: clear BIO_PERCPU_CACHE flag if polling isn't supported · be863b9e

Committed by Jens Axboe on Aug 11, 2021

The bio alloc cache relies on the fact that a polled bio will complete
in process context. Clear the cacheable flag if we disable polling
for a given bio.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

be863b9e

bio: add allocation cache abstraction · be4d234d

Committed by Jens Axboe on Mar 08, 2021

Add a per-cpu bio_set cache for bio allocations, enabling us to quickly
recycle them instead of going through the slab allocator. This cache
isn't IRQ safe, and hence is only really suitable for polled IO.

Very simple: it keeps a count of bios in the cache and maintains a max
of 512 with a slack of 64. If we get above max + slack, we drop slack
number of bios.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
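
The max-plus-slack policy can be modelled in a few lines. This is a userspace sketch under stated assumptions: the constant and class names are hypothetical, the cache is a plain list rather than a per-cpu structure, and which entries get dropped is an arbitrary choice here.

```python
ALLOC_CACHE_MAX = 512   # maximum number of cached bios (from the commit message)
ALLOC_CACHE_SLACK = 64  # slack tolerated above the maximum before pruning


class BioAllocCache:
    """Toy model of a bio free-list cache with a max + slack pruning policy."""

    def __init__(self):
        self.free_list = []

    def put(self, bio) -> None:
        """Return a bio to the cache, pruning once max + slack is exceeded."""
        self.free_list.append(bio)
        if len(self.free_list) > ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK:
            # Drop "slack" entries in one go instead of one per put().
            del self.free_list[-ALLOC_CACHE_SLACK:]

    def get(self):
        """Take a recycled bio, or None if the cache is empty."""
        return self.free_list.pop() if self.free_list else None
```

Pruning a whole batch when the threshold is crossed means a steady stream of frees does not pay the pruning cost on every call.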

be4d234d

bio: optimize initialization of a bio · da521626

Committed by Jens Axboe on Aug 11, 2021

The memset() used is measurably slower in targeted benchmarks, wasting
about 1% of the total runtime, or 50% of the (later) hot path cached
bio alloc. Get rid of it and fill in the bio manually.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

da521626

block: add error handling for device_add_disk / add_disk · 83cbce95

Committed by Luis Chamberlain on Aug 18, 2021

Properly unwind on errors in device_add_disk. This is the initial work,
as drivers are not converted yet; those conversions will follow in
separate patches.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: major rebase.  All bugs are probably mine]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

83cbce95

block: return errors from disk_alloc_events · 92e7755e

Committed by Luis Chamberlain on Aug 18, 2021

Prepare for proper error handling in add_disk.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

92e7755e

block: return errors from blk_integrity_add · 614310c9

Committed by Luis Chamberlain on Aug 18, 2021

Prepare for proper error handling in add_disk.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

614310c9

block: call blk_register_queue earlier in device_add_disk · 75f4dca5

Committed by Christoph Hellwig on Aug 18, 2021

Ensure that all the sysfs bits are set up before bdev_add is called,
as that will make the upcoming error handling much easier. However,
this means the call to disk_update_readahead has to be split, as that
requires a bdi. Also remove various sanity checks that don't make
sense now that blk_register_queue only has a single caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210818144542.19305-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

75f4dca5

block: call blk_integrity_add earlier in device_add_disk · bab53f6b

Committed by Christoph Hellwig on Aug 18, 2021

Doing all the sysfs file creation before adding the bdev, and thus
before allowing it to be opened, will simplify the error handling that
is about to be added.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bab53f6b

block: create the bdi link earlier in device_add_disk · 9d5ee676

Committed by Christoph Hellwig on Aug 18, 2021

This will simplify error handling going forward.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9d5ee676

block: call bdev_add later in device_add_disk · 8235b5c1

Committed by Christoph Hellwig on Aug 18, 2021

Once bdev_add is called, userspace can open the block device. Ensure
that the struct device, which is used for refcounting of the disk
besides various other things, is fully set up at that point.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8235b5c1

block: fold register_disk into device_add_disk · 52b85909

Committed by Christoph Hellwig on Aug 18, 2021

There is no real reason these should be separate. Also simplify the
groups assignment a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210818144542.19305-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

52b85909

block: add a sanity check for a live disk in del_gendisk · 40b3a52f

Committed by Christoph Hellwig on Aug 18, 2021

Add a sanity check to del_gendisk to do nothing when the disk wasn't
successfully added. This papers over the complete lack of add_disk
error handling, which is about to get fixed gradually.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

40b3a52f

block: add an explicit ->disk backpointer to the request_queue · d152c682

Committed by Christoph Hellwig on Aug 16, 2021

Replace the magic lookup through the kobject tree with an explicit
backpointer, given that the device model links are set up and torn
down at times when I/O is still possible, leading to potential
NULL or invalid pointer dereferences.

Fixes: edb0872f ("block: move the bdi from the request_queue to the gendisk")
Reported-by: syzbot <syzbot+aa0801b6b32dca9dda82@syzkaller.appspotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/r/20210816134624.GA24234@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d152c682

block: hold a request_queue reference for the lifetime of struct gendisk · 61a35cfc

Committed by Christoph Hellwig on Aug 16, 2021

Acquire the queue ref dropped in disk_release in __blk_alloc_disk so any
allocated gendisk always has a queue reference.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210816131910.615153-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

61a35cfc

block: pass a request_queue to __blk_alloc_disk · 4a1fa41d

Committed by Christoph Hellwig on Aug 16, 2021

Pass in a request_queue and assign disk->queue in __blk_alloc_disk to
ensure struct gendisk always has a valid ->queue pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210816131910.615153-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4a1fa41d

block: remove the minors argument to __alloc_disk_node · a58bd768

Committed by Christoph Hellwig on Aug 16, 2021

This was a leftover from the legacy alloc_disk interface. Switch
the scsi ULPs and dasd to set ->minors directly like all other
drivers and remove the argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com> [dasd]
Link: https://lore.kernel.org/r/20210816131910.615153-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a58bd768

block: cleanup the lockdep handling in *alloc_disk · 4dcc4874

Committed by Christoph Hellwig on Aug 16, 2021

Pass the lockdep name to the low-level __blk_alloc_disk helper and
hardcode the name for it, given that the number of minors or node_id
are not very useful information. While this passes a pointless
argument for non-lockdep builds, that is not really an issue, as
disk allocation is a probe-time-only slow path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210816131910.615153-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4dcc4874

23 Aug, 2021 1 commit

block: fix argument type of bio_trim() · e83502ca

Committed by Chaitanya Kulkarni on Jul 21, 2021

The function bio_trim has offset and size arguments that are declared
as int.

The callers of this function use sector_t type when passing the offset
and size, e.g. drivers/md/raid1.c:narrow_write_error().

Change offset and size arguments to sector_t type for bio_trim(). Also,
add WARN_ON_ONCE() to catch their overflow.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
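
Why an int parameter is a problem for sector_t values can be shown numerically. The sketch below is an illustration under assumptions: as_c_int() emulates a 32-bit signed truncation, BIO_MAX_SECTORS is taken to be UINT_MAX >> 9, and the validity check only approximates what a WARN_ON_ONCE() guard might test.

```python
SECTOR_SHIFT = 9
UINT_MAX = 0xFFFFFFFF
BIO_MAX_SECTORS = UINT_MAX >> SECTOR_SHIFT  # assumed bound on sectors per bio


def as_c_int(value: int) -> int:
    """Emulate passing a 64-bit value through a 32-bit signed int parameter."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def bio_trim_args_valid(offset: int, size: int, bio_sectors: int) -> bool:
    """Approximate guard: reject out-of-range values instead of truncating."""
    if offset > BIO_MAX_SECTORS or size > BIO_MAX_SECTORS:
        return False
    return offset + size <= bio_sectors
```

A sector offset of 2**32 + 8 silently becomes 8 when squeezed through an int, whereas with wide arguments and an explicit range check the bad value is detected instead.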

e83502ca

openeuler / Kernel: synced successfully nearly 2 years ago
