提交 · e41d12f539f7c8bac9b0897031fee6cc9158a6be · openeuler / Kernel

18 10月, 2021 2 次提交

mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h> · e41d12f5

由 Christoph Hellwig 提交于 9月 20, 2021

There is no need to pull blk-cgroup.h and thus blkdev.h in here, so
break the include chain.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

e41d12f5

blk-cgroup: blk_cgroup_bio_start() should use irq-safe operations on blkg->iostat_cpu · 3c08b093

由 Tejun Heo 提交于 10月 14, 2021

c3df5fb5 ("cgroup: rstat: fix A-A deadlock on 32bit around
u64_stats_sync") made u64_stats updates irq-safe to avoid A-A deadlocks.
Unfortunately, the conversion missed one in blk_cgroup_bio_start(). Fix it.

Fixes: 2d146aa3 ("mm: memcontrol: switch to rstat")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: syzbot+9738c8815b375ce482a1@syzkaller.appspotmail.com
Signed-off-by: NTejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/YWi7NrQdVlxD6J9W@slm.duckdns.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

3c08b093

17 10月, 2021 2 次提交

block, bfq: reset last_bfqq_created on group change · d29bd414

由 Paolo Valente 提交于 10月 15, 2021

Since commit 430a67f9 ("block, bfq: merge bursts of newly-created
queues"), BFQ maintains a per-group pointer to the last bfq_queue
created. If such a queue, say bfqq, happens to move to a different
group, then bfqq is no more a valid last bfq_queue created for its
previous group. That pointer must then be cleared. Not resetting such
a pointer may also cause UAF, if bfqq happens to also be freed after
being moved to a different group. This commit performs this missing
reset. As such it fixes commit 430a67f9 ("block, bfq: merge bursts
of newly-created queues").

Such a missing reset is most likely the cause of the crash reported in [1].
With some analysis, we found that this crash was due to the
above UAF. And such UAF did go away with this commit applied [1].

Anyway, before this commit, that crash happened to be triggered in
conjunction with commit 2d52c58b ("block, bfq: honor already-setup
queue merges"). The latter was then reverted by commit ebc69e89
("Revert "block, bfq: honor already-setup queue merges""). Yet commit
2d52c58b ("block, bfq: honor already-setup queue merges") contains
no error related with the above UAF, and can then be restored.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=214503

Fixes: 430a67f9 ("block, bfq: merge bursts of newly-created queues")
Tested-by: NGrzegorz Kowal <custos.mentis@gmail.com>
Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20211015144336.45894-2-paolo.valente@linaro.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

d29bd414

block: warn when putting the final reference on a registered disk · a2041761

由 Christoph Hellwig 提交于 10月 14, 2021

Warn when the last reference on a live disk is put without calling
del_gendisk first. There are some BDI related bug reports that look
like a case of this, so make sure we have the proper instrumentation
to catch it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014130231.1468538-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

a2041761

16 10月, 2021 6 次提交

kyber: avoid q->disk dereferences in trace points · c4110804

由 Christoph Hellwig 提交于 10月 12, 2021

q->disk becomes invalid after the gendisk is removed.  Work around this
by caching the dev_t for the tracepoints.  The real fix would be to
properly tear down the I/O schedulers with the gendisk, but that is
a much more invasive change.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012093301.GA27795@lst.deTested-by: NYi Zhang <yi.zhang@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c4110804

block: keep q_usage_counter in atomic mode after del_gendisk · aec89dc5

由 Christoph Hellwig 提交于 9月 29, 2021

Don't switch back to percpu mode to avoid the double RCU grace period
when tearing down SCSI devices.  After removing the disk only passthrough
commands can be send anyway.
Suggested-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NDarrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-6-hch@lst.deTested-by: NYi Zhang <yi.zhang@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

aec89dc5

block: drain file system I/O on del_gendisk · 8e141f9e

由 Christoph Hellwig 提交于 9月 29, 2021

Instead of delaying draining of file system I/O related items like the
blk-qos queues, the integrity read workqueue and timeouts only when the
request_queue is removed, do that when del_gendisk is called. This is
important for SCSI where the upper level drivers that control the gendisk
are separate entities, and the disk can be freed much earlier than the
request_queue, or can even be unbound without tearing down the queue.

Fixes: edb0872f ("block: move the bdi from the request_queue to the gendisk")
Reported-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NDarrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-5-hch@lst.deTested-by: NYi Zhang <yi.zhang@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8e141f9e

block: split bio_queue_enter from blk_queue_enter · a6741536

由 Christoph Hellwig 提交于 9月 29, 2021

To prepare for fixing a gendisk shutdown race, open code the
blk_queue_enter logic in bio_queue_enter.  This also removes the
pointless flags translation.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NDarrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-4-hch@lst.deTested-by: NYi Zhang <yi.zhang@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a6741536

block: factor out a blk_try_enter_queue helper · 1f14a098

由 Christoph Hellwig 提交于 9月 29, 2021

Factor out the code to try to get q_usage_counter without blocking into
a separate helper. Both to improve code readability and to prepare for
splitting bio_queue_enter from blk_queue_enter.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NDarrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-3-hch@lst.deTested-by: NYi Zhang <yi.zhang@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1f14a098

block: call submit_bio_checks under q_usage_counter · cc9c884d

由 Christoph Hellwig 提交于 9月 29, 2021

Ensure all bios check the current values of the queue under freeze
protection, i.e. to make sure the zero capacity set by del_gendisk
is actually seen before dispatching to the driver.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210929071241.934472-2-hch@lst.deTested-by: NYi Zhang <yi.zhang@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cc9c884d

04 10月, 2021 1 次提交

block: decode QUEUE_FLAG_HCTX_ACTIVE in debugfs output · 1dbdd99b

由 Johannes Thumshirn 提交于 10月 04, 2021

While debugging an issue we've found that $DEBUGFS/block/$disk/state
doesn't decode QUEUE_FLAG_HCTX_ACTIVE but only displays its numerical
value.

Add QUEUE_FLAG(HCTX_ACTIVE) to the blk_queue_flag_name array so it'll get
decoded properly.
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/4351076388918075bd80ef07756f9d2ce63be12c.1633332053.git.johannes.thumshirn@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

1dbdd99b

02 10月, 2021 1 次提交

block: genhd: fix double kfree() in __alloc_disk_node() · 06cc978d

由 Tetsuo Handa 提交于 10月 02, 2021

syzbot is reporting use-after-free read at bdev_free_inode() [1], for
kfree() from __alloc_disk_node() is called before bdev_free_inode()
(which is called after RCU grace period) reads bdev->bd_disk and calls
kfree(bdev->bd_disk).

Fix use-after-free read followed by double kfree() problem
by making sure that bdev->bd_disk is NULL when calling iput().

Link: https://syzkaller.appspot.com/bug?extid=8281086e8a6fbfbd952a [1]
Reported-by: Nsyzbot <syzbot+8281086e8a6fbfbd952a@syzkaller.appspotmail.com>
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/e6dd13c5-8db0-4392-6e78-a42ee5d2a1c4@i-love.sakura.ne.jpSigned-off-by: NJens Axboe <axboe@kernel.dk>

06cc978d

28 9月, 2021 1 次提交

Revert "block, bfq: honor already-setup queue merges" · ebc69e89

由 Jens Axboe 提交于 9月 28, 2021

This reverts commit 2d52c58b.

We have had several folks complain that this causes hangs for them, which
is especially problematic as the commit has also hit stable already.

As no resolution seems to be forthcoming right now, revert the patch.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214503
Fixes: 2d52c58b ("block, bfq: honor already-setup queue merges")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ebc69e89

25 9月, 2021 2 次提交

block: hold ->invalidate_lock in blkdev_fallocate · f278eb3d

由 Ming Lei 提交于 9月 23, 2021

When running ->fallocate(), blkdev_fallocate() should hold
mapping->invalidate_lock to prevent page cache from being accessed,
otherwise stale data may be read in page cache.

Without this patch, blktests block/009 fails sometimes. With this patch,
block/009 can pass always.

Also as Jan pointed out, no pages can be created in the discarded area
while you are holding the invalidate_lock, so remove the 2nd
truncate_bdev_range().

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210923023751.1441091-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

f278eb3d

block: don't call rq_qos_ops->done_bio if the bio isn't tracked · a647a524

由 Ming Lei 提交于 9月 24, 2021

rq_qos framework is only applied on request based driver, so:

1) rq_qos_done_bio() needn't to be called for bio based driver

2) rq_qos_done_bio() needn't to be called for bio which isn't tracked,
such as bios ended from error handling code.

Especially in bio_endio():

1) request queue is referred via bio->bi_bdev->bd_disk->queue, which
may be gone since request queue refcount may not be held in above two
cases

2) q->rq_qos may be freed in blk_cleanup_queue() when calling into
__rq_qos_done_bio()

Fix the potential kernel panic by not calling rq_qos_ops->done_bio if
the bio isn't tracked. This way is safe because both ioc_rqos_done_bio()
and blkcg_iolatency_done_bio() are nop if the bio isn't tracked.
Reported-by: NYu Kuai <yukuai3@huawei.com>
Cc: tj@kernel.org
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NTejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210924110704.1541818-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a647a524

16 9月, 2021 2 次提交

blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd · 858560b2

由 Li Jinlin 提交于 9月 14, 2021

KASAN reports a use-after-free report when doing fuzz test:

[693354.104835] ==================================================================
[693354.105094] BUG: KASAN: use-after-free in bfq_io_set_weight_legacy+0xd3/0x160
[693354.105336] Read of size 4 at addr ffff888be0a35664 by task sh/1453338

[693354.105607] CPU: 41 PID: 1453338 Comm: sh Kdump: loaded Not tainted 4.18.0-147
[693354.105610] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.81 07/02/2018
[693354.105612] Call Trace:
[693354.105621]  dump_stack+0xf1/0x19b
[693354.105626]  ? show_regs_print_info+0x5/0x5
[693354.105634]  ? printk+0x9c/0xc3
[693354.105638]  ? cpumask_weight+0x1f/0x1f
[693354.105648]  print_address_description+0x70/0x360
[693354.105654]  kasan_report+0x1b2/0x330
[693354.105659]  ? bfq_io_set_weight_legacy+0xd3/0x160
[693354.105665]  ? bfq_io_set_weight_legacy+0xd3/0x160
[693354.105670]  bfq_io_set_weight_legacy+0xd3/0x160
[693354.105675]  ? bfq_cpd_init+0x20/0x20
[693354.105683]  cgroup_file_write+0x3aa/0x510
[693354.105693]  ? ___slab_alloc+0x507/0x540
[693354.105698]  ? cgroup_file_poll+0x60/0x60
[693354.105702]  ? 0xffffffff89600000
[693354.105708]  ? usercopy_abort+0x90/0x90
[693354.105716]  ? mutex_lock+0xef/0x180
[693354.105726]  kernfs_fop_write+0x1ab/0x280
[693354.105732]  ? cgroup_file_poll+0x60/0x60
[693354.105738]  vfs_write+0xe7/0x230
[693354.105744]  ksys_write+0xb0/0x140
[693354.105749]  ? __ia32_sys_read+0x50/0x50
[693354.105760]  do_syscall_64+0x112/0x370
[693354.105766]  ? syscall_return_slowpath+0x260/0x260
[693354.105772]  ? do_page_fault+0x9b/0x270
[693354.105779]  ? prepare_exit_to_usermode+0xf9/0x1a0
[693354.105784]  ? enter_from_user_mode+0x30/0x30
[693354.105793]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[693354.105875] Allocated by task 1453337:
[693354.106001]  kasan_kmalloc+0xa0/0xd0
[693354.106006]  kmem_cache_alloc_node_trace+0x108/0x220
[693354.106010]  bfq_pd_alloc+0x96/0x120
[693354.106015]  blkcg_activate_policy+0x1b7/0x2b0
[693354.106020]  bfq_create_group_hierarchy+0x1e/0x80
[693354.106026]  bfq_init_queue+0x678/0x8c0
[693354.106031]  blk_mq_init_sched+0x1f8/0x460
[693354.106037]  elevator_switch_mq+0xe1/0x240
[693354.106041]  elevator_switch+0x25/0x40
[693354.106045]  elv_iosched_store+0x1a1/0x230
[693354.106049]  queue_attr_store+0x78/0xb0
[693354.106053]  kernfs_fop_write+0x1ab/0x280
[693354.106056]  vfs_write+0xe7/0x230
[693354.106060]  ksys_write+0xb0/0x140
[693354.106064]  do_syscall_64+0x112/0x370
[693354.106069]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[693354.106114] Freed by task 1453336:
[693354.106225]  __kasan_slab_free+0x130/0x180
[693354.106229]  kfree+0x90/0x1b0
[693354.106233]  blkcg_deactivate_policy+0x12c/0x220
[693354.106238]  bfq_exit_queue+0xf5/0x110
[693354.106241]  blk_mq_exit_sched+0x104/0x130
[693354.106245]  __elevator_exit+0x45/0x60
[693354.106249]  elevator_switch_mq+0xd6/0x240
[693354.106253]  elevator_switch+0x25/0x40
[693354.106257]  elv_iosched_store+0x1a1/0x230
[693354.106261]  queue_attr_store+0x78/0xb0
[693354.106264]  kernfs_fop_write+0x1ab/0x280
[693354.106268]  vfs_write+0xe7/0x230
[693354.106271]  ksys_write+0xb0/0x140
[693354.106275]  do_syscall_64+0x112/0x370
[693354.106280]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[693354.106329] The buggy address belongs to the object at ffff888be0a35580
                 which belongs to the cache kmalloc-1k of size 1024
[693354.106736] The buggy address is located 228 bytes inside of
                 1024-byte region [ffff888be0a35580, ffff888be0a35980)
[693354.107114] The buggy address belongs to the page:
[693354.107273] page:ffffea002f828c00 count:1 mapcount:0 mapping:ffff888107c17080 index:0x0 compound_mapcount: 0
[693354.107606] flags: 0x17ffffc0008100(slab|head)
[693354.107760] raw: 0017ffffc0008100 ffffea002fcbc808 ffffea0030bd3a08 ffff888107c17080
[693354.108020] raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
[693354.108278] page dumped because: kasan: bad access detected

[693354.108511] Memory state around the buggy address:
[693354.108671]  ffff888be0a35500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[693354.116396]  ffff888be0a35580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.124473] >ffff888be0a35600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.132421]                                                        ^
[693354.140284]  ffff888be0a35680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.147912]  ffff888be0a35700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.155281] ==================================================================

blkgs are protected by both queue and blkcg locks and holding
either should stabilize them. However, the path of destroying
blkg policy data is only protected by queue lock in
blkcg_activate_policy()/blkcg_deactivate_policy(). Other tasks
can get the blkg policy data before the blkg policy data is
destroyed, and use it after destroyed, which will result in a
use-after-free.

CPU0                             CPU1
blkcg_deactivate_policy
  spin_lock_irq(&q->queue_lock)
                                 bfq_io_set_weight_legacy
                                   spin_lock_irq(&blkcg->lock)
                                   blkg_to_bfqg(blkg)
                                     pd_to_bfqg(blkg->pd[pol->plid])
                                     ^^^^^^blkg->pd[pol->plid] != NULL
                                           bfqg != NULL
  pol->pd_free_fn(blkg->pd[pol->plid])
    pd_to_bfqg(blkg->pd[pol->plid])
    bfqg_put(bfqg)
      kfree(bfqg)
  blkg->pd[pol->plid] = NULL
  spin_unlock_irq(q->queue_lock);
                                   bfq_group_set_weight(bfqg, val, 0)
                                     bfqg->entity.new_weight
                                     ^^^^^^trigger uaf here
                                   spin_unlock_irq(&blkcg->lock);

Fix by grabbing the matching blkcg lock before trying to
destroy blkg policy data.
Suggested-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NLi Jinlin <lijinlin3@huawei.com>
Acked-by: NTejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210914042605.3260596-1-lijinlin3@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

858560b2

blkcg: fix memory leak in blk_iolatency_init · 6f5ddde4

由 Yanfei Xu 提交于 9月 15, 2021

BUG: memory leak
unreferenced object 0xffff888129acdb80 (size 96):
  comm "syz-executor.1", pid 12661, jiffies 4294962682 (age 15.220s)
  hex dump (first 32 bytes):
    20 47 c9 85 ff ff ff ff 20 d4 8e 29 81 88 ff ff   G...... ..)....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff82264ec8>] kmalloc include/linux/slab.h:591 [inline]
    [<ffffffff82264ec8>] kzalloc include/linux/slab.h:721 [inline]
    [<ffffffff82264ec8>] blk_iolatency_init+0x28/0x190 block/blk-iolatency.c:724
    [<ffffffff8225b8c4>] blkcg_init_queue+0xb4/0x1c0 block/blk-cgroup.c:1185
    [<ffffffff822253da>] blk_alloc_queue+0x22a/0x2e0 block/blk-core.c:566
    [<ffffffff8223b175>] blk_mq_init_queue_data block/blk-mq.c:3100 [inline]
    [<ffffffff8223b175>] __blk_mq_alloc_disk+0x25/0xd0 block/blk-mq.c:3124
    [<ffffffff826a9303>] loop_add+0x1c3/0x360 drivers/block/loop.c:2344
    [<ffffffff826a966e>] loop_control_get_free drivers/block/loop.c:2501 [inline]
    [<ffffffff826a966e>] loop_control_ioctl+0x17e/0x2e0 drivers/block/loop.c:2516
    [<ffffffff81597eec>] vfs_ioctl fs/ioctl.c:51 [inline]
    [<ffffffff81597eec>] __do_sys_ioctl fs/ioctl.c:874 [inline]
    [<ffffffff81597eec>] __se_sys_ioctl fs/ioctl.c:860 [inline]
    [<ffffffff81597eec>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
    [<ffffffff843fa745>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff843fa745>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Once blk_throtl_init() queue init failed, blkcg_iolatency_exit() will
not be invoked for cleanup. That leads a memory leak. Swap the
blk_throtl_init() and blk_iolatency_init() calls can solve this.

Reported-by: syzbot+01321b15cc98e6bf96d6@syzkaller.appspotmail.com
Fixes: 19688d7f (block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls)
Signed-off-by: NYanfei Xu <yanfei.xu@windriver.com>
Acked-by: NTejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210915072426.4022924-1-yanfei.xu@windriver.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

6f5ddde4

15 9月, 2021 2 次提交

block: flush the integrity workqueue in blk_integrity_unregister · 3df49967

由 Lihong Kou 提交于 9月 14, 2021

When the integrity profile is unregistered there can still be integrity
reads queued up which could see a NULL verify_fn as shown by the race
window below:

CPU0                                    CPU1
  process_one_work                      nvme_validate_ns
    bio_integrity_verify_fn                nvme_update_ns_info
	                                     nvme_update_disk_info
	                                       blk_integrity_unregister
                                               ---set queue->integrity as 0
	bio_integrity_process
	--access bi->profile->verify_fn(bi is a pointer of queue->integity)

Before calling blk_integrity_unregister in nvme_update_disk_info, we must
make sure that there is no work item in the kintegrityd_wq. Just call
blk_flush_integrity to flush the work queue so the bug can be resolved.
Signed-off-by: NLihong Kou <koulihong@huawei.com>
[hch: split up and shortened the changelog]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210914070657.87677-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

3df49967

block: check if a profile is actually registered in blk_integrity_unregister · 783a40a1

由 Christoph Hellwig 提交于 9月 14, 2021

While clearing the profile itself is harmless, we really should not clear
the stable writes flag if it wasn't set due to a registered integrity
profile.
Reported-by: NLihong Kou <koulihong@huawei.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210914070657.87677-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

783a40a1

14 9月, 2021 1 次提交

scsi: bsg: Fix device unregistration · 1a0db774

由 Zenghui Yu 提交于 9月 11, 2021

device_initialize() is used to take a refcount on the device. However,
put_device() is not called during device teardown. This leads to a
leak of private data of the driver core, dev_name(), etc. This is
reported by kmemleak at boot time if we compile kernel with
DEBUG_TEST_DRIVER_REMOVE.

Fix memory leaks during unregistration and implement a release
function.

Link: https://lore.kernel.org/r/20210911105306.1511-1-yuzenghui@huawei.com
Fixes: ead09dd3 ("scsi: bsg: Simplify device registration")
Reviewed-by: NJohan Hovold <johan@kernel.org>
Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

1a0db774

13 9月, 2021 1 次提交

blk-mq: avoid to iterate over stale request · 67f3b2f8

由 Ming Lei 提交于 9月 06, 2021

blk-mq can't run allocating driver tag and updating ->rqs[tag]
atomically, meantime blk-mq doesn't clear ->rqs[tag] after the driver
tag is released.

So there is chance to iterating over one stale request just after the
tag is allocated and before updating ->rqs[tag].

scsi_host_busy_iter() calls scsi_host_check_in_flight() to count scsi
in-flight requests after scsi host is blocked, so no new scsi command can
be marked as SCMD_STATE_INFLIGHT. However, driver tag allocation still can
be run by blk-mq core. One request is marked as SCMD_STATE_INFLIGHT,
but this request may have been kept in another slot of ->rqs[], meantime
the slot can be allocated out but ->rqs[] isn't updated yet. Then this
in-flight request is counted twice as SCMD_STATE_INFLIGHT. This way causes
trouble in handling scsi error.

Fixes the issue by not iterating over stale request.

Cc: linux-scsi@vger.kernel.org
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Reported-by: Nluojiaxing <luojiaxing@huawei.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210906065003.439019-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

67f3b2f8

08 9月, 2021 1 次提交

blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queues · 7f2a6a69

由 Song Liu 提交于 9月 07, 2021

Limiting number of request to BLK_MAX_REQUEST_COUNT at blk_plug hurts
performance for large md arrays. [1] shows resync speed of md array drops
for md array with more than 16 HDDs.

Fix this by allowing more request at plug queue. The multiple_queue flag
is used to only apply higher limit to multiple queue cases.

[1] https://lore.kernel.org/linux-raid/CAFDAVznS71BXW8Jxv6k9dXc2iR3ysX3iZRBww_rzA8WifBFxGg@mail.gmail.com/Tested-by: NMarcin Wanat <marcin.wanat@gmail.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7f2a6a69

07 9月, 2021 4 次提交

block: move fs/block_dev.c to block/bdev.c · 0dca4462

由 Christoph Hellwig 提交于 9月 07, 2021

Move it together with the rest of the block layer.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210907141303.1371844-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

0dca4462

block: split out operations on block special files · cd82cca7

由 Christoph Hellwig 提交于 9月 07, 2021

Add a new block/fops.c for all the file and address_space operations
that provide the block special file support.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210907141303.1371844-2-hch@lst.de
[axboe: correct trailing whitespace while at it]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cd82cca7

blk-throttle: fix UAF by deleteing timer in blk_throtl_exit() · 884f0e84

由 Li Jinlin 提交于 9月 07, 2021

The pending timer has been set up in blk_throtl_init(). However, the
timer is not deleted in blk_throtl_exit(). This means that the timer
handler may still be running after freeing the timer, which would
result in a use-after-free.

Fix by calling del_timer_sync() to delete the timer in blk_throtl_exit().
Signed-off-by: NLi Jinlin <lijinlin3@huawei.com>
Link: https://lore.kernel.org/r/20210907121242.2885564-1-lijinlin3@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

884f0e84

block: genhd: don't call blkdev_show() with major_names_lock held · dfbb3409

由 Tetsuo Handa 提交于 9月 07, 2021

If CONFIG_BLK_DEV_LOOP && CONFIG_MTD (at least; there might be other
combinations), lockdep complains circular locking dependency at
__loop_clr_fd(), for major_names_lock serves as a locking dependency
aggregating hub across multiple block modules.

 ======================================================
 WARNING: possible circular locking dependency detected
 5.14.0+ #757 Tainted: G            E
 ------------------------------------------------------
 systemd-udevd/7568 is trying to acquire lock:
 ffff88800f334d48 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x70/0x560

 but task is already holding lock:
 ffff888014a7d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x4d/0x400 [loop]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #6 (&lo->lo_mutex){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_killable_nested+0x17/0x20
        lo_open+0x23/0x50 [loop]
        blkdev_get_by_dev+0x199/0x540
        blkdev_open+0x58/0x90
        do_dentry_open+0x144/0x3a0
        path_openat+0xa57/0xda0
        do_filp_open+0x9f/0x140
        do_sys_openat2+0x71/0x150
        __x64_sys_openat+0x78/0xa0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #5 (&disk->open_mutex){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_nested+0x17/0x20
        bd_register_pending_holders+0x20/0x100
        device_add_disk+0x1ae/0x390
        loop_add+0x29c/0x2d0 [loop]
        blk_request_module+0x5a/0xb0
        blkdev_get_no_open+0x27/0xa0
        blkdev_get_by_dev+0x5f/0x540
        blkdev_open+0x58/0x90
        do_dentry_open+0x144/0x3a0
        path_openat+0xa57/0xda0
        do_filp_open+0x9f/0x140
        do_sys_openat2+0x71/0x150
        __x64_sys_openat+0x78/0xa0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #4 (major_names_lock){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_nested+0x17/0x20
        blkdev_show+0x19/0x80
        devinfo_show+0x52/0x60
        seq_read_iter+0x2d5/0x3e0
        proc_reg_read_iter+0x41/0x80
        vfs_read+0x2ac/0x330
        ksys_read+0x6b/0xd0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #3 (&p->lock){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_nested+0x17/0x20
        seq_read_iter+0x37/0x3e0
        generic_file_splice_read+0xf3/0x170
        splice_direct_to_actor+0x14e/0x350
        do_splice_direct+0x84/0xd0
        do_sendfile+0x263/0x430
        __se_sys_sendfile64+0x96/0xc0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #2 (sb_writers#3){.+.+}-{0:0}:
        lock_acquire+0xbe/0x1f0
        lo_write_bvec+0x96/0x280 [loop]
        loop_process_work+0xa68/0xc10 [loop]
        process_one_work+0x293/0x480
        worker_thread+0x23d/0x4b0
        kthread+0x163/0x180
        ret_from_fork+0x1f/0x30

 -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
        lock_acquire+0xbe/0x1f0
        process_one_work+0x280/0x480
        worker_thread+0x23d/0x4b0
        kthread+0x163/0x180
        ret_from_fork+0x1f/0x30

 -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
        validate_chain+0x1f0d/0x33e0
        __lock_acquire+0x92d/0x1030
        lock_acquire+0xbe/0x1f0
        flush_workqueue+0x8c/0x560
        drain_workqueue+0x80/0x140
        destroy_workqueue+0x47/0x4f0
        __loop_clr_fd+0xb4/0x400 [loop]
        blkdev_put+0x14a/0x1d0
        blkdev_close+0x1c/0x20
        __fput+0xfd/0x220
        task_work_run+0x69/0xc0
        exit_to_user_mode_prepare+0x1ce/0x1f0
        syscall_exit_to_user_mode+0x26/0x60
        do_syscall_64+0x4c/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 other info that might help us debug this:

 Chain exists of:
   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&lo->lo_mutex);
                                lock(&disk->open_mutex);
                                lock(&lo->lo_mutex);
   lock((wq_completion)loop0);

  *** DEADLOCK ***

 2 locks held by systemd-udevd/7568:
  #0: ffff888012554128 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_put+0x4c/0x1d0
  #1: ffff888014a7d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x4d/0x400 [loop]

 stack backtrace:
 CPU: 0 PID: 7568 Comm: systemd-udevd Tainted: G            E     5.14.0+ #757
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
 Call Trace:
  dump_stack_lvl+0x79/0xbf
  print_circular_bug+0x5d6/0x5e0
  ? stack_trace_save+0x42/0x60
  ? save_trace+0x3d/0x2d0
  check_noncircular+0x10b/0x120
  validate_chain+0x1f0d/0x33e0
  ? __lock_acquire+0x953/0x1030
  ? __lock_acquire+0x953/0x1030
  __lock_acquire+0x92d/0x1030
  ? flush_workqueue+0x70/0x560
  lock_acquire+0xbe/0x1f0
  ? flush_workqueue+0x70/0x560
  flush_workqueue+0x8c/0x560
  ? flush_workqueue+0x70/0x560
  ? sched_clock_cpu+0xe/0x1a0
  ? drain_workqueue+0x41/0x140
  drain_workqueue+0x80/0x140
  destroy_workqueue+0x47/0x4f0
  ? blk_mq_freeze_queue_wait+0xac/0xd0
  __loop_clr_fd+0xb4/0x400 [loop]
  ? __mutex_unlock_slowpath+0x35/0x230
  blkdev_put+0x14a/0x1d0
  blkdev_close+0x1c/0x20
  __fput+0xfd/0x220
  task_work_run+0x69/0xc0
  exit_to_user_mode_prepare+0x1ce/0x1f0
  syscall_exit_to_user_mode+0x26/0x60
  do_syscall_64+0x4c/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f0fd4c661f7
 Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 13 fc ff ff
 RSP: 002b:00007ffd1c9e9fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
 RAX: 0000000000000000 RBX: 00007f0fd46be6c8 RCX: 00007f0fd4c661f7
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
 RBP: 0000000000000006 R08: 000055fff1eaf400 R09: 0000000000000000
 R10: 00007f0fd46be6c8 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000002f08 R15: 00007ffd1c9ea050

Commit 1c500ad7 ("loop: reduce the loop_ctl_mutex scope") is for
breaking "loop_ctl_mutex => &lo->lo_mutex" dependency chain. But enabling
a different block module results in forming circular locking dependency
due to shared major_names_lock mutex.

The simplest fix is to call probe function without holding
major_names_lock [1], but Christoph Hellwig does not like such idea.
Therefore, instead of holding major_names_lock in blkdev_show(),
introduce a different lock for blkdev_show() in order to break
"sb_writers#$N => &p->lock => major_names_lock" dependency chain.

Link: https://lkml.kernel.org/r/b2af8a5b-3c1b-204e-7f56-bea0b15848d6@i-love.sakura.ne.jp [1]
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/18a02da2-0bf3-550e-b071-2b4ab13c49f0@i-love.sakura.ne.jpSigned-off-by: NJens Axboe <axboe@kernel.dk>

dfbb3409

04 9月, 2021 1 次提交

mm: remove flush_kernel_dcache_page · f358afc5

由 Christoph Hellwig 提交于 9月 02, 2021

flush_kernel_dcache_page is a rather confusing interface that implements a
subset of flush_dcache_page by not being able to properly handle page
cache mapped pages.

The only callers left are in the exec code as all other previous callers
were incorrect as they could have dealt with page cache pages.  Replace
the calls to flush_kernel_dcache_page with calls to flush_dcache_page,
which for all architectures does either exactly the same thing, can
contains one or more of the following:

 1) an optimization to defer the cache flush for page cache pages not
    mapped into userspace
 2) additional flushing for mapped page cache pages if cache aliases
    are possible

Link: https://lkml.kernel.org/r/20210712060928.4161649-7-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f358afc5

03 9月, 2021 1 次提交

bio: fix kerneldoc documentation for bio_alloc_kiocb() · 0ef47db1

由 Jens Axboe 提交于 9月 03, 2021

Apparently the last fixup got butter fingered a bit, the correct variable
name is 'nr_vecs', not 'nr_iovecs'.

Link: https://lore.kernel.org/lkml/20210903164939.02f6e8c5@canb.auug.org.au/Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0ef47db1

02 9月, 2021 2 次提交

block, bfq: honor already-setup queue merges · 2d52c58b

由 Paolo Valente 提交于 8月 02, 2021

The function bfq_setup_merge prepares the merging between two
bfq_queues, say bfqq and new_bfqq. To this goal, it assigns
bfqq->new_bfqq = new_bfqq. Then, each time some I/O for bfqq arrives,
the process that generated that I/O is disassociated from bfqq and
associated with new_bfqq (merging is actually a redirection). In this
respect, bfq_setup_merge increases new_bfqq->ref in advance, adding
the number of processes that are expected to be associated with
new_bfqq.

Unfortunately, the stable-merging mechanism interferes with this
setup. After bfqq->new_bfqq has been set by bfq_setup_merge, and
before all the expected processes have been associated with
bfqq->new_bfqq, bfqq may happen to be stably merged with a different
queue than the current bfqq->new_bfqq. In this case, bfqq->new_bfqq
gets changed. So, some of the processes that have been already
accounted for in the ref counter of the previous new_bfqq will not be
associated with that queue. This creates an unbalance, because those
references will never be decremented.

This commit fixes this issue by reestablishing the previous, natural
behaviour: once bfqq->new_bfqq has been set, it will not be changed
until all expected redirections have occurred.
Signed-off-by: NDavide Zini <davidezini2@gmail.com>
Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20210802141352.74353-2-paolo.valente@linaro.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

2d52c58b

block/mq-deadline: Move dd_queued() to fix defined but not used warning · 55a51ea1

由 Geert Uytterhoeven 提交于 8月 30, 2021

If CONFIG_BLK_DEBUG_FS=n:

    block/mq-deadline.c:274:12: warning: ‘dd_queued’ defined but not used [-Wunused-function]
      274 | static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
	  |            ^~~~~~~~~

Fix this by moving dd_queued() just before the sole function that calls
it.

Fixes: 7b05bf77 ("Revert "block/mq-deadline: Prioritize high-priority requests"")
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Fixes: 38ba64d1 ("block/mq-deadline: Track I/O statistics")
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210830091128.1854266-1-geert@linux-m68k.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

55a51ea1

27 8月, 2021 1 次提交

Revert "block/mq-deadline: Prioritize high-priority requests" · 7b05bf77

由 Jens Axboe 提交于 8月 26, 2021

This reverts commit fb926032.

Zhen reports that this commit slows down mq-deadline on a 128 thread
box, going from 258K IOPS to 170-180K. My testing shows that Optane
gen2 IOPS goes from 2.3M IOPS to 1.2M IOPS on a 64 thread box.

Looking in detail at the code, the main culprit here is needing to sum
percpu counters in the dispatch hot path, leading to very high CPU
utilization there. To make matters worse, the code currently needs to
sum 2 percpu counters, and it does so in the most naive way of iterating
possible CPUs _twice_.

Since we're close to release, revert this commit and we can re-do it
with regular per-priority counters instead for the 5.15 kernel.

Link: https://lore.kernel.org/linux-block/20210826144039.2143-1-thunder.leizhen@huawei.com/Reported-by: NZhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7b05bf77

25 8月, 2021 7 次提交

block, bfq: cleanup the repeated declaration · 1e294970

由 Shaokun Zhang 提交于 8月 25, 2021

Function 'bfq_entity_to_bfqq' is declared twice, so remove the
repeated declaration and blank line.

Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/1629872391-46399-1-git-send-email-zhangshaokun@hisilicon.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

1e294970

blk-crypto: fix check for too-large dun_bytes · cc40b722

由 Eric Biggers 提交于 8月 24, 2021

dun_bytes needs to be less than or equal to the IV size of the
encryption mode, not just less than or equal to BLK_CRYPTO_MAX_IV_SIZE.

Currently this doesn't matter since blk_crypto_init_key() is never
actually passed invalid values, but we might as well fix this.

Fixes: a892c8d5 ("block: Inline encryption support for blk-mq")
Signed-off-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210825055918.51975-1-ebiggers@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

cc40b722

mq-deadline: Fix request accounting · b6d2b054

由 Bart Van Assche 提交于 8月 24, 2021

The block layer may call the I/O scheduler .finish_request() callback
without having called the .insert_requests() callback. Make sure that the
mq-deadline I/O statistics are correct if the block layer inserts an I/O
request that bypasses the I/O scheduler. This patch prevents that lower
priority I/O is delayed longer than necessary for mixed I/O priority
workloads.

Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: NNiklas Cassel <Niklas.Cassel@wdc.com>
Fixes: 08a9ad8b ("block/mq-deadline: Add cgroup support")
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210824170520.1659173-1-bvanassche@acm.orgReviewed-by: NNiklas Cassel <niklas.cassel@wdc.com>
Tested-by: NNiklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b6d2b054

blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN · 4d643b66

由 Niklas Cassel 提交于 8月 11, 2021

A user space process should not need the CAP_SYS_ADMIN capability set
in order to perform a BLKREPORTZONE ioctl.

Getting the zone report is required in order to get the write pointer.
Neither read() nor write() requires CAP_SYS_ADMIN, so it is reasonable
that a user space process that can read/write from/to the device, also
can get the write pointer. (Since e.g. writes have to be at the write
pointer.)

Fixes: 3ed05a98 ("blk-zoned: implement ioctls")
Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: NAravind Ramesh <aravind.ramesh@wdc.com>
Reviewed-by: NAdam Manzanares <a.manzanares@samsung.com>
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: stable@vger.kernel.org # v4.10+
Link: https://lore.kernel.org/r/20210811110505.29649-3-Niklas.Cassel@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

4d643b66

blk-zoned: allow zone management send operations without CAP_SYS_ADMIN · ead3b768

由 Niklas Cassel 提交于 8月 11, 2021

Zone management send operations (BLKRESETZONE, BLKOPENZONE, BLKCLOSEZONE
and BLKFINISHZONE) should be allowed under the same permissions as write().
(write() does not require CAP_SYS_ADMIN).

Additionally, other ioctls like BLKSECDISCARD and BLKZEROOUT only check if
the fd was successfully opened with FMODE_WRITE.
(They do not require CAP_SYS_ADMIN).

Currently, zone management send operations require both CAP_SYS_ADMIN
and that the fd was successfully opened with FMODE_WRITE.

Remove the CAP_SYS_ADMIN requirement, so that zone management send
operations match the access control requirement of write(), BLKSECDISCARD
and BLKZEROOUT.

Fixes: 3ed05a98 ("blk-zoned: implement ioctls")
Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: NAravind Ramesh <aravind.ramesh@wdc.com>
Reviewed-by: NAdam Manzanares <a.manzanares@samsung.com>
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: stable@vger.kernel.org # v4.10+
Link: https://lore.kernel.org/r/20210811110505.29649-2-Niklas.Cassel@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

ead3b768

block: refine the disk_live check in del_gendisk · 9f286992

由 Christoph Hellwig 提交于 8月 24, 2021

hidden gendisks will never be marked live.

Fixes: 40b3a52f ("block: add a sanity check for a live disk in del_gendisk")
Reported-by: NBruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210824144310.1487816-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

9f286992

partitions/efi: Support non-standard GPT location · 466d9c49

由 Dmitry Osipenko 提交于 8月 20, 2021

Support looking up GPT at a non-standard location specified by a block
device driver.
Acked-by: NDavidlohr Bueso <dbueso@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDmitry Osipenko <digetx@gmail.com>
Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210820004536.15791-3-digetx@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

466d9c49

24 8月, 2021 2 次提交

bio: fix page leak bio_add_hw_page failure · d9cf3bd5

由 Pavel Begunkov 提交于 7月 19, 2021

__bio_iov_append_get_pages() doesn't put not appended pages on
bio_add_hw_page() failure, so potentially leaking them, fix it. Also, do
the same for __bio_iov_iter_get_pages(), even though it looks like it
can't be triggered by userspace in this case.

Fixes: 0512a75b ("block: Introduce REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org # 5.8+
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1edfa6a2ffd66d55e6345a477df5387d2c1415d0.1626653825.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

d9cf3bd5

block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT · c4b2b7d1

由 Christoph Hellwig 提交于 8月 24, 2021

This might have been a neat debug aid when the extended dev_t was
added, but that time is long gone.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210824075216.1179406-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

c4b2b7d1

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功