- 15 3月, 2022 1 次提交
-
-
由 Ming Lei 提交于
blkcg_init_queue() may add rq qos structures to request queue, previously blk_cleanup_queue() calls rq_qos_exit() to release them, but commit 8e141f9e ("block: drain file system I/O on del_gendisk") moves rq_qos_exit() into del_gendisk(), so memory leak is caused because queues may not have disk, such as un-present scsi luns, nvme admin queue, ... Fixes the issue by adding rq_qos_exit() to blk_cleanup_queue() back. BTW, v5.18 won't need this patch any more since we move blkcg_init_queue()/blkcg_exit_queue() into disk allocation/release handler, and patches have been in for-5.18/block. Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Fixes: 8e141f9e ("block: drain file system I/O on del_gendisk") Reported-by: syzbot+b42749a851a47a0f581b@syzkaller.appspotmail.com Signed-off-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220314043018.177141-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 3月, 2022 1 次提交
-
-
由 Shin'ichiro Kawasaki 提交于
Commit 9d497e29 ("block: don't protect submit_bio_checks by q_usage_counter") moved blk_mq_attempt_bio_merge and rq_qos_throttle calls out of q_usage_counter protection. However, these functions require q_usage_counter protection. The blk_mq_attempt_bio_merge call without the protection resulted in blktests block/005 failure with KASAN null- ptr-deref or use-after-free at bio merge. The rq_qos_throttle call without the protection caused kernel hang at qos throttle. To fix the failures, move the blk_mq_attempt_bio_merge and rq_qos_throttle calls back to q_usage_counter protection. Fixes: 9d497e29 ("block: don't protect submit_bio_checks by q_usage_counter") Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20220308080915.3473689-1-shinichiro.kawasaki@wdc.comReviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 2月, 2022 1 次提交
-
-
由 Yu Kuai 提交于
When tracing the whole disk, 'dropped' and 'msg' will be created under 'q->debugfs_dir' and 'bt->dir' is NULL, thus blk_trace_free() won't remove those files. What's worse, the following UAF can be triggered because of accessing stale 'dropped' and 'msg': ================================================================== BUG: KASAN: use-after-free in blk_dropped_read+0x89/0x100 Read of size 4 at addr ffff88816912f3d8 by task blktrace/1188 CPU: 27 PID: 1188 Comm: blktrace Not tainted 5.17.0-rc4-next-20220217+ #469 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_address_description.constprop.0.cold+0xab/0x381 ? blk_dropped_read+0x89/0x100 ? blk_dropped_read+0x89/0x100 kasan_report.cold+0x83/0xdf ? blk_dropped_read+0x89/0x100 kasan_check_range+0x140/0x1b0 blk_dropped_read+0x89/0x100 ? blk_create_buf_file_callback+0x20/0x20 ? kmem_cache_free+0xa1/0x500 ? do_sys_openat2+0x258/0x460 full_proxy_read+0x8f/0xc0 vfs_read+0xc6/0x260 ksys_read+0xb9/0x150 ? vfs_write+0x3d0/0x3d0 ? fpregs_assert_state_consistent+0x55/0x60 ? exit_to_user_mode_prepare+0x39/0x1e0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fbc080d92fd Code: ce 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 1 RSP: 002b:00007fbb95ff9cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007fbb95ff9dc0 RCX: 00007fbc080d92fd RDX: 0000000000000100 RSI: 00007fbb95ff9cc0 RDI: 0000000000000045 RBP: 0000000000000045 R08: 0000000000406299 R09: 00000000fffffffd R10: 000000000153afa0 R11: 0000000000000293 R12: 00007fbb780008c0 R13: 00007fbb78000938 R14: 0000000000608b30 R15: 00007fbb780029c8 </TASK> Allocated by task 1050: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 do_blk_trace_setup+0xcb/0x410 __blk_trace_setup+0xac/0x130 blk_trace_ioctl+0xe9/0x1c0 blkdev_ioctl+0xf1/0x390 __x64_sys_ioctl+0xa5/0xe0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 1050: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0x103/0x180 kfree+0x9a/0x4c0 __blk_trace_remove+0x53/0x70 blk_trace_ioctl+0x199/0x1c0 blkdev_common_ioctl+0x5e9/0xb30 blkdev_ioctl+0x1a5/0x390 __x64_sys_ioctl+0xa5/0xe0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88816912f380 which belongs to the cache kmalloc-96 of size 96 The buggy address is located 88 bytes inside of 96-byte region [ffff88816912f380, ffff88816912f3e0) The buggy address belongs to the page: page:000000009a1b4e7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0f flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff) raw: 0017ffffc0000200 ffffea00044f1100 dead000000000002 ffff88810004c780 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88816912f280: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff88816912f300: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc >ffff88816912f380: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ^ ffff88816912f400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff88816912f480: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ================================================================== Fixes: c0ea5760 ("blktrace: remove debugfs file dentries from struct blk_trace") Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220228034354.4047385-1-yukuai3@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 2月, 2022 1 次提交
-
-
git://git.infradead.org/nvme由 Jens Axboe 提交于
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.17 - send H2CData PDUs based on MAXH2CDATA (Varun Prakash) - fix passthrough to namespaces with unsupported features (me)" * tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme: nvme-tcp: send H2CData PDUs based on MAXH2CDATA nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info nvme: don't return an error from nvme_configure_metadata
-
- 23 2月, 2022 3 次提交
-
-
由 Varun Prakash 提交于
As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3) Maximum Host to Controller Data length (MAXH2CDATA): Specifies the maximum number of PDU-Data bytes per H2CData PDU in bytes. This value is a multiple of dwords and should be no less than 4,096. Current code sets H2CData PDU data_length to r2t_length, it does not check MAXH2CDATA value. Fix this by setting H2CData PDU data_length to min(req->h2cdata_left, queue->maxh2cdata). Also validate MAXH2CDATA value returned by target in ICResp PDU, if it is not a multiple of dword or if it is less than 4096 return -EINVAL from nvme_tcp_init_connection(). Signed-off-by: NVarun Prakash <varun@chelsio.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
Commit e7d65803 ("nvme-multipath: revalidate paths during rescan") introduced the NVME_NS_READY flag, which nvme_path_is_disabled() uses to check if a path can be used or not. We also need to set this flag for devices that fail the ZNS feature validation and which are available through passthrough devices only to that they can be used in multipathing setups. Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan") Reported-by: NKanchan Joshi <joshi.k@samsung.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NDaniel Wagner <dwagner@suse.de> Tested-by: NKanchan Joshi <joshi.k@samsung.com>
-
由 Christoph Hellwig 提交于
When a fabrics controller claims to support an invalidate metadata configuration we already warn and disable metadata support. No need to also return an error during revalidation. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NDaniel Wagner <dwagner@suse.de> Tested-by: NKanchan Joshi <joshi.k@samsung.com>
-
- 22 2月, 2022 1 次提交
-
-
由 Stefano Garzarella 提交于
iocb_bio_iopoll() expects iocb->private to be cleared before releasing the bio. We already do this in blkdev_bio_end_io(), but we forgot in the recently added blkdev_bio_end_io_async(). Fixes: 54a88eb8 ("block: add single bio async direct IO helper") Cc: asml.silence@gmail.com Signed-off-by: NStefano Garzarella <sgarzare@redhat.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220211090136.44471-1-sgarzare@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 2月, 2022 3 次提交
-
-
由 Laibin Qiu 提交于
Now that we disable wbt by set WBT_STATE_OFF_DEFAULT in wbt_disable_default() when switch elevator to bfq. And when we remove scsi device, wbt will be enabled by wbt_enable_default. If it become false positive between wbt_wait() and wbt_track() when submit write request. The following is the scenario that triggered the problem. T1 T2 T3 elevator_switch_mq bfq_init_queue wbt_disable_default <= Set rwb->enable_state (OFF) Submit_bio blk_mq_make_request rq_qos_throttle <= rwb->enable_state (OFF) scsi_remove_device sd_remove del_gendisk blk_unregister_queue elv_unregister_queue wbt_enable_default <= Set rwb->enable_state (ON) q_qos_track <= rwb->enable_state (ON) ^^^^^^ this request will mark WBT_TRACKED without inflight add and will lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung. Fix this by move wbt_enable_default() from elv_unregister to bfq_exit_queue(). Only re-enable wbt when bfq exit. Fixes: 76a80408 ("blk-wbt: make sure throttle is enabled properly") Remove oneline stale comment, and kill one oneshot local variable. Signed-off-by: NMing Lei <ming.lei@rehdat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-block/20211214133103.551813-1-qiulaibin@huawei.com/Signed-off-by: NLaibin Qiu <qiulaibin@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Various block drivers call blk_set_queue_dying to mark a disk as dead due to surprise removal events, but since commit 8e141f9e that doesn't work given that the GD_DEAD flag needs to be set to stop I/O. Replace the driver calls to blk_set_queue_dying with a new (and properly documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the only remaining caller. Fixes: 8e141f9e ("block: drain file system I/O on del_gendisk") Reported-by: NMarkus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Haimin Zhang 提交于
Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize the buffer of a bio. Signed-off-by: NHaimin Zhang <tcs.kernel@gmail.com> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220216084038.15635-1-tcs.kernel@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 2月, 2022 2 次提交
-
-
由 Ming Lei 提交于
If backing file's filesystem has implemented ->fallocate(), we think the loop device can support discard, then pass sb->s_blocksize as discard_granularity. However, some underlying FS, such as overlayfs, doesn't set sb->s_blocksize, and causes discard_granularity to be set as zero, then the warning in __blkdev_issue_discard() is triggered. Christoph suggested to pass kstatfs.f_bsize as discard granularity, and this way is fine because kstatfs.f_bsize means 'Optimal transfer block size', which still matches with definition of discard granularity. So fix the issue by setting discard_granularity as kstatfs.f_bsize if it is available, otherwise claims discard isn't supported. Cc: Christoph Hellwig <hch@lst.de> Cc: Vivek Goyal <vgoyal@redhat.com> Reported-by: NPei Zhang <pezhang@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220126035830.296465-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pankaj Raghav 提交于
Zone append command needs special handling to update the bi_sector field in the bio struct with the actual position of the data in the device. It is stored in __sector field of the request struct. Fixes: 5581a5dd ("block: add completion handler for fast path") Signed-off-by: NPankaj Raghav <p.raghav@samsung.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Tested-by: NAdam Manzanares <a.manzanares@samsung.com> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220211093425.43262-2-p.raghav@samsung.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 2月, 2022 1 次提交
-
-
由 Tetsuo Handa 提交于
The kernel test robot is reporting that xfstest which does umount ext2 on xfs umount xfs sequence started failing, for commit 322c4293 ("loop: make autoclear operation asynchronous") removed a guarantee that fput() of backing file is processed before lo_release() from close() returns to user mode. And syzbot is reporting that deferring destroy_workqueue() from __loop_clr_fd() to a WQ context did not help [1]. Revert that commit. Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1] Reported-by: Nkernel test robot <oliver.sang@intel.com> Acked-by: NJan Kara <jack@suse.cz> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reported-by: Nsyzbot <syzbot+831661966588c802aae9@syzkaller.appspotmail.com> Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20220211071554.3424-1-penguin-kernel@I-love.SAKURA.ne.jpSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 2月, 2022 1 次提交
-
-
git://git.infradead.org/nvme由 Jens Axboe 提交于
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.17 - nvme-tcp: fix bogus request completion when failing to send AER (Sagi Grimberg) - add the missing nvme_complete_req tracepoint for batched completion (Bean Huo)" * tag 'nvme-5.17-2022-02-10' of git://git.infradead.org/nvme: nvme-tcp: fix bogus request completion when failing to send AER nvme: add nvme_complete_req tracepoint for batched completion
-
- 09 2月, 2022 2 次提交
-
-
由 Sagi Grimberg 提交于
AER is not backed by a real request, hence we should not incorrectly assume that when failing to send a nvme command, it is a normal request but rather check if this is an aer and if so complete the aer (similar to the normal completion path). Cc: stable@vger.kernel.org Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Bean Huo 提交于
Add NVMe request completion trace in nvme_complete_batch_req() because nvme:nvme_complete_req tracepoint is missing in case of request batched completion. Signed-off-by: NBean Huo <beanhuo@micron.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 04 2月, 2022 3 次提交
-
-
由 Martin K. Petersen 提交于
Commit 309a62fa ("bio-integrity: bio_integrity_advance must update integrity seed") added code to update the integrity seed value when advancing a bio. However, it failed to take into account that the integrity interval might be larger than the 512-byte block layer sector size. This broke bio splitting on PI devices with 4KB logical blocks. The seed value should be advanced by bio_integrity_intervals() and not the number of sectors. Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org Fixes: 309a62fa ("bio-integrity: bio_integrity_advance must update integrity seed") Tested-by: NDmitry Ivanov <dmitry.ivanov2@hpe.com> Reported-by: NAlexey Lyashkov <alexey.lyashkov@hpe.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220204034209.4193-1-martin.petersen@oracle.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
git://git.infradead.org/nvme由 Jens Axboe 提交于
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.17 - fix a use-after-free in rdm and tcp controller reset (Sagi Grimberg) - fix the state check in nvmf_ctlr_matches_baseopts (Uday Shankar)" * tag 'nvme-5.17-2022-02-03' of git://git.infradead.org/nvme: nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts() nvme-rdma: fix possible use-after-free in transport error_recovery work nvme-tcp: fix possible use-after-free in transport error_recovery work nvme: fix a possible use-after-free in controller reset during load
-
https://git.kernel.org/pub/scm/linux/kernel/git/song/md由 Jens Axboe 提交于
Pull MD fix from Song: "Please consider pulling the following fix on top of your block-5.17 branch. It fixes a NULL ptr deref case with nowait." * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fix NULL pointer deref with nowait but no mddev->queue
-
- 03 2月, 2022 2 次提交
-
-
由 Uday Shankar 提交于
Controller deletion/reset, immediately followed by or concurrent with a reconnect, is hard failing the connect attempt resulting in a complete loss of connectivity to the controller. In the connect request, fabrics looks for an existing controller with the same address components and aborts the connect if a controller already exists and the duplicate connect option isn't set. The match routine filters out controllers that are dead or dying, so they don't interfere with the new connect request. When NVME_CTRL_DELETING_NOIO was added, it missed updating the state filters in the nvmf_ctlr_matches_baseopts() routine. Thus, when in this new state, it's seen as a live controller and fails the connect request. Correct by adding the DELETING_NIO state to the match checks. Fixes: ecca390e ("nvme: fix deadlock in disconnect during scan_work and/or ana_work") Cc: <stable@vger.kernel.org> # v5.7+ Signed-off-by: NUday Shankar <ushankar@purestorage.com> Reviewed-by: NJames Smart <jsmart2021@gmail.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Song Liu 提交于
Leon reported NULL pointer deref with nowait support: [ 15.123761] device-mapper: raid: Loading target version 1.15.1 [ 15.124185] device-mapper: raid: Ignoring chunk size parameter for RAID 1 [ 15.124192] device-mapper: raid: Choosing default region size of 4MiB [ 15.129524] BUG: kernel NULL pointer dereference, address: 0000000000000060 [ 15.129530] #PF: supervisor write access in kernel mode [ 15.129533] #PF: error_code(0x0002) - not-present page [ 15.129535] PGD 0 P4D 0 [ 15.129538] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 15.129541] CPU: 5 PID: 494 Comm: ldmtool Not tainted 5.17.0-rc2-1-mainline #1 9fe89d43dfcb215d2731e6f8851740520778615e [ 15.129546] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F36e 10/14/2021 [ 15.129549] RIP: 0010:blk_queue_flag_set+0x7/0x20 [ 15.129555] Code: 00 00 00 0f 1f 44 00 00 48 8b 35 e4 e0 04 02 48 8d 57 28 bf 40 01 \ 00 00 e9 16 c1 be ff 66 0f 1f 44 00 00 0f 1f 44 00 00 89 ff <f0> 48 0f ab 7e 60 \ 31 f6 89 f7 c3 66 66 2e 0f 1f 84 00 00 00 00 00 [ 15.129559] RSP: 0018:ffff966b81987a88 EFLAGS: 00010202 [ 15.129562] RAX: ffff8b11c363a0d0 RBX: ffff8b11e294b070 RCX: 0000000000000000 [ 15.129564] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001d [ 15.129566] RBP: ffff8b11e294b058 R08: 0000000000000000 R09: 0000000000000000 [ 15.129568] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b11e294b070 [ 15.129570] R13: 0000000000000000 R14: ffff8b11e294b000 R15: 0000000000000001 [ 15.129572] FS: 00007fa96e826780(0000) GS:ffff8b18deb40000(0000) knlGS:0000000000000000 [ 15.129575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.129577] CR2: 0000000000000060 CR3: 000000010b8ce000 CR4: 00000000003506e0 [ 15.129580] Call Trace: [ 15.129582] <TASK> [ 15.129584] md_run+0x67c/0xc70 [md_mod 1e470c1b6bcf1114198109f42682f5a2740e9531] [ 15.129597] raid_ctr+0x134a/0x28ea [dm_raid 6a645dd7519e72834bd7e98c23497eeade14cd63] [ 15.129604] ? dm_split_args+0x63/0x150 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129615] dm_table_add_target+0x188/0x380 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129625] table_load+0x13b/0x370 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129635] ? dev_suspend+0x2d0/0x2d0 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129644] ctl_ioctl+0x1bd/0x460 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129655] dm_ctl_ioctl+0xa/0x20 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129663] __x64_sys_ioctl+0x8e/0xd0 [ 15.129667] do_syscall_64+0x5c/0x90 [ 15.129672] ? syscall_exit_to_user_mode+0x23/0x50 [ 15.129675] ? do_syscall_64+0x69/0x90 [ 15.129677] ? do_syscall_64+0x69/0x90 [ 15.129679] ? syscall_exit_to_user_mode+0x23/0x50 [ 15.129682] ? do_syscall_64+0x69/0x90 [ 15.129684] ? do_syscall_64+0x69/0x90 [ 15.129686] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 15.129689] RIP: 0033:0x7fa96ecd559b [ 15.129692] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c \ c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff \ ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48 [ 15.129696] RSP: 002b:00007ffcaf85c258 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [ 15.129699] RAX: ffffffffffffffda RBX: 00007fa96f1b48f0 RCX: 00007fa96ecd559b [ 15.129701] RDX: 00007fa97017e610 RSI: 00000000c138fd09 RDI: 0000000000000003 [ 15.129702] RBP: 00007fa96ebab583 R08: 00007fa97017c9e0 R09: 00007ffcaf85bf27 [ 15.129704] R10: 0000000000000001 R11: 0000000000000206 R12: 00007fa97017e610 [ 15.129706] R13: 00007fa97017e640 R14: 00007fa97017e6c0 R15: 00007fa97017e530 [ 15.129709] </TASK> This is caused by missing mddev->queue check for setting QUEUE_FLAG_NOWAIT Fix this by moving the QUEUE_FLAG_NOWAIT logic to under mddev->queue check. Fixes: f51d46d0 ("md: add support for REQ_NOWAIT") Reported-by: NLeon Möller <jkhsjdhjs@totally.rip> Tested-by: NLeon Möller <jkhsjdhjs@totally.rip> Cc: Vishal Verma <vverma@digitalocean.com> Signed-off-by: NSong Liu <song@kernel.org>
-
- 02 2月, 2022 4 次提交
-
-
由 Ilya Dryomov 提交于
Commit ceaa7625 ("block: move direct_IO into our own read_iter handler") introduced several regressions for bdev DIO: 1. read spanning EOF always returns 0 instead of the number of bytes read. This is because "count" is assigned early and isn't updated when the iterator is truncated: $ lsblk -o name,size /dev/vdb NAME SIZE vdb 1G $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb read 0/4194304 bytes at offset 1070596096 0.000000 bytes, 0 ops; 0.0007 sec (0.000000 bytes/sec and 0.0000 ops/sec) instead of $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb read 3145728/4194304 bytes at offset 1070596096 3 MiB, 1 ops; 0.0007 sec (3.865 GiB/sec and 1319.2612 ops/sec) 2. truncated iterator isn't reexpanded 3. iterator isn't reverted on blkdev_direct_IO() error 4. zero size read no longer skips atime update Fixes: ceaa7625 ("block: move direct_IO into our own read_iter handler") Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220201100420.25875-1-idryomov@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Sagi Grimberg 提交于
While nvme_rdma_submit_async_event_work is checking the ctrl and queue state before preparing the AER command and scheduling io_work, in order to fully prevent a race where this check is not reliable the error recovery work must flush async_event_work before continuing to destroy the admin queue after setting the ctrl state to RESETTING such that there is no race .submit_async_event and the error recovery handler itself changing the ctrl state. Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Sagi Grimberg 提交于
While nvme_tcp_submit_async_event_work is checking the ctrl and queue state before preparing the AER command and scheduling io_work, in order to fully prevent a race where this check is not reliable the error recovery work must flush async_event_work before continuing to destroy the admin queue after setting the ctrl state to RESETTING such that there is no race .submit_async_event and the error recovery handler itself changing the ctrl state. Tested-by: NChris Leech <cleech@redhat.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Sagi Grimberg 提交于
Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl readiness for AER submission. This may lead to a use-after-free condition that was observed with nvme-tcp. The race condition may happen in the following scenario: 1. driver executes its reset_ctrl_work 2. -> nvme_stop_ctrl - flushes ctrl async_event_work 3. ctrl sends AEN which is received by the host, which in turn schedules AEN handling 4. teardown admin queue (which releases the queue socket) 5. AEN processed, submits another AER, calling the driver to submit 6. driver attempts to send the cmd ==> use-after-free In order to fix that, add ctrl state check to validate the ctrl is actually able to accept the AER submission. This addresses the above race in controller resets because the driver during teardown should: 1. change ctrl state to RESETTING 2. flush async_event_work (as well as other async work elements) So after 1,2, any other AER command will find the ctrl state to be RESETTING and bail out without submitting the AER. Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
- 29 1月, 2022 3 次提交
-
-
由 Mike Snitzer 提交于
Record the start_time for a bio but defer the starting block core's IO accounting until after IO is submitted using bio_start_io_acct_time(). This approach avoids the need to mess around with any of the individual IO stats in response to a bio_split() that follows bio submission. Reported-by: NBud Brown <bubrown@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Depends-on: e45c47d1 ("block: add bio_start_io_acct_time() to control start_time") Signed-off-by: NMike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-4-snitzer@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Mike Snitzer 提交于
Reverts a1e1cb72 ("dm: fix redundant IO accounting for bios that need splitting") because it was too narrow in scope (only addressed redundant 'sectors[]' accounting and not ios, nsecs[], etc). Cc: stable@vger.kernel.org Signed-off-by: NMike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Mike Snitzer 提交于
bio_start_io_acct_time() interface is like bio_start_io_acct() that allows start_time to be passed in. This gives drivers the ability to defer starting accounting until after IO is issued (but possibily not entirely due to bio splitting). Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-2-snitzer@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 1月, 2022 1 次提交
-
-
由 Laibin Qiu 提交于
Commit 180dccb0 ("blk-mq: fix tag_get wait task can't be awakened") will recalculate wake_batch when incrementing or decrementing active_queues to avoid wake_batch > hctx_max_depth. At the same time, in order to not affect performance as much as possible, the minimum wakeup batch is set to 4. But when the QD is small (such as QD=1), if inc or dec active_queues increases wakeup batch, that can lead to a hang: Fix this problem with the following strategies: QD : >= 32 | < 32 --------------------------------- wakeup batch: 8~4 | 3~1 Fixes: 180dccb0 ("blk-mq: fix tag_get wait task can't be awakened") Link: https://lore.kernel.org/linux-block/78cafe94-a787-e006-8851-69906f0c2128@huawei.com/T/#tReported-by: NAlex Xu (Hello71) <alex_y_xu@yahoo.ca> Signed-off-by: NLaibin Qiu <qiulaibin@huawei.com> Tested-by: NAlex Xu (Hello71) <alex_y_xu@yahoo.ca> Link: https://lore.kernel.org/r/20220127100047.1763746-1-qiulaibin@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 1月, 2022 3 次提交
-
-
git://git.infradead.org/nvme由 Jens Axboe 提交于
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.17 - add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs (Wu Zheng) - remove the unneeded ret variable in nvmf_dev_show (Changcheng Deng)" * tag 'nvme-5.17-2022-01-27' of git://git.infradead.org/nvme: nvme-fabrics: remove the unneeded ret variable in nvmf_dev_show nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
-
由 Changcheng Deng 提交于
Remove unneeded variable and directly return 0. Reported-by: NZeal Robot <zealci@zte.com.cn> Signed-off-by: NChangcheng Deng <deng.changcheng@zte.com.cn> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Wu Zheng 提交于
The Intel P4500/P4600 SSDs do not report a subsystem NQN despite claiming compliance to a standards version where reporting one is required. Add the IGNORE_DEV_SUBNQN quirk to not fail the initialization of a second such SSDs in a system. Signed-off-by: NZheng Wu <wu.zheng@intel.com> Signed-off-by: NYe Jinhe <jinhe.ye@intel.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 26 1月, 2022 1 次提交
-
-
由 Yu Kuai 提交于
If blk_mq_request_issue_directly() failed from blk_insert_cloned_request(), the request will be accounted start. Currently, blk_insert_cloned_request() is only called by dm, and such request won't be accounted done by dm. In normal path, io will be accounted start from blk_mq_bio_to_request(), when the request is allocated, and such io will be accounted done from __blk_mq_end_request_acct() whether it succeeded or failed. Thus add blk_account_io_done() to fix the problem. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220126012132.3111551-1-yukuai3@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 1月, 2022 1 次提交
-
-
由 Miaoqian Lin 提交于
kobject_init_and_add() takes reference even when it fails. According to the doc of kobject_init_and_add() If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Fix this issue by adding kobject_put(). Callback function blk_ia_ranges_sysfs_release() in kobject_put() can handle the pointer "iars" properly. Fixes: a2247f19 ("block: Add independent access ranges support") Signed-off-by: NMiaoqian Lin <linmq006@gmail.com> Reviewed-by: NDamien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220120101025.22411-1-linmq006@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 23 1月, 2022 5 次提交
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux由 Linus Torvalds 提交于
Pull powerpc fixes from Michael Ellerman: - A series of bpf fixes, including an oops fix and some codegen fixes. - Fix a regression in syscall_get_arch() for compat processes. - Fix boot failure on some 32-bit systems with KASAN enabled. - A couple of other build/minor fixes. Thanks to Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa, Johan Almbladh, Maxime Bizon, Naveen N. Rao, and Nicholas Piggin. * tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Mask SRR0 before checking against the masked NIP powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64 powerpc/32s: Fix kasan_init_region() for KASAN powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32 powerpc/audit: Fix syscall_get_arch() powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06 tools/bpf: Rename 'struct event' to avoid naming conflict powerpc/bpf: Update ldimm64 instructions during extra pass powerpc32/bpf: Fix codegen for bpf-to-bpf calls bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull irq fix from Borislav Petkov: "A single use-after-free fix in the PCI MSI irq domain allocation path" * tag 'irq_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Prevent UAF in error path
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull scheduler fixes from Borislav Petkov: "A bunch of fixes: forced idle time accounting, utilization values propagation in the sched hierarchies and other minor cleanups and improvements" * tag 'sched_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/sched: Remove dl_boosted flag comment sched: Avoid double preemption in __cond_resched_*lock*() sched/fair: Fix all kernel-doc warnings sched/core: Accounting forceidle time for all tasks except idle task sched/pelt: Relax the sync of load_sum with load_avg sched/pelt: Relax the sync of runnable_sum with runnable_avg sched/pelt: Continue to relax the sync of util_sum with util_avg sched/pelt: Relax the sync of util_sum with util_avg psi: Fix uaf issue when psi trigger is destroyed while being polled
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull perf fixes from Borislav Petkov: - Add support for accessing the general purpose counters on Alder Lake via MMIO - Add new LBR format v7 support which is v5 modulo TSX - Fix counter enumeration on Alder Lake hybrids - Overhaul how context time updates are done and get rid of perf_event::shadow_ctx_time. - The usual amount of fixes: event mask correction, supported event types reporting, etc. * tag 'perf_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/perf: Avoid warning for Arch LBR without XSAVE perf/x86/intel/uncore: Add IMC uncore support for ADL perf/x86/intel/lbr: Add static_branch for LBR INFO flags perf/x86/intel/lbr: Support LBR format V7 perf/x86/rapl: fix AMD event handling perf/x86/intel/uncore: Fix CAS_COUNT_WRITE issue for ICX perf/x86/intel: Add a quirk for the calculation of the number of counters on Alder Lake perf: Fix perf_event_read_local() time
-
由 Linus Torvalds 提交于
-