- 03 6月, 2023 16 次提交
-
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188377, https://gitee.com/openeuler/kernel/issues/I6GOYF CVE: NA -------------------------------- After commit 4ca40c2c ("md/raid10: Allow replacement device to be replace old drive.") mirrors->replacement can replace rdev during replacement's io pending, and repl_bio will write rdev (see raid10_write_one_disk()). We will get wrong device by r10conf in raid10_end_write_request(). In which case, r10_bio->devs[slot].repl_bio will be put but not set to IO_MADE_GOOD, and it will be put again later in raid_end_bio_io(), uaf occurs. Fix it by using r10_bio to record rdev. Put the operations of io fail and no replacement together, so no need to change repl. ================================================================== BUG: KASAN: use-after-free in bio_flagged include/linux/bio.h:238 [inline] BUG: KASAN: use-after-free in bio_put+0x78/0x80 block/bio.c:650 Read of size 2 at addr ffff888116524dd4 by task md0_raid10/2618 CPU: 0 PID: 2618 Comm: md0_raid10 Not tainted 5.10.0+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 sd 0:0:0:0: rejecting I/O to offline device Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x270 mm/kasan/report.c:390 __kasan_report mm/kasan/report.c:550 [inline] kasan_report.cold+0x22/0x3a mm/kasan/report.c:567 bio_flagged include/linux/bio.h:238 [inline] bio_put+0x78/0x80 block/bio.c:650 put_all_bios drivers/md/raid10.c:248 [inline] free_r10bio drivers/md/raid10.c:257 [inline] raid_end_bio_io+0x3b5/0x590 drivers/md/raid10.c:309 handle_write_completed drivers/md/raid10.c:2699 [inline] raid10d+0x2f85/0x5af0 drivers/md/raid10.c:2759 md_thread+0x444/0x4b0 drivers/md/md.c:7932 kthread+0x38c/0x470 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:299 Allocated by task 1400: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] set_alloc_info mm/kasan/common.c:498 [inline] __kasan_kmalloc.constprop.0+0xb5/0xe0 mm/kasan/common.c:530 slab_post_alloc_hook mm/slab.h:512 [inline] slab_alloc_node mm/slub.c:2923 [inline] slab_alloc mm/slub.c:2931 [inline] kmem_cache_alloc+0x144/0x360 mm/slub.c:2936 mempool_alloc+0x146/0x360 mm/mempool.c:391 bio_alloc_bioset+0x375/0x610 block/bio.c:486 bio_clone_fast+0x20/0x50 block/bio.c:711 raid10_write_one_disk+0x166/0xd30 drivers/md/raid10.c:1240 raid10_write_request+0x1600/0x2c90 drivers/md/raid10.c:1484 __make_request drivers/md/raid10.c:1508 [inline] raid10_make_request+0x376/0x620 drivers/md/raid10.c:1537 md_handle_request+0x699/0x970 drivers/md/md.c:451 md_submit_bio+0x204/0x400 drivers/md/md.c:489 __submit_bio block/blk-core.c:959 [inline] __submit_bio_noacct block/blk-core.c:1007 [inline] submit_bio_noacct+0x2e3/0xcf0 block/blk-core.c:1086 submit_bio+0x1a0/0x3a0 block/blk-core.c:1146 submit_bh_wbc+0x685/0x8e0 fs/buffer.c:3053 ext4_commit_super+0x37e/0x6c0 fs/ext4/super.c:5696 flush_stashed_error_work+0x28b/0x400 fs/ext4/super.c:791 process_one_work+0x9a6/0x1590 kernel/workqueue.c:2280 worker_thread+0x61d/0x1310 kernel/workqueue.c:2426 kthread+0x38c/0x470 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:299 Freed by task 2618: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:361 __kasan_slab_free+0x151/0x180 mm/kasan/common.c:482 slab_free_hook mm/slub.c:1569 [inline] slab_free_freelist_hook+0xa9/0x180 mm/slub.c:1608 slab_free mm/slub.c:3179 [inline] kmem_cache_free+0xcd/0x3d0 mm/slub.c:3196 mempool_free+0xe3/0x3b0 mm/mempool.c:500 bio_free+0xe2/0x140 block/bio.c:266 bio_put+0x58/0x80 block/bio.c:651 raid10_end_write_request+0x885/0xb60 drivers/md/raid10.c:516 bio_endio+0x376/0x6a0 block/bio.c:1465 req_bio_endio block/blk-core.c:289 [inline] blk_update_request+0x5f5/0xf40 block/blk-core.c:1525 blk_mq_end_request+0x4c/0x510 block/blk-mq.c:654 blk_flush_complete_seq+0x835/0xd80 block/blk-flush.c:204 flush_end_io+0x7b7/0xb90 block/blk-flush.c:261 __blk_mq_end_request+0x282/0x4c0 block/blk-mq.c:645 scsi_end_request+0x3a8/0x850 drivers/scsi/scsi_lib.c:607 scsi_io_completion+0x3f5/0x1320 drivers/scsi/scsi_lib.c:970 scsi_softirq_done+0x11b/0x490 drivers/scsi/scsi_lib.c:1448 blk_mq_complete_request block/blk-mq.c:788 [inline] blk_mq_complete_request+0x84/0xb0 block/blk-mq.c:785 scsi_mq_done+0x155/0x360 drivers/scsi/scsi_lib.c:1603 virtscsi_vq_done drivers/scsi/virtio_scsi.c:184 [inline] virtscsi_req_done+0x14c/0x220 drivers/scsi/virtio_scsi.c:199 vring_interrupt drivers/virtio/virtio_ring.c:2061 [inline] vring_interrupt+0x27a/0x300 drivers/virtio/virtio_ring.c:2047 __handle_irq_event_percpu+0x2f8/0x830 kernel/irq/handle.c:156 handle_irq_event_percpu kernel/irq/handle.c:196 [inline] handle_irq_event+0x105/0x280 kernel/irq/handle.c:213 handle_edge_irq+0x258/0xd20 kernel/irq/chip.c:828 asm_call_irq_on_stack+0xf/0x20 __run_irq_on_irqstack arch/x86/include/asm/irq_stack.h:48 [inline] run_irq_on_irqstack_cond arch/x86/include/asm/irq_stack.h:101 [inline] handle_irq arch/x86/kernel/irq.c:230 [inline] __common_interrupt arch/x86/kernel/irq.c:249 [inline] common_interrupt+0xe2/0x190 arch/x86/kernel/irq.c:239 asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:626 Fixes: 4ca40c2c ("md/raid10: Allow replacement device to be replace old drive.") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit af959500)
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188527, https://gitee.com/openeuler/kernel/issues/I6O3HO CVE: NA -------------------------------- need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace will be deref later. However, the latter check of mreplace might set mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this time. Fix it by merging two checks into one. Fixes: ee37d731 ("md/raid10: Fix raid10 replace hang when new added disk faulty") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 7718714e)
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188787, https://gitee.com/openeuler/kernel/issues/I78YIW CVE: NA -------------------------------- When we remove a disk which has replacement, first set rdev to NULL and then set replacement to rdev, finally set replacement to NULL (see raid10_remove_disk()). If io is submitted during the same time, it might read both rdev and replacement as NULL, and io will not be submitted. rdev -> NULL read rdev replacement -> NULL read replacement Fix it by reading replacement first and rdev later, meanwhile, use smp_mb() to prevent memory reordering. Fixes: 475b0321 ("md/raid10: writes should get directed to replacement as well as original.") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit e8025850)
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188804, https://gitee.com/openeuler/kernel/issues/I78YIS CVE: NA -------------------------------- When add a new disk to raid10, it will traverse conf->mirror from start and find one of the following mirror: 1. mirror->rdev is set to WantReplacement and it have no replacement, set new disk to mirror->replacement. 2. no rdev, set new disk to mirror->rdev. There is a array as below (sda is set to WantReplacement): Number Major Minor RaidDevice State 0 8 0 0 active sync set-A /dev/sda - 0 0 1 removed 2 8 32 2 active sync set-A /dev/sdc 3 8 48 3 active sync set-B /dev/sdd Use 'mdadm --add' to add a new disk to this array, the new disk will become sda's replacement instead of add to removed position, which is confusing for users. Meanwhile, after new disk recovery success, sda will be set to Faulty. Prioritize adding disk to 'removed' mirror is a better choice. In the above scenario, the behavior is the same as before, except sda will not be deleted. Before other disks are added, continued use sda is more reliable. Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 2e2e7ab6)
-
由 Li Nan 提交于
hulk inclusion category: bugfix, https://gitee.com/openeuler/kernel/issues/I71EKW bugzilla: 188628 CVE: NA -------------------------------- We first set rdev to WantRemove, and check if there is any io pending, if so, we will clear flag and return busy in raid10_remove_disk(). io will loss as below: raid10_remove_disk set WantRemove write rdev if WantRemove do not submit io if rdev->nr_pending clear WantRemove return BUSY read rdev get error data Fix it by md_error the rdev which io pending while removing. When the code reaches this point, it means this rdev will be removed later, so setting it as faulty has little impact. Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 894f89fa)
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188533, https://gitee.com/openeuler/kernel/issues/I6O7YB CVE: NA -------------------------------- commit ceff49d9 ("md/raid1: fix a race between removing rdev and access conf->mirrors[i].rdev") fix a null-ptr-deref about raid1. There is same bug in raid10 and fix it in the same way. There is no sync_thread running while removing rdev, no need to check the flag in raid10_sync_request(). Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 4461a62e)
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188380, https://gitee.com/openeuler/kernel/issues/I6GISC CVE: NA -------------------------------- commit fe630de0 ("md/raid10: avoid deadlock on recovery.") allowed normal io and sync io to exist at the same time. Task hung will occur as below: T1 T2 T3 T4 raid10d handle_read_error allow_barrier conf->nr_pending-- -> 0 //submit sync io raid10_sync_request raise_barrier ->will not be blocked ... //submit to drivers raid10_read_request wait_barrier conf->nr_pending++ -> 1 //retry read fail raid10_end_read_request reschedule_retry add to retry_list conf->nr_queued++ -> 1 //sync io fail end_sync_read __end_sync_read reschedule_retry add to retry_list conf->nr_queued++ -> 2 ... handle_read_error freeze_array wait nr_pending == nr_queued+1 ->1 ->3 //task hung retry read and sync io will be added to retry_list(nr_queued->2) if they fails. raid10d() called handle_read_error() and hung in freeze_array(). nr_queued will not decrease because raid10d is blocked, nr_pending will not increase because conf->barrier is not released. Fix it by moving allow_barrier() after raid10_read_request(). raise_barrier() will wait for nr_waiting to become 0. Therefore, sync io and regular io will not be issued at the same time. We also removed the check of nr_queued. It can be 0 but don't need to be blocked. MD_RECOVERY_RUNNING always is set after this patch, because all sync io is waitting in raise_barrier(), remove it, too. Fixes: fe630de0 ("md/raid10: avoid deadlock on recovery.") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 1fe782f0)
-
由 Yu Kuai 提交于
mainline inclusion from mainline-v6.1-rc1 commit ed2e063f category: bugfix bugzilla: 188380, https://gitee.com/openeuler/kernel/issues/I6GISC CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=ed2e063f92c44c891ccd883e289dde6ca870edcc -------------------------------- Currently the nasty condition in wait_barrier() is hard to read. This patch factors out the condition into a function. There are no functional changes. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Acked-by: NPaul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Acked-by: NGuoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: NSong Liu <song@kernel.org> conflict: drivers/md/raid10.c Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 7aad54e0)
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188628, https://gitee.com/openeuler/kernel/issues/I6WKDR CVE: NA -------------------------------- There is no limit to the number of io for raid10 plug, whitch may result in excessive memory usage and potential softlockup when a large number of io are submitted at once. There is no good way to fix it now, just add schedule point to prevent softlockup. Fixes: 57c67df4 ("md/raid10: submit IO from originating thread instead of md thread.") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit f8cecf7a)
-
由 Jiang Li 提交于
mainline inclusion from mainline-v6.2-rc1 commit b611ad14 category: bugfix bugzilla: 188662, https://gitee.com/openeuler/kernel/issues/I6UMUF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=b611ad14006e5be2170d9e8e611bf49dff288911 -------------------------------- fail run raid1 array when we assemble array with the inactive disk only, but the mdx_raid1 thread were not stop, Even if the associated resources have been released. it will caused a NULL dereference when we do poweroff. This causes the following Oops: [ 287.587787] BUG: kernel NULL pointer dereference, address: 0000000000000070 [ 287.594762] #PF: supervisor read access in kernel mode [ 287.599912] #PF: error_code(0x0000) - not-present page [ 287.605061] PGD 0 P4D 0 [ 287.607612] Oops: 0000 [#1] SMP NOPTI [ 287.611287] CPU: 3 PID: 5265 Comm: md0_raid1 Tainted: G U 5.10.146 #0 [ 287.619029] Hardware name: xxxxxxx/To be filled by O.E.M, BIOS 5.19 06/16/2022 [ 287.626775] RIP: 0010:md_check_recovery+0x57/0x500 [md_mod] [ 287.632357] Code: fe 01 00 00 48 83 bb 10 03 00 00 00 74 08 48 89 ...... [ 287.651118] RSP: 0018:ffffc90000433d78 EFLAGS: 00010202 [ 287.656347] RAX: 0000000000000000 RBX: ffff888105986800 RCX: 0000000000000000 [ 287.663491] RDX: ffffc90000433bb0 RSI: 00000000ffffefff RDI: ffff888105986800 [ 287.670634] RBP: ffffc90000433da0 R08: 0000000000000000 R09: c0000000ffffefff [ 287.677771] R10: 0000000000000001 R11: ffffc90000433ba8 R12: ffff888105986800 [ 287.684907] R13: 0000000000000000 R14: fffffffffffffe00 R15: ffff888100b6b500 [ 287.692052] FS: 0000000000000000(0000) GS:ffff888277f80000(0000) knlGS:0000000000000000 [ 287.700149] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 287.705897] CR2: 0000000000000070 CR3: 000000000320a000 CR4: 0000000000350ee0 [ 287.713033] Call Trace: [ 287.715498] raid1d+0x6c/0xbbb [raid1] [ 287.719256] ? __schedule+0x1ff/0x760 [ 287.722930] ? schedule+0x3b/0xb0 [ 287.726260] ? schedule_timeout+0x1ed/0x290 [ 287.730456] ? __switch_to+0x11f/0x400 [ 287.734219] md_thread+0xe9/0x140 [md_mod] [ 287.738328] ? md_thread+0xe9/0x140 [md_mod] [ 287.742601] ? wait_woken+0x80/0x80 [ 287.746097] ? md_register_thread+0xe0/0xe0 [md_mod] [ 287.751064] kthread+0x11a/0x140 [ 287.754300] ? kthread_park+0x90/0x90 [ 287.757974] ret_from_fork+0x1f/0x30 In fact, when raid1 array run fail, we need to do md_unregister_thread() before raid1_free(). Signed-off-by: NJiang Li <jiang.li@ugreen.com> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 22eeb5d1)
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188553, https://gitee.com/openeuler/kernel/issues/I6TNFX CVE: NA -------------------------------- rdev->del_work has not been queued to md_rdev_misc_wq and flush_workqueue will not flush it if tow threads add and remove same device. sysfs might WARN duplicate filename as below. //T1 //T2 mdadm write super add success remove unbind_rdev_from_array md_ioctl flush_workqueue INIT_WORK queue_work md_add_new_disk duplicate filename dev-xxx Check if there is any kobj with the same name, and return busy if true. Fixes: 5792a285 ("md: avoid a deadlock when removing a device from an md array via sysfs") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 5815341f)
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188553, https://gitee.com/openeuler/kernel/issues/I6TNFX CVE: NA -------------------------------- If we want to remove a device, first we delete it from mddev->disks list, then init rdev->del_work to put it (see unbind_rdev_from_array()). flush_rdev_wq() traverses mddev->disks to check if there is any pending rdev->del_work, if so, flush it. Howerver, rdev will not be in the list of mddev->disks if rdev->del_work exists, and flush_workqueue() will never be executed. Replace it with flush_workqueue() to ensure del_work has been completed when adding devices. Fixes: cc1ffe61 ("md: add new workqueue for delete rdev") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit ff461e2d)
-
由 David Sloan 提交于
mainline inclusion from mainline-v6.0-rc3 commit 5e8daf90 category: bugfix bugzilla: 188015, https://gitee.com/openeuler/kernel/issues/I6OERX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=5e8daf906f890560df430d30617c692a794acb73 -------------------------------- A race condition still exists when removing and re-creating md devices in test cases. However, it is only seen on some setups. The race condition was tracked down to a reference still being held to the kobject by the rdev in the md_rdev_misc_wq which will be released in rdev_delayed_delete(). md_alloc() waits for previous deletions by waiting on the md_misc_wq, but the md_rdev_misc_wq may still be holding a reference to a recently removed device. To fix this, also flush the md_rdev_misc_wq in md_alloc(). Signed-off-by: NDavid Sloan <david.sloan@eideticom.com> [logang@deltatee.com: rewrote commit message] Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NSong Liu <song@kernel.org> Conflict: drivers/md/md.c Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 5fa41917)
-
由 Jinke Han 提交于
mainline inclusion from mainline-v6.0-rc1 commit 14a6e2eb category: bugfix bugzilla: 188088, https://gitee.com/openeuler/kernel/issues/I66GIL CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14a6e2eb7df5c7897c15b109cba29ab0c4a791b6 ---------------------------------------------------------------------- In our test of iocost, we encountered some list add/del corruptions of inner_walk list in ioc_timer_fn. The reason can be described as follows: cpu 0 cpu 1 ioc_qos_write ioc_qos_write ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ... rq_qos_add(q, rqos); } ... rq_qos_add(q, rqos); ... } When the io.cost.qos file is written by two cpus concurrently, rq_qos may be added to one disk twice. In that case, there will be two iocs enabled and running on one disk. They own different iocgs on their active list. In the ioc_timer_fn function, because of the iocgs from two iocs have the same root iocg, the root iocg's walk_list may be overwritten by each other and this leads to list add/del corruptions in building or destroying the inner_walk list. And so far, the blk-rq-qos framework works in case that one instance for one type rq_qos per queue by default. This patch make this explicit and also fix the crash above. Signed-off-by: NJinke Han <hanjinke.666@bytedance.com> Reviewed-by: NMuchun Song <songmuchun@bytedance.com> Acked-by: NTejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.comSigned-off-by: NJens Axboe <axboe@kernel.dk> Conflicts: block/blk-rq-qos.h block/blk-wbt.c Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 8ce9b7c4)
-
由 Li Nan 提交于
hulk inclusion category: bugfix bugzilla: 188152, https://gitee.com/openeuler/kernel/issues/I67BPT CVE: NA ------------------------------- adjust_inuse_and_calc_cost() use spin_lock_irq and IRQ will enable when unlock. DEADLOCK might happen if we have held other locks before: ================================ WARNING: inconsistent lock state 5.10.0-02758-g8e5f91fd772f #26 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 {IN-HARDIRQ-W} state was registered at: __lock_acquire+0x3d7/0x1070 lock_acquire+0x197/0x4a0 __raw_spin_lock_irqsave _raw_spin_lock_irqsave+0x3b/0x60 bfq_idle_slice_timer_body bfq_idle_slice_timer+0x53/0x1d0 __run_hrtimer+0x477/0xa70 __hrtimer_run_queues+0x1c6/0x2d0 hrtimer_interrupt+0x302/0x9e0 local_apic_timer_interrupt __sysvec_apic_timer_interrupt+0xfd/0x420 run_sysvec_on_irqstack_cond sysvec_apic_timer_interrupt+0x46/0xa0 asm_sysvec_apic_timer_interrupt+0x12/0x20 irq event stamp: 837522 hardirqs last enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore hardirqs last enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40 hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50 softirqs last enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&bfqd->lock); <Interrupt> lock(&bfqd->lock); *** DEADLOCK *** 3 locks held by kworker/2:3/388: #0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0 #1: ffff8881176bfdd8 ((work_completion)(&td->dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0 #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 stack backtrace: CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kthrotld blk_throtl_dispatch_work_fn Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 print_usage_bug valid_state mark_lock_irq.cold+0x32/0x3a mark_lock+0x693/0xbc0 mark_held_locks+0x9e/0xe0 __trace_hardirqs_on_caller lockdep_hardirqs_on_prepare.part.0+0x151/0x360 trace_hardirqs_on+0x5b/0x180 __raw_spin_unlock_irq _raw_spin_unlock_irq+0x24/0x40 spin_unlock_irq adjust_inuse_and_calc_cost+0x4fb/0x970 ioc_rqos_merge+0x277/0x740 __rq_qos_merge+0x62/0xb0 rq_qos_merge bio_attempt_back_merge+0x12c/0x4a0 blk_mq_sched_try_merge+0x1b6/0x4d0 bfq_bio_merge+0x24a/0x390 __blk_mq_sched_bio_merge+0xa6/0x460 blk_mq_sched_bio_merge blk_mq_submit_bio+0x2e7/0x1ee0 __submit_bio_noacct_mq+0x175/0x3b0 submit_bio_noacct+0x1fb/0x270 blk_throtl_dispatch_work_fn+0x1ef/0x2b0 process_one_work+0x83e/0x13f0 process_scheduled_works worker_thread+0x7e3/0xd80 kthread+0x353/0x470 ret_from_fork+0x1f/0x30 Fixes: b0853ab4 ("blk-iocost: revamp in-period donation snapbacks") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit 60e8843c)
-
由 Yu Kuai 提交于
hulk inclusion category: bugfix bugzilla: 188033, https://gitee.com/openeuler/kernel/issues/I663ZP CVE: NA -------------------------------- iocost is based on rq_qos, which can only work for request based device, thus it doesn't make sense to configure iocost for bio based device. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> (cherry picked from commit e11c64a9)
-
- 01 6月, 2023 3 次提交
-
-
由 openeuler-ci-bot 提交于
Merge Pull Request from: @ci-robot PR sync from: Li Huafei <lihuafei1@huawei.com> https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/thread/KUHVPGLDH5VREMJJOSMW2VRWWHEG5OML/ Link:https://gitee.com/openeuler/kernel/pulls/867 Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
由 openeuler-ci-bot 提交于
Merge Pull Request from: @openeuler-sync-bot Origin pull request: https://gitee.com/openeuler/kernel/pulls/793 PV IPI with less VM exit can improve performance than iocsr emulation when sending ipi interrupt to multi cpus. This patch supports: 1. sending PV IPI by hypercall 2. recording and updating steal time for VM Link:https://gitee.com/openeuler/kernel/pulls/854 Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
由 openeuler-ci-bot 提交于
!853 [sync] PR-809: LoongArch: defconfig: enable memory and pci hotplug related configs for LoongArch Merge Pull Request from: @openeuler-sync-bot Origin pull request: https://gitee.com/openeuler/kernel/pulls/809 enable memory and pci hotplug related configs for LoongArch, used by LoongArch virtual machine now. Link:https://gitee.com/openeuler/kernel/pulls/853 Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
- 31 5月, 2023 21 次提交
-
-
由 Li Huafei 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6Y5Y1 ------------------------------- We call reserve_crashkernel_high() before map_mem() to reserve high memory in advance, which in turn can avoid using page level mapping for all memory above 4G to optimize performance. And after reserve_crashkernel_high(), reserve_crashkernel_low() is also needed to reserve low memory. But when the system RAM is less than 4G, the memory reserved by reserve_crashkernel_high() is already low memory (less than 4G), reserve_crashkernel_low() may reserve low memory again and the memory it reserves may be higher than that reserved by reserve_crashkernel_high(). Looking at /proc/iomem would have: # cat /proc/iomem | grep -i crash 65400000-953fffff : Crash kernel ==> crashk_res a7800000-b77fffff : Crash kernel ==> crashk_res_low At this point kexec-tools will incorrectly use the second memory segment for the kdump kernel image load, causing the kernel load address check to fail during kexec load (see sanity_check_segment_list()). When the memory reserved by reserve_crashkernel_high() meets the low memory requirement, reserve_crashkernel_low() is no longer called to reserve memory and avoid introducing problems with duplicate reservations. Fixes: baac34dd ("arm64: kdump: Use page-level mapping for the high memory of crashkernel") Signed-off-by: NLi Huafei <lihuafei1@huawei.com> Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
-
由 openeuler-ci-bot 提交于
Merge Pull Request from: @zhangjialin11 Pull new CVEs: CVE-2023-22998 cgroup bugfix from Gaosheng Cui sched bugfix from Xia Fukun block bugfixes from Zhong Jinghua and Yu Kuai iomap and ext4 bugfixes from Baokun Li md bugfixes from Yu Kuai Link:https://gitee.com/openeuler/kernel/pulls/862 Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
由 Xiu Jianfeng 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I798WQ CVE: NA ---------------------------------------------------------------------- We found a refcount UAF bug as follows: refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148 Workqueue: events cpuset_hotplug_workfn Call trace: refcount_warn_saturate+0xa0/0x148 __refcount_add.constprop.0+0x5c/0x80 css_task_iter_advance_css_set+0xd8/0x210 css_task_iter_advance+0xa8/0x120 css_task_iter_next+0x94/0x158 update_tasks_root_domain+0x58/0x98 rebuild_root_domains+0xa0/0x1b0 rebuild_sched_domains_locked+0x144/0x188 cpuset_hotplug_workfn+0x138/0x5a0 process_one_work+0x1e8/0x448 worker_thread+0x228/0x3e0 kthread+0xe0/0xf0 ret_from_fork+0x10/0x20 then a kernel panic will be triggered as below: Unable to handle kernel paging request at virtual address 00000000c0000010 Call trace: cgroup_apply_control_disable+0xa4/0x16c rebind_subsystems+0x224/0x590 cgroup_destroy_root+0x64/0x2e0 css_free_rwork_fn+0x198/0x2a0 process_one_work+0x1d4/0x4bc worker_thread+0x158/0x410 kthread+0x108/0x13c ret_from_fork+0x10/0x18 The race that cause this bug can be shown as below: (hotplug cpu) | (umount cpuset) mutex_lock(&cpuset_mutex) | mutex_lock(&cgroup_mutex) cpuset_hotplug_workfn | rebuild_root_domains | rebind_subsystems update_tasks_root_domain | spin_lock_irq(&css_set_lock) css_task_iter_start | list_move_tail(&cset->e_cset_node[ss->id] while(css_task_iter_next) | &dcgrp->e_csets[ss->id]); css_task_iter_end | spin_unlock_irq(&css_set_lock) mutex_unlock(&cpuset_mutex) | mutex_unlock(&cgroup_mutex) Inside css_task_iter_start/next/end, css_set_lock is hold and then released, so when iterating task(left side), the css_set may be moved to another list(right side), then it->cset_head points to the old list head and it->cset_pos->next points to the head node of new list, which can't be used as struct css_set. To fix this issue, introduce CSS_TASK_ITER_STOPPED flag for css_task_iter. when moving css_set to dcgrp->e_csets[ss->id] in rebind_subsystems(), stop the task iteration. Reported-by: NGaosheng Cui <cuigaosheng1@huawei.com> Link: https://www.spinics.net/lists/cgroups/msg37935.html Fixes: f9a25f77 ("cpusets: Rebuild root domain deadline accounting information") Signed-off-by: NXiu Jianfeng <xiujianfeng@huaweicloud.com> Signed-off-by: NGaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Harshit Mogalapalli 提交于
stable inclusion from stable-v5.10.173 commit c5fe3fba1b7bfecb6f17f93a433782b8500fe377 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6IKWF CVE: CVE-2023-22998 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c5fe3fba1b7bfecb6f17f93a433782b8500fe377 -------------------------------- In virtio_gpu_object_shmem_init() we are passing NULL to PTR_ERR, which is returning 0/success. Fix this by storing error value in 'ret' variable before assigning shmem->pages to NULL. Found using static analysis with Smatch. Fixes: 64b88afb ("drm/virtio: Correct drm_gem_shmem_get_sg_table() error handling") Signed-off-by: NHarshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: NDmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Dmitry Osipenko 提交于
stable inclusion from stable-v5.10.171 commit 87c647def389354c95263d6635c62ca0de7d12ca category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6IKWF CVE: CVE-2023-22998 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=87c647def389354c95263d6635c62ca0de7d12ca -------------------------------- commit 64b88afb upstream. Previous commit fixed checking of the ERR_PTR value returned by drm_gem_shmem_get_sg_table(), but it missed to zero out the shmem->pages, which will crash virtio_gpu_cleanup_object(). Add the missing zeroing of the shmem->pages. Fixes: c2496873 ("drm/virtio: Fix NULL vs IS_ERR checking in virtio_gpu_object_shmem_init") Reviewed-by: NEmil Velikov <emil.l.velikov@gmail.com> Signed-off-by: NDmitry Osipenko <dmitry.osipenko@collabora.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220630200726.1884320-2-dmitry.osipenko@collabora.comSigned-off-by: NGerd Hoffmann <kraxel@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Miaoqian Lin 提交于
stable inclusion from stable-v5.10.171 commit 0a4181b23acf53e9c95b351df6a7891116b98f9b category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6IKWF CVE: CVE-2023-22998 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0a4181b23acf53e9c95b351df6a7891116b98f9b -------------------------------- commit c2496873 upstream. Since drm_prime_pages_to_sg() function return error pointers. The drm_gem_shmem_get_sg_table() function returns error pointers too. Using IS_ERR() to check the return value to fix this. Fixes: 2f2aa137 ("drm/virtio: move virtio_gpu_mem_entry initialization to new function") Signed-off-by: NMiaoqian Lin <linmq006@gmail.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220602104223.54527-1-linmq006@gmail.comSigned-off-by: NGerd Hoffmann <kraxel@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Xia Fukun 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6YJJQ CVE: NA ---------------------------------------- The function sd_llc_free_all() will be called to release allocated resources when space allocation for the scheduling domain structure fails. However, this function did not check if sd is a null pointer when releasing sdd resources, resulting in an error: "Unable to handle kernel paging request at virtual address". Fix this issue by adding null pointer discrimination. Fixes: 79bec4c6 ("sched/topology: Provide hooks to allocate data shared per LLC") Signed-off-by: NXia Fukun <xiafukun@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Reviewed-by: Nsongping yu <yusongping@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Zhong Jinghua 提交于
hulk inclusion category: bugfix bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I76JDY CVE: NA ---------------------------------------- In the block_ioctl, we can pass in the unsigned number 0x8000000000000000 as an input parameter, like below: block_ioctl blkdev_ioctl blkpg_ioctl blkpg_do_ioctl copy_from_user bdev_add_partition add_partition p->start_sect = start; // start = 0x8000000000000000 Then, there was an warning when submit bio: WARNING: CPU: 0 PID: 382 at fs/iomap/apply.c:54 Call trace: iomap_apply+0x644/0x6e0 __iomap_dio_rw+0x5cc/0xa24 iomap_dio_rw+0x4c/0xcc ext4_dio_read_iter ext4_file_read_iter ext4_file_read_iter+0x318/0x39c call_read_iter lo_rw_aio.isra.0+0x748/0x75c do_req_filebacked+0x2d4/0x370 loop_handle_cmd loop_queue_work+0x94/0x23c kthread_worker_fn+0x160/0x6bc loop_kthread_worker_fn+0x3c/0x50 kthread+0x20c/0x25c ret_from_fork+0x10/0x18 Stack: submit_bio_noacct submit_bio_checks blk_partition_remap bio->bi_iter.bi_sector += p->start_sect // bio->bi_iter.bi_sector = 0xffc0000000000000 + 65408 .. loop_queue_work loop_handle_cmd do_req_filebacked pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset // pos < 0 lo_rw_aio call_read_iter ext4_dio_read_iter __iomap_dio_rw iomap_apply ext4_iomap_begin map.m_lblk = offset >> blkbits ext4_set_iomap iomap->offset = (u64) map->m_lblk << blkbits // iomap->offset = 64512 WARN_ON(iomap.offset > pos) // iomap.offset = 64512 and pos < 0 This is unreasonable for start + length > disk->part0.nr_sects. There is already a similar check in blk_add_partition(). Fix it by adding a check in bdev_add_partition(). Signed-off-by: NZhong Jinghua <zhongjinghua@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Tudor Ambarus 提交于
stable inclusion from stable-v5.10.180 commit 0dde3141c527b09b96bef1e7eeb18b8127810ce9 category: bugfix bugzilla: 188791,https://gitee.com/openeuler/kernel/issues/I76XUJ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0dde3141c527b09b96bef1e7eeb18b8127810ce9 -------------------------------- commit 4f043518 upstream. When modifying the block device while it is mounted by the filesystem, syzbot reported the following: BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58 Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586 CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc9661827 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:306 print_report+0x107/0x1f0 mm/kasan/report.c:417 kasan_report+0xcd/0x100 mm/kasan/report.c:517 crc16+0x206/0x280 lib/crc16.c:58 ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187 ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210 ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline] ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173 ext4_remove_blocks fs/ext4/extents.c:2527 [inline] ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline] ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958 ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416 ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342 ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622 notify_change+0xe50/0x1100 fs/attr.c:482 do_truncate+0x200/0x2f0 fs/open.c:65 handle_truncate fs/namei.c:3216 [inline] do_open fs/namei.c:3561 [inline] path_openat+0x272b/0x2dd0 fs/namei.c:3714 do_filp_open+0x264/0x4f0 fs/namei.c:3741 do_sys_openat2+0x124/0x4e0 fs/open.c:1310 do_sys_open fs/open.c:1326 [inline] __do_sys_creat fs/open.c:1402 [inline] __se_sys_creat fs/open.c:1396 [inline] __x64_sys_creat+0x11f/0x160 fs/open.c:1396 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f72f8a8c0c9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280 RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000 Replace le16_to_cpu(sbi->s_es->s_desc_size) with sbi->s_desc_size It reduces ext4's compiled text size, and makes the code more efficient (we remove an extra indirect reference and a potential byte swap on big endian systems), and there is no downside. It also avoids the potential KASAN / syzkaller failure, as a bonus. Reported-by: syzbot+fc51227e7100c9294894@syzkaller.appspotmail.com Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=70d28d11ab14bd7938f3e088365252aa923cff42 Link: https://syzkaller.appspot.com/bug?id=b85721b38583ecc6b5e72ff524c67302abbc30f3 Link: https://lore.kernel.org/all/000000000000ece18705f3b20934@google.com/ Fixes: 717d50e4 ("Ext4: Uninitialized Block Groups") Cc: stable@vger.kernel.org Signed-off-by: NTudor Ambarus <tudor.ambarus@linaro.org> Link: https://lore.kernel.org/r/20230504121525.3275886-1-tudor.ambarus@linaro.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NBaokun Li <libaokun1@huawei.com> Reviewed-by: NYang Erkun <yangerkun@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.19-rc1 commit e9c3a8e8 category: bugfix bugzilla: 188775, https://gitee.com/openeuler/kernel/issues/I73IFH Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9c3a8e820ed0eeb2be05072f29f80d1b79f053b -------------------------------- XFS has the unique behavior (as compared to the other Linux filesystems) that on writeback errors it will completely invalidate the affected folio and force the page cache to reread the contents from disk. All other filesystems leave the page mapped and up to date. This is a rude awakening for user programs, since (in the case where write fails but reread doesn't) file contents will appear to revert to old disk contents with no notification other than an EIO on fsync. This might have been annoying back in the days when iomap dealt with one page at a time, but with multipage folios, we can now throw away *megabytes* worth of data for a single write error. On *most* Linux filesystems, a program can respond to an EIO on write by redirtying the entire file and scheduling it for writeback. This isn't foolproof, since the page that failed writeback is no longer dirty and could be evicted, but programs that want to recover properly *also* have to detect XFS and regenerate every write they've made to the file. When running xfs/314 on arm64, I noticed a UAF when xfs_discard_folio invalidates multipage folios that could be undergoing writeback. If, say, we have a 256K folio caching a mix of written and unwritten extents, it's possible that we could start writeback of the first (say) 64K of the folio and then hit a writeback error on the next 64K. We then free the iop attached to the folio, which is really bad because writeback completion on the first 64k will trip over the "blocks per folio > 1 && !iop" assertion. This can't be fixed by only invalidating the folio if writeback fails at the start of the folio, since the folio is marked !uptodate, which trips other assertions elsewhere. Get rid of the whole behavior entirely. Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Conflicts: fs/xfs/xfs_aops.c fs/iomap/buffered-io.c Signed-off-by: NBaokun Li <libaokun1@huawei.com> Reviewed-by: NYang Erkun <yangerkun@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Andreas Gruenbacher 提交于
mainline inclusion from mainline-v5.14-rc2 commit 229adf3c category: bugfix bugzilla: 188764, https://gitee.com/openeuler/kernel/issues/I736LW Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=229adf3c64dbeae4e2f45fb561907ada9fcc0d0c -------------------------------- Now that we create those objects in iomap_writepage_map when needed, there's no need to pre-create them in iomap_page_mkwrite_actor anymore. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NBaokun Li <libaokun1@huawei.com> Reviewed-by: NYang Erkun <yangerkun@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Andreas Gruenbacher 提交于
mainline inclusion from mainline-v5.14-rc2 commit 637d3375 category: bugfix bugzilla: 188764, https://gitee.com/openeuler/kernel/issues/I736LW Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=637d3375953e052a62c0db409557e3b3354be88a -------------------------------- In iomap_readpage_actor, don't create iop objects for inline inodes. Otherwise, iomap_read_inline_data will set PageUptodate without setting iop->uptodate, and iomap_page_release will eventually complain. To prevent this kind of bug from occurring in the future, make sure the page doesn't have private data attached in iomap_read_inline_data. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NBaokun Li <libaokun1@huawei.com> Reviewed-by: NYang Erkun <yangerkun@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Andreas Gruenbacher 提交于
mainline inclusion from mainline-v5.14-rc2 commit 8e1bcef8 category: bugfix bugzilla: 188764, https://gitee.com/openeuler/kernel/issues/I736LW Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e1bcef8e18d0fec4afe527c074bb1fd6c2b140c -------------------------------- Create an iop in the writeback path if one doesn't exist. This allows us to avoid creating the iop in some cases. We'll initially do that for pages with inline data, but it can be extended to pages which are entirely within an extent. It also allows for an iop to be removed from pages in the future (eg page split). Co-developed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NBaokun Li <libaokun1@huawei.com> Reviewed-by: NYang Erkun <yangerkun@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC CVE: NA -------------------------------- Struct mddev is just used inside raid, just in case that md_mod is compiled from new kernel, and raid1/raid10 or other out-of-tree raid are compiled from old kernel. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC CVE: NA -------------------------------- Before refactoring idle and frozen from action_store, interruptible apis is used so that hungtask warning won't be triggered if it takes too long to finish indle/frozen sync_thread. This patch do the same. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC CVE: NA -------------------------------- We just replace md_reap_sync_thread() with wait_event(resync_wait, ...) from action_store(), this patch just make sure action_store() will still wait for everything to be done in md_reap_sync_thread(). Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC CVE: NA -------------------------------- Our test found a following deadlock in raid10: 1) Issue a normal write, and such write failed: raid10_end_write_request set_bit(R10BIO_WriteError, &r10_bio->state) one_write_done reschedule_retry // later from md thread raid10d handle_write_completed list_add(&r10_bio->retry_list, &conf->bio_end_io_list) // later from md thread raid10d if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) list_move(conf->bio_end_io_list.prev, &tmp) r10_bio = list_first_entry(&tmp, struct r10bio, retry_list) raid_end_bio_io(r10_bio) Dependency chain 1: normal io is waiting for updating superblock 2) Trigger a recovery: raid10_sync_request raise_barrier Dependency chain 2: sync thread is waiting for normal io 3) echo idle/frozen to sync_action: action_store mddev_lock md_unregister_thread kthread_stop Dependency chain 3: drop 'reconfig_mutex' is waiting for sync thread 4) md thread can't update superblock: raid10d md_check_recovery if (mddev_trylock(mddev)) md_update_sb Dependency chain 4: update superblock is waiting for 'reconfig_mutex' Hence cyclic dependency exist, in order to fix the problem, we must break one of them. Dependency 1 and 2 can't be broken because they are foundation design. Dependency 4 may be possible if it can be guaranteed that no io can be inflight, however, this requires a new mechanism which seems complex. Dependency 3 is a good choice, because idle/frozen only requires sync thread to finish, which can be done asynchronously that is already implemented, and 'reconfig_mutex' is not needed anymore. This patch switch 'idle' and 'frozen' to wait sync thread to be done asynchronously, and this patch also add a sequence counter to record how many times sync thread is done, so that 'idle' won't keep waiting on new started sync thread. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC CVE: NA -------------------------------- Currently, for idle and frozen, action_store will hold 'reconfig_mutex' and call md_reap_sync_thread() to stop sync thread, however, this will cause deadlock (explained in the next patch). In order to fix the problem, following patch will release 'reconfig_mutex' and wait on 'resync_wait', like md_set_readonly() and do_md_stop() does. Consider that action_store() will set/clear 'MD_RECOVERY_FROZEN' unconditionally, which might cause unexpected problems, for example, frozen just set 'MD_RECOVERY_FROZEN' and is still in progress, while 'idle' clear 'MD_RECOVERY_FROZEN' and new sync thread is started, which might starve in progress frozen. This patch add a mutex to synchronize idle and frozen from action_store(). Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC CVE: NA -------------------------------- Prepare to handle 'idle' and 'frozen' differently to fix a deadlock, there are no functional changes except that MD_RECOVERY_RUNNING is checked again after 'reconfig_mutex' is held. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC CVE: NA -------------------------------- This reverts commit 9dfbdafd. Because it will introduce a defect that sync_thread can be running while MD_RECOVERY_RUNNING is cleared, which will cause some unexpected problems, for example: list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0). Call trace: __list_add_valid+0xfc/0x140 insert_work+0x78/0x1a0 __queue_work+0x500/0xcf4 queue_work_on+0xe8/0x12c md_check_recovery+0xa34/0xf30 raid10d+0xb8/0x900 [raid10] md_thread+0x16c/0x2cc kthread+0x1a4/0x1ec ret_from_fork+0x10/0x18 This is because work is requeued while it's still inside workqueue: t1: t2: action_store mddev_lock if (mddev->sync_thread) mddev_unlock md_unregister_thread // first sync_thread is done md_check_recovery mddev_try_lock /* * once MD_RECOVERY_DONE is set, new sync_thread * can start. */ set_bit(MD_RECOVERY_RUNNING, &mddev->recovery) INIT_WORK(&mddev->del_work, md_start_sync) queue_work(md_misc_wq, &mddev->del_work) test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...) // set pending bit insert_work list_add_tail mddev_unlock mddev_lock_nointr md_reap_sync_thread // MD_RECOVERY_RUNNING is cleared mddev_unlock t3: // before queued work started from t2 md_check_recovery // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started INIT_WORK(&mddev->del_work, md_start_sync) work->data = 0 // work pending bit is cleared queue_work(md_misc_wq, &mddev->del_work) insert_work list_add_tail // list is corrupted This patch revert the commit to fix the problem, the deadlock this commit tries to fix will be fixed in following patches. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Signed-off-by: NSong Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230322064122.2384589-2-yukuai1@huaweicloud.comReviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Guoqing Jiang 提交于
mainline inclusion from mainline-v6.0-rc1 commit 9dfbdafd category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc3&id=9dfbdafda3b34e262e43e786077bab8e476a89d1 -------------------------------- Since the bug which commit 8b48ec23 ("md: don't unregister sync_thread with reconfig_mutex held") fixed is related with action_store path, other callers which reap sync_thread didn't need to be changed. Let's pull md_unregister_thread from md_reap_sync_thread, then fix previous bug with belows. 1. unlock mddev before md_reap_sync_thread in action_store. 2. save reshape_position before unlock, then restore it to ensure position not changed accidentally by others. Signed-off-by: NGuoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-