- 03 June 2023: 28 commits
-
-
Submitted by openeuler-ci-bot
Merge Pull Request from: @zhangjialin11 This patch series fixes block layer bugs: three patches fix iocost bugs, and the other patches fix raid10 and badblocks bugs. Link:https://gitee.com/openeuler/kernel/pulls/903 Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188535, https://gitee.com/openeuler/kernel/issues/I6O61Q CVE: NA -------------------------------- Recovery will give up and increment chunks_skipped in raid10_sync_request() if there are some bad_blocks, and it will return max_sector when chunks_skipped >= geo.raid_disks. In that case recovery fails and the data is inconsistent, but the user thinks recovery is done, which is wrong. Fix it by setting the mirror's recovery_disabled; a spare device shouldn't be added here. Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188378, https://gitee.com/openeuler/kernel/issues/I6GGV7 CVE: NA -------------------------------- init_resync() inits the mempool and sets conf->have_replacement at the beginning of sync, and close_sync() frees the mempool when sync is completed. After commit 7e83ccbe ("md/raid10: Allow skipping recovery when clean arrays are assembled"), recovery might be skipped so that init_resync() is called but close_sync() is not. A null-ptr-deref occurs as below: 1) create an array and wait for resync to complete; mddev->recovery_cp is set to MaxSector. 2) recovery is woken and is skipped. conf->have_replacement is set to 0 in init_resync(), and close_sync() is not called. 3) some io errors occur and rdev A is set to WantReplacement. 4) a new device is added and set as A's replacement. 5) recovery is woken; A has a replacement, but conf->have_replacement is 0, so r10bio->dev[i].repl_bio will not be allocated and a null-ptr-deref occurs. Fix it by not calling init_resync() if recovery is skipped. Fixes: 7e83ccbe ("md/raid10: Allow skipping recovery when clean arrays are assembled") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6ZG5B CVE: NA -------------------------------- badblocks will be lost if we set them as below: # echo 1 1 > bad_blocks # echo 3 1 > bad_blocks # echo 1 5 > bad_blocks # cat bad_blocks 1 3 We combine badblocks if there is an intersection between p[lo] and p[hi] in badblocks_set(). The end of the new badblocks is currently taken from p[hi], but p[lo] may extend beyond p[hi], so the new end should be the larger of p[lo]'s and p[hi]'s ends. lo: |------------------------| hi: |--------| Fixes: 9e0e252a ("badblocks: Add core badblock management code") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
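The merge rule can be illustrated with a small standalone model (a hypothetical sketch of the range arithmetic only, not the actual badblocks_set() patch; real badblocks entries are packed sector/length/ack words):

```c
#include <stdio.h>

/*
 * Hypothetical model of the merge step described above: each bad-block
 * range is a start plus a length.  When a new range bridges p[lo] and
 * p[hi], the merged end must be the larger of the two ends, not
 * unconditionally p[hi]'s end.
 */
struct range { unsigned long long start, len; };

static unsigned long long range_end(struct range r) { return r.start + r.len; }

static struct range merge(struct range lo, struct range hi)
{
	struct range m = { .start = lo.start, .len = 0 };
	unsigned long long end = range_end(hi);

	if (range_end(lo) > end)	/* the fix: keep the larger end */
		end = range_end(lo);
	m.len = end - m.start;
	return m;
}

int main(void)
{
	/* "echo 1 1", "echo 3 1", then "echo 1 5": the last write bridges both. */
	struct range lo = { 1, 5 };	/* p[lo] after absorbing the new range */
	struct range hi = { 3, 1 };	/* pre-existing p[hi] */
	struct range m = merge(lo, hi);

	printf("%llu %llu\n", m.start, m.len);	/* prints "1 5" */
	return 0;
}
```

Without the larger-end step the merged entry ends at p[hi]'s end (sector 4), which is exactly the truncated "1 3" listing shown in the example.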
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6ZG5B CVE: NA -------------------------------- The order of badblocks will be reversed if we set a large area at once. Keeping 'hi' unchanged while adding continuous badblocks is wrong: the next range to set is greater than 'hi', so it should be added at the next position. Increment 'hi' by 1 in each cycle. # echo 0 2048 > bad_blocks # cat bad_blocks 1536 512 1024 512 512 512 0 512 Fixes: 9e0e252a ("badblocks: Add core badblock management code") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
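A minimal standalone model of the insertion order (a hypothetical sketch, assuming the set path splits a large range into 512-sector chunks and inserts each at index 'hi'; the table layout is illustrative only):

```c
#include <stdio.h>

#define CHUNK 512	/* models the per-entry chunk size */

int main(void)
{
	unsigned long long table[8][2];
	int n = 0, hi = 0;

	/* "echo 0 2048": the request is split into four 512-sector chunks. */
	for (unsigned long long s = 0; s < 2048; s += CHUNK) {
		/* make room at index 'hi' by shifting later entries right */
		for (int i = n; i > hi; i--) {
			table[i][0] = table[i - 1][0];
			table[i][1] = table[i - 1][1];
		}
		table[hi][0] = s;
		table[hi][1] = CHUNK;
		n++;
		hi++;	/* the fix: advance 'hi' so the order stays ascending */
	}

	for (int i = 0; i < n; i++)
		printf("%llu %llu\n", table[i][0], table[i][1]);
	return 0;
}
```

With the hi++ the chunks are listed as 0/512/1024/1536 in ascending order; dropping it reproduces the reversed listing shown above.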
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6XBZQ CVE: NA -------------------------------- If setting any badblocks fails, we remove this rdev (set it to Faulty or set recovery_disabled). The previous patch "md/raid10: fix io hung in md_wait_for_blocked_rdev()" checks badblocks->changed instead of the return value in rdev_set_badblocks(), but the return value of this function also changed accordingly, which is not what we expected. Keep the return value consistent with before. Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6XBZQ CVE: NA -------------------------------- If badblocks are merged but bb->count is exceeded, badblocks_set() will return 1 and the merged badblocks will become un-acked. rdev_set_badblocks() will then not set sb_flags or wake up mddev->thread, and io waiting in md_wait_for_blocked_rdev() will hang because BlockedBadBlocks may never be cleared. Fix it by checking badblocks->changed instead of the return value; this flag is set whenever badblocks change. Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6XBZQ CVE: NA -------------------------------- bb->changed and unacked_exist are set, and badblocks_update_acked() is invoked, even if no badblocks change in badblocks_set(). Only update them when badblocks actually change. Fixes: 9e0e252a ("badblocks: Add core badblock management code") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188605, https://gitee.com/openeuler/kernel/issues/I6ZJ3T CVE: NA -------------------------------- We get the rdev from mirrors.replacement twice in raid10_write_request(). If the replacement changes between the two reads, we will increase A->nr_pending and decrease B->nr_pending. T1 (write) T2 (remove) T3 (add) raid10_remove_disk raid10_write_request rrdev = conf->mirrors[d].replacement; ->rdev A A nr_pending++ p->rdev = p->replacement; ->rdev A p->replacement = NULL; //A is set to WantReplacement raid10_add_disk p->replacement = rdev; ->rdev B if blocked_rdev rdev = conf->mirrors[d].replacement; ->rdev B B nr_pending-- Fix it by recording the rdev in the r10bio and getting the rdev from the r10bio. Fixes: 475b0321 ("md/raid10: writes should get directed to replacement as well as original.") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
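A hypothetical userspace model of the fix (the struct and function names are illustrative, not the raid10 code): the replacement pointer is read once, stashed in per-request state, and that same snapshot is used for both the increment and the decrement, so a concurrent swap of the slot cannot unbalance nr_pending.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct dev { _Atomic int nr_pending; };
struct mirror { _Atomic(struct dev *) replacement; };
struct request_state { struct dev *rrdev; };	/* models the rdev recorded in r10bio */

static void start_write(struct mirror *m, struct request_state *rs)
{
	rs->rrdev = atomic_load(&m->replacement);	/* single snapshot */
	if (rs->rrdev)
		atomic_fetch_add(&rs->rrdev->nr_pending, 1);
}

static void abort_write(struct request_state *rs)
{
	if (rs->rrdev)					/* same snapshot, same device */
		atomic_fetch_sub(&rs->rrdev->nr_pending, 1);
}

int main(void)
{
	struct dev a = { 0 };
	struct mirror m = { &a };
	struct request_state rs;

	start_write(&m, &rs);
	atomic_store(&m.replacement, NULL);	/* a concurrent remove/add swaps the slot */
	abort_write(&rs);
	printf("a.nr_pending = %d\n", atomic_load(&a.nr_pending));	/* 0: balanced */
	return 0;
}
```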
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188605, https://gitee.com/openeuler/kernel/issues/I6GOYF CVE: NA -------------------------------- raid10_end_write_request() might read mirror->rdev first and then mirror->replacement because of memory reordering, and a WARN_ON occurs if we remove the disk at the same time. T1 remove T2 io end raid10_remove_disk raid10_end_write_request p->rdev = NULL read rdev -> NULL smp_mb p->replacement = NULL read replacement -> NULL It is meaningless to compare rdev with mirror->rdev after we get it from the r10_bio in raid10_end_write_request(). Remove this WARN_ON_ONCE. Fixes: 2ecf5e6ecbfd ("md/raid10: fix uaf if replacement replaces rdev") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188377, https://gitee.com/openeuler/kernel/issues/I6GOYF CVE: NA -------------------------------- After commit 4ca40c2c ("md/raid10: Allow replacement device to be replace old drive.") mirrors->replacement can replace rdev during replacement's io pending, and repl_bio will write rdev (see raid10_write_one_disk()). We will get wrong device by r10conf in raid10_end_write_request(). In which case, r10_bio->devs[slot].repl_bio will be put but not set to IO_MADE_GOOD, and it will be put again later in raid_end_bio_io(), uaf occurs. Fix it by using r10_bio to record rdev. Put the operations of io fail and no replacement together, so no need to change repl. ================================================================== BUG: KASAN: use-after-free in bio_flagged include/linux/bio.h:238 [inline] BUG: KASAN: use-after-free in bio_put+0x78/0x80 block/bio.c:650 Read of size 2 at addr ffff888116524dd4 by task md0_raid10/2618 CPU: 0 PID: 2618 Comm: md0_raid10 Not tainted 5.10.0+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 sd 0:0:0:0: rejecting I/O to offline device Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x270 mm/kasan/report.c:390 __kasan_report mm/kasan/report.c:550 [inline] kasan_report.cold+0x22/0x3a mm/kasan/report.c:567 bio_flagged include/linux/bio.h:238 [inline] bio_put+0x78/0x80 block/bio.c:650 put_all_bios drivers/md/raid10.c:248 [inline] free_r10bio drivers/md/raid10.c:257 [inline] raid_end_bio_io+0x3b5/0x590 drivers/md/raid10.c:309 handle_write_completed drivers/md/raid10.c:2699 [inline] raid10d+0x2f85/0x5af0 drivers/md/raid10.c:2759 md_thread+0x444/0x4b0 drivers/md/md.c:7932 kthread+0x38c/0x470 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:299 Allocated by task 1400: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] set_alloc_info mm/kasan/common.c:498 [inline] __kasan_kmalloc.constprop.0+0xb5/0xe0 mm/kasan/common.c:530 slab_post_alloc_hook mm/slab.h:512 [inline] slab_alloc_node mm/slub.c:2923 [inline] slab_alloc mm/slub.c:2931 [inline] kmem_cache_alloc+0x144/0x360 mm/slub.c:2936 mempool_alloc+0x146/0x360 mm/mempool.c:391 bio_alloc_bioset+0x375/0x610 block/bio.c:486 bio_clone_fast+0x20/0x50 block/bio.c:711 raid10_write_one_disk+0x166/0xd30 drivers/md/raid10.c:1240 raid10_write_request+0x1600/0x2c90 drivers/md/raid10.c:1484 __make_request drivers/md/raid10.c:1508 [inline] raid10_make_request+0x376/0x620 drivers/md/raid10.c:1537 md_handle_request+0x699/0x970 drivers/md/md.c:451 md_submit_bio+0x204/0x400 drivers/md/md.c:489 __submit_bio block/blk-core.c:959 [inline] __submit_bio_noacct block/blk-core.c:1007 [inline] submit_bio_noacct+0x2e3/0xcf0 block/blk-core.c:1086 submit_bio+0x1a0/0x3a0 block/blk-core.c:1146 submit_bh_wbc+0x685/0x8e0 fs/buffer.c:3053 ext4_commit_super+0x37e/0x6c0 fs/ext4/super.c:5696 flush_stashed_error_work+0x28b/0x400 fs/ext4/super.c:791 process_one_work+0x9a6/0x1590 kernel/workqueue.c:2280 worker_thread+0x61d/0x1310 kernel/workqueue.c:2426 kthread+0x38c/0x470 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:299 Freed by task 2618: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:361 __kasan_slab_free+0x151/0x180 mm/kasan/common.c:482 slab_free_hook mm/slub.c:1569 [inline] 
slab_free_freelist_hook+0xa9/0x180 mm/slub.c:1608 slab_free mm/slub.c:3179 [inline] kmem_cache_free+0xcd/0x3d0 mm/slub.c:3196 mempool_free+0xe3/0x3b0 mm/mempool.c:500 bio_free+0xe2/0x140 block/bio.c:266 bio_put+0x58/0x80 block/bio.c:651 raid10_end_write_request+0x885/0xb60 drivers/md/raid10.c:516 bio_endio+0x376/0x6a0 block/bio.c:1465 req_bio_endio block/blk-core.c:289 [inline] blk_update_request+0x5f5/0xf40 block/blk-core.c:1525 blk_mq_end_request+0x4c/0x510 block/blk-mq.c:654 blk_flush_complete_seq+0x835/0xd80 block/blk-flush.c:204 flush_end_io+0x7b7/0xb90 block/blk-flush.c:261 __blk_mq_end_request+0x282/0x4c0 block/blk-mq.c:645 scsi_end_request+0x3a8/0x850 drivers/scsi/scsi_lib.c:607 scsi_io_completion+0x3f5/0x1320 drivers/scsi/scsi_lib.c:970 scsi_softirq_done+0x11b/0x490 drivers/scsi/scsi_lib.c:1448 blk_mq_complete_request block/blk-mq.c:788 [inline] blk_mq_complete_request+0x84/0xb0 block/blk-mq.c:785 scsi_mq_done+0x155/0x360 drivers/scsi/scsi_lib.c:1603 virtscsi_vq_done drivers/scsi/virtio_scsi.c:184 [inline] virtscsi_req_done+0x14c/0x220 drivers/scsi/virtio_scsi.c:199 vring_interrupt drivers/virtio/virtio_ring.c:2061 [inline] vring_interrupt+0x27a/0x300 drivers/virtio/virtio_ring.c:2047 __handle_irq_event_percpu+0x2f8/0x830 kernel/irq/handle.c:156 handle_irq_event_percpu kernel/irq/handle.c:196 [inline] handle_irq_event+0x105/0x280 kernel/irq/handle.c:213 handle_edge_irq+0x258/0xd20 kernel/irq/chip.c:828 asm_call_irq_on_stack+0xf/0x20 __run_irq_on_irqstack arch/x86/include/asm/irq_stack.h:48 [inline] run_irq_on_irqstack_cond arch/x86/include/asm/irq_stack.h:101 [inline] handle_irq arch/x86/kernel/irq.c:230 [inline] __common_interrupt arch/x86/kernel/irq.c:249 [inline] common_interrupt+0xe2/0x190 arch/x86/kernel/irq.c:239 asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:626 Fixes: 4ca40c2c ("md/raid10: Allow replacement device to be replace old drive.") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188527, https://gitee.com/openeuler/kernel/issues/I6O3HO CVE: NA -------------------------------- need_replace will be set to 1 if a non-Faulty mreplace exists, and mreplace will be dereferenced later. However, the later check of mreplace might set mreplace to NULL, and a null-ptr-deref occurs if need_replace is 1 at this time. Fix it by merging the two checks into one. Fixes: ee37d731 ("md/raid10: Fix raid10 replace hang when new added disk faulty") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188787, https://gitee.com/openeuler/kernel/issues/I78YIW CVE: NA -------------------------------- When we remove a disk that has a replacement, we first set rdev to NULL, then set rdev to the replacement, and finally set replacement to NULL (see raid10_remove_disk()). If io is submitted at the same time, it might read both rdev and replacement as NULL, and the io will not be submitted. rdev -> NULL read rdev replacement -> NULL read replacement Fix it by reading replacement first and rdev later, and use smp_mb() to prevent memory reordering. Fixes: 475b0321 ("md/raid10: writes should get directed to replacement as well as original.") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
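A hypothetical model of the ordering rule described above (illustrative identifiers only; C11 seq_cst fences stand in for smp_mb()): the remover promotes the replacement into the rdev slot before clearing the replacement, so a reader that loads replacement first and rdev second can never observe both as NULL.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct mirror {
	_Atomic(void *) rdev;
	_Atomic(void *) replacement;
};

static void remove_disk(struct mirror *m)		/* remover side */
{
	void *repl = atomic_load(&m->replacement);

	atomic_store(&m->rdev, repl);			/* promote the replacement */
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	atomic_store(&m->replacement, NULL);		/* only then clear it */
}

static void *pick_target(struct mirror *m)		/* io submission side */
{
	void *repl = atomic_load(&m->replacement);	/* read replacement first */
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	void *rdev = atomic_load(&m->rdev);		/* read rdev second */

	return rdev ? rdev : repl;			/* at least one is non-NULL */
}

int main(void)
{
	int disk_a = 1, disk_b = 2;
	struct mirror m = { &disk_a, &disk_b };

	printf("before remove: %p\n", pick_target(&m));
	remove_disk(&m);
	printf("after remove:  %p\n", pick_target(&m));	/* still non-NULL */
	return 0;
}
```

Reading in the opposite order allows the io path to see the not-yet-promoted rdev and the already-cleared replacement, which is the lost-io window described above.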
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188804, https://gitee.com/openeuler/kernel/issues/I78YIS CVE: NA -------------------------------- When adding a new disk to raid10, we traverse conf->mirrors from the start and pick the first mirror that matches one of the following: 1. mirror->rdev is set to WantReplacement and it has no replacement; set the new disk as mirror->replacement. 2. there is no rdev; set the new disk as mirror->rdev. Consider an array as below (sda is set to WantReplacement): Number Major Minor RaidDevice State 0 8 0 0 active sync set-A /dev/sda - 0 0 1 removed 2 8 32 2 active sync set-A /dev/sdc 3 8 48 3 active sync set-B /dev/sdd Using 'mdadm --add' to add a new disk to this array, the new disk becomes sda's replacement instead of being added to the removed position, which is confusing for users. Moreover, after the new disk recovers successfully, sda is set to Faulty. Prioritizing adding the disk to the 'removed' mirror is a better choice. In the above scenario, the behavior is the same as before, except sda will not be deleted. Before other disks are added, continuing to use sda is more reliable. Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
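A minimal sketch of the selection order this patch argues for (a hypothetical model, not the actual raid10_add_disk() code): scan for a fully removed slot first, and only fall back to becoming a replacement when no empty slot exists.

```c
#include <stdbool.h>
#include <stdio.h>

struct slot { bool has_rdev, want_replacement, has_replacement; };

static int pick_slot(const struct slot *s, int n)
{
	for (int i = 0; i < n; i++)	/* pass 1: prefer removed slots */
		if (!s[i].has_rdev)
			return i;
	for (int i = 0; i < n; i++)	/* pass 2: WantReplacement members */
		if (s[i].want_replacement && !s[i].has_replacement)
			return i;
	return -1;
}

int main(void)
{
	/* sda (WantReplacement), removed, sdc, sdd -- as in the example above */
	const struct slot s[4] = {
		{ true,  true,  false },	/* sda */
		{ false, false, false },	/* removed */
		{ true,  false, false },	/* sdc */
		{ true,  false, false },	/* sdd */
	};

	printf("new disk goes to slot %d\n", pick_slot(s, 4));	/* prints 1 */
	return 0;
}
```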
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188628, https://gitee.com/openeuler/kernel/issues/I71EKW CVE: NA -------------------------------- We first set the rdev to WantRemove, then check if there is any io pending; if so, we clear the flag and return busy in raid10_remove_disk(). io will be lost as below: raid10_remove_disk set WantRemove write rdev if WantRemove do not submit io if rdev->nr_pending clear WantRemove return BUSY read rdev get error data Fix it by calling md_error() on the rdev that has io pending while it is being removed. When the code reaches this point, the rdev will be removed later anyway, so setting it Faulty has little impact. Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188533, https://gitee.com/openeuler/kernel/issues/I6O7YB CVE: NA -------------------------------- Commit ceff49d9 ("md/raid1: fix a race between removing rdev and access conf->mirrors[i].rdev") fixed a null-ptr-deref in raid1. The same bug exists in raid10; fix it in the same way. There is no sync_thread running while removing an rdev, so there is no need to check the flag in raid10_sync_request(). Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188380, https://gitee.com/openeuler/kernel/issues/I6GISC CVE: NA -------------------------------- Commit fe630de0 ("md/raid10: avoid deadlock on recovery.") allowed normal io and sync io to exist at the same time. A task hang can occur as below: T1 T2 T3 T4 raid10d handle_read_error allow_barrier conf->nr_pending-- -> 0 //submit sync io raid10_sync_request raise_barrier ->will not be blocked ... //submit to drivers raid10_read_request wait_barrier conf->nr_pending++ -> 1 //retry read fail raid10_end_read_request reschedule_retry add to retry_list conf->nr_queued++ -> 1 //sync io fail end_sync_read __end_sync_read reschedule_retry add to retry_list conf->nr_queued++ -> 2 ... handle_read_error freeze_array wait nr_pending == nr_queued+1 ->1 ->3 //task hung The retried read and the sync io are added to retry_list (nr_queued -> 2) if they fail. raid10d() calls handle_read_error() and hangs in freeze_array(): nr_queued will not decrease because raid10d is blocked, and nr_pending will not increase because conf->barrier is not released. Fix it by moving allow_barrier() after raid10_read_request(). raise_barrier() will wait for nr_waiting to become 0, so sync io and regular io will not be issued at the same time. Also remove the check of nr_queued: it can be 0 and does not need to block. The MD_RECOVERY_RUNNING check is removed too, because it is always set after this patch, as all sync io waits in raise_barrier(). Fixes: fe630de0 ("md/raid10: avoid deadlock on recovery.") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Yu Kuai
mainline inclusion from mainline-v6.1-rc1 commit ed2e063f category: bugfix bugzilla: 188380, https://gitee.com/openeuler/kernel/issues/I6GISC CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=ed2e063f92c44c891ccd883e289dde6ca870edcc -------------------------------- Currently the nasty condition in wait_barrier() is hard to read. This patch factors out the condition into a function. There are no functional changes. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Acked-by: NPaul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Acked-by: NGuoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: NSong Liu <song@kernel.org> conflict: drivers/md/raid10.c Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188628, https://gitee.com/openeuler/kernel/issues/I6WKDR CVE: NA -------------------------------- There is no limit on the number of io for a raid10 plug, which may result in excessive memory usage and a potential softlockup when a large number of io are submitted at once. There is no good way to fix it now; just add a scheduling point to prevent the softlockup. Fixes: 57c67df4 ("md/raid10: submit IO from originating thread instead of md thread.") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Jiang Li
mainline inclusion from mainline-v6.2-rc1 commit b611ad14 category: bugfix bugzilla: 188662, https://gitee.com/openeuler/kernel/issues/I6UMUF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=b611ad14006e5be2170d9e8e611bf49dff288911 -------------------------------- fail run raid1 array when we assemble array with the inactive disk only, but the mdx_raid1 thread were not stop, Even if the associated resources have been released. it will caused a NULL dereference when we do poweroff. This causes the following Oops: [ 287.587787] BUG: kernel NULL pointer dereference, address: 0000000000000070 [ 287.594762] #PF: supervisor read access in kernel mode [ 287.599912] #PF: error_code(0x0000) - not-present page [ 287.605061] PGD 0 P4D 0 [ 287.607612] Oops: 0000 [#1] SMP NOPTI [ 287.611287] CPU: 3 PID: 5265 Comm: md0_raid1 Tainted: G U 5.10.146 #0 [ 287.619029] Hardware name: xxxxxxx/To be filled by O.E.M, BIOS 5.19 06/16/2022 [ 287.626775] RIP: 0010:md_check_recovery+0x57/0x500 [md_mod] [ 287.632357] Code: fe 01 00 00 48 83 bb 10 03 00 00 00 74 08 48 89 ...... [ 287.651118] RSP: 0018:ffffc90000433d78 EFLAGS: 00010202 [ 287.656347] RAX: 0000000000000000 RBX: ffff888105986800 RCX: 0000000000000000 [ 287.663491] RDX: ffffc90000433bb0 RSI: 00000000ffffefff RDI: ffff888105986800 [ 287.670634] RBP: ffffc90000433da0 R08: 0000000000000000 R09: c0000000ffffefff [ 287.677771] R10: 0000000000000001 R11: ffffc90000433ba8 R12: ffff888105986800 [ 287.684907] R13: 0000000000000000 R14: fffffffffffffe00 R15: ffff888100b6b500 [ 287.692052] FS: 0000000000000000(0000) GS:ffff888277f80000(0000) knlGS:0000000000000000 [ 287.700149] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 287.705897] CR2: 0000000000000070 CR3: 000000000320a000 CR4: 0000000000350ee0 [ 287.713033] Call Trace: [ 287.715498] raid1d+0x6c/0xbbb [raid1] [ 287.719256] ? __schedule+0x1ff/0x760 [ 287.722930] ? schedule+0x3b/0xb0 [ 287.726260] ? schedule_timeout+0x1ed/0x290 [ 287.730456] ? __switch_to+0x11f/0x400 [ 287.734219] md_thread+0xe9/0x140 [md_mod] [ 287.738328] ? md_thread+0xe9/0x140 [md_mod] [ 287.742601] ? wait_woken+0x80/0x80 [ 287.746097] ? md_register_thread+0xe0/0xe0 [md_mod] [ 287.751064] kthread+0x11a/0x140 [ 287.754300] ? kthread_park+0x90/0x90 [ 287.757974] ret_from_fork+0x1f/0x30 In fact, when raid1 array run fail, we need to do md_unregister_thread() before raid1_free(). Signed-off-by: NJiang Li <jiang.li@ugreen.com> Signed-off-by: NSong Liu <song@kernel.org> Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188553, https://gitee.com/openeuler/kernel/issues/I6TNFX CVE: NA -------------------------------- rdev->del_work may not yet have been queued to md_rdev_misc_wq, so flush_workqueue() will not flush it, if two threads add and remove the same device. sysfs might then WARN about a duplicate filename as below. //T1 //T2 mdadm write super add success remove unbind_rdev_from_array md_ioctl flush_workqueue INIT_WORK queue_work md_add_new_disk duplicate filename dev-xxx Check whether there is any kobj with the same name, and return busy if there is. Fixes: 5792a285 ("md: avoid a deadlock when removing a device from an md array via sysfs") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188553, https://gitee.com/openeuler/kernel/issues/I6TNFX CVE: NA -------------------------------- If we want to remove a device, we first delete it from the mddev->disks list and then init rdev->del_work to put it (see unbind_rdev_from_array()). flush_rdev_wq() traverses mddev->disks to check if there is any pending rdev->del_work and, if so, flushes it. However, the rdev will no longer be in the mddev->disks list once rdev->del_work exists, so flush_workqueue() will never be executed. Replace it with flush_workqueue() to ensure del_work has completed when adding devices. Fixes: cc1ffe61 ("md: add new workqueue for delete rdev") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by David Sloan
mainline inclusion from mainline-v6.0-rc3 commit 5e8daf90 category: bugfix bugzilla: 188015, https://gitee.com/openeuler/kernel/issues/I6OERX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=5e8daf906f890560df430d30617c692a794acb73 -------------------------------- A race condition still exists when removing and re-creating md devices in test cases. However, it is only seen on some setups. The race condition was tracked down to a reference still being held to the kobject by the rdev in the md_rdev_misc_wq which will be released in rdev_delayed_delete(). md_alloc() waits for previous deletions by waiting on the md_misc_wq, but the md_rdev_misc_wq may still be holding a reference to a recently removed device. To fix this, also flush the md_rdev_misc_wq in md_alloc(). Signed-off-by: NDavid Sloan <david.sloan@eideticom.com> [logang@deltatee.com: rewrote commit message] Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NSong Liu <song@kernel.org> Conflict: drivers/md/md.c Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Jinke Han
mainline inclusion from mainline-v6.0-rc1 commit 14a6e2eb category: bugfix bugzilla: 188088, https://gitee.com/openeuler/kernel/issues/I66GIL CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14a6e2eb7df5c7897c15b109cba29ab0c4a791b6 ---------------------------------------------------------------------- In our test of iocost, we encountered some list add/del corruptions of inner_walk list in ioc_timer_fn. The reason can be described as follows: cpu 0 cpu 1 ioc_qos_write ioc_qos_write ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ... rq_qos_add(q, rqos); } ... rq_qos_add(q, rqos); ... } When the io.cost.qos file is written by two cpus concurrently, rq_qos may be added to one disk twice. In that case, there will be two iocs enabled and running on one disk. They own different iocgs on their active list. In the ioc_timer_fn function, because of the iocgs from two iocs have the same root iocg, the root iocg's walk_list may be overwritten by each other and this leads to list add/del corruptions in building or destroying the inner_walk list. And so far, the blk-rq-qos framework works in case that one instance for one type rq_qos per queue by default. This patch make this explicit and also fix the crash above. Signed-off-by: NJinke Han <hanjinke.666@bytedance.com> Reviewed-by: NMuchun Song <songmuchun@bytedance.com> Acked-by: NTejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.comSigned-off-by: NJens Axboe <axboe@kernel.dk> Conflicts: block/blk-rq-qos.h block/blk-wbt.c Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
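A hypothetical userspace model of the "one rq_qos instance per type per queue" rule the patch makes explicit (names are illustrative, not the block layer API): the add path holds the queue's lock, refuses a duplicate id, and only then links the new policy, so two racing writers cannot both attach one.

```c
#include <pthread.h>
#include <stdio.h>

enum rqos_id { RQOS_WBT, RQOS_IOCOST };

struct rqos { enum rqos_id id; struct rqos *next; };
struct queue { pthread_mutex_t lock; struct rqos *qos; };

static int rqos_add(struct queue *q, struct rqos *r)
{
	int ret = 0;

	pthread_mutex_lock(&q->lock);
	for (struct rqos *cur = q->qos; cur; cur = cur->next) {
		if (cur->id == r->id) {		/* already attached: refuse */
			ret = -1;
			goto out;
		}
	}
	r->next = q->qos;			/* attach under the lock */
	q->qos = r;
out:
	pthread_mutex_unlock(&q->lock);
	return ret;
}

int main(void)
{
	struct queue q = { PTHREAD_MUTEX_INITIALIZER, NULL };
	struct rqos a = { RQOS_IOCOST, NULL }, b = { RQOS_IOCOST, NULL };
	int first = rqos_add(&q, &a);
	int second = rqos_add(&q, &b);	/* models the racing second writer */

	printf("%d %d\n", first, second);	/* 0 -1: the duplicate is rejected */
	return 0;
}
```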
-
Submitted by Li Nan
hulk inclusion category: bugfix bugzilla: 188152, https://gitee.com/openeuler/kernel/issues/I67BPT CVE: NA ------------------------------- adjust_inuse_and_calc_cost() use spin_lock_irq and IRQ will enable when unlock. DEADLOCK might happen if we have held other locks before: ================================ WARNING: inconsistent lock state 5.10.0-02758-g8e5f91fd772f #26 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 {IN-HARDIRQ-W} state was registered at: __lock_acquire+0x3d7/0x1070 lock_acquire+0x197/0x4a0 __raw_spin_lock_irqsave _raw_spin_lock_irqsave+0x3b/0x60 bfq_idle_slice_timer_body bfq_idle_slice_timer+0x53/0x1d0 __run_hrtimer+0x477/0xa70 __hrtimer_run_queues+0x1c6/0x2d0 hrtimer_interrupt+0x302/0x9e0 local_apic_timer_interrupt __sysvec_apic_timer_interrupt+0xfd/0x420 run_sysvec_on_irqstack_cond sysvec_apic_timer_interrupt+0x46/0xa0 asm_sysvec_apic_timer_interrupt+0x12/0x20 irq event stamp: 837522 hardirqs last enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore hardirqs last enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40 hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50 softirqs last enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&bfqd->lock); <Interrupt> lock(&bfqd->lock); *** DEADLOCK *** 3 locks held by kworker/2:3/388: #0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0 #1: ffff8881176bfdd8 ((work_completion)(&td->dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0 #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 stack backtrace: CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kthrotld blk_throtl_dispatch_work_fn Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 print_usage_bug valid_state mark_lock_irq.cold+0x32/0x3a mark_lock+0x693/0xbc0 mark_held_locks+0x9e/0xe0 __trace_hardirqs_on_caller lockdep_hardirqs_on_prepare.part.0+0x151/0x360 trace_hardirqs_on+0x5b/0x180 __raw_spin_unlock_irq _raw_spin_unlock_irq+0x24/0x40 spin_unlock_irq adjust_inuse_and_calc_cost+0x4fb/0x970 ioc_rqos_merge+0x277/0x740 __rq_qos_merge+0x62/0xb0 rq_qos_merge bio_attempt_back_merge+0x12c/0x4a0 blk_mq_sched_try_merge+0x1b6/0x4d0 bfq_bio_merge+0x24a/0x390 __blk_mq_sched_bio_merge+0xa6/0x460 blk_mq_sched_bio_merge blk_mq_submit_bio+0x2e7/0x1ee0 __submit_bio_noacct_mq+0x175/0x3b0 submit_bio_noacct+0x1fb/0x270 blk_throtl_dispatch_work_fn+0x1ef/0x2b0 process_one_work+0x83e/0x13f0 process_scheduled_works worker_thread+0x7e3/0xd80 kthread+0x353/0x470 ret_from_fork+0x1f/0x30 Fixes: b0853ab4 ("blk-iocost: revamp in-period donation snapbacks") Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by Yu Kuai
hulk inclusion category: bugfix bugzilla: 188033, https://gitee.com/openeuler/kernel/issues/I663ZP CVE: NA -------------------------------- iocost is based on rq_qos, which only works for request-based devices, so it does not make sense to configure iocost for a bio-based device. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com>
-
Submitted by openeuler-ci-bot
Merge Pull Request from: @openeuler-sync-bot Origin pull request: https://gitee.com/openeuler/kernel/pulls/895 PR sync from: Liu Jian liujian56@huawei.com https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/thread/H22NWZPMQBGSVUV22ISL3FTVOXQH676J/ Link:https://gitee.com/openeuler/kernel/pulls/899 Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Liu Jian
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I79J3X CVE: N/A ---------------------------------------------------- It is already enabled by default on the x86 and loongarch platforms, so let's enable it on the arm64 platform. Signed-off-by: NLiu Jian <liujian56@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawe.com> (cherry picked from commit ace86952)
-
- 02 June 2023: 6 commits
-
-
Submitted by openeuler-ci-bot
Merge Pull Request from: @openeuler-sync-bot Origin pull request: https://gitee.com/openeuler/kernel/pulls/881 Currently, the bitfield of HWCAP2_WFXT is (1 << 23), which is inconsistent with the mainline definition (1UL << 31). To avoid a possible uapi break, keep the bitfield consistent with mainline. Link:https://gitee.com/openeuler/kernel/pulls/888 Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Yanan Wang
mainline inclusion from mainline-v5.19-rc1 commit b2c4caf3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I78WD0 ------------------------------------------------- Currently, the bitfield of HWCAP2_WFXT is (1 << 23), which is inconsistent with the mainline definition (1UL << 31). To avoid a possible uapi break, keep the bitfield consistent with mainline. Signed-off-by: NYanan Wang <wangyanan55@huawei.com> (cherry picked from commit c0da3054)
-
Submitted by openeuler-ci-bot
Merge Pull Request from: @openeuler-sync-bot Origin pull request: https://gitee.com/openeuler/kernel/pulls/860 Backport upstream commit 7adc6885 ("powercap: intel_rapl: add support for Emerald Rapids") to OLK-5.10 to enable the intel_rapl driver on EMR. Intel-kernel issue: https://gitee.com/openeuler/intel-kernel/issues/I77XHY Test: /sys/devices/virtual/powercap/intel-rapl/ will be available on the EMR platform after this commit is merged. Known issue: N/A Default config change: N/A Link:https://gitee.com/openeuler/kernel/pulls/885 Reviewed-by: Jason Zeng <jason.zeng@intel.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Zhang Rui
mainline inclusion from mainline-v6.3-rc1 commit 7adc6885 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I77XHY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7adc6885259edd4ef5c9a7a62fd4270cf38fdbfb Intel-SIG: commit 7adc6885 powercap: intel_rapl: add support for Emerald Rapids Backported to OLK-5.10 to enable intel_rapl driver on EMR. ------------------------------------- Add Emerald Rapids to the list of supported processor models in the Intel RAPL power capping driver. Signed-off-by: NZhang Rui <rui.zhang@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NXiaolong Wang <xiaolong.wang@intel.com> (cherry picked from commit a4a34465)
-
Submitted by openeuler-ci-bot
Merge Pull Request from: @stinft 1.#I7A2PB 2.#I7A2PK 3.#I7A2SA 4.#I7A2V2 5.#I7A2VV Link:https://gitee.com/openeuler/kernel/pulls/878 Reviewed-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot
Merge Pull Request from: @openeuler-sync-bot Origin pull request: https://gitee.com/openeuler/kernel/pulls/877 Performance is worse with the 'Fixes' commit when running "./lat_ctx -P $SYNC_MAX -s 64 16". The 'Fixes' commit allocates memory for p->prefer_cpus even if "prefer_cpus" is not set. Before the 'Fixes' commit only "p->prefer_cpus" was tested; after it, "!cpumask_empty(p->prefer_cpus)" is also tested, which causes the performance degradation. select_task_rq_fair ->set_task_select_cpus ->prefer_cpus_valid ---- test cpumask_empty(p->prefer_cpus) Link:https://gitee.com/openeuler/kernel/pulls/879 Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 01 June 2023: 6 commits
-
-
Submitted by Hui Tang
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7A718 -------------------------------- Performance is worse with the 'Fixes' commit when running "./lat_ctx -P $SYNC_MAX -s 64 16". The 'Fixes' commit allocates memory for p->prefer_cpus even if "prefer_cpus" is not set. Before the 'Fixes' commit only "p->prefer_cpus" was tested; after it, "!cpumask_empty(p->prefer_cpus)" is also tested, which causes the performance degradation. select_task_rq_fair ->set_task_select_cpus ->prefer_cpus_valid ---- test cpumask_empty(p->prefer_cpus) Fixes: ebeb84ad ("cpuset: Introduce new interface for scheduler ...") Signed-off-by: NHui Tang <tanghui20@huawei.com> (cherry picked from commit d8f77f89)
-
Submitted by Chengchang Tang
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7A2VV --------------------------------------------------------------- When running rdma with the kasan version, the following error will be reported: [ 9013.831512] BUG: sleeping function called from invalid context at kernel/workqueue.c:3111 [ 9013.839719] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 7938, name: ib_send_bw [ 9013.847917] preempt_count: 1, expected: 0 [ 9013.851932] RCU nest depth: 0, expected: 0 [ 9013.856034] INFO: lockdep is turned off. [ 9013.859963] CPU: 28 PID: 7938 Comm: ib_send_bw Kdump: loaded Tainted: G W O 6.3.0-rc4+ #1 [ 9013.869360] Hardware name: To be filled by O.E.M. HixxxxEVB 1P 2T2N V2.1.5/To be filled by O.E.M., BIOS HixxxxEVB 1P 2T2N V2.1.5 05/04/ 23 [ 9013.881883] Call trace: [ 9013.884322] dump_backtrace+0xac/0xf0 [ 9013.887984] show_stack+0x20/0x38 [ 9013.891294] dump_stack_lvl+0xbc/0x120 [ 9013.895040] dump_stack+0x1c/0x28 [ 9013.898350] __might_resched+0x1c8/0x348 [ 9013.902271] __might_sleep+0x78/0xf8 [ 9013.905841] __flush_work+0x118/0x718 [ 9013.909501] __cancel_work_timer+0x228/0x2d8 [ 9013.913769] cancel_delayed_work_sync+0x1c/0x30 [ 9013.918296] cleanup_dca_context+0x48/0x1e8 [hns_roce_hw_v2] [ 9013.923998] hns_roce_unregister_udca+0x3c/0x98 [hns_roce_hw_v2] [ 9013.930043] hns_roce_dealloc_ucontext+0x15c/0x180 [hns_roce_hw_v2] [ 9013.936348] uverbs_destroy_ufile_hw+0xd0/0x160 [ 9013.940877] ib_uverbs_close+0x3c/0x170 [ 9013.944709] __fput+0xfc/0x3c8 [ 9013.947761] ____fput+0x18/0x30 [ 9013.950898] task_work_run+0x130/0x1b8 [ 9013.954643] do_exit+0x4f0/0xfa8 [ 9013.957867] do_group_exit+0x60/0xf8 [ 9013.961437] get_signal+0xf2c/0x1018 [ 9013.965009] do_notify_resume+0x2d0/0x1560 [ 9013.969102] el0_svc+0x98/0xa0 [ 9013.972152] el0t_64_sync_handler+0xb8/0xc0 [ 9013.976332] el0t_64_sync+0x1a4/0x1a8 This issue is introduced by DCA aging feature. To stop aging work during dca unload, cacel_delayed_work_sync() is called which cannot be used in an atomic context. At the same time, in order to avoid concurrency in this process, a spin lock is used for protection. But in fact, the aging of DCA is triggered during IO, and the unloading process of DCA is included in the process of ucontext detroy or driver uninstallation. At this time, all the QPs has been released. So, there is no concurrency in this process, and the spinlock here is redundant. This patch removes the unnecessary spinlock. Fixes: d3caaebd ("RDMA/hns: Optimize user DCA perfermance by sharing DCA status") Signed-off-by: NChengchang Tang <tangchengchang@huawei.com>
-
Submitted by Junxian Huang
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7A2V2 --------------------------------------------------------------- When the VF check fails, the current code deallocates ib_dev without freeing hr_dev->priv. This patch fixes this error. Fixes: a39f16a1 ("RDMA/hns: fix the error of RoCE VF based on RoCE Bonding PF") Signed-off-by: NJunxian Huang <huangjunxian6@hisilicon.com>
-
Submitted by Junxian Huang
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7A2SA --------------------------------------------------------------- Currently, direct wqe is not supported for a wr-list. The RoCE driver excludes direct wqe for a wr-list by checking whether the number of wr is 1. For a wr-list where the second wr is a length-error atomic wr, the post-send driver handles the first wr and adds 1 to the wr counter first. While handling the second wr, the driver finds a length error and terminates the wr handling, leaving the counter at 1. This makes the driver mistakenly judge that there is only 1 wr and thus enter the direct wqe process, carrying the current length-error atomic wqe. This patch fixes the error by also checking whether the current wr is a bad wr. If so, use the normal doorbell process instead of direct wqe even though the wr number is 1. Fixes: a77e1f87 ("RDMA/hns: Remove the condition of light load for posting DWQE") Signed-off-by: NJunxian Huang <huangjunxian6@hisilicon.com>
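A tiny sketch of the doorbell decision described above (hypothetical identifiers, not the hns driver's actual code): direct WQE is used only when exactly one WR was posted and no WR was rejected as bad.

```c
#include <stdbool.h>
#include <stdio.h>

enum ring_mode { DIRECT_WQE, DOORBELL };

static enum ring_mode pick_ring_mode(unsigned int nreq, bool have_bad_wr)
{
	if (nreq == 1 && !have_bad_wr)	/* the added bad-wr check */
		return DIRECT_WQE;
	return DOORBELL;		/* normal doorbell path */
}

int main(void)
{
	/*
	 * Two WRs were posted but the second failed its length check, so the
	 * counter stopped at 1; the bad-wr flag keeps us off the direct path.
	 */
	printf("%s\n", pick_ring_mode(1, true) == DOORBELL ? "doorbell" : "direct wqe");
	return 0;
}
```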
-
Submitted by Junxian Huang
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7A2V2 --------------------------------------------------------------- This patch fixes an inaccurate error label name in the init instance path. Fixes: 6f5f556d ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: NJunxian Huang <huangjunxian6@hisilicon.com>
-
Submitted by Junxian Huang
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7A2PB --------------------------------------------------------------- Remove the VF extended configuration since the relevant registers are now configured in firmware. Fixes: 7e3913a7 ("RDMA/hns: Support getting max QP number from firmware") Signed-off-by: NJunxian Huang <huangjunxian6@hisilicon.com>
-