- 31 May 2023, 7 commits
-
-
Submitted by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC
CVE: NA

--------------------------------

Currently, for 'idle' and 'frozen', action_store() holds 'reconfig_mutex' and calls md_reap_sync_thread() to stop the sync thread; however, this can deadlock (explained in the next patch). To fix the problem, a following patch will release 'reconfig_mutex' and wait on 'resync_wait', as md_set_readonly() and do_md_stop() do.

Consider that action_store() sets/clears 'MD_RECOVERY_FROZEN' unconditionally, which can cause unexpected problems: for example, 'frozen' may have just set 'MD_RECOVERY_FROZEN' and still be in progress when 'idle' clears 'MD_RECOVERY_FROZEN' and a new sync thread is started, starving the in-progress 'frozen'. This patch adds a mutex to synchronize 'idle' and 'frozen' from action_store().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
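For illustration, a minimal sketch of the synchronization idea; the 'sync_mutex' field and both helper names mirror the patch idea but are not a verbatim copy of drivers/md/md.c:

    /* Serialize 'frozen' and 'idle' so that 'idle' cannot clear
     * MD_RECOVERY_FROZEN while a 'frozen' request is still in progress. */
    static void frozen_sync_thread(struct mddev *mddev)
    {
            mutex_lock(&mddev->sync_mutex);
            set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
            mutex_unlock(&mddev->sync_mutex);
    }

    static void idle_sync_thread(struct mddev *mddev)
    {
            mutex_lock(&mddev->sync_mutex);
            clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
            mutex_unlock(&mddev->sync_mutex);
    }

With both paths funneled through one mutex, a 'frozen' request observes either the state before or after a complete 'idle' request, never a half-applied one.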
-
Submitted by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC
CVE: NA

--------------------------------

Prepare to handle 'idle' and 'frozen' differently in order to fix a deadlock. There are no functional changes, except that MD_RECOVERY_RUNNING is checked again after 'reconfig_mutex' is held.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC
CVE: NA

--------------------------------

This reverts commit 9dfbdafd, because it introduces a defect: sync_thread can be running while MD_RECOVERY_RUNNING is cleared, which causes unexpected problems, for example:

list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
Call trace:
 __list_add_valid+0xfc/0x140
 insert_work+0x78/0x1a0
 __queue_work+0x500/0xcf4
 queue_work_on+0xe8/0x12c
 md_check_recovery+0xa34/0xf30
 raid10d+0xb8/0x900 [raid10]
 md_thread+0x16c/0x2cc
 kthread+0x1a4/0x1ec
 ret_from_fork+0x10/0x18

This happens because the work is requeued while it is still inside the workqueue:

t1:                           t2:
action_store
 mddev_lock
  if (mddev->sync_thread)
   mddev_unlock
   md_unregister_thread
   // first sync_thread is done
                              md_check_recovery
                               mddev_try_lock
                               /*
                                * once MD_RECOVERY_DONE is set, new sync_thread
                                * can start.
                                */
                               set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
                               INIT_WORK(&mddev->del_work, md_start_sync)
                               queue_work(md_misc_wq, &mddev->del_work)
                                test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
                                // set pending bit
                                insert_work
                                 list_add_tail
                               mddev_unlock
   mddev_lock_nointr
   md_reap_sync_thread
   // MD_RECOVERY_RUNNING is cleared
 mddev_unlock

t3:
// before the work queued from t2 has started
md_check_recovery
 // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
 INIT_WORK(&mddev->del_work, md_start_sync)
  work->data = 0
  // work pending bit is cleared
  queue_work(md_misc_wq, &mddev->del_work)
   insert_work
    list_add_tail
    // list is corrupted

This patch reverts the commit to fix the problem; the deadlock that commit tried to fix will be fixed in the following patches.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230322064122.2384589-2-yukuai1@huaweicloud.com
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Guoqing Jiang

mainline inclusion
from mainline-v6.0-rc1
commit 9dfbdafd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc3&id=9dfbdafda3b34e262e43e786077bab8e476a89d1

--------------------------------

Since the bug fixed by commit 8b48ec23 ("md: don't unregister sync_thread with reconfig_mutex held") is related only to the action_store() path, other callers that reap sync_thread do not need to change. Pull md_unregister_thread() out of md_reap_sync_thread(), then fix the earlier bug as follows:

1. Unlock mddev before md_reap_sync_thread() in action_store().
2. Save reshape_position before unlocking, then restore it to ensure the position is not changed accidentally by others.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Yu Kuai

mainline inclusion
from mainline-v6.3-rc2
commit 428913bc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MQLP
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5cfefa97bccf956ea0bb6464c1f6c84fd7a8d9f

--------------------------------

If disk_scan_partitions() is called with 'FMODE_EXCL', blkdev_get_by_dev() is called without 'FMODE_EXCL'; however, the following blkdev_put() is still called with 'FMODE_EXCL', which causes the 'bd_holders' counter to leak. Fix the problem by using the right mode for blkdev_put().

Reported-by: syzbot+2bcc0d79e548c4f62a59@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/f9649d501bc8c3444769418f6c26263555d9d3be.camel@linux.ibm.com/T/
Tested-by: Julian Ruess <julianr@linux.ibm.com>
Fixes: e5cfefa9 ("block: fix scan partition for exclusively open device again")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
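A hedged sketch of the idea, with a hypothetical helper name (the real diff touches disk_scan_partitions() in block/genhd.c): whatever mode the get dropped, the put must drop as well, so bd_holders is only decremented when it was actually incremented.

    static int scan_partitions_sketch(dev_t devt, fmode_t mode)
    {
            struct block_device *bdev;

            /* The scan open deliberately drops FMODE_EXCL. */
            bdev = blkdev_get_by_dev(devt, mode & ~FMODE_EXCL, NULL);
            if (IS_ERR(bdev))
                    return PTR_ERR(bdev);

            /* ... rescan the partition table here ... */

            blkdev_put(bdev, mode & ~FMODE_EXCL);  /* must match the get */
            return 0;
    }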
-
Submitted by Yu Kuai

mainline inclusion
from mainline-v6.3-rc1
commit e5cfefa9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MQLP
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5cfefa97bccf956ea0bb6464c1f6c84fd7a8d9f

--------------------------------

As explained in commit 36369f46 ("block: Do not reread partition table on exclusively open device"), rereading the partition table on a device that is exclusively opened by someone else is problematic. This patch makes sure a partition scan proceeds only if the current thread opened the device exclusively, or if the device is not opened exclusively at all; in the latter case, other scanners and exclusive openers are blocked temporarily until the partition scan is done.

Fixes: 10c70d95 ("block: remove the bd_openers checks in blk_drop_partitions")
Cc: <stable@vger.kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230217022200.3092987-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/genhd.c
	block/ioctl.c

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Christoph Hellwig
mainline inclusion
from mainline-v5.17-rc1
commit e16e506c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MQLP
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16e506ccd673a3a888a34f8f694698305840044

--------------------------------

Unify the functionality that implements a partition rescan for a gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk.h
	block/genhd.c
	block/ioctl.c

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
- 30 May 2023, 29 commits
-
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @zhaowenhui8

Expand qos_level from {-1, 0} to [-2, 2] to distinguish tasks expected to run at extremely high or low priority levels. Use qos_level_weight to reweight the shares when calculating a group's weight. Meanwhile, set an offline task's scheduling policy to SCHED_IDLE so that it can be preempted in check_preempt_wakeup().

Kernel option: CONFIG_QOS_SCHED_MULTILEVEL

Link: https://gitee.com/openeuler/kernel/pulls/795
Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @henryze

The dying CPU has been removed from the online mask, but the hotplug notifier has not yet been called to fold the percpu count into the global counter sum. This race condition is avoided by including the dying CPU in the iteration mask.

Link: https://gitee.com/openeuler/kernel/pulls/850
Reviewed-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @henryze

When users read the CPU frequency from cpuinfo_cur_freq under the cpufreq sysfs directory, they often get an invalid result like:

  $ cat /sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_cur_freq
  4294967295

This series fixes the issue.

Reference: https://lore.kernel.org/all/20230516133248.712242-3-zengheng4@huawei.com/
Link: https://gitee.com/openeuler/kernel/pulls/849
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @xiongzhou4

Provide value-profile support for the kernel. The implementation is based on the kernel's existing GCOV feature: when the option is enabled, the GCOV option `-fprofile-arcs` is changed to `-fprofile-generate`. The latter includes the former plus value profiling, which enables more comprehensive feedback-directed optimization. The added feature is called the _PGO kernel_ and can be used to improve the performance of a single-application runtime environment.

Kernel option (default is n): CONFIG_PGO_KERNEL=y

Link: https://gitee.com/openeuler/kernel/pulls/773
Reviewed-by: Liu Chao <liuchao173@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @svishen

This pull request lets the hns3 driver provide a PTP driver to get the 1588 clock from the Ethernet hardware. Only the first PF on the main chip supports this, so getting PTP time from another chip may involve some bus latency; the PTP sync device is used to eliminate that bus latency.

Issue: https://gitee.com/openeuler/kernel/issues/I78MGV
Link: https://gitee.com/openeuler/kernel/pulls/842
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @NNNNicole

1. sched/pelt: relax the sync of *_sum with *_avg (patches 1-3)
2. Adjust NUMA imbalance for multiple LLCs (patches 4-6)
3. sched: queue task on wakelist in the same LLC if the wakee cpu is idle (patch 7)
4. Clear ttwu_pending after enqueue_task() (patch 8)

Link: https://gitee.com/openeuler/kernel/pulls/844
Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @stinft

Issues: #I76PY9, #I76PUJ, #I76PRT

Link: https://gitee.com/openeuler/kernel/pulls/837
Reviewed-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by xiongzhou4

GCC inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I734PM

---------------------------------

This feature adds value-profile support for the kernel by changing the GCOV option "-fprofile-arcs" to "-fprofile-generate" when the newly added config "PGO_KERNEL" is set to y. As with GCOV, the symbols required by value profiling are migrated from the GCC sources because they cannot be linked into the kernel: specifically, from libgcc/libgcov-profiler.c to kernel/gcov/gcc_base.c.

Kernel option: CONFIG_PGO_KERNEL=y

Signed-off-by: Xiong Zhou <xiongzhou4@huawei.com>
Reviewed-by: Li Yancheng <liyancheng@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @xiao_jiang_shui

ACC: support the no-sva feature.

Issue: https://gitee.com/openeuler/kernel/issues/I773SD
Link: https://gitee.com/openeuler/kernel/pulls/803
Reviewed-by: Yang Shen <shenyang39@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Zhao Wenhui

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I737X1

-------------------------------

Expand qos_level from {-1, 0} to [-2, 2] to distinguish tasks expected to run at extremely high or low priority levels. Use qos_level_weight to reweight the shares when calculating a group's weight, as sketched below. Meanwhile, set an offline task's scheduling policy to SCHED_IDLE so that it can be preempted in check_preempt_wakeup().

Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
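A hypothetical sketch of the reweighting; the table values and the helper name are illustrative, not the shipped openEuler code:

    /* Each qos_level in [-2, 2] selects a weight used to scale the
     * group's shares; level 0 keeps the default weight. */
    static const unsigned int qos_level_weight[5] = {
            1,      /* level -2: extremely low  */
            10,     /* level -1: low (offline)  */
            100,    /* level  0: default        */
            1000,   /* level  1: high           */
            10000,  /* level  2: extremely high */
    };

    static inline unsigned long qos_reweight(unsigned long shares, int qos_level)
    {
            return shares * qos_level_weight[qos_level + 2] / 100;
    }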
-
Submitted by Tianchen Ding

mainline inclusion
from mainline-v6.2-rc1
commit d6962c4f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=d6962c4fe8f96f7d384d6489b6b5ab5bf3e35991

--------------------------------

We found a long tail latency in schbench when m*t is close to nr_cpus (e.g., "schbench -m 2 -t 16" on a machine with 32 cpus). This is because when the wakee cpu is idle, rq->ttwu_pending is cleared too early, so idle_cpu() returns true until the wakee task is enqueued. This misleads the waker when selecting an idle cpu and can wake multiple worker threads on the same wakee cpu. The situation is enlarged by commit f3dd3f67 ("sched: Remove the limitation of WF_ON_CPU on wakelist if wakee cpu is idle") because it tends to use the wakelist.

Here is the result of "schbench -m 2 -t 16" on a VM with 32 vcpus (Intel(R) Xeon(R) Platinum 8369B):

Latency percentiles (usec):
               base    base+revert_f3dd3f67    base+this_patch
50.0000th:        9                      13                  9
75.0000th:       12                      19                 12
90.0000th:       15                      22                 15
95.0000th:       18                      24                 17
*99.0000th:      27                      31                 24
99.5000th:     3364                      33                 27
99.9000th:    12560                      36                 30

We also tested on unixbench and hackbench and saw no performance change.

Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20221104023601.12844-1-dtcccc@linux.alibaba.com
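An order-of-operations sketch of the fix (locking and rq_flags handling omitted; enqueue_wakee() is a hypothetical stand-in for ttwu_do_activate()): clear rq->ttwu_pending only after all wakelist tasks are enqueued, so a concurrent idle_cpu() cannot see the cpu as idle too early.

    extern void enqueue_wakee(struct rq *rq, struct task_struct *p);

    static void sched_ttwu_pending_sketch(struct rq *rq, struct llist_node *llist)
    {
            struct task_struct *p, *t;

            llist_for_each_entry_safe(p, t, llist, wake_entry.llist)
                    enqueue_wakee(rq, p);

            /* Moved here from before the loop by the fix. */
            WRITE_ONCE(rq->ttwu_pending, 0);
    }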
-
Submitted by Guan Jing

mainline inclusion
from mainline-v6.0-rc1
commit f3dd3f67
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=f3dd3f674555bd9455c5ae7fafce0696bd9931b3

--------------------------------

The wakelist can help avoid cache bouncing and offload overhead from the waker cpu. So far, using the wakelist within the same LLC only happens on WF_ON_CPU, and this limitation can be removed to further improve wakeup performance.

Commit 518cd623 ("sched: Only queue remote wakeups when crossing cache boundaries") disabled queuing tasks on the wakelist when the cpus share an LLC. This is because, at that time, the scheduler had to send IPIs to do ttwu_queue_wakelist. Nowadays, ttwu_queue_wakelist also supports TIF_POLLING, so this is no longer a problem when the wakee cpu is in idle polling.

Benefits: queuing the task on the idle cpu improves performance on the waker cpu and utilization on the wakee cpu, and further improves locality because the wakee cpu can handle its own rq. This patch helps improve rt on real java workloads where wakeups happen frequently.

Consider the normal condition (CPU0 and CPU1 share the same LLC).

Before this patch:

         CPU0                                    CPU1
    select_task_rq()                             idle
    rq_lock(CPU1->rq)
    enqueue_task(CPU1->rq)
    notify CPU1 (by sending IPI or CPU1 polling)
                                                 resched()

After this patch:

         CPU0                                    CPU1
    select_task_rq()                             idle
    add to wakelist of CPU1
    notify CPU1 (by sending IPI or CPU1 polling)
                                                 rq_lock(CPU1->rq)
                                                 enqueue_task(CPU1->rq)
                                                 resched()

We see that CPU0 can finish its work earlier: it only needs to put the task on the wakelist and return, while CPU1, being idle, handles its own runqueue data.

This patch makes no difference regarding IPIs. It only takes effect when the wakee cpu is (1) idle polling or (2) idle not polling. For (1), there is no IPI with or without this patch. For (2), there is always an IPI before and after this patch: before, the waker cpu enqueues the task and checks preempt, and since "idle" is sure to be preempted, the waker cpu must send a resched IPI; after, the waker cpu puts the task on the wakelist of the wakee cpu and sends an IPI.

Benchmark: we tested schbench, unixbench, and hackbench on both x86 and arm64.

On x86 (Intel Xeon Platinum 8269CY):

schbench -m 2 -t 8

Latency percentiles (usec)    before    after
50.0000th:                         8        6
75.0000th:                        10        7
90.0000th:                        11        8
95.0000th:                        12        8
*99.0000th:                       13       10
99.5000th:                        15       11
99.9000th:                        18       14

Unixbench with full threads (104)
                                             before          after
Dhrystone 2 using register variables     3011862938     3009935994   -0.06%
Double-Precision Whetstone                 617119.3       617298.5    0.03%
Execl Throughput                            27667.3        27627.3   -0.14%
File Copy 1024 bufsize 2000 maxblocks      785871.4       784906.2   -0.12%
File Copy 256 bufsize 500 maxblocks        210113.6       212635.4    1.20%
File Copy 4096 bufsize 8000 maxblocks     2328862.2      2320529.1   -0.36%
Pipe Throughput                         145535622.8    145323033.2   -0.15%
Pipe-based Context Switching              3221686.4      3583975.4   11.25%
Process Creation                           101347.1       103345.4    1.97%
Shell Scripts (1 concurrent)               120193.5       123977.8    3.15%
Shell Scripts (8 concurrent)                17233.4        17138.4   -0.55%
System Call Overhead                      5300604.8      5312213.6    0.22%

hackbench -g 1 -l 100000
          before    after
Time       3.246    2.251

On arm64 (Ampere Altra):

schbench -m 2 -t 8

Latency percentiles (usec)    before    after
50.0000th:                        14       10
75.0000th:                        19       14
90.0000th:                        22       16
95.0000th:                        23       16
*99.0000th:                       24       17
99.5000th:                        24       17
99.9000th:                        28       25

Unixbench with full threads (80)
                                             before          after
Dhrystone 2 using register variables     3536194249     3537019613    0.02%
Double-Precision Whetstone                 629383.6       629431.6    0.01%
Execl Throughput                            65920.5        65846.2   -0.11%
File Copy 1024 bufsize 2000 maxblocks     1063722.8      1064026.8    0.03%
File Copy 256 bufsize 500 maxblocks        322684.5       318724.5   -1.23%
File Copy 4096 bufsize 8000 maxblocks     2348285.3      2328804.8   -0.83%
Pipe Throughput                         133542875.3    131619389.8   -1.44%
Pipe-based Context Switching              3215356.1      3576945.1   11.25%
Process Creation                           108520.5       120184.6   10.75%
Shell Scripts (1 concurrent)               122636.3       121888.0   -0.61%
Shell Scripts (8 concurrent)                17462.1        17381.4   -0.46%
System Call Overhead                      4429998.9      4435006.7    0.11%

hackbench -g 1 -l 100000
          before    after
Time       4.217    2.916

Our patch improves schbench, hackbench, and the Pipe-based Context Switching test of unixbench when idle cpus exist, with no obvious regression on the other unixbench tests. This can help improve rt in scenes where wakeups happen frequently.

Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20220608233412.327341-3-dtcccc@linux.alibaba.com
Signed-off-by: Guan Jing <guanjing6@huawei.com>
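A simplified sketch of the relaxed queueing condition (the mainline patch handles more cases): queue on the remote wakelist when caches are not shared, or when the wakee cpu is idle, rather than only for WF_ON_CPU.

    static bool ttwu_queue_cond_sketch(int this_cpu, int wakee_cpu)
    {
            /* Crossing a cache boundary: always worth offloading. */
            if (!cpus_share_cache(this_cpu, wakee_cpu))
                    return true;

            /* Same LLC: offload only if the wakee cpu is idle, so the
             * idle cpu handles its own runqueue data. */
            return available_idle_cpu(wakee_cpu);
    }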
-
Submitted by Guan Jing

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
CVE: NA

--------------------------------

Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Submitted by Guan Jing

mainline inclusion
from mainline-v5.18-rc1
commit e496132e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=e496132ebedd870b67f1f6d2428f9bb9d7ae27fd

--------------------------------

Commit 7d2b5dd0 ("sched/numa: Allow a floating imbalance between NUMA nodes") allowed an imbalance between NUMA nodes so that communicating tasks would not be pulled apart by the load balancer. This works fine when there is a 1:1 relationship between LLC and node, but can be suboptimal for multiple LLCs if independent tasks prematurely use CPUs sharing a cache.

Zen* has multiple LLCs per node with local memory channels, and due to the allowed imbalance it is far harder to tune some workloads to run optimally than on hardware with one LLC per node. This patch allows an imbalance to exist up to the point where LLCs should be balanced between nodes.

On a Zen3 machine running STREAM parallelised with OMP to have one instance per LLC and without binding, the results are:

                              5.17.0-rc0             5.17.0-rc0
                                 vanilla       sched-numaimb-v6
MB/sec copy-16     162596.94 (   0.00%)    580559.74 ( 257.05%)
MB/sec scale-16    136901.28 (   0.00%)    374450.52 ( 173.52%)
MB/sec add-16      157300.70 (   0.00%)    564113.76 ( 258.62%)
MB/sec triad-16    151446.88 (   0.00%)    564304.24 ( 272.61%)

STREAM can use directives to force the spread if the OpenMP is new enough, but that doesn't help if an application uses threads and it's not known in advance how many threads will be created.

Coremark is a CPU- and cache-intensive benchmark parallelised with threads. When running with 1 thread per core, the vanilla kernel allows threads to contend on cache. With the patch:

                                  5.17.0-rc0             5.17.0-rc0
                                     vanilla       sched-numaimb-v5
Min       Score-16    368239.36 (   0.00%)    389816.06 (   5.86%)
Hmean     Score-16    388607.33 (   0.00%)    427877.08 *  10.11%*
Max       Score-16    408945.69 (   0.00%)    481022.17 (  17.62%)
Stddev    Score-16     15247.04 (   0.00%)     24966.82 ( -63.75%)
CoeffVar  Score-16         3.92 (   0.00%)         5.82 ( -48.48%)

It can also make a big difference for semi-realistic workloads like specjbb, which can execute arbitrary numbers of threads without advance knowledge of how they should be placed. Even in cases where the average performance is neutral, the results are more stable:

                                   5.17.0-rc0             5.17.0-rc0
                                      vanilla       sched-numaimb-v6
Hmean     tput-1      71631.55 (   0.00%)     73065.57 (   2.00%)
Hmean     tput-8     582758.78 (   0.00%)    556777.23 (  -4.46%)
Hmean     tput-16   1020372.75 (   0.00%)   1009995.26 (  -1.02%)
Hmean     tput-24   1416430.67 (   0.00%)   1398700.11 (  -1.25%)
Hmean     tput-32   1687702.72 (   0.00%)   1671357.04 (  -0.97%)
Hmean     tput-40   1798094.90 (   0.00%)   2015616.46 *  12.10%*
Hmean     tput-48   1972731.77 (   0.00%)   2333233.72 (  18.27%)
Hmean     tput-56   2386872.38 (   0.00%)   2759483.38 (  15.61%)
Hmean     tput-64   2909475.33 (   0.00%)   2925074.69 (   0.54%)
Hmean     tput-72   2585071.36 (   0.00%)   2962443.97 (  14.60%)
Hmean     tput-80   2994387.24 (   0.00%)   3015980.59 (   0.72%)
Hmean     tput-88   3061408.57 (   0.00%)   3010296.16 (  -1.67%)
Hmean     tput-96   3052394.82 (   0.00%)   2784743.41 (  -8.77%)
Hmean     tput-104  2997814.76 (   0.00%)   2758184.50 (  -7.99%)
Hmean     tput-112  2955353.29 (   0.00%)   2859705.09 (  -3.24%)
Hmean     tput-120  2889770.71 (   0.00%)   2764478.46 (  -4.34%)
Hmean     tput-128  2871713.84 (   0.00%)   2750136.73 (  -4.23%)
Stddev    tput-1       5325.93 (   0.00%)      2002.53 (  62.40%)
Stddev    tput-8       6630.54 (   0.00%)     10905.00 ( -64.47%)
Stddev    tput-16     25608.58 (   0.00%)      6851.16 (  73.25%)
Stddev    tput-24     12117.69 (   0.00%)      4227.79 (  65.11%)
Stddev    tput-32     27577.16 (   0.00%)      8761.05 (  68.23%)
Stddev    tput-40     59505.86 (   0.00%)      2048.49 (  96.56%)
Stddev    tput-48    168330.30 (   0.00%)     93058.08 (  44.72%)
Stddev    tput-56    219540.39 (   0.00%)     30687.02 (  86.02%)
Stddev    tput-64    121750.35 (   0.00%)      9617.36 (  92.10%)
Stddev    tput-72    223387.05 (   0.00%)     34081.13 (  84.74%)
Stddev    tput-80    128198.46 (   0.00%)     22565.19 (  82.40%)
Stddev    tput-88    136665.36 (   0.00%)     27905.97 (  79.58%)
Stddev    tput-96    111925.81 (   0.00%)     99615.79 (  11.00%)
Stddev    tput-104   146455.96 (   0.00%)     28861.98 (  80.29%)
Stddev    tput-112    88740.49 (   0.00%)     58288.23 (  34.32%)
Stddev    tput-120   186384.86 (   0.00%)     45812.03 (  75.42%)
Stddev    tput-128    78761.09 (   0.00%)     57418.48 (  27.10%)

Similarly, for embarrassingly parallel problems like NPB-ep, there are improvements due to better spreading across LLCs when the machine is not fully utilised:

                              vanilla       sched-numaimb-v6
Min       ep.D      31.79 (   0.00%)      26.11 (  17.87%)
Amean     ep.D      31.86 (   0.00%)      26.17 *  17.86%*
Stddev    ep.D       0.07 (   0.00%)       0.05 (  24.41%)
CoeffVar  ep.D       0.22 (   0.00%)       0.20 (   7.97%)
Max       ep.D      31.93 (   0.00%)      26.21 (  17.91%)

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220208094334.16379-3-mgorman@techsingularity.net
Signed-off-by: Guan Jing <guanjing6@huawei.com>
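A sketch following the shape of adjust_numa_imbalance() after this patch: tolerate a small imbalance only while the destination node runs fewer tasks than imb_numa_nr, a per-domain value precomputed from the LLC span that marks the point at which the node's LLCs would stop being balanced.

    #define NUMA_IMBALANCE_MIN 2

    static long adjust_numa_imbalance_sketch(long imbalance,
                                             int dst_running, int imb_numa_nr)
    {
            /* Past the LLC-balance point, report the real imbalance. */
            if (dst_running > imb_numa_nr)
                    return imbalance;

            /* Allow a small imbalance so communicating tasks stay close. */
            if (imbalance <= NUMA_IMBALANCE_MIN)
                    return 0;

            return imbalance;
    }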
-
Submitted by Mel Gorman

mainline inclusion
from mainline-v5.18-rc1
commit 2cfb7a1b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=2cfb7a1b031b0e816af7a6ee0c6ab83b0acdf05a

--------------------------------

There are inconsistencies in determining whether a NUMA imbalance is allowed, and they should be corrected:

o allow_numa_imbalance changes types and is not always examining the
  destination group, so both the type and the naming should be corrected.
o find_idlest_group uses the sched_domain's weight instead of the group
  weight, which differs from find_busiest_group.
o find_busiest_group uses the source group instead of the destination,
  which differs from task_numa_find_cpu.
o Both find_idlest_group and find_busiest_group should account for the
  number of running tasks if a move was allowed, to be consistent with
  task_numa_find_cpu.

Fixes: 7d2b5dd0 ("sched/numa: Allow a floating imbalance between NUMA nodes")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20220208094334.16379-2-mgorman@techsingularity.net
Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Submitted by Guan Jing
mainline inclusion
from mainline-v5.17-rc2
commit 2d02fa8c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=2d02fa8cc21a93da35cfba462bf8ab87bf2db651

--------------------------------

Similarly to util_avg and util_sum, don't sync load_sum with the low bound of load_avg, but only ensure that load_sum stays in the correct range.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-5-vincent.guittot@linaro.org
Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Submitted by Dave Chinner

mainline inclusion
from mainline-v6.3-rc4
commit 8b57b11c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8b57b11cca88f397035a95b9e12b03511847b0e8

--------------------------------

In commit f689054a ("percpu_counter: add percpu_counter_sum_all interface") a race condition between a cpu dying and percpu_counter_sum() iterating online CPUs was identified. The solution was to iterate all possible CPUs for summation via percpu_counter_sum_all().

We recently had a percpu_counter_sum() call in XFS trip over this same race condition, and it fired a debug assert because the filesystem was unmounting and the counter *should* be zero just before we destroy it. That was reported here:

https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/

likely as a result of running generic/648, which exercises filesystems in the presence of CPU online/offline events.

The solution of using percpu_counter_sum_all() is an awful one. We use percpu counters and percpu_counter_sum() for accurate and reliable threshold detection in space management, so a summation race condition during these operations can result in overcommit of available space, which may result in filesystem shutdowns. And percpu_counter_sum_all() iterates all possible CPUs rather than just those online or even present, so the mask can include CPUs that aren't even installed in the machine, or, on machines that can hot-plug CPU-capable nodes, physical sockets that are present but unpopulated.

Fundamentally, this race condition is caused by the CPU being offlined being removed from cpu_online_mask before the notifier that cleans up per-cpu state has run. Hence percpu_counter_sum() will not sum the count of a cpu currently being taken offline, regardless of whether the notifier has run or not. This is the root cause of the bug.

The percpu counter notifier iterates all the registered counters, locks each counter, and moves the percpu count into the global sum. This is serialised against other operations that move a percpu count into the global sum, as well as against percpu_counter_sum() operations that sum the percpu counts while holding the counter lock. Hence the notifier is safe to run concurrently with sum operations, and the only thing we actually need is for percpu_counter_sum() to iterate dying CPUs. That's trivial to do, and when no CPUs are dying it adds no overhead beyond a cpumask_or() operation.

This change makes percpu_counter_sum() always do the right thing in the presence of CPU hot-unplug events and makes percpu_counter_sum_all() unnecessary. This, in turn, means that filesystems like XFS, ext4, and btrfs don't have to work out when they should use percpu_counter_sum() vs percpu_counter_sum_all() in their space accounting algorithms.

Conflicts:
	lib/percpu_counter.c

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
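A sketch mirroring the shape of the fixed summation: fold in the per-cpu deltas of online CPUs plus any CPU currently going down, so a dying CPU's contribution is never lost between its removal from cpu_online_mask and the hotplug callback folding its count into fbc->count.

    static s64 percpu_counter_sum_sketch(struct percpu_counter *fbc)
    {
            s64 ret;
            int cpu;
            unsigned long flags;

            raw_spin_lock_irqsave(&fbc->lock, flags);
            ret = fbc->count;
            /* Online CPUs plus CPUs in the middle of going offline. */
            for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
                    ret += *per_cpu_ptr(fbc->counters, cpu);
            raw_spin_unlock_irqrestore(&fbc->lock, flags);
            return ret;
    }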
-
Submitted by Dave Chinner
mainline inclusion
from mainline-v6.3-rc4
commit 1470afef
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1470afefc3c42df5d1662f87d079b46651bdc95b

--------------------------------

Equivalent of for_each_cpu_and(), except it ORs the two masks together so it iterates all the CPUs present in either mask.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
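A short usage sketch (the pr_info() message is purely illustrative): visit every cpu that is either online or dying.

    unsigned int cpu;

    for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
            pr_info("cpu%u needs its counter folded\n", cpu);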
-
Submitted by Yury Norov

mainline inclusion
from mainline-v5.13-rc1
commit 586eaebe
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=586eaebea5988302c5a8b018096dd6c6f4564940

--------------------------------

find_bit would also benefit from small_const_nbits() optimizations. The detailed comment is provided by Rasmus Villemoes.

Link: https://lkml.kernel.org/r/20210401003153.97325-6-yury.norov@gmail.com
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
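For reference, the helper this optimization keys off has, to the best of our knowledge, the following shape: it is true only for a compile-time-constant size that fits in a single word, letting find_*_bit() calls on such bitmaps collapse to single-word operations.

    #define small_const_nbits(nbits) \
            (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)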
-
Submitted by Peter Zijlstra

mainline inclusion
from mainline-v5.13-rc1
commit e40f74c5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e40f74c535b8a0ecf3ef0388b51a34cdadb34fb5

--------------------------------

Introduce a cpumask that indicates, for each CPU, what direction the CPU hotplug is currently going. Notably, it tracks rollbacks: e.g., when an up fails and we roll back down, it accurately reflects the direction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210310150109.151441252@infradead.org
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
-
Submitted by Vincent Guittot
mainline inclusion
from mainline-v5.17-rc2
commit 95246d1e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=95246d1ec80b8d19d882cd8eb7ad094e63b41bb8

--------------------------------

Similarly to util_avg and util_sum, don't sync runnable_sum with the low bound of runnable_avg, but only ensure that runnable_sum stays in the correct range.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-4-vincent.guittot@linaro.org
Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Submitted by Vincent Guittot

mainline inclusion
from mainline-v5.17-rc2
commit 7ceb7710
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=7ceb77103001544a43e11d7f3a8a69a2c1f422cf

--------------------------------

Rick reported performance regressions in bugzilla because of cpu frequency being lower than before:

https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to commit 1c35b07e ("sched/fair: Ensure _sum and _avg values stay consistent"). This commit forces util_sum to be synced with the new util_avg after removing the contribution of a task and before the next periodic sync. By doing so, util_sum is rounded to its lower bound and might lose up to LOAD_AVG_MAX-1 of accumulated contribution that has not yet been reflected in util_avg.

update_tg_cfs_util() is not the only place where we round util_sum and lose accumulated contributions not already reflected in util_avg. Modify update_tg_cfs_util() and detach_entity_load_avg() to not sync util_sum with the new util_avg. Instead of always setting util_sum to the low bound of util_avg, which can significantly lower the utilization, propagate the difference. In addition, check that cfs's util_sum always stays above the lower bound for a given util_avg, as it has been observed that a sched_entity's util_sum is sometimes above the cfs one.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-3-vincent.guittot@linaro.org
Signed-off-by: Guan Jing <guanjing6@huawei.com>
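A minimal sketch of the propagation idea, simplified from update_tg_cfs_util() (add_positive() and PELT_MIN_DIVIDER are the mainline names; the helper itself is illustrative): apply the child's deltas to the parent instead of re-deriving util_sum from the rounded-down util_avg, then clamp util_sum to its floor.

    static void propagate_util_sketch(struct sched_avg *sa,
                                      long delta_avg, long delta_sum)
    {
            add_positive(&sa->util_avg, delta_avg);
            add_positive(&sa->util_sum, delta_sum);

            /* util_sum must not fall below util_avg * PELT_MIN_DIVIDER. */
            sa->util_sum = max_t(u32, sa->util_sum,
                                 sa->util_avg * PELT_MIN_DIVIDER);
    }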
-
Submitted by Weili Qian

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I773SD
CVE: NA

----------------------------------------------------------------------

Support the no-sva feature.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: JiangShui Yang <yangjiangshui@h-partners.com>
-
Submitted by Kai Ye

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I773SD
CVE: NA

----------------------------------------------------------------------

1. UACCE_MODE_NOIOMMU for warpdrive.
2. Some dfx logs.
3. Fix some static-checking issues.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: JiangShui Yang <yangjiangshui@h-partners.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @hejunhao3

Some HiSilicon SMMU PMCG suffer from an erratum where the global PMU disable control sometimes fails to disable the used counters. This leads to erroneous or inaccurate data, since before the counters are re-enabled they are still counting the event used in the last perf session. This patch hardens the global disable process: before disabling the PMU, an invalid event type (0xff) is written to forcibly stop the counters.

Link: https://gitee.com/openeuler/kernel/pulls/851
Reviewed-by: Yang Shen <shenyang39@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @Hongchen_Zhang

PV IPI, which causes fewer VM exits, improves performance over iocsr emulation when sending IPI interrupts to multiple cpus. This patch supports:
1. sending PV IPIs by hypercall
2. recording and updating steal time for the VM

Link: https://gitee.com/openeuler/kernel/pulls/793
Reviewed-by: Guo Dongtai <guodongtai@kylinos.cn>
Reviewed-by: Kevin Zhu <zhukeqian1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @jiayingbao

There is a new sysfs interface, current_freq_khz, added upstream. Backport the related patches for the uncore-freq driver. The following 3 patches are backported:

ae7b2ce5 platform/x86/intel/uncore-freq: Use sysfs API to create attributes
414eef27 platform/x86/intel/uncore-freq: Display uncore current frequency
8d75f7b4 platform/x86: intel-uncore-freq: Prevent driver loading in guests

Test: build passes; the new sysfs current_freq_khz is functional as expected.

Link: https://gitee.com/openeuler/kernel/pulls/840
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Aichun Shi <aichun.shi@intel.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @jiayingbao

Backport some patches for intel_pstate no-HWP and OOB mode, plus one HWP update for all server platforms, including the commits below:

cd23f02f cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
addca285 cpufreq: intel_pstate: Handle no_turbo in frequency invariance
bbd67f1b cpufreq: intel_pstate: Support Sapphire Rapids OOB mode
df51f287 cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP mode
1f5e62f5 cpufreq: intel_pstate: Enable HWP IO boost for all servers

Test: build passes; intel_pstate initializes and functions as expected.

Link: https://gitee.com/openeuler/kernel/pulls/839
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Aichun Shi <aichun.shi@intel.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @Hongchen_Zhang

Enable the memory- and PCI-hotplug-related configs for LoongArch; they are currently used by LoongArch virtual machines.

Link: https://gitee.com/openeuler/kernel/pulls/809
Reviewed-by: Guo Dongtai <guodongtai@kylinos.cn>
Reviewed-by: Liu Chao <liuchao173@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 29 May 2023, 4 commits
-
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @leoliu-oc

Recent Zhaoxin/Centaur CPUs support X86_FEATURE_IDA, and turbo boost can be dynamically enabled or disabled through MSR 0x1a0[38] in the same way as on Intel. So add turbo boost control support for these CPUs too.

Issue: https://gitee.com/openeuler/kernel/issues/I6SKVN
Link: https://gitee.com/openeuler/kernel/pulls/547
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @leoliu-oc

Zhaoxin CPUs that use CENTAUR as the vendor id also have the NONSTOP TSC feature, so enable ACPI driver support for them too.

Issue: https://gitee.com/openeuler/kernel/issues/I6SJQ8
Link: https://gitee.com/openeuler/kernel/pulls/544
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @hejunhao3

1. Add support for HiSilicon T6 ETM.
2. Fix a CPU-hold issue caused by hip09 ETM overflow.

Link: https://gitee.com/openeuler/kernel/pulls/848
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Yicong Yang

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I798Y2
CVE: NA

----------------------------------------------------------------------

Some HiSilicon SMMU PMCG suffer from an erratum where the global PMU disable control sometimes fails to disable the used counters. This leads to erroneous or inaccurate data, since before the counters are re-enabled they are still counting the event used in the last perf session. This patch tries to fix this by hardening the global disable process: before disabling the PMU, write an invalid event type (0xff) to forcibly stop the counters.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Junhao He <hejunhao3@huawei.com>
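A hedged sketch of the hardening idea, loosely following the field and register names of the upstream SMMUv3 PMCG driver (smmu_pmu, SMMU_PMCG_EVTYPER(n), SMMU_PMCG_CR); the exact workaround in the shipped driver may differ:

    static void smmu_pmu_disable_hardened(struct smmu_pmu *smmu_pmu)
    {
            unsigned int idx;

            /* 0xff is not a valid event type; programming it forcibly
             * stops each in-use counter before the global disable. */
            for_each_set_bit(idx, smmu_pmu->used_counters,
                             smmu_pmu->num_counters)
                    writel(0xff, smmu_pmu->reg_base + SMMU_PMCG_EVTYPER(idx));

            /* Now clear the global enable as before. */
            writel(0, smmu_pmu->reg_base + SMMU_PMCG_CR);
    }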
-