- 30 6月, 2023 1 次提交
-
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7HFZV CVE: NA ---------------------------------------- There may be NULL pointer derefrence when hotplug running and creating taskgroup concurrently. sched_autogroup_create_attach -> sched_create_group -> alloc_fair_sched_group -> init_auto_affinity -> init_affinity_domains -> cpumask_copy(xx, sched_domain_span(tmp)) { tmp may be free due rcu lock missing } { hotplug will rebuild sched domain } sched_cpu_activate -> build_sched_domains -> cpuset_cpu_active -> partition_sched_domains -> build_sched_domains -> cpu_attach_domain -> destroy_sched_domains -> call_rcu(&sd->rcu, destroy_sched_domains_rcu) So sd should be protect with rcu lock in entire critical zone. [ 599.811593] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 600.112821] pc : init_affinity_domains+0xf4/0x200 [ 600.125918] lr : init_affinity_domains+0xd4/0x200 [ 600.331355] Call trace: [ 600.338734] init_affinity_domains+0xf4/0x200 [ 600.347955] init_auto_affinity+0x78/0xc0 [ 600.356622] alloc_fair_sched_group+0xd8/0x210 [ 600.365594] sched_create_group+0x48/0xc0 [ 600.373970] sched_autogroup_create_attach+0x54/0x190 [ 600.383311] ksys_setsid+0x110/0x130 [ 600.391014] __arm64_sys_setsid+0x18/0x24 [ 600.399156] el0_svc_common+0x118/0x170 [ 600.406818] el0_svc_handler+0x3c/0x80 [ 600.414188] el0_svc+0x8/0x640 [ 600.420719] Code: b40002c0 9104e002 f9402061 a9401444 (a9001424) [ 600.430504] SMP: stopping secondary CPUs [ 600.441751] Starting crashdump kernel... Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs") Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
- 25 6月, 2023 2 次提交
-
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7FBJM CVE: NA ---------------------------------------- Free ad->domains_orig[] in 'free_affinity_domains', otherwise the memory will leak. Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs") Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7F7KV CVE: NA ------------------------------- Delete redundant updates to p->prefer_cpus when smart grid used. Add missed check for p->prefer_cpus when !CONFIG_QOS_SCHED_SMART_GRID. Fixes: 21e5d85e ("sched: Fix possible deadlock in tg_set_dynamic_affinity_mode") Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
- 20 6月, 2023 3 次提交
-
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7EBNA CVE: NA ------------------------------- Fix memory leak on error branch for smart grid. Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs") Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7EA1X CVE: NA ------------------------------- tg->auto_affinity is NULL if init_auto_affinity() failed. So add checking for tg->auto_affinity before derefrence. Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs") Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7DSX6 CVE: NA ------------------------------- Timer storm may be triggered if !cpumask_weight(ad->domains[i]) which is set in cpu offline. Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs") Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
- 15 6月, 2023 5 次提交
-
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7DA63 CVE: NA -------------------------------- Add mutex lock to prevent negative count for jump label. [28612.530675] ------------[ cut here ]------------ [28612.532708] jump label: negative count! [28612.535031] WARNING: CPU: 4 PID: 3899 at kernel/jump_label.c:202 __static_key_slow_dec_cpuslocked+0x204/0x240 [28612.538216] Kernel panic - not syncing: panic_on_warn set ... [28612.538216] [28612.540487] CPU: 4 PID: 3899 Comm: sh Kdump: loaded Not tainted [28612.542788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) [28612.546455] Call Trace: [28612.547339] dump_stack+0xc6/0x11e [28612.548546] ? __static_key_slow_dec_cpuslocked+0x200/0x240 [28612.550352] panic+0x1d6/0x46b [28612.551375] ? refcount_error_report+0x2a5/0x2a5 [28612.552915] ? kmsg_dump_rewind_nolock+0xde/0xde [28612.554358] ? sched_clock_cpu+0x18/0x1b0 [28612.555699] ? __warn+0x1d1/0x210 [28612.556799] ? __static_key_slow_dec_cpuslocked+0x204/0x240 [28612.558548] __warn+0x1ec/0x210 [28612.559621] ? __static_key_slow_dec_cpuslocked+0x204/0x240 [28612.561536] report_bug+0x1ee/0x2b0 [28612.562706] fixup_bug.part.4+0x37/0x80 [28612.563937] do_error_trap+0x21c/0x260 [28612.565109] ? fixup_bug.part.4+0x80/0x80 [28612.566453] ? check_preemption_disabled+0x34/0x1f0 [28612.567991] ? trace_hardirqs_off_thunk+0x1a/0x1c [28612.569534] ? lockdep_hardirqs_off+0x1cb/0x2b0 [28612.570993] ? error_entry+0x9a/0x130 [28612.572138] ? trace_hardirqs_off_caller+0x59/0x1a0 [28612.573710] ? trace_hardirqs_off_thunk+0x1a/0x1c [28612.575232] invalid_op+0x14/0x20 [root@lo[ca2lh8ost6 12.576387] ? vprintk_func+0x68/0x1a0 [28612.577827] ? __static_key_slow_dec_cpuslocked+0x204/0x240 smartg[ri2d]8# 612.579662] ? __static_key_slow_dec_cpuslocked+0x204/0x240 [28612.581781] ? static_key_disable+0x30/0x30 [28612.583248] ? s tatic_key_slow_dec+0x57/0x90 [28612.584997] ? tg_set_dynamic_affinity_mode+0x42/0x70 [28612.586714] ? cgroup_file_write+0x471/0x6a0 [28612.588162] ? cgroup_css.part.4+0x100/0x100 [28612.589579] ? cgroup_css.part.4+0x100/0x100 [28612.591031] ? kernfs_fop_write+0x2af/0x430 [28612.592625] ? kernfs_vma_page_mkwrite+0x230/0x230 [28612.594274] ? __vfs_write+0xef/0x680 [28612.595590] ? kernel_read+0x110/0x110 ea8612.596899] ? check_preemption_disabled+0x3mkd4ir/: 0canxno1t fcr0 Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7CGD0 CVE: NA ---------------------------------------- Deadlock occurs in two situations as follows: The first case: tg_set_dynamic_affinity_mode --- raw_spin_lock_irq(&auto_affi->lock); ->start_auto_affintiy --- trigger timer ->tg_update_task_prefer_cpus >css_task_inter_next ->raw_spin_unlock_irq hr_timer_run_queues ->sched_auto_affi_period_timer --- try spin lock (&auto_affi->lock) The second case as follows: [ 291.470810] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 291.472715] rcu: 1-...0: (0 ticks this GP) idle=a6a/1/0x4000000000000002 softirq=78516/78516 fqs=5249 [ 291.475268] rcu: (detected by 6, t=21006 jiffies, g=202169, q=9862) [ 291.477038] Sending NMI from CPU 6 to CPUs 1: [ 291.481268] NMI backtrace for cpu 1 [ 291.481273] CPU: 1 PID: 1923 Comm: sh Kdump: loaded Not tainted 4.19.90+ #150 [ 291.481278] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 291.481281] RIP: 0010:queued_spin_lock_slowpath+0x136/0x9a0 [ 291.481289] Code: c0 74 3f 49 89 dd 48 89 dd 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 83 e5 07 49 01 c5 83 c5 03 48 83 05 c4 66 b9 05 01 f3 90 <41> 0f b6 45 00 40 38 c5 7c 08 84 c0 0f 85 ad 07 00 00 0 [ 291.481292] RSP: 0018:ffff88801de87cd8 EFLAGS: 00000002 [ 291.481297] RAX: 0000000000000101 RBX: ffff888001be0a28 RCX: ffffffffb8090f7d [ 291.481301] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888001be0a28 [ 291.481304] RBP: 0000000000000003 R08: ffffed100037c146 R09: ffffed100037c146 [ 291.481307] R10: 000000001106b143 R11: ffffed100037c145 R12: 1ffff11003bd0f9c [ 291.481311] R13: ffffed100037c145 R14: fffffbfff7a38dee R15: dffffc0000000000 [ 291.481315] FS: 00007fac4f306740(0000) GS:ffff88801de80000(0000) knlGS:0000000000000000 [ 291.481318] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 291.481321] CR2: 00007fac4f4bb650 CR3: 00000000046b6000 CR4: 00000000000006e0 [ 291.481323] Call Trace: [ 291.481324] <IRQ> [ 291.481326] ? osq_unlock+0x2a0/0x2a0 [ 291.481329] ? check_preemption_disabled+0x4c/0x290 [ 291.481331] ? rcu_accelerate_cbs+0x33/0xed0 [ 291.481333] _raw_spin_lock_irqsave+0x83/0xa0 [ 291.481336] sched_auto_affi_period_timer+0x251/0x820 [ 291.481338] ? __remove_hrtimer+0x151/0x200 [ 291.481340] __hrtimer_run_queues+0x39d/0xa50 [ 291.481343] ? tg_update_affinity_domain_down+0x460/0x460 [ 291.481345] ? enqueue_hrtimer+0x2e0/0x2e0 [ 291.481348] ? ktime_get_update_offsets_now+0x1d7/0x2c0 [ 291.481350] hrtimer_run_queues+0x243/0x470 [ 291.481352] run_local_timers+0x5e/0x150 [ 291.481354] update_process_times+0x36/0xb0 [ 291.481357] tick_sched_handle.isra.4+0x7c/0x180 [ 291.481359] tick_nohz_handler+0xd1/0x1d0 [ 291.481365] smp_apic_timer_interrupt+0x12c/0x4e0 [ 291.481368] apic_timer_interrupt+0xf/0x20 [ 291.481370] </IRQ> [ 291.481372] ? smp_call_function_many+0x68c/0x840 [ 291.481375] ? smp_call_function_many+0x6ab/0x840 [ 291.481377] ? arch_unregister_cpu+0x60/0x60 [ 291.481379] ? native_set_fixmap+0x100/0x180 [ 291.481381] ? arch_unregister_cpu+0x60/0x60 [ 291.481384] ? set_task_select_cpus+0x116/0x940 [ 291.481386] ? smp_call_function+0x53/0xc0 [ 291.481388] ? arch_unregister_cpu+0x60/0x60 [ 291.481390] ? on_each_cpu+0x49/0xf0 [ 291.481393] ? set_task_select_cpus+0x115/0x940 [ 291.481395] ? text_poke_bp+0xff/0x180 [ 291.481397] ? poke_int3_handler+0xc0/0xc0 [ 291.481400] ? __set_prefer_cpus_ptr.constprop.4+0x1cd/0x900 [ 291.481402] ? hrtick+0x1b0/0x1b0 [ 291.481404] ? set_task_select_cpus+0x115/0x940 [ 291.481407] ? __jump_label_transform.isra.0+0x3a1/0x470 [ 291.481409] ? kernel_init+0x280/0x280 [ 291.481411] ? kasan_check_read+0x1d/0x30 [ 291.481413] ? mutex_lock+0x96/0x100 [ 291.481415] ? __mutex_lock_slowpath+0x30/0x30 [ 291.481418] ? arch_jump_label_transform+0x52/0x80 [ 291.481420] ? set_task_select_cpus+0x115/0x940 [ 291.481422] ? __jump_label_update+0x1a1/0x1e0 [ 291.481424] ? jump_label_update+0x2ee/0x3b0 [ 291.481427] ? static_key_slow_inc_cpuslocked+0x1c8/0x2d0 [ 291.481430] ? start_auto_affinity+0x190/0x200 [ 291.481432] ? tg_set_dynamic_affinity_mode+0xad/0xf0 [ 291.481435] ? cpu_affinity_mode_write_u64+0x22/0x30 [ 291.481437] ? cgroup_file_write+0x46f/0x660 [ 291.481439] ? cgroup_init_cftypes+0x300/0x300 [ 291.481441] ? __mutex_lock_slowpath+0x30/0x30 Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0 CVE: NA ---------------------------------------- The WARNING report when run: echo 1 > /sys/fs/cgroup/cpu/cpu.dynamic_affinity_mode [ 147.276757] WARNING: CPU: 5 PID: 1770 at kernel/cpu.c:326 \ lockdep_assert_cpus_held+0xac/0xd0 [ 147.279670] Kernel panic - not syncing: panic_on_warn set ... [ 147.279670] [ 147.282211] CPU: 5 PID: 1770 Comm: bash Kdump: loaded Not tainted 4.19 [ 147.284796] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996).. [ 147.290963] Call Trace: [ 147.292459] dump_stack+0xc6/0x11e [ 147.294295] ? lockdep_assert_cpus_held+0xa0/0xd0 [ 147.296876] panic+0x1d6/0x46b [ 147.298591] ? refcount_error_report+0x2a5/0x2a5 [ 147.301131] ? kmsg_dump_rewind_nolock+0xde/0xde [ 147.303738] ? sched_clock_cpu+0x18/0x1b0 [ 147.305943] ? __warn+0x1d1/0x210 [ 147.307831] ? lockdep_assert_cpus_held+0xac/0xd0 [ 147.310469] __warn+0x1ec/0x210 [ 147.312271] ? lockdep_assert_cpus_held+0xac/0xd0 [ 147.314838] report_bug+0x1ee/0x2b0 [ 147.316798] fixup_bug.part.4+0x37/0x80 [ 147.318946] do_error_trap+0x21c/0x260 [ 147.321062] ? fixup_bug.part.4+0x80/0x80 [ 147.323253] ? check_preemption_disabled+0x34/0x1f0 [ 147.324886] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 147.326277] ? lockdep_hardirqs_off+0x1cb/0x2b0 [ 147.327505] ? error_entry+0x9a/0x130 [ 147.328523] ? trace_hardirqs_off_caller+0x59/0x1a0 [ 147.329844] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 147.331124] invalid_op+0x14/0x20 [ 147.332057] ? vprintk_func+0x68/0x1a0 [ 147.333082] ? lockdep_assert_cpus_held+0xac/0xd0 [ 147.334355] ? lockdep_assert_cpus_held+0xac/0xd0 [ 147.335624] ? static_key_slow_inc_cpuslocked+0x5a/0x230 [ 147.337079] ? tg_set_dynamic_affinity_mode+0x4f/0x70 [ 147.338444] ? cgroup_file_write+0x471/0x6a0 [ 147.339604] ? cgroup_css.part.4+0x100/0x100 [ 147.340782] ? cgroup_css.part.4+0x100/0x100 [ 147.341943] ? kernfs_fop_write+0x2af/0x430 [ 147.343083] ? kernfs_vma_page_mkwrite+0x230/0x230 [ 147.344401] ? __vfs_write+0xef/0x680 [ 147.345404] ? kernel_read+0x110/0x110 Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7D98G CVE: NA ---------------------------------------- smart_grid_usage_dec() should called when free taskgroup if the mode is auto. Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7A718 -------------------------------- Add static key to reduce noise when not enable dynamic affinity. There are better performance in some case, such for lmbench. Fixes: 243865da ("cpuset: Introduce new interface for scheduler ...") Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
- 09 6月, 2023 2 次提交
-
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0 CVE: NA ---------------------------------------- As smart grid scheduling (SGS) may shrink resources and affect task QOS, We provide methods for evaluating task QOS in divided grid, we mainly focus on the following two aspects: 1. Evaluate whether (such as CPU or memory) resources meet our demand 2. Ensure the least impact when working with (cpufreq and cpuidle) governors For tackling this questions, we have summarized several sampling methods to obtain tasks' characteristics at same time reducing scheduling noise as much as possible: 1. we detected the key factors that how sensitive a process is in cpufreq or cpuidle adjustment, and to guide the cpufreq/cpuidle governor 2. We dynamically monitor process memory bandwidth and adjust memory allocation to minimize cross-remote memory access 3. We provide a variety of load tracking mechanisms to adapt to different types of task's load change --------------------------------- ----------------- | class A | | class B | | -------- -------- | | -------- | | | group0 | | group1 | |---| | group2 | |----------+ | -------- -------- | | -------- | | | CPU/memory sensitive type | | balance type | | ----------------+---------------- --------+-------- | v v | (target cpufreq) ------------------------------------------------------- | (sensitivity) | Not satisfied with QOS? | | --------------------------+---------------------------- | v v ------------------------------------------------------- ---------------- | expand or shrink resource |<--| energy model | ----------------------------+-------------------------- ---------------- v | ----------- ----------- ------------ v | | | | | | --------------- | GRID0 +--------+ GRID1 +--------+ GRID2 |<-- | governor | | | | | | | --------------- ----------- ----------- ------------ \ | / \ ------------------- / | pages migration | ------------------- We will introduce the energy model in the follow-up implementation, and consider the dynamic affinity adjustment between each divided grid in the runtime. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0 CVE: NA ---------------------------------------- We want to dynamically expand or shrink the affinity range of tasks based on the CPU topology level while meeting the minimum resource requirements of tasks. We divide several level of affinity domains according to sched domains: level4 * SOCKET [ ] level3 * DIE [ ] level2 * MC [ ] [ ] level1 * SMT [ ] [ ] [ ] [ ] level0 * CPU 0 1 2 3 4 5 6 7 Whether users tend to choose power saving or performance will affect strategy of adjusting affinity, when selecting the power saving mode, we will choose a more appropriate affinity based on the energy model to reduce power consumption, while considering the QOS of resources such as CPU and memory consumption, for instance, if the current task CPU load is less than required, smart grid will judge whether to aggregate tasks together into a smaller range or not according to energy model. The main difference from EAS is that we pay more attention to the impact of power consumption brought by such as cpuidle and DVFS, and classify tasks to reduce interference and ensure resource QOS in each divided unit, which are more suitable for general-purpose on non-heterogeneous CPUs. -------- -------- -------- | group0 | | group1 | | group2 | -------- -------- -------- | | | v | v ---------------------+----- ----------------- | ---v-- | | | DIE0 | MC1 | | | DIE1 | ------ | | --------------------------- ----------------- We regularly count the resource satisfaction of groups, and adjust the affinity, scheduling balance and migrating memory will be considered based on memory location for better meetting resource requirements. Signed-off-by: NHui Tang <tanghui20@huawei.com> Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
-
- 06 4月, 2023 3 次提交
-
-
由 Vincent Guittot 提交于
mainline inclusion from mainline-v6.3-rc4 commit a53ce18c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TE76 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=a53ce18cacb477dd0513c607f187d16f0fa96f71 -------------------------------- Commit 829c1651 ("sched/fair: sanitize vruntime of entity being placed") fixes an overflowing bug, but ignore a case that se->exec_start is reset after a migration. For fixing this case, we delay the reset of se->exec_start after placing the entity which se->exec_start to detect long sleeping task. In order to take into account a possible divergence between the clock_task of 2 rqs, we increase the threshold to around 104 days. Fixes: 829c1651 ("sched/fair: sanitize vruntime of entity being placed") Originally-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NZhang Qiao <zhangqiao22@huawei.com> Link: https://lore.kernel.org/r/20230317160810.107988-1-vincent.guittot@linaro.orgSigned-off-by: NSongping Yu <yusongping@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Zhang Qiao 提交于
mainline inclusion from mainline-v6.3-rc4 commit 829c1651 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TE76 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=829c1651e9c4a6f78398d3e67651cef9bb6b42cc -------------------------------- When a scheduling entity is placed onto cfs_rq, its vruntime is pulled to the base level (around cfs_rq->min_vruntime), so that the entity doesn't gain extra boost when placed backwards. However, if the entity being placed wasn't executed for a long time, its vruntime may get too far behind (e.g. while cfs_rq was executing a low-weight hog), which can inverse the vruntime comparison due to s64 overflow. This results in the entity being placed with its original vruntime way forwards, so that it will effectively never get to the cpu. To prevent that, ignore the vruntime of the entity being placed if it didn't execute for much longer than the characteristic sheduler time scale. [rkagan: formatted, adjusted commit log, comments, cutoff value] Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Co-developed-by: NRoman Kagan <rkagan@amazon.de> Signed-off-by: NRoman Kagan <rkagan@amazon.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.deSigned-off-by: NSongping Yu <yusongping@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Reviewed-by: Nchenhui <judy.chenhui@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Songping Yu 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TE76 CVE: NA -------------------------------- This reverts commit 3a44e838. Signed-off-by: NSongping Yu <yusongping@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Reviewed-by: Nchenhui <judy.chenhui@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 24 12月, 2022 1 次提交
-
-
由 Zhang Qiao 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I67BL1 CVE: NA ------------------------------- If a task sleep for long time, it maybe cause a s64 overflow issue at max_vruntime() and the task will be set an incorrect vruntime, lead to the task be starve. For fix it, we set the task's vruntime as cfs_rq->min_vruntime when wakeup. Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: Nsongping yu <yusongping@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 16 12月, 2022 1 次提交
-
-
由 Zhang Qiao 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I64OUS CVE: NA ------------------------------- When a cfs_rq throttled by qos, mark cfs_rq->throttled as 1, and cfs bw will unthrottled this cfs_rq by mistake, it cause a list_del_valid warning. So add macro QOS_THROTTLED(=2), when a cfs_rq is throttled by qos, we mark the cfs_rq->throttled as QOS_THROTTLED, will check the value of cfs_rq->throttled before unthrottle a cfs_rq. Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: Nzheng zucheng <zhengzucheng@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 15 8月, 2022 1 次提交
-
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: 187419, https://gitee.com/openeuler/kernel/issues/I5LIPL CVE: NA ------------------------------- do_el0_svc+0x50/0x11c arch/arm64/kernel/syscall.c:217 el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353 el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369 el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683 ================================================================== BUG: KASAN: null-ptr-deref in rq_of kernel/sched/sched.h:1118 [inline] BUG: KASAN: null-ptr-deref in unthrottle_qos_sched_group kernel/sched/fair.c:7619 [inline] BUG: KASAN: null-ptr-deref in free_fair_sched_group+0x124/0x320 kernel/sched/fair.c:12131 Read of size 8 at addr 0000000000000130 by task syz-executor100/223 CPU: 3 PID: 223 Comm: syz-executor100 Not tainted 5.10.0 #6 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132 show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b4/0x248 lib/dump_stack.c:118 __kasan_report mm/kasan/report.c:551 [inline] kasan_report+0x18c/0x210 mm/kasan/report.c:564 check_memory_region_inline mm/kasan/generic.c:187 [inline] __asan_load8+0x98/0xc0 mm/kasan/generic.c:253 rq_of kernel/sched/sched.h:1118 [inline] unthrottle_qos_sched_group kernel/sched/fair.c:7619 [inline] free_fair_sched_group+0x124/0x320 kernel/sched/fair.c:12131 sched_free_group kernel/sched/core.c:7767 [inline] sched_create_group+0x48/0xc0 kernel/sched/core.c:7798 cpu_cgroup_css_alloc+0x18/0x40 kernel/sched/core.c:7930 css_create+0x7c/0x4a0 kernel/cgroup/cgroup.c:5328 cgroup_apply_control_enable+0x288/0x340 kernel/cgroup/cgroup.c:3135 cgroup_apply_control kernel/cgroup/cgroup.c:3217 [inline] cgroup_subtree_control_write+0x668/0x8b0 kernel/cgroup/cgroup.c:3375 cgroup_file_write+0x1a8/0x37c kernel/cgroup/cgroup.c:3909 kernfs_fop_write_iter+0x220/0x2f4 fs/kernfs/file.c:296 call_write_iter include/linux/fs.h:1960 [inline] new_sync_write+0x260/0x370 fs/read_write.c:515 vfs_write+0x3dc/0x4ac fs/read_write.c:602 ksys_write+0xfc/0x200 fs/read_write.c:655 __do_sys_write fs/read_write.c:667 [inline] __se_sys_write fs/read_write.c:664 [inline] __arm64_sys_write+0x50/0x60 fs/read_write.c:664 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common.constprop.0+0xf4/0x414 arch/arm64/kernel/syscall.c:155 do_el0_svc+0x50/0x11c arch/arm64/kernel/syscall.c:217 el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353 el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369 el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683 So add check for tg->cfs_rq[i] before unthrottle_qos_sched_group() called. Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 21 7月, 2022 3 次提交
-
-
由 Hui Tang 提交于
hulk inclusion category: feature bugzilla: 187173, https://gitee.com/openeuler/kernel/issues/I5G4IH CVE: NA -------------------------------- Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: feature bugzilla: 187173, https://gitee.com/openeuler/kernel/issues/I5G4IH CVE: NA -------------------------------- Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Hui Tang 提交于
hulk inclusion category: feature bugzilla: 187173, https://gitee.com/openeuler/kernel/issues/I5G4IH CVE: NA -------------------------------- Compare taskgroup 'util_avg' in perferred cpu with capacity preferred cpu, dynamicly adjust cpu range for task wakeup process. Signed-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 01 6月, 2022 2 次提交
-
-
由 Phil Auld 提交于
mainline inclusion from mainline-v5.r7-rc6 commit b34cb07d category: bugfix bugzilla: 91404, https://gitee.com/openeuler/kernel/issues/I59VLJ CVE: NA -------------------------------- The recent patch, fe61468b (sched/fair: Fix enqueue_task_fair warning) did not fully resolve the issues with the rq->tmp_alone_branch != &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where the first for_each_sched_entity loop exits due to on_rq, having incompletely updated the list. In this case the second for_each_sched_entity loop can further modify se. The later code to fix up the list management fails to do what is needed because se does not point to the sched_entity which broke out of the first loop. The list is not fixed up because the throttled parent was already added back to the list by a task enqueue in a parallel child hierarchy. Address this by calling list_add_leaf_cfs_rq if there are throttled parents while doing the second for_each_sched_entity loop. Fixes: fe61468b ("sched/fair: Fix enqueue_task_fair warning") Suggested-by: NVincent Guittot <vincent.guittot@linaro.org> Signed-off-by: NPhil Auld <pauld@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20200512135222.GC2201@lorien.usersys.redhat.comSigned-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Vincent Guittot 提交于
mainline inclusion from mainline-v5.6-rc4 commit fe61468b category: bugfix bugzilla: 93902, https://gitee.com/openeuler/kernel/issues/I59VLJ CVE: NA -------------------------------- When a cfs rq is throttled, the latter and its child are removed from the leaf list but their nr_running is not changed which includes staying higher than 1. When a task is enqueued in this throttled branch, the cfs rqs must be added back in order to ensure correct ordering in the list but this can only happens if nr_running == 1. When cfs bandwidth is used, we call unconditionnaly list_add_leaf_cfs_rq() when enqueuing an entity to make sure that the complete branch will be added. Similarly unthrottle_cfs_rq() can stop adding cfs in the list when a parent is throttled. Iterate the remaining entity to ensure that the complete branch will be added in the list. Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com> Tested-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Cc: stable@vger.kernel.org Cc: stable@vger.kernel.org #v5.1+ Link: https://lkml.kernel.org/r/20200306135257.25044-1-vincent.guittot@linaro.orgSigned-off-by: NHui Tang <tanghui20@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 23 5月, 2022 2 次提交
-
-
由 Zhang Qiao 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZJT CVE: NA -------------------------------- 1. Qos throttle reuse tg_{throttle,unthrottle}_{up,down} that can write some cfs-bandwidth fields, it may cause some unknown data error. So add qos_tg_{throttle,unthrottle}_{up,down} for qos throttle. 2. walk_tg_tree_from() caller must hold rcu_lock, currently there is none, so add it now. Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Zhang Qiao 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZJT CVE: NA -------------------------------- Before, when detect the cpu is overloaded, we throttle offline tasks at exit_to_user_mode_loop() before returning to user mode. Some architects(e.g.,arm64) do not support QOS scheduler because a task do not via exit_to_user_mode_loop() return to userspace at these platforms. In order to slove this problem and support qos scheduler on all architectures, if we require throttling offline tasks, we set flag TIF_NOTIFY_RESUME to an offline task when it is picked and throttle it at tracehook_notify_resume(). Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 19 4月, 2022 1 次提交
-
-
由 Xunlei Pang 提交于
mainline inclusion from mainline-v5.10-rc1 commit df3cb4ea category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I52QKB CVE: NA -------------------------------- We've met problems that occasionally tasks with full cpumask (e.g. by putting it into a cpuset or setting to full affinity) were migrated to our isolated cpus in production environment. After some analysis, we found that it is due to the current select_idle_smt() not considering the sched_domain mask. Steps to reproduce on my 31-CPU hyperthreads machine: 1. with boot parameter: "isolcpus=domain,2-31" (thread lists: 0,16 and 1,17) 2. cgcreate -g cpu:test; cgexec -g cpu:test "test_threads" 3. some threads will be migrated to the isolated cpu16~17. Fix it by checking the valid domain mask in select_idle_smt(). Fixes: 10e2f1ac ("sched/core: Rewrite and improve select_idle_siblings()) Reported-by: NWetp Zhang <wetp.zy@linux.alibaba.com> Signed-off-by: NXunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NJiang Biao <benbjiang@tencent.com> Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/1600930127-76857-1-git-send-email-xlpang@linux.alibaba.com Conflicts: kernel/sched/fair.c Signed-off-by: NYu jiahua <yujiahua1@huawei.com> Reviewed-by: Nzheng zucheng <zhengzucheng@huawei.com> Reviewed-by: Nzheng zucheng <zhengzucheng@huawei.com> Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 02 4月, 2022 1 次提交
-
-
由 Zhang Qiao 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I50PPU CVE: NA ----------------------------------------------------------------- when unthrottle a cfs_rq at distribute_cfs_runtime(), another cpu may re-throttle this cfs_rq at qos_throttle_cfs_rq() before access the cfs_rq->throttle_list.next, but meanwhile, qos throttle will attach the cfs_rq throttle_list node to percpu qos_throttled_cfs_rq, it will change cfs_rq->throttle_list.next and cause panic or hardlockup at distribute_cfs_runtime(). Fix it by adding a qos_throttle_list node in struct cfs_rq, and qos throttle disuse the cfs_rq->throttle_list. Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Reviewed-by: Nzheng zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 29 11月, 2021 8 次提交
-
-
由 Zheng Zucheng 提交于
hulk inclusion category: feature bugzilla: 51828, https://gitee.com/openeuler/kernel/issues/I4K96G CVE: NA -------------------------------- When online tasks occupy cpu long time, offline task will not get cpu to run, the priority inversion issue may be triggered in this case. If the above case occurs, we will unthrottle offline tasks and let its get a chance to run. When online tasks occupy cpu over 5s(defaule value), we will unthrottle offline tasks and enter a msleep loop before exit to usermode util the cpu goto idle. Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Zhang Qiao 提交于
hulk inclusion category: bugfix bugzilla: 51828, https://gitee.com/openeuler/kernel/issues/I4K96G CVE: NA -------------------------------- Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Zheng Zucheng 提交于
hulk inclusion category: bugfix bugzilla: 51828, https://gitee.com/openeuler/kernel/issues/I4K96G CVE: NA -------------------------------- In some corner case, when we throttle the last group sched entity in rq, kernel will be panic if there is no residual sched entities in rq. So we add a protection to prevent this case. Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Zheng Zucheng 提交于
hulk inclusion category: bugfix bugzilla: 51828, https://gitee.com/openeuler/kernel/issues/I4K96G CVE: NA -------------------------------- When unthrottle offline tasks successfully, we should clear idle_stamp. Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Zheng Zucheng 提交于
hulk inclusion category: bugfix bugzilla: 51828, https://gitee.com/openeuler/kernel/issues/I4K96G CVE: NA -------------------------------- Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Zheng Zucheng 提交于
hulk inclusion category: bugfix bugzilla: 51828, https://gitee.com/openeuler/kernel/issues/I4K96G CVE: NA -------------------------------- offline task invokes sched_setscheduler interface to change the scheduling policy to SCHED_OTHER, trigger a system panic. Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Zhang Qiao 提交于
hulk inclusion category: feature bugzilla: 51828, https://gitee.com/openeuler/kernel/issues/I4K96G CVE: NA -------------------------------- In order for the throttled task to continue running after the CPU is offline, we need unthrottle the throttled cfs rq saved in the qos_throttled_cfs_rq. Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Zhang Qiao 提交于
hulk inclusion category: feature bugzilla: 51828, https://gitee.com/openeuler/kernel/issues/I4K96G CVE: NA -------------------------------- In a co-location scenario, we usually deploy online and offline task groups in the same server. The online tasks are more important than offline tasks. And to avoid offline tasks affects online tasks, we will throttle the offline tasks group when some online task groups running in the same cpu and unthrottle offline tasks when the cpu is about to enter idle state. Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com> Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 17 9月, 2021 1 次提交
-
-
由 Eric W. Biederman 提交于
mainline inclusion from mainline-5.4-rc1 commit 154abafc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I3UKOW CVE: NA ------------------------------------------------- Remove work arounds that were written before there was a grace period after tasks left the runqueue in finish_task_switch(). In particular now that there tasks exiting the runqueue exprience a RCU grace period none of the work performed by task_rcu_dereference() excpet the rcu_dereference() is necessary so replace task_rcu_dereference() with rcu_dereference(). Remove the code in rcuwait_wait_event() that checks to ensure the current task has not exited. It is no longer necessary as it is guaranteed that any running task will experience a RCU grace period after it leaves the run queueue. Remove the comment in rcuwait_wake_up() as it is no longer relevant. [Li Hua: change task_rcu_dereference to rcu_dereference in sync_runqueues_membarrier_state] Ref: 8f95c90c ("sched/wait, RCU: Introduce rcuwait machinery") Ref: 150593bf ("sched/api: Introduce task_rcu_dereference() and try_get_task_struct()") Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87lfurdpk9.fsf_-_@x220.int.ebiederm.orgSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NLi Hua <hucool.lihua@huawei.com> Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Conflicts: kernel/sched/membarrier.c Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 02 8月, 2021 1 次提交
-
-
由 Odin Ugedal 提交于
stable inclusion from linux-4.19.199 commit aa36bd8fc187f513af2d250b82a3c913053ceaf1 -------------------------------- [ Upstream commit 72d0ad7c ] The time remaining until expiry of the refresh_timer can be negative. Casting the type to an unsigned 64-bit value will cause integer underflow, making the runtime_refresh_within return false instead of true. These situations are rare, but they do happen. This does not cause user-facing issues or errors; other than possibly unthrottling cfs_rq's using runtime from the previous period(s), making the CFS bandwidth enforcement less strict in those (special) situations. Signed-off-by: NOdin Ugedal <odin@uged.al> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBen Segall <bsegall@google.com> Link: https://lore.kernel.org/r/20210629121452.18429-1-odin@uged.alSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 19 7月, 2021 1 次提交
-
-
由 Vincent Guittot 提交于
stable inclusion from linux-4.19.195 commit 4af84445075da26439598984d1a01c1cdadb0e2d -------------------------------- commit 02da26ad upstream. During the update of fair blocked load (__update_blocked_fair()), we update the contribution of the cfs in tg->load_avg if cfs_rq's pelt has decayed. Nevertheless, the pelt values of a cfs_rq could have been recently updated while propagating the change of a child. In this case, cfs_rq's pelt will not decayed because it has already been updated and we don't update tg->load_avg. __update_blocked_fair ... for_each_leaf_cfs_rq_safe: child cfs_rq update cfs_rq_load_avg() for child cfs_rq ... update_load_avg(cfs_rq_of(se), se, 0) ... update cfs_rq_load_avg() for parent cfs_rq -propagation of child's load makes parent cfs_rq->load_sum becoming null -UPDATE_TG is not set so it doesn't update parent cfs_rq->tg_load_avg_contrib .. for_each_leaf_cfs_rq_safe: parent cfs_rq update cfs_rq_load_avg() for parent cfs_rq - nothing to do because parent cfs_rq has already been updated recently so cfs_rq->tg_load_avg_contrib is not updated ... parent cfs_rq is decayed list_del_leaf_cfs_rq parent cfs_rq - but it still contibutes to tg->load_avg we must set UPDATE_TG flags when propagting pending load to the parent Fixes: 039ae8bc ("sched/fair: Fix O(nr_cgroups) in the load balancing path") Reported-by: NOdin Ugedal <odin@uged.al> Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NOdin Ugedal <odin@uged.al> Link: https://lkml.kernel.org/r/20210527122916.27683-3-vincent.guittot@linaro.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 30 6月, 2021 1 次提交
-
-
由 Odin Ugedal 提交于
stable inclusion from linux-4.19.191 commit 434ea8c1d1bf296a2597aeb28f6ccf62ae82f235 -------------------------------- [ Upstream commit 0258bdfa ] This fixes an issue where old load on a cfs_rq is not properly decayed, resulting in strange behavior where fairness can decrease drastically. Real workloads with equally weighted control groups have ended up getting a respective 99% and 1%(!!) of cpu time. When an idle task is attached to a cfs_rq by attaching a pid to a cgroup, the old load of the task is attached to the new cfs_rq and sched_entity by attach_entity_cfs_rq. If the task is then moved to another cpu (and therefore cfs_rq) before being enqueued/woken up, the load will be moved to cfs_rq->removed from the sched_entity. Such a move will happen when enforcing a cpuset on the task (eg. via a cgroup) that force it to move. The load will however not be removed from the task_group itself, making it look like there is a constant load on that cfs_rq. This causes the vruntime of tasks on other sibling cfs_rq's to increase faster than they are supposed to; causing severe fairness issues. If no other task is started on the given cfs_rq, and due to the cpuset it would not happen, this load would never be properly unloaded. With this patch the load will be properly removed inside update_blocked_averages. This also applies to tasks moved to the fair scheduling class and moved to another cpu, and this path will also fix that. For fork, the entity is queued right away, so this problem does not affect that. This applies to cases where the new process is the first in the cfs_rq, issue introduced 3d30544f ("sched/fair: Apply more PELT fixes"), and when there has previously been load on the cgroup but the cgroup was removed from the leaflist due to having null PELT load, indroduced in 039ae8bc ("sched/fair: Fix O(nr_cgroups) in the load balancing path"). For a simple cgroup hierarchy (as seen below) with two equally weighted groups, that in theory should get 50/50 of cpu time each, it often leads to a load of 60/40 or 70/30. parent/ cg-1/ cpu.weight: 100 cpuset.cpus: 1 cg-2/ cpu.weight: 100 cpuset.cpus: 1 If the hierarchy is deeper (as seen below), while keeping cg-1 and cg-2 equally weighted, they should still get a 50/50 balance of cpu time. This however sometimes results in a balance of 10/90 or 1/99(!!) between the task groups. $ ps u -C stress USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 18568 1.1 0.0 3684 100 pts/12 R+ 13:36 0:00 stress --cpu 1 root 18580 99.3 0.0 3684 100 pts/12 R+ 13:36 0:09 stress --cpu 1 parent/ cg-1/ cpu.weight: 100 sub-group/ cpu.weight: 1 cpuset.cpus: 1 cg-2/ cpu.weight: 100 sub-group/ cpu.weight: 10000 cpuset.cpus: 1 This can be reproduced by attaching an idle process to a cgroup and moving it to a given cpuset before it wakes up. The issue is evident in many (if not most) container runtimes, and has been reproduced with both crun and runc (and therefore docker and all its "derivatives"), and with both cgroup v1 and v2. Fixes: 3d30544f ("sched/fair: Apply more PELT fixes") Fixes: 039ae8bc ("sched/fair: Fix O(nr_cgroups) in the load balancing path") Signed-off-by: NOdin Ugedal <odin@uged.al> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210501141950.23622-2-odin@uged.alSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-