Commits · 70dc4628964d3b9c6acd04141bfdc72363a88063 · openeuler / Kernel

30 Jun 2023, 1 commit

sched: Fix null pointer dereference for sd->span · 70dc4628

Committed by Hui Tang on Jun 30, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7HFZV
CVE: NA

----------------------------------------

A NULL pointer dereference may occur when CPU hotplug runs concurrently
with task group creation.

sched_autogroup_create_attach
  -> sched_create_group
    -> alloc_fair_sched_group
      -> init_auto_affinity
        -> init_affinity_domains
           -> cpumask_copy(xx, sched_domain_span(tmp))
              { tmp may already be freed because the RCU read lock is not held }

{ hotplug rebuilds the sched domains }
sched_cpu_activate
  -> build_sched_domains
    -> cpuset_cpu_active
      -> partition_sched_domains
        -> build_sched_domains
          -> cpu_attach_domain
            -> destroy_sched_domains
              -> call_rcu(&sd->rcu, destroy_sched_domains_rcu)

So sd must be protected by the RCU read lock for the entire critical section.

[  599.811593] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  600.112821] pc : init_affinity_domains+0xf4/0x200
[  600.125918] lr : init_affinity_domains+0xd4/0x200
[  600.331355] Call trace:
[  600.338734]  init_affinity_domains+0xf4/0x200
[  600.347955]  init_auto_affinity+0x78/0xc0
[  600.356622]  alloc_fair_sched_group+0xd8/0x210
[  600.365594]  sched_create_group+0x48/0xc0
[  600.373970]  sched_autogroup_create_attach+0x54/0x190
[  600.383311]  ksys_setsid+0x110/0x130
[  600.391014]  __arm64_sys_setsid+0x18/0x24
[  600.399156]  el0_svc_common+0x118/0x170
[  600.406818]  el0_svc_handler+0x3c/0x80
[  600.414188]  el0_svc+0x8/0x640
[  600.420719] Code: b40002c0 9104e002 f9402061 a9401444 (a9001424)
[  600.430504] SMP: stopping secondary CPUs
[  600.441751] Starting crashdump kernel...

Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

25 Jun 2023, 2 commits

sched: Fix memory leak for smart grid · 2b5d1aa5

Committed by Hui Tang on Jun 25, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7FBJM
CVE: NA

----------------------------------------

Free ad->domains_orig[] in 'free_affinity_domains',
otherwise the memory will leak.

Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

sched: Delete redundant updates to p->prefer_cpus · 4f3df479

Committed by Hui Tang on Jun 25, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7F7KV
CVE: NA

-------------------------------

Delete redundant updates to p->prefer_cpus when smart grid is used.
Add the missing check for p->prefer_cpus when !CONFIG_QOS_SCHED_SMART_GRID.

Fixes: 21e5d85e ("sched: Fix possible deadlock in tg_set_dynamic_affinity_mode")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

20 Jun 2023, 4 commits

sched: Adjust few parameters range for smart grid · 27c8c87a

Committed by Hui Tang on Jun 20, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7EEF3
CVE: NA

-------------------------------

Adjust the ranges of a few smart grid parameters.

Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

sched: Fix memory leak on error branch · d791cf33

Committed by Hui Tang on Jun 20, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7EBNA
CVE: NA

-------------------------------

Fix a memory leak on an error path in smart grid.

Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

sched: fix dereference NULL pointers · b43a1c9e

Committed by Hui Tang on Jun 20, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7EA1X
CVE: NA

-------------------------------

tg->auto_affinity is NULL if init_auto_affinity() fails, so check
tg->auto_affinity before dereferencing it.

Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

sched: Fix timer storm for smart grid · 12521356

Committed by Hui Tang on Jun 20, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7DSX6
CVE: NA

-------------------------------

A timer storm may be triggered when cpumask_weight(ad->domains[i]) is zero,
which can happen after its CPUs go offline.

Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

19 Jun 2023, 1 commit

sched/rt: Fix possible warn when push_rt_task · 3e40e3aa

Committed by Hui Tang on Jun 19, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7DX9Y
CVE: NA

-------------------------------

A warning may be triggered during reboot, as follows:

reboot
  ->kernel_restart
    ->machine_restart
      ->smp_send_stop --- ipi handler set_cpu_online(cpu, false)

balance_callback
-> __balance_callback
  ->push_rt_task
    -> find_lock_lowest_rq <rq obtained from vec->mask>
      -> find_lowest_rq
        -> cpupri_find
          -> cpupri_find_fitness
            -> __cpupri_find [cpumask_and(..., vec->mask)]
    -> set_task_cpu(next_task, lowest_rq->cpu) --- WARN_ON(!cpu_online(cpu))

So add a !cpu_online(lowest_rq->cpu) check before set_task_cpu().
This does not completely fix the problem, since the CPU may still be
cleared from cpu_online_mask after the check.
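A minimal userspace sketch of the order the fix introduces (test the target CPU, then migrate), assuming a 64-bit bitmask stands in for cpu_online_mask. online_mask, cpu_online_sim() and push_task_sim() are hypothetical names, not kernel APIs, and the run-queue locking that the real push_rt_task() performs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_online_mask: bit N set means CPU N is online. */
static uint64_t online_mask = 0xff;   /* CPUs 0..7 start online */

static bool cpu_online_sim(int cpu)
{
	assert(cpu >= 0 && cpu < 64);     /* shifting a uint64_t by >= 64 is undefined */
	return (online_mask >> cpu) & 1;
}

/*
 * Mirrors the order of the fix: check the target CPU before migrating.
 * Returns the CPU the task ends up on; "return lowest_cpu" stands in for
 * set_task_cpu(next_task, lowest_rq->cpu).
 */
static int push_task_sim(int task_cpu, int lowest_cpu)
{
	if (!cpu_online_sim(lowest_cpu))
		return task_cpu;              /* target is offline: do not push */
	return lowest_cpu;
}
```

Because the test and the migration are separate steps, a CPU that leaves the mask between them is still missed, which is the residual race the message describes.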

Fixes: 4ff9083b ("sched/core: WARN() when migrating to an offline CPU")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

15 Jun 2023, 5 commits

sched: Fix negative count for jump label · cde6dbb8

Committed by Hui Tang on Jun 15, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7DA63
CVE: NA

--------------------------------

Add a mutex to prevent a negative jump label count.
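The race the mutex closes can be sketched in userspace, assuming the lock serializes the whole read-modify-write of a task group's mode. key_count, mode_mutex, struct tg_sim and tg_set_mode_sim() are hypothetical stand-ins for the jump label count and tg_set_dynamic_affinity_mode(). This message does not show where the actual patch places the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the jump label's enable count. */
static int key_count;
static pthread_mutex_t mode_mutex = PTHREAD_MUTEX_INITIALIZER;

struct tg_sim {
	bool dynamic_affinity;        /* current mode of the task group */
};

/*
 * Compare the old mode and adjust the count under one lock, so only a real
 * off->on or on->off transition touches key_count. Without the lock, two
 * writers could both see "on" and both decrement.
 */
static void tg_set_mode_sim(struct tg_sim *tg, bool enable)
{
	pthread_mutex_lock(&mode_mutex);
	if (tg->dynamic_affinity != enable) {
		if (enable)
			key_count++;          /* models static_key_slow_inc() */
		else
			key_count--;          /* models static_key_slow_dec() */
		tg->dynamic_affinity = enable;
	}
	assert(key_count >= 0);       /* "jump label: negative count!" must never fire */
	pthread_mutex_unlock(&mode_mutex);
}
```

Checking the old mode under the same lock that protects the counter keeps each increment paired with exactly one decrement.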

[28612.530675] ------------[ cut here ]------------
[28612.532708] jump label: negative count!
[28612.535031] WARNING: CPU: 4 PID: 3899 at kernel/jump_label.c:202
	__static_key_slow_dec_cpuslocked+0x204/0x240
[28612.538216] Kernel panic - not syncing: panic_on_warn set ...
[28612.538216]
[28612.540487] CPU: 4 PID: 3899 Comm: sh Kdump: loaded Not tainted
[28612.542788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
[28612.546455] Call Trace:
[28612.547339]  dump_stack+0xc6/0x11e
[28612.548546]  ? __static_key_slow_dec_cpuslocked+0x200/0x240
[28612.550352]  panic+0x1d6/0x46b
[28612.551375]  ? refcount_error_report+0x2a5/0x2a5
[28612.552915]  ? kmsg_dump_rewind_nolock+0xde/0xde
[28612.554358]  ? sched_clock_cpu+0x18/0x1b0
[28612.555699]  ? __warn+0x1d1/0x210
[28612.556799]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
[28612.558548]  __warn+0x1ec/0x210
[28612.559621]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
[28612.561536]  report_bug+0x1ee/0x2b0
[28612.562706]  fixup_bug.part.4+0x37/0x80
[28612.563937]  do_error_trap+0x21c/0x260
[28612.565109]  ? fixup_bug.part.4+0x80/0x80
[28612.566453]  ? check_preemption_disabled+0x34/0x1f0
[28612.567991]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[28612.569534]  ? lockdep_hardirqs_off+0x1cb/0x2b0
[28612.570993]  ? error_entry+0x9a/0x130
[28612.572138]  ? trace_hardirqs_off_caller+0x59/0x1a0
[28612.573710]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[28612.575232]  invalid_op+0x14/0x20
[28612.576387]  ? vprintk_func+0x68/0x1a0
[28612.577827]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
[28612.579662]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
[28612.581781]  ? static_key_disable+0x30/0x30
[28612.583248]  ? static_key_slow_dec+0x57/0x90
[28612.584997]  ? tg_set_dynamic_affinity_mode+0x42/0x70
[28612.586714]  ? cgroup_file_write+0x471/0x6a0
[28612.588162]  ? cgroup_css.part.4+0x100/0x100
[28612.589579]  ? cgroup_css.part.4+0x100/0x100
[28612.591031]  ? kernfs_fop_write+0x2af/0x430
[28612.592625]  ? kernfs_vma_page_mkwrite+0x230/0x230
[28612.594274]  ? __vfs_write+0xef/0x680
[28612.595590]  ? kernel_read+0x110/0x110
[28612.596899]  ? check_preemption_disabled+0x34/0x1f0
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

sched: Fix possible deadlock in tg_set_dynamic_affinity_mode · 21e5d85e

Committed by Hui Tang on Jun 15, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CGD0
CVE: NA

----------------------------------------

A deadlock can occur in the following two cases:

The first case:

tg_set_dynamic_affinity_mode    --- raw_spin_lock_irq(&auto_affi->lock);
	->start_auto_affinity   --- trigger timer
		->tg_update_task_prefer_cpus
			->css_task_iter_next
				->raw_spin_unlock_irq

hrtimer_run_queues
  ->sched_auto_affi_period_timer --- tries to take the spinlock (&auto_affi->lock)

The second case:

[  291.470810] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  291.472715] rcu:     1-...0: (0 ticks this GP) idle=a6a/1/0x4000000000000002 softirq=78516/78516 fqs=5249
[  291.475268] rcu:     (detected by 6, t=21006 jiffies, g=202169, q=9862)
[  291.477038] Sending NMI from CPU 6 to CPUs 1:
[  291.481268] NMI backtrace for cpu 1
[  291.481273] CPU: 1 PID: 1923 Comm: sh Kdump: loaded Not tainted 4.19.90+ #150
[  291.481278] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[  291.481281] RIP: 0010:queued_spin_lock_slowpath+0x136/0x9a0
[  291.481289] Code: c0 74 3f 49 89 dd 48 89 dd 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 83 e5 07 49 01 c5 83 c5 03 48 83 05 c4 66 b9 05 01 f3 90 <41> 0f b6 45 00 40 38 c5 7c 08 84 c0 0f 85 ad 07 00 00 0
[  291.481292] RSP: 0018:ffff88801de87cd8 EFLAGS: 00000002
[  291.481297] RAX: 0000000000000101 RBX: ffff888001be0a28 RCX: ffffffffb8090f7d
[  291.481301] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888001be0a28
[  291.481304] RBP: 0000000000000003 R08: ffffed100037c146 R09: ffffed100037c146
[  291.481307] R10: 000000001106b143 R11: ffffed100037c145 R12: 1ffff11003bd0f9c
[  291.481311] R13: ffffed100037c145 R14: fffffbfff7a38dee R15: dffffc0000000000
[  291.481315] FS:  00007fac4f306740(0000) GS:ffff88801de80000(0000) knlGS:0000000000000000
[  291.481318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  291.481321] CR2: 00007fac4f4bb650 CR3: 00000000046b6000 CR4: 00000000000006e0
[  291.481323] Call Trace:
[  291.481324]  <IRQ>
[  291.481326]  ? osq_unlock+0x2a0/0x2a0
[  291.481329]  ? check_preemption_disabled+0x4c/0x290
[  291.481331]  ? rcu_accelerate_cbs+0x33/0xed0
[  291.481333]  _raw_spin_lock_irqsave+0x83/0xa0
[  291.481336]  sched_auto_affi_period_timer+0x251/0x820
[  291.481338]  ? __remove_hrtimer+0x151/0x200
[  291.481340]  __hrtimer_run_queues+0x39d/0xa50
[  291.481343]  ? tg_update_affinity_domain_down+0x460/0x460
[  291.481345]  ? enqueue_hrtimer+0x2e0/0x2e0
[  291.481348]  ? ktime_get_update_offsets_now+0x1d7/0x2c0
[  291.481350]  hrtimer_run_queues+0x243/0x470
[  291.481352]  run_local_timers+0x5e/0x150
[  291.481354]  update_process_times+0x36/0xb0
[  291.481357]  tick_sched_handle.isra.4+0x7c/0x180
[  291.481359]  tick_nohz_handler+0xd1/0x1d0
[  291.481365]  smp_apic_timer_interrupt+0x12c/0x4e0
[  291.481368]  apic_timer_interrupt+0xf/0x20
[  291.481370]  </IRQ>
[  291.481372]  ? smp_call_function_many+0x68c/0x840
[  291.481375]  ? smp_call_function_many+0x6ab/0x840
[  291.481377]  ? arch_unregister_cpu+0x60/0x60
[  291.481379]  ? native_set_fixmap+0x100/0x180
[  291.481381]  ? arch_unregister_cpu+0x60/0x60
[  291.481384]  ? set_task_select_cpus+0x116/0x940
[  291.481386]  ? smp_call_function+0x53/0xc0
[  291.481388]  ? arch_unregister_cpu+0x60/0x60
[  291.481390]  ? on_each_cpu+0x49/0xf0
[  291.481393]  ? set_task_select_cpus+0x115/0x940
[  291.481395]  ? text_poke_bp+0xff/0x180
[  291.481397]  ? poke_int3_handler+0xc0/0xc0
[  291.481400]  ? __set_prefer_cpus_ptr.constprop.4+0x1cd/0x900
[  291.481402]  ? hrtick+0x1b0/0x1b0
[  291.481404]  ? set_task_select_cpus+0x115/0x940
[  291.481407]  ? __jump_label_transform.isra.0+0x3a1/0x470
[  291.481409]  ? kernel_init+0x280/0x280
[  291.481411]  ? kasan_check_read+0x1d/0x30
[  291.481413]  ? mutex_lock+0x96/0x100
[  291.481415]  ? __mutex_lock_slowpath+0x30/0x30
[  291.481418]  ? arch_jump_label_transform+0x52/0x80
[  291.481420]  ? set_task_select_cpus+0x115/0x940
[  291.481422]  ? __jump_label_update+0x1a1/0x1e0
[  291.481424]  ? jump_label_update+0x2ee/0x3b0
[  291.481427]  ? static_key_slow_inc_cpuslocked+0x1c8/0x2d0
[  291.481430]  ? start_auto_affinity+0x190/0x200
[  291.481432]  ? tg_set_dynamic_affinity_mode+0xad/0xf0
[  291.481435]  ? cpu_affinity_mode_write_u64+0x22/0x30
[  291.481437]  ? cgroup_file_write+0x46f/0x660
[  291.481439]  ? cgroup_init_cftypes+0x300/0x300
[  291.481441]  ? __mutex_lock_slowpath+0x30/0x30
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

sched: fix WARN found by deadlock detect · 217edab9

Committed by Hui Tang on Jun 15, 2023

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
CVE: NA

----------------------------------------

The following WARNING is reported when running:
echo 1 > /sys/fs/cgroup/cpu/cpu.dynamic_affinity_mode

[  147.276757] WARNING: CPU: 5 PID: 1770 at kernel/cpu.c:326 \
	lockdep_assert_cpus_held+0xac/0xd0
[  147.279670] Kernel panic - not syncing: panic_on_warn set ...
[  147.279670]
[  147.282211] CPU: 5 PID: 1770 Comm: bash Kdump: loaded Not tainted 4.19
[  147.284796] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)..
[  147.290963] Call Trace:
[  147.292459]  dump_stack+0xc6/0x11e
[  147.294295]  ? lockdep_assert_cpus_held+0xa0/0xd0
[  147.296876]  panic+0x1d6/0x46b
[  147.298591]  ? refcount_error_report+0x2a5/0x2a5
[  147.301131]  ? kmsg_dump_rewind_nolock+0xde/0xde
[  147.303738]  ? sched_clock_cpu+0x18/0x1b0
[  147.305943]  ? __warn+0x1d1/0x210
[  147.307831]  ? lockdep_assert_cpus_held+0xac/0xd0
[  147.310469]  __warn+0x1ec/0x210
[  147.312271]  ? lockdep_assert_cpus_held+0xac/0xd0
[  147.314838]  report_bug+0x1ee/0x2b0
[  147.316798]  fixup_bug.part.4+0x37/0x80
[  147.318946]  do_error_trap+0x21c/0x260
[  147.321062]  ? fixup_bug.part.4+0x80/0x80
[  147.323253]  ? check_preemption_disabled+0x34/0x1f0
[  147.324886]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  147.326277]  ? lockdep_hardirqs_off+0x1cb/0x2b0
[  147.327505]  ? error_entry+0x9a/0x130
[  147.328523]  ? trace_hardirqs_off_caller+0x59/0x1a0
[  147.329844]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  147.331124]  invalid_op+0x14/0x20
[  147.332057]  ? vprintk_func+0x68/0x1a0
[  147.333082]  ? lockdep_assert_cpus_held+0xac/0xd0
[  147.334355]  ? lockdep_assert_cpus_held+0xac/0xd0
[  147.335624]  ? static_key_slow_inc_cpuslocked+0x5a/0x230
[  147.337079]  ? tg_set_dynamic_affinity_mode+0x4f/0x70
[  147.338444]  ? cgroup_file_write+0x471/0x6a0
[  147.339604]  ? cgroup_css.part.4+0x100/0x100
[  147.340782]  ? cgroup_css.part.4+0x100/0x100
[  147.341943]  ? kernfs_fop_write+0x2af/0x430
[  147.343083]  ? kernfs_vma_page_mkwrite+0x230/0x230
[  147.344401]  ? __vfs_write+0xef/0x680
[  147.345404]  ? kernel_read+0x110/0x110
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

sched: fix smart grid usage count · d9099163

Committed by Hui Tang on Jun 15, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7D98G
CVE: NA

----------------------------------------

smart_grid_usage_dec() should be called when a task group is freed
if its mode is auto.
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

sched: Add static key to reduce noise · 373fd236

Committed by Hui Tang on Jun 15, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7A718

--------------------------------

Add a static key to reduce noise when dynamic affinity is not enabled.
This gives better performance in some cases, such as lmbench.

Fixes: 243865da ("cpuset: Introduce new interface for scheduler ...")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

09 Jun 2023, 2 commits

sched: smart grid: init sched_grid_qos structure on QOS purpose · ce35ded5

Committed by Wang ShaoBo on Jun 09, 2023

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
CVE: NA

----------------------------------------

As smart grid scheduling (SGS) may shrink resources and affect task QOS,
we provide methods for evaluating task QOS in each divided grid. We mainly
focus on the following two aspects:

   1. Evaluate whether resources (such as CPU or memory) meet our demand
   2. Ensure minimal impact when working with the (cpufreq and cpuidle) governors

To tackle these problems, we have summarized several sampling methods that
obtain tasks' characteristics while reducing scheduling noise as much as
possible:

  1. We detect the key factors that determine how sensitive a process is to
     cpufreq or cpuidle adjustment, and use them to guide the cpufreq/cpuidle
     governor
  2. We dynamically monitor process memory bandwidth and adjust memory
     allocation to minimize remote memory access
  3. We provide a variety of load tracking mechanisms to adapt to different
     types of task load change

     ---------------------------------     -----------------
    |            class A              |   |     class B     |
    |    --------        --------     |   |     --------    |
    |   | group0 |      | group1 |    |---|    | group2 |   |----------+
    |    --------        --------     |   |     --------    |          |
    |    CPU/memory sensitive type    |   |   balance type  |          |
     ----------------+----------------     --------+--------           |
                     v                             v                   | (target cpufreq)
     -------------------------------------------------------           | (sensitivity)
    |              Not satisfied with QOS?                  |          |
     --------------------------+----------------------------           |
                               v                                       v
     -------------------------------------------------------     ----------------
    |              expand or shrink resource                |<--|  energy model  |
     ----------------------------+--------------------------     ----------------
                                 v                                     |
     -----------          -----------          ------------            v
    |           |        |           |        |            |     ---------------
    |   GRID0   +--------+   GRID1   +--------+   GRID2    |<-- |   governor    |
    |           |        |           |        |            |     ---------------
     -----------          -----------          ------------
                   \            |            /
                    \  -------------------  /
                      |  pages migration  |
                       -------------------

We will introduce the energy model in a follow-up implementation, and
consider dynamic affinity adjustment between the divided grids at runtime.
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

sched: Introduce smart grid scheduling strategy for cfs · 713cfd26

Committed by Hui Tang on Jun 09, 2023

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
CVE: NA

----------------------------------------

We want to dynamically expand or shrink the affinity range of tasks
based on the CPU topology level while meeting the minimum resource
requirements of tasks.

We divide the affinity domains into several levels according to the sched domains:

level4   * SOCKET  [                                                  ]
level3   * DIE     [                             ]
level2   * MC      [             ] [             ]
level1   * SMT     [     ] [     ] [     ] [     ]
level0   * CPU      0   1   2   3   4   5   6   7
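A sketch of choosing the narrowest level that still meets a group's CPU demand. It assumes a hypothetical 8-CPU machine (SMT pairs, MC groups of four, one DIE of eight CPUs; SOCKET omitted) and looks only at the domains that contain CPU 0. level_mask_cpu0[], mask_weight() and pick_level() are illustrative names. The real smart grid policy also weighs QOS and the energy model, which this ignores. __builtin_popcountll() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical masks for the affinity domains that contain CPU 0 on an
 * 8-CPU machine; real masks would come from the sched domains.
 */
static const uint64_t level_mask_cpu0[] = {
	0x01,   /* level0 CPU: {0}    */
	0x03,   /* level1 SMT: {0,1}  */
	0x0f,   /* level2 MC:  {0..3} */
	0xff,   /* level3 DIE: {0..7} */
};
#define NR_LEVELS (sizeof(level_mask_cpu0) / sizeof(level_mask_cpu0[0]))

static int mask_weight(uint64_t mask)
{
	return __builtin_popcountll(mask);   /* number of CPUs in the mask */
}

/*
 * Return the lowest level whose domain has at least need_cpus CPUs,
 * falling back to the widest level when none is large enough.
 */
static int pick_level(int need_cpus)
{
	assert(need_cpus > 0);
	for (unsigned int i = 0; i < NR_LEVELS; i++)
		if (mask_weight(level_mask_cpu0[i]) >= need_cpus)
			return (int)i;
	return (int)NR_LEVELS - 1;
}
```

For example, a demand of 3 CPUs lands on level2 (MC), since SMT offers only 2.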

Whether users prefer power saving or performance affects the strategy for
adjusting affinity. When the power saving mode is selected, we choose a more
appropriate affinity based on the energy model to reduce power consumption,
while still considering the QOS of resources such as CPU and memory. For
instance, if the current task's CPU load is less than required, smart grid
decides, according to the energy model, whether to aggregate tasks into a
smaller range.

The main difference from EAS is that we pay more attention to the power
impact of mechanisms such as cpuidle and DVFS, and classify tasks to reduce
interference and ensure resource QOS in each divided unit. This makes the
approach more suitable for general-purpose, non-heterogeneous CPUs.

        --------        --------        --------
       | group0 |      | group1 |      | group2 |
        --------        --------        --------
	   |                |              |
	   v                |              v
       ---------------------+-----     -----------------
      |                  ---v--   |   |
      |       DIE0      |  MC1 |  |   |   DIE1
      |                  ------   |   |
       ---------------------------     -----------------

We periodically measure how well each group's resource requirements are
satisfied and adjust its affinity accordingly. Scheduling balance and memory
migration are considered based on memory location, to better meet resource
requirements.
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

08 May 2023, 1 commit

sched_getaffinity: don't assume 'cpumask_size()' is fully initialized · e7b1f698

Committed by Linus Torvalds on May 08, 2023

stable inclusion
from stable-v4.19.280
commit 178ff87d2a0c2d3d74081e1c2efbb33b3487267d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I715PM
CVE: NA

--------------------------------

[ Upstream commit 6015b1ac ]

The getaffinity() system call uses 'cpumask_size()' to decide how big
the CPU mask is - so far so good.  It is indeed the allocation size of a
cpumask.

But the code also assumes that the whole allocation is initialized
without actually doing so itself.  That's wrong, because we might have
fixed-size allocations (making copying and clearing more efficient), but
not all of it is then necessarily used if 'nr_cpu_ids' is smaller.

Having checked other users of 'cpumask_size()', they all seem to be ok,
either using it purely for the allocation size, or explicitly zeroing
the cpumask before using the size in bytes to copy it.

See for example the ublk_ctrl_get_queue_affinity() function that uses
the proper 'zalloc_cpumask_var()' to make sure that the whole mask is
cleared, whether the storage is on the stack or if it was an external
allocation.

Fix this by just zeroing the allocation before using it.  Do the same
for the compat version of sched_getaffinity(), which had the same logic.
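The idea of the fix, sketched in userspace: clear the whole fixed-size allocation before setting the bits of the CPUs that exist, so every byte later copied out is defined. MASK_BYTES and get_affinity_sim() are hypothetical. calloc() would be the closer userspace counterpart of allocating with zalloc_cpumask_var().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MASK_BYTES 16   /* hypothetical fixed allocation size, like cpumask_size() */

/*
 * Zero the whole fixed-size buffer first, then set only the bits of the
 * nr_cpu_ids CPUs that exist, so the bytes past the last used CPU are
 * zero rather than stale heap contents when the buffer is copied out.
 */
static unsigned char *get_affinity_sim(int nr_cpu_ids, const int *allowed)
{
	assert(nr_cpu_ids <= MASK_BYTES * 8);
	unsigned char *mask = malloc(MASK_BYTES);
	if (!mask)
		return NULL;
	memset(mask, 0, MASK_BYTES);      /* the fix: clear before use */
	for (int cpu = 0; cpu < nr_cpu_ids; cpu++)
		if (allowed[cpu])
			mask[cpu / 8] |= (unsigned char)(1u << (cpu % 8));
	return mask;
}
```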

Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to
access the bits.  For a cpumask_var_t, it ends up being a pointer to the
same data either way, but it's just a good idea to treat it like you
would a 'cpumask_t'.  The compat case already did that.
Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/lkml/7d026744-6bd6-6827-0471-b5e8eae0be3f@arm.com/
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

06 Apr 2023, 3 commits

sched/fair: Sanitize vruntime of entity being migrated · 2ff1290e

Committed by Vincent Guittot on Apr 06, 2023

mainline inclusion
from mainline-v6.3-rc4
commit a53ce18c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TE76
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=a53ce18cacb477dd0513c607f187d16f0fa96f71

--------------------------------

Commit 829c1651 ("sched/fair: sanitize vruntime of entity being placed")
fixes an overflow bug, but ignores the case where se->exec_start is reset
after a migration.

To fix this case, we delay the reset of se->exec_start until after placing
the entity, which uses se->exec_start to detect long-sleeping tasks.

In order to take into account a possible divergence between the clock_task
of 2 rqs, we increase the threshold to around 104 days.
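The arithmetic behind "around 104 days" rests on two assumptions: the cutoff is 2^63 / scale_load_down(NICE_0_LOAD) nanoseconds of clock_task, and scale_load_down(NICE_0_LOAD) = 1024. The formula is taken from our reading of the upstream code comment, not from this message, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed cutoff: 2^63 / scale_load_down(NICE_0_LOAD) ns of clock_task,
 * with scale_load_down(NICE_0_LOAD) taken as 1024, i.e. 2^53 ns.
 */
static uint64_t long_sleep_cutoff_ns(void)
{
	return (UINT64_C(1) << 63) / 1024;
}

/* Convert the cutoff to days: ns -> s -> days. */
static double long_sleep_cutoff_days(void)
{
	return (double)long_sleep_cutoff_ns() / 1e9 / 86400.0;
}
```

2^53 ns is about 9.007e6 s, or roughly 104.25 days.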

Fixes: 829c1651 ("sched/fair: sanitize vruntime of entity being placed")
Originally-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Link: https://lore.kernel.org/r/20230317160810.107988-1-vincent.guittot@linaro.org
Signed-off-by: Songping Yu <yusongping@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

sched/fair: sanitize vruntime of entity being placed · 9b35c87f

Committed by Zhang Qiao on Apr 06, 2023

mainline inclusion
from mainline-v6.3-rc4
commit 829c1651
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TE76
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=829c1651e9c4a6f78398d3e67651cef9bb6b42cc

--------------------------------

When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
to the base level (around cfs_rq->min_vruntime), so that the entity
doesn't gain extra boost when placed backwards.

However, if the entity being placed wasn't executed for a long time, its
vruntime may get too far behind (e.g. while cfs_rq was executing a
low-weight hog), which can invert the vruntime comparison due to s64
overflow.  This results in the entity being placed with its original
vruntime way forwards, so that it will effectively never get to the cpu.

To prevent that, ignore the vruntime of the entity being placed if it
didn't execute for much longer than the characteristic scheduler time
scale.
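The overflow can be reproduced with a copy of the wrap-around comparison used by the kernel's max_vruntime() helper. In the sketch, the base is just over 2^63 ahead of the entity's stale vruntime, so the signed difference wraps negative and the stale value wins. Comparisons of the entity_before() kind then treat that entity as far in the future. place_vruntime() and its long_sleeper flag are hypothetical simplifications of the sleep-time test in place_entity().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Same comparison as the kernel's max_vruntime(): the unsigned difference
 * is reinterpreted as signed (GCC and Clang wrap on this conversion).
 */
static uint64_t max_vruntime(uint64_t max_vr, uint64_t vr)
{
	int64_t delta = (int64_t)(vr - max_vr);

	if (delta > 0)
		max_vr = vr;
	return max_vr;
}

/*
 * Hypothetical simplification of the placement rule: clamp the entity's
 * vruntime up to the base, unless it slept too long, in which case its
 * old vruntime is ignored altogether.
 */
static uint64_t place_vruntime(uint64_t se_vr, uint64_t base, bool long_sleeper)
{
	if (long_sleeper)
		return base;
	return max_vruntime(se_vr, base);
}
```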

[rkagan: formatted, adjusted commit log, comments, cutoff value]
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Co-developed-by: Roman Kagan <rkagan@amazon.de>
Signed-off-by: Roman Kagan <rkagan@amazon.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.de
Signed-off-by: Songping Yu <yusongping@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: chenhui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

Revert "sched: Reinit task's vruntime if a task sleep over 200 days" · c0f17a99

Committed by Songping Yu on Apr 06, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TE76
CVE: NA

--------------------------------

This reverts commit 3a44e838.
Signed-off-by: Songping Yu <yusongping@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: chenhui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

09 Feb 2023, 1 commit

sched/rt: Optimize checking group RT scheduler constraints · 4cf97d6f

Committed by Konstantin Khlebnikov on Feb 09, 2023

mainline inclusion
from mainline-v5.7-rc1
commit b4fb015e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6DZRO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b4fb015eeff7f3e5518a7dbe8061169a3e2f2bc7

--------------------------------

Group RT scheduler contains protection against setting zero runtime for
cgroup with RT tasks. Right now function tg_set_rt_bandwidth() iterates
over all CPU cgroups and calls tg_has_rt_tasks() for any cgroup whose
runtime is zero (not only for the changed one). The default RT runtime is
zero, so tg_has_rt_tasks() is called for almost all CPU cgroups.

This protection already is slightly racy: runtime limit could be changed
between cpu_cgroup_can_attach() and cpu_cgroup_attach() because changing
cgroup attribute does not lock cgroup_mutex while attach does not lock
rt_constraints_mutex. Changing task scheduler class also races with
changing rt runtime: check in __sched_setscheduler() isn't protected.

Function tg_has_rt_tasks() iterates over all threads in the system.
This gives NR_CGROUPS * NR_TASKS operations under a single tasklist_lock
held for read in tg_set_rt_bandwidth(). Any concurrent attempt to lock
tasklist_lock for write (for example, fork) will be stuck with IRQs disabled.

This patch makes two optimizations:
1) Remove locking tasklist_lock and iterate only tasks in cgroup
2) Call tg_has_rt_tasks() iff rt runtime changes from non-zero to zero

All changed code is under CONFIG_RT_GROUP_SCHED.
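Optimization 2) amounts to a guard that only performs the expensive task walk on a non-zero to zero transition. struct rt_tg_sim, tg_has_rt_tasks_sim() and set_rt_runtime_sim() are hypothetical simplifications of tg_set_rt_bandwidth() and tg_has_rt_tasks(). Returning -1 stands in for the kernel's error code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one group's RT bandwidth state. */
struct rt_tg_sim {
	long long rt_runtime;   /* 0 means no RT runtime for this group */
	int nr_rt_tasks;        /* RT tasks attached to this group */
};

static int rt_walks;            /* counts the expensive task walks */

static bool tg_has_rt_tasks_sim(const struct rt_tg_sim *tg)
{
	rt_walks++;             /* the kernel walks the group's tasks here */
	return tg->nr_rt_tasks > 0;
}

/*
 * Only walk the tasks when the runtime changes from non-zero to zero.
 * Returns -1 if that change would leave RT tasks without runtime,
 * 0 otherwise.
 */
static int set_rt_runtime_sim(struct rt_tg_sim *tg, long long runtime)
{
	if (tg->rt_runtime != 0 && runtime == 0 && tg_has_rt_tasks_sim(tg))
		return -1;
	tg->rt_runtime = runtime;
	return 0;
}
```

Writing 0 to a group whose runtime is already 0, the common default, no longer triggers any task walk.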

Testcase:

 # mkdir /sys/fs/cgroup/cpu/test{1..10000}
 # echo 0 | tee /sys/fs/cgroup/cpu/test*/cpu.rt_runtime_us

At the same time, without the patch, fork time will be >100ms:

 # perf trace -e clone --duration 100 stress-ng --fork 1

Also remote ping will show timings >100ms caused by irq latency.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/157996383820.4651.11292439232549211693.stgit@buzz
Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
Reviewed-by: chenhui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

24 Dec 2022, 1 commit

sched: Reinit task's vruntime if a task sleep over 200 days · 3a44e838

Committed by Zhang Qiao on Dec 24, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I67BL1
CVE: NA

-------------------------------

If a task sleeps for a long time, it may cause an s64 overflow in
max_vruntime(), and the task will be given an incorrect vruntime,
which leads to the task being starved.

To fix it, we set the task's vruntime to cfs_rq->min_vruntime
on wakeup.
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: songping yu <yusongping@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

16 Dec 2022, 1 commit

sched/qos: Don't unthrottle cfs_rq when cfs_rq is throttled by qos · fbea24f5

Committed by Zhang Qiao on Dec 16, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I64OUS
CVE: NA

-------------------------------

When a cfs_rq is throttled by qos, cfs_rq->throttled is set to 1, so
CFS bandwidth control may unthrottle this cfs_rq by mistake, which
triggers a list_del_valid warning.
So add a QOS_THROTTLED (= 2) macro: when a cfs_rq is throttled by
qos, set cfs_rq->throttled to QOS_THROTTLED, and check the value of
cfs_rq->throttled before unthrottling a cfs_rq.
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: zheng zucheng <zhengzucheng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

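A minimal sketch of the tagging scheme, assuming a hypothetical struct cfs_rq_sketch and a hypothetical CFS_THROTTLED name for the existing value 1; only QOS_THROTTLED (= 2) comes from the commit. Each unthrottle path undoes only the kind of throttling it applied.

```c
#include <stdbool.h>

#define CFS_THROTTLED	1	/* hypothetical name: throttled by CFS bandwidth */
#define QOS_THROTTLED	2	/* throttled by the qos scheduler */

struct cfs_rq_sketch {
	int throttled;		/* 0, CFS_THROTTLED or QOS_THROTTLED */
};

/* CFS bandwidth path: only undo throttling that CFS bandwidth applied. */
static bool bw_unthrottle_sketch(struct cfs_rq_sketch *cfs_rq)
{
	if (cfs_rq->throttled != CFS_THROTTLED)
		return false;	/* e.g. QOS_THROTTLED: not ours, leave it alone */
	cfs_rq->throttled = 0;
	return true;
}

/* qos path: only undo qos throttling. */
static bool qos_unthrottle_sketch(struct cfs_rq_sketch *cfs_rq)
{
	if (cfs_rq->throttled != QOS_THROTTLED)
		return false;
	cfs_rq->throttled = 0;
	return true;
}
```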

07 Sep, 2022 · 1 commit

nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt() · 5635594b

Committed by Nicolas Saenz Julienne on Sep 07, 2022

stable inclusion
from stable-v4.19.256
commit 952146183df68a7fb29c3c29d56ffdc941fbdc39
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ
CVE: NA

--------------------------------

[ Upstream commit 5c66d1b9 ]

dequeue_task_rt() only decrements 'rt_rq->rt_nr_running' after having
called sched_update_tick_dependency() preventing it from re-enabling the
tick on systems that no longer have pending SCHED_RT tasks but have
multiple runnable SCHED_OTHER tasks:

  dequeue_task_rt()
    dequeue_rt_entity()
      dequeue_rt_stack()
        dequeue_top_rt_rq()
	  sub_nr_running()	// decrements rq->nr_running
	    sched_update_tick_dependency()
	      sched_can_stop_tick()	// checks rq->rt.rt_nr_running,
	      ...
        __dequeue_rt_entity()
          dec_rt_tasks()	// decrements rq->rt.rt_nr_running
	  ...

Every other scheduler class performs the operation in the opposite
order, and sched_update_tick_dependency() expects the values to be
updated as such. So avoid the misbehaviour by inverting the order in
which the above operations are performed in the RT scheduler.

Fixes: 76d92ac3 ("sched: Migrate sched to use new tick dependency mask model")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220628092259.330171-1-nsaenzju@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

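The ordering problem can be modelled with two counters. This sketch assumes a simplified runqueue holding only SCHED_FIFO and CFS tasks and reduces the tick check to the FIFO/CFS part of sched_can_stop_tick(); all _sketch names are hypothetical.

```c
#include <stdbool.h>

/* Simplified runqueue: FIFO RT tasks plus CFS tasks, no DL or RR. */
struct rq_sketch {
	unsigned int nr_running;	/* all runnable tasks */
	unsigned int rt_nr_running;	/* runnable SCHED_FIFO tasks */
	bool may_stop_tick;		/* result of the last tick re-evaluation */
};

/* FIFO tasks need no forced preemption; more than one CFS task does. */
static bool can_stop_tick_sketch(const struct rq_sketch *rq)
{
	if (rq->rt_nr_running)
		return true;
	return rq->nr_running <= 1;
}

/* Decrement the total and re-evaluate the tick dependency. */
static void sub_nr_running_sketch(struct rq_sketch *rq)
{
	rq->nr_running--;
	rq->may_stop_tick = can_stop_tick_sketch(rq);
}

/* Order before the fix: total count first, RT count afterwards. */
static void dequeue_rt_old_order(struct rq_sketch *rq)
{
	sub_nr_running_sketch(rq);
	rq->rt_nr_running--;
}

/* Order after the fix: RT count first, as the other classes do. */
static void dequeue_rt_new_order(struct rq_sketch *rq)
{
	rq->rt_nr_running--;
	sub_nr_running_sketch(rq);
}
```

Starting from one FIFO task and two CFS tasks (nr_running = 3, rt_nr_running = 1), the old order evaluates the tick with a stale rt_nr_running of 1 and lets the tick stop, while the new order sees two CFS tasks and keeps it.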

15 Aug, 2022 · 1 commit

sched: Fix null-ptr-deref in free_fair_sched_group · 0d2df28e

Committed by Hui Tang on Aug 15, 2022

hulk inclusion
category: bugfix
bugzilla: 187419, https://gitee.com/openeuler/kernel/issues/I5LIPL
CVE: NA

-------------------------------

==================================================================
BUG: KASAN: null-ptr-deref in rq_of kernel/sched/sched.h:1118 [inline]
BUG: KASAN: null-ptr-deref in unthrottle_qos_sched_group kernel/sched/fair.c:7619 [inline]
BUG: KASAN: null-ptr-deref in free_fair_sched_group+0x124/0x320 kernel/sched/fair.c:12131
Read of size 8 at addr 0000000000000130 by task syz-executor100/223

CPU: 3 PID: 223 Comm: syz-executor100 Not tainted 5.10.0 #6
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132
 show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b4/0x248 lib/dump_stack.c:118
 __kasan_report mm/kasan/report.c:551 [inline]
 kasan_report+0x18c/0x210 mm/kasan/report.c:564
 check_memory_region_inline mm/kasan/generic.c:187 [inline]
 __asan_load8+0x98/0xc0 mm/kasan/generic.c:253
 rq_of kernel/sched/sched.h:1118 [inline]
 unthrottle_qos_sched_group kernel/sched/fair.c:7619 [inline]
 free_fair_sched_group+0x124/0x320 kernel/sched/fair.c:12131
 sched_free_group kernel/sched/core.c:7767 [inline]
 sched_create_group+0x48/0xc0 kernel/sched/core.c:7798
 cpu_cgroup_css_alloc+0x18/0x40 kernel/sched/core.c:7930
 css_create+0x7c/0x4a0 kernel/cgroup/cgroup.c:5328
 cgroup_apply_control_enable+0x288/0x340 kernel/cgroup/cgroup.c:3135
 cgroup_apply_control kernel/cgroup/cgroup.c:3217 [inline]
 cgroup_subtree_control_write+0x668/0x8b0 kernel/cgroup/cgroup.c:3375
 cgroup_file_write+0x1a8/0x37c kernel/cgroup/cgroup.c:3909
 kernfs_fop_write_iter+0x220/0x2f4 fs/kernfs/file.c:296
 call_write_iter include/linux/fs.h:1960 [inline]
 new_sync_write+0x260/0x370 fs/read_write.c:515
 vfs_write+0x3dc/0x4ac fs/read_write.c:602
 ksys_write+0xfc/0x200 fs/read_write.c:655
 __do_sys_write fs/read_write.c:667 [inline]
 __se_sys_write fs/read_write.c:664 [inline]
 __arm64_sys_write+0x50/0x60 fs/read_write.c:664
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common.constprop.0+0xf4/0x414 arch/arm64/kernel/syscall.c:155
 do_el0_svc+0x50/0x11c arch/arm64/kernel/syscall.c:217
 el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353
 el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369
 el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683

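Here free_fair_sched_group() is reached from the error path of
sched_create_group(), where some tg->cfs_rq[i] may still be NULL.
So add a check for tg->cfs_rq[i] before calling
unthrottle_qos_sched_group().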
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

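A userspace sketch of why the NULL check matters, assuming a hypothetical struct tg_sketch whose per-CPU allocation can fail part-way (as a fault-injecting fuzzer can provoke). The teardown visits only the slots that were actually allocated; the kernel's free_fair_sched_group() does more work per CPU than this.

```c
#include <stdlib.h>

/* Hypothetical per-CPU object standing in for struct cfs_rq. */
struct cfs_rq_sketch {
	int cpu;
};

struct tg_sketch {
	int nr_cpus;
	struct cfs_rq_sketch **cfs_rq;	/* per-CPU pointers, may be partly NULL */
};

/* Allocation that can fail part-way (fail_at < 0: never), leaving later slots NULL. */
static int alloc_group_sketch(struct tg_sketch *tg, int nr_cpus, int fail_at)
{
	tg->nr_cpus = nr_cpus;
	tg->cfs_rq = calloc((size_t)nr_cpus, sizeof(*tg->cfs_rq));
	if (!tg->cfs_rq)
		return 0;
	for (int i = 0; i < nr_cpus; i++) {
		if (i == fail_at)
			return 0;	/* simulated allocation failure */
		tg->cfs_rq[i] = malloc(sizeof(**tg->cfs_rq));
		if (!tg->cfs_rq[i])
			return 0;
		tg->cfs_rq[i]->cpu = i;
	}
	return 1;
}

/* Teardown that tolerates a partially built group; returns slots visited. */
static int free_group_sketch(struct tg_sketch *tg)
{
	int visited = 0;

	if (!tg->cfs_rq)
		return 0;
	for (int i = 0; i < tg->nr_cpus; i++) {
		if (!tg->cfs_rq[i])
			continue;	/* the added check: never dereference a NULL slot */
		/* per-CPU teardown (e.g. qos unthrottle) would use tg->cfs_rq[i] here */
		visited++;
		free(tg->cfs_rq[i]);
	}
	free(tg->cfs_rq);
	tg->cfs_rq = NULL;
	return visited;
}
```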

21 Jul, 2022 · 4 commits

sched: Add statistics for scheduler dynamic affinity · ebca52ab

Committed by Hui Tang on Jul 21, 2022

hulk inclusion
category: feature
bugzilla: 187173, https://gitee.com/openeuler/kernel/issues/I5G4IH
CVE: NA

--------------------------------
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


sched: Adjust cpu range in load balance dynamicly · 2af15a46

Committed by Hui Tang on Jul 21, 2022

hulk inclusion
category: feature
bugzilla: 187173, https://gitee.com/openeuler/kernel/issues/I5G4IH
CVE: NA

--------------------------------
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


sched: Adjust wakeup cpu range according CPU util dynamicly · 70a232a5

Committed by Hui Tang on Jul 21, 2022

hulk inclusion
category: feature
bugzilla: 187173, https://gitee.com/openeuler/kernel/issues/I5G4IH
CVE: NA

--------------------------------

Compare the task group's 'util_avg' on the preferred CPUs with the
capacity of the preferred CPUs, and dynamically adjust the CPU range
used in the task wakeup path.
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

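The commit body gives no formula, so the following is only a guess at the shape of the decision: a hypothetical select_wakeup_range_sketch() that keeps wakeups on the preferred CPUs while their summed util_avg stays below an assumed percentage of their summed capacity. Both the summing and threshold_pct are assumptions, not the openEuler implementation.

```c
#include <stdint.h>

enum cpu_range {
	RANGE_PREFERRED,	/* wake up within the preferred CPUs */
	RANGE_ALLOWED,		/* widen to all CPUs the task may use */
};

/*
 * util_sum / cap_sum: summed util_avg and capacity over the preferred CPUs.
 * threshold_pct: assumed tunable, e.g. 85 (%).
 */
static enum cpu_range select_wakeup_range_sketch(uint64_t util_sum,
						 uint64_t cap_sum,
						 unsigned int threshold_pct)
{
	/* Stay on the preferred CPUs while they are below the threshold. */
	if (util_sum * 100 < cap_sum * threshold_pct)
		return RANGE_PREFERRED;
	return RANGE_ALLOWED;
}
```

Comparing util_sum * 100 with cap_sum * threshold_pct avoids a division and keeps the check in integer arithmetic, as scheduler code generally does.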

cpuset: Introduce new interface for scheduler dynamic affinity · 243865da

Committed by Hui Tang on Jul 21, 2022

hulk inclusion
category: feature
bugzilla: 187173, https://gitee.com/openeuler/kernel/issues/I5G4IH
CVE: NA

--------------------------------

Add the 'prefer_cpus' interface file and related interfaces to cgroup cpuset.
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


15 Jun, 2022 · 2 commits

Revert "sched: Fix sched_fork() access an invalid sched_task_group" · 44bf746f

Committed by Zhang Qiao on Jun 15, 2022

hulk inclusion
category: bugfix
bugzilla: 186973, https://gitee.com/openeuler/kernel/issues/I5CA6K
CVE: NA

--------------------------------

This reverts commit 74bd9b82.
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


Revert "sched: Fix yet more sched_fork() races" · 82a543fe

Committed by Zhang Qiao on Jun 15, 2022

hulk inclusion
category: bugfix
bugzilla: 186973, https://gitee.com/openeuler/kernel/issues/I5CA6K
CVE: NA

--------------------------------

This reverts commit af98db5f.
The patch af98db5f ("sched: Fix yet more sched_fork() races") may
cause a process to sleep in cgroup_post_fork()->freezer_fork() while
holding cgroup_threadgroup_rwsem for a long time. Other tasks that
fork child processes then have to wait, and the system stalls.
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


01 Jun, 2022 · 2 commits

sched/fair: Fix enqueue_task_fair() warning some more · 47a6e1c3

Committed by Phil Auld on Jun 01, 2022

mainline inclusion
from mainline-v5.7-rc6
commit b34cb07d
category: bugfix
bugzilla: 91404, https://gitee.com/openeuler/kernel/issues/I59VLJ
CVE: NA

--------------------------------

The recent patch, fe61468b (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list.  In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop. The list is not fixed up because the throttled parent was
already added back to the list by a task enqueue in a parallel child hierarchy.

Address this by calling list_add_leaf_cfs_rq if there are throttled parents
while doing the second for_each_sched_entity loop.

Fixes: fe61468b ("sched/fair: Fix enqueue_task_fair warning")
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20200512135222.GC2201@lorien.usersys.redhat.com
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


sched/fair: Fix enqueue_task_fair warning · b66e423f

Committed by Vincent Guittot on Jun 01, 2022

mainline inclusion
from mainline-v5.6-rc4
commit fe61468b
category: bugfix
bugzilla: 93902, https://gitee.com/openeuler/kernel/issues/I59VLJ
CVE: NA

--------------------------------

When a cfs rq is throttled, the latter and its child are removed from the
leaf list but their nr_running is not changed which includes staying higher
than 1. When a task is enqueued in this throttled branch, the cfs rqs must
be added back in order to ensure correct ordering in the list but this can
only happen if nr_running == 1.

When cfs bandwidth is used, we unconditionally call list_add_leaf_cfs_rq()
when enqueuing an entity to make sure that the complete branch will be
added.

Similarly unthrottle_cfs_rq() can stop adding cfs in the list when a parent
is throttled. Iterate the remaining entity to ensure that the complete
branch will be added in the list.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org #v5.1+
Link: https://lkml.kernel.org/r/20200306135257.25044-1-vincent.guittot@linaro.org
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


23 May, 2022 · 2 commits

sched/qos: Add qos_tg_{throttle,unthrottle}_{up,down} · 453eaea6

Committed by Zhang Qiao on May 23, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZJT
CVE: NA

--------------------------------

1. qos throttling reuses tg_{throttle,unthrottle}_{up,down}, which
can write some cfs-bandwidth fields and may cause unexpected data
errors. So add qos_tg_{throttle,unthrottle}_{up,down} for qos
throttling.

2. Callers of walk_tg_tree_from() must hold the RCU read lock; the
qos throttle path currently does not, so take it now.
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

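A userspace sketch of the walk and of qos-only callbacks, assuming a hypothetical struct tg_node with separate bandwidth and qos counters. It calls down on entry and up on exit, in the spirit of walk_tg_tree_from(), and the qos callbacks never touch the bandwidth field. The locking requirement from point 2 is only noted in a comment, since the sketch is single-threaded.

```c
#include <stddef.h>

/* Hypothetical task-group node with separate bandwidth and qos state. */
struct tg_node {
	struct tg_node *children[4];
	size_t nr_children;
	int bw_throttle_count;		/* owned by CFS bandwidth control */
	int qos_throttle_count;		/* owned by qos throttling */
};

typedef int (*tg_visitor)(struct tg_node *tg, void *data);

/*
 * Call @down when entering a node and @up when leaving it; a non-zero
 * return aborts the walk. The kernel's walk_tg_tree_from() requires
 * the caller to hold rcu_read_lock(); this sketch has no concurrency.
 */
static int walk_tg_tree_sketch(struct tg_node *from, tg_visitor down,
			       tg_visitor up, void *data)
{
	int ret = down(from, data);

	if (ret)
		return ret;
	for (size_t i = 0; i < from->nr_children; i++) {
		ret = walk_tg_tree_sketch(from->children[i], down, up, data);
		if (ret)
			return ret;
	}
	return up(from, data);
}

static int tg_nop_sketch(struct tg_node *tg, void *data)
{
	(void)tg;
	(void)data;
	return 0;
}

/* qos-only callbacks: they never write bw_throttle_count. */
static int qos_tg_throttle_down_sketch(struct tg_node *tg, void *data)
{
	(void)data;
	tg->qos_throttle_count++;
	return 0;
}

static int qos_tg_unthrottle_up_sketch(struct tg_node *tg, void *data)
{
	(void)data;
	tg->qos_throttle_count--;
	return 0;
}
```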

sched: Throttle offline task at tracehook_notify_resume() · 2701a7bb

Committed by Zhang Qiao on May 23, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZJT
CVE: NA

--------------------------------

Previously, when the CPU was detected as overloaded, offline tasks
were throttled in exit_to_user_mode_loop() before returning to user
mode. Some architectures (e.g., arm64) could not support the QOS
scheduler because tasks on these platforms do not return to userspace
via exit_to_user_mode_loop().
To solve this problem and support the qos scheduler on all
architectures, when offline tasks need to be throttled, set the
TIF_NOTIFY_RESUME flag on an offline task when it is picked, and
throttle it in tracehook_notify_resume().
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

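A minimal model of the flag handshake, assuming a hypothetical struct task_sketch and a made-up bit value for the flag; the real TIF_NOTIFY_RESUME bit and the per-architecture exit paths differ. The pick path only marks the task, and the generic resume hook does the throttling.

```c
#include <stdbool.h>

#define TIF_NOTIFY_RESUME_SKETCH	(1u << 0)	/* hypothetical bit value */

struct task_sketch {
	unsigned int flags;	/* thread-info style flag word */
	bool offline;		/* task belongs to an offline (low-priority) group */
	bool throttled;
};

/* At pick time: only mark an offline task when the CPU is overloaded. */
static void pick_task_sketch(struct task_sketch *p, bool cpu_overloaded)
{
	if (p->offline && cpu_overloaded)
		p->flags |= TIF_NOTIFY_RESUME_SKETCH;
}

/* Generic return-to-user hook, reached on every architecture when the flag is set. */
static void notify_resume_sketch(struct task_sketch *p)
{
	if (!(p->flags & TIF_NOTIFY_RESUME_SKETCH))
		return;
	p->flags &= ~TIF_NOTIFY_RESUME_SKETCH;
	if (p->offline)
		p->throttled = true;	/* stand-in for the actual qos throttle */
}
```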

19 Apr, 2022 · 2 commits

sched: Fix yet more sched_fork() races · af98db5f

Committed by Peter Zijlstra on Apr 19, 2022

mainline inclusion
from mainline-v5.17-rc5
commit b1e82065
category: bugfix
bugzilla: 186609, https://gitee.com/openeuler/kernel/issues/I532B0
CVE: NA

--------------------------------

Where commit 4ef0c5c6 ("kernel/sched: Fix sched_fork() access an
invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
race vs syscalls by not placing the task on the runqueue before it
gets exposed through the pidhash.

Commit 13765de8 ("sched/fair: Fix fault in reweight_entity") is
trying to fix a single instance of this, instead fix the whole class
of issues, effectively reverting this commit.

Fixes: 4ef0c5c6 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net

conflict:
	include/linux/sched/task.h
	kernel/fork.c
	kernel/sched/core.c
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


sched/fair: Fix wrong cpu selecting from isolated domain · 742a0b5b

Committed by Xunlei Pang on Apr 19, 2022

mainline inclusion
from mainline-v5.10-rc1
commit df3cb4ea
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I52QKB
CVE: NA

--------------------------------

We've met problems that occasionally tasks with full cpumask
(e.g. by putting it into a cpuset or setting to full affinity)
were migrated to our isolated cpus in production environment.

After some analysis, we found that it is due to the current
select_idle_smt() not considering the sched_domain mask.

Steps to reproduce on my 31-CPU hyperthreads machine:
1. with boot parameter: "isolcpus=domain,2-31"
   (thread lists: 0,16 and 1,17)
2. cgcreate -g cpu:test; cgexec -g cpu:test "test_threads"
3. some threads will be migrated to the isolated cpu16~17.

Fix it by checking the valid domain mask in select_idle_smt().

Fixes: 10e2f1ac ("sched/core: Rewrite and improve select_idle_siblings()")
Reported-by: Wetp Zhang <wetp.zy@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/1600930127-76857-1-git-send-email-xlpang@linux.alibaba.com

Conflicts:
	kernel/sched/fair.c
Signed-off-by: Yu jiahua <yujiahua1@huawei.com>
Reviewed-by: zheng zucheng <zhengzucheng@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

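The reproduction above maps onto a small bitmask model. This sketch assumes at most 64 CPUs held in a plain uint64_t mask and a hypothetical select_idle_smt_sketch(); it shows how also testing the sched_domain span rejects the idle isolated sibling CPU 16 from the example.

```c
#include <stdint.h>

typedef uint64_t cpumask_sketch;	/* bit n set = CPU n in the mask (<= 64 CPUs) */

static int test_cpu(cpumask_sketch m, int cpu)
{
	return (int)((m >> cpu) & 1u);
}

/*
 * Return an idle SMT sibling of the target, or -1. With check_domain set,
 * a sibling outside the sched_domain span (e.g. an isolated CPU) is
 * skipped, which is what the fix adds.
 */
static int select_idle_smt_sketch(cpumask_sketch smt_siblings,
				  cpumask_sketch task_allowed,
				  cpumask_sketch sd_span,
				  cpumask_sketch idle, int check_domain)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (!test_cpu(smt_siblings, cpu))
			continue;
		if (!test_cpu(task_allowed, cpu))
			continue;
		if (check_domain && !test_cpu(sd_span, cpu))
			continue;
		if (test_cpu(idle, cpu))
			return cpu;
	}
	return -1;
}
```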

02 Apr, 2022 · 1 commit

sched/fair: Add qos_throttle_list node in struct cfs_rq · fb59563c

Committed by Zhang Qiao on Apr 02, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I50PPU
CVE: NA

-----------------------------------------------------------------

When distribute_cfs_runtime() unthrottles a cfs_rq, another CPU may
re-throttle the same cfs_rq in qos_throttle_cfs_rq() before
cfs_rq->throttle_list.next is accessed. qos throttling then attaches
the cfs_rq's throttle_list node to the per-CPU qos_throttled_cfs_rq
list, which changes cfs_rq->throttle_list.next and causes a panic or
hard lockup in distribute_cfs_runtime().

Fix it by adding a qos_throttle_list node to struct cfs_rq, so that
qos throttling no longer uses cfs_rq->throttle_list.
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: zheng zucheng <zhengzucheng@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

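A userspace sketch of why one list node cannot sit on two lists, using a minimal hypothetical intrusive list in the style of the kernel's list_head. With separate nodes both lists stay intact; reusing throttle_list for the second list rewrites its next pointer.

```c
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

static void list_init_sketch(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert @node just before @head, i.e. at the tail of the list. */
static void list_add_tail_sketch(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static size_t list_count_sketch(const struct list_node *head)
{
	size_t n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Hypothetical cfs_rq with one node per list, as the fix introduces. */
struct cfs_rq_sketch {
	struct list_node throttle_list;		/* CFS bandwidth throttled list */
	struct list_node qos_throttle_list;	/* per-CPU qos throttled list */
};
```

In the shared case, a walker starting at bw2 goes to the node, then to qos2, then back to the node, and never returns to bw2, which matches the hard lockup described in the commit.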

14 Mar, 2022 · 2 commits

cputime, cpuacct: Include guest time in user time in cpuacct.stat · 5b303f3d

Committed by Andrey Ryabinin on Mar 14, 2022

stable inclusion
from linux-4.19.226
commit 952514c8565cf72a966993b473fae1708c3684f3

--------------------------------

commit 9731698e upstream.

cpuacct.stat in non-root cgroups shows user time without guest time
included in it. This doesn't match the user time shown in the root
cpuacct.stat and /proc/<pid>/stat. This also affects cgroup2's cpu.stat
in the same way.

Make account_guest_time() add user time to the cgroup's cpustat to
fix this.

Fixes: ef12fefa ("cpuacct: add per-cgroup utime/stime statistics")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20211115164607.23784-1-arbn@yandex-team.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

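A sketch of the accounting fix, assuming a hypothetical struct stats_sketch that holds a root (per-CPU) view and a cgroup view, and a hypothetical group_account_sketch() standing in for task_group_account_field(). Guest time is charged as user (or nice) time to both views; in this sketch the guest counters themselves stay in the root view only.

```c
#include <stdint.h>

/* Indices loosely mirroring the kernel's CPUTIME_* slots. */
enum stat_idx { STAT_USER, STAT_NICE, STAT_GUEST, STAT_GUEST_NICE, NR_STATS };

struct stats_sketch {
	uint64_t cpu[NR_STATS];		/* root / per-CPU view */
	uint64_t cgroup[NR_STATS];	/* the task's cgroup (cpuacct) view */
};

/* Stand-in for task_group_account_field(): charge CPU and cgroup alike. */
static void group_account_sketch(struct stats_sketch *s, enum stat_idx idx,
				 uint64_t t)
{
	s->cpu[idx] += t;
	s->cgroup[idx] += t;
}

/* Guest time also counts as user (nice > 0: nice) time in both views. */
static void account_guest_time_sketch(struct stats_sketch *s, int nice,
				      uint64_t t)
{
	if (nice > 0) {
		group_account_sketch(s, STAT_NICE, t);
		s->cpu[STAT_GUEST_NICE] += t;
	} else {
		group_account_sketch(s, STAT_USER, t);
		s->cpu[STAT_GUEST] += t;
	}
}
```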

sched/rt: Try to restart rt period timer when rt runtime exceeded · cc97ecdb

Committed by Li Hua on Mar 14, 2022

stable inclusion
from linux-4.19.226
commit e6bc7279b16517fab9ed3cdbd58ad7b08060c246

--------------------------------

[ Upstream commit 9b58e976 ]

When rt_runtime is modified from -1 to a valid control value, it may
cause the task to be throttled all the time. Operations like the following
will trigger the bug, e.g.:

  1. echo -1 > /proc/sys/kernel/sched_rt_runtime_us
  2. Run a FIFO task named A that executes while(1)
  3. echo 950000 > /proc/sys/kernel/sched_rt_runtime_us

When rt_runtime is -1, the rt period timer will not be activated when task
A is enqueued. The task will then be throttled after rt_runtime is set to
950,000, and it stays throttled because the rt period timer was never
activated.

Fixes: d0b27fa7 ("sched: rt-group: synchonised bandwidth period")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Li Hua <hucool.lihua@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211203033618.11895-1-hucool.lihua@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

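The three reproduction steps can be replayed against a toy model, assuming a hypothetical struct rt_bw_sketch with a single timer flag. Without the fix nothing ever arms the period timer; with it, exceeding the runtime arms the timer so the task can be unthrottled at the next period.

```c
#include <stdbool.h>

#define RUNTIME_INF_SKETCH	(-1LL)	/* models sched_rt_runtime_us = -1 */

struct rt_bw_sketch {
	long long runtime_us;	/* -1 = unlimited */
	bool timer_active;	/* rt period timer armed? */
	bool throttled;
};

/* The armed timer would later replenish runtime and unthrottle. */
static void start_period_timer_sketch(struct rt_bw_sketch *b)
{
	b->timer_active = true;
}

/* Enqueue only arms the timer when a runtime limit is in force. */
static void enqueue_rt_sketch(struct rt_bw_sketch *b)
{
	if (b->runtime_us != RUNTIME_INF_SKETCH)
		start_period_timer_sketch(b);
}

/* Runtime exceeded: throttle, and with the fix make sure the timer runs. */
static void runtime_exceeded_sketch(struct rt_bw_sketch *b, bool with_fix)
{
	b->throttled = true;
	if (with_fix && !b->timer_active)
		start_period_timer_sketch(b);
}
```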

11 Mar, 2022 · 1 commit

sched: Fix sleeping in atomic context at cpu_qos_write() · 21544710

Committed by Zhang Qiao on Mar 11, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WOPM
CVE: NA

--------------------------------

cfs_bandwidth_usage_inc() needs to take jump_label_mutex and
might sleep, so it cannot be called in atomic context.
Fix this by moving cfs_bandwidth_usage_{inc,dec}() out of the
RCU read-side critical section.

Fixes: f7b390cd ("sched: Change cgroup task scheduler policy")
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

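A userspace model of the bug class, assuming a hypothetical nesting counter in place of the real preempt count and a hypothetical helper that refuses to "sleep" while the counter is non-zero. The fixed ordering decides what to do under the read lock and calls the sleeping helper only after unlocking; the policy input is illustrative, not the body of cpu_qos_write().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the preempt count and the static-key users. */
static int atomic_depth;
static int usage_count;

static void rcu_read_lock_sketch(void)   { atomic_depth++; }
static void rcu_read_unlock_sketch(void) { atomic_depth--; }

/*
 * Stand-in for cfs_bandwidth_usage_inc(): it takes a mutex and may sleep,
 * so, like might_sleep(), it rejects calls made in atomic context.
 */
static bool usage_inc_sketch(void)
{
	if (atomic_depth != 0)
		return false;	/* a debug kernel would warn: sleeping in atomic context */
	usage_count++;
	return true;
}

/*
 * Fixed ordering: decide under the read lock, then call the sleeping
 * helper after unlocking. @enable_throttle is a hypothetical input.
 */
static bool qos_write_sketch(bool enable_throttle)
{
	bool need_inc;

	rcu_read_lock_sketch();
	need_inc = enable_throttle;	/* per-group state would be updated here */
	rcu_read_unlock_sketch();

	return need_inc ? usage_inc_sketch() : true;
}
```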
