1. 30 June 2023, 1 commit
    • sched: Fix null pointer dereference for sd->span · 70dc4628
      Hui Tang authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7HFZV
      CVE: NA
      
      ----------------------------------------
      
      There may be a NULL pointer dereference when CPU hotplug and
      task-group creation run concurrently.
      
      sched_autogroup_create_attach
        -> sched_create_group
          -> alloc_fair_sched_group
            -> init_auto_affinity
              -> init_affinity_domains
                 -> cpumask_copy(xx, sched_domain_span(tmp))
                    { tmp may be freed due to missing rcu lock }
      
      { hotplug will rebuild sched domain }
      sched_cpu_activate
        -> build_sched_domains
          -> cpuset_cpu_active
            -> partition_sched_domains
              -> build_sched_domains
                -> cpu_attach_domain
                  -> destroy_sched_domains
                    -> call_rcu(&sd->rcu, destroy_sched_domains_rcu)
      
      So sd should be protected by the rcu lock across the entire critical section.
      
      [  599.811593] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  600.112821] pc : init_affinity_domains+0xf4/0x200
      [  600.125918] lr : init_affinity_domains+0xd4/0x200
      [  600.331355] Call trace:
      [  600.338734]  init_affinity_domains+0xf4/0x200
      [  600.347955]  init_auto_affinity+0x78/0xc0
      [  600.356622]  alloc_fair_sched_group+0xd8/0x210
      [  600.365594]  sched_create_group+0x48/0xc0
      [  600.373970]  sched_autogroup_create_attach+0x54/0x190
      [  600.383311]  ksys_setsid+0x110/0x130
      [  600.391014]  __arm64_sys_setsid+0x18/0x24
      [  600.399156]  el0_svc_common+0x118/0x170
      [  600.406818]  el0_svc_handler+0x3c/0x80
      [  600.414188]  el0_svc+0x8/0x640
      [  600.420719] Code: b40002c0 9104e002 f9402061 a9401444 (a9001424)
      [  600.430504] SMP: stopping secondary CPUs
      [  600.441751] Starting crashdump kernel...
      
      Fixes: 713cfd26 ("sched: Introduce smart grid scheduling strategy for cfs")
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
  2. 25 June 2023, 2 commits
  3. 20 June 2023, 3 commits
  4. 15 June 2023, 5 commits
    • sched: Fix negative count for jump label · cde6dbb8
      Hui Tang authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7DA63
      CVE: NA
      
      --------------------------------
      
      Add a mutex lock to prevent a negative count for the jump label.
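      A minimal userspace sketch of the idea (illustrative names, not the actual patch): serializing mode changes with a mutex, and treating a same-mode write as a no-op, keeps the static-key reference count from going negative.

```c
#include <pthread.h>

static pthread_mutex_t mode_mutex = PTHREAD_MUTEX_INITIALIZER;
static int key_count;          /* models the jump-label reference count */
static int cur_mode;           /* models the task group's affinity mode */

/* models tg_set_dynamic_affinity_mode(): returns 1 if the mode changed */
int set_mode(int new_mode)
{
    int changed = 0;

    pthread_mutex_lock(&mode_mutex);
    if (cur_mode != new_mode) {
        if (new_mode)
            key_count++;       /* static_key_slow_inc() */
        else
            key_count--;       /* static_key_slow_dec() */
        cur_mode = new_mode;
        changed = 1;
    }
    pthread_mutex_unlock(&mode_mutex);
    return changed;
}

int key_count_now(void) { return key_count; }
```

      Without the lock (and the no-op check), two racing writers could pair one inc with two decs, which is exactly the "jump label: negative count!" warning below.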
      
      [28612.530675] ------------[ cut here ]------------
      [28612.532708] jump label: negative count!
      [28612.535031] WARNING: CPU: 4 PID: 3899 at kernel/jump_label.c:202
      	__static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.538216] Kernel panic - not syncing: panic_on_warn set ...
      [28612.538216]
      [28612.540487] CPU: 4 PID: 3899 Comm: sh Kdump: loaded Not tainted
      [28612.542788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      [28612.546455] Call Trace:
      [28612.547339]  dump_stack+0xc6/0x11e
      [28612.548546]  ? __static_key_slow_dec_cpuslocked+0x200/0x240
      [28612.550352]  panic+0x1d6/0x46b
      [28612.551375]  ? refcount_error_report+0x2a5/0x2a5
      [28612.552915]  ? kmsg_dump_rewind_nolock+0xde/0xde
      [28612.554358]  ? sched_clock_cpu+0x18/0x1b0
      [28612.555699]  ? __warn+0x1d1/0x210
      [28612.556799]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.558548]  __warn+0x1ec/0x210
      [28612.559621]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.561536]  report_bug+0x1ee/0x2b0
      [28612.562706]  fixup_bug.part.4+0x37/0x80
      [28612.563937]  do_error_trap+0x21c/0x260
      [28612.565109]  ? fixup_bug.part.4+0x80/0x80
      [28612.566453]  ? check_preemption_disabled+0x34/0x1f0
      [28612.567991]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [28612.569534]  ? lockdep_hardirqs_off+0x1cb/0x2b0
      [28612.570993]  ? error_entry+0x9a/0x130
      [28612.572138]  ? trace_hardirqs_off_caller+0x59/0x1a0
      [28612.573710]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [28612.575232]  invalid_op+0x14/0x20
      [28612.576387]  ? vprintk_func+0x68/0x1a0
      [28612.577827]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.579662]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.581781]  ? static_key_disable+0x30/0x30
      [28612.583248]  ? static_key_slow_dec+0x57/0x90
      [28612.584997]  ? tg_set_dynamic_affinity_mode+0x42/0x70
      [28612.586714]  ? cgroup_file_write+0x471/0x6a0
      [28612.588162]  ? cgroup_css.part.4+0x100/0x100
      [28612.589579]  ? cgroup_css.part.4+0x100/0x100
      [28612.591031]  ? kernfs_fop_write+0x2af/0x430
      [28612.592625]  ? kernfs_vma_page_mkwrite+0x230/0x230
      [28612.594274]  ? __vfs_write+0xef/0x680
      [28612.595590]  ? kernel_read+0x110/0x110
      [28612.596899]  ? check_preemption_disabled+0x34/0x1f0
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • sched: Fix possible deadlock in tg_set_dynamic_affinity_mode · 21e5d85e
      Hui Tang authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7CGD0
      CVE: NA
      
      ----------------------------------------
      
      Deadlock occurs in two situations as follows:
      
      The first case:
      
      tg_set_dynamic_affinity_mode    --- raw_spin_lock_irq(&auto_affi->lock);
      	->start_auto_affinity   --- trigger timer
      		->tg_update_task_prefer_cpus
      			->css_task_iter_next
      				->raw_spin_unlock_irq
      
      hr_timer_run_queues
        ->sched_auto_affi_period_timer --- try spin lock (&auto_affi->lock)
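      One way to break this self-deadlock can be sketched as follows (an illustrative userspace model, not necessarily the exact patch applied here): the timer handler takes the lock with trylock and simply skips the period if the writer path already holds it.

```c
#include <pthread.h>

static pthread_mutex_t affi_lock = PTHREAD_MUTEX_INITIALIZER;
static int periods_run, periods_skipped;

/* models sched_auto_affi_period_timer() */
void period_timer(void)
{
    if (pthread_mutex_trylock(&affi_lock) != 0) {
        periods_skipped++;     /* writer holds the lock: retry next period */
        return;
    }
    periods_run++;             /* normal periodic work under the lock */
    pthread_mutex_unlock(&affi_lock);
}

/* models the tg_set_dynamic_affinity_mode() side holding the lock */
void writer_lock(void)   { pthread_mutex_lock(&affi_lock); }
void writer_unlock(void) { pthread_mutex_unlock(&affi_lock); }

int runs_done(void)    { return periods_run; }
int runs_skipped(void) { return periods_skipped; }
```

      The key property is that the timer never spins on a lock its own CPU already holds; it backs off and lets the next expiry retry.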
      
      The second case is as follows:
      
      [  291.470810] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
      [  291.472715] rcu:     1-...0: (0 ticks this GP) idle=a6a/1/0x4000000000000002 softirq=78516/78516 fqs=5249
      [  291.475268] rcu:     (detected by 6, t=21006 jiffies, g=202169, q=9862)
      [  291.477038] Sending NMI from CPU 6 to CPUs 1:
      [  291.481268] NMI backtrace for cpu 1
      [  291.481273] CPU: 1 PID: 1923 Comm: sh Kdump: loaded Not tainted 4.19.90+ #150
      [  291.481278] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
      [  291.481281] RIP: 0010:queued_spin_lock_slowpath+0x136/0x9a0
      [  291.481289] Code: c0 74 3f 49 89 dd 48 89 dd 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 83 e5 07 49 01 c5 83 c5 03 48 83 05 c4 66 b9 05 01 f3 90 <41> 0f b6 45 00 40 38 c5 7c 08 84 c0 0f 85 ad 07 00 00 0
      [  291.481292] RSP: 0018:ffff88801de87cd8 EFLAGS: 00000002
      [  291.481297] RAX: 0000000000000101 RBX: ffff888001be0a28 RCX: ffffffffb8090f7d
      [  291.481301] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888001be0a28
      [  291.481304] RBP: 0000000000000003 R08: ffffed100037c146 R09: ffffed100037c146
      [  291.481307] R10: 000000001106b143 R11: ffffed100037c145 R12: 1ffff11003bd0f9c
      [  291.481311] R13: ffffed100037c145 R14: fffffbfff7a38dee R15: dffffc0000000000
      [  291.481315] FS:  00007fac4f306740(0000) GS:ffff88801de80000(0000) knlGS:0000000000000000
      [  291.481318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  291.481321] CR2: 00007fac4f4bb650 CR3: 00000000046b6000 CR4: 00000000000006e0
      [  291.481323] Call Trace:
      [  291.481324]  <IRQ>
      [  291.481326]  ? osq_unlock+0x2a0/0x2a0
      [  291.481329]  ? check_preemption_disabled+0x4c/0x290
      [  291.481331]  ? rcu_accelerate_cbs+0x33/0xed0
      [  291.481333]  _raw_spin_lock_irqsave+0x83/0xa0
      [  291.481336]  sched_auto_affi_period_timer+0x251/0x820
      [  291.481338]  ? __remove_hrtimer+0x151/0x200
      [  291.481340]  __hrtimer_run_queues+0x39d/0xa50
      [  291.481343]  ? tg_update_affinity_domain_down+0x460/0x460
      [  291.481345]  ? enqueue_hrtimer+0x2e0/0x2e0
      [  291.481348]  ? ktime_get_update_offsets_now+0x1d7/0x2c0
      [  291.481350]  hrtimer_run_queues+0x243/0x470
      [  291.481352]  run_local_timers+0x5e/0x150
      [  291.481354]  update_process_times+0x36/0xb0
      [  291.481357]  tick_sched_handle.isra.4+0x7c/0x180
      [  291.481359]  tick_nohz_handler+0xd1/0x1d0
      [  291.481365]  smp_apic_timer_interrupt+0x12c/0x4e0
      [  291.481368]  apic_timer_interrupt+0xf/0x20
      [  291.481370]  </IRQ>
      [  291.481372]  ? smp_call_function_many+0x68c/0x840
      [  291.481375]  ? smp_call_function_many+0x6ab/0x840
      [  291.481377]  ? arch_unregister_cpu+0x60/0x60
      [  291.481379]  ? native_set_fixmap+0x100/0x180
      [  291.481381]  ? arch_unregister_cpu+0x60/0x60
      [  291.481384]  ? set_task_select_cpus+0x116/0x940
      [  291.481386]  ? smp_call_function+0x53/0xc0
      [  291.481388]  ? arch_unregister_cpu+0x60/0x60
      [  291.481390]  ? on_each_cpu+0x49/0xf0
      [  291.481393]  ? set_task_select_cpus+0x115/0x940
      [  291.481395]  ? text_poke_bp+0xff/0x180
      [  291.481397]  ? poke_int3_handler+0xc0/0xc0
      [  291.481400]  ? __set_prefer_cpus_ptr.constprop.4+0x1cd/0x900
      [  291.481402]  ? hrtick+0x1b0/0x1b0
      [  291.481404]  ? set_task_select_cpus+0x115/0x940
      [  291.481407]  ? __jump_label_transform.isra.0+0x3a1/0x470
      [  291.481409]  ? kernel_init+0x280/0x280
      [  291.481411]  ? kasan_check_read+0x1d/0x30
      [  291.481413]  ? mutex_lock+0x96/0x100
      [  291.481415]  ? __mutex_lock_slowpath+0x30/0x30
      [  291.481418]  ? arch_jump_label_transform+0x52/0x80
      [  291.481420]  ? set_task_select_cpus+0x115/0x940
      [  291.481422]  ? __jump_label_update+0x1a1/0x1e0
      [  291.481424]  ? jump_label_update+0x2ee/0x3b0
      [  291.481427]  ? static_key_slow_inc_cpuslocked+0x1c8/0x2d0
      [  291.481430]  ? start_auto_affinity+0x190/0x200
      [  291.481432]  ? tg_set_dynamic_affinity_mode+0xad/0xf0
      [  291.481435]  ? cpu_affinity_mode_write_u64+0x22/0x30
      [  291.481437]  ? cgroup_file_write+0x46f/0x660
      [  291.481439]  ? cgroup_init_cftypes+0x300/0x300
      [  291.481441]  ? __mutex_lock_slowpath+0x30/0x30
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • sched: fix WARN found by deadlock detect · 217edab9
      Hui Tang authored
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
      The WARNING is reported when running:
      echo 1 > /sys/fs/cgroup/cpu/cpu.dynamic_affinity_mode
      
      [  147.276757] WARNING: CPU: 5 PID: 1770 at kernel/cpu.c:326 \
      	lockdep_assert_cpus_held+0xac/0xd0
      [  147.279670] Kernel panic - not syncing: panic_on_warn set ...
      [  147.279670]
      [  147.282211] CPU: 5 PID: 1770 Comm: bash Kdump: loaded Not tainted 4.19
      [  147.284796] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)..
      [  147.290963] Call Trace:
      [  147.292459]  dump_stack+0xc6/0x11e
      [  147.294295]  ? lockdep_assert_cpus_held+0xa0/0xd0
      [  147.296876]  panic+0x1d6/0x46b
      [  147.298591]  ? refcount_error_report+0x2a5/0x2a5
      [  147.301131]  ? kmsg_dump_rewind_nolock+0xde/0xde
      [  147.303738]  ? sched_clock_cpu+0x18/0x1b0
      [  147.305943]  ? __warn+0x1d1/0x210
      [  147.307831]  ? lockdep_assert_cpus_held+0xac/0xd0
      [  147.310469]  __warn+0x1ec/0x210
      [  147.312271]  ? lockdep_assert_cpus_held+0xac/0xd0
      [  147.314838]  report_bug+0x1ee/0x2b0
      [  147.316798]  fixup_bug.part.4+0x37/0x80
      [  147.318946]  do_error_trap+0x21c/0x260
      [  147.321062]  ? fixup_bug.part.4+0x80/0x80
      [  147.323253]  ? check_preemption_disabled+0x34/0x1f0
      [  147.324886]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [  147.326277]  ? lockdep_hardirqs_off+0x1cb/0x2b0
      [  147.327505]  ? error_entry+0x9a/0x130
      [  147.328523]  ? trace_hardirqs_off_caller+0x59/0x1a0
      [  147.329844]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [  147.331124]  invalid_op+0x14/0x20
      [  147.332057]  ? vprintk_func+0x68/0x1a0
      [  147.333082]  ? lockdep_assert_cpus_held+0xac/0xd0
      [  147.334355]  ? lockdep_assert_cpus_held+0xac/0xd0
      [  147.335624]  ? static_key_slow_inc_cpuslocked+0x5a/0x230
      [  147.337079]  ? tg_set_dynamic_affinity_mode+0x4f/0x70
      [  147.338444]  ? cgroup_file_write+0x471/0x6a0
      [  147.339604]  ? cgroup_css.part.4+0x100/0x100
      [  147.340782]  ? cgroup_css.part.4+0x100/0x100
      [  147.341943]  ? kernfs_fop_write+0x2af/0x430
      [  147.343083]  ? kernfs_vma_page_mkwrite+0x230/0x230
      [  147.344401]  ? __vfs_write+0xef/0x680
      [  147.345404]  ? kernel_read+0x110/0x110
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • sched: fix smart grid usage count · d9099163
      Hui Tang authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7D98G
      CVE: NA
      
      ----------------------------------------
      
      smart_grid_usage_dec() should be called when freeing a task group
      if the mode is auto.
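      The balance this restores can be sketched like so (a toy model with invented names, not the real code): whatever increments the global usage count when a group enters auto mode must be matched on the free path.

```c
#define MODE_AUTO 1

static int smart_grid_usage;        /* models the global usage count */

struct demo_tg { int mode; };

/* models switching a task group into auto mode */
void tg_enable_auto(struct demo_tg *tg)
{
    tg->mode = MODE_AUTO;
    smart_grid_usage++;             /* smart_grid_usage_inc() */
}

/* models freeing the task group: the previously missing dec */
void tg_free(struct demo_tg *tg)
{
    if (tg->mode == MODE_AUTO)
        smart_grid_usage--;         /* smart_grid_usage_dec() */
    tg->mode = 0;
}

int usage_now(void) { return smart_grid_usage; }
```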
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • sched: Add static key to reduce noise · 373fd236
      Hui Tang authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7A718
      
      --------------------------------
      
      Add a static key to reduce noise when dynamic affinity is not enabled.
      This gives better performance in some cases, such as lmbench.
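      The shape of the optimization can be modeled as follows (a sketch with a plain flag standing in for the kernel's static_branch machinery; names are illustrative): when the feature is off, the hot path is a single predictable branch and skips all affinity bookkeeping.

```c
static int dynamic_affinity_used;   /* models the static key */
static int affinity_work_done;      /* counts the slow-path bookkeeping */

/* models the scheduler hot path */
void select_task_cpu(void)
{
    if (!dynamic_affinity_used)     /* static_branch_unlikely(&key) */
        return;                     /* fast path: no extra noise */
    affinity_work_done++;           /* dynamic-affinity work happens here */
}

void enable_dynamic_affinity(void) { dynamic_affinity_used = 1; }
int work_done(void) { return affinity_work_done; }
```

      In the real kernel a static key patches the branch out of the instruction stream entirely, so disabled users pay essentially nothing on this path.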
      
      Fixes: 243865da ("cpuset: Introduce new interface for scheduler ...")
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
  5. 09 June 2023, 2 commits
    • sched: smart grid: init sched_grid_qos structure on QOS purpose · ce35ded5
      Wang ShaoBo authored
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
      As smart grid scheduling (SGS) may shrink resources and affect task QOS,
      we provide methods for evaluating task QOS in each divided grid, mainly
      focusing on the following two aspects:
      
         1. Evaluate whether resources (such as CPU or memory) meet our demand
         2. Ensure the least impact when working with (cpufreq and cpuidle) governors
      
      To tackle these questions, we have summarized several sampling methods
      to obtain tasks' characteristics while reducing scheduling noise as
      much as possible:
      
        1. We detect the key factors of how sensitive a process is to cpufreq
           or cpuidle adjustment, to guide the cpufreq/cpuidle governor
        2. We dynamically monitor process memory bandwidth and adjust memory
           allocation to minimize cross-remote memory access
        3. We provide a variety of load tracking mechanisms to adapt to different
           types of task load change
      
           ---------------------------------     -----------------
          |            class A              |   |     class B     |
          |    --------        --------     |   |     --------    |
          |   | group0 |      | group1 |    |---|    | group2 |   |----------+
          |    --------        --------     |   |     --------    |          |
          |    CPU/memory sensitive type    |   |   balance type  |          |
           ----------------+----------------     --------+--------           |
                           v                             v                   | (target cpufreq)
           -------------------------------------------------------           | (sensitivity)
          |              Not satisfied with QOS?                  |          |
           --------------------------+----------------------------           |
                                     v                                       v
           -------------------------------------------------------     ----------------
          |              expand or shrink resource                |<--|  energy model  |
           ----------------------------+--------------------------     ----------------
                                       v                                     |
           -----------          -----------          ------------            v
          |           |        |           |        |            |     ---------------
          |   GRID0   +--------+   GRID1   +--------+   GRID2    |<-- |   governor    |
          |           |        |           |        |            |     ---------------
           -----------          -----------          ------------
                         \            |            /
                          \  -------------------  /
                            |  pages migration  |
                             -------------------
      
      We will introduce the energy model in a follow-up implementation, and will
      consider dynamic affinity adjustment between the divided grids at runtime.
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • sched: Introduce smart grid scheduling strategy for cfs · 713cfd26
      Hui Tang authored
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
      We want to dynamically expand or shrink the affinity range of tasks
      based on the CPU topology level while meeting the minimum resource
      requirements of tasks.
      
      We divide several levels of affinity domains according to the sched domains:
      
      level4   * SOCKET  [                                                  ]
      level3   * DIE     [                             ]
      level2   * MC      [             ] [             ]
      level1   * SMT     [     ] [     ] [     ] [     ]
      level0   * CPU      0   1   2   3   4   5   6   7
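      The hierarchy drawn above can be made concrete with a small worked example (a sketch assuming 2-way SMT, 4-CPU MC, and an 8-CPU DIE/SOCKET exactly as in the diagram; the functions are illustrative, not kernel API):

```c
/* Width of the affinity span at each level for the 8-CPU example:
 * level0 = single CPU, level1 = SMT pair, level2 = MC cluster of 4,
 * level3/level4 = all 8 CPUs (DIE/SOCKET). */
unsigned int level_span_width(int level)
{
    static const unsigned int width[] = { 1, 2, 4, 8, 8 };
    return width[level];
}

/* Bitmask of CPUs in the domain containing `cpu` at `level`,
 * modeling cpumask-style spans. */
unsigned long level_span_mask(int cpu, int level)
{
    unsigned int w = level_span_width(level);
    int first = cpu - (cpu % w);           /* align down to the domain */

    return ((1UL << w) - 1) << first;
}
```

      For CPU 5, expanding the affinity one level at a time yields {5}, then the SMT pair {4,5}, then the MC cluster {4..7}, then the whole die {0..7}.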
      
      Whether users tend to choose power saving or performance affects the
      strategy for adjusting affinity. When the power saving mode is selected,
      we choose a more appropriate affinity based on the energy model to
      reduce power consumption, while considering the QOS of resources such
      as CPU and memory consumption; for instance, if the current task CPU
      load is less than required, smart grid judges according to the energy
      model whether to aggregate tasks into a smaller range.
      
      The main difference from EAS is that we pay more attention to the
      power-consumption impact of mechanisms such as cpuidle and DVFS, and
      classify tasks to reduce interference and ensure resource QOS in each
      divided unit, which is more suitable for general-purpose workloads on
      non-heterogeneous CPUs.
      
              --------        --------        --------
             | group0 |      | group1 |      | group2 |
              --------        --------        --------
      	   |                |              |
      	   v                |              v
             ---------------------+-----     -----------------
            |                  ---v--   |   |
            |       DIE0      |  MC1 |  |   |   DIE1
            |                  ------   |   |
             ---------------------------     -----------------
      
      We regularly count the resource satisfaction of groups and adjust the
      affinity; scheduling balance and memory migration are considered based
      on memory location to better meet resource requirements.
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
  6. 06 April 2023, 3 commits
  7. 24 December 2022, 1 commit
  8. 16 December 2022, 1 commit
  9. 15 August 2022, 1 commit
    • sched: Fix null-ptr-deref in free_fair_sched_group · 0d2df28e
      Hui Tang authored
      hulk inclusion
      category: bugfix
      bugzilla: 187419, https://gitee.com/openeuler/kernel/issues/I5LIPL
      CVE: NA
      
      -------------------------------
      
      
      ==================================================================
      BUG: KASAN: null-ptr-deref in rq_of kernel/sched/sched.h:1118 [inline]
      BUG: KASAN: null-ptr-deref in unthrottle_qos_sched_group kernel/sched/fair.c:7619 [inline]
      BUG: KASAN: null-ptr-deref in free_fair_sched_group+0x124/0x320 kernel/sched/fair.c:12131
      Read of size 8 at addr 0000000000000130 by task syz-executor100/223
      
      CPU: 3 PID: 223 Comm: syz-executor100 Not tainted 5.10.0 #6
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132
       show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b4/0x248 lib/dump_stack.c:118
       __kasan_report mm/kasan/report.c:551 [inline]
       kasan_report+0x18c/0x210 mm/kasan/report.c:564
       check_memory_region_inline mm/kasan/generic.c:187 [inline]
       __asan_load8+0x98/0xc0 mm/kasan/generic.c:253
       rq_of kernel/sched/sched.h:1118 [inline]
       unthrottle_qos_sched_group kernel/sched/fair.c:7619 [inline]
       free_fair_sched_group+0x124/0x320 kernel/sched/fair.c:12131
       sched_free_group kernel/sched/core.c:7767 [inline]
       sched_create_group+0x48/0xc0 kernel/sched/core.c:7798
       cpu_cgroup_css_alloc+0x18/0x40 kernel/sched/core.c:7930
       css_create+0x7c/0x4a0 kernel/cgroup/cgroup.c:5328
       cgroup_apply_control_enable+0x288/0x340 kernel/cgroup/cgroup.c:3135
       cgroup_apply_control kernel/cgroup/cgroup.c:3217 [inline]
       cgroup_subtree_control_write+0x668/0x8b0 kernel/cgroup/cgroup.c:3375
       cgroup_file_write+0x1a8/0x37c kernel/cgroup/cgroup.c:3909
       kernfs_fop_write_iter+0x220/0x2f4 fs/kernfs/file.c:296
       call_write_iter include/linux/fs.h:1960 [inline]
       new_sync_write+0x260/0x370 fs/read_write.c:515
       vfs_write+0x3dc/0x4ac fs/read_write.c:602
       ksys_write+0xfc/0x200 fs/read_write.c:655
       __do_sys_write fs/read_write.c:667 [inline]
       __se_sys_write fs/read_write.c:664 [inline]
       __arm64_sys_write+0x50/0x60 fs/read_write.c:664
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common.constprop.0+0xf4/0x414 arch/arm64/kernel/syscall.c:155
       do_el0_svc+0x50/0x11c arch/arm64/kernel/syscall.c:217
       el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353
       el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369
       el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683
      
      So add a check for tg->cfs_rq[i] before unthrottle_qos_sched_group() is called.
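      The guard can be sketched as follows (a userspace model with invented demo types: when group allocation fails half-way, some per-CPU slots are still NULL, and the free path must skip them):

```c
#include <stddef.h>

#define NR_CPUS_DEMO 4

struct cfs_rq_demo { int throttled; };

static int unthrottled;

/* models unthrottle_qos_sched_group() */
void unthrottle_one(struct cfs_rq_demo *rq)
{
    rq->throttled = 0;
    unthrottled++;
}

/* models free_fair_sched_group(): tolerate partially built groups */
void free_group(struct cfs_rq_demo **cfs_rq)
{
    int i;

    for (i = 0; i < NR_CPUS_DEMO; i++) {
        if (cfs_rq && cfs_rq[i])       /* the added NULL check */
            unthrottle_one(cfs_rq[i]);
    }
}

int unthrottled_count(void) { return unthrottled; }
```

      Without the check, the first NULL slot reproduces exactly the null-ptr-deref in rq_of() shown in the KASAN report above.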
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
  10. 21 July 2022, 3 commits
  11. 01 June 2022, 2 commits
  12. 23 May 2022, 2 commits
  13. 19 April 2022, 1 commit
  14. 02 April 2022, 1 commit
  15. 29 November 2021, 8 commits
  16. 17 September 2021, 1 commit
  17. 02 August 2021, 1 commit
  18. 19 July 2021, 1 commit
  19. 30 June 2021, 1 commit
    • sched/fair: Fix unfairness caused by missing load decay · 909ad0a4
      Odin Ugedal authored
      stable inclusion
      from linux-4.19.191
      commit 434ea8c1d1bf296a2597aeb28f6ccf62ae82f235
      
      --------------------------------
      
      [ Upstream commit 0258bdfa ]
      
      This fixes an issue where old load on a cfs_rq is not properly decayed,
      resulting in strange behavior where fairness can decrease drastically.
      Real workloads with equally weighted control groups have ended up
      getting a respective 99% and 1%(!!) of cpu time.
      
      When an idle task is attached to a cfs_rq by attaching a pid to a cgroup,
      the old load of the task is attached to the new cfs_rq and sched_entity by
      attach_entity_cfs_rq. If the task is then moved to another cpu (and
      therefore cfs_rq) before being enqueued/woken up, the load will be moved
      to cfs_rq->removed from the sched_entity. Such a move will happen when
      enforcing a cpuset on the task (e.g. via a cgroup) that forces it to move.
      
      The load will however not be removed from the task_group itself, making
      it look like there is a constant load on that cfs_rq. This causes the
      vruntime of tasks on other sibling cfs_rq's to increase faster than they
      are supposed to; causing severe fairness issues. If no other task is
      started on the given cfs_rq, and due to the cpuset it would not happen,
      this load would never be properly unloaded. With this patch the load
      will be properly removed inside update_blocked_averages. This also
      applies to tasks moved to the fair scheduling class and moved to another
      cpu, and this path will also fix that. For fork, the entity is queued
      right away, so this problem does not affect that.
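      The accounting problem can be illustrated with a toy model (invented names, not the real PELT code): load detached into a "removed" bucket is only correct if something later flushes that bucket out of the parent group, which is what the update_blocked_averages() path does after this fix.

```c
struct demo_cfs_rq { long load, removed; };
struct demo_tg     { long load; };

/* models attach_entity_cfs_rq(): load propagates up to the group */
void attach_load(struct demo_tg *tg, struct demo_cfs_rq *rq, long w)
{
    rq->load += w;
    tg->load += w;
}

/* the task migrates away before ever running: load parked in ->removed,
 * while the group total is left stale */
void detach_to_removed(struct demo_cfs_rq *rq, long w)
{
    rq->load    -= w;
    rq->removed += w;
}

/* models the flush performed via update_blocked_averages() */
void flush_removed(struct demo_tg *tg, struct demo_cfs_rq *rq)
{
    tg->load    -= rq->removed;
    rq->removed  = 0;
}
```

      Before the fix the flush never happened for this cfs_rq, so the stale group load persisted and skewed the vruntime of sibling groups.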
      
      This applies to cases where the new process is the first in the cfs_rq,
      issue introduced 3d30544f ("sched/fair: Apply more PELT fixes"), and
      when there has previously been load on the cgroup but the cgroup was
      removed from the leaf list due to having null PELT load, introduced
      in 039ae8bc ("sched/fair: Fix O(nr_cgroups) in the load balancing
      path").
      
      For a simple cgroup hierarchy (as seen below) with two equally weighted
      groups, that in theory should get 50/50 of cpu time each, it often leads
      to a load of 60/40 or 70/30.
      
      parent/
        cg-1/
          cpu.weight: 100
          cpuset.cpus: 1
        cg-2/
          cpu.weight: 100
          cpuset.cpus: 1
      
      If the hierarchy is deeper (as seen below), while keeping cg-1 and cg-2
      equally weighted, they should still get a 50/50 balance of cpu time.
      This however sometimes results in a balance of 10/90 or 1/99(!!) between
      the task groups.
      
      $ ps u -C stress
      USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      root       18568  1.1  0.0   3684   100 pts/12   R+   13:36   0:00 stress --cpu 1
      root       18580 99.3  0.0   3684   100 pts/12   R+   13:36   0:09 stress --cpu 1
      
      parent/
        cg-1/
          cpu.weight: 100
          sub-group/
            cpu.weight: 1
            cpuset.cpus: 1
        cg-2/
          cpu.weight: 100
          sub-group/
            cpu.weight: 10000
            cpuset.cpus: 1
      
      This can be reproduced by attaching an idle process to a cgroup and
      moving it to a given cpuset before it wakes up. The issue is evident in
      many (if not most) container runtimes, and has been reproduced
      with both crun and runc (and therefore docker and all its "derivatives"),
      and with both cgroup v1 and v2.
      
      Fixes: 3d30544f ("sched/fair: Apply more PELT fixes")
      Fixes: 039ae8bc ("sched/fair: Fix O(nr_cgroups) in the load balancing path")
      Signed-off-by: Odin Ugedal <odin@uged.al>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Link: https://lkml.kernel.org/r/20210501141950.23622-2-odin@uged.al
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>