1. 01 June 2023, 28 commits
  2. 30 May 2023, 12 commits
    •
      !795 sched/fair: Introduce multiple qos level · c4fb2bc6
      Committed by openeuler-ci-bot
      Merge Pull Request from: @zhaowenhui8 
       
      Expand qos_level from {-1, 0} to [-2, 2] to distinguish tasks expected
      to run at extremely high or low priority levels. qos_level_weight is
      used to reweight the shares when calculating a group's weight.
      Meanwhile, offline tasks' scheduling policy is set to SCHED_IDLE so
      that they can be preempted at check_preempt_wakeup(). A sketch of the
      reweighting idea follows the kernel option below.
      
      kernel option:
      CONFIG_QOS_SCHED_MULTILEVEL 
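      The reweighting idea can be pictured with the hypothetical sketch
      below; the level-to-weight table and the helper are illustrative
      assumptions, not the actual openEuler implementation.
      
          /* Levels below 0 are offline/low priority, above 0 high priority. */
          #define QOS_LEVEL_MIN  (-2)
          #define QOS_LEVEL_MAX  2
          
          /* Assumed per-level multipliers (percent of the default shares). */
          static const unsigned long qos_level_weight[] = {
                  /* -2 */ 1,
                  /* -1 */ 10,
                  /*  0 */ 100,
                  /*  1 */ 1000,
                  /*  2 */ 10000,
          };
          
          /* Scale a group's shares by its qos level when computing weight. */
          static unsigned long qos_reweight(unsigned long shares, int qos_level)
          {
                  return shares * qos_level_weight[qos_level - QOS_LEVEL_MIN] / 100;
          }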
       
      Link:https://gitee.com/openeuler/kernel/pulls/795 
      
      Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      c4fb2bc6
    •
      !850 Fix race condition in __percpu_counter_sum() function within cpu hotplug · 623763f1
      Committed by openeuler-ci-bot
      Merge Pull Request from: @henryze 
       
      The dying CPU has been removed from the online_mask, but the hotplug notifier has not yet been called to fold its percpu count into the global counter sum.
      This race condition is avoided by including the dying CPU in the iteration mask. 
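      A minimal sketch of the idea (an illustration, not the exact openEuler
      patch; it assumes the mainline cpu_dying() helper is available to test
      the dying mask):
      
          /* Sum fbc->count plus every per-cpu delta, including CPUs that
           * are going down but whose hotplug callback has not yet folded
           * their delta into the global count.
           */
          static s64 percpu_counter_sum_with_dying(struct percpu_counter *fbc)
          {
                  s64 ret;
                  int cpu;
                  unsigned long flags;
          
                  raw_spin_lock_irqsave(&fbc->lock, flags);
                  ret = fbc->count;
                  for_each_possible_cpu(cpu) {
                          if (!cpu_online(cpu) && !cpu_dying(cpu))
                                  continue;
                          ret += *per_cpu_ptr(fbc->counters, cpu);
                  }
                  raw_spin_unlock_irqrestore(&fbc->lock, flags);
                  return ret;
          }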
       
      Link:https://gitee.com/openeuler/kernel/pulls/850 
      
      Reviewed-by: Wei Li <liwei391@huawei.com> 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      623763f1
    •
      !849 drivers/cpufreq: gain accurate CPU frequency from cpufreq/cpuinfo_cur_freq · f1189855
      Committed by openeuler-ci-bot
      Merge Pull Request from: @henryze 
       
      When users read the CPU frequency from cpuinfo_cur_freq under the
      cpufreq sysfs directory, they can get an invalid result like:
      
      $ cat /sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_cur_freq
      4294967295
      
      The value is (u32)-1, an error code leaking to userspace. This series
      fixes that issue; an illustration follows the reference below.
      
      Reference: https://lore.kernel.org/all/20230516133248.712242-3-zengheng4@huawei.com/ 
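      For illustration, a hypothetical ->get() callback showing how the raw
      error value escapes to sysfs, and the conventional guard;
      read_hw_freq_khz() is an assumed helper, not a real API. Returning 0
      makes the cpufreq core's show_cpuinfo_cur_freq() print "<unknown>"
      instead of a bogus number.
      
          static unsigned int demo_cpufreq_get(unsigned int cpu)
          {
                  unsigned int freq_khz;
          
                  /* If a failed hardware read returns the raw (unsigned
                   * int)-1, sysfs shows 4294967295.  Return 0 instead so
                   * the core reports the frequency as unknown.
                   */
                  if (read_hw_freq_khz(cpu, &freq_khz))  /* assumed helper */
                          return 0;
          
                  return freq_khz;
          }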
       
      Link:https://gitee.com/openeuler/kernel/pulls/849 
      
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      f1189855
    •
      !773 Compiler: Add value profile support for kernel. · dec74be4
      Committed by openeuler-ci-bot
      Merge Pull Request from: @xiongzhou4 
       
      Provides value profile support for the kernel.
      The implementation is based on the kernel's existing GCOV feature. When the option is enabled, the GCOV flag `-fprofile-arcs` is changed to `-fprofile-generate`. The latter includes the former plus value profiling, which enables more comprehensive feedback-directed optimization.
      The added feature is called _PGO kernel_, and it can be used to improve the performance of a single-application runtime environment.
      
      kernel option (default is n):
      CONFIG_PGO_KERNEL=y 
       
      Link:https://gitee.com/openeuler/kernel/pulls/773 
      
      Reviewed-by: Liu Chao <liuchao173@huawei.com> 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com> 
      dec74be4
    •
      !842 net: hns3: add support for Hisilicon ptp sync device · 866cc5dd
      Committed by openeuler-ci-bot
      Merge Pull Request from: @svishen 
       
      This pull request adds a PTP driver to hns3 so that the IEEE 1588 clock can be obtained from the Ethernet device.
      Only the first PF on the main chip supports this, so getting PTP time from another chip
      may incur bus latency. The PTP sync device is used to eliminate that bus latency. A user-space read of the resulting clock is sketched below.
      
      issue:
      https://gitee.com/openeuler/kernel/issues/I78MGV 
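      For context, a user-space sketch (not part of the driver) of reading a
      PHC that a PTP driver registers, through its posix clock; the /dev/ptp0
      index is an assumption.
      
          #include <fcntl.h>
          #include <stdio.h>
          #include <time.h>
          #include <unistd.h>
          
          /* Map a /dev/ptpN fd to its dynamic posix clock id (CLOCKFD = 3). */
          #define FD_TO_CLOCKID(fd)  ((~(clockid_t)(fd) << 3) | 3)
          
          int main(void)
          {
                  struct timespec ts;
                  int fd = open("/dev/ptp0", O_RDONLY);
          
                  if (fd < 0) {
                          perror("open /dev/ptp0");
                          return 1;
                  }
                  if (clock_gettime(FD_TO_CLOCKID(fd), &ts)) {
                          perror("clock_gettime");
                          close(fd);
                          return 1;
                  }
                  printf("PHC time: %lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
                  close(fd);
                  return 0;
          }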
       
      Link:https://gitee.com/openeuler/kernel/pulls/842 
      
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      866cc5dd
    •
      !844 A patchset of sched to improve benchmark performance · df9cfeee
      Committed by openeuler-ci-bot
      Merge Pull Request from: @NNNNicole 
       
      1. sched/pelt: Relax the sync of *_sum with *_avg (patches 1-3)
      2. Adjust NUMA imbalance for multiple LLCs (patches 4-6)
      3. sched: Queue task on wakelist in the same llc if the wakee cpu is idle (patch 7)
      4. Clear ttwu_pending after enqueue_task() (patch 8)
       
       
      Link:https://gitee.com/openeuler/kernel/pulls/844 
      
      Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      df9cfeee
    •
      !837 Backport bugfixes for RDMA/hns · 162d1b0b
      Committed by openeuler-ci-bot
      Merge Pull Request from: @stinft 
       
      #I76PY9 
      #I76PUJ 
      #I76PRT  
       
      Link:https://gitee.com/openeuler/kernel/pulls/837 
      
      Reviewed-by: Chengchang Tang <tangchengchang@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      162d1b0b
    •
      GCC: Add value profile support for kernel. · 2872514e
      Committed by xiongzhou4
      GCC inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I734PM
      
      ---------------------------------
      
      This feature adds value profile support for the kernel by changing the
      GCOV option "-fprofile-arcs" to "-fprofile-generate" when the newly
      added config "PGO_KERNEL" is set to y.
      
      As with GCOV, the symbols required by value profiling are migrated
      from the GCC source code, since libgcov cannot be linked into the
      kernel; specifically, from libgcc/libgcov-profiler.c to
      kernel/gcov/gcc_base.c. One such symbol is sketched below.
      
      kernel options:
      CONFIG_PGO_KERNEL=y
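      As one concrete example of such a migrated symbol, below is a sketch of
      __gcov_interval_profiler() following its libgcc/libgcov-profiler.c
      form; whether the openEuler port keeps exactly this shape is an
      assumption.
      
          /* Counts how often `value` falls into [start, start + steps),
           * with the two trailing counters used as the above/below
           * overflow buckets; gcov_type is the 64-bit gcov counter type.
           */
          void __gcov_interval_profiler(gcov_type *counters, gcov_type value,
                                        int start, unsigned int steps)
          {
                  gcov_type delta = value - start;
          
                  if (delta < 0)
                          counters[steps + 1]++;  /* below the interval */
                  else if (delta >= steps)
                          counters[steps]++;      /* above the interval */
                  else
                          counters[delta]++;      /* inside the interval */
          }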
      Signed-off-by: Xiong Zhou <xiongzhou4@huawei.com>
      Reviewed-by: Li Yancheng <liyancheng@huawei.com>
      2872514e
    •
      !803 ACC support no-sva feature · edb5d824
      Committed by openeuler-ci-bot
      Merge Pull Request from: @xiao_jiang_shui 
       
      Add no-sva feature support for the ACC drivers.
      issue: https://gitee.com/openeuler/kernel/issues/I773SD
       
       
      Link:https://gitee.com/openeuler/kernel/pulls/803 
      
      Reviewed-by: Yang Shen <shenyang39@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      edb5d824
    •
      sched/fair: Introduce multiple qos level · c51ad919
      Committed by Zhao Wenhui
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I737X1
      
      -------------------------------
      
      Expand qos_level from {-1, 0} to [-2, 2] to distinguish tasks expected
      to run at extremely high or low priority levels. qos_level_weight is
      used to reweight the shares when calculating a group's weight.
      Meanwhile, offline tasks' scheduling policy is set to SCHED_IDLE so
      that they can be preempted at check_preempt_wakeup().
      Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
      c51ad919
    •
      sched: Clear ttwu_pending after enqueue_task() · a6dcd26f
      Committed by Tianchen Ding
      mainline inclusion
      from mainline-v6.2-rc1
      commit d6962c4f
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=d6962c4fe8f96f7d384d6489b6b5ab5bf3e35991
      
      --------------------------------
      
      We found a long tail latency in schbench when m*t is close to nr_cpus
      (e.g., "schbench -m 2 -t 16" on a machine with 32 cpus).
      
      This is because when the wakee cpu is idle, rq->ttwu_pending is cleared
      too early, so idle_cpu() returns true until the wakee task is actually
      enqueued. This misleads wakers selecting an idle cpu, and multiple
      worker threads get woken on the same wakee cpu. The situation is
      amplified by commit f3dd3f67 ("sched: Remove the limitation of
      WF_ON_CPU on wakelist if wakee cpu is idle") because it tends to use
      the wakelist.
      
      Here is the result of "schbench -m 2 -t 16" on a VM with 32 vCPUs
      (Intel(R) Xeon(R) Platinum 8369B).
      
      Latency percentiles (usec):
                      base    base+revert_f3dd3f67    base+this_patch
      50.0000th:         9                      13                  9
      75.0000th:        12                      19                 12
      90.0000th:        15                      22                 15
      95.0000th:        18                      24                 17
      *99.0000th:       27                      31                 24
      99.5000th:      3364                      33                 27
      99.9000th:     12560                      36                 30
      
      We also tested on unixbench and hackbench, and saw no performance
      change.
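      An abridged sketch of the fix in kernel/sched/core.c, simplified from
      mainline commit d6962c4f (the sanity checks inside the loop are
      elided):
      
          void sched_ttwu_pending(void *arg)
          {
                  struct llist_node *llist = arg;
                  struct rq *rq = this_rq();
                  struct task_struct *p, *t;
                  struct rq_flags rf;
          
                  if (!llist)
                          return;
          
                  rq_lock_irqsave(rq, &rf);
                  update_rq_clock(rq);
          
                  llist_for_each_entry_safe(p, t, llist, wake_entry.llist)
                          ttwu_do_activate(rq, p, p->sched_remote_wakeup ?
                                           WF_MIGRATED : 0, &rf);
          
                  /*
                   * Clear ttwu_pending only after at least one task has been
                   * enqueued, so a concurrent idle_cpu() cannot observe a
                   * false-negative and stack more wakeups on this CPU.
                   */
                  WRITE_ONCE(rq->ttwu_pending, 0);
          
                  rq_unlock_irqrestore(rq, &rf);
          }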
      Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Link: https://lkml.kernel.org/r/20221104023601.12844-1-dtcccc@linux.alibaba.com
      a6dcd26f
    •
      sched: Remove the limitation of WF_ON_CPU on wakelist if wakee cpu is idle · 588d8f44
      Committed by Guan Jing
      mainline inclusion
      from mainline-v6.0-rc1
      commit f3dd3f67
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=f3dd3f674555bd9455c5ae7fafce0696bd9931b3
      
      --------------------------------
      
      Wakelist can help avoid cache bouncing and offload the overhead of waker
      cpu. So far, using wakelist within the same llc only happens on
      WF_ON_CPU, and this limitation could be removed to further improve
      wakeup performance.
      
      The commit 518cd623 ("sched: Only queue remote wakeups when
      crossing cache boundaries") disabled queuing tasks on wakelist when
      the cpus share llc. This is because, at that time, the scheduler must
      send IPIs to do ttwu_queue_wakelist. Nowadays, ttwu_queue_wakelist also
      supports TIF_POLLING, so this is not a problem now when the wakee cpu is
      in idle polling.
      
      Benefits:
        Queuing the task on the idle cpu can help improve performance on the
        waker cpu and utilization on the wakee cpu, and further improve
        locality because the wakee cpu can handle its own rq. This patch
        helps improve rt on our real Java workloads where wakeups happen
        frequently.
      
        Consider the normal condition (CPU0 and CPU1 share the same llc).
        Before this patch:
      
               CPU0                                       CPU1
      
          select_task_rq()                                idle
          rq_lock(CPU1->rq)
          enqueue_task(CPU1->rq)
          notify CPU1 (by sending IPI or CPU1 polling)
      
                                                          resched()
      
        After this patch:
      
               CPU0                                       CPU1
      
          select_task_rq()                                idle
          add to wakelist of CPU1
          notify CPU1 (by sending IPI or CPU1 polling)
      
                                                          rq_lock(CPU1->rq)
                                                          enqueue_task(CPU1->rq)
                                                          resched()
      
        We see that CPU0 can finish its work earlier: it only needs to put
        the task on the wakelist and return, while CPU1 is idle and handles
        its own runqueue data itself.
      
      This patch makes no difference with respect to IPIs.
        It only takes effect when the wakee cpu is:
        1) idle polling
        2) idle not polling
      
        For 1), there is no IPI with or without this patch.
      
        For 2), there is always exactly one IPI, both before and after this
        patch. Before this patch: the waker cpu enqueues the task and checks
        preemption; since "idle" is sure to be preempted, the waker cpu must
        send a resched IPI. After this patch: the waker cpu puts the task on
        the wakee cpu's wakelist and sends an IPI.
      
      Benchmark:
      We've tested schbench, unixbench, and hackbench on both x86 and arm64.
      
      On x86 (Intel Xeon Platinum 8269CY):
        schbench -m 2 -t 8
      
          Latency percentiles (usec)              before        after
              50.0000th:                             8            6
              75.0000th:                            10            7
              90.0000th:                            11            8
              95.0000th:                            12            8
              *99.0000th:                           13           10
              99.5000th:                            15           11
              99.9000th:                            18           14
      
        Unixbench with full threads (104)
                                                  before        after
          Dhrystone 2 using register variables  3011862938    3009935994  -0.06%
          Double-Precision Whetstone              617119.3      617298.5   0.03%
          Execl Throughput                         27667.3       27627.3  -0.14%
          File Copy 1024 bufsize 2000 maxblocks   785871.4      784906.2  -0.12%
          File Copy 256 bufsize 500 maxblocks     210113.6      212635.4   1.20%
          File Copy 4096 bufsize 8000 maxblocks  2328862.2     2320529.1  -0.36%
          Pipe Throughput                      145535622.8   145323033.2  -0.15%
          Pipe-based Context Switching           3221686.4     3583975.4  11.25%
          Process Creation                        101347.1      103345.4   1.97%
          Shell Scripts (1 concurrent)            120193.5      123977.8   3.15%
          Shell Scripts (8 concurrent)             17233.4       17138.4  -0.55%
          System Call Overhead                   5300604.8     5312213.6   0.22%
      
        hackbench -g 1 -l 100000
                                                  before        after
          Time                                     3.246        2.251
      
      On arm64 (Ampere Altra):
        schbench -m 2 -t 8
      
          Latency percentiles (usec)              before        after
              50.0000th:                            14           10
              75.0000th:                            19           14
              90.0000th:                            22           16
              95.0000th:                            23           16
              *99.0000th:                           24           17
              99.5000th:                            24           17
              99.9000th:                            28           25
      
        Unixbench with full threads (80)
                                                  before        after
          Dhrystone 2 using register variables  3536194249    3537019613   0.02%
          Double-Precision Whetstone              629383.6      629431.6   0.01%
          Execl Throughput                         65920.5       65846.2  -0.11%
          File Copy 1024 bufsize 2000 maxblocks  1063722.8     1064026.8    0.03%
          File Copy 256 bufsize 500 maxblocks     322684.5      318724.5  -1.23%
          File Copy 4096 bufsize 8000 maxblocks  2348285.3     2328804.8  -0.83%
          Pipe Throughput                      133542875.3   131619389.8  -1.44%
          Pipe-based Context Switching           3215356.1     3576945.1  11.25%
          Process Creation                        108520.5      120184.6  10.75%
          Shell Scripts (1 concurrent)            122636.3        121888  -0.61%
          Shell Scripts (8 concurrent)             17462.1       17381.4  -0.46%
          System Call Overhead                   4429998.9     4435006.7    0.11%
      
        hackbench -g 1 -l 100000
                                                  before        after
          Time                                     4.217        2.916
      
      Our patch improves schbench, hackbench, and the Pipe-based Context
      Switching test of unixbench when idle cpus exist, with no obvious
      regression on the other unixbench tests. This can help improve rt in
      scenarios where wakeups happen frequently. A sketch of the resulting
      wakelist condition follows.
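      A condensed sketch of the wakelist condition after this change,
      simplified from this commit's version of ttwu_queue_cond() in
      kernel/sched/core.c:
      
          static inline bool ttwu_queue_cond(int cpu)
          {
                  /* Keep the async wakelist out of cpu hotplug transitions. */
                  if (!cpu_active(cpu))
                          return false;
          
                  /* No shared cache: queue remotely to avoid touching
                   * another cache domain's data from here.
                   */
                  if (!cpus_share_cache(smp_processor_id(), cpu))
                          return true;
          
                  /* Local wakeups are handled directly. */
                  if (cpu == smp_processor_id())
                          return false;
          
                  /*
                   * Wakee cpu is idle (or about to be, with only the
                   * descheduling task left): offload the activation to it.
                   * nr_running is checked to avoid task stacking.
                   */
                  if (!cpu_rq(cpu)->nr_running)
                          return true;
          
                  return false;
          }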
      Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Valentin Schneider <vschneid@redhat.com>
      Link: https://lore.kernel.org/r/20220608233412.327341-3-dtcccc@linux.alibaba.com
      Signed-off-by: Guan Jing <guanjing6@huawei.com>
      588d8f44