- 31 May 2023, 7 commits
-
-
Submitted by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC
CVE: NA

--------------------------------

Currently, for 'idle' and 'frozen', action_store() holds 'reconfig_mutex' and calls md_reap_sync_thread() to stop the sync thread; however, this can deadlock (explained in the next patch). To fix the problem, a following patch will release 'reconfig_mutex' and wait on 'resync_wait', as md_set_readonly() and do_md_stop() do.

Consider that action_store() sets/clears 'MD_RECOVERY_FROZEN' unconditionally, which can cause unexpected problems: for example, 'frozen' may have just set 'MD_RECOVERY_FROZEN' and still be in progress when 'idle' clears 'MD_RECOVERY_FROZEN' and a new sync thread is started, starving the in-progress 'frozen'. This patch adds a mutex to synchronize 'idle' and 'frozen' from action_store().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
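For illustration, a minimal sketch of the synchronization idea; the 'sync_mutex' field and both helper names mirror the patch idea but are not a verbatim copy of drivers/md/md.c:

    /* Serialize 'frozen' and 'idle' so that 'idle' cannot clear
     * MD_RECOVERY_FROZEN while a 'frozen' request is still in progress. */
    static void frozen_sync_thread(struct mddev *mddev)
    {
            mutex_lock(&mddev->sync_mutex);
            set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
            mutex_unlock(&mddev->sync_mutex);
    }

    static void idle_sync_thread(struct mddev *mddev)
    {
            mutex_lock(&mddev->sync_mutex);
            clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
            mutex_unlock(&mddev->sync_mutex);
    }

With both paths funneled through one mutex, a 'frozen' request observes either the state before or after a complete 'idle' request, never a half-applied one.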
-
Submitted by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC
CVE: NA

--------------------------------

Prepare to handle 'idle' and 'frozen' differently in order to fix a deadlock. There are no functional changes, except that MD_RECOVERY_RUNNING is checked again after 'reconfig_mutex' is held.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC
CVE: NA

--------------------------------

This reverts commit 9dfbdafd, because it introduces a defect: sync_thread can be running while MD_RECOVERY_RUNNING is cleared, which causes unexpected problems, for example:

list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
Call trace:
 __list_add_valid+0xfc/0x140
 insert_work+0x78/0x1a0
 __queue_work+0x500/0xcf4
 queue_work_on+0xe8/0x12c
 md_check_recovery+0xa34/0xf30
 raid10d+0xb8/0x900 [raid10]
 md_thread+0x16c/0x2cc
 kthread+0x1a4/0x1ec
 ret_from_fork+0x10/0x18

This happens because the work is requeued while it is still inside the workqueue:

t1:                           t2:
action_store
 mddev_lock
  if (mddev->sync_thread)
   mddev_unlock
   md_unregister_thread
   // first sync_thread is done
                              md_check_recovery
                               mddev_try_lock
                               /*
                                * once MD_RECOVERY_DONE is set, new sync_thread
                                * can start.
                                */
                               set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
                               INIT_WORK(&mddev->del_work, md_start_sync)
                               queue_work(md_misc_wq, &mddev->del_work)
                                test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
                                // set pending bit
                                insert_work
                                 list_add_tail
                               mddev_unlock
   mddev_lock_nointr
   md_reap_sync_thread
   // MD_RECOVERY_RUNNING is cleared
 mddev_unlock

t3:
// before the work queued from t2 has started
md_check_recovery
 // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
 INIT_WORK(&mddev->del_work, md_start_sync)
  work->data = 0
  // work pending bit is cleared
  queue_work(md_misc_wq, &mddev->del_work)
   insert_work
    list_add_tail
    // list is corrupted

This patch reverts the commit to fix the problem; the deadlock that commit tried to fix will be fixed in the following patches.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230322064122.2384589-2-yukuai1@huaweicloud.com
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Guoqing Jiang

mainline inclusion
from mainline-v6.0-rc1
commit 9dfbdafd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc3&id=9dfbdafda3b34e262e43e786077bab8e476a89d1

--------------------------------

Since the bug fixed by commit 8b48ec23 ("md: don't unregister sync_thread with reconfig_mutex held") is related only to the action_store() path, other callers that reap sync_thread do not need to change. Pull md_unregister_thread() out of md_reap_sync_thread(), then fix the earlier bug as follows:

1. Unlock mddev before md_reap_sync_thread() in action_store().
2. Save reshape_position before unlocking, then restore it to ensure the position is not changed accidentally by others.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Yu Kuai

mainline inclusion
from mainline-v6.3-rc2
commit 428913bc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MQLP
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5cfefa97bccf956ea0bb6464c1f6c84fd7a8d9f

--------------------------------

If disk_scan_partitions() is called with 'FMODE_EXCL', blkdev_get_by_dev() is called without 'FMODE_EXCL'; however, the following blkdev_put() is still called with 'FMODE_EXCL', which causes the 'bd_holders' counter to leak. Fix the problem by using the right mode for blkdev_put().

Reported-by: syzbot+2bcc0d79e548c4f62a59@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/f9649d501bc8c3444769418f6c26263555d9d3be.camel@linux.ibm.com/T/
Tested-by: Julian Ruess <julianr@linux.ibm.com>
Fixes: e5cfefa9 ("block: fix scan partition for exclusively open device again")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
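A hedged sketch of the idea, with a hypothetical helper name (the real diff touches disk_scan_partitions() in block/genhd.c): whatever mode the get dropped, the put must drop as well, so bd_holders is only decremented when it was actually incremented.

    static int scan_partitions_sketch(dev_t devt, fmode_t mode)
    {
            struct block_device *bdev;

            /* The scan open deliberately drops FMODE_EXCL. */
            bdev = blkdev_get_by_dev(devt, mode & ~FMODE_EXCL, NULL);
            if (IS_ERR(bdev))
                    return PTR_ERR(bdev);

            /* ... rescan the partition table here ... */

            blkdev_put(bdev, mode & ~FMODE_EXCL);  /* must match the get */
            return 0;
    }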
-
Submitted by Yu Kuai

mainline inclusion
from mainline-v6.3-rc1
commit e5cfefa9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MQLP
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5cfefa97bccf956ea0bb6464c1f6c84fd7a8d9f

--------------------------------

As explained in commit 36369f46 ("block: Do not reread partition table on exclusively open device"), rereading the partition table on a device that is exclusively opened by someone else is problematic. This patch makes sure a partition scan proceeds only if the current thread opened the device exclusively, or if the device is not opened exclusively at all; in the latter case, other scanners and exclusive openers are blocked temporarily until the partition scan is done.

Fixes: 10c70d95 ("block: remove the bd_openers checks in blk_drop_partitions")
Cc: <stable@vger.kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230217022200.3092987-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/genhd.c
	block/ioctl.c

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Christoph Hellwig
mainline inclusion
from mainline-v5.17-rc1
commit e16e506c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MQLP
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16e506ccd673a3a888a34f8f694698305840044

--------------------------------

Unify the functionality that implements a partition rescan for a gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk.h
	block/genhd.c
	block/ioctl.c

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
- 30 May 2023, 29 commits
-
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @zhaowenhui8

Expand qos_level from {-1, 0} to [-2, 2] to distinguish tasks expected to run at extremely high or low priority levels. Use qos_level_weight to reweight the shares when calculating a group's weight. Meanwhile, set an offline task's scheduling policy to SCHED_IDLE so that it can be preempted in check_preempt_wakeup().

Kernel option: CONFIG_QOS_SCHED_MULTILEVEL

Link: https://gitee.com/openeuler/kernel/pulls/795
Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @henryze

The dying CPU has been removed from the online mask, but the hotplug notifier has not yet been called to fold the percpu count into the global counter sum. This race condition is avoided by including the dying CPU in the iteration mask.

Link: https://gitee.com/openeuler/kernel/pulls/850
Reviewed-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @henryze

When users read the CPU frequency from cpuinfo_cur_freq under the cpufreq sysfs directory, they often get an invalid result like:

  $ cat /sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_cur_freq
  4294967295

This series fixes the issue.

Reference: https://lore.kernel.org/all/20230516133248.712242-3-zengheng4@huawei.com/
Link: https://gitee.com/openeuler/kernel/pulls/849
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @xiongzhou4

Provide value-profile support for the kernel. The implementation is based on the kernel's existing GCOV feature: when the option is enabled, the GCOV option `-fprofile-arcs` is changed to `-fprofile-generate`. The latter includes the former plus value profiling, which enables more comprehensive feedback-directed optimization. The added feature is called the _PGO kernel_ and can be used to improve the performance of a single-application runtime environment.

Kernel option (default is n): CONFIG_PGO_KERNEL=y

Link: https://gitee.com/openeuler/kernel/pulls/773
Reviewed-by: Liu Chao <liuchao173@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @svishen

This pull request lets the hns3 driver provide a PTP driver to get the 1588 clock from the Ethernet hardware. Only the first PF on the main chip supports this, so getting PTP time from another chip may involve some bus latency; the PTP sync device is used to eliminate that bus latency.

Issue: https://gitee.com/openeuler/kernel/issues/I78MGV
Link: https://gitee.com/openeuler/kernel/pulls/842
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @NNNNicole

1. sched/pelt: relax the sync of *_sum with *_avg (patches 1-3)
2. Adjust NUMA imbalance for multiple LLCs (patches 4-6)
3. sched: queue task on wakelist in the same LLC if the wakee cpu is idle (patch 7)
4. Clear ttwu_pending after enqueue_task() (patch 8)

Link: https://gitee.com/openeuler/kernel/pulls/844
Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @stinft

Issues: #I76PY9, #I76PUJ, #I76PRT

Link: https://gitee.com/openeuler/kernel/pulls/837
Reviewed-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by xiongzhou4

GCC inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I734PM

---------------------------------

This feature adds value-profile support for the kernel by changing the GCOV option "-fprofile-arcs" to "-fprofile-generate" when the newly added config "PGO_KERNEL" is set to y. As with GCOV, the symbols required by value profiling are migrated from the GCC sources because they cannot be linked into the kernel: specifically, from libgcc/libgcov-profiler.c to kernel/gcov/gcc_base.c.

Kernel option: CONFIG_PGO_KERNEL=y

Signed-off-by: Xiong Zhou <xiongzhou4@huawei.com>
Reviewed-by: Li Yancheng <liyancheng@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @xiao_jiang_shui

ACC: support the no-sva feature.

Issue: https://gitee.com/openeuler/kernel/issues/I773SD
Link: https://gitee.com/openeuler/kernel/pulls/803
Reviewed-by: Yang Shen <shenyang39@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Zhao Wenhui

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I737X1

-------------------------------

Expand qos_level from {-1, 0} to [-2, 2] to distinguish tasks expected to run at extremely high or low priority levels. Use qos_level_weight to reweight the shares when calculating a group's weight, as sketched below. Meanwhile, set an offline task's scheduling policy to SCHED_IDLE so that it can be preempted in check_preempt_wakeup().

Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
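A hypothetical sketch of the reweighting; the table values and the helper name are illustrative, not the shipped openEuler code:

    /* Each qos_level in [-2, 2] selects a weight used to scale the
     * group's shares; level 0 keeps the default weight. */
    static const unsigned int qos_level_weight[5] = {
            1,      /* level -2: extremely low  */
            10,     /* level -1: low (offline)  */
            100,    /* level  0: default        */
            1000,   /* level  1: high           */
            10000,  /* level  2: extremely high */
    };

    static inline unsigned long qos_reweight(unsigned long shares, int qos_level)
    {
            return shares * qos_level_weight[qos_level + 2] / 100;
    }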
-
Submitted by Tianchen Ding

mainline inclusion
from mainline-v6.2-rc1
commit d6962c4f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=d6962c4fe8f96f7d384d6489b6b5ab5bf3e35991

--------------------------------

We found a long tail latency in schbench when m*t is close to nr_cpus (e.g., "schbench -m 2 -t 16" on a machine with 32 cpus). This is because when the wakee cpu is idle, rq->ttwu_pending is cleared too early, so idle_cpu() returns true until the wakee task is enqueued. This misleads the waker when selecting an idle cpu and can wake multiple worker threads on the same wakee cpu. The situation is enlarged by commit f3dd3f67 ("sched: Remove the limitation of WF_ON_CPU on wakelist if wakee cpu is idle") because it tends to use the wakelist.

Here is the result of "schbench -m 2 -t 16" on a VM with 32 vcpus (Intel(R) Xeon(R) Platinum 8369B):

Latency percentiles (usec):
               base    base+revert_f3dd3f67    base+this_patch
50.0000th:        9                      13                  9
75.0000th:       12                      19                 12
90.0000th:       15                      22                 15
95.0000th:       18                      24                 17
*99.0000th:      27                      31                 24
99.5000th:     3364                      33                 27
99.9000th:    12560                      36                 30

We also tested on unixbench and hackbench and saw no performance change.

Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20221104023601.12844-1-dtcccc@linux.alibaba.com
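An order-of-operations sketch of the fix (locking and rq_flags handling omitted; enqueue_wakee() is a hypothetical stand-in for ttwu_do_activate()): clear rq->ttwu_pending only after all wakelist tasks are enqueued, so a concurrent idle_cpu() cannot see the cpu as idle too early.

    extern void enqueue_wakee(struct rq *rq, struct task_struct *p);

    static void sched_ttwu_pending_sketch(struct rq *rq, struct llist_node *llist)
    {
            struct task_struct *p, *t;

            llist_for_each_entry_safe(p, t, llist, wake_entry.llist)
                    enqueue_wakee(rq, p);

            /* Moved here from before the loop by the fix. */
            WRITE_ONCE(rq->ttwu_pending, 0);
    }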
-
Submitted by Guan Jing

mainline inclusion
from mainline-v6.0-rc1
commit f3dd3f67
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=f3dd3f674555bd9455c5ae7fafce0696bd9931b3

--------------------------------

The wakelist can help avoid cache bouncing and offload overhead from the waker cpu. So far, using the wakelist within the same LLC only happens on WF_ON_CPU, and this limitation can be removed to further improve wakeup performance.

Commit 518cd623 ("sched: Only queue remote wakeups when crossing cache boundaries") disabled queuing tasks on the wakelist when the cpus share an LLC. This is because, at that time, the scheduler had to send IPIs to do ttwu_queue_wakelist. Nowadays, ttwu_queue_wakelist also supports TIF_POLLING, so this is no longer a problem when the wakee cpu is in idle polling.

Benefits: queuing the task on the idle cpu improves performance on the waker cpu and utilization on the wakee cpu, and further improves locality because the wakee cpu can handle its own rq. This patch helps improve rt on real java workloads where wakeups happen frequently.

Consider the normal condition (CPU0 and CPU1 share the same LLC).

Before this patch:

         CPU0                                    CPU1
    select_task_rq()                             idle
    rq_lock(CPU1->rq)
    enqueue_task(CPU1->rq)
    notify CPU1 (by sending IPI or CPU1 polling)
                                                 resched()

After this patch:

         CPU0                                    CPU1
    select_task_rq()                             idle
    add to wakelist of CPU1
    notify CPU1 (by sending IPI or CPU1 polling)
                                                 rq_lock(CPU1->rq)
                                                 enqueue_task(CPU1->rq)
                                                 resched()

We see that CPU0 can finish its work earlier: it only needs to put the task on the wakelist and return, while CPU1, being idle, handles its own runqueue data.

This patch makes no difference regarding IPIs. It only takes effect when the wakee cpu is (1) idle polling or (2) idle not polling. For (1), there is no IPI with or without this patch. For (2), there is always an IPI before and after this patch: before, the waker cpu enqueues the task and checks preempt, and since "idle" is sure to be preempted, the waker cpu must send a resched IPI; after, the waker cpu puts the task on the wakelist of the wakee cpu and sends an IPI.

Benchmark: we tested schbench, unixbench, and hackbench on both x86 and arm64.

On x86 (Intel Xeon Platinum 8269CY):

schbench -m 2 -t 8

Latency percentiles (usec)    before    after
50.0000th:                         8        6
75.0000th:                        10        7
90.0000th:                        11        8
95.0000th:                        12        8
*99.0000th:                       13       10
99.5000th:                        15       11
99.9000th:                        18       14

Unixbench with full threads (104)
                                             before          after
Dhrystone 2 using register variables     3011862938     3009935994   -0.06%
Double-Precision Whetstone                 617119.3       617298.5    0.03%
Execl Throughput                            27667.3        27627.3   -0.14%
File Copy 1024 bufsize 2000 maxblocks      785871.4       784906.2   -0.12%
File Copy 256 bufsize 500 maxblocks        210113.6       212635.4    1.20%
File Copy 4096 bufsize 8000 maxblocks     2328862.2      2320529.1   -0.36%
Pipe Throughput                         145535622.8    145323033.2   -0.15%
Pipe-based Context Switching              3221686.4      3583975.4   11.25%
Process Creation                           101347.1       103345.4    1.97%
Shell Scripts (1 concurrent)               120193.5       123977.8    3.15%
Shell Scripts (8 concurrent)                17233.4        17138.4   -0.55%
System Call Overhead                      5300604.8      5312213.6    0.22%

hackbench -g 1 -l 100000
          before    after
Time       3.246    2.251

On arm64 (Ampere Altra):

schbench -m 2 -t 8

Latency percentiles (usec)    before    after
50.0000th:                        14       10
75.0000th:                        19       14
90.0000th:                        22       16
95.0000th:                        23       16
*99.0000th:                       24       17
99.5000th:                        24       17
99.9000th:                        28       25

Unixbench with full threads (80)
                                             before          after
Dhrystone 2 using register variables     3536194249     3537019613    0.02%
Double-Precision Whetstone                 629383.6       629431.6    0.01%
Execl Throughput                            65920.5        65846.2   -0.11%
File Copy 1024 bufsize 2000 maxblocks     1063722.8      1064026.8    0.03%
File Copy 256 bufsize 500 maxblocks        322684.5       318724.5   -1.23%
File Copy 4096 bufsize 8000 maxblocks     2348285.3      2328804.8   -0.83%
Pipe Throughput                         133542875.3    131619389.8   -1.44%
Pipe-based Context Switching              3215356.1      3576945.1   11.25%
Process Creation                           108520.5       120184.6   10.75%
Shell Scripts (1 concurrent)               122636.3       121888.0   -0.61%
Shell Scripts (8 concurrent)                17462.1        17381.4   -0.46%
System Call Overhead                      4429998.9      4435006.7    0.11%

hackbench -g 1 -l 100000
          before    after
Time       4.217    2.916

Our patch improves schbench, hackbench, and the Pipe-based Context Switching test of unixbench when idle cpus exist, with no obvious regression on the other unixbench tests. This can help improve rt in scenes where wakeups happen frequently.

Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20220608233412.327341-3-dtcccc@linux.alibaba.com
Signed-off-by: Guan Jing <guanjing6@huawei.com>
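A simplified sketch of the relaxed queueing condition (the mainline patch handles more cases): queue on the remote wakelist when caches are not shared, or when the wakee cpu is idle, rather than only for WF_ON_CPU.

    static bool ttwu_queue_cond_sketch(int this_cpu, int wakee_cpu)
    {
            /* Crossing a cache boundary: always worth offloading. */
            if (!cpus_share_cache(this_cpu, wakee_cpu))
                    return true;

            /* Same LLC: offload only if the wakee cpu is idle, so the
             * idle cpu handles its own runqueue data. */
            return available_idle_cpu(wakee_cpu);
    }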
-
Submitted by Guan Jing

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
CVE: NA

--------------------------------

Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Submitted by Guan Jing

mainline inclusion
from mainline-v5.18-rc1
commit e496132e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=e496132ebedd870b67f1f6d2428f9bb9d7ae27fd

--------------------------------

Commit 7d2b5dd0 ("sched/numa: Allow a floating imbalance between NUMA nodes") allowed an imbalance between NUMA nodes so that communicating tasks would not be pulled apart by the load balancer. This works fine when there is a 1:1 relationship between LLC and node, but can be suboptimal for multiple LLCs if independent tasks prematurely use CPUs sharing a cache.

Zen* has multiple LLCs per node with local memory channels, and due to the allowed imbalance it is far harder to tune some workloads to run optimally than on hardware with one LLC per node. This patch allows an imbalance to exist up to the point where LLCs should be balanced between nodes.

On a Zen3 machine running STREAM parallelised with OMP to have one instance per LLC and without binding, the results are:

                              5.17.0-rc0             5.17.0-rc0
                                 vanilla       sched-numaimb-v6
MB/sec copy-16     162596.94 (   0.00%)    580559.74 ( 257.05%)
MB/sec scale-16    136901.28 (   0.00%)    374450.52 ( 173.52%)
MB/sec add-16      157300.70 (   0.00%)    564113.76 ( 258.62%)
MB/sec triad-16    151446.88 (   0.00%)    564304.24 ( 272.61%)

STREAM can use directives to force the spread if the OpenMP is new enough, but that doesn't help if an application uses threads and it's not known in advance how many threads will be created.

Coremark is a CPU- and cache-intensive benchmark parallelised with threads. When running with 1 thread per core, the vanilla kernel allows threads to contend on cache. With the patch:

                                  5.17.0-rc0             5.17.0-rc0
                                     vanilla       sched-numaimb-v5
Min       Score-16    368239.36 (   0.00%)    389816.06 (   5.86%)
Hmean     Score-16    388607.33 (   0.00%)    427877.08 *  10.11%*
Max       Score-16    408945.69 (   0.00%)    481022.17 (  17.62%)
Stddev    Score-16     15247.04 (   0.00%)     24966.82 ( -63.75%)
CoeffVar  Score-16         3.92 (   0.00%)         5.82 ( -48.48%)

It can also make a big difference for semi-realistic workloads like specjbb, which can execute arbitrary numbers of threads without advance knowledge of how they should be placed. Even in cases where the average performance is neutral, the results are more stable:

                                   5.17.0-rc0             5.17.0-rc0
                                      vanilla       sched-numaimb-v6
Hmean     tput-1      71631.55 (   0.00%)     73065.57 (   2.00%)
Hmean     tput-8     582758.78 (   0.00%)    556777.23 (  -4.46%)
Hmean     tput-16   1020372.75 (   0.00%)   1009995.26 (  -1.02%)
Hmean     tput-24   1416430.67 (   0.00%)   1398700.11 (  -1.25%)
Hmean     tput-32   1687702.72 (   0.00%)   1671357.04 (  -0.97%)
Hmean     tput-40   1798094.90 (   0.00%)   2015616.46 *  12.10%*
Hmean     tput-48   1972731.77 (   0.00%)   2333233.72 (  18.27%)
Hmean     tput-56   2386872.38 (   0.00%)   2759483.38 (  15.61%)
Hmean     tput-64   2909475.33 (   0.00%)   2925074.69 (   0.54%)
Hmean     tput-72   2585071.36 (   0.00%)   2962443.97 (  14.60%)
Hmean     tput-80   2994387.24 (   0.00%)   3015980.59 (   0.72%)
Hmean     tput-88   3061408.57 (   0.00%)   3010296.16 (  -1.67%)
Hmean     tput-96   3052394.82 (   0.00%)   2784743.41 (  -8.77%)
Hmean     tput-104  2997814.76 (   0.00%)   2758184.50 (  -7.99%)
Hmean     tput-112  2955353.29 (   0.00%)   2859705.09 (  -3.24%)
Hmean     tput-120  2889770.71 (   0.00%)   2764478.46 (  -4.34%)
Hmean     tput-128  2871713.84 (   0.00%)   2750136.73 (  -4.23%)
Stddev    tput-1       5325.93 (   0.00%)      2002.53 (  62.40%)
Stddev    tput-8       6630.54 (   0.00%)     10905.00 ( -64.47%)
Stddev    tput-16     25608.58 (   0.00%)      6851.16 (  73.25%)
Stddev    tput-24     12117.69 (   0.00%)      4227.79 (  65.11%)
Stddev    tput-32     27577.16 (   0.00%)      8761.05 (  68.23%)
Stddev    tput-40     59505.86 (   0.00%)      2048.49 (  96.56%)
Stddev    tput-48    168330.30 (   0.00%)     93058.08 (  44.72%)
Stddev    tput-56    219540.39 (   0.00%)     30687.02 (  86.02%)
Stddev    tput-64    121750.35 (   0.00%)      9617.36 (  92.10%)
Stddev    tput-72    223387.05 (   0.00%)     34081.13 (  84.74%)
Stddev    tput-80    128198.46 (   0.00%)     22565.19 (  82.40%)
Stddev    tput-88    136665.36 (   0.00%)     27905.97 (  79.58%)
Stddev    tput-96    111925.81 (   0.00%)     99615.79 (  11.00%)
Stddev    tput-104   146455.96 (   0.00%)     28861.98 (  80.29%)
Stddev    tput-112    88740.49 (   0.00%)     58288.23 (  34.32%)
Stddev    tput-120   186384.86 (   0.00%)     45812.03 (  75.42%)
Stddev    tput-128    78761.09 (   0.00%)     57418.48 (  27.10%)

Similarly, for embarrassingly parallel problems like NPB-ep, there are improvements due to better spreading across LLCs when the machine is not fully utilised:

                              vanilla       sched-numaimb-v6
Min       ep.D      31.79 (   0.00%)      26.11 (  17.87%)
Amean     ep.D      31.86 (   0.00%)      26.17 *  17.86%*
Stddev    ep.D       0.07 (   0.00%)       0.05 (  24.41%)
CoeffVar  ep.D       0.22 (   0.00%)       0.20 (   7.97%)
Max       ep.D      31.93 (   0.00%)      26.21 (  17.91%)

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220208094334.16379-3-mgorman@techsingularity.net
Signed-off-by: Guan Jing <guanjing6@huawei.com>
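A sketch following the shape of adjust_numa_imbalance() after this patch: tolerate a small imbalance only while the destination node runs fewer tasks than imb_numa_nr, a per-domain value precomputed from the LLC span that marks the point at which the node's LLCs would stop being balanced.

    #define NUMA_IMBALANCE_MIN 2

    static long adjust_numa_imbalance_sketch(long imbalance,
                                             int dst_running, int imb_numa_nr)
    {
            /* Past the LLC-balance point, report the real imbalance. */
            if (dst_running > imb_numa_nr)
                    return imbalance;

            /* Allow a small imbalance so communicating tasks stay close. */
            if (imbalance <= NUMA_IMBALANCE_MIN)
                    return 0;

            return imbalance;
    }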
-
Submitted by Mel Gorman

mainline inclusion
from mainline-v5.18-rc1
commit 2cfb7a1b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=2cfb7a1b031b0e816af7a6ee0c6ab83b0acdf05a

--------------------------------

There are inconsistencies in determining whether a NUMA imbalance is allowed, and they should be corrected:

o allow_numa_imbalance changes types and is not always examining the
  destination group, so both the type and the naming should be corrected.
o find_idlest_group uses the sched_domain's weight instead of the group
  weight, which differs from find_busiest_group.
o find_busiest_group uses the source group instead of the destination,
  which differs from task_numa_find_cpu.
o Both find_idlest_group and find_busiest_group should account for the
  number of running tasks if a move was allowed, to be consistent with
  task_numa_find_cpu.

Fixes: 7d2b5dd0 ("sched/numa: Allow a floating imbalance between NUMA nodes")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20220208094334.16379-2-mgorman@techsingularity.net
Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Submitted by Guan Jing
mainline inclusion
from mainline-v5.17-rc2
commit 2d02fa8c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=2d02fa8cc21a93da35cfba462bf8ab87bf2db651

--------------------------------

Similarly to util_avg and util_sum, don't sync load_sum with the low bound of load_avg, but only ensure that load_sum stays in the correct range.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-5-vincent.guittot@linaro.org
Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Submitted by Dave Chinner

mainline inclusion
from mainline-v6.3-rc4
commit 8b57b11c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8b57b11cca88f397035a95b9e12b03511847b0e8

--------------------------------

In commit f689054a ("percpu_counter: add percpu_counter_sum_all interface") a race condition between a cpu dying and percpu_counter_sum() iterating online CPUs was identified. The solution was to iterate all possible CPUs for summation via percpu_counter_sum_all().

We recently had a percpu_counter_sum() call in XFS trip over this same race condition, and it fired a debug assert because the filesystem was unmounting and the counter *should* be zero just before we destroy it. That was reported here:

https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/

likely as a result of running generic/648, which exercises filesystems in the presence of CPU online/offline events.

The solution of using percpu_counter_sum_all() is an awful one. We use percpu counters and percpu_counter_sum() for accurate and reliable threshold detection in space management, so a summation race condition during these operations can result in overcommit of available space, which may result in filesystem shutdowns. And percpu_counter_sum_all() iterates all possible CPUs rather than just those online or even present, so the mask can include CPUs that aren't even installed in the machine, or, on machines that can hot-plug CPU-capable nodes, physical sockets that are present but unpopulated.

Fundamentally, this race condition is caused by the CPU being offlined being removed from cpu_online_mask before the notifier that cleans up per-cpu state has run. Hence percpu_counter_sum() will not sum the count of a cpu currently being taken offline, regardless of whether the notifier has run or not. This is the root cause of the bug.

The percpu counter notifier iterates all the registered counters, locks each counter, and moves the percpu count into the global sum. This is serialised against other operations that move a percpu count into the global sum, as well as against percpu_counter_sum() operations that sum the percpu counts while holding the counter lock. Hence the notifier is safe to run concurrently with sum operations, and the only thing we actually need is for percpu_counter_sum() to iterate dying CPUs. That's trivial to do, and when no CPUs are dying it adds no overhead beyond a cpumask_or() operation.

This change makes percpu_counter_sum() always do the right thing in the presence of CPU hot-unplug events and makes percpu_counter_sum_all() unnecessary. This, in turn, means that filesystems like XFS, ext4, and btrfs don't have to work out when they should use percpu_counter_sum() vs percpu_counter_sum_all() in their space accounting algorithms.

Conflicts:
	lib/percpu_counter.c

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
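A sketch mirroring the shape of the fixed summation: fold in the per-cpu deltas of online CPUs plus any CPU currently going down, so a dying CPU's contribution is never lost between its removal from cpu_online_mask and the hotplug callback folding its count into fbc->count.

    static s64 percpu_counter_sum_sketch(struct percpu_counter *fbc)
    {
            s64 ret;
            int cpu;
            unsigned long flags;

            raw_spin_lock_irqsave(&fbc->lock, flags);
            ret = fbc->count;
            /* Online CPUs plus CPUs in the middle of going offline. */
            for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
                    ret += *per_cpu_ptr(fbc->counters, cpu);
            raw_spin_unlock_irqrestore(&fbc->lock, flags);
            return ret;
    }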
-
Submitted by Dave Chinner
mainline inclusion
from mainline-v6.3-rc4
commit 1470afef
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1470afefc3c42df5d1662f87d079b46651bdc95b

--------------------------------

Equivalent of for_each_cpu_and(), except it ORs the two masks together so it iterates all the CPUs present in either mask.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
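A short usage sketch (the pr_info() message is purely illustrative): visit every cpu that is either online or dying.

    unsigned int cpu;

    for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
            pr_info("cpu%u needs its counter folded\n", cpu);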
-
Submitted by Yury Norov

mainline inclusion
from mainline-v5.13-rc1
commit 586eaebe
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=586eaebea5988302c5a8b018096dd6c6f4564940

--------------------------------

find_bit would also benefit from small_const_nbits() optimizations. The detailed comment is provided by Rasmus Villemoes.

Link: https://lkml.kernel.org/r/20210401003153.97325-6-yury.norov@gmail.com
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
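For reference, the helper this optimization keys off has, to the best of our knowledge, the following shape: it is true only for a compile-time-constant size that fits in a single word, letting find_*_bit() calls on such bitmaps collapse to single-word operations.

    #define small_const_nbits(nbits) \
            (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)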
-
Submitted by Peter Zijlstra

mainline inclusion
from mainline-v5.13-rc1
commit e40f74c5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e40f74c535b8a0ecf3ef0388b51a34cdadb34fb5

--------------------------------

Introduce a cpumask that indicates, for each CPU, what direction the CPU hotplug is currently going. Notably, it tracks rollbacks: e.g., when an up fails and we roll back down, it accurately reflects the direction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210310150109.151441252@infradead.org
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
-
Submitted by Vincent Guittot
mainline inclusion
from mainline-v5.17-rc2
commit 95246d1e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=95246d1ec80b8d19d882cd8eb7ad094e63b41bb8

--------------------------------

Similarly to util_avg and util_sum, don't sync runnable_sum with the low bound of runnable_avg, but only ensure that runnable_sum stays in the correct range.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-4-vincent.guittot@linaro.org
Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Submitted by Vincent Guittot

mainline inclusion
from mainline-v5.17-rc2
commit 7ceb7710
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=7ceb77103001544a43e11d7f3a8a69a2c1f422cf

--------------------------------

Rick reported performance regressions in bugzilla because of cpu frequency being lower than before:

https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to commit 1c35b07e ("sched/fair: Ensure _sum and _avg values stay consistent"). This commit forces util_sum to be synced with the new util_avg after removing the contribution of a task and before the next periodic sync. By doing so, util_sum is rounded to its lower bound and might lose up to LOAD_AVG_MAX-1 of accumulated contribution that has not yet been reflected in util_avg.

update_tg_cfs_util() is not the only place where we round util_sum and lose accumulated contributions not already reflected in util_avg. Modify update_tg_cfs_util() and detach_entity_load_avg() to not sync util_sum with the new util_avg. Instead of always setting util_sum to the low bound of util_avg, which can significantly lower the utilization, propagate the difference. In addition, check that cfs's util_sum always stays above the lower bound for a given util_avg, as it has been observed that a sched_entity's util_sum is sometimes above the cfs one.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-3-vincent.guittot@linaro.org
Signed-off-by: Guan Jing <guanjing6@huawei.com>
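A minimal sketch of the propagation idea, simplified from update_tg_cfs_util() (add_positive() and PELT_MIN_DIVIDER are the mainline names; the helper itself is illustrative): apply the child's deltas to the parent instead of re-deriving util_sum from the rounded-down util_avg, then clamp util_sum to its floor.

    static void propagate_util_sketch(struct sched_avg *sa,
                                      long delta_avg, long delta_sum)
    {
            add_positive(&sa->util_avg, delta_avg);
            add_positive(&sa->util_sum, delta_sum);

            /* util_sum must not fall below util_avg * PELT_MIN_DIVIDER. */
            sa->util_sum = max_t(u32, sa->util_sum,
                                 sa->util_avg * PELT_MIN_DIVIDER);
    }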
-
Submitted by Weili Qian

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I773SD
CVE: NA

----------------------------------------------------------------------

Support the no-sva feature.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: JiangShui Yang <yangjiangshui@h-partners.com>
-
Submitted by Kai Ye

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I773SD
CVE: NA

----------------------------------------------------------------------

1. UACCE_MODE_NOIOMMU for warpdrive.
2. Some dfx logs.
3. Fix some static-checking issues.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: JiangShui Yang <yangjiangshui@h-partners.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @hejunhao3

Some HiSilicon SMMU PMCG suffer from an erratum where the global PMU disable control sometimes fails to disable the used counters. This leads to erroneous or inaccurate data, since before the counters are re-enabled they are still counting the event used in the last perf session. This patch hardens the global disable process: before disabling the PMU, an invalid event type (0xff) is written to forcibly stop the counters.

Link: https://gitee.com/openeuler/kernel/pulls/851
Reviewed-by: Yang Shen <shenyang39@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @Hongchen_Zhang

PV IPI, which causes fewer VM exits, improves performance over iocsr emulation when sending IPI interrupts to multiple cpus. This patch supports:
1. sending PV IPIs by hypercall
2. recording and updating steal time for the VM

Link: https://gitee.com/openeuler/kernel/pulls/793
Reviewed-by: Guo Dongtai <guodongtai@kylinos.cn>
Reviewed-by: Kevin Zhu <zhukeqian1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @jiayingbao

There is a new sysfs interface, current_freq_khz, added upstream. Backport the related patches for the uncore-freq driver. The following 3 patches are backported:

ae7b2ce5 platform/x86/intel/uncore-freq: Use sysfs API to create attributes
414eef27 platform/x86/intel/uncore-freq: Display uncore current frequency
8d75f7b4 platform/x86: intel-uncore-freq: Prevent driver loading in guests

Test: build passes; the new sysfs current_freq_khz is functional as expected.

Link: https://gitee.com/openeuler/kernel/pulls/840
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Aichun Shi <aichun.shi@intel.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @jiayingbao

Backport some patches for intel_pstate no-HWP and OOB mode, plus one HWP update for all server platforms, including the commits below:

cd23f02f cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
addca285 cpufreq: intel_pstate: Handle no_turbo in frequency invariance
bbd67f1b cpufreq: intel_pstate: Support Sapphire Rapids OOB mode
df51f287 cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP mode
1f5e62f5 cpufreq: intel_pstate: Enable HWP IO boost for all servers

Test: build passes; intel_pstate initializes and functions as expected.

Link: https://gitee.com/openeuler/kernel/pulls/839
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Aichun Shi <aichun.shi@intel.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @Hongchen_Zhang

Enable the memory- and PCI-hotplug-related configs for LoongArch; they are currently used by LoongArch virtual machines.

Link: https://gitee.com/openeuler/kernel/pulls/809
Reviewed-by: Guo Dongtai <guodongtai@kylinos.cn>
Reviewed-by: Liu Chao <liuchao173@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 29 May 2023, 4 commits
-
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @leoliu-oc

Recent Zhaoxin/Centaur CPUs support X86_FEATURE_IDA, and turbo boost can be dynamically enabled or disabled through MSR 0x1a0[38] in the same way as on Intel. So add turbo boost control support for these CPUs too.

Issue: https://gitee.com/openeuler/kernel/issues/I6SKVN
Link: https://gitee.com/openeuler/kernel/pulls/547
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @leoliu-oc

Zhaoxin CPUs that use CENTAUR as the vendor id also have the NONSTOP TSC feature, so enable ACPI driver support for them too.

Issue: https://gitee.com/openeuler/kernel/issues/I6SJQ8
Link: https://gitee.com/openeuler/kernel/pulls/544
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by openeuler-ci-bot

Merge Pull Request from: @hejunhao3

1. Add support for HiSilicon T6 ETM.
2. Fix a CPU-hold issue caused by hip09 ETM overflow.

Link: https://gitee.com/openeuler/kernel/pulls/848
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Yicong Yang

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I798Y2
CVE: NA

----------------------------------------------------------------------

Some HiSilicon SMMU PMCG suffer from an erratum where the global PMU disable control sometimes fails to disable the used counters. This leads to erroneous or inaccurate data, since before the counters are re-enabled they are still counting the event used in the last perf session. This patch tries to fix this by hardening the global disable process: before disabling the PMU, write an invalid event type (0xff) to forcibly stop the counters.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Junhao He <hejunhao3@huawei.com>
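A hedged sketch of the hardening idea, loosely following the field and register names of the upstream SMMUv3 PMCG driver (smmu_pmu, SMMU_PMCG_EVTYPER(n), SMMU_PMCG_CR); the exact workaround in the shipped driver may differ:

    static void smmu_pmu_disable_hardened(struct smmu_pmu *smmu_pmu)
    {
            unsigned int idx;

            /* 0xff is not a valid event type; programming it forcibly
             * stops each in-use counter before the global disable. */
            for_each_set_bit(idx, smmu_pmu->used_counters,
                             smmu_pmu->num_counters)
                    writel(0xff, smmu_pmu->reg_base + SMMU_PMCG_EVTYPER(idx));

            /* Now clear the global enable as before. */
            writel(0, smmu_pmu->reg_base + SMMU_PMCG_CR);
    }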
-