Commits · 20ec0aa8d699d0426c8914176f5a6316f294e144 · openeuler / Kernel

11 Oct, 2022 12 commits

net: hns3: add querying and setting fec llrs mode from firmware · 20ec0aa8

Committed by Hao Lan on Oct 11, 2022

mainline inclusion
from mainline-v6.0-rc2
commit 5c4f7284
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QIIF
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5c4f72842d1d

----------------------------------------------------------------------

This patch adds support for llrs fec mode at 200G speed on some new
devices, and supports querying the llrs fec ability from firmware.

Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

net: hns3: add querying fec ability from firmware · 5538b273

Committed by Guangbin Huang on Oct 11, 2022

mainline inclusion
from mainline-v6.0-rc2
commit eaf83ae5
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QIIF
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eaf83ae59e18

----------------------------------------------------------------------

On some new devices, the driver can query the FEC ability from firmware to
decide which FEC modes are supported.

On older devices that do not support querying the FEC ability, the driver
sets a fixed ability according to the current speed.
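
The fallback described above can be sketched as a small decision function. This is a toy Python model, not hns3 driver code: the function name `fec_ability`, the `fw_supports_query` flag, and the contents of `FIXED_FEC_BY_SPEED` are all hypothetical placeholders, since the commit message does not list the actual per-speed abilities.

```python
# Toy model of the selection logic; not hns3 driver code.
# Every name and every table entry here is hypothetical.

# Placeholder fixed abilities per link speed (Mb/s) for older devices.
FIXED_FEC_BY_SPEED = {
    25000: {"rs", "baser"},
    100000: {"rs"},
}


def fec_ability(speed, fw_supports_query, fw_ability=None):
    """Return the set of FEC modes the driver would advertise."""
    if fw_supports_query:
        # Newer devices: use what the firmware reports.
        return set(fw_ability or ())
    # Older devices: fall back to a fixed ability chosen by current speed.
    return set(FIXED_FEC_BY_SPEED.get(speed, ()))
```

For instance, `fec_ability(200000, True, {"rs", "llrs"})` simply echoes the firmware answer, while `fec_ability(100000, False)` falls back to the table entry.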

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

net: hns3: add getting capabilities of gro offload and fd from firmware · 72b042b4

Committed by Guangbin Huang on Oct 11, 2022

mainline inclusion
from mainline-v6.0-rc2
commit 507e46ae
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QIIF
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=507e46ae26ea

----------------------------------------------------------------------

Some new devices may not support GRO offload or the flow table director.
To support these devices, the driver needs to query the capabilities for GRO
offload and flow table director from firmware. Whether the driver
supports these two features depends on those capabilities.

For old devices of version HNAE3_DEVICE_VERSION_V2, the driver sets the
capabilities for these two features to fixed values.

Setting the default netdev features and debugfs also needs to check
whether these two features are supported.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

RDMA/hns: Support MR's restrack raw ops for hns driver · 4d20406b

Committed by Wenpeng Liang on Oct 11, 2022

mainline inclusion
from mainline-for-next
commit 3d67e7e2
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TAQ5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=3d67e7e236ad

----------------------------------------------------------------------

The MR raw restrack attributes come from the queue context maintained by
the ROCEE.

For example:

$ rdma res show mr dev hns_0 mrn 6 -dd -jp -r
[ {
        "ifindex": 4,
        "ifname": "hns_0",
        "data": [ 1,0,0,0,2,0,0,0,0,3,0,0,0,0,2,0,0,0,0,0,32,0,0,0,2,0,0,0,
		  2,0,0,0,0,0,0,0 ]
    } ]
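
The `data` array above is a raw byte dump of the hardware context. As a reading aid, the sketch below regroups it into 32-bit words. The word width and the little-endian byte order are assumptions made for illustration; the commit message does not describe the context layout, so no field names are attached to the words.

```python
import struct

# "data" array copied from the example output above (36 bytes).
raw = bytes([1, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 2, 0, 0, 0,
             0, 0, 32, 0, 0, 0, 2, 0, 0, 0,
             2, 0, 0, 0, 0, 0, 0, 0])


def to_words(buf):
    """Split a byte buffer into unsigned 32-bit little-endian words."""
    count = len(buf) // 4
    return list(struct.unpack("<%dI" % count, buf[:count * 4]))


words = to_words(raw)
print([hex(w) for w in words])
```

Under these assumptions the dump becomes nine words, of which the third and fourth are 0x300 and 0x20000.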

Link: https://lore.kernel.org/r/20220822104455.2311053-8-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

RDMA/hns: Support MR's restrack ops for hns driver · 98b07261

Committed by Wenpeng Liang on Oct 11, 2022

mainline inclusion
from mainline-for-next
commit dc9981ef
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TAQ5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=dc9981ef17c6

----------------------------------------------------------------------

The MR restrack attributes come from the queue information maintained by
the driver.

For example:

$ rdma res show mr dev hns_0 mrn 6 -dd -jp
[ {
        "ifindex": 4,
        "ifname": "hns_0",
        "mrn": 6,
        "rkey": "300",
        "lkey": "300",
        "mrlen": 131072,
        "pdn": 8,
        "pid": 1524,
        "comm": "ib_send_bw"
    },
    "drv_pbl_hop_num": 2,
    "drv_ba_pg_shift": 14,
    "drv_buf_pg_shift": 12
}
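
The `drv_*_pg_shift` attributes read naturally as log2 page sizes. Assuming that interpretation (it is not stated in the commit message), the sizes implied by the example output can be worked out as follows.

```python
# Values copied from the example output above.
attrs = {"mrlen": 131072, "drv_ba_pg_shift": 14, "drv_buf_pg_shift": 12}


def page_size(shift):
    """Page size in bytes implied by a page shift (assumed to be log2)."""
    return 1 << shift


buf_page = page_size(attrs["drv_buf_pg_shift"])
ba_page = page_size(attrs["drv_ba_pg_shift"])
# Number of buffer pages needed to cover mrlen, rounding up.
buf_pages = -(-attrs["mrlen"] // buf_page)
```

With the example values this gives 4 KiB buffer pages, 16 KiB base-address pages, and 32 buffer pages for the 128 KiB region.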

Link: https://lore.kernel.org/r/20220822104455.2311053-7-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

RDMA/hns: Support QP's restrack raw ops for hns driver · 0480d2ff

Committed by Wenpeng Liang on Oct 11, 2022

mainline inclusion
from mainline-for-next
commit 3e89d78b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TAQ5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=3e89d78b21a8

----------------------------------------------------------------------

The QP raw restrack attributes come from the queue context maintained by
the ROCEE.

For example:

$ rdma res show qp link hns_0 -jp -dd -r
[ {
        "ifindex": 4,
        "ifname": "hns_0",
        "data": [ 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,
		  5,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,255,156,0,0,63,156,0,0,
		  7,0,0,0,1,0,0,0,9,0,0,0,0,0,0,0,2,0,0,0,2,0,0,0,0,0,0,
		  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
		  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,63,156,0,
		  0,0,0,0,0 ]
    } ]

Link: https://lore.kernel.org/r/20220822104455.2311053-6-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

RDMA/hns: Support QP's restrack ops for hns driver · 78425b64

Committed by Wenpeng Liang on Oct 11, 2022

mainline inclusion
from mainline-for-next
commit e198d65d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TAQ5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=e198d65d76e9

----------------------------------------------------------------------

The QP restrack attributes come from the queue information maintained by
the driver.

For example:

$ rdma res show qp link hns_0 lqpn 41 -jp -dd
[ {
        "ifindex": 4,
        "ifname": "hns_0",
        "port": 1,
        "lqpn": 41,
        "rqpn": 40,
        "type": "RC",
        "state": "RTR",
        "rq-psn": 12474738,
        "sq-psn": 0,
        "path-mig-state": "ARMED",
        "pdn": 9,
        "pid": 1523,
        "comm": "ib_send_bw"
    },
    "drv_sq_wqe_cnt": 128,
    "drv_sq_max_gs": 1,
    "drv_rq_wqe_cnt": 512,
    "drv_rq_max_gs": 2,
    "drv_ext_sge_sge_cnt": 0
}

Link: https://lore.kernel.org/r/20220822104455.2311053-5-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

RDMA/hns: Support CQ's restrack raw ops for hns driver · 78163ff3

Committed by Wenpeng Liang on Oct 11, 2022

mainline inclusion
from mainline-for-next
commit f2b070f3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TAQ5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=f2b070f36d1b

----------------------------------------------------------------------

The CQ raw restrack attributes come from the queue context maintained by
the ROCEE.

For example:

$ rdma res show cq dev hns_0 cqn 14 -dd -jp -r
[ {
        "ifindex": 4,
        "ifname": "hns_0",
        "data": [ 1,0,0,0,7,0,0,0,0,0,0,0,0,82,6,0,0,82,6,0,0,82,6,0,
		  1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,
		  6,0,0,0,0,0,0,0 ]
    } ]

Link: https://lore.kernel.org/r/20220822104455.2311053-4-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

RDMA/hns: Add or remove CQ's restrack attributes · 9dd913cd

Committed by Wenpeng Liang on Oct 11, 2022

mainline inclusion
from mainline-for-next
commit eb00b9a0
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TAQ5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=eb00b9a08b9d

----------------------------------------------------------------------

Remove the restrack attributes from the queue context held by ROCEE, and
add the restrack attributes from the queue information maintained by the
driver.

For example:

$ rdma res show cq dev hns_0 cqn 14 -dd -jp
[ {
        "ifindex": 4,
        "ifname": "hns_0",
        "cqn": 14,
        "cqe": 127,
        "users": 1,
        "adaptive-moderation": false,
        "ctxn": 8,
        "pid": 1524,
        "comm": "ib_send_bw"
    },
    "drv_cq_depth": 128,
    "drv_cons_index": 0,
    "drv_cqe_size": 32,
    "drv_arm_sn": 1
}

Link: https://lore.kernel.org/r/20220822104455.2311053-3-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

RDMA/hns: Remove redundant DFX file and DFX ops structure · 9c9edf68

Committed by Wenpeng Liang on Oct 11, 2022

mainline inclusion
from mainline-for-next
commit 40b4b79c
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TAQ5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=40b4b79c866f

----------------------------------------------------------------------

There is no need to use a dedicated DFX file and DFX structure to manage
the interface of the query queue context.

Link: https://lore.kernel.org/r/20220822104455.2311053-2-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

!129 [OLK-5.10] update pmu for Zhaoxin CPUs · 85778f7b

Committed by openeuler-ci-bot on Oct 11, 2022

Merge Pull Request from: @leoliu-oc 
 
Add support for more Zhaoxin processors, and improve the uncore code to provide more functions and support.

### Issue
https://gitee.com/openeuler/kernel/issues/I5SRF7

### Test
`perf list` will display more information.

### Known Issue
N/A

### Default config change
N/A 
 
Link: https://gitee.com/openeuler/kernel/pulls/129
Reviewed-by: Jiao Fenfang <jiaofenfang@uniontech.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

!156 Enable NVMe over TCP for arm64 · fcf9c9b7

Committed by openeuler-ci-bot on Oct 11, 2022

Merge Pull Request from: @xin3liang 
 
Enable NVMe over TCP for arm64. 
 
Link: https://gitee.com/openeuler/kernel/pulls/156
Reviewed-by: Liu Chao <liuchao173@huawei.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

08 Oct, 2022 1 commit

arm64: openeuler_defconfig: enable nvmf tcp · 7ec04979

Committed by Xinliang Liu on Oct 08, 2022

openEuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5SBW1
CVE: NA

--------------------------------

Enable NVMe over TCP for arm64.

Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>

30 Sep, 2022 27 commits

sched: fix kabi for core scheduling · ec840b5b

Committed by Lin Shengwang on Sep 30, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

--------------------------------------------------------------------------
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched/core: Change depends of SCHED_CORE · 22e41f75

Committed by Lin Shengwang on Sep 30, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

--------------------------------------------------------------------------

Core Scheduling conflicts with the SMT expeller, so SCHED_CORE and
QOS_SCHED_SMT_EXPELLER are made mutually exclusive.

Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched/core: Fix the bug that task won't enqueue into core tree when update cookie · 548a5b0f

Committed by Cruz Zhao on Sep 30, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 91caa5ae
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=91caa5ae242465c3ab9fd473e50170faa7e944f4

--------------------------------------------------------------------------

In sched_core_update_cookie(), a task is enqueued into the core tree
only if it was enqueued before. That is, if an uncookied task is
cookied, it is not enqueued into the core tree until it is enqueued
again, which results in unnecessary force idle.

The scenario is as follows:
CPU x and CPU y are a pair of SMT siblings.
1. Start task a running on CPU x without sleeping, and task b and
task c running on CPU y without sleeping.
2. Create a cookie and share it with task a and task b, then
create another cookie and share it with task c.
3. Sample core_forceidle_sum of task a and b from /proc/PID/sched.

We find that core_forceidle_sum of task a accounts for 30% of the
sampling period, which should not happen because task a and b
have the same cookie.

Then migrate task a to CPU x' and tasks b and c to CPU y', where
CPU x' and CPU y' are a pair of SMT siblings, and sample again:
core_forceidle_sum of task a and b is now almost zero.

To solve this problem, we enqueue the task into the core tree if it's
on rq.
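
The difference between the old and the fixed behaviour can be illustrated with a toy model. The sketch below is plain Python, not scheduler code: `Task`, `CoreTree`, `update_cookie_old` and `update_cookie_fixed` are hypothetical stand-ins, and the only property modelled is which tasks end up in the core tree after a cookie update.

```python
class Task:
    def __init__(self, name, on_rq=True):
        self.name = name
        self.on_rq = on_rq   # task is queued on a runqueue
        self.cookie = 0      # 0 means "no cookie"


class CoreTree:
    """Stand-in for the per-core tree; it only holds cookied tasks."""

    def __init__(self):
        self.tasks = set()

    def enqueue(self, task):
        if task.cookie:
            self.tasks.add(task)

    def dequeue(self, task):
        self.tasks.discard(task)


def update_cookie_old(tree, task, cookie):
    # Old behaviour: re-enqueue only if the task was already in the tree.
    was_enqueued = task in tree.tasks
    if was_enqueued:
        tree.dequeue(task)
    task.cookie = cookie
    if was_enqueued:
        tree.enqueue(task)


def update_cookie_fixed(tree, task, cookie):
    # Fixed behaviour: enqueue whenever the task is on the runqueue.
    tree.dequeue(task)
    task.cookie = cookie
    if task.on_rq:
        tree.enqueue(task)
```

In this model a running, previously uncookied task stays out of the tree after `update_cookie_old`, but is present after `update_cookie_fixed`, which is the situation the scenario above describes.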

Fixes: 6e33cad0 ("sched: Trivial core scheduling cookie management")
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1656403045-100840-2-git-send-email-CruzZhao@linux.alibaba.com
Conflicts:
kernel/sched/core_sched.c
[Feature 4feee7d1 ("sched/core: Forced idle accounting") is not applied.]
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched/core: Avoid obvious double update_rq_clock warning · 54e68ff1

Committed by Hao Jia on Sep 30, 2022

mainline inclusion
from mainline-v5.19-rc1
commit 2679a837
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2679a83731d51a744657f718fc02c3b077e47562

--------------------------------------------------------------------------

When we use raw_spin_rq_lock() to acquire the rq lock and have to
update the rq clock while holding the lock, the kernel may issue
a WARN_DOUBLE_CLOCK warning.

Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
rq_lock(), there is no corresponding change to rq->clock_update_flags.
In particular, once we have obtained the rq lock of another CPU, the
rq->clock_update_flags of this CPU may already be RQCF_UPDATED, and
calling update_rq_clock() then triggers the WARN_DOUBLE_CLOCK warning.

So we need to clear RQCF_UPDATED of rq->clock_update_flags to avoid
the WARN_DOUBLE_CLOCK warning.

For the sched_rt_period_timer() and migrate_task_rq_dl() cases
we simply replace raw_spin_rq_lock()/raw_spin_rq_unlock() with
rq_lock()/rq_unlock().

For the {pull,push}_{rt,dl}_task() cases, we add the
double_rq_clock_clear_update() function to clear RQCF_UPDATED of
rq->clock_update_flags, and call double_rq_clock_clear_update()
before double_lock_balance()/double_rq_lock() returns to avoid the
WARN_DOUBLE_CLOCK warning.
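
The flag handling described above can be modelled in a few lines. This is a toy Python sketch, not kernel code: the `RunQueue` class and the helper names are hypothetical, the flag values are assumed to mirror the kernel's RQCF_* definitions, and locking itself is not modelled, only the effect each lock helper has on `clock_update_flags`.

```python
# Assumed to mirror the kernel's RQCF_* values; treat as illustrative.
RQCF_REQ_SKIP = 0x01
RQCF_ACT_SKIP = 0x02
RQCF_UPDATED = 0x04


class RunQueue:
    def __init__(self):
        self.clock_update_flags = 0
        self.warnings = 0

    def update_clock(self):
        # Count a warning if the clock was already updated and the flag
        # has not been cleared since.
        if self.clock_update_flags & RQCF_UPDATED:
            self.warnings += 1
        self.clock_update_flags |= RQCF_UPDATED


def raw_lock(rq):
    """Models raw_spin_rq_lock(): leaves clock_update_flags untouched."""


def pinned_lock(rq):
    """Models rq_lock(): drops RQCF_UPDATED when the lock is taken."""
    rq.clock_update_flags &= RQCF_REQ_SKIP | RQCF_ACT_SKIP


def double_clock_clear_update(rq1, rq2):
    """Models the helper added for the double-lock paths."""
    for rq in (rq1, rq2):
        rq.clock_update_flags &= RQCF_REQ_SKIP | RQCF_ACT_SKIP
```

In this model, updating the clock, taking the lock with `raw_lock()` and updating again counts one warning, whereas going through `pinned_lock()` or calling `double_clock_clear_update()` in between counts none.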

Some call trace reports:
Call Trace 1:
 <IRQ>
 sched_rt_period_timer+0x10f/0x3a0
 ? enqueue_top_rt_rq+0x110/0x110
 __hrtimer_run_queues+0x1a9/0x490
 hrtimer_interrupt+0x10b/0x240
 __sysvec_apic_timer_interrupt+0x8a/0x250
 sysvec_apic_timer_interrupt+0x9a/0xd0
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20

Call Trace 2:
 <TASK>
 activate_task+0x8b/0x110
 push_rt_task.part.108+0x241/0x2c0
 push_rt_tasks+0x15/0x30
 finish_task_switch+0xaa/0x2e0
 ? __switch_to+0x134/0x420
 __schedule+0x343/0x8e0
 ? hrtimer_start_range_ns+0x101/0x340
 schedule+0x4e/0xb0
 do_nanosleep+0x8e/0x160
 hrtimer_nanosleep+0x89/0x120
 ? hrtimer_init_sleeper+0x90/0x90
 __x64_sys_nanosleep+0x96/0xd0
 do_syscall_64+0x34/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 3:
 <TASK>
 deactivate_task+0x93/0xe0
 pull_rt_task+0x33e/0x400
 balance_rt+0x7e/0x90
 __schedule+0x62f/0x8e0
 do_task_dead+0x3f/0x50
 do_exit+0x7b8/0xbb0
 do_group_exit+0x2d/0x90
 get_signal+0x9df/0x9e0
 ? preempt_count_add+0x56/0xa0
 ? __remove_hrtimer+0x35/0x70
 arch_do_signal_or_restart+0x36/0x720
 ? nanosleep_copyout+0x39/0x50
 ? do_nanosleep+0x131/0x160
 ? audit_filter_inodes+0xf5/0x120
 exit_to_user_mode_prepare+0x10f/0x1e0
 syscall_exit_to_user_mode+0x17/0x30
 do_syscall_64+0x40/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 4:
 update_rq_clock+0x128/0x1a0
 migrate_task_rq_dl+0xec/0x310
 set_task_cpu+0x84/0x1e4
 try_to_wake_up+0x1d8/0x5c0
 wake_up_process+0x1c/0x30
 hrtimer_wakeup+0x24/0x3c
 __hrtimer_run_queues+0x114/0x270
 hrtimer_interrupt+0xe8/0x244
 arch_timer_handler_phys+0x30/0x50
 handle_percpu_devid_irq+0x88/0x140
 generic_handle_domain_irq+0x40/0x60
 gic_handle_irq+0x48/0xe0
 call_on_irq_stack+0x2c/0x60
 do_interrupt_handler+0x80/0x84

Steps to reproduce:
1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
   echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched/features
   echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched/features
3. Run some rt/dl tasks that periodically work and sleep, e.g.
Create 2*n rt or dl (90% running) tasks via rt-app (on a system
with n CPUs), and Dietmar Eggemann reports Call Trace 4 when running
on PREEMPT_RT kernel.

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20220430085843.62939-2-jiahao.os@bytedance.com
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

arch/arm64: Fix topology initialization for core scheduling · 24f89029

Committed by Phil Auld on Sep 30, 2022

mainline inclusion
from mainline-v5.18-rc2
commit 5524cbb1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5524cbb1bfcdff0cad0aaa9f94e6092002a07259

--------------------------------------------------------------------------

Arm64 systems rely on store_cpu_topology() to call update_siblings_masks()
to transfer the topology to the various cpu masks. This needs to be done
before the call to notify_cpu_starting() which tells the scheduler about
each cpu found, otherwise the core scheduling data structures are set up
in a way that does not match the actual topology.

With smt_mask not set up correctly we bail on `cpumask_weight(smt_mask) == 1`
for !leaders in:

 notify_cpu_starting()
   cpuhp_invoke_callback_range()
     sched_cpu_starting()
       sched_core_cpu_starting()

which leads to rq->core not being correctly set for !leader-rq's.

Without this change stress-ng (which enables core scheduling in its prctl
tests in newer versions -- i.e. with PR_SCHED_CORE support) causes a warning
and then a crash (trimmed for legibility):

[ 1853.805168] ------------[ cut here ]------------
[ 1853.809784] task_rq(b)->core != rq->core
[ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4
...
[ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
[ 1854.231256] Call trace:
[ 1854.233689]  pick_next_task+0x3dc/0x81c
[ 1854.237512]  __schedule+0x10c/0x4cc
[ 1854.240988]  schedule_idle+0x34/0x54

Fixes: 9edeaea1 ("sched: Core-wide rq->lock")
Signed-off-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20220331153926.25742-1-pauld@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Teach the forced-newidle balancer about CPU affinity limitation. · 0a47eb22

Committed by Sebastian Andrzej Siewior on Sep 30, 2022

mainline inclusion
from mainline-v5.18-rc2
commit 386ef214
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=386ef214c3c6ab111d05e1790e79475363abaa05

--------------------------------------------------------------------------

try_steal_cookie() looks at task_struct::cpus_mask to decide if the
task could be moved to `this' CPU. It ignores that the task might be in
a migration disabled section while not on the CPU. In this case the task
must not be moved, otherwise per-CPU assumptions are broken.

Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if a
task can be moved.

Fixes: d2dfa17b ("sched: Trivial forced-newidle balancer")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YjNK9El+3fzGmswf@linutronix.de
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched/core: Fix forceidle balancing · f0cbe3af

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.18-rc2
commit 5b6547ed
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b6547ed97f4f5dfc23f8e3970af6d11d7b7ed7e

--------------------------------------------------------------------------

Steve reported that ChromeOS encounters the forceidle balancer being
run from rt_mutex_setprio()'s balance_callback() invocation and
explodes.

Now, the forceidle balancer gets queued every time the idle task gets
selected, set_next_task(), which is strictly too often.
rt_mutex_setprio() also uses set_next_task() in the 'change' pattern:

	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq->curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

However, rt_mutex_setprio() will explicitly not run this pattern on
the idle task (since priority boosting the idle task is quite insane).
Most other 'change' pattern users are pidhash based and would also not
apply to idle.

Also, the change pattern doesn't contain a __balance_callback()
invocation and hence we could have an out-of-band balance-callback,
which *should* trigger the WARN in rq_pin_lock() (which guards against
this exact anti-pattern).

So while none of that explains how this happens, it does indicate that
having it in set_next_task() might not be the most robust option.

Instead, explicitly queue the forceidle balancer from pick_next_task()
when it does indeed result in forceidle selection. Having it here
ensures it can only be triggered under the __schedule() rq->lock
instance, and hence must be run from that context.

This also happens to clean up the code a little, so win-win.

Fixes: d2dfa17b ("sched: Trivial forced-newidle balancer")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: T.J. Alumbaugh <talumbau@chromium.org>
Link: https://lkml.kernel.org/r/20220330160535.GN8939@worktop.programming.kicks-ass.net
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Make cookie functions static · 1dcba20d

Committed by Shaokun Zhang on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit d07b2eee
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d07b2eee4501c393cbf5bfcad36143310cfd72f9

--------------------------------------------------------------------------

Make cookie functions static as these are no longer invoked directly
by other code.

No functional change intended.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210922085735.52812-1-zhangshaokun@hisilicon.com
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

kselftests/sched: cleanup the child processes · d7151843

Committed by Li Zhijian on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 1c36432b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c36432b278cecf1499f21fae19836e614954309

--------------------------------------------------------------------------

Previously, 'make -C sched run_tests' would block forever when something
went wrong, because the *selftests framework* kept waiting for its child
processes to exit.

[root@iaas-rpma sched]# ./cs_prctl_test

 ## Create a thread/process/process group hiearchy
Not a core sched system
tid=74985, / tgid=74985 / pgid=74985: ffffffffffffffff
Not a core sched system
    tid=74986, / tgid=74986 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74988, / tgid=74986 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74989, / tgid=74986 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74990, / tgid=74986 / pgid=74985: ffffffffffffffff
Not a core sched system
    tid=74987, / tgid=74987 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74991, / tgid=74987 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74992, / tgid=74987 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74993, / tgid=74987 / pgid=74985: ffffffffffffffff

Not a core sched system
(268) FAILED: get_cs_cookie(0) == 0

 ## Set a cookie on entire process group
-1 = prctl(62, 1, 0, 2, 0)
core_sched create failed -- PGID: Invalid argument
(cs_prctl_test.c:272) -
[root@iaas-rpma sched]# ps
    PID TTY          TIME CMD
   4605 pts/2    00:00:00 bash
  74986 pts/2    00:00:00 cs_prctl_test
  74987 pts/2    00:00:00 cs_prctl_test
  74999 pts/2    00:00:00 ps

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Link: https://lore.kernel.org/r/20210902024333.75983-1-lizhijian@cn.fujitsu.com
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument · 0b89a690

Committed by Eugene Syromiatnikov on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 61bc346c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61bc346ce64a3864ac55f5d18bdc1572cda4fb18

--------------------------------------------------------------------------

Commit 7ac592aa ("sched: prctl() core-scheduling interface")
made use of enum pid_type in prctl's arg4; this type and the associated
enumeration definitions are not exposed to userspace.  Christian
has suggested to provide additional macro definitions that convey
the meaning of the type argument more in alignment with its actual
usage, and this patch does exactly that.
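
To show what such symbolic names buy a reader, the sketch below turns a raw call such as `prctl(62, 1, 0, 2, 0)` into its named form. The macro names and numeric values are believed to match the upstream `include/uapi/linux/prctl.h` after this change, but they are not spelled out in this commit message, so treat them as assumptions and check them against the header. The `describe` helper is hypothetical and performs no system call.

```python
# Assumed values; verify against include/uapi/linux/prctl.h.
PR_SCHED_CORE = 62

COMMANDS = {
    0: "PR_SCHED_CORE_GET",
    1: "PR_SCHED_CORE_CREATE",
    2: "PR_SCHED_CORE_SHARE_TO",
    3: "PR_SCHED_CORE_SHARE_FROM",
}

# Scope names for the type argument (arg4).
SCOPES = {
    0: "PR_SCHED_CORE_SCOPE_THREAD",
    1: "PR_SCHED_CORE_SCOPE_THREAD_GROUP",
    2: "PR_SCHED_CORE_SCOPE_PROCESS_GROUP",
}


def describe(option, cmd, pid, scope, arg5):
    """Render a PR_SCHED_CORE prctl() call with symbolic argument names."""
    if option != PR_SCHED_CORE:
        raise ValueError("not a PR_SCHED_CORE call")
    return "prctl(PR_SCHED_CORE, %s, %d, %s, %d)" % (
        COMMANDS[cmd], pid, SCOPES[scope], arg5)


print(describe(62, 1, 0, 2, 0))
```

Under these assumptions the call reads as creating a cookie for the whole process group of the calling task, which is considerably easier to follow than the bare numbers.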

Link: https://lore.kernel.org/r/20210825170613.GA3884@asgard.redhat.com
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Complements: 7ac592aa ("sched: prctl() core-scheduling interface")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched/core: Simplify core-wide task selection · b3ba365f

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit bc9ffef3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bc9ffef31bf59819c9fc032178534ff9ed7c4981

--------------------------------------------------------------------------

Tao suggested a two-pass task selection to avoid the retry loop.

Not only does it avoid the retry loop, it results in *much* simpler
code.

This also fixes an issue spotted by Josh Don where, for SMT3+, we can
forget to update max on the first pass and get to do an extra round.
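
One plausible reading of a two-pass selection is sketched below as a toy Python model. It is not the scheduler's code: `Task`, `IDLE` and `pick_core_wide` are hypothetical, priorities are plain integers (the kernel's comparison is more involved), and the only point illustrated is that a first pass can fix the core-wide maximum so that a second pass needs no retry.

```python
from collections import namedtuple

Task = namedtuple("Task", "name prio cookie")
IDLE = Task("idle", -1, None)


def pick_core_wide(runqueues):
    """Map each sibling CPU to the task it runs; runqueues is cpu -> [Task]."""
    # Pass 1: each sibling nominates its best task; find the core-wide max.
    nominee = {cpu: max(tasks, key=lambda t: t.prio, default=IDLE)
               for cpu, tasks in runqueues.items()}
    best = max(nominee.values(), key=lambda t: t.prio)

    # Pass 2: a sibling whose nominee has a different cookie falls back to
    # its best task carrying the winning cookie, or to idle if it has none.
    chosen = {}
    for cpu, tasks in runqueues.items():
        if nominee[cpu].cookie == best.cookie:
            chosen[cpu] = nominee[cpu]
        else:
            matching = [t for t in tasks if t.cookie == best.cookie]
            chosen[cpu] = max(matching, key=lambda t: t.prio, default=IDLE)
    return chosen
```

Because the maximum is known before any sibling commits to a task, every sibling is visited exactly twice regardless of how many SMT threads the core has.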

Suggested-by: Tao Zhou <tao.zhou@linux.dev>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Don <joshdon@google.com>
Reviewed-by: Vineeth Pillai (Microsoft) <vineethrp@gmail.com>
Link: https://lkml.kernel.org/r/YSS9+k1teA9oPEKl@hirez.programming.kicks-ass.net
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

b3ba365f

sched: Fix Core-wide rq->lock for uninitialized CPUs · 9cec77f2

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14
commit 3c474b32
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c474b3239f12fe0b00d7e82481f36a1f31e79ab

--------------------------------------------------------------------------

Eugene tripped over the case where rq_lock(), as called in a
for_each_possible_cpu() loop, came apart because rq->core hadn't been
set up yet.

This is a somewhat unusual, but valid case.

Rework things such that rq->core is initialized to point at itself. IOW
initialize each CPU as a single threaded Core. CPU online will then join
the new CPU (thread) to an existing Core where needed.

For completeness' sake, have CPU offline fully undo the state so as to
not presume the topology will match the next time it comes online.
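
A minimal sketch of the self-pointing initialization described above. The `struct rq` here is a hypothetical one-field stand-in, and the three helpers are illustrative names rather than the kernel's functions; the real offline path also has to handle the case where the departing CPU is the core leader, which this sketch ignores.

```c
#include <stddef.h>

struct rq {                   /* hypothetical stand-in for the runqueue */
    struct rq *core;          /* runqueue whose lock is used core-wide */
};

/* Initialize every possible CPU as its own single-threaded core, so that
 * rq->core is always valid, even for CPUs that never came online. */
static void rq_init_all(struct rq *rqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        rqs[i].core = &rqs[i];
}

/* CPU online: join an existing core by adopting a sibling's core pointer. */
static void rq_cpu_online(struct rq *rq, struct rq *sibling)
{
    rq->core = sibling->core;
}

/* CPU offline: undo the state and become a single-threaded core again. */
static void rq_cpu_offline(struct rq *rq)
{
    rq->core = rq;
}
```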

Fixes: 9edeaea1 ("sched: Core-wide rq->lock")
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Don <joshdon@google.com>
Tested-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lkml.kernel.org/r/YR473ZGeKqMs6kw+@hirez.programming.kicks-ass.net
Conflicts:
	kernel/sched/core.c
	[Bugfix ed3cd45f("Merge tag 'v5.11' into sched/core,
	 to pick up fixes & refresh the branch") is not applied.]
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

9cec77f2

admin-guide/hw-vuln: Rephrase a section of core-scheduling.rst · 70a9abf5

Committed by Fabio M. De Francesco on Sep 30, 2022

mainline inclusion
from mainline-v5.15-rc1
commit ce48ee81
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce48ee81a1930b2218bea23490adb6673c88bf70

--------------------------------------------------------------------------

Rephrase the "For MDS" section in core-scheduling.rst for the purpose of
making it clearer what is meant by "kernel memory is still considered
untrusted".
Suggested-by: Vineeth Pillai <Vineeth.Pillai@microsoft.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joelaf@google.com>
Link: https://lore.kernel.org/r/20210721190250.26095-1-fmdefrancesco@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

70a9abf5

sched/core: Disable CONFIG_SCHED_CORE by default · a6d571a5

Committed by Ingo Molnar on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit d2343cb8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2343cb8d154fe20c4499711bb3a9af2095b2b4b

--------------------------------------------------------------------------

This option at minimum adds extra code to the scheduler, even if
it is unused by default, and most users wouldn't want it.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a6d571a5

Documentation: Add usecases, design and interface for core scheduling · 68cf272e

Committed by Joel Fernandes (Google) on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 0159bb02
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0159bb020ca9a43b17aa9149f1199643c1d49426

--------------------------------------------------------------------------

Now that core scheduling is merged, update the documentation.
Co-developed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Co-developed-by: Josh Don <joshdon@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210603013136.370918-1-joel@joelfernandes.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

68cf272e

sched: Add CONFIG_SCHED_CORE help text · 7275ce05

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 7b419f47
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b419f47facd286c6723daca6ad69ec355473f78

--------------------------------------------------------------------------

Hugh noted that the SCHED_CORE Kconfig option could do with a help
text.
Requested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Hugh Dickins <hughd@google.com>
Link: https://lkml.kernel.org/r/YKyhtwhEgvtUDOyl@hirez.programming.kicks-ass.net
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

7275ce05

sched: Fix leftover comment typos · ace13a36

Committed by Ingo Molnar on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit cc00c198
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc00c1988801dc71f63bb7bad019e85046865095

--------------------------------------------------------------------------

A few more snuck in. Also capitalize 'CPU' while at it.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ace13a36

tools headers UAPI: Sync linux/prctl.h with the kernel sources · d7278fc9

Committed by Arnaldo Carvalho de Melo on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 49024204
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=49024204322cbfff892a28a67ad813cd41b6be81

--------------------------------------------------------------------------

To pick the changes in:

  61bc346c ("uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument")

That don't result in any changes in tooling:

  $ tools/perf/trace/beauty/prctl_option.sh > before
  $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after
  $ diff -u before after
  $

Just silences this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
  diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d7278fc9

kselftest: Add test for core sched prctl interface · c1e8abba

Committed by Chris Hyser on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 9f269900
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f26990074931bbf797373e53104216059b300b1

--------------------------------------------------------------------------

Provides a selftest and examples of using the interface.

[peterz: updated to not use sched_debug]
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123309.100860030@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c1e8abba

sched: prctl() core-scheduling interface · 0d6f9178

Committed by Chris Hyser on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 7ac592aa
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7ac592aa35a684ff1858fb9ec282886b9e3575ac

--------------------------------------------------------------------------

This patch provides support for setting and copying core scheduling
'task cookies' between threads (PID), processes (TGID), and process
groups (PGID).

The value of core scheduling isn't that tasks don't share a core,
'nosmt' can do that. The value lies in exploiting all the sharing
opportunities that exist to recover possible lost performance and that
requires a degree of flexibility in the API.

From a security perspective (and there are others), the thread,
process and process group distinction is an existing hierarchical
categorization of tasks that reflects many of the security concerns
about 'data sharing'. For example, protecting against cache-snooping
by a thread that can just read the memory directly isn't all that
useful.

With this in mind, subcommands to CREATE/SHARE (TO/FROM) provide a
mechanism to create and share cookies. CREATE/SHARE_TO specify a
target pid with enum pid_type used to specify the scope of the targeted
tasks. For example, PIDTYPE_TGID will share the cookie with the
process and all of its threads as typically desired in a security
scenario.

API:

  prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, tgtpid, pidtype, &cookie)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, srcpid, pidtype, NULL)

where 'tgtpid/srcpid == 0' implies the current process and pidtype is
kernel enum pid_type {PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, ...}.

For return values, EINVAL, ENOMEM are what they say. ESRCH means the
tgtpid/srcpid was not found. EPERM indicates lack of PTRACE permission
access to tgtpid/srcpid. ENODEV indicates your machine lacks SMT.
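
A minimal userspace sketch of two of the calls listed above. The fallback constants are the values found in the upstream UAPI header and are only used when the installed headers are older; the `SKETCH_PIDTYPE_*` names and both wrapper functions are hypothetical, since enum pid_type is not exposed to userspace. The sketch assumes GET is used with thread scope and an 8-byte-aligned 64-bit result buffer, which is what the upstream implementation appears to require.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/prctl.h>

#ifndef PR_SCHED_CORE
#define PR_SCHED_CORE            62
#define PR_SCHED_CORE_GET        0
#define PR_SCHED_CORE_CREATE     1
#define PR_SCHED_CORE_SHARE_TO   2
#define PR_SCHED_CORE_SHARE_FROM 3
#endif

/* Hypothetical local names for the numeric values of enum pid_type. */
enum {
    SKETCH_PIDTYPE_PID  = 0,  /* one thread */
    SKETCH_PIDTYPE_TGID = 1,  /* a process and all of its threads */
    SKETCH_PIDTYPE_PGID = 2   /* a process group */
};

/* Create a fresh cookie for the calling process and all of its threads
 * (pid 0 means "current"). Returns 0 on success or a negative errno. */
static int core_sched_create_for_self(void)
{
    if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0UL,
              (unsigned long)SKETCH_PIDTYPE_TGID, 0UL) == -1)
        return -errno;
    return 0;
}

/* Read the calling thread's cookie into *cookie.
 * Returns 0 on success or a negative errno. */
static int core_sched_get_self(uint64_t *cookie)
{
    if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, 0UL,
              (unsigned long)SKETCH_PIDTYPE_PID,
              (unsigned long)cookie) == -1)
        return -errno;
    return 0;
}
```

On a kernel built without CONFIG_SCHED_CORE, or on a machine without SMT, both wrappers are expected to fail, typically with -EINVAL or -ENODEV, so callers should treat failure as "core scheduling unavailable" rather than as a fatal error.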

[peterz: complete rewrite]
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123309.039845339@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0d6f9178

sched: Inherit task cookie on fork() · c7666af0

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 85dd3f61
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=85dd3f61203c5cfa72b308ff327b5fbf3fc1ce5e

--------------------------------------------------------------------------

Note that sched_core_fork() is called from under tasklist_lock, and
not from sched_fork() earlier. This avoids a few races later.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.980003687@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c7666af0

sched: Trivial core scheduling cookie management · be234044

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 6e33cad0
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6e33cad0af49336952e5541464bd02f5b5fd433e

--------------------------------------------------------------------------

In order to not have to use pid_struct, create a new, smaller,
structure to manage task cookies for core scheduling.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.919768100@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

be234044

sched: Migration changes for core scheduling · 30a1426a

Committed by Aubrey Li on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 97886d9d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=97886d9dcd86820bdbc1fa73b455982809cbc8c2

--------------------------------------------------------------------------

 - Don't migrate if there is a cookie mismatch
     Load balance tries to move a task from the busiest CPU to the
     destination CPU. When core scheduling is enabled, if the
     task's cookie does not match the destination CPU's
     core cookie, this task may be skipped by this CPU. This
     mitigates the forced idle time on the destination CPU.

 - Select cookie matched idle CPU
     In the fast path of task wakeup, select the first cookie matched
     idle CPU instead of the first idle CPU.

 - Find cookie matched idlest CPU
     In the slow path of task wakeup, find the idlest CPU whose core
     cookie matches the task's cookie.
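
The first two rules above can be sketched as a simplified model. `struct cpu_state` and both helpers are hypothetical, and the model compares cookies directly; the kernel's own matching has additional cases (for example, how a fully idle core is treated) that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

struct cpu_state {              /* hypothetical per-CPU view */
    bool idle;                  /* CPU currently has nothing to run */
    unsigned long core_cookie;  /* cookie its core is currently running */
};

/* Load balance rule: skip the task if its cookie does not match the
 * destination CPU's core cookie, to avoid forcing siblings idle. */
static bool can_migrate_to(const struct cpu_state *dst,
                           unsigned long task_cookie)
{
    return dst->core_cookie == task_cookie;
}

/* Wakeup fast path rule: return the index of the first idle CPU whose
 * core cookie matches the task, or -1 if there is none. */
static int first_cookie_matched_idle_cpu(const struct cpu_state *cpus,
                                         size_t n,
                                         unsigned long task_cookie)
{
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].idle && cpus[i].core_cookie == task_cookie)
            return (int)i;
    }
    return -1;
}
```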
Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.860083871@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

30a1426a

sched: Trivial forced-newidle balancer · 74ddc15c

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit d2dfa17b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2dfa17bc7de67e99685c4d6557837bf801a102c

--------------------------------------------------------------------------

When a sibling is forced-idle to match the core-cookie; search for
matching tasks to fill the core.

rcu_read_unlock() can incur an infrequent deadlock in
sched_core_balance(). Fix this by using the RCU-sched flavor instead.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.800048269@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

74ddc15c

sched/fair: Snapshot the min_vruntime of CPUs on force idle · 80077c25

Committed by Joel Fernandes (Google) on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit c6047c2e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c6047c2e3af68dae23ad884249e0d42ff28d2d1b

--------------------------------------------------------------------------

During force-idle, we end up doing cross-CPU comparison of vruntimes
during pick_next_task. If we simply compare (vruntime-min_vruntime)
across CPUs, and if the CPUs only have 1 task each, we will always
end up comparing 0 with 0 and pick just one of the tasks all the time.
This starves the task that was not picked. To fix this, take a snapshot
of the min_vruntime when entering force idle and use it for comparison.
This min_vruntime snapshot will only be used for cross-CPU vruntime
comparison, and nothing else.

A note about the min_vruntime snapshot and force idling:

During selection:

  When we're not fi, we need to update snapshot.
  When we're fi and we were not fi, we must update snapshot.
  When we're fi and we were already fi, we must not update snapshot.

Which gives:

  fib     fi      update
  0       0       1
  0       1       1
  1       0       1
  1       1       0

Where:

  fi:  force-idled now
  fib: force-idled before

So the min_vruntime snapshot needs to be updated when: !(fib && fi).
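
The truth table above reduces to a one-line predicate. A minimal sketch, where `fi_snapshot_needs_update()` is a hypothetical name chosen for illustration:

```c
#include <stdbool.h>

/* fi_before: the core was force-idled before this selection (fib).
 * fi_now:    the core is force-idled by this selection (fi).
 * The snapshot is refreshed in every case except staying in force idle. */
static bool fi_snapshot_needs_update(bool fi_before, bool fi_now)
{
    return !(fi_before && fi_now);
}
```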

Also, the cfs_prio_less() function needs to be aware of whether the
core is in force idle or not, since it will use this information to
know whether to advance a cfs_rq's min_vruntime_fi in the hierarchy.
So pass this information along via pick_task() -> prio_less().
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.738542617@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

80077c25

sched: Fix priority inversion of cookied task with sibling · 87d56255

Committed by Joel Fernandes (Google) on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 7afbba11
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7afbba119f0da09824d723f8081608ea1f74ff57

--------------------------------------------------------------------------

The rationale is as follows. In the core-wide pick logic, even if
need_sync == false, we need to go look at other CPUs (non-local CPUs)
to see if they could be running RT.

Say the RQs in a particular core look like this:

Let CFS1 and CFS2 be 2 tagged CFS tasks.
Let RT1 be an untagged RT task.

	rq0		rq1
	CFS1 (tagged)	RT1 (no tag)
	CFS2 (tagged)

Say schedule() runs on rq0. Now, it will enter the above loop and
pick_task(RT) will return NULL for 'p'. It will enter the above if()
block and see that need_sync == false and will skip RT entirely.

The end result of the selection will be (say prio(CFS1) > prio(CFS2)):

	rq0             rq1
	CFS1            IDLE

When it should have selected:

	rq0             rq1
	IDLE            RT
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.678425748@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

87d56255

sched/fair: Fix forced idle sibling starvation corner case · 483069d3

Committed by Vineeth Pillai on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 8039e96f
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8039e96fcc1de30d5bcaf05da9ca2de46a800826

--------------------------------------------------------------------------

If there is only one long-running local task and the sibling is
forced idle, it might not get a chance to run until a schedule
event happens on any CPU in the core.

So we check for this condition during a tick to see if a sibling
is starved and then give it a chance to schedule.
Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.617407840@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

483069d3
