Commits · 24f890297774090e7efc35987bb2eef1bc09d8db · openeuler / Kernel

Sep 30, 2022 · 34 commits

arch/arm64: Fix topology initialization for core scheduling · 24f89029

Committed by Phil Auld on Sep 30, 2022

mainline inclusion
from mainline-v5.18-rc2
commit 5524cbb1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5524cbb1bfcdff0cad0aaa9f94e6092002a07259

--------------------------------------------------------------------------

Arm64 systems rely on store_cpu_topology() to call update_siblings_masks()
to transfer the topology to the various cpu masks. This needs to be done
before the call to notify_cpu_starting(), which tells the scheduler about
each cpu found; otherwise the core scheduling data structures are set up
in a way that does not match the actual topology.

With smt_mask not set up correctly we bail on `cpumask_weight(smt_mask) == 1`
for !leaders in:

 notify_cpu_starting()
   cpuhp_invoke_callback_range()
     sched_cpu_starting()
       sched_core_cpu_starting()

which leads to rq->core not being correctly set for !leader-rq's.

Without this change stress-ng (which enables core scheduling in its prctl
tests in newer versions -- i.e. with PR_SCHED_CORE support) causes a warning
and then a crash (trimmed for legibility):

[ 1853.805168] ------------[ cut here ]------------
[ 1853.809784] task_rq(b)->core != rq->core
[ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4
...
[ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
[ 1854.231256] Call trace:
[ 1854.233689]  pick_next_task+0x3dc/0x81c
[ 1854.237512]  __schedule+0x10c/0x4cc
[ 1854.240988]  schedule_idle+0x34/0x54

Fixes: 9edeaea1 ("sched: Core-wide rq->lock")
Signed-off-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20220331153926.25742-1-pauld@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Teach the forced-newidle balancer about CPU affinity limitation. · 0a47eb22

Committed by Sebastian Andrzej Siewior on Sep 30, 2022

mainline inclusion
from mainline-v5.18-rc2
commit 386ef214
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=386ef214c3c6ab111d05e1790e79475363abaa05

--------------------------------------------------------------------------

try_steal_cookie() looks at task_struct::cpus_mask to decide if the
task could be moved to `this' CPU. It ignores that the task might be in
a migration-disabled section while not on the CPU. In this case the task
must not be moved, otherwise per-CPU assumptions are broken.

Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if a
task can be moved.

Fixes: d2dfa17b ("sched: Trivial forced-newidle balancer")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YjNK9El+3fzGmswf@linutronix.de
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched/core: Fix forceidle balancing · f0cbe3af

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.18-rc2
commit 5b6547ed
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b6547ed97f4f5dfc23f8e3970af6d11d7b7ed7e

--------------------------------------------------------------------------

Steve reported that ChromeOS encounters the forceidle balancer being
run from rt_mutex_setprio()'s balance_callback() invocation and
explodes.

Now, the forceidle balancer gets queued every time the idle task gets
selected, set_next_task(), which is strictly too often.
rt_mutex_setprio() also uses set_next_task() in the 'change' pattern:

	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq->curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

However, rt_mutex_setprio() will explicitly not run this pattern on
the idle task (since priority boosting the idle task is quite insane).
Most other 'change' pattern users are pidhash based and would also not
apply to idle.

Also, the change pattern doesn't contain a __balance_callback()
invocation and hence we could have an out-of-band balance-callback,
which *should* trigger the WARN in rq_pin_lock() (which guards against
this exact anti-pattern).

So while none of that explains how this happens, it does indicate that
having it in set_next_task() might not be the most robust option.

Instead, explicitly queue the forceidle balancer from pick_next_task()
when it does indeed result in forceidle selection. Having it here
ensures it can only be triggered under the __schedule() rq->lock
instance, and hence must be run from that context.

This also happens to clean up the code a little, so win-win.

Fixes: d2dfa17b ("sched: Trivial forced-newidle balancer")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: T.J. Alumbaugh <talumbau@chromium.org>
Link: https://lkml.kernel.org/r/20220330160535.GN8939@worktop.programming.kicks-ass.net
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Make cookie functions static · 1dcba20d

Committed by Shaokun Zhang on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit d07b2eee
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d07b2eee4501c393cbf5bfcad36143310cfd72f9

--------------------------------------------------------------------------

Make cookie functions static as these are no longer invoked directly
by other code.

No functional change intended.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210922085735.52812-1-zhangshaokun@hisilicon.com
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

kselftests/sched: cleanup the child processes · d7151843

Committed by Li Zhijian on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 1c36432b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c36432b278cecf1499f21fae19836e614954309

--------------------------------------------------------------------------

Previously, 'make -C sched run_tests' would block forever when something
went wrong, because the *selftests framework* kept waiting for its child
processes to exit.

[root@iaas-rpma sched]# ./cs_prctl_test

 ## Create a thread/process/process group hiearchy
Not a core sched system
tid=74985, / tgid=74985 / pgid=74985: ffffffffffffffff
Not a core sched system
    tid=74986, / tgid=74986 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74988, / tgid=74986 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74989, / tgid=74986 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74990, / tgid=74986 / pgid=74985: ffffffffffffffff
Not a core sched system
    tid=74987, / tgid=74987 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74991, / tgid=74987 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74992, / tgid=74987 / pgid=74985: ffffffffffffffff
Not a core sched system
        tid=74993, / tgid=74987 / pgid=74985: ffffffffffffffff

Not a core sched system
(268) FAILED: get_cs_cookie(0) == 0

 ## Set a cookie on entire process group
-1 = prctl(62, 1, 0, 2, 0)
core_sched create failed -- PGID: Invalid argument
(cs_prctl_test.c:272) -
[root@iaas-rpma sched]# ps
    PID TTY          TIME CMD
   4605 pts/2    00:00:00 bash
  74986 pts/2    00:00:00 cs_prctl_test
  74987 pts/2    00:00:00 cs_prctl_test
  74999 pts/2    00:00:00 ps

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Link: https://lore.kernel.org/r/20210902024333.75983-1-lizhijian@cn.fujitsu.com
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument · 0b89a690

Committed by Eugene Syromiatnikov on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 61bc346c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61bc346ce64a3864ac55f5d18bdc1572cda4fb18

--------------------------------------------------------------------------

Commit 7ac592aa ("sched: prctl() core-scheduling interface")
made use of enum pid_type in prctl's arg4; this type and the associated
enumeration definitions are not exposed to userspace.  Christian
suggested providing additional macro definitions that convey
the meaning of the type argument more in alignment with its actual
usage, and this patch does exactly that.
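
As a rough illustration of what such macros buy userspace, the sketch below
maps each scope to a readable name. The macro names and values are assumed to
match mainline's include/uapi/linux/prctl.h (they mirror PIDTYPE_PID,
PIDTYPE_TGID and PIDTYPE_PGID); the fallback definitions and the scope_name()
helper are hypothetical and not part of the patch.

```c
/*
 * Fallback definitions for headers that predate the scope macros.
 * Assumption: values mirror PIDTYPE_PID/TGID/PGID in enum pid_type.
 */
#ifndef PR_SCHED_CORE_SCOPE_THREAD
#define PR_SCHED_CORE_SCOPE_THREAD		0
#define PR_SCHED_CORE_SCOPE_THREAD_GROUP	1
#define PR_SCHED_CORE_SCOPE_PROCESS_GROUP	2
#endif

/* Hypothetical helper: human-readable name for a prctl() arg4 scope. */
static const char *scope_name(int scope)
{
	switch (scope) {
	case PR_SCHED_CORE_SCOPE_THREAD:
		return "thread";
	case PR_SCHED_CORE_SCOPE_THREAD_GROUP:
		return "thread group";
	case PR_SCHED_CORE_SCOPE_PROCESS_GROUP:
		return "process group";
	default:
		return "unknown";
	}
}
```

With these names, a caller can write PR_SCHED_CORE_SCOPE_PROCESS_GROUP instead
of a bare 2 in the fourth prctl() argument.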

Link: https://lore.kernel.org/r/20210825170613.GA3884@asgard.redhat.com
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Complements: 7ac592aa ("sched: prctl() core-scheduling interface")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched/core: Simplify core-wide task selection · b3ba365f

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit bc9ffef3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bc9ffef31bf59819c9fc032178534ff9ed7c4981

--------------------------------------------------------------------------

Tao suggested a two-pass task selection to avoid the retry loop.

Not only does it avoid the retry loop, it results in *much* simpler
code.

This also fixes an issue spotted by Josh Don where, for SMT3+, we can
forget to update max on the first pass and get to do an extra round.

Suggested-by: Tao Zhou <tao.zhou@linux.dev>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Don <joshdon@google.com>
Reviewed-by: Vineeth Pillai (Microsoft) <vineethrp@gmail.com>
Link: https://lkml.kernel.org/r/YSS9+k1teA9oPEKl@hirez.programming.kicks-ass.net
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Fix Core-wide rq->lock for uninitialized CPUs · 9cec77f2

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14
commit 3c474b32
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c474b3239f12fe0b00d7e82481f36a1f31e79ab

--------------------------------------------------------------------------

Eugene tripped over the case where rq_lock(), as called in a
for_each_possible_cpu() loop, came apart because rq->core hadn't been
set up yet.

This is a somewhat unusual, but valid case.

Rework things such that rq->core is initialized to point at itself. IOW
initialize each CPU as a single threaded Core. CPU online will then join
the new CPU (thread) to an existing Core where needed.

For completeness' sake, have CPU offline fully undo the state so as to
not presume the topology will match the next time it comes online.
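
The idea can be sketched with a toy model. This is not the kernel code: the
struct, the array and the toy_*() function names are hypothetical, and the
real hotplug paths also handle core-wide state and leader hand-over that is
omitted here.

```c
#define TOY_NR_CPUS 4

/* Toy stand-in for struct rq, reduced to the field discussed above. */
struct toy_rq {
	int cpu;
	struct toy_rq *core;	/* runqueue acting as core-wide leader */
};

static struct toy_rq toy_runqueues[TOY_NR_CPUS];

/* Boot-time init: every possible CPU starts as its own single-threaded core. */
static void toy_sched_init(void)
{
	for (int i = 0; i < TOY_NR_CPUS; i++) {
		toy_runqueues[i].cpu = i;
		toy_runqueues[i].core = &toy_runqueues[i];
	}
}

/* CPU online: join the core of an already-online sibling (sibling < 0: none). */
static void toy_cpu_starting(int cpu, int sibling)
{
	if (sibling >= 0)
		toy_runqueues[cpu].core = toy_runqueues[sibling].core;
}

/* CPU offline: undo the join so no stale topology survives to the next online. */
static void toy_cpu_dying(int cpu)
{
	toy_runqueues[cpu].core = &toy_runqueues[cpu];
}
```

Because core always points at a valid runqueue in this model, a lock helper
that dereferences it is safe even for CPUs that have never come online.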

Fixes: 9edeaea1 ("sched: Core-wide rq->lock")
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Don <joshdon@google.com>
Tested-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lkml.kernel.org/r/YR473ZGeKqMs6kw+@hirez.programming.kicks-ass.net
Conflicts:
	kernel/sched/core.c
	[Bugfix ed3cd45f ("Merge tag 'v5.11' into sched/core,
	 to pick up fixes & refresh the branch") is not applied.]
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

admin-guide/hw-vuln: Rephrase a section of core-scheduling.rst · 70a9abf5

Committed by Fabio M. De Francesco on Sep 30, 2022

mainline inclusion
from mainline-v5.15-rc1
commit ce48ee81
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce48ee81a1930b2218bea23490adb6673c88bf70

--------------------------------------------------------------------------

Rephrase the "For MDS" section in core-scheduling.rst for the purpose of
making it clearer what is meant by "kernel memory is still considered
untrusted".

Suggested-by: Vineeth Pillai <Vineeth.Pillai@microsoft.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joelaf@google.com>
Link: https://lore.kernel.org/r/20210721190250.26095-1-fmdefrancesco@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched/core: Disable CONFIG_SCHED_CORE by default · a6d571a5

Committed by Ingo Molnar on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit d2343cb8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2343cb8d154fe20c4499711bb3a9af2095b2b4b

--------------------------------------------------------------------------

This option at minimum adds extra code to the scheduler, even if
it is unused by default, and most users wouldn't want it.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Documentation: Add usecases, design and interface for core scheduling · 68cf272e

Committed by Joel Fernandes (Google) on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 0159bb02
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0159bb020ca9a43b17aa9149f1199643c1d49426

--------------------------------------------------------------------------

Now that core scheduling is merged, update the documentation.

Co-developed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Co-developed-by: Josh Don <joshdon@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210603013136.370918-1-joel@joelfernandes.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Add CONFIG_SCHED_CORE help text · 7275ce05

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 7b419f47
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b419f47facd286c6723daca6ad69ec355473f78

--------------------------------------------------------------------------

Hugh noted that the SCHED_CORE Kconfig option could do with a help
text.

Requested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Hugh Dickins <hughd@google.com>
Link: https://lkml.kernel.org/r/YKyhtwhEgvtUDOyl@hirez.programming.kicks-ass.net
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Fix leftover comment typos · ace13a36

Committed by Ingo Molnar on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit cc00c198
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc00c1988801dc71f63bb7bad019e85046865095

--------------------------------------------------------------------------

A few more snuck in. Also capitalize 'CPU' while at it.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

tools headers UAPI: Sync linux/prctl.h with the kernel sources · d7278fc9

Committed by Arnaldo Carvalho de Melo on Sep 30, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 49024204
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=49024204322cbfff892a28a67ad813cd41b6be81

--------------------------------------------------------------------------

To pick the changes in:

  61bc346c ("uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument")

That don't result in any changes in tooling:

  $ tools/perf/trace/beauty/prctl_option.sh > before
  $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after
  $ diff -u before after
  $

Just silences this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
  diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

kselftest: Add test for core sched prctl interface · c1e8abba

Committed by Chris Hyser on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 9f269900
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f26990074931bbf797373e53104216059b300b1

--------------------------------------------------------------------------

Provides a selftest and examples of using the interface.

[peterz: updated to not use sched_debug]

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123309.100860030@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: prctl() core-scheduling interface · 0d6f9178

Committed by Chris Hyser on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 7ac592aa
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7ac592aa35a684ff1858fb9ec282886b9e3575ac

--------------------------------------------------------------------------

This patch provides support for setting and copying core scheduling
'task cookies' between threads (PID), processes (TGID), and process
groups (PGID).

The value of core scheduling isn't that tasks don't share a core,
'nosmt' can do that. The value lies in exploiting all the sharing
opportunities that exist to recover possible lost performance and that
requires a degree of flexibility in the API.

From a security perspective (and there are others), the thread,
process and process group distinction is an existing hierarchical
categorization of tasks that reflects many of the security concerns
about 'data sharing'. For example, protecting against cache-snooping
by a thread that can just read the memory directly isn't all that
useful.

With this in mind, subcommands to CREATE/SHARE (TO/FROM) provide a
mechanism to create and share cookies. CREATE/SHARE_TO specify a
target pid with enum pid_type used to specify the scope of the targeted
tasks. For example, PIDTYPE_TGID will share the cookie with the
process and all of its threads as typically desired in a security
scenario.

API:

  prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, tgtpid, pidtype, &cookie)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, srcpid, pidtype, NULL)

where 'tgtpid/srcpid == 0' implies the current process and pidtype is
kernel enum pid_type {PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, ...}.

For return values, EINVAL, ENOMEM are what they say. ESRCH means the
tgtpid/srcpid was not found. EPERM indicates lack of PTRACE permission
access to tgtpid/srcpid. ENODEV indicates your machine lacks SMT.
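
A minimal userspace sketch of the CREATE and GET calls listed above, assuming
a Linux build environment. The fallback constants are taken from mainline's
include/uapi/linux/prctl.h as an assumption for older headers; TOY_PIDTYPE_PID
and the two wrapper functions are hypothetical names introduced only for
illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/prctl.h>

/* Fallbacks for older headers (assumed to match mainline values). */
#ifndef PR_SCHED_CORE
#define PR_SCHED_CORE		62
#define PR_SCHED_CORE_GET	0
#define PR_SCHED_CORE_CREATE	1
#endif

/* Hypothetical name for the kernel's PIDTYPE_PID (thread scope). */
#define TOY_PIDTYPE_PID	0UL

/*
 * Create a fresh cookie for the calling thread (pid 0 means "current").
 * Returns 0 on success or a negative errno such as -EINVAL or -ENODEV.
 */
static int core_sched_create_self(void)
{
	if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0UL,
		  TOY_PIDTYPE_PID, 0UL) == -1)
		return -errno;
	return 0;
}

/* Read the calling thread's cookie into *cookie; same return convention. */
static int core_sched_get_self(uint64_t *cookie)
{
	if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, 0UL,
		  TOY_PIDTYPE_PID, (unsigned long)(uintptr_t)cookie) == -1)
		return -errno;
	return 0;
}
```

On a kernel built without core scheduling the calls are expected to fail with
-EINVAL, and on a machine without SMT with -ENODEV, matching the return values
described above.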

[peterz: complete rewrite]

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123309.039845339@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Inherit task cookie on fork() · c7666af0

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 85dd3f61
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=85dd3f61203c5cfa72b308ff327b5fbf3fc1ce5e

--------------------------------------------------------------------------

Note that sched_core_fork() is called from under tasklist_lock, and
not from sched_fork() earlier. This avoids a few races later.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.980003687@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Trivial core scheduling cookie management · be234044

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 6e33cad0
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6e33cad0af49336952e5541464bd02f5b5fd433e

--------------------------------------------------------------------------

In order to not have to use pid_struct, create a new, smaller,
structure to manage task cookies for core scheduling.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.919768100@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Migration changes for core scheduling · 30a1426a

Committed by Aubrey Li on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 97886d9d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=97886d9dcd86820bdbc1fa73b455982809cbc8c2

--------------------------------------------------------------------------

 - Don't migrate if there is a cookie mismatch
     Load balance tries to move task from busiest CPU to the
     destination CPU. When core scheduling is enabled, if the
     task's cookie does not match with the destination CPU's
     core cookie, this task may be skipped by this CPU. This
     mitigates the forced idle time on the destination CPU.

 - Select cookie matched idle CPU
     In the fast path of task wakeup, select the first cookie matched
     idle CPU instead of the first idle CPU.

 - Find cookie matched idlest CPU
     In the slow path of task wakeup, find the idlest CPU whose core
     cookie matches the task's cookie.
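
The fast-path rule from the list above can be sketched as a toy model. The
struct and function names are hypothetical and the comparison is simplified:
this sketch only does a plain equality check per CPU and ignores any special
cases the kernel's own matching helper may handle.

```c
#include <stdbool.h>

/* Toy per-CPU view: whether the CPU is idle and which cookie its core runs. */
struct toy_cpu {
	bool idle;
	unsigned long core_cookie;
};

/*
 * Return the index of the first idle CPU whose core cookie equals the
 * task's cookie, or -1 when no idle CPU matches.
 */
static int toy_select_idle_cpu(const struct toy_cpu *cpus, int nr,
			       unsigned long task_cookie)
{
	for (int i = 0; i < nr; i++) {
		if (cpus[i].idle && cpus[i].core_cookie == task_cookie)
			return i;
	}
	return -1;
}
```

Picking a matching CPU up front avoids placing the task on a core where it
would immediately force its sibling idle.
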

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.860083871@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Trivial forced-newidle balancer · 74ddc15c

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit d2dfa17b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2dfa17bc7de67e99685c4d6557837bf801a102c

--------------------------------------------------------------------------

When a sibling is forced idle to match the core cookie, search for
matching tasks to fill the core.

rcu_read_unlock() can incur an infrequent deadlock in
sched_core_balance(). Fix this by using the RCU-sched flavor instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.800048269@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched/fair: Snapshot the min_vruntime of CPUs on force idle · 80077c25

Committed by Joel Fernandes (Google) on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit c6047c2e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c6047c2e3af68dae23ad884249e0d42ff28d2d1b

--------------------------------------------------------------------------

During force-idle, we end up doing cross-cpu comparison of vruntimes
during pick_next_task. If we simply compare (vruntime-min_vruntime)
across CPUs, and if the CPUs only have 1 task each, we will always
end up comparing 0 with 0 and pick just one of the tasks all the time.
This starves the task that was not picked. To fix this, take a snapshot
of the min_vruntime when entering force idle and use it for comparison.
This min_vruntime snapshot will only be used for cross-CPU vruntime
comparison, and nothing else.

A note about the min_vruntime snapshot and force idling:

During selection:

  When we're not fi, we need to update snapshot.
  When we're fi and we were not fi, we must update snapshot.
  When we're fi and we were already fi, we must not update snapshot.

Which gives:

  fib     fi      update
  0       0       1
  0       1       1
  1       0       1
  1       1       0

Where:

  fi:  force-idled now
  fib: force-idled before

So the min_vruntime snapshot needs to be updated when: !(fib && fi).
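
The truth table reduces to a one-line predicate. The function name below is
hypothetical; only the boolean expression comes from the text above.

```c
#include <stdbool.h>

/*
 * fib: the core was force-idled before this selection.
 * fi:  the core is force-idled now.
 * The snapshot is refreshed in every case except "was fi and still fi".
 */
static bool toy_should_update_snapshot(bool fib, bool fi)
{
	return !(fib && fi);
}
```

Keeping the snapshot frozen while force idle persists is what gives the
cross-CPU vruntime comparison a stable reference point.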

Also, the cfs_prio_less() function needs to be aware of whether the
core is in force idle or not, since it will use this information to
know whether to advance a cfs_rq's min_vruntime_fi in the hierarchy.
So pass this information along via pick_task() -> prio_less().

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.738542617@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

sched: Fix priority inversion of cookied task with sibling · 87d56255

Committed by Joel Fernandes (Google) on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 7afbba11
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7afbba119f0da09824d723f8081608ea1f74ff57

--------------------------------------------------------------------------

The rationale is as follows. In the core-wide pick logic, even if
need_sync == false, we need to go look at other CPUs (non-local CPUs)
to see if they could be running RT.

Say the RQs in a particular core look like this:

Let CFS1 and CFS2 be 2 tagged CFS tasks.
Let RT1 be an untagged RT task.

	rq0		rq1
	CFS1 (tagged)	RT1 (no tag)
	CFS2 (tagged)

Say schedule() runs on rq0. Now, it will enter the above loop and
pick_task(RT) will return NULL for 'p'. It will enter the above if()
block and see that need_sync == false and will skip RT entirely.

The end result of the selection will be (say prio(CFS1) > prio(CFS2)):

	rq0             rq1
	CFS1            IDLE

When it should have selected:

	rq0             rq1
	IDLE            RT

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.678425748@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

87d56255
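
The RT-versus-tagged-CFS scenario in the commit above can be modeled in a few lines. This is a toy sketch, not the kernel's pick_next_task(): the toy_* names are hypothetical, each runqueue stores only its best task per class, and a cookie of 0 stands for "untagged". It first finds the core-wide highest-class task across all siblings, then lets each CPU run only a task whose cookie matches it.

```c
#include <assert.h>
#include <stddef.h>

/* Scheduling classes, highest priority first (toy model). */
enum toy_class { TOY_RT, TOY_CFS, TOY_NR_CLASSES };

struct toy_task {
	const char *name;
	unsigned long cookie;	/* 0 means untagged (assumption of this sketch) */
};

/* One runqueue per SMT sibling; only the best task of each class is kept. */
struct toy_rq {
	const struct toy_task *best[TOY_NR_CLASSES];
};

/* Core-wide maximum: walk classes from highest to lowest over ALL siblings. */
const struct toy_task *toy_core_max(const struct toy_rq *rqs, size_t nr)
{
	for (int c = 0; c < TOY_NR_CLASSES; c++)
		for (size_t i = 0; i < nr; i++)
			if (rqs[i].best[c])
				return rqs[i].best[c];
	return NULL;
}

/* What may 'cpu' run? Only a task matching the core-wide max's cookie. */
const struct toy_task *toy_core_pick(const struct toy_rq *rqs, size_t nr,
				     size_t cpu)
{
	const struct toy_task *max = toy_core_max(rqs, nr);

	assert(cpu < nr);
	if (!max)
		return NULL;
	for (int c = 0; c < TOY_NR_CLASSES; c++) {
		const struct toy_task *p = rqs[cpu].best[c];

		if (p && p->cookie == max->cookie)
			return p;
	}
	return NULL;	/* no matching task: this sibling is forced idle */
}
```

With rq0 holding a tagged CFS task and rq1 holding an untagged RT task, toy_core_pick() returns NULL (idle) for rq0 and the RT task for rq1, which is the selection the commit message says should have been made.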

sched/fair: Fix forced idle sibling starvation corner case · 483069d3

Committed by Vineeth Pillai on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 8039e96f
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8039e96fcc1de30d5bcaf05da9ca2de46a800826

--------------------------------------------------------------------------

If there is only one long-running local task and the sibling is
forced idle, the sibling might not get a chance to run until a schedule
event happens on any CPU in the core.

So we check for this condition during a tick to see if a sibling
is starved and then give it a chance to schedule.

Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.617407840@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

483069d3
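
The tick-time check described in the commit above could look roughly like the following. This is a simplified sketch under stated assumptions: the struct, its fields, and the minimum-slice threshold are hypothetical stand-ins, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy view of the state a tick handler would look at. */
struct toy_core_rq {
	bool sibling_forced_idle;	/* some sibling on this core is forced idle */
	unsigned int nr_running;	/* runnable tasks on the local runqueue */
	uint64_t curr_ran_ns;		/* how long the current task has been running */
};

/*
 * Called from the tick: request a reschedule when a single long-running
 * local task is keeping a forced-idle sibling from running.
 */
bool toy_tick_should_resched(const struct toy_core_rq *rq, uint64_t min_slice_ns)
{
	assert(rq != NULL);
	if (!rq->sibling_forced_idle)
		return false;	/* nobody is being starved */
	if (rq->nr_running != 1)
		return false;	/* with several tasks, regular preemption reschedules anyway */
	return rq->curr_ran_ns >= min_slice_ns;
}
```

The reschedule gives the core-wide pick another chance to run, so the starved sibling can be selected.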

sched: Add core wide task selection and scheduling · 4bd71bc9

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 539f6512
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=539f65125d20aacab54d02d77f10a839f45b09dc

--------------------------------------------------------------------------

Instead of only selecting a local task, select a task for all SMT
siblings for every reschedule on the core (irrespective of which logical
CPU does the reschedule).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.557559654@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

4bd71bc9

sched: Basic tracking of matching tasks · c29a9a91

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 8a311c74
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8a311c740b53324ec584e0e3bb7077d56b123c28

--------------------------------------------------------------------------

Introduce task_struct::core_cookie as an opaque identifier for core
scheduling. When enabled, core scheduling will only allow matching
tasks to be on the core, where idle matches everything.

When task_struct::core_cookie is set (and core scheduling is enabled)
these tasks are indexed in a second RB-tree, first on cookie value
then on scheduling function, such that matching task selection always
finds the most eligible match.

NOTE: *shudder* at the overhead...

NOTE: *sigh*, a 3rd copy of the scheduling function; the alternative
is per class tracking of cookies and that just duplicates a lot of
stuff for no raisin (the 2nd copy lives in the rt-mutex PI code).

[Joel: folded fixes]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.496975854@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c29a9a91
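
The "first on cookie value, then on scheduling function" ordering from the commit above can be illustrated with a sorted array instead of an RB-tree. This is a sketch only: the toy_* names are hypothetical, and "lower prio value means more eligible" is an assumption of the example, not the kernel's priority comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_task {
	unsigned long cookie;	/* opaque core-scheduling identifier */
	int prio;		/* lower value = more eligible (assumption) */
};

/* Order by cookie first; within one cookie, most eligible task first. */
static int toy_core_cmp(const void *pa, const void *pb)
{
	const struct toy_task *a = pa, *b = pb;

	if (a->cookie != b->cookie)
		return a->cookie < b->cookie ? -1 : 1;
	return (a->prio > b->prio) - (a->prio < b->prio);
}

void toy_core_sort(struct toy_task *tasks, size_t n)
{
	assert(tasks != NULL || n == 0);
	qsort(tasks, n, sizeof(*tasks), toy_core_cmp);
}

/*
 * In the sorted order, the first entry with a given cookie is the most
 * eligible match. A tree finds it in logarithmic time; this sketch scans.
 */
const struct toy_task *toy_core_find(const struct toy_task *tasks, size_t n,
				     unsigned long cookie)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].cookie == cookie)
			return &tasks[i];
	return NULL;
}
```

Because all tasks with the same cookie are adjacent and sorted by eligibility, "find the leftmost entry with this cookie" is all that matching task selection needs.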

sched: Introduce sched_class::pick_task() · 195efd0e

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 21f56ffe
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=21f56ffe4482e501b9e83737612493eeaac21f5a

--------------------------------------------------------------------------

Because sched_class::pick_next_task() also implies
sched_class::set_next_task() (and possibly put_prev_task() and
newidle_balance) it is not state invariant. This makes it unsuitable
for remote task selection.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Vineeth: folded fixes]
Signed-off-by: Vineeth Remanan Pillai <viremana@linux.microsoft.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.437092775@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

195efd0e
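
The distinction drawn in the commit above, a pick that changes runqueue state versus one that only looks, can be shown with a toy model. All names are hypothetical and the runqueue is reduced to a small array; the point is only that the pure variant can safely be called on another CPU's runqueue.

```c
#include <assert.h>
#include <stddef.h>

struct toy_task {
	int prio;	/* lower value = runs first (assumption of this sketch) */
};

struct toy_rq {
	struct toy_task *queue[4];
	size_t nr;
	struct toy_task *curr;	/* task the runqueue considers "next/current" */
};

/* State invariant: inspects the queue and modifies nothing. */
struct toy_task *toy_pick_task(const struct toy_rq *rq)
{
	struct toy_task *best = NULL;

	for (size_t i = 0; i < rq->nr; i++)
		if (!best || rq->queue[i]->prio < best->prio)
			best = rq->queue[i];
	return best;
}

/* The side effect that makes pick_next_task() unsuitable for remote use. */
void toy_set_next_task(struct toy_rq *rq, struct toy_task *p)
{
	rq->curr = p;
}

/* Not state invariant: selection plus commitment. */
struct toy_task *toy_pick_next_task(struct toy_rq *rq)
{
	struct toy_task *p = toy_pick_task(rq);

	if (p)
		toy_set_next_task(rq, p);
	return p;
}
```

A core-wide pick can call toy_pick_task() on every sibling to compare candidates and commit only once the final choice for each CPU is known.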

sched: Allow sched_core_put() from atomic context · e33922b4

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 875feb41
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=875feb41fd20f6bd6054c9e79a5bcd9da6d8d2b2

--------------------------------------------------------------------------

Stuff the meat of sched_core_put() into a work such that we can use
sched_core_put() from atomic context.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.377455632@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e33922b4

sched: Optimize rq_lockp() usage · 727a6989

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 9ef7e7e3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9ef7e7e33bcdb57be1afb28884053c28b5f05240

--------------------------------------------------------------------------

rq_lockp() includes a static_branch(), which is asm-goto, which is
asm volatile which defeats regular CSE. This means that:

	if (!static_branch(&foo))
		return simple;

	if (static_branch(&foo) && cond)
		return complex;

doesn't fold, and we get horrible code. Introduce __rq_lockp() without
the static_branch().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.316696988@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

727a6989

sched: Core-wide rq->lock · 2fe77d25

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 9edeaea1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9edeaea1bc452372718837ed2ba775811baf1ba1

--------------------------------------------------------------------------

Introduce the basic infrastructure to have a core wide rq->lock.

This relies on the rq->__lock order being in increasing CPU number
(inside a core). It is also constrained to SMT8 per lockdep (and
SMT256 per preempt_count).

Luckily SMT8 is the max supported SMT count for Linux (Mips, Sparc and
Power are known to have this).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/YJUNfzSgptjX7tG6@hirez.programming.kicks-ass.net
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2fe77d25
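
The lock-ordering rule mentioned in the commit above (take locks in increasing CPU number) can be sketched without real locks by computing the acquisition order. The toy_* names are hypothetical, and identifying a lock by the CPU number that owns it is a simplification made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rq {
	int cpu;		/* this runqueue's CPU number */
	int core_cpu;		/* CPU number of the core's leader runqueue */
	bool core_enabled;	/* true: siblings share the leader's lock */
};

/* Identify the lock protecting 'rq' by the CPU number that owns it. */
int toy_lock_id(const struct toy_rq *rq)
{
	return rq->core_enabled ? rq->core_cpu : rq->cpu;
}

/*
 * Compute the order in which two runqueue locks must be taken.
 * Returns 1 if both runqueues share a single lock, 2 otherwise;
 * out[] receives the lock ids in increasing CPU order.
 */
int toy_double_lock_order(const struct toy_rq *a, const struct toy_rq *b,
			  int out[2])
{
	int la = toy_lock_id(a), lb = toy_lock_id(b);

	assert(out != NULL);
	if (la == lb) {
		out[0] = la;	/* same core-wide lock: take it only once */
		return 1;
	}
	out[0] = la < lb ? la : lb;
	out[1] = la < lb ? lb : la;
	return 2;
}
```

Ordering by CPU number gives every caller the same acquisition order regardless of argument order, and the shared-lock case avoids trying to take one lock twice.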

sched: Prepare for Core-wide rq->lock · 026e3779

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit d66f1b06
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d66f1b06b5b438cd20ba3664b8eef1f9c79e84bf

--------------------------------------------------------------------------

When switching on core-sched, CPUs need to agree which lock to use for
their RQ.

The new rule will be that rq->core_enabled will be toggled while
holding all rq->__locks that belong to a core. This means we need to
double check the rq->core_enabled value after each lock acquire and
retry if it changed.

This also has implications for those sites that take multiple RQ
locks, they need to be careful that the second lock doesn't end up
being the first lock.

Verify the lock pointer after acquiring the first lock, because if
they're on the same core, holding any of the rq->__lock instances will
pin the core state.

While there, change the rq->__lock order to CPU number, instead of rq
address, this greatly simplifies the next patch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/YJUNY0dmrJMD/BIm@hirez.programming.kicks-ass.net
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

026e3779
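
The "double check after each lock acquire and retry if it changed" rule from the commit above can be sketched in userspace with C11 atomics. Everything here is a hypothetical stand-in (toy spinlock, toy_rq fields, helper names); it mirrors the shape of the rule, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_lock {
	atomic_flag held;
};

struct toy_rq {
	int cpu;
	struct toy_lock lock;		/* this runqueue's own lock */
	struct toy_rq *core;		/* leader runqueue of the core */
	atomic_bool core_enabled;	/* toggled while all of the core's locks are held */
};

void toy_rq_init(struct toy_rq *rq, int cpu, struct toy_rq *core)
{
	rq->cpu = cpu;
	rq->core = core;
	atomic_flag_clear(&rq->lock.held);
	atomic_init(&rq->core_enabled, false);
}

static void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin */
}

void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Which lock currently protects this runqueue? */
struct toy_lock *toy_rq_lockp(struct toy_rq *rq)
{
	return atomic_load(&rq->core_enabled) ? &rq->core->lock : &rq->lock;
}

/* Acquire, then verify the choice is still valid; retry if it changed. */
struct toy_lock *toy_rq_lock(struct toy_rq *rq)
{
	for (;;) {
		struct toy_lock *l = toy_rq_lockp(rq);

		toy_lock_acquire(l);
		if (l == toy_rq_lockp(rq))
			return l;	/* holding any of the core's locks pins the state */
		toy_lock_release(l);
	}
}
```

The re-check matters because core_enabled may flip between reading the lock pointer and acquiring it; once a lock is held and the pointer still matches, the state cannot change underneath the holder.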

sched: Wrap rq::lock access · bdb12c26

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 5cb9eaa3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5cb9eaa3d274f75539077a28cf01e3563195fa53

--------------------------------------------------------------------------

In preparation of playing games with rq->lock, abstract the thing
using an accessor.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.136465446@infradead.org
Conflicts:
	kernel/sched/core.c
	[Bugfix a7c81556("sched: Fix migrate_disable() vs rt/dl balancing")
	 is not applied.
	 Bugfix 565790d2("sched: Fix balance_callback()") is not applied.
	 Bugfix ae792702("sched: Optimize finish_lock_switch()") is not applied.
	 Bugfix 36c6e17b("sched/core: Print out straggler tasks in sched_cpu_dying()")
	 is not applied.
	 Feature 2558aacf("sched/hotplug: Ensure only per-cpu kthreads run
	 during hotplug") is not applied.
	 Feature f2469a1f("sched/core: Wait for tasks being pushed away on hotplug")
	 is not applied.]

	kernel/sched/deadline.c
	[Bugfix a7c81556("sched: Fix migrate_disable() vs rt/dl balancing")
	 is not applied.]

	kernel/sched/fair.c
	[Feature acf66d70("sched/fair: Provide can_migrate_task_llc")
	 Feature 0826530d("sched/fair: Remove update of blocked load from newidle_balance")
	 is not applied.
	 Feature 6864cf01("sched/fair: Steal work from an overloaded CPU when CPU goes idle")]

	kernel/sched/rt.c
	[Bugfix a7c81556("sched: Fix migrate_disable() vs rt/dl balancing")
	 is not applied.]

	kernel/sched/sched.h
	[Bugfix a7c81556("sched: Fix migrate_disable() vs rt/dl balancing")
	 is not applied.]
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

bdb12c26

sched: Provide raw_spin_rq_*lock*() helpers · a52e180b

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 39d371b7
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39d371b7c0c299d489041884d005aacc4bba8c15

--------------------------------------------------------------------------

In preparation for playing games with rq->lock, add some rq_lock
wrappers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.075967879@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a52e180b

sched/fair: Add a few assertions · df54b4c7

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 9099a147
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9099a14708ce1dfecb6002605594a0daa319b555

--------------------------------------------------------------------------

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.015639083@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

df54b4c7

rbtree: Add generic add and find helpers · e1c6bbe4

Committed by Peter Zijlstra on Sep 30, 2022

mainline inclusion
from mainline-v5.12-rc1
commit 2d24dd57
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2d24dd5798d04

--------------------------------------------------------------------------

I've always been bothered by the endless (fragile) boilerplate for
rbtree, and I recently wrote some rbtree helpers for objtool and
figured I should lift them into the kernel and use them more widely.

Provide:

partial-order; less() based:
 - rb_add(): add a new entry to the rbtree
 - rb_add_cached(): like rb_add(), but for a rb_root_cached

total-order; cmp() based:
 - rb_find(): find an entry in an rbtree
 - rb_find_add(): find an entry, and add if not found

 - rb_find_first(): find the first (leftmost) matching entry
 - rb_next_match(): continue from rb_find_first()
 - rb_for_each(): iterate a sub-tree using the previous two

Inlining and constant propagation should see the compiler inline the
whole thing, including the various compare functions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>

Conflicts:
	tools/objtool/elf.c
	[Feature 3690914e("objtool: Extract elf_symbol_add()")]
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e1c6bbe4
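
To show the shape of the less()-based and cmp()-based helpers listed in the commit above, here is a userspace sketch on a plain unbalanced binary search tree. It is not the kernel's rbtree: there is no rebalancing, and toy_add(), toy_find_first() and the toy_item type are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node {
	struct toy_node *left, *right;
};

struct toy_root {
	struct toy_node *node;
};

/* less()-based insert: equal keys go right, so duplicates keep insertion order. */
void toy_add(struct toy_node *node, struct toy_root *tree,
	     bool (*less)(const struct toy_node *, const struct toy_node *))
{
	struct toy_node **link = &tree->node;

	node->left = node->right = NULL;
	while (*link)
		link = less(node, *link) ? &(*link)->left : &(*link)->right;
	*link = node;
}

/* cmp()-based lookup of the first (leftmost) entry matching 'key'. */
struct toy_node *toy_find_first(const void *key, const struct toy_root *tree,
				int (*cmp)(const void *, const struct toy_node *))
{
	struct toy_node *node = tree->node, *match = NULL;

	while (node) {
		int c = cmp(key, node);

		if (c <= 0) {
			if (c == 0)
				match = node;	/* remember it, keep looking further left */
			node = node->left;
		} else {
			node = node->right;
		}
	}
	return match;
}

/* Example payload: the node is the first member, so a plain cast recovers it. */
struct toy_item {
	struct toy_node node;
	int key;
};

bool toy_item_less(const struct toy_node *a, const struct toy_node *b)
{
	return ((const struct toy_item *)a)->key < ((const struct toy_item *)b)->key;
}

int toy_item_cmp(const void *key, const struct toy_node *n)
{
	int k = *(const int *)key;
	int v = ((const struct toy_item *)n)->key;

	return (k > v) - (k < v);
}
```

The caller supplies only the comparison; the tree walk itself is written once, which is the boilerplate reduction the commit message is after.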

Sep 29, 2022 · 6 commits

KVM: arm64: Try stage2 block mapping for host device MMIO · 789d5a97

Committed by Keqian Zhu on Sep 29, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 2aa53d68
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5R1MW
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2aa53d68cee6

------------------------------------------------------------------

The MMIO region of a device may be huge (GB level), so try to use
block mapping in stage2 to speed up both map and unmap.

Compared to normal memory mapping, we should consider two more
points when trying block mapping for an MMIO region:

1. For normal memory mapping, the PA (host physical address) and
HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
the HVA to request a hugepage, so we don't need to consider PA
alignment when verifying block mapping. But for device memory
mapping, the PA and HVA may have different alignment.

2. For normal memory mapping, we are sure the hugepage size properly
fits into the vma, so we don't check whether the mapping size exceeds
the boundary of the vma. But for device memory mapping, we should pay
attention to this.

This adds get_vma_page_shift() to get the page shift for both normal
memory and device MMIO regions, and checks these two points when
selecting the block mapping size for an MMIO region.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Heng Zhang <zhangheng191@h-partners.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Link: https://lore.kernel.org/r/20210507110322.23348-3-zhukeqian1@huawei.com
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

789d5a97
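
The two checks described in the commit above (PA/HVA alignment and the vma boundary) can be sketched as a standalone function. This is an illustration, not the kernel's get_vma_page_shift(): the toy_* names are hypothetical, and the shifts assume 4 KiB pages with 2 MiB PMD and 1 GiB PUD blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB pages, 2 MiB PMD blocks, 1 GiB PUD blocks. */
#define TOY_PAGE_SHIFT 12
#define TOY_PMD_SHIFT  21
#define TOY_PUD_SHIFT  30

/* Can a block of size (1 << shift) containing 'hva' be used? */
static int toy_block_fits(uint64_t vma_start, uint64_t vma_end,
			  uint64_t hva, uint64_t pa, int shift)
{
	uint64_t size = UINT64_C(1) << shift;
	uint64_t block = hva & ~(size - 1);	/* block-aligned start */

	/* Point 1: PA and HVA must share the same offset within the block. */
	if ((hva & (size - 1)) != (pa & (size - 1)))
		return 0;
	/* Point 2: the whole block must lie inside the vma. */
	return block >= vma_start && block + size <= vma_end;
}

/* 'pa_base' is the physical address backing 'vma_start'. */
int toy_vma_page_shift(uint64_t vma_start, uint64_t vma_end,
		       uint64_t pa_base, uint64_t hva)
{
	uint64_t pa;

	assert(vma_start <= hva && hva < vma_end);
	pa = pa_base + (hva - vma_start);

	if (toy_block_fits(vma_start, vma_end, hva, pa, TOY_PUD_SHIFT))
		return TOY_PUD_SHIFT;
	if (toy_block_fits(vma_start, vma_end, hva, pa, TOY_PMD_SHIFT))
		return TOY_PMD_SHIFT;
	return TOY_PAGE_SHIFT;
}
```

Trying the largest block first and falling back keeps the mapping as coarse as both constraints allow.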

KVM: arm64: Remove the creation time's mapping of MMIO regions · b455a717

Committed by Keqian Zhu on Sep 29, 2022

mainline inclusion
from mainline-v5.14-rc1
commit fd6f17ba
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5R1MW
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd6f17bade21

---------------------------------------------------------------------

The MMIO regions may be unmapped for many reasons and can be remapped
by the stage2 fault path. Mapping MMIO regions at creation time is
therefore only a minor optimization, and it makes the two mapping paths
hard to keep in sync.

Remove the mapping code while keeping the useful sanity check.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Heng Zhang <zhangheng191@h-partners.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Link: https://lore.kernel.org/r/20210507110322.23348-2-zhukeqian1@huawei.com
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

b455a717

ext4: fix use-after-free in ext4_ext_shift_extents · 04f91e64

Committed by Baokun Li on Sep 29, 2022

hulk inclusion
category: bugfix
bugzilla: 187600, https://gitee.com/openeuler/kernel/issues/I5SV2U
CVE: NA

--------------------------------

If the starting position of our insert range happens to be in the hole
between two ext4_extent_idx entries, the lblk of every ext4_extent in
the previous ext4_extent_idx is less than start. As a result, the
"extent" variable is accessed across the boundary and the following UAF
is triggered:

==================================================================
BUG: KASAN: use-after-free in ext4_ext_shift_extents+0x257/0x790
Read of size 4 at addr ffff88819807a008 by task fallocate/8010
CPU: 3 PID: 8010 Comm: fallocate Tainted: G            E     5.10.0+ #492
Call Trace:
 dump_stack+0x7d/0xa3
 print_address_description.constprop.0+0x1e/0x220
 kasan_report.cold+0x67/0x7f
 ext4_ext_shift_extents+0x257/0x790
 ext4_insert_range+0x5b6/0x700
 ext4_fallocate+0x39e/0x3d0
 vfs_fallocate+0x26f/0x470
 ksys_fallocate+0x3a/0x70
 __x64_sys_fallocate+0x4f/0x60
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
==================================================================

For right shifts, we can divide them into the following situations:

1. When the first ee_block of ext4_extent_idx is greater than or equal to
   start, make right shifts directly from the first ee_block.
    1) If it is greater than start, we need to continue searching in the
       previous ext4_extent_idx.
    2) If it is equal to start, we can exit the loop (iterator=NULL).

2. When the first ee_block of ext4_extent_idx is less than start, then
   traverse from the last extent to find the first extent whose ee_block
   is less than start.
    1) If extent is still the last extent after traversal, it means that
       the last ee_block of ext4_extent_idx is less than start, that is,
       start is located in the hole between idx and (idx+1), so we can
       exit the loop directly (break) without right shifts.
    2) Otherwise, make right shifts at the corresponding position of the
       found extent, and then exit the loop (iterator=NULL).

Fixes: 331573fe ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate")
Cc: stable@vger.kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

04f91e64
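
The case analysis for right shifts in the commit above can be modeled on a plain array of extent start blocks. This is a hedged sketch, not the ext4 code: the toy_* names and the array representation of one leaf are assumptions made for illustration. The key property is that the backward scan is bounded and never reads before the first extent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_action {
	TOY_SHIFT_CONTINUE,	/* shift from *from, then search the previous index */
	TOY_SHIFT_DONE,		/* shift from *from, then stop */
	TOY_NO_SHIFT,		/* start lies in the hole after this leaf: stop */
};

/*
 * blocks[]: ee_block of each extent in one leaf, in ascending order.
 * On a shift result, *from is the index of the first extent to move.
 */
enum toy_action toy_right_shift_plan(const uint32_t *blocks, size_t n,
				     uint32_t start, size_t *from)
{
	size_t i;

	assert(n > 0);

	/* Case 1: the first extent is already at or beyond start. */
	if (blocks[0] >= start) {
		*from = 0;
		return blocks[0] > start ? TOY_SHIFT_CONTINUE : TOY_SHIFT_DONE;
	}

	/* Case 2: walk back from the last extent; blocks[0] < start bounds the scan. */
	i = n - 1;
	while (i > 0 && blocks[i] >= start)
		i--;

	if (i == n - 1)
		return TOY_NO_SHIFT;	/* even the last extent is below start */

	*from = i + 1;			/* first extent at or beyond start */
	return TOY_SHIFT_DONE;
}
```

For a leaf with start blocks {10, 20, 30}, a start of 25 shifts only the last extent, while a start of 40 falls in the hole after the leaf and shifts nothing.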

hwtracing: hisi_ptt: Fix up for "iommu/dma: Make header private" · a9c1d0a4

Committed by Stephen Rothwell on Sep 29, 2022

mainline inclusion
from mainline-remotes/origin/next
commit 366317ea
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RP8T
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?id=366317eae983a0d96aeed78ad219b9c4ed2a719a

--------------------------------------------------------------------------

drivers/hwtracing/ptt/hisi_ptt.c:13:10: fatal error: linux/dma-iommu.h: No such file or directory
   13 | #include <linux/dma-iommu.h>
      |          ^~~~~~~~~~~~~~~~~~~

Caused by:

  commit ff0de066 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")

interacting with:

  commit f2042ed2 ("iommu/dma: Make header private")

from the iommu tree.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Yicong Yang <yangyicong@hisilicon.com>
[Fixed subject line and added changelog text]
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Wangming Shao <shaowangming@h-partners.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Jay Fang <f.fangjian@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a9c1d0a4

MAINTAINERS: Add maintainer for HiSilicon PTT driver · 14288cb8

Committed by Yicong Yang on Sep 29, 2022

mainline inclusion
from mainline-remotes/origin/next
commit 366317ea
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RP8T
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?id=366317eae983a0d96aeed78ad219b9c4ed2a719a

--------------------------------------------------------------------------

Add maintainer for driver and documentation of HiSilicon PTT device.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220816114414.4092-6-yangyicong@huawei.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Wangming Shao <shaowangming@h-partners.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Jay Fang <f.fangjian@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

14288cb8

docs: trace: Add HiSilicon PTT device driver documentation · 25dfba5e

Committed by Yicong Yang on Sep 29, 2022

mainline inclusion
from mainline-remotes/origin/next
commit a7112b74
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RP8T
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?id=a7112b747c324dda8937d4f47b14dc0af0b465d1

--------------------------------------------------------------------------

Document the introduction and usage of HiSilicon PTT device driver as well
as the sysfs attributes description provided by the driver.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
[Fixed month and kernel version]
Link: https://lore.kernel.org/r/20220816114414.4092-5-yangyicong@huawei.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Wangming Shao <shaowangming@h-partners.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Jay Fang <f.fangjian@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

25dfba5e
