提交 · be234044e2058b8129c448d553a715964b2b02e0 · openeuler / Kernel

30 9月, 2022 17 次提交

sched: Trivial core scheduling cookie management · be234044

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 6e33cad0
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6e33cad0af49336952e5541464bd02f5b5fd433e

--------------------------------------------------------------------------

In order to not have to use pid_struct, create a new, smaller,
structure to manage task cookies for core scheduling.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.919768100@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

be234044

sched: Migration changes for core scheduling · 30a1426a

由 Aubrey Li 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 97886d9d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=97886d9dcd86820bdbc1fa73b455982809cbc8c2

--------------------------------------------------------------------------

 - Don't migrate if there is a cookie mismatch
     Load balance tries to move task from busiest CPU to the
     destination CPU. When core scheduling is enabled, if the
     task's cookie does not match with the destination CPU's
     core cookie, this task may be skipped by this CPU. This
     mitigates the forced idle time on the destination CPU.

 - Select cookie matched idle CPU
     In the fast path of task wakeup, select the first cookie matched
     idle CPU instead of the first idle CPU.

 - Find cookie matched idlest CPU
     In the slow path of task wakeup, find the idlest CPU whose core
     cookie matches with task's cookie
Signed-off-by: NAubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.860083871@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

30a1426a

sched: Trivial forced-newidle balancer · 74ddc15c

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit d2dfa17b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2dfa17bc7de67e99685c4d6557837bf801a102c

--------------------------------------------------------------------------

When a sibling is forced-idle to match the core-cookie; search for
matching tasks to fill the core.

rcu_read_unlock() can incur an infrequent deadlock in
sched_core_balance(). Fix this by using the RCU-sched flavor instead.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.800048269@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

74ddc15c

sched/fair: Snapshot the min_vruntime of CPUs on force idle · 80077c25

由 Joel Fernandes (Google) 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit c6047c2e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c6047c2e3af68dae23ad884249e0d42ff28d2d1b

--------------------------------------------------------------------------

During force-idle, we end up doing cross-cpu comparison of vruntimes
during pick_next_task. If we simply compare (vruntime-min_vruntime)
across CPUs, and if the CPUs only have 1 task each, we will always
end up comparing 0 with 0 and pick just one of the tasks all the time.
This starves the task that was not picked. To fix this, take a snapshot
of the min_vruntime when entering force idle and use it for comparison.
This min_vruntime snapshot will only be used for cross-CPU vruntime
comparison, and nothing else.

A note about the min_vruntime snapshot and force idling:

During selection:

  When we're not fi, we need to update snapshot.
  when we're fi and we were not fi, we must update snapshot.
  When we're fi and we were already fi, we must not update snapshot.

Which gives:

  fib     fi      update
  0       0       1
  0       1       1
  1       0       1
  1       1       0

Where:

  fi:  force-idled now
  fib: force-idled before

So the min_vruntime snapshot needs to be updated when: !(fib && fi).

Also, the cfs_prio_less() function needs to be aware of whether the
core is in force idle or not, since it will be use this information to
know whether to advance a cfs_rq's min_vruntime_fi in the hierarchy.
So pass this information along via pick_task() -> prio_less().
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.738542617@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

80077c25

sched: Fix priority inversion of cookied task with sibling · 87d56255

由 Joel Fernandes (Google) 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 7afbba11
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7afbba119f0da09824d723f8081608ea1f74ff57

--------------------------------------------------------------------------

The rationale is as follows. In the core-wide pick logic, even if
need_sync == false, we need to go look at other CPUs (non-local CPUs)
to see if they could be running RT.

Say the RQs in a particular core look like this:

Let CFS1 and CFS2 be 2 tagged CFS tags.
Let RT1 be an untagged RT task.

	rq0		rq1
	CFS1 (tagged)	RT1 (no tag)
	CFS2 (tagged)

Say schedule() runs on rq0. Now, it will enter the above loop and
pick_task(RT) will return NULL for 'p'. It will enter the above if()
block and see that need_sync == false and will skip RT entirely.

The end result of the selection will be (say prio(CFS1) > prio(CFS2)):

	rq0             rq1
	CFS1            IDLE

When it should have selected:

	rq0             rq1
	IDLE            RT
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.678425748@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

87d56255

sched/fair: Fix forced idle sibling starvation corner case · 483069d3

由 Vineeth Pillai 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 8039e96f
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8039e96fcc1de30d5bcaf05da9ca2de46a800826

--------------------------------------------------------------------------

If there is only one long running local task and the sibling is
forced idle, it  might not get a chance to run until a schedule
event happens on any cpu in the core.

So we check for this condition during a tick to see if a sibling
is starved and then give it a chance to schedule.
Signed-off-by: NVineeth Pillai <viremana@linux.microsoft.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.617407840@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

483069d3

sched: Add core wide task selection and scheduling · 4bd71bc9

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 539f6512
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=539f65125d20aacab54d02d77f10a839f45b09dc

--------------------------------------------------------------------------

Instead of only selecting a local task, select a task for all SMT
siblings for every reschedule on the core (irrespective which logical
CPU does the reschedule).
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.557559654@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4bd71bc9

sched: Basic tracking of matching tasks · c29a9a91

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 8a311c74
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8a311c740b53324ec584e0e3bb7077d56b123c28

--------------------------------------------------------------------------

Introduce task_struct::core_cookie as an opaque identifier for core
scheduling. When enabled; core scheduling will only allow matching
task to be on the core; where idle matches everything.

When task_struct::core_cookie is set (and core scheduling is enabled)
these tasks are indexed in a second RB-tree, first on cookie value
then on scheduling function, such that matching task selection always
finds the most elegible match.

NOTE: *shudder* at the overhead...

NOTE: *sigh*, a 3rd copy of the scheduling function; the alternative
is per class tracking of cookies and that just duplicates a lot of
stuff for no raisin (the 2nd copy lives in the rt-mutex PI code).

[Joel: folded fixes]
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.496975854@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c29a9a91

sched: Introduce sched_class::pick_task() · 195efd0e

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 21f56ffe
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=21f56ffe4482e501b9e83737612493eeaac21f5a

--------------------------------------------------------------------------

Because sched_class::pick_next_task() also implies
sched_class::set_next_task() (and possibly put_prev_task() and
newidle_balance) it is not state invariant. This makes it unsuitable
for remote task selection.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
[Vineeth: folded fixes]
Signed-off-by: NVineeth Remanan Pillai <viremana@linux.microsoft.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.437092775@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

195efd0e

sched: Allow sched_core_put() from atomic context · e33922b4

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 875feb41
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=875feb41fd20f6bd6054c9e79a5bcd9da6d8d2b2

--------------------------------------------------------------------------

Stuff the meat of sched_core_put() into a work such that we can use
sched_core_put() from atomic context.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.377455632@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e33922b4

sched: Optimize rq_lockp() usage · 727a6989

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 9ef7e7e3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9ef7e7e33bcdb57be1afb28884053c28b5f05240

--------------------------------------------------------------------------

rq_lockp() includes a static_branch(), which is asm-goto, which is
asm volatile which defeats regular CSE. This means that:

	if (!static_branch(&foo))
		return simple;

	if (static_branch(&foo) && cond)
		return complex;

Doesn't fold and we get horrible code. Introduce __rq_lockp() without
the static_branch() on.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.316696988@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

727a6989

sched: Core-wide rq->lock · 2fe77d25

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 9edeaea1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9edeaea1bc452372718837ed2ba775811baf1ba1

--------------------------------------------------------------------------

Introduce the basic infrastructure to have a core wide rq->lock.

This relies on the rq->__lock order being in increasing CPU number
(inside a core). It is also constrained to SMT8 per lockdep (and
SMT256 per preempt_count).

Luckily SMT8 is the max supported SMT count for Linux (Mips, Sparc and
Power are known to have this).
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/YJUNfzSgptjX7tG6@hirez.programming.kicks-ass.netSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2fe77d25

sched: Prepare for Core-wide rq->lock · 026e3779

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit d66f1b06
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d66f1b06b5b438cd20ba3664b8eef1f9c79e84bf

--------------------------------------------------------------------------

When switching on core-sched, CPUs need to agree which lock to use for
their RQ.

The new rule will be that rq->core_enabled will be toggled while
holding all rq->__locks that belong to a core. This means we need to
double check the rq->core_enabled value after each lock acquire and
retry if it changed.

This also has implications for those sites that take multiple RQ
locks, they need to be careful that the second lock doesn't end up
being the first lock.

Verify the lock pointer after acquiring the first lock, because if
they're on the same core, holding any of the rq->__lock instances will
pin the core state.

While there, change the rq->__lock order to CPU number, instead of rq
address, this greatly simplifies the next patch.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/YJUNY0dmrJMD/BIm@hirez.programming.kicks-ass.netSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

026e3779

sched: Wrap rq::lock access · bdb12c26

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 5cb9eaa3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5cb9eaa3d274f75539077a28cf01e3563195fa53

--------------------------------------------------------------------------

In preparation of playing games with rq->lock, abstract the thing
using an accessor.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.136465446@infradead.org
Conflicts:
	kernel/sched/core.c
	[Bugfix a7c81556("sched: Fix migrate_disable() vs rt/dl balancing")
	 is not applied.
	 Bugfix 565790d2("sched: Fix balance_callback()") is not applied.
	 Bugfix ae792702("sched: Optimize finish_lock_switch()") is not applied.
	 Bugfix 36c6e17b("sched/core: Print out straggler tasks in sched_cpu_dying()")
	 is not applied.
	 Feature 2558aacf("sched/hotplug: Ensure only per-cpu kthreads run
	 during hotplug") is not applied.
	 Feature f2469a1f("sched/core: Wait for tasks being pushed away on hotplug")
	 is not applied.]

	kernel/sched/deadline.c
	[Bugfix a7c81556("sched: Fix migrate_disable() vs rt/dl balancing")
	 is not applied.]

	kernel/sched/fair.c
	[Feature acf66d70("sched/fair: Provide can_migrate_task_llc")
	 Feature 0826530d("sched/fair: Remove update of blocked load from newidle_balance")
	 s not applied.
	 Feature 6864cf01("sched/fair: Steal work from an overloaded CPU when CPU goes idle")]

	kernel/sched/rt.c
	[Bugfix a7c81556("sched: Fix migrate_disable() vs rt/dl balancing")
	 is not applied.]

	kernel/sched/sched.h
	[[Bugfix a7c81556("sched: Fix migrate_disable() vs rt/dl balancing")
	 is not applied.]
Signed-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

bdb12c26

sched: Provide raw_spin_rq_*lock*() helpers · a52e180b

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 39d371b7
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39d371b7c0c299d489041884d005aacc4bba8c15

--------------------------------------------------------------------------

In prepration for playing games with rq->lock, add some rq_lock
wrappers.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.075967879@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a52e180b

sched/fair: Add a few assertions · df54b4c7

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 9099a147
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9099a14708ce1dfecb6002605594a0daa319b555

--------------------------------------------------------------------------
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NDon Hiatt <dhiatt@digitalocean.com>
Tested-by: NHongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.015639083@infradead.orgSigned-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

df54b4c7

rbtree: Add generic add and find helpers · e1c6bbe4

由 Peter Zijlstra 提交于 9月 30, 2022

mainline inclusion
from mainline-v5.12-rc1
commit 2d24dd57
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2d24dd5798d04

--------------------------------------------------------------------------

I've always been bothered by the endless (fragile) boilerplate for
rbtree, and I recently wrote some rbtree helpers for objtool and
figured I should lift them into the kernel and use them more widely.

Provide:

partial-order; less() based:
 - rb_add(): add a new entry to the rbtree
 - rb_add_cached(): like rb_add(), but for a rb_root_cached

total-order; cmp() based:
 - rb_find(): find an entry in an rbtree
 - rb_find_add(): find an entry, and add if not found

 - rb_find_first(): find the first (leftmost) matching entry
 - rb_next_match(): continue from rb_find_first()
 - rb_for_each(): iterate a sub-tree using the previous two

Inlining and constant propagation should see the compiler inline the
whole thing, including the various compare functions.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Reviewed-by: NMichel Lespinasse <walken@google.com>
Acked-by: NDavidlohr Bueso <dbueso@suse.de>

Conflicts:
	tools/objtool/elf.c
	[Feature 3690914e("objtool: Extract elf_symbol_add()")]
Signed-off-by: NLin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Nlihua <hucool.lihua@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e1c6bbe4

29 9月, 2022 23 次提交

KVM: arm64: Try stage2 block mapping for host device MMIO · 789d5a97

由 Keqian Zhu 提交于 9月 29, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 2aa53d68
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5R1MW
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2aa53d68cee6

------------------------------------------------------------------

The MMIO region of a device maybe huge (GB level), try to use
block mapping in stage2 to speedup both map and unmap.

Compared to normal memory mapping, we should consider two more
points when try block mapping for MMIO region:

1. For normal memory mapping, the PA(host physical address) and
HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
the HVA to request hugepage, so we don't need to consider PA
alignment when verifing block mapping. But for device memory
mapping, the PA and HVA may have different alignment.

2. For normal memory mapping, we are sure hugepage size properly
fit into vma, so we don't check whether the mapping size exceeds
the boundary of vma. But for device memory mapping, we should pay
attention to this.

This adds get_vma_page_shift() to get page shift for both normal
memory and device MMIO region, and check these two points when
selecting block mapping size for MMIO region.
Signed-off-by: NKeqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NHeng Zhang <zhangheng191@h-partners.com>
Reviewed-by: NKeqian Zhu <zhukeqian1@huawei.com>
Link: https://lore.kernel.org/r/20210507110322.23348-3-zhukeqian1@huawei.comSigned-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

789d5a97

KVM: arm64: Remove the creation time's mapping of MMIO regions · b455a717

由 Keqian Zhu 提交于 9月 29, 2022

mainline inclusion
from mainline-v5.14-rc1
commit fd6f17ba
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5R1MW
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd6f17bade21

---------------------------------------------------------------------

The MMIO regions may be unmapped for many reasons and can be remapped
by stage2 fault path. Map MMIO regions at creation time becomes a
minor optimization and makes these two mapping path hard to sync.

Remove the mapping code while keep the useful sanity check.
Signed-off-by: NKeqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NHeng Zhang <zhangheng191@h-partners.com>
Reviewed-by: NKeqian Zhu <zhukeqian1@huawei.com>
Link: https://lore.kernel.org/r/20210507110322.23348-2-zhukeqian1@huawei.comSigned-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b455a717

ext4: fix use-after-free in ext4_ext_shift_extents · 04f91e64

由 Baokun Li 提交于 9月 29, 2022

hulk inclusion
category: bugfix
bugzilla: 187600, https://gitee.com/openeuler/kernel/issues/I5SV2U
CVE: NA

--------------------------------

If the starting position of our insert range happens to be in the hole
between the two ext4_extent_idx, because the lblk of the ext4_extent in
the previous ext4_extent_idx is always less than the start, which leads
to the "extent" variable access across the boundary, the following UAF is
triggered:

==================================================================
BUG: KASAN: use-after-free in ext4_ext_shift_extents+0x257/0x790
Read of size 4 at addr ffff88819807a008 by task fallocate/8010
CPU: 3 PID: 8010 Comm: fallocate Tainted: G            E     5.10.0+ #492
Call Trace:
 dump_stack+0x7d/0xa3
 print_address_description.constprop.0+0x1e/0x220
 kasan_report.cold+0x67/0x7f
 ext4_ext_shift_extents+0x257/0x790
 ext4_insert_range+0x5b6/0x700
 ext4_fallocate+0x39e/0x3d0
 vfs_fallocate+0x26f/0x470
 ksys_fallocate+0x3a/0x70
 __x64_sys_fallocate+0x4f/0x60
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
==================================================================

For right shifts, we can divide them into the following situations：

1. When the first ee_block of ext4_extent_idx is greater than or equal to
   start, make right shifts directly from the first ee_block.
    1) If it is greater than start, we need to continue searching in the
       previous ext4_extent_idx.
    2) If it is equal to start, we can exit the loop (iterator=NULL).

2. When the first ee_block of ext4_extent_idx is less than start, then
   traverse from the last extent to find the first extent whose ee_block
   is less than start.
    1) If extent is still the last extent after traversal, it means that
       the last ee_block of ext4_extent_idx is less than start, that is,
       start is located in the hole between idx and (idx+1), so we can
       exit the loop directly (break) without right shifts.
    2) Otherwise, make right shifts at the corresponding position of the
       found extent, and then exit the loop (iterator=NULL).

Fixes: 331573fe ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate")
Cc: stable@vger.kernel.org
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

04f91e64

hwtracing: hisi_ptt: Fix up for "iommu/dma: Make header private" · a9c1d0a4

由 Stephen Rothwell 提交于 9月 29, 2022

mainline inclusion
from mainline-remotes/origin/next
commit 366317ea
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RP8T
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?id=366317eae983a0d96aeed78ad219b9c4ed2a719a

--------------------------------------------------------------------------

drivers/hwtracing/ptt/hisi_ptt.c:13:10: fatal error: linux/dma-iommu.h: No such file or directory
   13 | #include <linux/dma-iommu.h>
      |          ^~~~~~~~~~~~~~~~~~~

Caused by:

  commit ff0de066 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")

interacting with:

  commit f2042ed2 ("iommu/dma: Make header private")

from the iommu tree.
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Acked-by: NRobin Murphy <robin.murphy@arm.com>
Acked-by: NYicong Yang <yangyicong@hisilicon.com>
[Fixed subject line and added changelog text]
Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NWangming Shao <shaowangming@h-partners.com>
Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: NJay Fang <f.fangjian@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a9c1d0a4

MAINTAINERS: Add maintainer for HiSilicon PTT driver · 14288cb8

由 Yicong Yang 提交于 9月 29, 2022

mainline inclusion
from mainline-remotes/origin/next
commit 366317ea
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RP8T
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?id=366317eae983a0d96aeed78ad219b9c4ed2a719a

--------------------------------------------------------------------------

Add maintainer for driver and documentation of HiSilicon PTT device.
Signed-off-by: NYicong Yang <yangyicong@hisilicon.com>
Reviewed-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220816114414.4092-6-yangyicong@huawei.comSigned-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NWangming Shao <shaowangming@h-partners.com>
Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: NJay Fang <f.fangjian@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

14288cb8

docs: trace: Add HiSilicon PTT device driver documentation · 25dfba5e

由 Yicong Yang 提交于 9月 29, 2022

mainline inclusion
from mainline-remotes/origin/next
commit a7112b74
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RP8T
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?id=a7112b747c324dda8937d4f47b14dc0af0b465d1

--------------------------------------------------------------------------

Document the introduction and usage of HiSilicon PTT device driver as well
as the sysfs attributes description provided by the driver.
Signed-off-by: NYicong Yang <yangyicong@hisilicon.com>
Reviewed-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: NBagas Sanjaya <bagasdotme@gmail.com>
[Fixed month and kernel version]
Link: https://lore.kernel.org/r/20220816114414.4092-5-yangyicong@huawei.comSigned-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NWangming Shao <shaowangming@h-partners.com>
Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: NJay Fang <f.fangjian@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

25dfba5e

hwtracing: hisi_ptt: Add tune function support for HiSilicon PCIe Tune and Trace device · 7dedf63e

由 Yicong Yang 提交于 9月 29, 2022

mainline inclusion
from mainline-remotes/origin/next
commit 5ca57b03
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RP8T
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?id=5ca57b03d8c5de4c59234cc11fe9dd9f13d57f48

--------------------------------------------------------------------------

Add tune function for the HiSilicon Tune and Trace device. The interface
of tune is exposed through sysfs attributes of PTT PMU device.
Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NYicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20220816114414.4092-4-yangyicong@huawei.comSigned-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NWangming Shao <shaowangming@h-partners.com>
Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: NJay Fang <f.fangjian@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

7dedf63e

hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device · 5f68d4cc

由 Yicong Yang 提交于 9月 29, 2022

mainline inclusion
from mainline-remotes/origin/next
commit ff0de066
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RP8T
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?id=ff0de066b4632ccb2b2e50f90c0c5be7f4689de7

--------------------------------------------------------------------------

HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated
Endpoint(RCiEP) device, providing the capability to dynamically monitor and
tune the PCIe traffic and trace the TLP headers.

Add the driver for the device to enable the trace function. Register PMU
device of PTT trace, then users can use trace through perf command. The
driver makes use of perf AUX trace function and support the following
events to configure the trace:

- filter: select Root port or Endpoint to trace
- type: select the type of traced TLP headers
- direction: select the direction of traced TLP headers
- format: select the data format of the traced TLP headers

This patch initially add basic trace support of PTT device.
Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NYicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20220816114414.4092-3-yangyicong@huawei.comSigned-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NWangming Shao <shaowangming@h-partners.com>
Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Reviewed-by: NJay Fang <f.fangjian@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5f68d4cc

iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity · 86b23c69

由 Yicong Yang 提交于 9月 29, 2022

mainline inclusion
from mainline-remotes/origin/next
commit 24b6c779
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RP8T
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?id=24b6c7798a0122012ca848ea0d25e973334266b0

--------------------------------------------------------------------------

The DMA operations of HiSilicon PTT device can only work properly with
identical mappings. So add a quirk for the device to force the domain
as passthrough.
Acked-by: NWill Deacon <will@kernel.org>
Signed-off-by: NYicong Yang <yangyicong@hisilicon.com>
Reviewed-by: NJohn Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20220816114414.4092-2-yangyicong@huawei.comSigned-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NWangming Shao <shaowangming@h-partners.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Reviewed-by: NJay Fang <f.fangjian@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

86b23c69

mm: Force TLB flush for PFNMAP mappings before unlink_file_vma() · 1916192d

由 Jann Horn 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.142
commit 895428ee124ad70b9763259308354877b725c31d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PE9S
CVE: CVE-2022-39188

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=895428ee124ad70b9763259308354877b725c31d

--------------------------------

commit b67fbebd upstream.

Some drivers rely on having all VMAs through which a PFN might be
accessible listed in the rmap for correctness.
However, on X86, it was possible for a VMA with stale TLB entries
to not be listed in the rmap.

This was fixed in mainline with
commit b67fbebd ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"),
but that commit relies on preceding refactoring in
commit 18ba064e ("mmu_gather: Let there be one tlb_{start,end}_vma()
implementation") and commit 1e9fdf21 ("mmu_gather: Remove per arch
tlb_{start,end}_vma()").

This patch provides equivalent protection without needing that
refactoring, by forcing a TLB flush between removing PTEs in
unmap_vmas() and the call to unlink_file_vma() in free_pgtables().

[This is a stable-specific rewrite of the upstream commit!]
Signed-off-by: NJann Horn <jannh@google.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Nze zuo <zuoze1@huawei.com>
Reviewed-by: NChen Wandun <chenwandun@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1916192d

sched/fair: Fix kabi broken in struct cfs_rq · 014a36c8

由 Zheng Zengkai 提交于 9月 29, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ
CVE: NA

---------------------------------------

In struct cfs_rq, the name of 'throttled_clock_pelt' and
'throttled_clock_pelt_time' changed causing kabi broken,
use KABI_REPLACE to fix it.
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: NWang Hai <wanghai38@huawei.com>
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

014a36c8

sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq · 5bbd0d05

由 Chengming Zhou 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit 147a376c1afea117eccda36451121ea781aa5028
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=147a376c1afea117eccda36451121ea781aa5028

--------------------------------

[ Upstream commit 64eaf507 ]

Since commit 23127296 ("sched/fair: Update scale invariance of PELT")
change to use rq_clock_pelt() instead of rq_clock_task(), we should also
use rq_clock_pelt() for throttled_clock_task_time and throttled_clock_task
accounting to get correct cfs_rq_clock_pelt() of throttled cfs_rq. And
rename throttled_clock_task(_time) to be clock_pelt rather than clock_task.

Fixes: 23127296 ("sched/fair: Update scale invariance of PELT")
Signed-off-by: NChengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NBen Segall <bsegall@google.com>
Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20220408115309.81603-1-zhouchengming@bytedance.comSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

5bbd0d05

ext4: only allow test_dummy_encryption when supported · 91d20b0c

由 Eric Biggers 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit a67100f42665cf7a5ed7821376140f62def0d31e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a67100f42665cf7a5ed7821376140f62def0d31e

--------------------------------

commit 5f41fdae upstream.

Make the test_dummy_encryption mount option require that the encrypt
feature flag be already enabled on the filesystem, rather than
automatically enabling it.  Practically, this means that "-O encrypt"
will need to be included in MKFS_OPTIONS when running xfstests with the
test_dummy_encryption mount option.  (ext4/053 also needs an update.)

Moreover, as long as the preconditions for test_dummy_encryption are
being tightened anyway, take the opportunity to start rejecting it when
!CONFIG_FS_ENCRYPTION rather than ignoring it.

The motivation for requiring the encrypt feature flag is that:

- Having the filesystem auto-enable feature flags is problematic, as it
  bypasses the usual sanity checks.  The specific issue which came up
  recently is that in kernel versions where ext4 supports casefold but
  not encrypt+casefold (v5.1 through v5.10), the kernel will happily add
  the encrypt flag to a filesystem that has the casefold flag, making it
  unmountable -- but only for subsequent mounts, not the initial one.
  This confused the casefold support detection in xfstests, causing
  generic/556 to fail rather than be skipped.

- The xfstests-bld test runners (kvm-xfstests et al.) already use the
  required mkfs flag, so they will not be affected by this change.  Only
  users of test_dummy_encryption alone will be affected.  But, this
  option has always been for testing only, so it should be fine to
  require that the few users of this option update their test scripts.

- f2fs already requires it (for its equivalent feature flag).
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NGabriel Krisman Bertazi <krisman@collabora.com>
Link: https://lore.kernel.org/r/20220519204437.61645-1-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

 Conflicts:
	fs/ext4/super.c
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

91d20b0c

MIPS: IP30: Remove incorrect `cpu_has_fpu' override · 87558a3c

由 Maciej W. Rozycki 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit 96662c77466dfc2285519c87a2b955bb2d4f5278
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=96662c77466dfc2285519c87a2b955bb2d4f5278

--------------------------------

commit f44b3e74 upstream.

Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:

ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'

where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: NMaciej W. Rozycki <macro@orcam.me.uk>
Reported-by: NStephen Zhang <starzhangzsd@gmail.com>
Fixes: 7505576d ("MIPS: add support for SGI Octane (IP30)")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

87558a3c

MIPS: IP27: Remove incorrect `cpu_has_fpu' override · 9dd4b6ce

由 Maciej W. Rozycki 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit 57e561573f2e51f9f53428caa17eae6a7090f0f5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=57e561573f2e51f9f53428caa17eae6a7090f0f5

--------------------------------

commit 424c3781 upstream.

Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:

ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'

where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: NMaciej W. Rozycki <macro@orcam.me.uk>
Reported-by: NStephen Zhang <starzhangzsd@gmail.com>
Fixes: 0ebb2f41 ("MIPS: IP27: Update/restructure CPU overrides")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

9dd4b6ce

RDMA/rxe: Generate a completion for unsupported/invalid opcode · b66d62a9

由 Xiao Yang 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit bb55ca1612923b06c4d86ab28b8dd8fdca55ced1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bb55ca1612923b06c4d86ab28b8dd8fdca55ced1

--------------------------------

commit 2f917af7 upstream.

Current rxe_requester() doesn't generate a completion when processing an
unsupported/invalid opcode. If rxe driver doesn't support a new opcode
(e.g. RDMA Atomic Write) and RDMA library supports it, an application
using the new opcode can reproduce this issue. Fix the issue by calling
"goto err;".

Fixes: 8700e3e7 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220410113513.27537-1-yangx.jy@fujitsu.comSigned-off-by: NXiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

b66d62a9

Revert "random: use static branch for crng_ready()" · 06356571

由 Jason A. Donenfeld 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit 72268945b124cd61336f9b4cac538b0516399a2d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=72268945b124cd61336f9b4cac538b0516399a2d

--------------------------------

This reverts upstream commit f5bda35f
from stable. It's not essential and will take some time during 5.19 to
work out properly.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

06356571

block: fix bio_clone_blkg_association() to associate with proper blkcg_gq · 9ba8bda5

由 Jan Kara 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit 6b03dc67dde3811b11125b089bec876f1a9806b7
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6b03dc67dde3811b11125b089bec876f1a9806b7

--------------------------------

commit 22b106e5 upstream.

Commit d92c370a ("block: really clone the block cgroup in
bio_clone_blkg_association") changed bio_clone_blkg_association() to
just clone bio->bi_blkg reference from source to destination bio. This
is however wrong if the source and destination bios are against
different block devices because struct blkcg_gq is different for each
bdev-blkcg pair. This will result in IOs being accounted (and throttled
as a result) multiple times against the same device (src bdev) while
throttling of the other device (dst bdev) is ignored. In case of BFQ the
inconsistency can even result in crashes in bfq_bic_update_cgroup().
Fix the problem by looking up correct blkcg_gq for the cloned bio.
Reported-by: NLogan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: NDonald Buczek <buczek@molgen.mpg.de>
Fixes: d92c370a ("block: really clone the block cgroup in bio_clone_blkg_association")
CC: stable@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.czSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

9ba8bda5

bfq: Remove pointless bfq_init_rq() calls · e6782018

由 Jan Kara 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit 80b0a2b3dfea5de3224ba756830b9243709c6e9e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=80b0a2b3dfea5de3224ba756830b9243709c6e9e

--------------------------------

commit 5f550ede upstream.

We call bfq_init_rq() from request merging functions where requests we
get should have already gone through bfq_init_rq() during insert and
anyway we want to do anything only if the request is already tracked by
BFQ. So replace calls to bfq_init_rq() with RQ_BFQQ() instead to simply
skip requests untracked by BFQ. We move bfq_init_rq() call in
bfq_insert_request() a bit earlier to cover request merging and thus
can transfer FIFO position in case of a merge.

CC: stable@vger.kernel.org
Tested-by: N"yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-6-jack@suse.czSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

 Conflicts:
	block/bfq-iosched.c
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

e6782018

bfq: Drop pointless unlock-lock pair · 83d20aeb

由 Jan Kara 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit 13599aac1b983341a1240199e461bf1a8ee55dfb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=13599aac1b983341a1240199e461bf1a8ee55dfb

--------------------------------

commit fc84e1f9 upstream.

In bfq_insert_request() we unlock bfqd->lock only to call
trace_block_rq_insert() and then lock bfqd->lock again. This is really
pointless since tracing is disabled if we really care about performance
and even if the tracepoint is enabled, it is a quick call.

CC: stable@vger.kernel.org
Tested-by: N"yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-5-jack@suse.czSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

83d20aeb

bfq: Avoid merging queues with different parents · 2b99328b

由 Jan Kara 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit 7d172b9dc913e161d8ff88770eea01701ff553de
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7d172b9dc913e161d8ff88770eea01701ff553de

--------------------------------

commit c1cee4ab upstream.

It can happen that the parent of a bfqq changes between the moment we
decide two queues are worth to merge (and set bic->stable_merge_bfqq)
and the moment bfq_setup_merge() is called. This can happen e.g. because
the process submitted IO for a different cgroup and thus bfqq got
reparented. It can even happen that the bfqq we are merging with has
parent cgroup that is already offline and going to be destroyed in which
case the merge can lead to use-after-free issues such as:

BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50
Read of size 8 at addr ffff88800693c0c0 by task runc:[2:INIT]/10544

CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G            E     5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased) f1f3b891c72369aebecd2e43e4641a6358867c70
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl+0x46/0x5a
 print_address_description.constprop.0+0x1f/0x140
 ? __bfq_deactivate_entity+0x9cb/0xa50
 kasan_report.cold+0x7f/0x11b
 ? __bfq_deactivate_entity+0x9cb/0xa50
 __bfq_deactivate_entity+0x9cb/0xa50
 ? update_curr+0x32f/0x5d0
 bfq_deactivate_entity+0xa0/0x1d0
 bfq_del_bfqq_busy+0x28a/0x420
 ? resched_curr+0x116/0x1d0
 ? bfq_requeue_bfqq+0x70/0x70
 ? check_preempt_wakeup+0x52b/0xbc0
 __bfq_bfqq_expire+0x1a2/0x270
 bfq_bfqq_expire+0xd16/0x2160
 ? try_to_wake_up+0x4ee/0x1260
 ? bfq_end_wr_async_queues+0xe0/0xe0
 ? _raw_write_unlock_bh+0x60/0x60
 ? _raw_spin_lock_irq+0x81/0xe0
 bfq_idle_slice_timer+0x109/0x280
 ? bfq_dispatch_request+0x4870/0x4870
 __hrtimer_run_queues+0x37d/0x700
 ? enqueue_hrtimer+0x1b0/0x1b0
 ? kvm_clock_get_cycles+0xd/0x10
 ? ktime_get_update_offsets_now+0x6f/0x280
 hrtimer_interrupt+0x2c8/0x740

Fix the problem by checking that the parent of the two bfqqs we are
merging in bfq_setup_merge() is the same.

Link: https://lore.kernel.org/linux-block/20211125172809.GC19572@quack2.suse.cz/
CC: stable@vger.kernel.org
Fixes: 430a67f9 ("block, bfq: merge bursts of newly-created queues")
Tested-by: N"yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-2-jack@suse.czSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

2b99328b

thermal/core: Fix memory leak in the error path · 3415af3c

由 Daniel Lezcano 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit 54cdc10ac7184f2159a4f5658b497e90244d1516
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=54cdc10ac7184f2159a4f5658b497e90244d1516

--------------------------------

commit d44616c6 upstream.

Fix the following error:

 smatch warnings:
 drivers/thermal/thermal_core.c:1020 __thermal_cooling_device_register() warn: possible memory leak of 'cdev'

by freeing the cdev when exiting the function in the error path.

Fixes: 58483761 ("thermal/drivers/core: Use a char pointer for the cooling device name")
Reported-by: Nkernel test robot <lkp@intel.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210319202257.890848-1-daniel.lezcano@linaro.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

3415af3c

thermal/core: fix a UAF bug in __thermal_cooling_device_register() · 1467ec66

由 Ziyang Xuan 提交于 9月 29, 2022

stable inclusion
from stable-v5.10.121
commit b132abaa6515e14e0db292389c25007d666e1925
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b132abaa6515e14e0db292389c25007d666e1925

--------------------------------

commit 0a5c2671 upstream.

When device_register() return failed, program will goto out_kfree_type
to release 'cdev->device' by put_device(). That will call thermal_release()
to free 'cdev'. But the follow-up processes access 'cdev' continually.
That trggers the UAF bug.
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

====================================================================
BUG: KASAN: use-after-free in __thermal_cooling_device_register+0x75b/0xa90
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 dump_stack_lvl+0xe2/0x152
 print_address_description.constprop.0+0x21/0x140
 ? __thermal_cooling_device_register+0x75b/0xa90
 kasan_report.cold+0x7f/0x11b
 ? __thermal_cooling_device_register+0x75b/0xa90
 __thermal_cooling_device_register+0x75b/0xa90
 ? memset+0x20/0x40
 ? __sanitizer_cov_trace_pc+0x1d/0x50
 ? __devres_alloc_node+0x130/0x180
 devm_thermal_of_cooling_device_register+0x67/0xf0
 max6650_probe.cold+0x557/0x6aa
......

Freed by task 258:
 kasan_save_stack+0x1b/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x20/0x30
 __kasan_slab_free+0x109/0x140
 kfree+0x117/0x4c0
 thermal_release+0xa0/0x110
 device_release+0xa7/0x240
 kobject_put+0x1ce/0x540
 put_device+0x20/0x30
 __thermal_cooling_device_register+0x731/0xa90
 devm_thermal_of_cooling_device_register+0x67/0xf0
 max6650_probe.cold+0x557/0x6aa [max6650]

Do not use 'cdev' again after put_device() to fix the problem like doing
in thermal_zone_device_register().

[dlezcano]: as requested by Rafael, change the affectation into two statements.

Fixes: 58483761 ("thermal/drivers/core: Use a char pointer for the cooling device name")
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reported-by: Nkernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20211015024504.947520-1-william.xuanziyang@huawei.comSigned-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1467ec66

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功