sched: cfs: add bpf hooks to control wakeup and tick preemption

maillist inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5F6X6
CVE: NA

Reference: https://lore.kernel.org/all/20210916162451.709260-1-guro@fb.com/

-------------------

This patch adds 3 hooks to control wakeup and tick preemption:
  cfs_check_preempt_tick
  cfs_check_preempt_wakeup
  cfs_wakeup_preempt_entity

The first one allows a BPF program to force or suppress preemption from
the tick context. An obvious usage example is minimizing the number of
non-voluntary context switches, and the associated latency penalty, by
(conditionally) giving tasks or task groups an extended execution slice.
It can be used instead of tweaking sysctl_sched_min_granularity.

The second one is called from the wakeup preemption code and allows
redefining whether a newly woken task should preempt the currently
running task. This is useful for minimizing the number of preemptions of
latency-sensitive tasks. To some extent it is a more flexible analog of
sysctl_sched_wakeup_granularity.

The third one is similar, but it tweaks the wakeup_preempt_entity()
function, which is called not only from the wakeup context but also from
pick_next_task(), so it can influence the decision about which task runs
next.

It is open for discussion whether we need both of these hooks or only
one of them: the second is more powerful, but depends more on the
current implementation. In any case, BPF hooks are not an ABI, so this
is not a deal breaker.

The idea of the wakeup_preempt_entity hook belongs to Rik van Riel. He
also contributed a lot to the whole patchset by providing ideas,
recommendations and feedback on earlier (non-public) versions.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>

f9a09a81 · Roman Gushchin · Zheng Zengkai · 915c4dfc

Showing 2 changed files with 37 additions and 1 deletion

include/linux/sched_hook_defs.h +4 -1

kernel/sched/fair.c +33 -0

--- a/include/linux/sched_hook_defs.h
+++ b/include/linux/sched_hook_defs.h
 /* SPDX-License-Identifier: GPL-2.0 */
-BPF_SCHED_HOOK(int, 0, dummy, void)
+BPF_SCHED_HOOK(int, 0, cfs_check_preempt_tick, struct sched_entity *curr, unsigned long delta_exec)
+BPF_SCHED_HOOK(int, 0, cfs_check_preempt_wakeup, struct task_struct *curr, struct task_struct *p)
+BPF_SCHED_HOOK(int, 0, cfs_wakeup_preempt_entity, struct sched_entity *curr,
+	struct sched_entity *se)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -28,6 +28,7 @@
 #include <linux/delay.h>
 #include <linux/tracehook.h>
 #endif
+#include <linux/bpf_sched.h>
 /*
 * Targeted preemption latency for CPU-bound tasks:
@@ -4474,6 +4475,18 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	ideal_runtime = sched_slice(cfs_rq, curr);
 	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+#ifdef CONFIG_BPF_SCHED
+	if (bpf_sched_enabled()) {
+		int ret = bpf_sched_cfs_check_preempt_tick(curr, delta_exec);
+		if (ret < 0)
+			return;
+		else if (ret > 0)
+			resched_curr(rq_of(cfs_rq));
+	}
+#endif
 	if (delta_exec > ideal_runtime) {
 		resched_curr(rq_of(cfs_rq));
 		/*
@@ -7043,6 +7056,15 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
 {
 	s64 gran, vdiff = curr->vruntime - se->vruntime;
+#ifdef CONFIG_BPF_SCHED
+	if (bpf_sched_enabled()) {
+		int ret = bpf_sched_cfs_wakeup_preempt_entity(curr, se);
+		if (ret)
+			return ret;
+	}
+#endif
 	if (vdiff <= 0)
 		return -1;
@@ -7129,6 +7151,17 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 	    likely(!task_has_idle_policy(p)))
 		goto preempt;
+#ifdef CONFIG_BPF_SCHED
+	if (bpf_sched_enabled()) {
+		int ret = bpf_sched_cfs_check_preempt_wakeup(current, p);
+		if (ret < 0)
+			return;
+		else if (ret > 0)
+			goto preempt;
+	}
+#endif
 	/*
 	 * Batch and idle tasks do not preempt non-idle tasks (their preemption
 	 * is driven by the tick):