alinux: mm, memcg: fix soft lockup in priority oom

Assume a memory cgroup tree as follows:

          A (use_priority_oom=1, limit=2.5G)
         / \
        /   C (priority=3, usage=1.5G)
       B (priority=0, usage=1G)

When a task in C (task-c) invokes the oom-killer, the task in B (task-b)
is chosen and killed, and task-c then returns from mem_cgroup_oom and
retries in try_charge. If the memory page_counter of B has not been reset
yet, task-c invokes the oom-killer again, and a soft lockup may occur.

In this situation, task-c keeps trying to select a bad process in B, while
task-b, the only task in B, already has the PF_EXITING flag set, which
causes css_task_iter_advance to skip it. As a result, task-c finds no bad
process in B and keeps retrying, while task-b is stalled in
synchronize_rcu during do_exit, specifically in exit_task_namespaces.

In a nutshell, the new behavior of css_task_iter_advance, i.e., commit
c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS
iterations"), causes priority oom to misbehave.

This fixes the soft lockup by accounting num_oom_skip of the victim memcg
and its parents (walking up to oc->memcg) if no bad process is chosen
from it.

Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

fb4c5ea6 · Xu Yu · Shile Zhang · f1046eaf · fb4c5ea6

Showing 1 changed file with 17 additions and 10 deletions

mm/memcontrol.c +17 -10

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1078,10 +1078,10 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
 						dead_memcg);
 }
-/* memcg priority */
+/* memcg oom priority */
 /*
- * mem_cgroup_account_oom_skip - account the OOM-unkillable task
+ * do_mem_cgroup_account_oom_skip - account the memcg with OOM-unkillable task
- * @task: non OOM-killable task
+ * @memcg: mem_cgroup struct with OOM-unkillable task
 * @oc: oom_control struct
 *
 * Account OOM-unkillable task to its cgroup and up to the OOMing cgroup's
@@ -1093,21 +1093,20 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
 * tasks might become killable.
 *
 */
-void mem_cgroup_account_oom_skip(struct task_struct *task,
+static void do_mem_cgroup_account_oom_skip(struct mem_cgroup *memcg,
-		struct oom_control *oc)
+					   struct oom_control *oc)
 {
-	struct mem_cgroup *root, *memcg;
+	struct mem_cgroup *root;
 	struct cgroup_subsys_state *css;
 	if (!oc->use_priority_oom)
 		return;
+	if (unlikely(!memcg))
+		return;
 	root = oc->memcg;
 	if (!root)
 		root = root_mem_cgroup;
-	memcg = mem_cgroup_from_task(task);
-	if (unlikely(!memcg))
-		return;
 	css = &memcg->css;
 	while (css) {
 		struct mem_cgroup *tmp;
@@ -1132,6 +1131,12 @@ void mem_cgroup_account_oom_skip(struct task_struct *task,
 	}
 }
+void mem_cgroup_account_oom_skip(struct task_struct *task,
+		struct oom_control *oc)
+{
+	do_mem_cgroup_account_oom_skip(mem_cgroup_from_task(task), oc);
+}
 static struct mem_cgroup *
 mem_cgroup_select_victim_cgroup(struct mem_cgroup *memcg)
 {
@@ -1261,8 +1266,10 @@ void mem_cgroup_select_bad_process(struct oom_control *oc)
 	mem_cgroup_scan_tasks(victim, oom_evaluate_task, oc);
 	if (oc->use_priority_oom) {
 		css_put(&victim->css);
-		if (!oc->chosen && victim != memcg)
+		if (!oc->chosen && victim != memcg) {
+			do_mem_cgroup_account_oom_skip(victim, oc);
 			goto retry;
+		}
 	}
 out:
 	/* See commets in mem_cgroup_account_oom_skip() */
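
The retry loop patched above can be modelled in user space to show why the extra accounting matters. The following is only a minimal sketch under stated assumptions, not the kernel implementation: `struct model_memcg`, `struct model_result`, `account_oom_skip`, `select_victim` and `run_model` are hypothetical names, and the selection rule (a child is skipped once its `num_oom_skip` reaches its task count) is an assumption made for illustration. It replays the A/B/C tree from the commit message, with task-b already exiting and therefore invisible to the task scan.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct mem_cgroup. */
struct model_memcg {
	const char *name;
	struct model_memcg *parent;
	int priority;      /* lower value is picked as victim first */
	int nr_tasks;      /* tasks attached to this cgroup */
	int nr_killable;   /* tasks a scan can still return (not exiting) */
	int num_oom_skip;  /* tasks accounted as not OOM-killable */
};

struct model_result {
	const char *chosen_from;  /* NULL when no bad process was found */
	int rounds;               /* selection rounds performed */
};

/* Bump num_oom_skip on memcg and each ancestor up to root, inclusive. */
static void account_oom_skip(struct model_memcg *memcg,
			     struct model_memcg *root)
{
	assert(root != NULL);
	for (; memcg; memcg = memcg->parent) {
		memcg->num_oom_skip++;
		if (memcg == root)
			break;
	}
}

/*
 * Assumed selection rule: skip children whose tasks are all accounted,
 * pick the lowest priority among the rest, fall back to root.
 */
static struct model_memcg *select_victim(struct model_memcg *root,
					 struct model_memcg **children,
					 size_t n)
{
	struct model_memcg *victim = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_memcg *c = children[i];

		if (c->num_oom_skip >= c->nr_tasks)
			continue;
		if (!victim || c->priority < victim->priority)
			victim = c;
	}
	return victim ? victim : root;
}

/* Replay the A/B/C scenario; account != 0 models the patched loop. */
static struct model_result run_model(int account, int max_rounds)
{
	struct model_memcg a = { "A", NULL, 0, 0, 0, 0 };
	struct model_memcg b = { "B", &a, 0, 1, 0, 0 };  /* task-b is exiting */
	struct model_memcg c = { "C", &a, 3, 1, 1, 0 };
	struct model_memcg *children[] = { &b, &c };
	struct model_result res = { NULL, 0 };

	while (res.rounds < max_rounds) {
		struct model_memcg *victim = select_victim(&a, children, 2);

		res.rounds++;
		if (victim->nr_killable > 0) {
			res.chosen_from = victim->name;
			break;
		}
		if (victim == &a)
			break;  /* nothing left below the root */
		if (account)
			account_oom_skip(victim, &a);
		/* falls through to the next round, like "goto retry" */
	}
	return res;
}
```

Under these assumptions, `run_model(1, 1000)` finishes in two rounds with a task chosen from C, because B is skipped once its only task has been accounted. `run_model(0, 1000)` spends all 1000 rounds reselecting B without choosing anything, which mirrors the unbounded retry described in the commit message.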