Commit fb4c5ea6, author: Xu Yu, committer: Shile Zhang

alinux: mm, memcg: fix soft lockup in priority oom

Assume a memory cgroup tree as follows:

        A (use_priority_oom=1, limit=2.5G)
       / \
      /   C (priority=3, usage=1.5G)
     B (priority=0, usage=1G)

When a task in C (task-c) invokes the oom-killer, the task in B (task-b)
is chosen and killed. task-c then returns from mem_cgroup_oom and retries
in try_charge.

If the memory page_counter of B has not been reset yet, task-c invokes
the oom-killer again, and a soft lockup may happen. In this situation,
task-c keeps looking for a bad process in B, but task-b, the only task in
B, already has the PF_EXITING flag set, so it is skipped in
css_task_iter_advance.

As a result, task-c finds no bad process in B and keeps retrying, while
task-b is stalled in synchronize_rcu during do_exit, specifically in
exit_task_namespaces.
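
The stuck retry can be illustrated with a small user-space model. This is
only a sketch: toy_task, toy_memcg and both functions are hypothetical
names invented for illustration, not kernel structures or APIs, and the
loop is bounded by max_retries so that the demonstration terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT kernel structures. */
struct toy_task {
	bool exiting;			/* stands in for PF_EXITING */
};

struct toy_memcg {
	struct toy_task *tasks;
	size_t nr_tasks;
};

/* Models a task scan that skips exiting tasks. */
static struct toy_task *toy_select_bad_process(struct toy_memcg *cg)
{
	for (size_t i = 0; i < cg->nr_tasks; i++)
		if (!cg->tasks[i].exiting)
			return &cg->tasks[i];
	return NULL;
}

/*
 * Models a retry loop that picks the same victim cgroup every time.
 * Returns the number of fruitless retries; a result equal to
 * max_retries means no task was ever chosen.
 */
static int toy_retries_until_victim(struct toy_memcg *victim, int max_retries)
{
	int retries = 0;

	assert(victim != NULL);
	while (retries < max_retries) {
		if (toy_select_bad_process(victim))
			break;
		retries++;		/* "goto retry" with nothing changed */
	}
	return retries;
}
```

With a cgroup whose only task is exiting, toy_retries_until_victim()
always runs to max_retries, which mirrors the unbounded retry described
above.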

In a nutshell, the new behavior of css_task_iter_advance introduced by
commit c03cd7738a83 ("cgroup: Include dying leaders with live threads in
PROCS iterations") causes priority oom to misbehave.

Fix the soft lockup by accounting num_oom_skip for the victim memcg and
its ancestors (walking up to oc->memcg) when no bad process is chosen
from the victim.

Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
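
The accounting step can be sketched in user space as follows. This is a
simplified model under stated assumptions, not the kernel code: toy_memcg
and both functions are hypothetical names, the walk is assumed to include
the OOMing root, and the selection rule (lowest priority value first,
skipping any cgroup with a non-zero num_oom_skip) is inferred from the
example tree rather than taken from this diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a memcg hierarchy; NOT struct mem_cgroup. */
struct toy_memcg {
	struct toy_memcg *parent;
	int priority;		/* lower value is picked first (assumed) */
	int num_oom_skip;	/* times no killable task was found here */
};

/* Walk from the victim up to the OOMing root, bumping the skip count. */
static void toy_account_oom_skip(struct toy_memcg *victim,
				 struct toy_memcg *root)
{
	assert(root != NULL);
	for (struct toy_memcg *cg = victim; cg; cg = cg->parent) {
		cg->num_oom_skip++;
		if (cg == root)
			break;
	}
}

/* Assumed rule: lowest priority value among children not marked skipped. */
static struct toy_memcg *toy_select_victim(struct toy_memcg **children,
					   size_t n)
{
	struct toy_memcg *best = NULL;

	for (size_t i = 0; i < n; i++) {
		struct toy_memcg *cg = children[i];

		if (cg->num_oom_skip > 0)
			continue;
		if (!best || cg->priority < best->priority)
			best = cg;
	}
	return best;
}
```

In this model, once B has been accounted as skipped, the next selection
under A moves on to C instead of returning B again.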
Parent f1046eaf
@@ -1078,10 +1078,10 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
 						dead_memcg);
 }
 
-/* memcg priority */
+/* memcg oom priority */
 /*
- * mem_cgroup_account_oom_skip - account the OOM-unkillable task
- * @task: non OOM-killable task
+ * do_mem_cgroup_account_oom_skip - account the memcg with OOM-unkillable task
+ * @memcg: mem_cgroup struct with OOM-unkillable task
  * @oc: oom_control struct
  *
  * Account OOM-unkillable task to its cgroup and up to the OOMing cgroup's
@@ -1093,21 +1093,20 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
  * tasks might become killable.
  *
  */
-void mem_cgroup_account_oom_skip(struct task_struct *task,
-				 struct oom_control *oc)
+static void do_mem_cgroup_account_oom_skip(struct mem_cgroup *memcg,
+					    struct oom_control *oc)
 {
-	struct mem_cgroup *root, *memcg;
+	struct mem_cgroup *root;
 	struct cgroup_subsys_state *css;
 	if (!oc->use_priority_oom)
 		return;
+	if (unlikely(!memcg))
+		return;
 	root = oc->memcg;
 	if (!root)
 		root = root_mem_cgroup;
-	memcg = mem_cgroup_from_task(task);
-	if (unlikely(!memcg))
-		return;
 	css = &memcg->css;
 	while (css) {
 		struct mem_cgroup *tmp;
@@ -1132,6 +1131,12 @@ void mem_cgroup_account_oom_skip(struct task_struct *task,
 	}
 }
 
+void mem_cgroup_account_oom_skip(struct task_struct *task,
+				 struct oom_control *oc)
+{
+	do_mem_cgroup_account_oom_skip(mem_cgroup_from_task(task), oc);
+}
+
 static struct mem_cgroup *
 mem_cgroup_select_victim_cgroup(struct mem_cgroup *memcg)
 {
@@ -1261,8 +1266,10 @@ void mem_cgroup_select_bad_process(struct oom_control *oc)
 	mem_cgroup_scan_tasks(victim, oom_evaluate_task, oc);
 	if (oc->use_priority_oom) {
 		css_put(&victim->css);
-		if (!oc->chosen && victim != memcg)
+		if (!oc->chosen && victim != memcg) {
+			do_mem_cgroup_account_oom_skip(victim, oc);
 			goto retry;
+		}
 	}
 out:
 	/* See commets in mem_cgroup_account_oom_skip() */