Commit 3c90e6e9, authored by Srivatsa Vaddagiri, committed by Ingo Molnar

sched: fix copy_namespace() <-> sched_fork() dependency in do_fork

Sukadev Bhattiprolu reported a kernel crash with control groups.
There are a couple of problems discovered by Suka's test:

- The test requires the cgroup filesystem to be mounted with
  at least the cpu and ns options (i.e. both the namespace and cpu
  controllers are active in the same hierarchy).

	# mkdir /dev/cpuctl
	# mount -t cgroup -ocpu,ns none cpuctl
	(or simply)
	# mount -t cgroup none cpuctl -> Will activate all controllers
					 in same hierarchy.

- The test invokes clone() with CLONE_NEWNS set. This causes a new child
  to be created, along with a new group (do_fork->copy_namespaces->
  ns_cgroup_clone->cgroup_clone), and the child is attached to the new
  group (cgroup_clone->attach_task->sched_move_task). At this point, the
  child's scheduler-related fields are uninitialized (including its on_rq
  field, which it has inherited from the parent). As a result,
  sched_move_task thinks the child is on the runqueue when it isn't.

  As a solution to this problem, I moved the sched_fork() call, which
  initializes the scheduler-related fields of a new task, before
  copy_namespaces(). I am not sure, though, whether moving it up will
  cause other side effects. Do you see any issue?

- The second problem exposed by this test is that task_new_fair()
  assumes that parent and child will be part of the same group (which 
  needn't be as this test shows). As a result, cfs_rq->curr can be NULL
  for the child.

  The solution is to test for curr pointer being NULL in
  task_new_fair().
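
Both problems can be modelled in user space. The sketch below is only an
illustration under stated assumptions: the toy_* types and functions are
hypothetical stand-ins, not kernel code, and they keep only the fields the
description above mentions (on_rq, vruntime, cfs_rq->curr) plus a
hypothetical nr_running counter to make the mis-accounting visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the scheduler entity, runqueue and task. */
struct toy_entity {
	unsigned long long vruntime;
	int on_rq;			/* nonzero while queued on a runqueue */
};

struct toy_cfs_rq {
	struct toy_entity *curr;	/* NULL when nothing from this group runs */
	int nr_running;			/* illustrative queue-length counter */
};

struct toy_task {
	struct toy_entity se;
	struct toy_cfs_rq *cfs_rq;	/* group runqueue the task belongs to */
};

/* The child starts life as a byte copy of the parent, on_rq included. */
static void toy_dup_task(struct toy_task *child, const struct toy_task *parent)
{
	memcpy(child, parent, sizeof(*child));
}

/* Models the reset that sched_fork() is described as performing. */
static void toy_sched_fork(struct toy_task *p)
{
	p->se.on_rq = 0;
	p->se.vruntime = 0;
}

/* Models a group move: dequeue if queued, switch group, requeue if queued. */
static void toy_move_task(struct toy_task *p, struct toy_cfs_rq *dst)
{
	int was_queued = p->se.on_rq;

	if (was_queued)
		p->cfs_rq->nr_running--;	/* wrong if on_rq is stale */
	p->cfs_rq = dst;
	if (was_queued)
		dst->nr_running++;
}

/* Models the child-runs-first comparison with the NULL guard applied. */
static int toy_should_swap(const struct toy_cfs_rq *cfs_rq,
			   const struct toy_entity *se)
{
	const struct toy_entity *curr = cfs_rq->curr;

	return curr && curr->vruntime < se->vruntime;
}
```

In this model, calling toy_move_task() on a freshly copied child decrements
the parent group's counter although the child was never queued; calling
toy_sched_fork() first leaves both counters untouched. Likewise,
toy_should_swap() returns 0 for a group whose curr is NULL instead of
dereferencing it.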

With the patch below, I could run ns_exec() fine w/o a crash.
Reported-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Parent 502d26b5
@@ -1123,6 +1123,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
+	/* Perform scheduler related setup. Assign this task to a CPU. */
+	sched_fork(p, clone_flags);
+
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
 	if ((retval = audit_alloc(p)))
@@ -1212,9 +1215,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	INIT_LIST_HEAD(&p->ptrace_children);
 	INIT_LIST_HEAD(&p->ptrace_list);
 
-	/* Perform scheduler related setup. Assign this task to a CPU. */
-	sched_fork(p, clone_flags);
-
 	/* Now that the task is set up, run cgroup callbacks if
 	 * necessary. We need to run them before the task is visible
 	 * on the tasklist. */
......
@@ -1067,8 +1067,9 @@ static void task_new_fair(struct rq *rq, struct task_struct *p)
 	update_curr(cfs_rq);
 	place_entity(cfs_rq, se, 1);
 
+	/* 'curr' will be NULL if the child belongs to a different group */
 	if (sysctl_sched_child_runs_first && this_cpu == task_cpu(p) &&
-			curr->vruntime < se->vruntime) {
+			curr && curr->vruntime < se->vruntime) {
 		/*
 		 * Upon rescheduling, sched_class::put_prev_task() will place
 		 * 'current' within the tree based on its new key value.
......