1. 23 Mar, 2011 2 commits
    • oom: skip zombies when iterating tasklist · 30e2b41f
      Andrey Vagin committed
      We shouldn't defer oom killing if a thread has already detached its ->mm
      and still has TIF_MEMDIE set.  Memory needs to be freed, so kill other
      threads that pin the same ->mm or find another task to kill.
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.38.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
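The rule in the commit above can be sketched as a per-task classification step during the tasklist scan. This is a minimal illustration, not the kernel's code: `toy_classify`, the `toy_scan` enum, and the two boolean parameters are hypothetical stand-ins for checking `->mm` and the TIF_MEMDIE flag.

```c
#include <stdbool.h>

/* Hypothetical outcomes of looking at one task during the scan. */
enum toy_scan {
    TOY_SCAN_SKIP,     /* ignore this task and keep scanning */
    TOY_SCAN_DEFER,    /* a victim is still exiting: do not kill anything yet */
    TOY_SCAN_CONSIDER  /* ordinary candidate: score it as usual */
};

/*
 * has_mm models "p->mm != NULL"; memdie models "TIF_MEMDIE is set".
 * The mm check comes first, so a zombie that still carries the flag
 * no longer causes the oom killer to defer.
 */
static enum toy_scan toy_classify(bool has_mm, bool memdie)
{
    if (!has_mm)
        return TOY_SCAN_SKIP;
    if (memdie)
        return TOY_SCAN_DEFER;
    return TOY_SCAN_CONSIDER;
}
```

The ordering of the two checks is the point of the sketch: testing the flag first would defer on a task that has no memory left to free.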
    • oom: prevent unnecessary oom kills or kernel panics · 3a5dda7a
      David Rientjes committed
      This patch prevents unnecessary oom kills or kernel panics by reverting
      two commits:
      
      	495789a5 (oom: make oom_score to per-process value)
      	cef1d352 (oom: multi threaded process coredump don't make deadlock)
      
      First, 495789a5 (oom: make oom_score to per-process value) ignores the
      fact that all threads in a thread group do not necessarily exit at the
      same time.
      
      It is imperative that select_bad_process() detect threads that are in the
      exit path, specifically those with PF_EXITING set, to prevent needlessly
      killing additional tasks.  If a process is oom killed and the thread group
      leader exits, select_bad_process() cannot detect the other threads that
      are PF_EXITING by iterating over only processes.  Thus, it currently
      chooses another task unnecessarily for oom kill or panics the machine when
      nothing else is eligible.
      
      By iterating over threads instead, it is possible to detect threads that
      are exiting and nominate them for oom kill so they get access to memory
      reserves.
      
      Second, cef1d352 (oom: multi threaded process coredump don't make
      deadlock) erroneously avoids making the oom killer a no-op when an
      eligible thread other than current is found to be exiting.  We want to
      detect this situation so that we may allow that exiting thread time to
      exit and free its memory; if it is able to exit on its own, that should
      free memory so current is no longer oom.  If it is not able to exit on
      its own, the oom killer will nominate it for oom kill which, in this
      case, only means it will get access to memory reserves.
      
      Without this change, it is easy for the oom killer to unnecessarily target
      tasks when all threads of a victim don't exit before the thread group
      leader or, in the worst case, panic the machine.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: <stable@kernel.org>		[2.6.38.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
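One plausible reading of the selection behavior described in the commit above, sketched over a flat array of threads. Everything here is an assumption made for illustration: `toy_thread`, `toy_select`, the `TOY_DEFER`/`TOY_NONE` sentinels, and the use of 1000 as the maximum score are hypothetical and do not reproduce select_bad_process().

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NONE  (-1)  /* nothing eligible was found */
#define TOY_DEFER (-2)  /* another thread is exiting: make the oom killer a no-op */

struct toy_thread {
    bool has_mm;   /* false once the thread has detached its mm */
    bool exiting;  /* stand-in for PF_EXITING */
    long badness;  /* precomputed score, higher means a better victim */
};

/* Scan every thread (not only group leaders) and return the chosen index. */
static int toy_select(const struct toy_thread *threads, size_t n, size_t current)
{
    int chosen = TOY_NONE;
    long best = 0;

    for (size_t i = 0; i < n; i++) {
        const struct toy_thread *t = &threads[i];

        if (!t->has_mm)
            continue;
        if (t->exiting) {
            /* Give an exiting thread other than current time to free memory. */
            if (i != current)
                return TOY_DEFER;
            /* current itself is exiting: nominate it so it can use reserves. */
            chosen = (int)i;
            best = 1000;
            continue;
        }
        if (t->badness > best) {
            chosen = (int)i;
            best = t->badness;
        }
    }
    return chosen;
}
```

Because the loop visits threads rather than processes, an exiting non-leader thread is still seen after its group leader has gone, which is the case the commit message says was being missed.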
  2. 15 Mar, 2011 2 commits
  3. 27 Oct, 2010 2 commits
    • oom: kill all threads sharing oom killed task's mm · 1e99bad0
      David Rientjes committed
      It's necessary to kill all threads that share an oom killed task's mm if
      the goal is to lead to future memory freeing.
      
      This patch reintroduces the code removed in 8c5cd6f3 (oom: oom_kill
      doesn't kill vfork parent (or child)) since that change is now obsolete.
      
      It's now guaranteed that any task passed to oom_kill_task() does not share
      an mm with any thread that is unkillable.  Thus, we're safe to issue a
      SIGKILL to any thread sharing the same mm.
      
      This is especially necessary to solve an mm->mmap_sem livelock issue
      where an oom killed thread must acquire the lock in the exit path while
      another thread is holding it in the page allocator while trying to
      allocate memory itself (and will preempt the oom killer since a task was
      already killed).  Since tasks with pending fatal signals are now granted
      access to memory reserves, the thread holding the lock may quickly
      allocate and release the lock so that the oom killed task may exit.
      
      This is mainly for threads that are cloned with CLONE_VM but not
      CLONE_THREAD, so they are in a different thread group.  Non-NPTL threads
      exist in the wild and this change is necessary to prevent the livelock in
      such cases.  We care more about preventing the livelock than about
      incurring the additional tasklist scan in the oom killer when a task has
      been killed.  Systems that are sufficiently large to not want the
      tasklist scan in the oom killer in the first place already have the
      option of enabling /proc/sys/vm/oom_kill_allocating_task, which was
      designed specifically for that purpose.
      
      This code had existed in the oom killer for over eight years dating back
      to the 2.4 kernel.
      
      [akpm@linux-foundation.org: add nice comment]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
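The idea in the commit above, that every task sharing the victim's mm receives the kill and not only the victim's own thread group, can be sketched with a toy model. The `toy_mm_user` struct, the integer `mm_id` standing in for the `->mm` pointer, and `toy_kill_mm_sharers` are hypothetical names for this illustration; setting a flag stands in for sending SIGKILL.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_mm_user {
    int  pid;
    int  mm_id;   /* stand-in for the ->mm pointer */
    bool killed;  /* stand-in for "SIGKILL has been sent" */
};

/*
 * Mark every task that uses victim_mm_id as killed, regardless of which
 * thread group it belongs to, and return how many were newly marked.
 */
static size_t toy_kill_mm_sharers(struct toy_mm_user *tasks, size_t n,
                                  int victim_mm_id)
{
    size_t killed = 0;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].mm_id != victim_mm_id || tasks[i].killed)
            continue;
        tasks[i].killed = true;
        killed++;
    }
    return killed;
}
```

The full pass over the task array corresponds to the extra tasklist scan that the commit message accepts as the cost of avoiding the livelock.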
    • oom: avoid killing a task if a thread sharing its mm cannot be killed · e18641e1
      David Rientjes committed
      The oom killer's goal is to kill a memory-hogging task so that it may
      exit, free its memory, and allow the current context to allocate the
      memory that triggered it in the first place.  Thus, killing a task is
      pointless if other threads sharing its mm cannot be killed because of
      their /proc/pid/oom_adj or /proc/pid/oom_score_adj value.
      
      This patch checks whether any other thread sharing p->mm has an
      oom_score_adj of OOM_SCORE_ADJ_MIN.  If so, the thread cannot be killed
      and oom_badness(p) returns 0, meaning it's unkillable.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
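A minimal sketch of the check described in the commit above, assuming a flat array of tasks and an integer `mm_id` in place of the `->mm` pointer. `toy_task`, `toy_shares_mm_with_unkillable`, and `TOY_OOM_SCORE_ADJ_MIN` are hypothetical names; the value -1000 mirrors the kernel's OOM_SCORE_ADJ_MIN.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_OOM_SCORE_ADJ_MIN (-1000)  /* mirrors OOM_SCORE_ADJ_MIN */

struct toy_task {
    int mm_id;          /* stand-in for the ->mm pointer */
    int oom_score_adj;  /* per-task adjustment, TOY_OOM_SCORE_ADJ_MIN = never kill */
};

/*
 * Return true if some task other than p shares p's mm and is marked
 * unkillable. In that case killing p cannot free the shared memory,
 * so a caller would treat p itself as unkillable (badness 0).
 */
static bool toy_shares_mm_with_unkillable(const struct toy_task *tasks,
                                          size_t n,
                                          const struct toy_task *p)
{
    for (size_t i = 0; i < n; i++) {
        const struct toy_task *q = &tasks[i];

        if (q == p || q->mm_id != p->mm_id)
            continue;
        if (q->oom_score_adj == TOY_OOM_SCORE_ADJ_MIN)
            return true;
    }
    return false;
}
```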
  4. 23 Sep, 2010 2 commits
    • oom: filter unkillable tasks from tasklist dump · e85bfd3a
      David Rientjes committed
      /proc/sys/vm/oom_dump_tasks is enabled by default, so it's necessary to
      limit the information it emits as much as possible.
      
      The tasklist dump should be filtered to only those tasks that are eligible
      for oom kill.  This is already done for memcg ooms, but this patch extends
      it to both cpuset and mempolicy ooms as well as init.
      
      In addition to suppressing irrelevant information, this also reduces
      confusion since users currently don't know which tasks in the tasklist
      aren't eligible for kill (such as those attached to cpusets or bound to
      mempolicies with a disjoint set of mems or nodes, respectively) since that
      information is not shown.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
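The filtering described in the commit above can be illustrated with a small predicate. This is a sketch under simplifying assumptions: memory nodes are modeled as bits in an `unsigned long`, pid 1 stands for init, and `toy_dump_task` and `toy_dump_eligible` are hypothetical names that do not appear in the kernel.

```c
#include <stdbool.h>

struct toy_dump_task {
    int           pid;
    unsigned long allowed_nodes;  /* bitmask of nodes the task may allocate from */
};

/*
 * Decide whether a task belongs in the dump for an oom constrained to
 * oom_nodes. init is never listed, and a task whose allowed nodes are
 * disjoint from the oom's nodes could not free useful memory.
 */
static bool toy_dump_eligible(const struct toy_dump_task *t,
                              unsigned long oom_nodes)
{
    if (t->pid == 1)
        return false;
    return (t->allowed_nodes & oom_nodes) != 0;
}
```

For example, with an oom constrained to node 0 (mask `0x1`), a task bound only to node 1 (mask `0x2`) would be left out of the dump.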
    • oom: always return a badness score of non-zero for eligible tasks · f19e8aa1
      David Rientjes committed
      A task's badness score is roughly a proportion of its rss and swap
      compared to the system's capacity.  The scale ranges from 0 to 1000 with
      the highest score chosen for kill.  Thus, this scale operates on a
      resolution of 0.1% of RAM + swap.  Admin tasks are also given a 3% bonus,
      so the badness score of an admin task using 3% of memory, for example,
      would still be 0.
      
      It's possible that an exceptionally large number of tasks will combine to
      exhaust all resources but never have a single task that uses more than
      0.1% of RAM and swap (or 3.0% for admin tasks).
      
      This patch ensures that the badness score of any eligible task is never 0
      so the machine doesn't unnecessarily panic because it cannot find a task
      to kill.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
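The arithmetic in the commit above can be worked through with a simplified model of the score. It is a sketch, not the kernel's oom_badness(): `toy_badness` and its parameters are hypothetical, memory use is passed in as a page count, and the 3% admin bonus is modeled as subtracting 30 points on the 0 to 1000 scale.

```c
#include <stdbool.h>

/*
 * Simplified badness model: 0 means "not eligible", and any eligible
 * task scores between 1 and 1000 inclusive.
 */
static long toy_badness(unsigned long used_pages, unsigned long total_pages,
                        bool is_admin, bool eligible)
{
    long points;

    if (!eligible || total_pages == 0)
        return 0;

    /* Share of RAM + swap in use, where one point is 0.1%. */
    points = (long)(used_pages * 1000 / total_pages);

    if (is_admin)
        points -= 30;  /* the 3% bonus for admin tasks */

    /* An eligible task must never look unkillable. */
    if (points <= 0)
        return 1;
    return points < 1000 ? points : 1000;
}
```

With this model an admin task using exactly 3% of memory computes 30 - 30 = 0 points and is reported as 1, so a scan over many small tasks can always find a victim.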
  5. 21 Aug, 2010 3 commits
  6. 11 Aug, 2010 1 commit
  7. 10 Aug, 2010 28 commits