1. 15 Mar 2011, 1 commit
  2. 27 Oct 2010, 2 commits
    • oom: kill all threads sharing oom killed task's mm · 1e99bad0
      Authored by David Rientjes
      It's necessary to kill all threads that share an oom killed task's mm if
      the goal is to lead to future memory freeing.
      
      This patch reintroduces the code removed in 8c5cd6f3 (oom: oom_kill
      doesn't kill vfork parent (or child)), since that change is now obsolete.
      
      It's now guaranteed that any task passed to oom_kill_task() does not share
      an mm with any thread that is unkillable.  Thus, we're safe to issue a
      SIGKILL to any thread sharing the same mm.
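
      The reintroduced loop amounts to the following sketch (a minimal
      sketch, assuming the era's kernel-internal helpers for_each_process(),
      same_thread_group() and force_sig(); the exact placement inside
      oom_kill_task() may differ):

          struct task_struct *q;

          /*
           * Kill every process that shares the victim p's mm but lives in
           * a different thread group; threads in p's own group already get
           * the SIGKILL via group signal delivery.  Kernel threads that
           * merely borrowed the mm (PF_KTHREAD) must be skipped.
           */
          for_each_process(q)
              if (q->mm == p->mm && !same_thread_group(q, p) &&
                  !(q->flags & PF_KTHREAD))
                  force_sig(SIGKILL, q);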
      
      This is especially necessary to solve an mm->mmap_sem livelock issue
      in which an oom killed thread must acquire the lock in the exit path
      while another thread holds it in the page allocator while trying to
      allocate memory itself (and would preempt the oom killer, since a task
      has already been killed).  Since tasks with pending fatal signals are
      now granted access to memory reserves, the thread holding the lock can
      quickly allocate and then release it so that the oom killed task may
      exit.
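
      The reserve access mentioned here is granted at the top of
      out_of_memory(); roughly (a sketch, assuming the era's TIF_MEMDIE
      mechanism that lets a dying task allocate below the watermarks):

          /*
           * A caller that already has a pending SIGKILL is marked with
           * TIF_MEMDIE and allowed to dip into the memory reserves so it
           * can allocate, drop mmap_sem, and exit quickly instead of
           * triggering another kill.
           */
          if (fatal_signal_pending(current)) {
              set_thread_flag(TIF_MEMDIE);
              return;
          }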
      
      This is mainly for threads that are cloned with CLONE_VM but not
      CLONE_THREAD, so they are in a different thread group.  Non-NPTL
      threads exist in the wild and this change is necessary to prevent the
      livelock in such cases.  We care more about preventing the livelock
      than about incurring the additional tasklist scan in the oom killer
      when a task has been killed.
      Systems that are sufficiently large to not want the tasklist scan in the
      oom killer in the first place already have the option of enabling
      /proc/sys/vm/oom_kill_allocating_task, which was designed specifically for
      that purpose.
      
      This code had existed in the oom killer for over eight years dating back
      to the 2.4 kernel.
      
      [akpm@linux-foundation.org: add nice comment]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • oom: avoid killing a task if a thread sharing its mm cannot be killed · e18641e1
      Authored by David Rientjes
      The oom killer's goal is to kill a memory-hogging task so that it may
      exit, free its memory, and allow the current context to allocate the
      memory that triggered it in the first place.  Thus, killing a task is
      pointless if other threads sharing its mm cannot be killed because of
      their /proc/pid/oom_adj or /proc/pid/oom_score_adj values.
      
      This patch checks whether any other thread sharing p->mm has an
      oom_score_adj of OOM_SCORE_ADJ_MIN.  If so, that thread cannot be
      killed, so oom_badness(p) returns 0 and p is treated as unkillable.
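
      A minimal sketch of the check, assuming it runs inside oom_badness()
      with the era's helpers (the real patch may structure the loop
      differently):

          struct task_struct *q;

          /*
           * If any other process sharing p's mm is immune to the oom
           * killer, killing p cannot free the mm, so report p as
           * unkillable.
           */
          for_each_process(q)
              if (q->mm == p->mm && !same_thread_group(q, p) &&
                  q->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
                  return 0;
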
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 23 Sep 2010, 2 commits
    • oom: filter unkillable tasks from tasklist dump · e85bfd3a
      Authored by David Rientjes
      /proc/sys/vm/oom_dump_tasks is enabled by default, so the information
      it emits should be limited as much as possible.
      
      The tasklist dump should be filtered to only those tasks that are eligible
      for oom kill.  This is already done for memcg ooms, but this patch extends
      it to both cpuset and mempolicy ooms as well as init.
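
      Condensed, the filtering reduces dump_tasks() to a single eligibility
      test per task; a sketch, assuming the oom_unkillable_task() helper
      from the same patch series and eliding the per-task printing:

          static void dump_tasks(const struct mem_cgroup *mem,
                                 const nodemask_t *nodemask)
          {
              struct task_struct *p;

              for_each_process(p) {
                  /* skip tasks the oom killer could never select */
                  if (oom_unkillable_task(p, mem, nodemask))
                      continue;
                  /* ... print pid, uid, total_vm, rss, score_adj, comm ... */
              }
          }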
      
      In addition to suppressing irrelevant information, this also reduces
      confusion: users currently cannot tell which tasks in the tasklist
      are ineligible for kill (such as those attached to cpusets or bound
      to mempolicies with a disjoint set of mems or nodes, respectively),
      because that information is not shown.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • oom: always return a badness score of non-zero for eligible tasks · f19e8aa1
      Authored by David Rientjes
      A task's badness score is roughly a proportion of its rss and swap
      compared to the system's capacity.  The scale ranges from 0 to 1000 with
      the highest score chosen for kill.  Thus, this scale operates on a
      resolution of 0.1% of RAM + swap.  Admin tasks are also given a 3% bonus,
      so the badness score of an admin task using 3% of memory, for example,
      would still be 0.
      
      It's possible that an exceptionally large number of tasks will combine to
      exhaust all resources but never have a single task that uses more than
      0.1% of RAM and swap (or 3.0% for admin tasks).
      
      This patch ensures that the badness score of any eligible task is never 0
      so the machine doesn't unnecessarily panic because it cannot find a task
      to kill.
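
      With points computed on the 0..1000 scale described above, the fix
      reduces to clamping the final result of oom_badness(); a sketch of
      the return path, assuming the proportional heuristic of this series:

          /*
           * points is roughly (rss + swap) * 1000 / totalpages, minus a
           * 30-point (3%) bonus for admin tasks, so a small task can
           * round down to 0.  Never return 0 for an eligible task.
           */
          if (points <= 0)
              return 1;
          return (points < 1000) ? points : 1000;
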
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 21 Aug 2010, 3 commits
  5. 11 Aug 2010, 1 commit
  6. 10 Aug 2010, 30 commits
  7. 28 May 2010, 1 commit