1. 01 8月, 2012 6 次提交
  2. 21 6月, 2012 2 次提交
  3. 09 6月, 2012 1 次提交
  4. 30 5月, 2012 1 次提交
  5. 03 5月, 2012 1 次提交
  6. 24 3月, 2012 1 次提交
  7. 22 3月, 2012 6 次提交
  8. 13 1月, 2012 2 次提交
  9. 11 1月, 2012 1 次提交
    • K
      tracepoint: add tracepoints for debugging oom_score_adj · 43d2b113
      KAMEZAWA Hiroyuki 提交于
      oom_score_adj is used for guarding processes from OOM-Killer.  One of
      problem is that it's inherited at fork().  When a daemon set oom_score_adj
      and make children, it's hard to know where the value is set.
      
      This patch adds some tracepoints useful for debugging. This patch adds
      3 trace points.
        - creating new task
        - renaming a task (exec)
        - set oom_score_adj
      
      To debug, users need to enable some trace pointer. Maybe filtering is useful as
      
      # EVENT=/sys/kernel/debug/tracing/events/task/
      # echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
      # echo "oom_score_adj != 0" > $EVENT/task_rename/filter
      # echo 1 > $EVENT/enable
      # EVENT=/sys/kernel/debug/tracing/events/oom/
      # echo 1 > $EVENT/enable
      
      output will be like this.
      # grep oom /sys/kernel/debug/tracing/trace
      bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
      bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
      ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
      bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
      grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43d2b113
  10. 21 12月, 2011 1 次提交
    • F
      oom: fix integer overflow of points in oom_badness · ff05b6f7
      Frantisek Hrbata 提交于
      An integer overflow will happen on 64bit archs if task's sum of rss,
      swapents and nr_ptes exceeds (2^31)/1000 value.  This was introduced by
      commit
      
      f755a042 oom: use pte pages in OOM score
      
      where the oom score computation was divided into several steps and it's no
      longer computed as one expression in unsigned long(rss, swapents, nr_pte
      are unsigned long), where the result value assigned to points(int) is in
      range(1..1000).  So there could be an int overflow while computing
      
      176          points *= 1000;
      
      and points may have negative value. Meaning the oom score for a mem hog task
      will be one.
      
      196          if (points <= 0)
      197                  return 1;
      
      For example:
      [ 3366]     0  3366 35390480 24303939   5       0             0 oom01
      Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child
      
      Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical
      memory, but it's oom score is one.
      
      In this situation the mem hog task is skipped and oom killer kills another and
      most probably innocent task with oom score greater than one.
      
      The points variable should be of type long instead of int to prevent the
      int overflow.
      Signed-off-by: NFrantisek Hrbata <fhrbata@redhat.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>		[2.6.36+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff05b6f7
  11. 22 11月, 2011 1 次提交
    • T
      freezer: rename thaw_process() to __thaw_task() and simplify the implementation · a5be2d0d
      Tejun Heo 提交于
      thaw_process() now has only internal users - system and cgroup
      freezers.  Remove the unnecessary return value, rename, unexport and
      collapse __thaw_process() into it.  This will help further updates to
      the freezer code.
      
      -v3: oom_kill grew a use of thaw_process() while this patch was
           pending.  Convert it to use __thaw_task() for now.  In the longer
           term, this should be handled by allowing tasks to die if killed
           even if it's frozen.
      
      -v2: minor style update as suggested by Matt.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      a5be2d0d
  12. 16 11月, 2011 1 次提交
  13. 01 11月, 2011 4 次提交
  14. 31 10月, 2011 1 次提交
  15. 02 8月, 2011 1 次提交
  16. 26 7月, 2011 1 次提交
  17. 23 6月, 2011 1 次提交
  18. 25 5月, 2011 1 次提交
    • D
      oom: replace PF_OOM_ORIGIN with toggling oom_score_adj · 72788c38
      David Rientjes 提交于
      There's a kernel-wide shortage of per-process flags, so it's always
      helpful to trim one when possible without incurring a significant penalty.
       It's even more important when you're planning on adding a per- process
      flag yourself, which I plan to do shortly for transparent hugepages.
      
      PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a
      tendency to allocate large amounts of memory and should be preferred for
      killing over other tasks.  We'd rather immediately kill the task making
      the errant syscall rather than penalizing an innocent task.
      
      This patch removes PF_OOM_ORIGIN since its behavior is equivalent to
      setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.
      
      The process's old oom_score_adj is stored and then set to
      OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN.  The old
      value is then reinstated when the process should no longer be considered a
      high priority for oom killing.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72788c38
  19. 29 4月, 2011 1 次提交
  20. 15 4月, 2011 1 次提交
  21. 25 3月, 2011 1 次提交
  22. 24 3月, 2011 1 次提交
  23. 23 3月, 2011 3 次提交
    • D
      oom: suppress nodes that are not allowed from meminfo on oom kill · ddd588b5
      David Rientjes 提交于
      The oom killer is extremely verbose for machines with a large number of
      cpus and/or nodes.  This verbosity can often be harmful if it causes other
      important messages to be scrolled from the kernel log and incurs a
      signicant time delay, specifically for kernels with CONFIG_NODES_SHIFT >
      8.
      
      This patch causes only memory information to be displayed for nodes that
      are allowed by current's cpuset when dumping the VM state.  Information
      for all other nodes is irrelevant to the oom condition; we don't care if
      there's an abundance of memory elsewhere if we can't access it.
      
      This only affects the behavior of dumping memory information when an oom
      is triggered.  Other dumps, such as for sysrq+m, still display the
      unfiltered form when using the existing show_mem() interface.
      
      Additionally, the per-cpu pageset statistics are extremely verbose in oom
      killer output, so it is now suppressed.  This removes
      
      	nodes_weight(current->mems_allowed) * (1 + nr_cpus)
      
      lines from the oom killer output.
      
      Callers may use __show_mem(SHOW_MEM_FILTER_NODES) to filter disallowed
      nodes.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ddd588b5
    • D
      oom: avoid deferring oom killer if exiting task is being traced · edd45544
      David Rientjes 提交于
      The oom killer naturally defers killing anything if it finds an eligible
      task that is already exiting and has yet to detach its ->mm.  This avoids
      unnecessarily killing tasks when one is already in the exit path and may
      free enough memory that the oom killer is no longer needed.  This is
      detected by PF_EXITING since threads that have already detached its ->mm
      are no longer considered at all.
      
      The problem with always deferring when a thread is PF_EXITING, however, is
      that it may never actually exit when being traced, specifically if another
      task is tracing it with PTRACE_O_TRACEEXIT.  The oom killer does not want
      to defer in this case since there is no guarantee that thread will ever
      exit without intervention.
      
      This patch will now only defer the oom killer when a thread is PF_EXITING
      and no ptracer has stopped its progress in the exit path.  It also ensures
      that a child is sacrificed for the chosen parent only if it has a
      different ->mm as the comment implies: this ensures that the thread group
      leader is always targeted appropriately.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: <stable@kernel.org>		[2.6.38.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      edd45544
    • A
      oom: skip zombies when iterating tasklist · 30e2b41f
      Andrey Vagin 提交于
      We shouldn't defer oom killing if a thread has already detached its ->mm
      and still has TIF_MEMDIE set.  Memory needs to be freed, so find kill
      other threads that pin the same ->mm or find another task to kill.
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.38.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30e2b41f