1. 12 12月, 2012 2 次提交
    • D
      mm, oom: change type of oom_score_adj to short · a9c58b90
      David Rientjes 提交于
      The maximum oom_score_adj is 1000 and the minimum oom_score_adj is -1000,
      so this range can be represented by the signed short type with no
      functional change.  The extra space this frees up in struct signal_struct
      will be used for per-thread oom kill flags in the next patch.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9c58b90
    • D
      mm, oom: allow exiting threads to have access to memory reserves · 9ff4868e
      David Rientjes 提交于
      Exiting threads, those with PF_EXITING set, can pagefault and require
      memory before they can make forward progress.  This happens, for instance,
      when a process must fault task->robust_list, a userspace structure, before
      detaching its memory.
      
      These threads also aren't guaranteed to get access to memory reserves
      unless oom killed or killed from userspace.  The oom killer won't grant
      memory reserves if other threads are also exiting other than current and
      stalling at the same point.  This prevents needlessly killing processes
      when others are already exiting.
      
      Instead of special casing all the possible situations between PF_EXITING
      getting set and a thread detaching its mm where it may allocate memory,
      which probably wouldn't get updated when a change is made to the exit
      path, the solution is to give all exiting threads access to memory
      reserves if they call the oom killer.  This allows them to quickly
      allocate, detach its mm, and free the memory it represents.
      
      Summary of Luigi's bug report:
      
      : He had an oom condition where threads were faulting on task->robust_list
      : and repeatedly called the oom killer but it would defer killing a thread
      : because it saw other PF_EXITING threads.  This can happen anytime we need
      : to allocate memory after setting PF_EXITING and before detaching our mm;
      : if there are other threads in the same state then the oom killer won't do
      : anything unless one of them happens to be killed from userspace.
      :
      : So instead of only deferring for PF_EXITING and !task->robust_list, it's
      : better to just give them access to memory reserves to prevent a potential
      : livelock so that any other faults that may be introduced in the future in
      : the exit path don't cause the same problem (and hopefully we don't allow
      : too many of those!).
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Tested-by: NLuigi Semenzato <semenzato@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9ff4868e
  2. 09 10月, 2012 1 次提交
  3. 01 8月, 2012 8 次提交
  4. 21 6月, 2012 2 次提交
  5. 09 6月, 2012 1 次提交
  6. 30 5月, 2012 1 次提交
  7. 03 5月, 2012 1 次提交
  8. 24 3月, 2012 1 次提交
  9. 22 3月, 2012 6 次提交
  10. 13 1月, 2012 2 次提交
  11. 11 1月, 2012 1 次提交
    • K
      tracepoint: add tracepoints for debugging oom_score_adj · 43d2b113
      KAMEZAWA Hiroyuki 提交于
      oom_score_adj is used for guarding processes from OOM-Killer.  One of
      problem is that it's inherited at fork().  When a daemon set oom_score_adj
      and make children, it's hard to know where the value is set.
      
      This patch adds some tracepoints useful for debugging. This patch adds
      3 trace points.
        - creating new task
        - renaming a task (exec)
        - set oom_score_adj
      
      To debug, users need to enable some trace pointer. Maybe filtering is useful as
      
      # EVENT=/sys/kernel/debug/tracing/events/task/
      # echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
      # echo "oom_score_adj != 0" > $EVENT/task_rename/filter
      # echo 1 > $EVENT/enable
      # EVENT=/sys/kernel/debug/tracing/events/oom/
      # echo 1 > $EVENT/enable
      
      output will be like this.
      # grep oom /sys/kernel/debug/tracing/trace
      bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
      bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
      ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
      bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
      grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43d2b113
  12. 21 12月, 2011 1 次提交
    • F
      oom: fix integer overflow of points in oom_badness · ff05b6f7
      Frantisek Hrbata 提交于
      An integer overflow will happen on 64bit archs if task's sum of rss,
      swapents and nr_ptes exceeds (2^31)/1000 value.  This was introduced by
      commit
      
      f755a042 oom: use pte pages in OOM score
      
      where the oom score computation was divided into several steps and it's no
      longer computed as one expression in unsigned long(rss, swapents, nr_pte
      are unsigned long), where the result value assigned to points(int) is in
      range(1..1000).  So there could be an int overflow while computing
      
      176          points *= 1000;
      
      and points may have negative value. Meaning the oom score for a mem hog task
      will be one.
      
      196          if (points <= 0)
      197                  return 1;
      
      For example:
      [ 3366]     0  3366 35390480 24303939   5       0             0 oom01
      Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child
      
      Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical
      memory, but it's oom score is one.
      
      In this situation the mem hog task is skipped and oom killer kills another and
      most probably innocent task with oom score greater than one.
      
      The points variable should be of type long instead of int to prevent the
      int overflow.
      Signed-off-by: NFrantisek Hrbata <fhrbata@redhat.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>		[2.6.36+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff05b6f7
  13. 22 11月, 2011 1 次提交
    • T
      freezer: rename thaw_process() to __thaw_task() and simplify the implementation · a5be2d0d
      Tejun Heo 提交于
      thaw_process() now has only internal users - system and cgroup
      freezers.  Remove the unnecessary return value, rename, unexport and
      collapse __thaw_process() into it.  This will help further updates to
      the freezer code.
      
      -v3: oom_kill grew a use of thaw_process() while this patch was
           pending.  Convert it to use __thaw_task() for now.  In the longer
           term, this should be handled by allowing tasks to die if killed
           even if it's frozen.
      
      -v2: minor style update as suggested by Matt.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      a5be2d0d
  14. 16 11月, 2011 1 次提交
  15. 01 11月, 2011 4 次提交
  16. 31 10月, 2011 1 次提交
  17. 02 8月, 2011 1 次提交
  18. 26 7月, 2011 1 次提交
  19. 23 6月, 2011 1 次提交
  20. 25 5月, 2011 1 次提交
    • D
      oom: replace PF_OOM_ORIGIN with toggling oom_score_adj · 72788c38
      David Rientjes 提交于
      There's a kernel-wide shortage of per-process flags, so it's always
      helpful to trim one when possible without incurring a significant penalty.
       It's even more important when you're planning on adding a per- process
      flag yourself, which I plan to do shortly for transparent hugepages.
      
      PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a
      tendency to allocate large amounts of memory and should be preferred for
      killing over other tasks.  We'd rather immediately kill the task making
      the errant syscall rather than penalizing an innocent task.
      
      This patch removes PF_OOM_ORIGIN since its behavior is equivalent to
      setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.
      
      The process's old oom_score_adj is stored and then set to
      OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN.  The old
      value is then reinstated when the process should no longer be considered a
      high priority for oom killing.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72788c38
  21. 29 4月, 2011 1 次提交
  22. 15 4月, 2011 1 次提交