1. 26 3月, 2016 6 次提交
    • V
      oom: make oom_reaper_list single linked · 29c696e1
      Vladimir Davydov 提交于
      Entries are only added/removed from oom_reaper_list at head so we can
      use a single linked list and hence save a word in task_struct.
      Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29c696e1
    • M
      oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task · 855b0183
      Michal Hocko 提交于
      Tetsuo has reported that oom_kill_allocating_task=1 will cause
      oom_reaper_list corruption because oom_kill_process doesn't follow
      standard OOM exclusion (aka ignores TIF_MEMDIE) and allows to enqueue
      the same task multiple times - e.g.  by sacrificing the same child
      multiple times.
      
      This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag
      which is set in oom_kill_process atomically and oom reaper is disabled
      if the flag was already set.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      855b0183
    • M
      mm, oom_reaper: implement OOM victims queuing · 03049269
      Michal Hocko 提交于
      wake_oom_reaper has allowed only 1 oom victim to be queued.  The main
      reason for that was the simplicity as other solutions would require some
      way of queuing.  The current approach is racy and that was deemed
      sufficient as the oom_reaper is considered a best effort approach to
      help with oom handling when the OOM victim cannot terminate in a
      reasonable time.  The race could lead to missing an oom victim which can
      get stuck
      
      out_of_memory
        wake_oom_reaper
          cmpxchg // OK
          			oom_reaper
      			  oom_reap_task
      			    __oom_reap_task
      oom_victim terminates
      			      atomic_inc_not_zero // fail
      out_of_memory
        wake_oom_reaper
          cmpxchg // fails
      			  task_to_reap = NULL
      
      This race requires 2 OOM invocations in a short time period which is not
      very likely but certainly not impossible.  E.g.  the original victim
      might have not released a lot of memory for some reason.
      
      The situation would improve considerably if wake_oom_reaper used a more
      robust queuing.  This is what this patch implements.  This means adding
      oom_reaper_list list_head into task_struct (eat a hole before embeded
      thread_struct for that purpose) and a oom_reaper_lock spinlock for
      queuing synchronization.  wake_oom_reaper will then add the task on the
      queue and oom_reaper will dequeue it.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      03049269
    • M
      mm, oom_reaper: report success/failure · bc448e89
      Michal Hocko 提交于
      Inform about the successful/failed oom_reaper attempts and dump all the
      held locks to tell us more who is blocking the progress.
      
      [akpm@linux-foundation.org: fix CONFIG_MMU=n build]
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc448e89
    • M
      oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space · 36324a99
      Michal Hocko 提交于
      When oom_reaper manages to unmap all the eligible vmas there shouldn't
      be much of the freable memory held by the oom victim left anymore so it
      makes sense to clear the TIF_MEMDIE flag for the victim and allow the
      OOM killer to select another task.
      
      The lack of TIF_MEMDIE also means that the victim cannot access memory
      reserves anymore but that shouldn't be a problem because it would get
      the access again if it needs to allocate and hits the OOM killer again
      due to the fatal_signal_pending resp.  PF_EXITING check.  We can safely
      hide the task from the OOM killer because it is clearly not a good
      candidate anymore as everyhing reclaimable has been torn down already.
      
      This patch will allow to cap the time an OOM victim can keep TIF_MEMDIE
      and thus hold off further global OOM killer actions granted the oom
      reaper is able to take mmap_sem for the associated mm struct.  This is
      not guaranteed now but further steps should make sure that mmap_sem for
      write should be blocked killable which will help to reduce such a lock
      contention.  This is not done by this patch.
      
      Note that exit_oom_victim might be called on a remote task from
      __oom_reap_task now so we have to check and clear the flag atomically
      otherwise we might race and underflow oom_victims or wake up waiters too
      early.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Suggested-by: NJohannes Weiner <hannes@cmpxchg.org>
      Suggested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      36324a99
    • M
      mm, oom: introduce oom reaper · aac45363
      Michal Hocko 提交于
      This patch (of 5):
      
      This is based on the idea from Mel Gorman discussed during LSFMM 2015
      and independently brought up by Oleg Nesterov.
      
      The OOM killer currently allows to kill only a single task in a good
      hope that the task will terminate in a reasonable time and frees up its
      memory.  Such a task (oom victim) will get an access to memory reserves
      via mark_oom_victim to allow a forward progress should there be a need
      for additional memory during exit path.
      
      It has been shown (e.g.  by Tetsuo Handa) that it is not that hard to
      construct workloads which break the core assumption mentioned above and
      the OOM victim might take unbounded amount of time to exit because it
      might be blocked in the uninterruptible state waiting for an event (e.g.
      lock) which is blocked by another task looping in the page allocator.
      
      This patch reduces the probability of such a lockup by introducing a
      specialized kernel thread (oom_reaper) which tries to reclaim additional
      memory by preemptively reaping the anonymous or swapped out memory owned
      by the oom victim under an assumption that such a memory won't be needed
      when its owner is killed and kicked from the userspace anyway.  There is
      one notable exception to this, though, if the OOM victim was in the
      process of coredumping the result would be incomplete.  This is
      considered a reasonable constrain because the overall system health is
      more important than debugability of a particular application.
      
      A kernel thread has been chosen because we need a reliable way of
      invocation so workqueue context is not appropriate because all the
      workers might be busy (e.g.  allocating memory).  Kswapd which sounds
      like another good fit is not appropriate as well because it might get
      blocked on locks during reclaim as well.
      
      oom_reaper has to take mmap_sem on the target task for reading so the
      solution is not 100% because the semaphore might be held or blocked for
      write but the probability is reduced considerably wrt.  basically any
      lock blocking forward progress as described above.  In order to prevent
      from blocking on the lock without any forward progress we are using only
      a trylock and retry 10 times with a short sleep in between.  Users of
      mmap_sem which need it for write should be carefully reviewed to use
      _killable waiting as much as possible and reduce allocations requests
      done with the lock held to absolute minimum to reduce the risk even
      further.
      
      The API between oom killer and oom reaper is quite trivial.
      wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only
      NULL->mm transition and oom_reaper clear this atomically once it is done
      with the work.  This means that only a single mm_struct can be reaped at
      the time.  As the operation is potentially disruptive we are trying to
      limit it to the ncessary minimum and the reaper blocks any updates while
      it operates on an mm.  mm_struct is pinned by mm_count to allow parallel
      exit_mmap and a race is detected by atomic_inc_not_zero(mm_users).
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Suggested-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aac45363
  2. 18 3月, 2016 3 次提交
  3. 16 3月, 2016 1 次提交
  4. 15 1月, 2016 1 次提交
  5. 13 12月, 2015 1 次提交
  6. 07 11月, 2015 1 次提交
  7. 06 11月, 2015 7 次提交
  8. 09 9月, 2015 4 次提交
  9. 25 6月, 2015 7 次提交
  10. 16 4月, 2015 1 次提交
  11. 15 4月, 2015 1 次提交
  12. 12 2月, 2015 6 次提交
    • K
      mm: account pmd page tables to the process · dc6c9a35
      Kirill A. Shutemov 提交于
      Dave noticed that unprivileged process can allocate significant amount of
      memory -- >500 MiB on x86_64 -- and stay unnoticed by oom-killer and
      memory cgroup.  The trick is to allocate a lot of PMD page tables.  Linux
      kernel doesn't account PMD tables to the process, only PTE.
      
      The use-cases below use few tricks to allocate a lot of PMD page tables
      while keeping VmRSS and VmPTE low.  oom_score for the process will be 0.
      
      	#include <errno.h>
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <unistd.h>
      	#include <sys/mman.h>
      	#include <sys/prctl.h>
      
      	#define PUD_SIZE (1UL << 30)
      	#define PMD_SIZE (1UL << 21)
      
      	#define NR_PUD 130000
      
      	int main(void)
      	{
      		char *addr = NULL;
      		unsigned long i;
      
      		prctl(PR_SET_THP_DISABLE);
      		for (i = 0; i < NR_PUD ; i++) {
      			addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
      					MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
      			if (addr == MAP_FAILED) {
      				perror("mmap");
      				break;
      			}
      			*addr = 'x';
      			munmap(addr, PMD_SIZE);
      			mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
      					MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
      			if (addr == MAP_FAILED)
      				perror("re-mmap"), exit(1);
      		}
      		printf("PID %d consumed %lu KiB in PMD page tables\n",
      				getpid(), i * 4096 >> 10);
      		return pause();
      	}
      
      The patch addresses the issue by account PMD tables to the process the
      same way we account PTE.
      
      The main place where PMD tables is accounted is __pmd_alloc() and
      free_pmd_range(). But there're few corner cases:
      
       - HugeTLB can share PMD page tables. The patch handles by accounting
         the table to all processes who share it.
      
       - x86 PAE pre-allocates few PMD tables on fork.
      
       - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust sanity
         check on exit(2).
      
      Accounting only happens on configuration where PMD page table's level is
      present (PMD is not folded).  As with nr_ptes we use per-mm counter.  The
      counter value is used to calculate baseline for badness score by
      oom-killer.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: David Rientjes <rientjes@google.com>
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc6c9a35
    • M
      oom, PM: make OOM detection in the freezer path raceless · c32b3cbe
      Michal Hocko 提交于
      Commit 5695be14 ("OOM, PM: OOM killed task shouldn't escape PM
      suspend") has left a race window when OOM killer manages to
      note_oom_kill after freeze_processes checks the counter.  The race
      window is quite small and really unlikely and partial solution deemed
      sufficient at the time of submission.
      
      Tejun wasn't happy about this partial solution though and insisted on a
      full solution.  That requires the full OOM and freezer's task freezing
      exclusion, though.  This is done by this patch which introduces oom_sem
      RW lock and turns oom_killer_disable() into a full OOM barrier.
      
      oom_killer_disabled check is moved from the allocation path to the OOM
      level and we take oom_sem for reading for both the check and the whole
      OOM invocation.
      
      oom_killer_disable() takes oom_sem for writing so it waits for all
      currently running OOM killer invocations.  Then it disable all the further
      OOMs by setting oom_killer_disabled and checks for any oom victims.
      Victims are counted via mark_tsk_oom_victim resp.  unmark_oom_victim.  The
      last victim wakes up all waiters enqueued by oom_killer_disable().
      Therefore this function acts as the full OOM barrier.
      
      The page fault path is covered now as well although it was assumed to be
      safe before.  As per Tejun, "We used to have freezing points deep in file
      system code which may be reacheable from page fault." so it would be
      better and more robust to not rely on freezing points here.  Same applies
      to the memcg OOM killer.
      
      out_of_memory tells the caller whether the OOM was allowed to trigger and
      the callers are supposed to handle the situation.  The page allocation
      path simply fails the allocation same as before.  The page fault path will
      retry the fault (more on that later) and Sysrq OOM trigger will simply
      complain to the log.
      
      Normally there wouldn't be any unfrozen user tasks after
      try_to_freeze_tasks so the function will not block. But if there was an
      OOM killer racing with try_to_freeze_tasks and the OOM victim didn't
      finish yet then we have to wait for it. This should complete in a finite
      time, though, because
      
      	- the victim cannot loop in the page fault handler (it would die
      	  on the way out from the exception)
      	- it cannot loop in the page allocator because all the further
      	  allocation would fail and __GFP_NOFAIL allocations are not
      	  acceptable at this stage
      	- it shouldn't be blocked on any locks held by frozen tasks
      	  (try_to_freeze expects lockless context) and kernel threads and
      	  work queues are not frozen yet
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Suggested-by: NTejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c32b3cbe
    • M
      oom: thaw the OOM victim if it is frozen · 63a8ca9b
      Michal Hocko 提交于
      oom_kill_process only sets TIF_MEMDIE flag and sends a signal to the
      victim.  This is basically noop when the task is frozen though because the
      task sleeps in the uninterruptible sleep.  The victim is eventually thawed
      later when oom_scan_process_thread meets the task again in a later OOM
      invocation so the OOM killer doesn't live lock.  But this is less than
      optimal.
      
      Let's add __thaw_task into mark_tsk_oom_victim after we set TIF_MEMDIE to
      the victim.  We are not checking whether the task is frozen because that
      would be racy and __thaw_task does that already.  oom_scan_process_thread
      doesn't need to care about freezer anymore as TIF_MEMDIE and freezer are
      excluded completely now.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63a8ca9b
    • M
      oom: add helpers for setting and clearing TIF_MEMDIE · 49550b60
      Michal Hocko 提交于
      This patchset addresses a race which was described in the changelog for
      5695be14 ("OOM, PM: OOM killed task shouldn't escape PM suspend"):
      
      : PM freezer relies on having all tasks frozen by the time devices are
      : getting frozen so that no task will touch them while they are getting
      : frozen.  But OOM killer is allowed to kill an already frozen task in order
      : to handle OOM situtation.  In order to protect from late wake ups OOM
      : killer is disabled after all tasks are frozen.  This, however, still keeps
      : a window open when a killed task didn't manage to die by the time
      : freeze_processes finishes.
      
      The original patch hasn't closed the race window completely because that
      would require a more complex solution as it can be seen by this patchset.
      
      The primary motivation was to close the race condition between OOM killer
      and PM freezer _completely_.  As Tejun pointed out, even though the race
      condition is unlikely the harder it would be to debug weird bugs deep in
      the PM freezer when the debugging options are reduced considerably.  I can
      only speculate what might happen when a task is still runnable
      unexpectedly.
      
      On a plus side and as a side effect the oom enable/disable has a better
      (full barrier) semantic without polluting hot paths.
      
      I have tested the series in KVM with 100M RAM:
      - many small tasks (20M anon mmap) which are triggering OOM continually
      - s2ram which resumes automatically is triggered in a loop
      	echo processors > /sys/power/pm_test
      	while true
      	do
      		echo mem > /sys/power/state
      		sleep 1s
      	done
      - simple module which allocates and frees 20M in 8K chunks. If it sees
        freezing(current) then it tries another round of allocation before calling
        try_to_freeze
      - debugging messages of PM stages and OOM killer enable/disable/fail added
        and unmark_oom_victim is delayed by 1s after it clears TIF_MEMDIE and before
        it wakes up waiters.
      - rebased on top of the current mmotm which means some necessary updates
        in mm/oom_kill.c. mark_tsk_oom_victim is now called under task_lock but
        I think this should be OK because __thaw_task shouldn't interfere with any
        locking down wake_up_process. Oleg?
      
      As expected there are no OOM killed tasks after oom is disabled and
      allocations requested by the kernel thread are failing after all the tasks
      are frozen and OOM disabled.  I wasn't able to catch a race where
      oom_killer_disable would really have to wait but I kinda expected the race
      is really unlikely.
      
      [  242.609330] Killed process 2992 (mem_eater) total-vm:24412kB, anon-rss:2164kB, file-rss:4kB
      [  243.628071] Unmarking 2992 OOM victim. oom_victims: 1
      [  243.636072] (elapsed 2.837 seconds) done.
      [  243.641985] Trying to disable OOM killer
      [  243.643032] Waiting for concurent OOM victims
      [  243.644342] OOM killer disabled
      [  243.645447] Freezing remaining freezable tasks ... (elapsed 0.005 seconds) done.
      [  243.652983] Suspending console(s) (use no_console_suspend to debug)
      [  243.903299] kmem_eater: page allocation failure: order:1, mode:0x204010
      [...]
      [  243.992600] PM: suspend of devices complete after 336.667 msecs
      [  243.993264] PM: late suspend of devices complete after 0.660 msecs
      [  243.994713] PM: noirq suspend of devices complete after 1.446 msecs
      [  243.994717] ACPI: Preparing to enter system sleep state S3
      [  243.994795] PM: Saving platform NVS memory
      [  243.994796] Disabling non-boot CPUs ...
      
      The first 2 patches are simple cleanups for OOM.  They should go in
      regardless the rest IMO.
      
      Patches 3 and 4 are trivial printk -> pr_info conversion and they should
      go in ditto.
      
      The main patch is the last one and I would appreciate acks from Tejun and
      Rafael.  I think the OOM part should be OK (except for __thaw_task vs.
      task_lock where a look from Oleg would appreciated) but I am not so sure I
      haven't screwed anything in the freezer code.  I have found several
      surprises there.
      
      This patch (of 5):
      
      This patch is just a preparatory and it doesn't introduce any functional
      change.
      
      Note:
      I am utterly unhappy about lowmemory killer abusing TIF_MEMDIE just to
      wait for the oom victim and to prevent from new killing. This is
      just a side effect of the flag. The primary meaning is to give the oom
      victim access to the memory reserves and that shouldn't be necessary
      here.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49550b60
    • M
      oom: make sure that TIF_MEMDIE is set under task_lock · 83363b91
      Michal Hocko 提交于
      OOM killer tries to exclude tasks which do not have mm_struct associated
      because killing such a task wouldn't help much.  The OOM victim gets
      TIF_MEMDIE set to disable OOM killer while the current victim releases the
      memory and then enables the OOM killer again by dropping the flag.
      
      oom_kill_process is currently prone to a race condition when the OOM
      victim is already exiting and TIF_MEMDIE is set after the task releases
      its address space.  This might theoretically lead to OOM livelock if the
      OOM victim blocks on an allocation later during exiting because it
      wouldn't kill any other process and the exiting one won't be able to exit.
       The situation is highly unlikely because the OOM victim is expected to
      release some memory which should help to sort out OOM situation.
      
      Fix this by checking task->mm and setting TIF_MEMDIE flag under task_lock
      which will serialize the OOM killer with exit_mm which sets task->mm to
      NULL.  Setting the flag for current is not necessary because check and set
      is not racy.
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83363b91
    • T
      oom: don't count on mm-less current process · d7a94e7e
      Tetsuo Handa 提交于
      out_of_memory() doesn't trigger the OOM killer if the current task is
      already exiting or it has fatal signals pending, and gives the task
      access to memory reserves instead.  However, doing so is wrong if
      out_of_memory() is called by an allocation (e.g. from exit_task_work())
      after the current task has already released its memory and cleared
      TIF_MEMDIE at exit_mm().  If we again set TIF_MEMDIE to post-exit_mm()
      current task, the OOM killer will be blocked by the task sitting in the
      final schedule() waiting for its parent to reap it.  It will trigger an
      OOM livelock if its parent is unable to reap it due to doing an
      allocation and waiting for the OOM killer to kill it.
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7a94e7e
  13. 14 12月, 2014 1 次提交