1. 08 Oct 2016, 2 commits
  2. 25 Sep 2016, 1 commit
    • mm: delete unnecessary and unsafe init_tlb_ubc() · b385d21f
      Hugh Dickins authored
      init_tlb_ubc() looked unnecessary to me: tlb_ubc is statically
      initialized with zeroes in the init_task, and copied from parent to
      child while it is quiescent in arch_dup_task_struct(); so I went to
      delete it.
      
      But I inserted temporary debug WARN_ONs in place of init_tlb_ubc() to
      check that it was always empty at that point, and found them firing:
      memcg reclaim can recurse into global reclaim (when allocating
      biosets for swapout, in my case), and arrive back at the
      init_tlb_ubc() call in shrink_node_memcg().
      
      Resetting tlb_ubc.flush_required at that point is wrong: if the upper
      level needs a deferred TLB flush but the lower level turns out not to,
      we miss a TLB flush.  Fortunately, that is the only part of the
      protocol that does not nest: with the initialization removed, the
      cpumask collects bits from upper and lower levels, and the TLB is
      flushed when needed.
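      
      A minimal standalone sketch of the failure mode, under stated
      assumptions: the struct below is pared down from the kernel's
      struct tlbflush_unmap_batch (the real one also carries a cpumask of
      CPUs to flush), and the two functions are illustrative stand-ins for
      the reclaim call chain, not the actual kernel code.
      
      #include <stdbool.h>
      #include <stdio.h>
      
      /* simplified stand-in for struct tlbflush_unmap_batch */
      struct tlb_ubc {
              bool flush_required;
      };
      
      static struct tlb_ubc tlb_ubc;  /* per-task state, statically zeroed */
      
      static void nested_global_reclaim(void)
      {
              /* BAD: the deleted init_tlb_ubc() amounted to this reset,
               * discarding state the outer reclaim pass still relies on */
              tlb_ubc.flush_required = false;
      
              /* ... suppose the inner pass defers no flush of its own ... */
      }
      
      int main(void)
      {
              /* outer (memcg) reclaim defers a TLB flush */
              tlb_ubc.flush_required = true;
      
              /* reclaim recurses into global reclaim (e.g. while
               * allocating biosets for swapout) and re-runs the init */
              nested_global_reclaim();
      
              /* the deferred flush has been lost: prints 0, not 1 */
              printf("flush_required = %d\n", tlb_ubc.flush_required);
              return 0;
      }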
      
      Fixes: 72b252ae ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: stable@vger.kernel.org # 4.3+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 02 Sep 2016, 1 commit
  4. 03 Aug 2016, 1 commit
  5. 29 Jul 2016, 32 commits
  6. 27 Jul 2016, 2 commits
  7. 21 May 2016, 1 commit
    • mm, oom: rework oom detection · 0a0337e0
      Michal Hocko authored
      __alloc_pages_slowpath has traditionally relied on direct reclaim
      and did_some_progress as an indicator that it makes sense to retry
      the allocation rather than declaring OOM.  shrink_zones had to rely
      on zone_reclaimable if shrink_zone didn't make any progress, to
      prevent a premature OOM killer invocation: the LRU might be full of
      dirty or writeback pages, which direct reclaim cannot clean up.
      
      zone_reclaimable allows rescanning the reclaimable lists several
      times, restarting whenever a page is freed.  This is really subtle
      behavior and it might lead to a livelock: a single freed page can
      keep the allocator looping even though the current task will never
      be able to allocate that one page.  The OOM killer would be more
      appropriate than looping without any progress for an unbounded
      amount of time.
      
      This patch changes the OOM detection logic and pulls it out of
      shrink_zone, which is too low a level for high-level decisions such
      as OOM, a per-zonelist property.  It is __alloc_pages_slowpath that
      knows how many attempts have been made and what progress was
      achieved so far, so it is the more appropriate place to implement
      this logic.
      
      The new heuristic is implemented in the should_reclaim_retry helper,
      called from __alloc_pages_slowpath.  It tries to be more
      deterministic and easier to follow.  It builds on the assumption
      that retrying makes sense only if the currently reclaimable memory
      plus free pages would allow the current allocation request to
      succeed (as per __zone_watermark_ok) for at least one zone in the
      usable zonelist.
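      
      As a rough illustration, here is a standalone single-zone sketch of
      that check in C.  The names echo the kernel's (should_reclaim_retry,
      __zone_watermark_ok, MAX_RECLAIM_RETRIES), but struct zone_state and
      its fields are simplified stand-ins, the real helper iterates over
      every zone in the usable zonelist rather than taking one zone, and
      the backoff term anticipates the mechanism described in the next
      paragraph:
      
      #include <stdbool.h>
      
      #define MAX_RECLAIM_RETRIES 16  /* yields the 1/16 backoff below */
      
      /* simplified per-zone counters, stand-ins for real zone statistics */
      struct zone_state {
              unsigned long free_pages;        /* NR_FREE_PAGES */
              unsigned long reclaimable_pages;
              unsigned long min_wmark;
      };
      
      static bool should_reclaim_retry(const struct zone_state *z,
                                       int no_progress_loops)
      {
              unsigned long available = z->reclaimable_pages;
      
              if (no_progress_loops > MAX_RECLAIM_RETRIES)
                      return false;
      
              /* back off: trust 1/16 less of the reclaimable pool for
               * each reclaim round that made no progress */
              available -= no_progress_loops * available / MAX_RECLAIM_RETRIES;
              available += z->free_pages;
      
              /* crude stand-in for __zone_watermark_ok() */
              return available > z->min_wmark;
      }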
      
      This alone wouldn't be sufficient, though, because writeback might
      get stuck and reclaimable pages might be pinned for a really long
      time, or even depend on the current allocation context.  Therefore a
      backoff mechanism is implemented which reduces the reclaim target
      after each reclaim round that makes no progress.  This means we
      should eventually converge to NR_FREE_PAGES alone as the target,
      fail the watermark check, and proceed to OOM.  The backoff is simple
      and linear, discounting 1/16 of the reclaimable pages for each round
      without progress.  We are optimistic and reset the counter after
      successful reclaim rounds.
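      
      Continuing the sketch above with made-up numbers shows that
      convergence; the figures below are purely illustrative:
      
      #include <stdio.h>
      
      int main(void)
      {
              struct zone_state z = {
                      .free_pages = 100,
                      .reclaimable_pages = 3200,
                      .min_wmark = 500,
              };
      
              for (int loops = 0; loops <= MAX_RECLAIM_RETRIES; loops++)
                      printf("round %2d: retry=%d\n", loops,
                             (int)should_reclaim_retry(&z, loops));
              return 0;
      }
      
      /* Each no-progress round discounts 3200/16 = 200 reclaimable pages.
       * By round 14 only 400 + 100 free = 500 pages remain, which no
       * longer exceeds the 500-page watermark, so retrying stops; by
       * round 16 the target has converged to NR_FREE_PAGES alone. */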
      
      Costly high-order pages mostly preserve their semantics: those
      without __GFP_REPEAT fail right away, while those with the flag set
      back off once the amount of reclaimable pages reaches the equivalent
      of the requested order.  The only difference is that if there was no
      progress during the reclaim, we rely on the zone watermark check.
      This is a more logical thing to do than the previous 1<<order
      attempts, which were a result of zone_reclaimable faking progress.
      
      [vdavydov@virtuozzo.com: check classzone_idx for shrink_zone]
      [hannes@cmpxchg.org: separate the heuristic into should_reclaim_retry]
      [rientjes@google.com: use zone_page_state_snapshot for NR_FREE_PAGES]
      [rientjes@google.com: shrink_zones doesn't need to return anything]
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>