• T
    memcg: punt high overage reclaim to return-to-userland path · b23afb93
    Tejun Heo 提交于
    Currently, try_charge() tries to reclaim memory synchronously when the
    high limit is breached; however, if the allocation doesn't have
    __GFP_WAIT, synchronous reclaim is skipped.  If a process performs only
    speculative allocations, it can blow way past the high limit.  This is
    actually easily reproducible by simply doing "find /".  slab/slub
    allocator tries speculative allocations first, so as long as there's
    memory which can be consumed without blocking, it can keep allocating
    memory regardless of the high limit.
    
    This patch makes try_charge() always punt the over-high reclaim to the
    return-to-userland path.  If try_charge() detects that high limit is
    breached, it adds the overage to current->memcg_nr_pages_over_high and
    schedules execution of mem_cgroup_handle_over_high() which performs
    synchronous reclaim from the return-to-userland path.
    
    As long as kernel doesn't have a run-away allocation spree, this should
    provide enough protection while making kmemcg behave more consistently.
    It also has the following benefits.
    
    - All over-high reclaims can use GFP_KERNEL regardless of the specific
      gfp mask in use, e.g. GFP_NOFS, when the limit was breached.
    
    - It copes with prio inversion.  Previously, a low-prio task with
      small memory.high might perform over-high reclaim with a bunch of
      locks held.  If a higher prio task needed any of these locks, it
      would have to wait until the low prio task finished reclaim and
      released the locks.  By handing over-high reclaim to the task exit
      path this issue can be avoided.
    Signed-off-by: NTejun Heo <tj@kernel.org>
    Acked-by: NMichal Hocko <mhocko@kernel.org>
    Reviewed-by: NVladimir Davydov <vdavydov@parallels.com>
    Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    b23afb93
sched.h 91.4 KB