    mm: don't raise MEMCG_OOM event due to failed high-order allocation · 5213c279
    Committed by Roman Gushchin
    commit 7a1adfddaf0d11a39fdcaf6e82a88e9c0586e08b upstream.
    
    It was reported that on some of our machines containers were restarted
    with OOM symptoms for no obvious reason.  Although there was almost no
    memory pressure and plenty of page cache, the MEMCG_OOM event was raised
    occasionally, causing the container management software to conclude that
    an OOM had happened.  However, no tasks were killed.
    
    Further investigation showed that the problem is caused by a failed
    attempt to charge a high-order page.  In that case, the OOM killer is
    never invoked.  As shown below, this can happen under conditions that are
    very far from a real OOM: e.g. there is plenty of clean page cache and no
    memory pressure.
    
    There is no sense in raising an OOM event in this case, as it might
    confuse a user and lead to wrong and excessive actions (e.g. restarting
    the workload, as in my case).
    
    Let's look at the charging path in try_charge().  If the memory usage is
    close to memory.max, which is absolutely natural for most memory cgroups,
    we try to reclaim some pages.  Even if we were able to reclaim enough
    memory for the allocation, the following check can fail due to a race
    with another concurrent allocation:
    
        if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
            goto retry;
    
    For regular pages the following condition will save us from triggering
    the OOM:
    
        if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
            goto retry;
    
    But for high-order allocations this condition will intentionally fail.
    The reason is that we'll likely fall back to regular pages anyway, so it's
    OK, and even preferred, to return ENOMEM.
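
To make the effect of that condition concrete, here is a minimal userspace sketch of the retry predicate quoted above. It is not kernel code: should_retry_after_reclaim() is a hypothetical helper name, and only the value of PAGE_ALLOC_COSTLY_ORDER (3) is taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the kernel's PAGE_ALLOC_COSTLY_ORDER. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical helper mirroring the retry condition from the commit
 * message: retry only if reclaim made progress AND the charge is no
 * larger than 2^PAGE_ALLOC_COSTLY_ORDER (8) pages.
 */
static bool should_retry_after_reclaim(unsigned long nr_reclaimed,
                                       unsigned long nr_pages)
{
    assert(nr_pages > 0); /* a charge always covers at least one page */
    return nr_reclaimed && nr_pages <= (1UL << PAGE_ALLOC_COSTLY_ORDER);
}
```

With this helper, a 1-page charge after successful reclaim (`should_retry_after_reclaim(32, 1)`) retries, while a 512-page charge (order 9, as for a 2 MiB huge page with 4 KiB base pages) does not, no matter how much was reclaimed.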
    
    In this case the idea of raising MEMCG_OOM looks dubious.
    
    Fix this by moving the raising of MEMCG_OOM into mem_cgroup_oom(), after
    the allocation order check, so that the event won't be raised for
    high-order allocations.  This change doesn't affect the allocation and
    charging of regular pages.
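
The reordering described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the actual patch: struct memcg_model and the two charge_fail_*() functions are hypothetical, the event is reduced to a plain counter, and the real mem_cgroup_oom() does considerably more than shown here.

```c
#include <assert.h>

/* Same value as the kernel's PAGE_ALLOC_COSTLY_ORDER. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical stand-in for a cgroup's MEMCG_OOM event counter. */
struct memcg_model {
    unsigned long oom_events;
};

/* Simplified outcome: either OOM handling proceeds or it is skipped. */
enum oom_status { OOM_SUCCESS, OOM_SKIPPED };

/* Model of the old ordering: the event is raised before the order check. */
static enum oom_status charge_fail_before(struct memcg_model *memcg, int order)
{
    memcg->oom_events++;            /* raised even for costly orders */
    if (order > PAGE_ALLOC_COSTLY_ORDER)
        return OOM_SKIPPED;         /* OOM killer never invoked */
    return OOM_SUCCESS;
}

/* Model of the new ordering: the order check comes first. */
static enum oom_status charge_fail_after(struct memcg_model *memcg, int order)
{
    if (order > PAGE_ALLOC_COSTLY_ORDER)
        return OOM_SKIPPED;         /* no event for high-order charges */
    memcg->oom_events++;            /* raised only when OOM handling runs */
    return OOM_SUCCESS;
}
```

In this model a failed order-9 charge increments the counter under the old ordering but leaves it untouched under the new one, while an order-0 failure raises the event in both cases.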
    
    Link: http://lkml.kernel.org/r/20181004214050.7417-1-guro@fb.com
    Signed-off-by: Roman Gushchin <guro@fb.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Acked-by: Michal Hocko <mhocko@kernel.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
    Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>