• M
    memcg, oom: move out_of_memory back to the charge path · 29ef680a
    Michal Hocko 提交于
    Commit 3812c8c8 ("mm: memcg: do not trap chargers with full
    callstack on OOM") has changed the ENOMEM semantic of memcg charges.
    Rather than invoking the oom killer from the charging context it delays
    the oom killer to the page fault path (pagefault_out_of_memory).  This
    in turn means that many users (e.g.  slab or g-u-p) will get ENOMEM when
    the corresponding memcg hits the hard limit and the memcg is is OOM.
    This is behavior is inconsistent with !memcg case where the oom killer
    is invoked from the allocation context and the allocator keeps retrying
    until it succeeds.
    
    The difference in the behavior is user visible.  mmap(MAP_POPULATE)
    might result in not fully populated ranges while the mmap return code
    doesn't tell that to the userspace.  Random syscalls might fail with
    ENOMEM etc.
    
    The primary motivation of the different memcg oom semantic was the
    deadlock avoidance.  Things have changed since then, though.  We have an
    async oom teardown by the oom reaper now and so we do not have to rely
    on the victim to tear down its memory anymore.  Therefore we can return
    to the original semantic as long as the memcg oom killer is not handed
    over to the users space.
    
    There is still one thing to be careful about here though.  If the oom
    killer is not able to make any forward progress - e.g.  because there is
    no eligible task to kill - then we have to bail out of the charge path
    to prevent from same class of deadlocks.  We have basically two options
    here.  Either we fail the charge with ENOMEM or force the charge and
    allow overcharge.  The first option has been considered more harmful
    than useful because rare inconsistencies in the ENOMEM behavior is hard
    to test for and error prone.  Basically the same reason why the page
    allocator doesn't fail allocations under such conditions.  The later
    might allow runaways but those should be really unlikely unless somebody
    misconfigures the system.  E.g.  allowing to migrate tasks away from the
    memcg to a different unlimited memcg with move_charge_at_immigrate
    disabled.
    
    Link: http://lkml.kernel.org/r/20180628151101.25307-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
    Acked-by: NGreg Thelen <gthelen@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    29ef680a
sched.h 52.3 KB