    mm, memcg: fix the active list aging for lowmem requests when memcg is enabled · b4536f0c
    Michal Hocko committed
    Nils Holland and Klaus Ethgen have reported unexpected OOM killer
    invocations with 32b kernels starting with 4.8:
    
    	kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
    	kworker/u4:5 cpuset=/ mems_allowed=0
    	CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
    	[...]
    	Mem-Info:
    	active_anon:58685 inactive_anon:90 isolated_anon:0
    	 active_file:274324 inactive_file:281962 isolated_file:0
    	 unevictable:0 dirty:649 writeback:0 unstable:0
    	 slab_reclaimable:40662 slab_unreclaimable:17754
    	 mapped:7382 shmem:202 pagetables:351 bounce:0
    	 free:206736 free_pcp:332 free_cma:0
    	Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
    	DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
    	lowmem_reserve[]: 0 813 3474 3474
    	Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
    	lowmem_reserve[]: 0 0 21292 21292
    	HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB
    
    The OOM killer is clearly premature because there is still a lot
    of page cache in the zone Normal which should satisfy this lowmem
    request.  Further debugging has shown that the reclaim cannot make any
    forward progress because the page cache is hidden in the active list
    which doesn't get rotated because inactive_list_is_low is not memcg
    aware.
    
    The code simply subtracts per-zone highmem counters, which are
    node-wide, from the respective memcg's lru sizes, which doesn't make
    any sense.  We can easily end up always seeing the resulting active
    and inactive counts as 0 and return false.  This issue is not limited
    to 32b kernels but in practice the effect on systems without
    CONFIG_HIGHMEM would be much harder to notice because we do not invoke
    the OOM killer for allocation requests targeting < ZONE_NORMAL.
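
    The failure mode described above can be modelled in user space.  The
    following is a minimal sketch, not the kernel code: eligible_pages,
    inactive_is_low and should_age_active are hypothetical stand-ins, the
    memcg list sizes are invented, the ratio of 1 is an assumption, and
    the node-wide counts are the file LRU values from the report converted
    to pages assuming 4kB pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone indices for a 32-bit layout (not the kernel's enum). */
enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, NR_ZONES };

/*
 * Pages of one LRU list usable by a request limited to zones <= max_zone:
 * start from the whole list and subtract what sits in higher zones,
 * clamping at zero so the unsigned count cannot wrap.
 */
static unsigned long eligible_pages(unsigned long lru_size,
                                    const unsigned long per_zone[NR_ZONES],
                                    int max_zone)
{
    assert(max_zone >= 0 && max_zone < NR_ZONES);
    for (int z = max_zone + 1; z < NR_ZONES; z++) {
        unsigned long sub = per_zone[z] < lru_size ? per_zone[z] : lru_size;
        lru_size -= sub;
    }
    return lru_size;
}

/* Simplified aging check; the ratio of 1 used below is an assumption. */
static bool inactive_is_low(unsigned long inactive, unsigned long active,
                            unsigned long ratio)
{
    return inactive * ratio < active;
}

/*
 * Should the active file list be aged for a ZONE_NORMAL request?
 * The memcg numbers are invented for illustration; the node-wide
 * numbers are the report's per-zone active_file/inactive_file values
 * in pages (4kB pages assumed).
 */
static bool should_age_active(bool use_memcg_zone_counts)
{
    const unsigned long memcg_active = 1000, memcg_inactive = 10;
    const unsigned long memcg_zone_active[NR_ZONES]   = { 0, 600, 400 };
    const unsigned long memcg_zone_inactive[NR_ZONES] = { 0, 0, 10 };
    const unsigned long node_zone_active[NR_ZONES]    = { 1829, 133187, 139308 };
    const unsigned long node_zone_inactive[NR_ZONES]  = { 0, 11, 281951 };

    const unsigned long *za = use_memcg_zone_counts ? memcg_zone_active
                                                    : node_zone_active;
    const unsigned long *zi = use_memcg_zone_counts ? memcg_zone_inactive
                                                    : node_zone_inactive;

    unsigned long active = eligible_pages(memcg_active, za, ZONE_NORMAL);
    unsigned long inactive = eligible_pages(memcg_inactive, zi, ZONE_NORMAL);

    return inactive_is_low(inactive, active, 1);
}
```

    With the node-wide counters both eligible sizes clamp to 0, so the
    check returns false and the active list is never aged.  With the
    memcg's own per-zone counts, 600 active pages remain eligible and
    the check returns true.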
    
    Fix the issue by tracking per-zone lru page counts in
    mem_cgroup_per_node and subtracting per-memcg highmem counts when
    memcg is enabled.  Introduce a helper, lruvec_zone_lru_size, which
    redirects to either the zone counters or mem_cgroup_get_zone_lru_size
    when appropriate.
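
    The shape of the fix can be sketched in the same user-space style.
    This is an illustration under stated assumptions rather than the
    patch itself: memcg_node_model, lruvec_model, add_pages,
    zone_lru_size and eligible_size are hypothetical names, with
    zone_lru_size playing the role the text assigns to
    lruvec_zone_lru_size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone indices for a 32-bit layout (not the kernel's enum). */
enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, NR_ZONES };

/* Stand-in for the per-memcg, per-node state: one LRU page count per zone. */
struct memcg_node_model {
    unsigned long lru_zone_size[NR_ZONES];
};

/* Stand-in for an lruvec; memcg is NULL when memcg is disabled. */
struct lruvec_model {
    struct memcg_node_model *memcg;
    unsigned long *node_zone_size;  /* node-wide counters, shared by all lruvecs */
};

/* Keep both sets of counters in step when pages are added to the list. */
static void add_pages(struct lruvec_model *lv, int zone, unsigned long nr_pages)
{
    assert(zone >= 0 && zone < NR_ZONES);
    lv->node_zone_size[zone] += nr_pages;
    if (lv->memcg)
        lv->memcg->lru_zone_size[zone] += nr_pages;
}

/* Redirect to the memcg's own per-zone count when there is one. */
static unsigned long zone_lru_size(const struct lruvec_model *lv, int zone)
{
    assert(zone >= 0 && zone < NR_ZONES);
    return lv->memcg ? lv->memcg->lru_zone_size[zone]
                     : lv->node_zone_size[zone];
}

/* List size minus the pages in zones above max_zone, clamped at zero. */
static unsigned long eligible_size(const struct lruvec_model *lv,
                                   unsigned long lru_size, int max_zone)
{
    for (int z = max_zone + 1; z < NR_ZONES; z++) {
        unsigned long sub = zone_lru_size(lv, z);
        lru_size -= sub < lru_size ? sub : lru_size;
    }
    return lru_size;
}
```

    In this model two memcgs sharing one node each see only their own
    highmem pages, so a small memcg's eligible count is no longer wiped
    out by another group's usage, while a lruvec without a memcg still
    falls back to the node-wide zone counters.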
    
    This change loses the detection of an empty LRU with a non-zero lru
    size, introduced by ca707239 ("mm: update_lru_size warn and reset bad
    lru_size"), because of the inherent zone vs. node discrepancy.
    
    Fixes: f8d1a311 ("mm: consider whether to decivate based on eligible zones inactive ratio")
    Link: http://lkml.kernel.org/r/20170104100825.3729-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Reported-by: Nils Holland <nholland@tisys.org>
    Tested-by: Nils Holland <nholland@tisys.org>
    Reported-by: Klaus Ethgen <Klaus@Ethgen.de>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Acked-by: Mel Gorman <mgorman@suse.de>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: <stable@vger.kernel.org>	[4.8+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>