    mm: memcontrol: only mark charged pages with PageKmemcg · c4159a75
    Committed by Vladimir Davydov
    To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
    which sets page->_mapcount to -512.  Currently, we set/clear PageKmemcg
    in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
    with __GFP_ACCOUNT, including those that aren't actually charged to any
    cgroup, i.e. allocated from the root cgroup context.  To avoid overhead
    in case cgroups are not used, we only do that if memcg_kmem_enabled() is
    true.  The latter is set iff there are kmem-enabled memory cgroups
    (online or offline).  The root cgroup is not considered kmem-enabled.
    
    As a result, if a page is allocated with __GFP_ACCOUNT for the root
    cgroup when there are kmem-enabled memory cgroups and is freed after all
    kmem-enabled memory cgroups were removed, e.g.
    
      # no memory cgroups have been created yet, create one
      mkdir /sys/fs/cgroup/memory/test
      # run something allocating pages with __GFP_ACCOUNT, e.g.
      # a program using pipe
      dmesg | tail
      # remove the memory cgroup
      rmdir /sys/fs/cgroup/memory/test
    
    we'll get a bad page state bug complaining that page->_mapcount != -1:
    
      BUG: Bad page state in process swapper/0  pfn:1fd945c
      page:ffffea007f651700 count:0 mapcount:-511 mapping:          (null) index:0x0
      flags: 0x1000000000000000()
    
    To avoid that, let's mark with PageKmemcg only those pages that are
    actually charged to and hence pin a non-root memory cgroup.
    
    Fixes: 4949148a ("mm: charge/uncharge kmemcg from generic page allocator paths")
    Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memcontrol.c 153.2 KB