    memcg: add mem_cgroup_replace_page_cache() to fix LRU issue · ab936cbc
    Authored by KAMEZAWA Hiroyuki
    Commit ef6a3c63 ("mm: add replace_page_cache_page() function") added a
    function replace_page_cache_page().  This function replaces a page in the
    radix-tree with a new page.  When doing this, memory cgroup needs to fix
    up the accounting information.  memcg needs to check PCG_USED bit etc.
    
    In some (perhaps many) cases, 'newpage' is on LRU before calling
    replace_page_cache_page().  So, memcg's LRU accounting information
    should be fixed, too.
    
    This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
    In that function, the old page will be unaccounted without touching
    res_counter and the new page will be accounted to the memcg (of the old
    page).  When overwriting pc->mem_cgroup of newpage, take zone->lru_lock
    and avoid races with LRU handling.
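
    The hand-over described above can be illustrated with a small user-space
    model.  This is only a sketch and not the kernel code: the toy_* types,
    the lru_pages counter and the spin flag standing in for zone->lru_lock
    are all assumptions made for illustration, and LRU handling of the old
    page is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or names exist in the kernel. */
struct toy_memcg {
	long res_usage;  /* stands in for the res_counter usage */
	long lru_pages;  /* pages this group has on its LRU */
};

struct toy_page {
	struct toy_memcg *memcg;  /* stands in for pc->mem_cgroup */
	bool used;                /* stands in for the PCG_USED bit */
	bool on_lru;
};

/* Stands in for zone->lru_lock (hypothetical, one global lock). */
static atomic_flag toy_lru_lock = ATOMIC_FLAG_INIT;

static void toy_replace_page_cache(struct toy_page *oldpage,
				   struct toy_page *newpage)
{
	struct toy_memcg *memcg = oldpage->memcg;

	/* An old page that was never charged has nothing to hand over. */
	if (!oldpage->used || memcg == NULL)
		return;

	/*
	 * Unaccount the old page.  res_usage is deliberately left alone:
	 * the existing charge is inherited by newpage.
	 */
	oldpage->used = false;

	/* Overwrite the owner of newpage only under the LRU lock. */
	while (atomic_flag_test_and_set(&toy_lru_lock))
		;  /* spin */

	/* If newpage already sits on an LRU, drop it from its old owner. */
	if (newpage->on_lru && newpage->memcg != NULL)
		newpage->memcg->lru_pages--;

	newpage->memcg = memcg;
	newpage->used = true;

	/* Put it back on the LRU of the memcg it now belongs to. */
	if (newpage->on_lru)
		memcg->lru_pages++;

	atomic_flag_clear(&toy_lru_lock);
}
```

    In this model, a new page that was already on another group's LRU ends
    up counted on the LRU of the old page's memcg, and res_usage of that
    memcg is unchanged by the replacement.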
    
    Background:
      replace_page_cache_page() is called by FUSE code in its splice() handling.
      Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated
      page and may be on LRU. LRU mis-accounting is critical for memory cgroup
      because rmdir() checks that the whole LRU is empty and that there is no
      accounting leak. If a page is on a different LRU than it should be,
      rmdir() will fail.
    
    This bug was introduced in March 2011, but there has been no bug report
    yet.  I guess there are not many people who use memcg and FUSE at the
    same time with upstream kernels.
    
    The result of this bug is that the admin cannot destroy a memcg because
    of the accounting leak.  So, no panic, no deadlock.  And, even if an
    active cgroup exists, umount can succeed.  So no problem at shutdown.
    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Michal Hocko <mhocko@suse.cz>
    Cc: Miklos Szeredi <mszeredi@suse.cz>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>