• J
    mm: memcontrol: rewrite charge API · 00501b53
    Johannes Weiner 提交于
    These patches rework memcg charge lifetime to integrate more naturally
    with the lifetime of user pages.  This drastically simplifies the code and
    reduces charging and uncharging overhead.  The most expensive part of
    charging and uncharging is the page_cgroup bit spinlock, which is removed
    entirely after this series.
    
    Here are the top-10 profile entries of a stress test that reads a 128G
    sparse file on a freshly booted box, without even a dedicated cgroup (i.e.
     executing in the root memcg).  Before:
    
        15.36%              cat  [kernel.kallsyms]   [k] copy_user_generic_string
        13.31%              cat  [kernel.kallsyms]   [k] memset
        11.48%              cat  [kernel.kallsyms]   [k] do_mpage_readpage
         4.23%              cat  [kernel.kallsyms]   [k] get_page_from_freelist
         2.38%              cat  [kernel.kallsyms]   [k] put_page
         2.32%              cat  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge
         2.18%          kswapd0  [kernel.kallsyms]   [k] __mem_cgroup_uncharge_common
         1.92%          kswapd0  [kernel.kallsyms]   [k] shrink_page_list
         1.86%              cat  [kernel.kallsyms]   [k] __radix_tree_lookup
         1.62%              cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
    
    After:
    
        15.67%           cat  [kernel.kallsyms]   [k] copy_user_generic_string
        13.48%           cat  [kernel.kallsyms]   [k] memset
        11.42%           cat  [kernel.kallsyms]   [k] do_mpage_readpage
         3.98%           cat  [kernel.kallsyms]   [k] get_page_from_freelist
         2.46%           cat  [kernel.kallsyms]   [k] put_page
         2.13%       kswapd0  [kernel.kallsyms]   [k] shrink_page_list
         1.88%           cat  [kernel.kallsyms]   [k] __radix_tree_lookup
         1.67%           cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
         1.39%       kswapd0  [kernel.kallsyms]   [k] free_pcppages_bulk
         1.30%           cat  [kernel.kallsyms]   [k] kfree
    
    As you can see, the memcg footprint has shrunk quite a bit.
    
       text    data     bss     dec     hex filename
      37970    9892     400   48262    bc86 mm/memcontrol.o.old
      35239    9892     400   45531    b1db mm/memcontrol.o
    
    This patch (of 4):
    
    The memcg charge API charges pages before they are rmapped - i.e.  have an
    actual "type" - and so every callsite needs its own set of charge and
    uncharge functions to know what type is being operated on.  Worse,
    uncharge has to happen from a context that is still type-specific, rather
    than at the end of the page's lifetime with exclusive access, and so
    requires a lot of synchronization.
    
    Rewrite the charge API to provide a generic set of try_charge(),
    commit_charge() and cancel_charge() transaction operations, much like
    what's currently done for swap-in:
    
      mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
      pages from the memcg if necessary.
    
      mem_cgroup_commit_charge() commits the page to the charge once it
      has a valid page->mapping and PageAnon() reliably tells the type.
    
      mem_cgroup_cancel_charge() aborts the transaction.
    
    This reduces the charge API and enables subsequent patches to
    drastically simplify uncharging.
    
    As pages need to be committed after rmap is established but before they
    are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
    additions again.  Revive lru_cache_add_active_or_unevictable().
    
    [hughd@google.com: fix shmem_unuse]
    [hughd@google.com: Add comments on the private use of -EAGAIN]
    Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
    Acked-by: NMichal Hocko <mhocko@suse.cz>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Vladimir Davydov <vdavydov@parallels.com>
    Signed-off-by: NHugh Dickins <hughd@google.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    00501b53
memory.c 103.0 KB