    memcg: coalesce charging via percpu storage · cdec2e42
    Committed by KAMEZAWA Hiroyuki
    This patch coalesces accesses to res_counter at charge time by using
    per-cpu caching. At charge time, memcg charges 64 pages at once and
    remembers the surplus in a per-cpu cache. Because it is a cache, it is
    drained/flushed when necessary.
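The batching described above can be modeled in userspace. This is a minimal sketch, not the kernel code: "CPUs" are array slots, `struct memcg_model` stands in for a memcg and its res_counter, and every name (`charge_page`, `drain_stock`, `NR_CPUS_MODEL`, ...) is hypothetical. Only the batch size of 64 pages comes from the text; the single-page fallback near the limit is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 8   /* hypothetical CPU count for the model */
#define CHARGE_BATCH  64  /* pages charged to the shared counter per refill */

/* Hypothetical stand-in for a memcg and its shared res_counter. */
struct memcg_model {
    long usage;    /* pages charged, including pages sitting in stocks */
    long limit;    /* upper bound on usage */
    long touches;  /* how many times the shared counter was modified */
};

/* Hypothetical per-cpu cache: pre-charged pages for one group. */
struct stock_model {
    struct memcg_model *cached; /* group the stocked pages belong to */
    long nr_pages;              /* pre-charged pages not yet handed out */
};

static struct stock_model stock[NR_CPUS_MODEL];

/* Slow path: modify the shared counter; fails if the limit would be exceeded. */
static bool counter_charge(struct memcg_model *m, long nr)
{
    m->touches++;
    if (m->usage + nr > m->limit)
        return false;
    m->usage += nr;
    return true;
}

static void counter_uncharge(struct memcg_model *m, long nr)
{
    m->touches++;
    m->usage -= nr;
}

/* Return this CPU's stocked pages to the group they were charged to. */
static void drain_stock(int cpu)
{
    struct stock_model *s = &stock[cpu];

    if (s->cached && s->nr_pages)
        counter_uncharge(s->cached, s->nr_pages);
    s->cached = NULL;
    s->nr_pages = 0;
}

/* Charge one page to m on the given CPU. */
static bool charge_page(struct memcg_model *m, int cpu)
{
    struct stock_model *s = &stock[cpu];

    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    /* Fast path: hand out a page already charged to m on this CPU. */
    if (s->cached == m && s->nr_pages > 0) {
        s->nr_pages--;
        return true;
    }
    /* Refill: charge a whole batch, keep the surplus in this CPU's stock. */
    if (counter_charge(m, CHARGE_BATCH)) {
        drain_stock(cpu); /* the stock holds pages for one group at a time */
        s->cached = m;
        s->nr_pages = CHARGE_BATCH - 1;
        return true;
    }
    /* Near the limit: fall back to charging a single page. */
    return counter_charge(m, 1);
}
```

In this model, 128 consecutive charges on one CPU modify the shared counter only twice, and switching the stock to a different group first returns the old group's surplus.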
    
    This version uses the public per-cpu area, which has two benefits:
     1. The total stocked charge in the system is bounded by the number of
        CPUs, not by the number of memcgs. This gives better synchronization.
     2. The drain code for flush/CPU hotplug is very simple (and quick).
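Both benefits follow from keeping one stock slot per CPU rather than one per group. The sketch below is a hypothetical userspace model (all names invented for illustration; the 63-page cap assumes refills of 64 pages with one handed out immediately): the bound on stocked pages depends only on the CPU count, and a full drain is a single loop over CPUs.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 8   /* hypothetical CPU count */
#define CHARGE_BATCH  64  /* pages per refill, as in the text */

/* Hypothetical stand-in for a memcg's charged usage. */
struct memcg_model { long usage; };

/* One slot per CPU; each caches pages for at most one group at a time. */
struct stock_model {
    struct memcg_model *cached;
    long nr_pages; /* never more than CHARGE_BATCH - 1 in this model */
};

static struct stock_model stock[NR_CPUS_MODEL];

/* Bound on all stocked pages: independent of how many groups exist. */
static long max_stocked_pages(void)
{
    return (long)NR_CPUS_MODEL * (CHARGE_BATCH - 1);
}

/* Pages currently pre-charged but not yet handed out, over all CPUs. */
static long stocked_pages(void)
{
    long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        sum += stock[cpu].nr_pages;
    assert(sum <= max_stocked_pages());
    return sum;
}

/* Return one CPU's stocked pages (e.g. when that CPU goes offline). */
static void drain_stock(int cpu)
{
    struct stock_model *s = &stock[cpu];

    if (s->cached)
        s->cached->usage -= s->nr_pages;
    s->cached = NULL;
    s->nr_pages = 0;
}

/* System-wide flush: one pass over the CPUs, whatever the number of groups. */
static void drain_all_stock(void)
{
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        drain_stock(cpu);
}
```

With 8 CPUs and 64-page refills, at most 8 * 63 = 504 pages can be stocked system-wide in this model.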
    
    The most important point of this patch is that we never touch res_counter
    in the fast path. The res_counter is a system-wide shared counter that is
    modified very frequently, so we should touch it as rarely as possible to
    avoid false sharing.
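A back-of-the-envelope count shows how much less often the shared counter is written. This sketch counts only the charge side, for one CPU charging one group with no drains and no limit pressure; these simplifications and the function names are assumptions for illustration.

```c
#include <assert.h>

#define CHARGE_BATCH 64 /* pages per refill, from the text */

/* Without a per-cpu stock, every charge writes the shared counter. */
static long writes_unbatched(long nr_charges)
{
    return nr_charges;
}

/* With a stock, only each refill writes it: one write per started batch
 * of CHARGE_BATCH pages (ceiling division). */
static long writes_batched(long nr_charges)
{
    assert(nr_charges >= 0);
    return (nr_charges + CHARGE_BATCH - 1) / CHARGE_BATCH;
}
```

For 1000 charges this gives 1000 shared-counter writes without the stock and 16 with it, so the cache line holding the counter bounces between CPUs far less often.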
    
    On an 8-CPU x86-64 server, I measured the overhead of memcg at page fault
    time by running a program that does map/fault/unmap in a loop. I ran one
    task per CPU (pinned with taskset) and summed the number of page faults
    over 60 seconds.
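The test program itself is not included here. The following is a hypothetical reconstruction of a map/fault/unmap loop, assuming Linux and anonymous private mappings; `fault_loop` and its parameters are invented for illustration. It reports the process's minor-fault delta via `getrusage`.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Map, touch, and unmap an anonymous region `iterations` times.
 * Returns the number of pages written, or -1 if mmap/munmap fails.
 * *minor_faults receives the change in this process's minor-fault count. */
static long fault_loop(long iterations, size_t pages, long *minor_faults)
{
    long page_size = sysconf(_SC_PAGESIZE);
    struct rusage before, after;
    long touched = 0;

    assert(page_size > 0);
    size_t len = pages * (size_t)page_size;

    getrusage(RUSAGE_SELF, &before);
    for (long i = 0; i < iterations; i++) {
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        /* The first write to each fresh anonymous page faults it in
         * (and, with memcg enabled, charges it). volatile keeps the
         * compiler from dropping the stores. */
        volatile char *v = p;
        for (size_t off = 0; off < len; off += (size_t)page_size) {
            v[off] = 1;
            touched++;
        }
        if (munmap(p, len) != 0)
            return -1;
    }
    getrusage(RUSAGE_SELF, &after);
    *minor_faults = after.ru_minflt - before.ru_minflt;
    return touched;
}
```

To mirror the setup described, one would start one such process per CPU, pinned with `taskset -c <cpu>`, let each run for a fixed time, and add up the per-process fault counts.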
    
    [without memcg config]
      40156968  page-faults              #      0.085 M/sec   ( +-   0.046% )
      27.67 cache miss/faults
    
    [root cgroup]
      36659599  page-faults              #      0.077 M/sec   ( +-   0.247% )
      31.58 cache miss/faults
    
    [in a child cgroup]
      18444157  page-faults              #      0.039 M/sec   ( +-   0.133% )
      69.96 cache miss/faults
    
    [ + coalescing uncharge patch]
      27133719  page-faults              #      0.057 M/sec   ( +-   0.155% )
      47.16 cache miss/faults
    
    [ + coalescing uncharge patch + this patch ]
      34224709  page-faults              #      0.072 M/sec   ( +-   0.173% )
      34.69 cache miss/faults
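To put the fault counts above on a common scale, this small helper expresses each run as a fraction of the no-memcg run. Only the numbers are taken from the results; the macro and function names are illustrative.

```c
#include <assert.h>

/* Page faults in 60 s, copied from the results above. */
#define FAULTS_NO_MEMCG       40156968LL
#define FAULTS_ROOT_CGROUP    36659599LL
#define FAULTS_CHILD_CGROUP   18444157LL
#define FAULTS_PLUS_UNCHARGE  27133719LL
#define FAULTS_PLUS_BOTH      34224709LL

/* Fault throughput relative to the no-memcg run, in thousandths
 * (integer division truncates). */
static long long permille_of_baseline(long long faults)
{
    assert(faults >= 0);
    return faults * 1000 / FAULTS_NO_MEMCG;
}
```

This gives roughly 91% of the baseline for the root cgroup, 46% for a child cgroup without either patch, 68% with the uncharge patch, and 85% with both patches; the last figure is about 26% more faults than with the uncharge patch alone.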
    
    Changelog (since Oct/2):
      - updated comments
      - replaced get_cpu_var() with __get_cpu_var() where possible.
      - removed the mutex for system-wide drain; a counter is used instead.
      - removed CONFIG_HOTPLUG_CPU
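The changelog does not show how the counter replaces the mutex. One way a counter can do this is a loose, lock-free "drain already in progress" check, sketched below with C11 atomics; `drain_in_progress`, `schedule_drain_on`, and `drain_all_stock_async` are hypothetical names, and the per-cpu work queueing is only simulated.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_MODEL 8 /* hypothetical CPU count */

static atomic_int drain_in_progress; /* nonzero while a caller schedules drains */
static atomic_int drains_scheduled;  /* per-cpu drain requests issued (demo only) */

/* Hypothetical stand-in for queueing a drain work item on one CPU. */
static void schedule_drain_on(int cpu)
{
    (void)cpu;
    atomic_fetch_add(&drains_scheduled, 1);
}

/* If someone is already scheduling drains, return at once instead of
 * sleeping on a mutex. The check is racy by design: two callers may both
 * pass it, which only means some drain requests are issued twice. */
static int drain_all_stock_async(void)
{
    if (atomic_load(&drain_in_progress))
        return 0;
    atomic_fetch_add(&drain_in_progress, 1);
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        schedule_drain_on(cpu);
    atomic_fetch_sub(&drain_in_progress, 1);
    return 1;
}
```

A duplicated request is harmless in the kernel setting because queueing a work item that is already pending is a no-op, so the loose check trades exactness for never blocking the caller.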
    
    Changelog (old):
      - rebased onto the latest mmotm
      - moved the charge size check before the __GFP_WAIT check to avoid unnecessary
      - added an asynchronous flush routine.
      - fixed bugs pointed out by Nishimura-san.
    
    [akpm@linux-foundation.org: tweak comments]
    [nishimura@mxp.nes.nec.co.jp: don't do INIT_WORK() repeatedly against the same work_struct]
    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>