1. 24 9月, 2009 1 次提交
  2. 20 9月, 2009 1 次提交
  3. 12 6月, 2009 1 次提交
  4. 03 4月, 2009 1 次提交
    • K
      cgroups: use css id in swap cgroup for saving memory v5 · a3b2d692
      KAMEZAWA Hiroyuki 提交于
      Try to use CSS ID for records in swap_cgroup.  By this, on 64bit machine,
      size of swap_cgroup goes down to 2 bytes from 8bytes.
      
      This means, when 2GB of swap is equipped, (assume the page size is 4096bytes)
      
      	From size of swap_cgroup = 2G/4k * 8 = 4Mbytes.
      	To   size of swap_cgroup = 2G/4k * 2 = 1Mbytes.
      
      Reduction is large.  Of course, there are trade-offs.  This CSS ID will
      add overhead to swap-in/swap-out/swap-free.
      
      But in general,
        - swap is a resource which the user tend to avoid use.
        - If swap is never used, swap_cgroup area is not used.
        - Reading traditional manuals, size of swap should be proportional to
          size of memory. Memory size of machine is increasing now.
      
      I think reducing size of swap_cgroup makes sense.
      
      Note:
        - ID->CSS lookup routine has no locks, it's under RCU-Read-Side.
        - memcg can be obsolete at rmdir() but not freed while refcnt from
          swap_cgroup is available.
      
      Changelog v4->v5:
       - reworked on to memcg-charge-swapcache-to-proper-memcg.patch
      Changlog ->v4:
       - fixed not configured case.
       - deleted unnecessary comments.
       - fixed NULL pointer bug.
       - fixed message in dmesg.
      
      [nishimura@mxp.nes.nec.co.jp: css_tryget can be called twice in !PageCgroupUsed case]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a3b2d692
  5. 09 1月, 2009 2 次提交
    • K
      memcg: synchronized LRU · 08e552c6
      KAMEZAWA Hiroyuki 提交于
      A big patch for changing memcg's LRU semantics.
      
      Now,
        - page_cgroup is linked to mem_cgroup's its own LRU (per zone).
      
        - LRU of page_cgroup is not synchronous with global LRU.
      
        - page and page_cgroup is one-to-one and statically allocated.
      
        - To find page_cgroup is on what LRU, you have to check pc->mem_cgroup as
          - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);
      
        - SwapCache is handled.
      
      And, when we handle LRU list of page_cgroup, we do following.
      
      	pc = lookup_page_cgroup(page);
      	lock_page_cgroup(pc); .....................(1)
      	mz = page_cgroup_zoneinfo(pc);
      	spin_lock(&mz->lru_lock);
      	.....add to LRU
      	spin_unlock(&mz->lru_lock);
      	unlock_page_cgroup(pc);
      
      But (1) is spin_lock and we have to be afraid of dead-lock with zone->lru_lock.
      So, trylock() is used at (1), now. Without (1), we can't trust "mz" is correct.
      
      This is a trial to remove this dirty nesting of locks.
      This patch changes mz->lru_lock to be zone->lru_lock.
      Then, above sequence will be written as
      
              spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
      	mem_cgroup_add/remove/etc_lru() {
      		pc = lookup_page_cgroup(page);
      		mz = page_cgroup_zoneinfo(pc);
      		if (PageCgroupUsed(pc)) {
      			....add to LRU
      		}
              spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
      
      This is much simpler.
      (*) We're safe even if we don't take lock_page_cgroup(pc). Because..
          1. When pc->mem_cgroup can be modified.
             - at charge.
             - at account_move().
          2. at charge
             the PCG_USED bit is not set before pc->mem_cgroup is fixed.
          3. at account_move()
             the page is isolated and not on LRU.
      
      Pros.
        - easy for maintenance.
        - memcg can make use of laziness of pagevec.
        - we don't have to duplicated LRU/Active/Unevictable bit in page_cgroup.
        - LRU status of memcg will be synchronized with global LRU's one.
        - # of locks are reduced.
        - account_move() is simplified very much.
      Cons.
        - may increase cost of LRU rotation.
          (no impact if memcg is not configured.)
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08e552c6
    • K
      memcg: swap cgroup for remembering usage · 27a7faa0
      KAMEZAWA Hiroyuki 提交于
      For accounting swap, we need a record per swap entry, at least.
      
      This patch adds following function.
        - swap_cgroup_swapon() .... called from swapon
        - swap_cgroup_swapoff() ... called at the end of swapoff.
      
        - swap_cgroup_record() .... record information of swap entry.
        - swap_cgroup_lookup() .... lookup information of swap entry.
      
      This patch just implements "how to record information".  No actual method
      for limit the usage of swap.  These routine uses flat table to record and
      lookup.  "wise" lookup system like radix-tree requires requires memory
      allocation at new records but swap-out is usually called under memory
      shortage (or memcg hits limit.) So, I used static allocation.  (maybe
      dynamic allocation is not very hard but it adds additional memory
      allocation in memory shortage path.)
      
      Note1: In this, we use pointer to record information and this means
            8bytes per swap entry. I think we can reduce this when we
            create "id of cgroup" in the range of 0-65535 or 0-255.
      Reported-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Tested-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reported-by: NHugh Dickins <hugh@veritas.com>
      Reported-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Reported-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27a7faa0
  6. 01 12月, 2008 1 次提交
  7. 23 10月, 2008 1 次提交
    • K
      memcg: fix page_cgroup allocation · 94b6da5a
      KAMEZAWA Hiroyuki 提交于
      page_cgroup_init() is called from mem_cgroup_init(). But at this
      point, we cannot call alloc_bootmem().
      (and this caused panic at boot.)
      
      This patch moves page_cgroup_init() to init/main.c.
      
      Time table is following:
      ==
        parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
        ....
        cgroup_init_early()  # "early" init of cgroup.
        ....
        setup_arch()         # memmap is allocated.
        ...
        page_cgroup_init();
        mem_init();   # we cannot call alloc_bootmem after this.
        ....
        cgroup_init() # mem_cgroup is initialized.
      ==
      
      Before page_cgroup_init(), mem_map must be initialized. So,
      I added page_cgroup_init() to init/main.c directly.
      
      (*) maybe this is not very clean but
          - cgroup_init_early() is too early
          - in cgroup_init(), we have to use vmalloc instead of alloc_bootmem().
          use of vmalloc area in x86-32 is important and we should avoid very large
          vmalloc() in x86-32. So, we want to use alloc_bootmem() and added page_cgroup_init()
          directly to init/main.c
      
      [akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
      [akpm@linux-foundation.org: fix build]
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Tested-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      94b6da5a
  8. 20 10月, 2008 1 次提交
    • K
      memcg: allocate all page_cgroup at boot · 52d4b9ac
      KAMEZAWA Hiroyuki 提交于
      Allocate all page_cgroup at boot and remove page_cgroup poitner from
      struct page.  This patch adds an interface as
      
       struct page_cgroup *lookup_page_cgroup(struct page*)
      
      All FLATMEM/DISCONTIGMEM/SPARSEMEM  and MEMORY_HOTPLUG is supported.
      
      Remove page_cgroup pointer reduces the amount of memory by
       - 4 bytes per PAGE_SIZE.
       - 8 bytes per PAGE_SIZE
      if memory controller is disabled. (even if configured.)
      
      On usual 8GB x86-32 server, this saves 8MB of NORMAL_ZONE memory.
      On my x86-64 server with 48GB of memory, this saves 96MB of memory.
      I think this reduction makes sense.
      
      By pre-allocation, kmalloc/kfree in charge/uncharge are removed.
      This means
        - we're not necessary to be afraid of kmalloc faiulre.
          (this can happen because of gfp_mask type.)
        - we can avoid calling kmalloc/kfree.
        - we can avoid allocating tons of small objects which can be fragmented.
        - we can know what amount of memory will be used for this extra-lru handling.
      
      I added printk message as
      
      	"allocated %ld bytes of page_cgroup"
              "please try cgroup_disable=memory option if you don't want"
      
      maybe enough informative for users.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52d4b9ac