1. 03 Nov 2011, 1 commit
    • cgroup/kmemleak: Annotate alloc_page() for cgroup allocations · ff7ee93f
      By Steven Rostedt
      When the cgroup base was allocated with kmalloc, it was necessary to
      annotate the variable with kmemleak_not_leak().  But it has recently
      been changed to be allocated with alloc_page() (which skips kmemleak
      checks), and that now causes a warning on boot up.
      
      I was triggering this output:
      
       allocated 8388608 bytes of page_cgroup
       please try 'cgroup_disable=memory' option if you don't want memory cgroups
       kmemleak: Trying to color unknown object at 0xf5840000 as Grey
       Pid: 0, comm: swapper Not tainted 3.0.0-test #12
       Call Trace:
       [<c17e34e6>] ? printk+0x1d/0x1f
        [<c10e2941>] paint_ptr+0x4f/0x78
        [<c178ab57>] kmemleak_not_leak+0x58/0x7d
        [<c108ae9f>] ? __rcu_read_unlock+0x9/0x7d
        [<c1cdb462>] kmemleak_init+0x19d/0x1e9
        [<c1cbf771>] start_kernel+0x346/0x3ec
        [<c1cbf1b4>] ? loglevel+0x18/0x18
        [<c1cbf0aa>] i386_start_kernel+0xaa/0xb0
      
      After a bit of debugging I tracked the object 0xf5840000 (and others)
      down to the cgroup code.  The change from allocating the base with
      kmalloc to alloc_page() means the base no longer passes through
      kmemleak_alloc(), which would have added the pointer to the
      object_tree_root; kmemleak_not_leak() only adds it to the
      crt_early_log[] table.  At kmemleak_init(), the entry is found in the
      early_log[] but not in the object_tree_root, and this error message is
      displayed.
      
      If alloc_page() fails, it falls back to vmalloc(), which still goes
      through kmemleak_alloc(), so the kmemleak_not_leak() call is still
      needed for that path.  The solution is to call kmemleak_alloc()
      directly when alloc_page() succeeds.
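
      A minimal sketch of the resulting allocation pattern (the function name
      and exact flags here are illustrative, not the actual patch):

        #include <linux/gfp.h>
        #include <linux/kmemleak.h>
        #include <linux/vmalloc.h>

        static void *alloc_page_cgroup_base(size_t size)
        {
                void *addr;

                /* page allocations are not tracked by kmemleak, so register
                 * the object by hand when the allocation succeeds */
                addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
                if (addr) {
                        kmemleak_alloc(addr, size, 1, GFP_KERNEL);
                        return addr;
                }

                /* the vmalloc() fallback already calls kmemleak_alloc()
                 * internally, so nothing extra is needed here */
                return vmalloc(size);
        }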
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ff7ee93f
  2. 15 Sep 2011, 1 commit
  3. 26 Jul 2011, 2 commits
  4. 16 Jun 2011, 1 commit
  5. 27 May 2011, 3 commits
  6. 12 May 2011, 1 commit
  7. 31 Mar 2011, 1 commit
  8. 24 Mar 2011, 3 commits
  9. 23 Mar 2011, 1 commit
  10. 19 Jul 2010, 1 commit
  11. 18 Mar 2010, 1 commit
  12. 13 Mar 2010, 1 commit
    • memcg: move charges of anonymous swap · 02491447
      By Daisuke Nishimura
      This patch is another core part of the move-charge-at-task-migration
      feature.  It enables moving the charges of anonymous swap.
      
      To move the charge of swap, we need to exchange swap_cgroup's record.
      
      In the current implementation, swap_cgroup's record is protected by:
      
        - page lock: if the entry is on swap cache.
        - swap_lock: if the entry is not on swap cache.
      
      This works well in usual swap-in/out activity.
      
      But this behavior makes the move-swap-charge feature check many
      conditions in order to exchange swap_cgroup's record safely.

      So I changed swap_cgroup_record(), which modifies swap_cgroup's record,
      to use xchg, and defined a new function to cmpxchg swap_cgroup's record.

      This patch also enables moving the charge of swap caches that are not
      pte_present but not yet uncharged, which can exist on the swap-out path,
      by getting the target pages via find_get_page() as do_mincore() does.
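
      As an illustration of the compare-and-exchange semantics (hypothetical
      names and simplified locking; the real kernel helpers may use cmpxchg()
      or a per-table lock instead):

        #include <linux/spinlock.h>
        #include <linux/types.h>

        struct swap_record {
                unsigned short id;      /* owner (memcg) id for one swap slot */
        };

        static DEFINE_SPINLOCK(swap_record_lock);

        /* returns the old id on success, 0 if the record changed meanwhile */
        static unsigned short swap_record_cmpxchg(struct swap_record *rec,
                                                  unsigned short old,
                                                  unsigned short new)
        {
                unsigned short ret;

                spin_lock(&swap_record_lock);
                ret = rec->id;
                if (ret == old)
                        rec->id = new;  /* move the charge to the new owner */
                else
                        ret = 0;        /* someone else updated it first */
                spin_unlock(&swap_record_lock);
                return ret;
        }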
      
      [kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
      [akpm@linux-foundation.org: fix typos]
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02491447
  13. 22 Sep 2009, 1 commit
  14. 19 Jun 2009, 3 commits
  15. 12 Jun 2009, 2 commits
  16. 03 Apr 2009, 2 commits
    • memcg: remove redundant message at swapon · 627991a2
      By KAMEZAWA Hiroyuki
      It has been pointed out that swap_cgroup's message at swapon() is
      pointless, because:

        * It can be calculated very easily if all the necessary information
          is written in Kconfig.

        * There is no need to annoy people at every swapon().

      Besides, memory usage per swp_entry has now been reduced to 2 bytes
      from 8 bytes (64-bit), which I think is reasonably small.
      Reported-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      627991a2
    • cgroups: use css id in swap cgroup for saving memory v5 · a3b2d692
      By KAMEZAWA Hiroyuki
      Try to use the CSS ID for records in swap_cgroup.  With this, on a
      64-bit machine, the size of a swap_cgroup record goes down from 8 bytes
      to 2 bytes.
      
      This means that, when 2GB of swap is equipped (assuming a page size of
      4096 bytes):

      	From: size of swap_cgroup = 2G/4k * 8 = 4 Mbytes.
      	To:   size of swap_cgroup = 2G/4k * 2 = 1 Mbyte.
      
      The reduction is large.  Of course, there are trade-offs: this CSS ID
      lookup adds overhead to swap-in/swap-out/swap-free.

      But in general,
        - swap is a resource which users tend to avoid using.
        - If swap is never used, the swap_cgroup area is not used.
        - Traditional manuals say the size of swap should be proportional to
          the size of memory, and machine memory sizes keep increasing.
      
      I think reducing size of swap_cgroup makes sense.
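
      A sketch of the record shrink being described (structure names are
      hypothetical and the layout is simplified):

        struct mem_cgroup;

        /* before: one pointer per swap slot, 8 bytes on 64-bit */
        struct swap_cgroup_old {
                struct mem_cgroup *mem;
        };

        /* after: a 16-bit css id per swap slot, 2 bytes */
        struct swap_cgroup_new {
                unsigned short id;
        };

        /*
         * 2GB of swap / 4KB pages = 524288 slots:
         *   524288 * 8 bytes = 4 Mbytes of records before,
         *   524288 * 2 bytes = 1 Mbyte after.
         */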
      
      Note:
        - The ID->CSS lookup routine takes no locks; it runs under the RCU
          read side.
        - memcg can become obsolete at rmdir() but is not freed while a
          refcount from swap_cgroup is still held.
      
      Changelog v4->v5:
       - reworked on to memcg-charge-swapcache-to-proper-memcg.patch
      Changelog ->v4:
       - fixed not configured case.
       - deleted unnecessary comments.
       - fixed NULL pointer bug.
       - fixed message in dmesg.
      
      [nishimura@mxp.nes.nec.co.jp: css_tryget can be called twice in !PageCgroupUsed case]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a3b2d692
  17. 12 Feb 2009, 1 commit
  18. 09 Jan 2009, 4 commits
    • memcg: add mem_cgroup_disabled() · f8d66542
      By Hirokazu Takahashi
      We check whether mem_cgroup is disabled by testing
      mem_cgroup_subsys.disabled, and I think that now has more call sites
      than expected.

      Replacing
         if (mem_cgroup_subsys.disabled)
      with
         if (mem_cgroup_disabled())

      reads better, I think.
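
      A minimal sketch of the proposed wrapper (assuming the
      mem_cgroup_subsys object with a 'disabled' flag referenced above):

        #include <linux/cgroup.h>
        #include <linux/types.h>

        extern struct cgroup_subsys mem_cgroup_subsys;

        static inline bool mem_cgroup_disabled(void)
        {
                /* one place that knows how "memcg is off" is encoded */
                if (mem_cgroup_subsys.disabled)
                        return true;
                return false;
        }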
      
      [kamezawa.hiroyu@jp.fujitsu.com: fix typo]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f8d66542
    • memcg: synchronized LRU · 08e552c6
      By KAMEZAWA Hiroyuki
      A big patch for changing memcg's LRU semantics.
      
      Currently:
        - page_cgroup is linked to mem_cgroup's own LRU (per zone).

        - The LRU of page_cgroup is not synchronized with the global LRU.

        - page and page_cgroup are one-to-one and statically allocated.

        - To find out which LRU a page_cgroup is on, you have to check
          pc->mem_cgroup, as in lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);

        - SwapCache is handled.
      
      And when we handle the LRU list of page_cgroup, we do the following:
      
      	pc = lookup_page_cgroup(page);
      	lock_page_cgroup(pc); .....................(1)
      	mz = page_cgroup_zoneinfo(pc);
      	spin_lock(&mz->lru_lock);
      	.....add to LRU
      	spin_unlock(&mz->lru_lock);
      	unlock_page_cgroup(pc);
      
      But (1) is a spin_lock, and we have to worry about deadlock with zone->lru_lock;
      so trylock() is used at (1) for now.  Without (1), we can't trust that "mz" is correct.
      
      This is an attempt to remove this dirty nesting of locks.
      This patch changes mz->lru_lock to be zone->lru_lock.
      Then the above sequence can be written as:
      
              spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
      	mem_cgroup_add/remove/etc_lru() {
      		pc = lookup_page_cgroup(page);
      		mz = page_cgroup_zoneinfo(pc);
      		if (PageCgroupUsed(pc)) {
      			....add to LRU
      		}
      	}
              spin_unlock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
      
      This is much simpler.
      (*) We're safe even if we don't take lock_page_cgroup(pc). Because:
          1. pc->mem_cgroup can only be modified
             - at charge,
             - at account_move().
          2. At charge,
             the PCG_USED bit is not set before pc->mem_cgroup is fixed.
          3. At account_move(),
             the page is isolated and not on the LRU.
      
      Pros:
        - easier to maintain.
        - memcg can make use of the laziness of pagevec.
        - we don't have to duplicate the LRU/Active/Unevictable bits in page_cgroup.
        - the LRU status of memcg will be synchronized with the global LRU's.
        - the number of locks is reduced.
        - account_move() is simplified very much.
      Cons:
        - may increase the cost of LRU rotation.
          (no impact if memcg is not configured.)
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      08e552c6
    • memcg: swap cgroup for remembering usage · 27a7faa0
      By KAMEZAWA Hiroyuki
      For accounting swap, we need a record per swap entry, at least.
      
      This patch adds the following functions:
        - swap_cgroup_swapon() .... called from swapon
        - swap_cgroup_swapoff() ... called at the end of swapoff.
      
        - swap_cgroup_record() .... record information of swap entry.
        - swap_cgroup_lookup() .... lookup information of swap entry.
      
      This patch just implements "how to record information"; it adds no
      actual method for limiting the usage of swap.  These routines use a
      flat table for record and lookup.  A "wise" lookup system like a
      radix-tree requires memory allocation when adding new records, but
      swap-out is usually called under memory shortage (or when memcg hits
      its limit), so I used static allocation.  (Dynamic allocation is
      probably not very hard, but it adds an extra memory allocation to the
      memory-shortage path.)
      
      Note1: Here, we use a pointer to record the information, which means
            8 bytes per swap entry.  I think we can reduce this once we
            create an "id of cgroup" in the range of 0-65535 or 0-255.
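
      A rough sketch of the flat-table idea (a single swap device and
      hypothetical names; the real code sizes one table per swap device at
      swapon time):

        #include <linux/errno.h>
        #include <linux/swap.h>
        #include <linux/swapops.h>
        #include <linux/vmalloc.h>

        struct swap_rec {
                void *val;                      /* recorded owner for one entry */
        };

        static struct swap_rec *swap_table;     /* one slot per swap entry */

        /* called from swapon: allocate the whole table up front, so the
         * swap-out path never needs to allocate anything */
        static int swap_table_swapon(unsigned long max_pages)
        {
                swap_table = vzalloc(max_pages * sizeof(*swap_table));
                return swap_table ? 0 : -ENOMEM;
        }

        /* record: store the new owner and return the previous one */
        static void *swap_table_record(swp_entry_t ent, void *val)
        {
                struct swap_rec *rec = &swap_table[swp_offset(ent)];
                void *old = rec->val;

                rec->val = val;
                return old;
        }

        /* lookup: a plain array index, no allocation, no tree walk */
        static void *swap_table_lookup(swp_entry_t ent)
        {
                return swap_table[swp_offset(ent)].val;
        }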
      Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reported-by: Hugh Dickins <hugh@veritas.com>
      Reported-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27a7faa0
    • memcg: do not recalculate section unnecessarily in init_section_page_cgroup · 0753b0ef
      By Fernando Luis Vazquez Cao
      In init_section_page_cgroup() the section a given pfn belongs to is
      calculated at the top of the function and, despite the fact that the
      pfn/section correspondence does not change, it is recalculated further
      down the same function.  By computing this just once and reusing that
      value we save some bytes in the object file and do not waste CPU cycles.
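
      Schematically (a hypothetical, simplified function; only the point
      about computing the section once matters here):

        #include <linux/mmzone.h>

        static unsigned long init_section_example(unsigned long pfn)
        {
                /* the pfn/section correspondence never changes, so derive
                 * the section number once at the top ... */
                unsigned long section_nr = pfn_to_section_nr(pfn);

                /* ... and have every later use read 'section_nr' instead of
                 * recomputing pfn_to_section_nr(pfn) further down */
                return section_nr;
        }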
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0753b0ef
  19. 07 Jan 2009, 1 commit
  20. 11 Dec 2008, 1 commit
  21. 02 Dec 2008, 1 commit
    • memcg: memory hotplug fix for notifier callback · dc19f9db
      By KAMEZAWA Hiroyuki
      Fixes for memcg/memory hotplug.
      
      While memory hotplug allocates/frees the memmap, page_cgroup is not
      freed at OFFLINE when it was allocated via bootmem (because freeing
      bootmem requires special care).

      Then, if page_cgroup was allocated from bootmem and the memmap is
      freed/allocated by memory hotplug, page_cgroup->page == page is no
      longer true.
      
      But the current MEM_ONLINE handler doesn't check this, and doesn't
      update page_cgroup->page when it is not necessary to allocate
      page_cgroup.  (This was not noticed because the memmap is not freed
      if SPARSEMEM_VMEMMAP=y.)

      I also noticed that MEM_ONLINE can be called against "part of a
      section", so freeing page_cgroup at CANCEL_ONLINE would cause trouble
      (it would free page_cgroup that is still in use).  Don't roll back at
      CANCEL.
      
      One more thing: the current memory hotplug notifier chain is stopped
      by slub because it sets NOTIFY_STOP_MASK in its return value, so
      page_cgroup's callback is never called (it now has lower priority
      than slub's).

      I think this slub behavior is not intentional (a bug), and this patch
      fixes it.
      
      Another approach worth considering for page_cgroup allocation:
        - free page_cgroup at OFFLINE even if it came from bootmem, and
          remove the special handler.  But that requires more changes.
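
      A hedged sketch of the notifier shape being discussed (the helper
      names and the exact set of actions handled are illustrative, not the
      patch itself):

        #include <linux/memory.h>
        #include <linux/notifier.h>

        /* placeholders for the real allocate/free helpers */
        static int online_page_cgroup(unsigned long start_pfn,
                                      unsigned long nr_pages)
        {
                return 0;       /* allocate page_cgroup for the new range */
        }

        static void offline_page_cgroup(unsigned long start_pfn,
                                        unsigned long nr_pages)
        {
                /* free non-bootmem page_cgroup for the removed range */
        }

        static int page_cgroup_hotplug_callback(struct notifier_block *self,
                                                unsigned long action, void *arg)
        {
                struct memory_notify *mn = arg;
                int ret = 0;

                switch (action) {
                case MEM_GOING_ONLINE:
                        ret = online_page_cgroup(mn->start_pfn, mn->nr_pages);
                        break;
                case MEM_OFFLINE:
                        offline_page_cgroup(mn->start_pfn, mn->nr_pages);
                        break;
                case MEM_CANCEL_ONLINE:
                        /* may cover only part of a section, so do NOT free
                         * page_cgroup that may still be in use */
                        break;
                default:
                        break;
                }
                /* report only real errors; a plain NOTIFY_OK keeps the rest
                 * of the (lower-priority) notifier chain running */
                return notifier_from_errno(ret);
        }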
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12041
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dc19f9db
  22. 01 Dec 2008, 1 commit
  23. 13 Nov 2008, 1 commit
  24. 23 Oct 2008, 2 commits
  25. 20 Oct 2008, 1 commit
    • memcg: allocate all page_cgroup at boot · 52d4b9ac
      By KAMEZAWA Hiroyuki
      Allocate all page_cgroup at boot and remove the page_cgroup pointer
      from struct page.  This patch adds an interface:

       struct page_cgroup *lookup_page_cgroup(struct page *)

      FLATMEM, DISCONTIGMEM, SPARSEMEM and MEMORY_HOTPLUG are all supported.
      
      Removing the page_cgroup pointer reduces the amount of memory by
       - 4 bytes per PAGE_SIZE (32-bit),
       - 8 bytes per PAGE_SIZE (64-bit),
      even if the memory controller is disabled (but configured in).

      On a usual 8GB x86-32 server, this saves 8MB of NORMAL_ZONE memory.
      On my x86-64 server with 48GB of memory, this saves 96MB of memory.
      I think this reduction makes sense.
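
      A minimal sketch of the lookup for the simplest (flat) layout, assuming
      one statically allocated page_cgroup per page frame; the fields and
      globals here are simplified, and the SPARSEMEM variant hangs the array
      off each mem_section instead:

        #include <linux/list.h>
        #include <linux/mm.h>

        struct page_cgroup {
                unsigned long flags;
                struct mem_cgroup *mem_cgroup;
                struct page *page;
                struct list_head lru;           /* per-memcg LRU linkage */
        };

        static struct page_cgroup *base;        /* allocated at boot for the node */
        static unsigned long base_start_pfn;

        struct page_cgroup *lookup_page_cgroup(struct page *page)
        {
                unsigned long pfn = page_to_pfn(page);

                /* no pointer in struct page any more: the page_cgroup for a
                 * page is found purely by its pfn offset into the array */
                return base + (pfn - base_start_pfn);
        }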
      
      By pre-allocating, the kmalloc/kfree in charge/uncharge are removed.
      This means:
        - we no longer need to worry about kmalloc failure
          (which can happen because of the gfp_mask type).
        - we avoid calling kmalloc/kfree.
        - we avoid allocating tons of small objects which can become fragmented.
        - we know how much memory will be used for this extra-lru handling.
      
      I added printk messages:
      
      	"allocated %ld bytes of page_cgroup"
              "please try cgroup_disable=memory option if you don't want"
      
      This is probably informative enough for users.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      52d4b9ac