Commit 03f3c433 authored by KAMEZAWA Hiroyuki, committed by Linus Torvalds

memcg: fix swap accounting leak

Fix the swap-in charge operation of memcg.

Now, memcg has hooks into the swap-out operation and checks whether the
SwapCache is really unused or not.  That check depends on the contents of
struct page: if PageAnon(page) && page_mapped(page), the page is recognized
as still in use.

Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
established. Then, in the following sequence

	(Page fault with WRITE)
	try_charge() (charge += PAGESIZE)
	commit_charge() (Check page_cgroup is used or not..)
	reuse_swap_page()
	-> delete_from_swap_cache()
			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
	......
The new charge is uncharged soon....
To avoid this, move commit_charge() to after page_mapcount() goes up to 1.
By this,

	try_charge()		(usage += PAGESIZE)
	reuse_swap_page()	(may do usage -= PAGESIZE if PCG_USED is set)
	commit_charge()		(if page_cgroup is not marked as PCG_USED,
				 add a new charge)

The accounting will then be correct.
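
As a purely illustrative aid (not part of the commit), the two orderings can
be modeled by a few lines of user-space C; try_charge(), commit_charge() and
uncharge_swapcache() below are simplified stand-ins for the kernel functions
of similar names, not the real implementations:

	#include <stdbool.h>
	#include <stdio.h>

	#define PAGESIZE 4096

	static long usage;      /* models memory.usage for the group */
	static bool pcg_used;   /* models the PCG_USED flag on page_cgroup */

	static void try_charge(void)    { usage += PAGESIZE; }
	static void commit_charge(void) { pcg_used = true; }

	/* models mem_cgroup_uncharge_swapcache(), reached from
	 * reuse_swap_page() -> delete_from_swap_cache() */
	static void uncharge_swapcache(void)
	{
		if (pcg_used) {         /* only a committed page is uncharged */
			usage -= PAGESIZE;
			pcg_used = false;
		}
	}

	int main(void)
	{
		/* old order: commit first, so the fresh charge is dropped */
		usage = 0; pcg_used = false;
		try_charge();
		commit_charge();
		uncharge_swapcache();
		printf("old order: usage=%ld\n", usage);  /* 0, page stays mapped */

		/* new order: uncharge sees !PCG_USED and is a no-op */
		usage = 0; pcg_used = false;
		try_charge();
		uncharge_swapcache();
		commit_charge();
		printf("new order: usage=%ld\n", usage);  /* PAGESIZE, correct */
		return 0;
	}

With the old order the printed usage is 0 although the page stays mapped;
with the new order it is PAGESIZE, matching the sequence above.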

Changelog (v2) -> (v3)
  - fixed invalid charge to swp_entry==0.
  - updated documentation.
Changelog (v1) -> (v2)
  - fixed comment.

[nishimura@mxp.nes.nec.co.jp: swap accounting leak doc fix]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Parent 42e9abb6
Documentation/controllers/memcg_test.txt
 Memory Resource Controller(Memcg) Implementation Memo.
-Last Updated: 2008/12/10
+Last Updated: 2008/12/15
-Base Kernel Version: based on 2.6.28-rc7-mm.
+Base Kernel Version: based on 2.6.28-rc8-mm.
 
 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
 is complex. This is a document for memcg's internal behavior.
...
@@ -111,9 +111,40 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	(b) If the SwapCache has been mapped by processes, it has been
 	    charged already.
 
-	In case (a), we charge it. In case (b), we don't charge it.
-	(But racy state between (a) and (b) exists. We do check it.)
-	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
+	This swap-in is one of the most complicated work. In do_swap_page(),
+	following events occur when pte is unchanged.
+
+	(1) the page (SwapCache) is looked up.
+	(2) lock_page()
+	(3) try_charge_swapin()
+	(4) reuse_swap_page() (may call delete_swap_cache())
+	(5) commit_charge_swapin()
+	(6) swap_free().
+
+	Considering following situation for example.
+
+	(A) The page has not been charged before (2) and reuse_swap_page()
+	    doesn't call delete_from_swap_cache().
+	(B) The page has not been charged before (2) and reuse_swap_page()
+	    calls delete_from_swap_cache().
+	(C) The page has been charged before (2) and reuse_swap_page() doesn't
+	    call delete_from_swap_cache().
+	(D) The page has been charged before (2) and reuse_swap_page() calls
+	    delete_from_swap_cache().
+
+	memory.usage/memsw.usage changes to this page/swp_entry will be
+
+	    Case         (A)      (B)      (C)      (D)
+	    Event
+	  Before (2)     0/ 1     0/ 1     1/ 1     1/ 1
+	  ============================================
+	    (3)         +1/+1    +1/+1    +1/+1    +1/+1
+	    (4)           -       0/ 0      -      -1/ 0
+	    (5)          0/-1     0/ 0    -1/-1     0/ 0
+	    (6)           -       0/-1      -       0/-1
+	  ============================================
+	  Result         1/ 1     1/ 1     1/ 1     1/ 1
+
+	In any cases, charges to this page should be 1/ 1.
 
 4.2 Swap-out.
 	At swap-out, typical state transition is below.
...
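
The table's arithmetic can be double-checked mechanically. Below is a small
user-space C program (illustrative only, not part of the commit) that sums
the per-event deltas for each of the cases (A)-(D) and asserts that every
column ends at 1/ 1:

	#include <assert.h>
	#include <stdio.h>

	struct delta { int mem, memsw; };

	int main(void)
	{
		const char *name[] = { "A", "B", "C", "D" };
		const struct delta before[] = { {0,1}, {0,1}, {1,1}, {1,1} };
		/* rows are events (3)(4)(5)(6); '-' in the table means {0,0} */
		const struct delta ev[4][4] = {
			{ {1,1},  {1,1},  {1,1},   {1,1}  },   /* (3) */
			{ {0,0},  {0,0},  {0,0},   {-1,0} },   /* (4) */
			{ {0,-1}, {0,0},  {-1,-1}, {0,0}  },   /* (5) */
			{ {0,0},  {0,-1}, {0,0},   {0,-1} },   /* (6) */
		};

		for (int c = 0; c < 4; c++) {
			struct delta d = before[c];
			for (int e = 0; e < 4; e++) {
				d.mem   += ev[e][c].mem;
				d.memsw += ev[e][c].memsw;
			}
			printf("case (%s): %d/%d\n", name[c], d.mem, d.memsw);
			assert(d.mem == 1 && d.memsw == 1);
		}
		return 0;
	}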
mm/memcontrol.c
@@ -1169,10 +1169,11 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
 	/*
 	 * Now swap is on-memory. This means this page may be
 	 * counted both as mem and swap....double count.
-	 * Fix it by uncharging from memsw. This SwapCache is stable
-	 * because we're still under lock_page().
+	 * Fix it by uncharging from memsw. Basically, this SwapCache is stable
+	 * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
+	 * may call delete_from_swap_cache() before reach here.
 	 */
-	if (do_swap_account) {
+	if (do_swap_account && PageSwapCache(page)) {
 		swp_entry_t ent = {.val = page_private(page)};
 		struct mem_cgroup *memcg;
 		memcg = swap_cgroup_record(ent, NULL);
...
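
The new PageSwapCache(page) test above guards against reading page_private()
after reuse_swap_page() has deleted the page from the swap cache; at that
point page->private no longer holds a valid swp_entry (presumably the
"invalid charge to swp_entry==0" fixed in v3 of the patch). Here is a minimal
user-space model of that guard; the struct and functions are simplified
stand-ins, not the kernel's:

	#include <stdbool.h>
	#include <stdio.h>

	/* simplified stand-in for the kernel's struct page */
	struct page {
		bool swapcache;          /* models PageSwapCache(page) */
		unsigned long private;   /* swp_entry_t.val while in swap cache */
	};

	static void delete_from_swap_cache(struct page *p)
	{
		p->swapcache = false;
		p->private = 0;          /* the swap entry is gone from the page */
	}

	static void commit_charge_swapin(struct page *p)
	{
		/* the added check: only touch memsw while swp_entry is valid */
		if (p->swapcache)
			printf("uncharge memsw for swp_entry %lu\n", p->private);
		else
			printf("skip memsw: page left the swap cache\n");
	}

	int main(void)
	{
		struct page page = { .swapcache = true, .private = 42 };

		delete_from_swap_cache(&page);  /* e.g. via reuse_swap_page() */
		commit_charge_swapin(&page);    /* must not act on swp_entry==0 */
		return 0;
	}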
mm/memory.c
@@ -2457,22 +2457,23 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * while the page is counted on swap but not yet in mapcount i.e.
 	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
 	 * must be called after the swap_free(), or it will never succeed.
-	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
-	 * in page->private, must be called before reuse_swap_page(),
-	 * which may delete_from_swap_cache().
+	 * Because delete_from_swap_page() may be called by reuse_swap_page(),
+	 * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry
+	 * in page->private. In this case, a record in swap_cgroup is silently
+	 * discarded at swap_free().
 	 */
-	mem_cgroup_commit_charge_swapin(page, ptr);
 	inc_mm_counter(mm, anon_rss);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && reuse_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		write_access = 0;
 	}
 	flush_icache_page(vma, page);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
+	/* It's better to call commit-charge after rmap is established */
+	mem_cgroup_commit_charge_swapin(page, ptr);
 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
...