Commit 5b4e655e authored by KAMEZAWA Hiroyuki, committed by Linus Torvalds

memcg: avoid accounting special pages

Some pages that are not on the LRU can still be mapped, and they are not
worth accounting (because we can't shrink them, and handling them needs
ugly special-case code). We'd like to rely on the usual objrmap/radix-tree
protocol and don't want to account pages that are outside the VM's control.

When special_mapping_fault() is called, page->mapping tends to be NULL, so
the page is charged as an anonymous page. insert_page() also handles some
special pages from drivers.

This patch avoids accounting such special pages.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Parent b7abea96
@@ -112,14 +112,22 @@ the per cgroup LRU.
 2.2.1 Accounting details
-All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
-RSS pages are accounted at the time of page_add_*_rmap() unless they've already
-been accounted for earlier. A file page will be accounted for as Page Cache;
-it's mapped into the page tables of a process, duplicate accounting is carefully
-avoided. Page Cache pages are accounted at the time of add_to_page_cache().
-The corresponding routines that remove a page from the page tables or removes
-a page from Page Cache is used to decrement the accounting counters of the
-cgroup.
+All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
+(some pages which never be reclaimable and will not be on global LRU
+ are not accounted. we just accounts pages under usual vm management.)
+
+RSS pages are accounted at page_fault unless they've already been accounted
+for earlier. A file page will be accounted for as Page Cache when it's
+inserted into inode (radix-tree). While it's mapped into the page tables of
+processes, duplicate accounting is carefully avoided.
+
+A RSS page is unaccounted when it's fully unmapped. A PageCache page is
+unaccounted when it's removed from radix-tree.
+
+At page migration, accounting information is kept.
+
+Note: we just account pages-on-lru because our purpose is to control amount
+of used pages. not-on-lru pages are tend to be out-of-control from vm view.
 2.3 Shared Page Accounting
......
@@ -1323,18 +1323,14 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 	pte_t *pte;
 	spinlock_t *ptl;
-	retval = mem_cgroup_charge(page, mm, GFP_KERNEL);
-	if (retval)
-		goto out;
 	retval = -EINVAL;
 	if (PageAnon(page))
-		goto out_uncharge;
+		goto out;
 	retval = -ENOMEM;
 	flush_dcache_page(page);
 	pte = get_locked_pte(mm, addr, &ptl);
 	if (!pte)
-		goto out_uncharge;
+		goto out;
 	retval = -EBUSY;
 	if (!pte_none(*pte))
 		goto out_unlock;
@@ -1350,8 +1346,6 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 	return retval;
 out_unlock:
 	pte_unmap_unlock(pte, ptl);
-out_uncharge:
-	mem_cgroup_uncharge_page(page);
 out:
 	return retval;
 }
@@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page *page;
 	pte_t entry;
 	int anon = 0;
+	int charged = 0;
 	struct page *dirty_page = NULL;
 	struct vm_fault vmf;
 	int ret;
@@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 				ret = VM_FAULT_OOM;
 				goto out;
 			}
+			if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
+				ret = VM_FAULT_OOM;
+				page_cache_release(page);
+				goto out;
+			}
+			charged = 1;
 			/*
 			 * Don't let another task, with possibly unlocked vma,
 			 * keep the mlocked page.
@@ -2543,11 +2544,6 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
-	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
-		ret = VM_FAULT_OOM;
-		goto out;
-	}
 	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
 	/*
@@ -2585,7 +2581,8 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* no need to invalidate: a not-present page won't be cached */
 		update_mmu_cache(vma, address, entry);
 	} else {
-		mem_cgroup_uncharge_page(page);
+		if (charged)
+			mem_cgroup_uncharge_page(page);
 		if (anon)
 			page_cache_release(page);
 		else
......
@@ -727,8 +727,8 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma)
 			page_clear_dirty(page);
 			set_page_dirty(page);
 		}
-		mem_cgroup_uncharge_page(page);
+		if (PageAnon(page))
+			mem_cgroup_uncharge_page(page);
 		__dec_zone_page_state(page,
 			PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
 		/*
......
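The `charged` flag introduced in `__do_fault()` can be illustrated outside the kernel. What follows is a minimal userspace sketch, not kernel code: `toy_page`, `toy_usage`, `toy_charge()`, `toy_uncharge()` and `toy_fault()` are hypothetical names invented for this illustration, and the counter merely stands in for a cgroup's usage accounting. It assumes only what the diff shows: a page is charged when the fault path makes a private (anonymous) copy, and the failure path undoes a charge only if this call made it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cgroup's usage counter. */
static long toy_usage;

/* Hypothetical page descriptor; only tracks whether it was charged. */
struct toy_page {
	bool charged;
};

/* Charge the page to the counter. Always succeeds in this toy model. */
static int toy_charge(struct toy_page *page)
{
	page->charged = true;
	toy_usage++;
	return 0;
}

/* Drop the charge, if any. */
static void toy_uncharge(struct toy_page *page)
{
	if (!page->charged)
		return;
	page->charged = false;
	toy_usage--;
}

/*
 * Model of the fault path after the patch: charge only when a private
 * copy is made, and remember that in a local flag so the failure path
 * uncharges only what this call charged.
 */
static int toy_fault(struct toy_page *page, bool private_copy, bool install_ok)
{
	int charged = 0;

	if (private_copy) {
		if (toy_charge(page))
			return -1;
		charged = 1;
	}

	if (!install_ok) {
		if (charged)
			toy_uncharge(page);
		return -1;
	}
	return 0;
}
```

In this model, a failed fault on a shared or special page (`private_copy == false`) leaves `toy_usage` untouched, which mirrors the reason the patch guards `mem_cgroup_uncharge_page()` with `if (charged)`.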