Commits · ecca5c3acc0d0933d89abc44e60afb0cc8170e35 · openeuler / raspberrypi-kernel

13 Apr, 2012 1 commit

memcg: fix up documentation on global LRU · 6252efcc

Committed by Ying Han on Apr 12, 2012

In v3.3-rc1, the global LRU was removed in commit 925b7673 ("mm:
make per-memcg LRU lists exclusive").  The patch fixes up the memcg
docs.

I left the swap section to someone who has a better understanding of
'memory+swap'.

Signed-off-by: Ying Han <yinghan@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13 Jan, 2012 2 commits

memcg: fix pgpgin/pgpgout documentation · 0527b690

Committed by Ying Han on Jan 12, 2012

The two memcg stats pgpgin/pgpgout have different meanings from the ones
in vmstat, which indicates that we picked bad names for them.

It might be too late to change the stat names, but better documentation is
always helpful.

Signed-off-by: Ying Han <yinghan@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation/cgroups/memory.txt: fix typo · d66c1ce7

Committed by Zhu Yanhai on Jan 12, 2012

It should be memsw.max_usage_in_bytes. This typo has been there for
a really long time.

Signed-off-by: Zhu Yanhai <gaoyang.zyh@taobao.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

23 Dec, 2011 1 commit

Partial revert "Basic kernel memory functionality for the Memory Controller" · 65c64ce8

Committed by Glauber Costa on Dec 22, 2011

This reverts commit e5671dfa.

After a follow up discussion with Michal, it was agreed it would
be better to leave the kmem controller with just the tcp files,
deferring the behavior of the other general memory.kmem.* files
for a later time, when more caches are controlled. This is because
generic kmem files are not used by tcp accounting and it is
not clear how other slab caches would fit into the scheme.

We are reverting the original commit so we can track the reference.
Part of the patch is kept, because it was used by the later tcp
code. Conflicts are shown at the bottom. init/Kconfig is removed from
the revert entirely.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
CC: Kirill A. Shutemov <kirill@shutemov.name>
CC: Paul Menage <paul@paulmenage.org>
CC: Greg Thelen <gthelen@google.com>
CC: Johannes Weiner <jweiner@redhat.com>
CC: David S. Miller <davem@davemloft.net>

Conflicts:

	Documentation/cgroups/memory.txt
	mm/memcontrol.c

Signed-off-by: David S. Miller <davem@davemloft.net>

13 Dec, 2011 5 commits

Display current tcp memory allocation in kmem cgroup · 5a6dd343

Committed by Glauber Costa on Dec 11, 2011

This patch introduces kmem.tcp.usage_in_bytes file, living in the
kmem_cgroup filesystem. It is a simple read-only file that displays the
amount of kernel memory currently consumed by the cgroup.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp buffer limitation: per-cgroup limit · 3aaabe23

Committed by Glauber Costa on Dec 11, 2011

This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to
effectively control the amount of kernel memory pinned by a cgroup.

This value is ignored in the root cgroup, and in all others,
caps the value specified by the admin in the net namespaces'
view of tcp_sysctl_mem.

If namespaces are being used, the admin is allowed to set a
value bigger than cgroup's maximum, the same way it is allowed
to set pretty much unlimited values in a real box.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp memory pressure controls · d1a4c0b3

Committed by Glauber Costa on Dec 11, 2011

This patch introduces memory pressure controls for the tcp
protocol. It uses the generic socket memory pressure code
introduced in earlier patches, and fills in the
necessary data in cg_proto struct.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: initial cgroup code. · e1aab161

Committed by Glauber Costa on Dec 11, 2011

The goal of this work is to move the memory pressure tcp
controls to a cgroup, instead of just relying on global
conditions.

To avoid excessive overhead in the network fast paths,
the code that accounts allocated memory to a cgroup is
hidden inside a static_branch(). This branch is patched out
until the first non-root cgroup is created. So when nobody
is using cgroups, even if it is mounted, no significant performance
penalty should be seen.

This patch handles the generic part of the code, and has nothing
tcp-specific.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtsu.com>
CC: Kirill A. Shutemov <kirill@shutemov.name>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Basic kernel memory functionality for the Memory Controller · e5671dfa

Committed by Glauber Costa on Dec 11, 2011

This patch lays down the foundation for the kernel memory component
of the Memory Controller.

As of today, I am only laying down the following files:

 * memory.independent_kmem_limit
 * memory.kmem.limit_in_bytes (currently ignored)
 * memory.kmem.usage_in_bytes (always zero)

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Kirill A. Shutemov <kirill@shutemov.name>
CC: Paul Menage <paul@paulmenage.org>
CC: Greg Thelen <gthelen@google.com>
CC: Johannes Weiner <jweiner@redhat.com>
CC: Michal Hocko <mhocko@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Nov, 2011 1 commit

memcg: skip scanning active lists based on individual size · 9b272977

Committed by Johannes Weiner on Nov 02, 2011

Reclaim decides to skip scanning an active list when the corresponding
inactive list is above a certain size in comparison, so that the assumed
working set is left alone while there are still enough reclaim candidates
around.

The memcg implementation of comparing those lists instead reports whether
the whole memcg is low on the requested type of inactive pages,
considering all nodes and zones.

This can lead to an oversized active list not being scanned because of the
state of the other lists in the memcg, as well as an active list being
scanned while its corresponding inactive list has enough pages.

Not only is this wrong, it's also a scalability hazard, because the global
memory state over all nodes and zones has to be gathered for each memcg
and zone scanned.

Make these calculations purely based on the size of the two LRU lists
that are actually affected by the outcome of the decision.
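
The per-list decision described above can be sketched roughly as follows. This is only an illustration: the function names, the `ratio` parameter, and the exact comparison are assumptions, not the kernel's code.

```python
def inactive_list_is_low(nr_active: int, nr_inactive: int, ratio: int = 1) -> bool:
    """Return True when the inactive list is small relative to its active list.

    Hypothetical helper: only the sizes of the two affected LRU lists are
    consulted, never the state of other lists, nodes, or zones.
    """
    return nr_inactive * ratio < nr_active


def should_scan_active(nr_active: int, nr_inactive: int, ratio: int = 1) -> bool:
    # Scan the active list only when its own inactive counterpart is
    # running low on reclaim candidates; otherwise leave it alone.
    return inactive_list_is_low(nr_active, nr_inactive, ratio)
```

Because both inputs are local to one pair of lists, the check costs the same regardless of how many nodes and zones the memcg spans.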

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

15 Sep, 2011 1 commit

memcg: Revert "memcg: add memory.vmscan_stat" · 185efc0f

Committed by Johannes Weiner on Sep 14, 2011

Revert the post-3.0 commit 82f9d486 ("memcg: add
memory.vmscan_stat").

The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.

The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between.  Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.
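
A minimal sketch of what "perfectly recursive" means here, using a hypothetical `Group` type (not a kernel structure): every level reports its own count plus the totals of all of its descendants, so no intermediate level is skipped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    """Hypothetical stand-in for one memcg in a hierarchy."""
    name: str
    local: int = 0  # events accounted to this group itself
    children: List["Group"] = field(default_factory=list)


def hierarchical_total(group: Group) -> int:
    # Each level is the sum of itself and all its children, recursively.
    return group.local + sum(hierarchical_total(c) for c in group.children)
```

With this definition a parent's total can never be smaller than the total of any group below it, which is the property the reverted statistics did not have.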

Since this exports statistics to userspace, this may lead to confusion
and problems with changing things after the release, so revert it now;
we can try again later.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

27 Jul, 2011 1 commit

memcg: add memory.vmscan_stat · 82f9d486

Committed by KAMEZAWA Hiroyuki on Jul 26, 2011

The commit log of 0ae5e89c ("memcg: count the soft_limit reclaim
in...") says it adds scanning stats to the memory.stat file.  But it
doesn't, because we considered that we needed to reach a consensus on
such new APIs.

This patch is a trial to add memory.vmscan_stat. This shows
  - the number of scanned pages (total, anon, file)
  - the number of rotated pages (total, anon, file)
  - the number of freed pages (total, anon, file)
  - the elapsed time (including sleep/pause time)

  for both direct and soft reclaim.

The biggest difference from Ying's original one is that this file
can be reset by a write, as

  # echo 0 > ...../memory.vmscan_stat

Example output is shown here. This is the result after make -j 6 kernel
under a 300M limit.

  [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
  scanned_pages_by_limit 9471864
  scanned_anon_pages_by_limit 6640629
  scanned_file_pages_by_limit 2831235
  rotated_pages_by_limit 4243974
  rotated_anon_pages_by_limit 3971968
  rotated_file_pages_by_limit 272006
  freed_pages_by_limit 2318492
  freed_anon_pages_by_limit 962052
  freed_file_pages_by_limit 1356440
  elapsed_ns_by_limit 351386416101
  scanned_pages_by_system 0
  scanned_anon_pages_by_system 0
  scanned_file_pages_by_system 0
  rotated_pages_by_system 0
  rotated_anon_pages_by_system 0
  rotated_file_pages_by_system 0
  freed_pages_by_system 0
  freed_anon_pages_by_system 0
  freed_file_pages_by_system 0
  elapsed_ns_by_system 0
  scanned_pages_by_limit_under_hierarchy 9471864
  scanned_anon_pages_by_limit_under_hierarchy 6640629
  scanned_file_pages_by_limit_under_hierarchy 2831235
  rotated_pages_by_limit_under_hierarchy 4243974
  rotated_anon_pages_by_limit_under_hierarchy 3971968
  rotated_file_pages_by_limit_under_hierarchy 272006
  freed_pages_by_limit_under_hierarchy 2318492
  freed_anon_pages_by_limit_under_hierarchy 962052
  freed_file_pages_by_limit_under_hierarchy 1356440
  elapsed_ns_by_limit_under_hierarchy 351386416101
  scanned_pages_by_system_under_hierarchy 0
  scanned_anon_pages_by_system_under_hierarchy 0
  scanned_file_pages_by_system_under_hierarchy 0
  rotated_pages_by_system_under_hierarchy 0
  rotated_anon_pages_by_system_under_hierarchy 0
  rotated_file_pages_by_system_under_hierarchy 0
  freed_pages_by_system_under_hierarchy 0
  freed_anon_pages_by_system_under_hierarchy 0
  freed_file_pages_by_system_under_hierarchy 0
  elapsed_ns_by_system_under_hierarchy 0

total_xxxx is for hierarchy management.
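
The output above is a plain list of `name value` pairs, one per line, so it can be read with a few lines of code. The following is a sketch only; `parse_stat` and `freed_ratio` are hypothetical helper names, and the ratio is just one example of what a tool might derive from these counters.

```python
from typing import Dict


def parse_stat(text: str) -> Dict[str, int]:
    """Parse 'name value' lines, as shown in the example output, into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        stats[key] = int(value)
    return stats


def freed_ratio(stats: Dict[str, int], suffix: str = "by_limit") -> float:
    """Fraction of scanned pages that were freed, or 0.0 if nothing was scanned."""
    scanned = stats.get(f"scanned_pages_{suffix}", 0)
    freed = stats.get(f"freed_pages_{suffix}", 0)
    return freed / scanned if scanned else 0.0
```

A caller would pass the text read from the stat file to `parse_stat` and then query individual counters by name.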

This will be useful for further memcg development and needs to be
developed before we do some complicated rework on LRU/softlimit
management.

This patch adds a new struct memcg_scanrecord into the scan_control struct.
sc->nr_scanned et al. are not designed for exporting information.  For
example, nr_scanned is reset frequently and incremented by 2 when scanning
mapped pages.

To avoid complexity, I added a new param in scan_control which is for
exporting the scanning score.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Andrew Bresticker <abrestic@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

16 Jun, 2011 3 commits

Documentation: fix cgroup typos and formatting · 67de0162

Committed by Jörg Sommer on Jun 15, 2011

Fix format and spelling.

Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: update cgroupfs mount point · f6e07d38

Committed by Jörg Sommer on Jun 15, 2011

According to commit 676db4af ("cgroupfs: create /sys/fs/cgroup to
mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
is /sys/fs/cgroup.  Hence, this should be used in the documentation.

Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: add documentation for the memory.numastat API · 50c35e5b

Committed by Ying Han on Jun 15, 2011

[akpm@linux-foundation.org: rework text, fit it into 80-cols]

Signed-off-by: Ying Han <yinghan@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Apr, 2011 1 commit

memcg: update documentation to describe usage_in_bytes · a111c966

Committed by Daisuke Nishimura on Apr 27, 2011

Since 569b846d ("memcg: coalesce uncharge during unmap/truncate"), we do
batched (delayed) uncharge at truncation/unmap.  And since cdec2e42 ("memcg:
coalesce charging via percpu storage"), we have a percpu cache for
res_counter.

These changes greatly improved the performance of the memory cgroup, but made
res_counter->usage usually have a bigger value than the actual value of
memory usage.  So, *.usage_in_bytes, which show res_counter->usage, are
no longer suitable for reading precise values of memory (and swap) usage.

Instead of removing these files completely (because we cannot know
res_counter->usage without them), this patch updates the meaning of those
files.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17 Feb, 2011 1 commit

memcg: clarify use_hierarchy documentation · 689bca3b

Committed by Greg Thelen on Feb 16, 2011

The memcg code does not allow changing memory.use_hierarchy if the
parent cgroup has enabled use_hierarchy.  Update documentation to match
the code.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

14 Jan, 2011 2 commits

revert documentation update for memcg's dirty ratio. · 11ff26c8

Committed by KAMEZAWA Hiroyuki on Jan 14, 2011

Subject: Revert memory cgroup dirty_ratio Documentation.

Commit ece72400 adds documentation
for memcg's dirty ratio. But the function is not implemented yet.
Remove the documentation to avoid confusing users.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: document cgroup dirty memory interfaces · ece72400

Committed by Greg Thelen on Jan 13, 2011

Document cgroup dirty memory interfaces and statistics.

[akpm@linux-foundation.org: fix use_hierarchy description]

Signed-off-by: Andrea Righi <arighi@develer.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28 May, 2010 4 commits

memcg: update documentation · dc10e281

Committed by KAMEZAWA Hiroyuki on May 26, 2010

Some of the information is outdated, and I think the current document doesn't
work as "a guide for users".  We need a summary of all of our controls, at
least.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: move charge of file pages · 87946a72

Committed by Daisuke Nishimura on May 26, 2010

This patch adds support for moving the charge of file pages, which include
normal files, tmpfs files, and swap of tmpfs files.  It's enabled by setting
bit 1 of <target cgroup>/memory.move_charge_at_immigrate.
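
Since memory.move_charge_at_immigrate is a bit mask, the value to write can be composed as in this sketch. Bit 1 for file pages comes from the message above; treating bit 0 as the anonymous-page bit is an assumption here, as are the constant and function names.

```python
MOVE_ANON = 1 << 0  # assumption: bit 0 selects anonymous pages (not stated above)
MOVE_FILE = 1 << 1  # bit 1 selects file pages, per the commit message


def move_charge_value(anon: bool = False, file: bool = False) -> str:
    """Return the decimal string to write to memory.move_charge_at_immigrate."""
    value = (MOVE_ANON if anon else 0) | (MOVE_FILE if file else 0)
    return str(value)
```

For example, `move_charge_value(file=True)` yields the string to write when only file pages should follow a migrating task.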

Unlike the case of anonymous pages, file pages (and swap) in the range
mmapped by the task will be moved even if the task hasn't done a page fault,
i.e. they might not be the task's "RSS", but another task's "RSS" that maps
the same file.  The mapcount of the page is also ignored (the page can be
moved even if page_mapcount(page) > 1).  So, the conditions that a page/swap
entry must meet to be moved are that it must be in the range mmapped by the
target task and it must be charged to the old cgroup.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix warning]

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: oom kill disable and oom status · 3c11ecf4

Committed by KAMEZAWA Hiroyuki on May 26, 2010

This adds a feature to disable the oom-killer for a memcg. If it is
disabled, of course, tasks under the memcg will stop.

But now, we have an oom-notifier for memcg, and the world around the memcg
is not out of memory: a memcg's out-of-memory state just shows that the
memcg has hit its limit.  Then, an administrator or management daemon can
recover the situation by

	- killing some processes
	- enlarging the limit or adding more swap
	- migrating some tasks
	- removing file cache on tmpfs (difficult?)

Unlike with the oom-killer, you can gather enough information before
killing tasks (by gcore, ps, etc.).

[akpm@linux-foundation.org: coding-style fixes]

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: oom notifier · 9490ff27

Committed by KAMEZAWA Hiroyuki on May 26, 2010

Considering containers or other resource management software in userland,
event notification of OOM in memcg should be implemented.  Now that memcg
has a "threshold" notifier which uses eventfd, we can make use of it for
oom notification.

This patch adds an oom notification eventfd callback for memcg.  The usage
is very similar to the threshold notifier, but the control file is
memory.oom_control and no arguments other than the eventfd are required.

	% cgroup_event_notifier /cgroup/A/memory.oom_control dummy
	(About cgroup_event_notifier, see Documentation/cgroup/)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

23 Apr, 2010 1 commit

Documentation/: it's -> its where appropriate · a33f3224

Committed by Francis Galiegue on Apr 23, 2010

Fix obvious cases of "it's" being used when "its" was meant.

Signed-off-by: Francis Galiegue <fgaliegue@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

25 Mar, 2010 1 commit

memcg: fix typo in memcg documentation · 5ca9ea9a

Committed by Greg Thelen on Mar 23, 2010

Update memory.txt to be more consistent: s/swapiness/swappiness/

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19 Mar, 2010 1 commit

memcg: fix typo in memcg documentation · ab5097b1

Committed by Greg Thelen on Mar 18, 2010

Updated memory.txt to be more consistent: s/swapiness/swappiness/

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

13 Mar, 2010 4 commits

memcg: handle panic_on_oom=always case · daaf1e68

Committed by KAMEZAWA Hiroyuki on Mar 10, 2010

Presently, if panic_on_oom=2, the whole system panics even if the oom
happened in some special situation (such as cpuset, mempolicy....).  That
is, panic_on_oom=2 means panic_on_oom_always.

Currently, memcg doesn't check the panic_on_oom flag. This patch adds a
check.

BTW, how is it useful?

kdump+panic_on_oom=2 is the last tool to investigate what happens in an
oom-ed system.  When a task is killed, the system recovers and there will
be few hints to know what happened.  In a mission critical system, oom should
never happen.  Then, panic_on_oom=2+kdump is useful to avoid the next OOM by
knowing precise information via a snapshot.

TODO:
 - For memcg, which is for isolating the system's memory usage, oom-notifier
   and freeze_at_oom (or rest_at_oom) should be implemented. Then, a
   management daemon can do similar jobs (as kdump) or take a snapshot per
   cgroup.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: implement memory thresholds · 2e72b634

Committed by Kirill A. Shutemov on Mar 10, 2010

It allows registering multiple memory and memsw thresholds and getting
notifications when usage crosses them.

To register a threshold, an application needs to:
- create an eventfd;
- open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
- write a string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>"
  to cgroup.event_control.

The application will be notified through the eventfd when memory usage
crosses a threshold in any direction.
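
The registration steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference client: `threshold_request` and `register_threshold` are hypothetical names, `cgroup_dir` is whatever directory the memory cgroup is mounted at, and `os.eventfd` requires Python 3.10 or later on Linux.

```python
import os
from typing import Tuple


def threshold_request(event_fd: int, usage_fd: int, threshold_bytes: int) -> str:
    """Build the line written to cgroup.event_control."""
    return f"{event_fd} {usage_fd} {threshold_bytes}"


def register_threshold(cgroup_dir: str, threshold_bytes: int,
                       memsw: bool = False) -> Tuple[int, int]:
    """Register a usage threshold and return (event_fd, usage_fd).

    Both descriptors are left open; the caller is responsible for closing them.
    """
    name = "memory.memsw.usage_in_bytes" if memsw else "memory.usage_in_bytes"
    event_fd = os.eventfd(0)  # step 1: create an eventfd
    try:
        # step 2: open the usage file whose value is being watched
        usage_fd = os.open(os.path.join(cgroup_dir, name), os.O_RDONLY)
    except OSError:
        os.close(event_fd)
        raise
    # step 3: write "<event_fd> <usage_fd> <threshold>" to cgroup.event_control
    control_fd = os.open(os.path.join(cgroup_dir, "cgroup.event_control"),
                         os.O_WRONLY)
    try:
        os.write(control_fd, threshold_request(event_fd, usage_fd,
                                               threshold_bytes).encode())
    finally:
        os.close(control_fd)
    return event_fd, usage_fd
```

After registration, a blocking `os.eventfd_read(event_fd)` returns once usage has crossed the threshold in either direction.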

It's applicable to root and non-root cgroups.

It uses stats to track memory usage, similar to soft limits. It checks
whether we need to send an event to userspace on every 100 pages in/out. I
guess it's a good compromise between performance and accuracy of thresholds.

[akpm@linux-foundation.org: coding-style fixes]
[nishimura@mxp.nes.nec.co.jp: fix documentation merge issue]

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dan Malek <dan@embeddedalley.com>
Cc: Vladislav Buzov <vbuzov@embeddedalley.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Alexander Shishkin <virtuoso@slind.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: move charges of anonymous swap · 02491447

Committed by Daisuke Nishimura on Mar 10, 2010

This patch is another core part of this move-charge-at-task-migration
feature.  It enables moving charges of anonymous swaps.

To move the charge of swap, we need to exchange swap_cgroup's record.

In current implementation, swap_cgroup's record is protected by:

  - page lock: if the entry is on swap cache.
  - swap_lock: if the entry is not on swap cache.

This works well in usual swap-in/out activity.

But this behavior makes the feature of moving swap charge check many
conditions to exchange swap_cgroup's record safely.

So I changed the modification of swap_cgroup's record (swap_cgroup_record())
to use xchg, and defined a new function to cmpxchg swap_cgroup's record.

This patch also enables moving the charge of non-pte_present but not
uncharged swap caches, which can exist on the swap-out path, by getting the
target pages via find_get_page() as do_mincore() does.

[kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
[akpm@linux-foundation.org: fix typos]

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: add interface to move charge at task migration · 7dc74be0

Committed by Daisuke Nishimura on Mar 10, 2010

In current memcg, charges associated with a task aren't moved to the new
cgroup at task migration.  Some users feel this behavior to be strange.
These patches are for this feature, that is, for charging to the new
cgroup and, of course, uncharging from the old cgroup at task migration.

This patch adds "memory.move_charge_at_immigrate" file, which is a flag
file to determine whether charges should be moved to the new cgroup at
task migration or not and what type of charges should be moved.  This
patch also adds read and write handlers of the file.

This patch also adds no-op handlers for this feature.  These handlers will
be implemented in later patches.  And you cannot write any values other
than 0 to move_charge_at_immigrate yet.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24 Sep, 2009 2 commits

memory controller: soft limit documentation · a6df6361

Committed by Balbir Singh on Sep 23, 2009

Soft limits are a new feature for the memory resource controller. Something
similar has existed in the group scheduler in the form of shares.  The CPU
controller's interpretation of shares is very different, though.

Soft limits are the most useful feature to have for environments where the
administrator wants to overcommit the system, such that only on memory
contention do the limits become active.  The current soft limits
implementation provides a soft_limit_in_bytes interface for the memory
controller and not for the memory+swap controller.  The implementation
maintains an RB-Tree of groups that exceed their soft limit and starts
reclaiming from the group that exceeds this limit by the maximum amount.
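
The selection rule in the last sentence can be illustrated with a small sketch. It deliberately swaps the kernel's RB-tree for a plain dictionary and `max()`, so it shows only which group is chosen, not how the kernel keeps the groups ordered; the function name and the input layout are hypothetical.

```python
from typing import Dict, Optional, Tuple


def pick_soft_limit_victim(groups: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Pick the group that exceeds its soft limit by the largest amount.

    `groups` maps a group name to (usage_bytes, soft_limit_bytes).
    Returns None when no group is over its soft limit.
    """
    excess = {
        name: usage - soft
        for name, (usage, soft) in groups.items()
        if usage > soft
    }
    if not excess:
        return None
    return max(excess, key=excess.get)
```

Groups at or below their soft limit never enter the candidate set, which mirrors the description of the tree holding only groups that exceed their limit.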

This patch:

Add documentation for soft limits

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: remove the overhead associated with the root cgroup · 4b3bde4c

Committed by Balbir Singh on Sep 23, 2009

Change the memory cgroup to remove the overhead associated with accounting
all pages in the root cgroup.  As a side-effect, we can no longer set a
memory hard limit in the root cgroup.

A new flag to track whether the page has been accounted or not has been
added as well.  Flags are now set atomically for page_cgroup;
pcg_default_flags is now obsolete and removed.

[akpm@linux-foundation.org: fix a few documentation glitches]

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19 Jun, 2009 2 commits

memcg: add interface to reset limits · c5b947b2

Committed by Daisuke Nishimura on Jun 17, 2009

We don't have an interface to reset mem.limit or memsw.limit now.

This patch allows resetting mem.limit or memsw.limit by setting them
to -1.
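
As a minimal sketch of how a tool might use this interface, the helper below writes -1 to the relevant limit file. The function name and the `cgroup_dir` argument are hypothetical; the file names are the ones used elsewhere in these commit messages.

```python
import os


def reset_limit(cgroup_dir: str, memsw: bool = False) -> str:
    """Write -1 to the memory (or memory+swap) limit file and return its path."""
    name = "memory.memsw.limit_in_bytes" if memsw else "memory.limit_in_bytes"
    path = os.path.join(cgroup_dir, name)
    with open(path, "w") as limit_file:
        limit_file.write("-1")  # -1 asks the kernel to reset the limit
    return path
```

On a real cgroup directory the write needs sufficient privileges, and reading the file back afterwards shows the value the kernel actually stored.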

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: fix behavior under memory.limit equals to memsw.limit · 22a668d7

Committed by KAMEZAWA Hiroyuki on Jun 17, 2009

A user can set memory.limit_in_bytes == memory.memsw.limit_in_bytes when the
user just wants to limit the total size of applications, in other words,
is not very interested in memory usage itself.  In this case, swap-out will
be done only by global-LRU.

But, under the current implementation, memory.limit_in_bytes is checked
first and try_to_free_page() may do swap-out.  But that swap-out is
useless for memsw.limit_in_bytes and the thread may hit the limit again.

This patch tries to fix the current behavior in the memory.limit ==
memsw.limit case.  The documentation is also updated to explain the behavior
of this special case.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Apr, 2009 1 commit

memcg: fix documentation · c863d835

Committed by Bharata B Rao on Apr 13, 2009

The description of various statistics from memory.stat is inaccurate
and at times confusing.

Correct this along with a few other minor cleanups.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

30 Mar, 2009 1 commit

trivial: cgroups: documentation typo and spelling corrections · caa790ba

Committed by Chris Samuel on Jan 17, 2009

Minor typo and spelling corrections fixed whilst reading
to learn about cgroups capabilities.

Signed-off-by: Chris Samuel <chris@csamuel.org>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

16 Jan, 2009 1 commit

cgroups: consolidate cgroup documents · 45ce80fb

Committed by Li Zefan on Jan 15, 2009

Move Documentation/cpusets.txt and Documentation/controllers/* to
Documentation/cgroups/

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09 Jan, 2009 3 commits

memcg: swappiness · a7885eb8

Committed by KOSAKI Motohiro on Jan 07, 2009

Currently, /proc/sys/vm/swappiness can change the swappiness ratio for global
reclaim.  However, memcg reclaim doesn't have a tuning parameter of its own.

In general, the optimal swappiness depends on the workload.  (E.g. an HPC
workload needs lower swappiness than others.)

Thus, per-cgroup swappiness improves administrator tunability.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: show reclaim stat · 7f016ee8

Committed by KOSAKI Motohiro on Jan 07, 2009

Add the following five fields to the memory.stat file:

  - inactive_ratio
  - recent_rotated_anon
  - recent_rotated_file
  - recent_scanned_anon
  - recent_scanned_file

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: memory cgroup hierarchy documentation · 52bc0d82

Committed by Balbir Singh on Jan 07, 2009

Documentation updates for hierarchy support

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>