提交 · d7911ef30cb7bec52234c2b7a5c275ac8f07905a · OpenHarmony / kernel_linux

27 5月, 2011 3 次提交

由 Daniel Lezcano 提交于 5月 26, 2011

The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

  * cgroup creation is out-of-control
  * cgroup name can conflict when pids are looping
  * it is not possible to have a single process handling a lot of
    namespaces without falling in a exponential creation time
  * we may want to create a namespace without creating a cgroup

  The ns_cgroup was replaced by a compatibility flag 'clone_children',
  where a newly created cgroup will copy the parent cgroup values.
  The userspace has to manually create a cgroup and add a task to
  the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

The 'cgroup_clone' function is removed because it is no longer used.

This is a userspace-visible change.  Commit 45531757 ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal.  Since that
time we have heard from XXX users who were affected by this.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NPaul Menage <menage@google.com>
Acked-by: NMatt Helsley <matthltc@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a77aea92

cgroups: make procs file writable · 74a1166d

由 Ben Blum 提交于 5月 26, 2011

Make procs file writable to move all threads by tgid at once.

Add functionality that enables users to move all threads in a threadgroup
at once to a cgroup by writing the tgid to the 'cgroup.procs' file.  This
current implementation makes use of a per-threadgroup rwsem that's taken
for reading in the fork() path to prevent newly forking threads within the
threadgroup from "escaping" while the move is in progress.
Signed-off-by: NBen Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: NPaul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

74a1166d

cgroups: add per-thread subsystem callbacks · f780bdb7

由 Ben Blum 提交于 5月 26, 2011

Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
for cgroups's subsystem interface.  Unlike can_attach and attach, these
are for per-thread operations, to be called potentially many times when
attaching an entire threadgroup.

Also, the old "bool threadgroup" interface is removed, as replaced by
this.  All subsystems are modified for the new interface - of note is
cpuset, which requires from/to nodemasks for attach to be globally scoped
(though per-cpuset would work too) to persist from its pre_attach to
attach_task and attach.

This is a pre-patch for cgroup-procs-writable.patch.
Signed-off-by: NBen Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: NPaul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f780bdb7

29 4月, 2011 1 次提交

memcg: update documentation to describe usage_in_bytes · a111c966

由 Daisuke Nishimura 提交于 4月 27, 2011

Since 569b846d ("memcg: coalesce uncharge during unmap/truncate"), we do
batched (delayed) uncharge at truncation/unmap.  And since cdec2e42(memcg:
coalesce charging via percpu storage), we have percpu cache for
res_counter.

These changes improved performance of memory cgroup very much, but made
res_counter->usage usually have a bigger value than the actual value of
memory usage.  So, *.usage_in_bytes, which show res_counter->usage, are
not desirable for precise values of memory(and swap) usage anymore.

Instead of removing these files completely(because we cannot know
res_counter->usage without them), this patch updates the meaning of those
files.
Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a111c966

05 4月, 2011 1 次提交

Documentation: update cgroups info user groups names · 6ad85239

由 Geunsik Lim 提交于 4月 04, 2011

Update suitable words to explain / understand cgroups contents.
Signed-off-by: NGeunsik Lim <geunsik.lim@samsung.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6ad85239

17 3月, 2011 1 次提交

Documentation: update cgroup pid and cpuset information · bb6405ea

由 Eric B Munson 提交于 3月 15, 2011

The cgroup documentation does not specify how a process can be removed
from a particular group.  This patch adds a note at the end of the
simple example about how this is done.  Also, some cgroups (like
cpusets) require user input before a new group can be used.  This is
noted in the patch as well.
Signed-off-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bb6405ea

09 3月, 2011 2 次提交

Update cpuset info & webiste for cgroups · 8671139b

由 GeunSik Lim 提交于 3月 03, 2011

cpuset related websited is changed.
and, update list of cpuset using cgroup(controller group).
Signed-off-by: NGeunsik Lim <geunsik.lim@samsung.com>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

8671139b

blk-cgroup: Lower minimum weight from 100 to 10. · df457f84

由 Justin TerAvest 提交于 3月 08, 2011

We've found that we still get good, useful isolation at weights this
low. I'd like to adjust the minimum so that any other changes can take
these values into account.
Signed-off-by: NJustin TerAvest <teravest@google.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

df457f84

02 3月, 2011 1 次提交

cfq-iosched: Always provide group isolation. · 0bbfeb83

由 Justin TerAvest 提交于 3月 01, 2011

Effectively, make group_isolation=1 the default and remove the tunable.
The setting group_isolation=0 was because by default we idle on
sync-noidle tree and on fast devices, this can be very harmful for
throughput.

However, this problem can also be addressed by tuning slice_idle and
possibly group_idle on faster storage devices.

This change simplifies the CFQ code by removing the feature entirely.
Signed-off-by: NJustin TerAvest <teravest@google.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

0bbfeb83

17 2月, 2011 1 次提交

memcg: clarify use_hierarchy documentation · 689bca3b

由 Greg Thelen 提交于 2月 16, 2011

The memcg code does not allow changing memory.use_hierarchy if the
parent cgroup has enabled use_hierarchy.  Update documentation to match
the code.
Signed-off-by: NGreg Thelen <gthelen@google.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

689bca3b

14 1月, 2011 3 次提交

revert documentaion update for memcg's dirty ratio. · 11ff26c8

由 KAMEZAWA Hiroyuki 提交于 1月 14, 2011

Subjct: Revert memory cgroup dirty_ratio Documentation.

The commit ece72400 adds documentation
for memcg's dirty ratio. But the function is not implemented yet.
Remove the documentation for avoiding confusing users.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: NGreg Thelen <gthelen@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

11ff26c8

memcg: document cgroup dirty memory interfaces · ece72400

由 Greg Thelen 提交于 1月 13, 2011

Document cgroup dirty memory interfaces and statistics.

[akpm@linux-foundation.org: fix use_hierarchy description]
Signed-off-by: NAndrea Righi <arighi@develer.com>
Signed-off-by: NGreg Thelen <gthelen@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ece72400

cgroups: remove deprecated subsystem from examples. · 1bdcd78e

由 Trevor Woerner 提交于 1月 12, 2011

The 'ns' cgroup is considered deprecated. Change the cgroup subsystem
used in the examples of the cgroup documentation from 'ns' to 'blkio'.
Signed-off-by: NTrevor Woerner <twoerner@gmail.com>
Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1bdcd78e

16 11月, 2010 1 次提交

blk-cgroup: Allow creation of hierarchical cgroups · bdc85df7

由 Vivek Goyal 提交于 11月 15, 2010

o Allow hierarchical cgroup creation for blkio controller

o Currently we disallow it as both the io controller policies (throttling
  as well as proportion bandwidth) do not support hierarhical accounting
  and control. But the flip side is that blkio controller can not be used with
  libvirt as libvirt creates a cgroup hierarchy deeper than 1 level.

  <top-level-cgroup-dir>/<controller>/libvirt/qemu/<virtual-machine-groups>

o So this patch will allow creation of cgroup hierarhcy but at the backend
  everything will be treated as flat. So if somebody created a an hierarchy
  like as follows.

			root
			/  \
		     test1 test2
			|
		     test3

  CFQ and throttling will practically treat all groups at same level.

				pivot
			     /  |   \  \
			root  test1 test2  test3

o Once we have actual support for hierarchical accounting and control
  then we can introduce another cgroup tunable file "blkio.use_hierarchy"
  which will be 0 by default but if user wants to enforce hierarhical
  control then it can be set to 1. This way there should not be any
  ABI problems down the line.

o The only not so pretty part is introduction of extra file "use_hierarchy"
  down the line. Kame-san had mentioned that hierarhical accounting is
  expensive in memory controller hence they keep it off by default. I
  suspect same will be the case for IO controller also as for each IO
  completion we shall have to account IO through hierarchy up to the root.
  if yes, then it probably is not a very bad idea to introduce this extra
  file so that it will be used only when somebody needs it and some people
  might enable hierarchy only in part of the hierarchy.

o This is how basically memory controller also uses "use_hierarhcy" and
  they also allowed creation of hierarchies when actual backend support
  was not available.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Reviewed-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com>
Reviewed-by: NCiju Rajan K <ciju@linux.vnet.ibm.com>
Tested-by: NCiju Rajan K <ciju@linux.vnet.ibm.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

bdc85df7

02 11月, 2010 1 次提交

tree-wide: fix comment/printk typos · b595076a

由 Uwe Kleine-König 提交于 11月 01, 2010

"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",
Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

b595076a

28 10月, 2010 1 次提交

cgroup: add clone_children control file · 97978e6d

由 Daniel Lezcano 提交于 10月 27, 2010

The ns_cgroup is a control group interacting with the namespaces.  When a
new namespace is created, a corresponding cgroup is automatically created
too.  The cgroup name is the pid of the process who did 'unshare' or the
child of 'clone'.

This cgroup is tied with the namespace because it prevents a process to
escape the control group and use the post_clone callback, so the child
cgroup inherits the values of the parent cgroup.

Unfortunately, the more we use this cgroup and the more we are facing
problems with it:

(1) when a process unshares, the cgroup name may conflict with a
    previous cgroup with the same pid, so unshare or clone return -EEXIST

(2) the cgroup creation is out of control because there may have an
    application creating several namespaces where the system will
    automatically create several cgroups in his back and let them on the
    cgroupfs (eg.  a vrf based on the network namespace).

(3) the mix of (1) and (2) force an administrator to regularly check
    and clean these cgroups.

This patchset removes the ns_cgroup by adding a new flag to the cgroup and
the cgroupfs mount option.  It enables the copy of the parent cgroup when
a child cgroup is created.  We can then safely remove the ns_cgroup as
this flag brings a compatibility.  We have now to manually create and add
the task to a cgroup, which is consistent with the cgroup framework.

This patch:

Sent as an answer to a previous thread around the ns_cgroup.

https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html

It adds a control file 'clone_children' for a cgroup.  This control file
is a boolean specifying if the child cgroup should be a clone of the
parent cgroup or not.  The default value is 'false'.

This flag makes the child cgroup to call the post_clone callback of all
the subsystem, if it is available.

At present, the cpuset is the only one which had implemented the
post_clone callback.

The option can be set at mount time by specifying the 'clone_children'
mount option.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: NPaul Menage <menage@google.com>
Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Cc: Matt Helsley <matthltc@us.ibm.com>
Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

97978e6d

16 9月, 2010 1 次提交

blkio: Documentation Update · 2786c4e5

由 Vivek Goyal 提交于 9月 15, 2010

o Documentation update
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

2786c4e5

23 8月, 2010 1 次提交

cfq-iosched: Documentation help for new tunables · 6d6ac1c1

由 Vivek Goyal 提交于 8月 23, 2010

Some documentation to provide help with tunables.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

6d6ac1c1

04 8月, 2010 1 次提交

Documentation: update broken web addresses. · 0ea6e611

由 Justin P. Mattock 提交于 7月 23, 2010

Below you will find an updated version from the original series bunching all patches into one big patch
updating broken web addresses that are located in Documentation/*
Some of the addresses date as far far back as 1995 etc... so searching became a bit difficult,
the best way to deal with these is to use web.archive.org to locate these addresses that are outdated.
Now there are also some addresses pointing to .spec files some are located, but some(after searching
on the companies site)where still no where to be found. In this case I just changed the address
to the company site this way the users can contact the company and they can locate them for the users.
Signed-off-by: NJustin P. Mattock <justinmattock@gmail.com>
Signed-off-by: NThomas Weber <weber@corscience.de>
Signed-off-by: NMike Frysinger <vapier.adi@gmail.com>
Cc: Paulo Marques <pmarques@grupopie.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Michael Neuling <mikey@neuling.org>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

0ea6e611

28 5月, 2010 5 次提交

memcg: update documentation · dc10e281

由 KAMEZAWA Hiroyuki 提交于 5月 26, 2010

Some information are old, and I think current document doesn't work as "a
guide for users".  We need summary of all of our controls, at least.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dc10e281

memcg: move charge of file pages · 87946a72

由 Daisuke Nishimura 提交于 5月 26, 2010

This patch adds support for moving charge of file pages, which include
normal file, tmpfs file and swaps of tmpfs file.  It's enabled by setting
bit 1 of <target cgroup>/memory.move_charge_at_immigrate.

Unlike the case of anonymous pages, file pages(and swaps) in the range
mmapped by the task will be moved even if the task hasn't done page fault,
i.e.  they might not be the task's "RSS", but other task's "RSS" that maps
the same file.  And mapcount of the page is ignored(the page can be moved
even if page_mapcount(page) > 1).  So, conditions that the page/swap
should be met to be moved is that it must be in the range mmapped by the
target task and it must be charged to the old cgroup.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

87946a72

memcg: oom kill disable and oom status · 3c11ecf4

由 KAMEZAWA Hiroyuki 提交于 5月 26, 2010

This adds a feature to disable oom-killer for memcg, if disabled, of
course, tasks under memcg will stop.

But now, we have oom-notifier for memcg.  And the world around memcg is
not under out-of-memory.  memcg's out-of-memory just shows memcg hits
limit.  Then, administrator or management daemon can recover the situation
by

	- kill some process
	- enlarge limit, add more swap.
	- migrate some tasks
	- remove file cache on tmps (difficult ?)

Unlike oom-killer, you can take enough information before killing tasks.
(by gcore, or, ps etc.)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3c11ecf4

memcg: oom notifier · 9490ff27

由 KAMEZAWA Hiroyuki 提交于 5月 26, 2010

Considering containers or other resource management softwares in userland,
event notification of OOM in memcg should be implemented.  Now, memcg has
"threshold" notifier which uses eventfd, we can make use of it for oom
notification.

This patch adds oom notification eventfd callback for memcg.  The usage is
very similar to threshold notifier, but control file is memory.oom_control
and no arguments other than eventfd is required.

	% cgroup_event_notifier /cgroup/A/memory.oom_control dummy
	(About cgroup_event_notifier, see Documentation/cgroup/)
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9490ff27

Documentation/cgroups/cgroups.txt: fix reference to "numtasks" · 595f4b69

由 Trevor Woerner 提交于 5月 26, 2010

Signed-off-by: NTrevor Woerner <twoerner@gmail.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

595f4b69

27 4月, 2010 1 次提交

blk-cgroup: config options re-arrangement · afc24d49

由 Vivek Goyal 提交于 4月 26, 2010

This patch fixes few usability and configurability issues.

o All the cgroup based controller options are configurable from
  "Genral Setup/Control Group Support/" menu. blkio is the only exception.
  Hence make this option visible in above menu and make it configurable from
  there to bring it inline with rest of the cgroup based controllers.

o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.

  This option currently does two things.

  - Enable printing of cgroup paths in blktrace
  - Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat
    files in cgroup.

  If we are using group scheduling, blktrace data is of not really much use
  if cgroup information is not present. To get this data, currently one has to
  also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of
  all the additional debug stat files which is not desired.

  Hence, this patch moves printing of cgroup paths under
  CONFIG_CFQ_GROUP_IOSCHED.

  This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all
  the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which
  can be enabled through config menu.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NDivyesh Shah <dpshah@google.com>
Reviewed-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

afc24d49

25 4月, 2010 1 次提交

cgroups: fix procs documentation · 7716fa66

由 KAMEZAWA Hiroyuki 提交于 4月 23, 2010

Writing to cgroup.procs is not supported now.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
Cc: Ben Blum <bblum@andrew.cmu.edu>
Cc: Paul Menage <menage@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7716fa66

23 4月, 2010 1 次提交

Documentation/: it's -> its where appropriate · a33f3224

由 Francis Galiegue 提交于 4月 23, 2010

Fix obvious cases of "it's" being used when "its" was meant.
Signed-off-by: NFrancis Galiegue <fgaliegue@gmail.com>
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

a33f3224

13 4月, 2010 1 次提交

io-controller: Document for blkio.weight_device · da69da18

由 Gui Jianfeng 提交于 4月 13, 2010

Here is the document for blkio.weight_device
Signed-off-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

da69da18

09 4月, 2010 4 次提交

blkio: Add more debug-only per-cgroup stats · 812df48d

由 Divyesh Shah 提交于 4月 08, 2010

1) group_wait_time - This is the amount of time the cgroup had to wait to get a
timeslice for one of its queues from when it became busy, i.e., went from 0
to 1 request queued. This is different from the io_wait_time which is the
cumulative total of the amount of time spent by each IO in that cgroup waiting
in the scheduler queue. This stat is a great way to find out any jobs in the
fleet that are being starved or waiting for longer than what is expected (due
to an IO controller bug or any other issue).
2) empty_time - This is the amount of time a cgroup spends w/o any pending
requests. This stat is useful when a job does not seem to be able to use its
assigned disk share by helping check if that is happening due to an IO
controller bug or because the job is not submitting enough IOs.
3) idle_time - This is the amount of time spent by the IO scheduler idling
for a given cgroup in anticipation of a better request than the exising ones
from other queues/cgroups.

All these stats are recorded using start and stop events. When reading these
stats, we do not add the delta between the current time and the last start time
if we're between the start and stop events. We avoid doing this to make sure
that these numbers are always monotonically increasing when read. Since we're
using sched_clock() which may use the tsc as its source, it may induce some
inconsistency (due to tsc resync across cpus) if we included the current delta.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

812df48d

blkio: Add io_queued and avg_queue_size stats · cdc1184c

由 Divyesh Shah 提交于 4月 08, 2010

These stats are useful for getting a feel for the queue depth of the cgroup,
i.e., how filled up its queues are at a given instant and over the existence of
the cgroup. This ability is useful when debugging problems in the wild as it
helps understand the application's IO pattern w/o having to read through the
userspace code (coz its tedious or just not available) or w/o the ability
to run blktrace (since you may not have root access and/or not want to disturb
performance).

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

cdc1184c

blkio: Add io_merged stat · 812d4026

由 Divyesh Shah 提交于 4月 08, 2010

This includes both the number of bios merged into requests belonging to this
cgroup as well as the number of requests merged together.
In the past, we've observed different merging behavior across upstream kernels,
some by design some actual bugs. This stat helps a lot in debugging such
problems when applications report decreased throughput with a new kernel
version.

This needed adding an extra elevator function to capture bios being merged as I
did not want to pollute elevator code with blkiocg knowledge and hence needed
the accounting invocation to come from CFQ.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

812d4026

blkio: Changes to IO controller additional stats patches · 84c124da

由 Divyesh Shah 提交于 4月 09, 2010

that include some minor fixes and addresses all comments.

Changelog: (most based on Vivek Goyal's comments)
o renamed blkiocg_reset_write to blkiocg_reset_stats
o more clarification in the documentation on io_service_time and io_wait_time
o Initialize blkg->stats_lock
o rename io_add_stat to blkio_add_stat and declare it static
o use bool for direction and sync
o derive direction and sync info from existing rq methods
o use 12 for major:minor string length
o define io_service_time better to cover the NCQ case
o add a separate reset_stats interface
o make the indexed stats a 2d array to simplify macro and function pointer code
o blkio.time now exports in jiffies as before
o Added stats description in patch description and
  Documentation/cgroup/blkio-controller.txt
o Prefix all stats functions with blkio and make them static as applicable
o replace IO_TYPE_MAX with IO_TYPE_TOTAL
o Moved #define constant to top of blk-cgroup.c
o Pass dev_t around instead of char *
o Add note to documentation file about resetting stats
o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
  statements
o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
  rq_direction() and rq_sync() functions which are used by CFQ and when using
  io-controller at a higher level, bio_* functions can be added.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

84c124da

25 3月, 2010 2 次提交

cpuset: Fix documentation punctuation · 5239c4ff

由 Greg Thelen 提交于 3月 24, 2010

Fix cpusets.txt documentation punctuation.
Signed-off-by: NGreg Thelen <gthelen@google.com>
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

5239c4ff

memcg: fix typo in memcg documentation · 5ca9ea9a

由 Greg Thelen 提交于 3月 23, 2010

Update memory.txt to be more consistent: s/swapiness/swappiness/
Signed-off-by: NGreg Thelen <gthelen@google.com>
Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5ca9ea9a

19 3月, 2010 1 次提交

memcg: fix typo in memcg documentation · ab5097b1

由 Greg Thelen 提交于 3月 18, 2010

Updated memory.txt to be more consistent: s/swapiness/swappiness/
Signed-off-by: NGreg Thelen <gthelen@google.com>
Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

ab5097b1

16 3月, 2010 1 次提交

Fix typos in comments · 88393161

由 Thomas Weber 提交于 3月 16, 2010

[Ss]ytem => [Ss]ystem
udpate => update
paramters => parameters
orginal => original
Signed-off-by: NThomas Weber <swirl@gmx.li>
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

88393161

13 3月, 2010 4 次提交

memcg: fix typos in memcg_test.txt · 0263c12c

由 Kirill A. Shutemov 提交于 3月 10, 2010

Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0263c12c

memcg: update memcg_test.txt to describe memory thresholds · 1e111452

由 Kirill A. Shutemov 提交于 3月 10, 2010

Decription of sanity check for memory thresholds.
Signed-off-by: NKirill A.  Shutemov <kirill@shutemov.name>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dan Malek <dan@embeddedalley.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1e111452

cgroups: add simple listener of cgroup events to documentation · 1d8fd973

由 Kirill A. Shutemov 提交于 3月 10, 2010

An example of cgroup notification API usage.
Signed-off-by: NKirill A.  Shutemov <kirill@shutemov.name>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dan Malek <dan@embeddedalley.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d8fd973

memcg: handle panic_on_oom=always case · daaf1e68

由 KAMEZAWA Hiroyuki 提交于 3月 10, 2010

Presently, if panic_on_oom=2, the whole system panics even if the oom
happend in some special situation (as cpuset, mempolicy....).  Then,
panic_on_oom=2 means painc_on_oom_always.

Now, memcg doesn't check panic_on_oom flag. This patch adds a check.

BTW, how it's useful ?

kdump+panic_on_oom=2 is the last tool to investigate what happens in
oom-ed system.  When a task is killed, the sysytem recovers and there will
be few hint to know what happnes.  In mission critical system, oom should
never happen.  Then, panic_on_oom=2+kdump is useful to avoid next OOM by
knowing precise information via snapshot.

TODO:
 - For memcg, it's for isolate system's memory usage, oom-notiifer and
   freeze_at_oom (or rest_at_oom) should be implemented. Then, management
   daemon can do similar jobs (as kdump) or taking snapshot per cgroup.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

daaf1e68

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年